
Add BGP connection / session metrics (up, prefix_count, updates, withdraws) #94

Closed
frittentheke opened this issue Oct 31, 2023 · 29 comments


@frittentheke

Thanks for this awesome exporter!

I just switched over from https://github.com/nshttpd/mikrotik-exporter/ as that one is not really compatible with RouterOS 7.x anymore. I am really happy with mktxp so far, but noticed that mktxp does not yet provide BGP session metrics like the mikrotik-exporter did. See https://github.com/nshttpd/mikrotik-exporter/blob/e1b06c6ebe6e71a5661326b3a33afe2fd741283d/collector/bgp_collector.go#L24

Here are some examples of the metrics I had before:

mikrotik_bgp_up{address="10.1.2.3", asn="65009", instance="rt-01", job="mikrotik", name="rt-01", session="rt-02"}	1
mikrotik_bgp_prefix_count{address="10.1.2.3", asn="65009", instance="rt-01", job="mikrotik", name="rt-01", session="rt-02"}	669103
mikrotik_bgp_updates_received{address="10.1.2.3", asn="65009", instance="rt-01", job="mikrotik", name="rt-01", session="rt-02"}	25914343
mikrotik_bgp_updates_sent{address="10.1.2.3", asn="65009", instance="rt-01", job="mikrotik", name="rt-01", session="rt-02"}	16409310
mikrotik_bgp_withdrawn_received{address="10.1.2.3", asn="65009", instance="rt-01", job="mikrotik", name="rt-01", session="rt-02"}	6477261
mikrotik_bgp_withdrawn_sent{address="10.1.2.3", asn="65009", instance="rt-01", job="mikrotik", name="rt-01", session="rt-02"}	6297995

The BGP configuration on Mikrotik devices has changed quite a lot with RouterOS 7.x, which is why the old exporter no longer provided them either. It would be awesome to have those metrics available again.

@endreszabo

I'm quite sure that with ROS 7.x not all of those metrics can be extracted. Here is what I get from the REST API call for a session. The prefix count also seems to have underflowed somehow in this run.

{
  ".id": "*280002A",
  "ebgp": "",
  "established": "true",
  "hold-time": "3m",
  "input.filter": "s2s_import_v4",
  "input.last-notification": "ffffffffffffffffffffffffffffffff0015030600",
  "input.procid": "32",
  "keepalive-time": "1m",
  "last-started": "2023-11-01 00:18:14",
  "last-stopped": "2023-11-01 00:18:08",
  "local.address": "100.100.1.0",
  "local.as": "27152",
  "local.bytes": "1803479",
  "local.capabilities": "mp,rr,gr,as4",
  "local.eor": "",
  "local.id": "44.128.3.255",
  "local.messages": "13623",
  "multihop": "true",
  "name": "atvie1_vsh01_v4_in_v6-1",
  "output.filter-chain": "s2s_export_med121",
  "output.keep-sent-attributes": "true",
  "output.procid": "32",
  "prefix-count": "4294967221",
  "remote.address": "100.100.0.35",
  "remote.afi": "ip,ipv6",
  "remote.as": "27187",
  "remote.bytes": "2277993",
  "remote.capabilities": "mp,rr,gr,as4,err,llgr",
  "remote.eor": "ip",
  "remote.gr-time": "120",
  "remote.hold-time": "4m",
  "remote.id": "44.128.143.255",
  "remote.messages": "16967",
  "uptime": "1w9h52m40s110ms",
  "use-bfd": "true"
}
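The prefix-count value above looks like a 32-bit unsigned counter that went below zero: 4294967221 is 2^32 - 75. A minimal Python sketch that reinterprets it as a signed value, assuming (not confirmed by RouterOS documentation) that the field is an unsigned 32-bit integer; as_signed32 is a hypothetical helper name:

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed (two's complement) one."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - (1 << 32) if value >= (1 << 31) else value


raw = int("4294967221")  # 'prefix-count' as returned by the REST API above
print(as_signed32(raw))  # -75
```

Normal values such as 669103 pass through unchanged, so the helper only alters readings that have wrapped.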

@frittentheke
Author

frittentheke commented Nov 8, 2023

@endreszabo thanks for looking into this. Could you maybe disclose which API calls you made here and how?
I shall then provide some examples from what I see for my devices.

Edit:

I suppose you use the Python example client at https://help.mikrotik.com/docs/display/ROS/API#API-Exampleclient ?
BGP on RouterOS 7 now has "sessions" and "connections". From what you posted, that seems to be a detailed listing of /routing/bgp/session.
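For anyone who wants to reproduce the output above, here is a minimal Python sketch of the REST call. It assumes the RouterOS 7 REST API is enabled (typically served over HTTPS by the www-ssl service) and that the /routing/bgp/session menu maps to /rest/routing/bgp/session the way other menus do. The host name and credentials are placeholders, and a router with a self-signed certificate would additionally need a custom ssl context, which is not shown:

```python
import base64
import json
import urllib.request


def build_session_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the REST endpoint mirroring /routing/bgp/session."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()  # HTTP basic auth
    return urllib.request.Request(
        f"https://{host}/rest/routing/bgp/session",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_sessions(host: str, user: str, password: str) -> list:
    """Return the decoded JSON list, one dict per BGP session."""
    request = build_session_request(host, user, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Each dict returned by fetch_sessions should then look like the JSON posted above.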

@timcole

timcole commented Dec 11, 2023

Hello, I'm currently also in the process of moving away from nshttpd/mikrotik-exporter, but information about our BGP sessions is important to us. I'm not versed enough in Python to contribute code, but if someone wants to tackle this and needs an output example, I'm happy to run whatever command is needed.

@akpw
Owner

akpw commented Dec 11, 2023

Adding a BGP collector/metrics is not a big deal by itself; the main problem is that I'm not doing much with BGP myself right now, so it'd be a bit awkward for me to test the functionality. Any ideas on how to get around this? An accessible BGP endpoint in a test environment would be ideal, if someone were willing to set it up for me.

@timcole

timcole commented Dec 11, 2023

Using dn42 as a BGP test bed can be a good way to play around with BGP configurations. However, setting up dn42 involves considerable manual labor and coordination.

Alternatively, a simpler method could just be peering two MikroTik devices (or CHR containers) together and announcing bogon addresses from one to the other.

Start by creating one host and making an address-list that will be used as the output network:

/ip firewall address-list
add address=0.0.0.0/8 list=bogon
add address=10.0.0.0/8 list=bogon
add address=100.64.0.0/10 list=bogon
add address=127.0.0.0/8 list=bogon
add address=169.254.0.0/16 list=bogon
add address=172.16.0.0/12 list=bogon
add address=192.0.0.0/24 list=bogon
add address=192.0.2.0/24 list=bogon
add address=192.168.0.0/16 list=bogon
add address=198.18.0.0/15 list=bogon
add address=198.51.100.0/24 list=bogon
add address=203.0.113.0/24 list=bogon
add address=224.0.0.0/3 list=bogon

On the same host, add the routes to the routing table as blackholes, since RouterOS requires routes to be present in the routing table before it will advertise them:

:foreach item in=[/ip/firewall/address-list/find list=bogon] do={
  /ip/route/add dst-address=[/ip firewall address-list get $item address] blackhole
}

Create this side of the peering on the same host as in the first two steps:

/routing/bgp/connection/add output.network=bogon name=bogon as=65532 local.role=ebgp local.address=CURRENT_DEVICES_IP remote.as=65533 remote.address=OTHER_ROUTER_IP

On the second host, create the other side of the peering:

/routing/bgp/connection/add name=bogon as=65533 local.role=ebgp local.address=CURRENT_DEVICES_IP remote.as=65532 remote.address=OTHER_ROUTER_IP

Now, on the second host, you should see your session come online and prefixes get loaded into your RIB. Check using /routing/bgp/session/print detail.
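The four setup steps can also be generated from one prefix list, which makes it harder to mistype a prefix or swap the AS numbers between the two sides. A minimal Python sketch; the function names and the 10.10.100.56/57 addresses (borrowed from the CHR test output later in this thread) are illustrative assumptions, and the emitted commands simply mirror the ones above in RouterOS 7 path syntax:

```python
import ipaddress

# Bogon prefixes from the address-list in the first step.
BOGONS = [
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
    "224.0.0.0/3",
]


def bogon_script(prefixes, list_name="bogon"):
    """Steps 1 and 2: an address-list entry plus a blackhole route per prefix."""
    lines = []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)  # ValueError on a typo such as 10.0.0.1/8
        lines.append(f"/ip/firewall/address-list/add address={net} list={list_name}")
        lines.append(f"/ip/route/add dst-address={net} blackhole")
    return "\n".join(lines)


def peer_commands(name, a_as, a_addr, b_as, b_addr, network):
    """Steps 3 and 4: mirrored connections; only host A announces the address-list."""
    def connection(prefix, local_as, local_addr, remote_as, remote_addr):
        return (f"/routing/bgp/connection/add {prefix}name={name} as={local_as} "
                f"local.role=ebgp local.address={local_addr} "
                f"remote.as={remote_as} remote.address={remote_addr}")

    host_a = connection(f"output.network={network} ", a_as, a_addr, b_as, b_addr)
    host_b = connection("", b_as, b_addr, a_as, a_addr)
    return host_a, host_b


print(bogon_script(BOGONS))
for command in peer_commands("bogon", 65532, "10.10.100.57", 65533, "10.10.100.56", "bogon"):
    print(command)
```

Keeping both sides in one function makes the symmetry explicit: each host's local AS and address are the other host's remote ones.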

@frittentheke
Author

Adding a BGP collector/metrics is not a big deal by itself; the main problem is that I'm not doing much with BGP myself right now, so it'd be a bit awkward for me to test the functionality. Any ideas on how to get around this? An accessible BGP endpoint in a test environment would be ideal, if someone were willing to set it up for me.

@akpw I can't offer you public access to a router, but I could offer to test this on multiple CCRs running RouterOS 7 with BGP peerings (IPv4 and IPv6).

@timcole seems to have offered the same. But I like the simple test setup he described even more.

@akpw
Owner

akpw commented Mar 1, 2024

@frittentheke OK, so I put two CHRs on QEMU/KVM and set them up according to @timcole's suggestions.
Here is a sample output from calling /routing/bgp/session:

{
   'id': '*2800001',
   'name': 'bogon-1',
   'remote.address': '10.10.100.56',
   'remote.as': '65533',
   'remote.id': '10.10.100.56',
   'remote.capabilities': 'mp,rr,gr,as4',
   'remote.messages': '54',
   'remote.bytes': '1026',
   'remote.eor': '',
   'local.address': '10.10.100.57',
   'local.as': '65532',
   'local.id': '10.10.100.57',
   'local.capabilities': 'mp,rr,gr,as4',
   'local.messages': '55',
   'local.bytes': '1109',
   'local.eor': '',
   'output.procid': '20',
   'output.network': 'bogon',
   'input.procid': '20',
   'ebgp': '',
   'hold-time': '3m',
   'keepalive-time': '1m',
   'uptime': '53m10s720ms',
   'last-started': '2024-03-01 13:13:32',
   'prefix-count': '0',
   'established': 'true'
 }

Would this kind of info be useful to what you guys need? Feel free to suggest what other related info would be of interest.

@frittentheke
Author

frittentheke commented Mar 1, 2024

@frittentheke OK, so I put two CHRs on QEMU/KVM and set them up according to @timcole's suggestions.

That's awesome @akpw, thanks a lot for looking after this and implementing it!

Here is a sample output from calling /routing/bgp/session:
[...]
Would this kind of info be useful to what you guys need? Feel free to suggest what other related info would be of interest.

Yes, definitely! Let me pick and choose a little...

Something to identify an individual session:

'name': 'bogon-1',
'remote.as': '65533',
'local.as': '65532',

If only one label were extracted, name would be it, as it uniquely identifies a session
(on a single instance / router). But having the AS numbers would be good too.

Please also add the address family (IPv4 or IPv6):

remote.afi=ipv6
local.afi=ipv6
as labels, as otherwise one only has the name to distinguish two sessions with the same AS / remote.

Having 'ebgp' or 'ibgp' reported is also nice, but that could be derived from whether the local and remote AS are the same.
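The label wish list above, including the eBGP/iBGP rule, can be sketched as one small function over a session entry. This is a hypothetical helper, not mktxp code; it writes the label names with underscores because Prometheus label names cannot contain dots:

```python
def session_labels(session: dict) -> dict:
    """Identifying labels for one /routing/bgp/session entry (hypothetical helper)."""
    local_as = session.get("local.as", "")
    remote_as = session.get("remote.as", "")
    return {
        "name": session["name"],  # unique per router
        "local_as": local_as,
        "remote_as": remote_as,
        # Only remote.afi shows up in the outputs above, e.g. "ip,ipv6"; it may be absent.
        "remote_afi": session.get("remote.afi", ""),
        # Same AS on both ends means iBGP, otherwise eBGP.
        "bgp_type": "ibgp" if local_as and local_as == remote_as else "ebgp",
    }


sample = {"name": "bogon-1", "local.as": "65532", "remote.as": "65533"}
print(session_labels(sample))
```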

The most valuable numeric metrics seem to be:

As counters:

'remote.messages': '54',
'remote.bytes': '1026',
'local.messages': '55',
'local.bytes': '1109',

As gauges:

'prefix-count': '0',
'established': 'true'

'uptime': '53m10s720ms',

(If there is any chance, please normalize the uptime into somemetricname_seconds and export it as a gauge, because then it's easy to use as an alert indicator (session up < 5 minutes) and to compare with e.g. the router uptime to suppress alerts caused by reboots.)
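The counter/gauge split above, plus the uptime normalized to seconds, can be sketched in Python as follows. The helper names and output keys are hypothetical (not mktxp's metric names), and the duration parser assumes the unit suffixes seen in the outputs above (w, h, m, s, ms) plus d for days. Working in integer milliseconds avoids float rounding until the final division:

```python
import re

COUNTER_FIELDS = ("remote.messages", "remote.bytes", "local.messages", "local.bytes")

# RouterOS durations look like "53m10s720ms" or "1w9h52m40s110ms"; values in ms.
_UNIT_MS = {"w": 604_800_000, "d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
# "ms" is listed before "m" so that "720ms" is not read as 720 minutes.
_TOKEN = re.compile(r"(\d+)(ms|w|d|h|m|s)")


def duration_to_seconds(text: str) -> float:
    """Convert a RouterOS duration string to seconds."""
    total_ms, pos = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            break  # unparseable characters between tokens
        total_ms += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if not text or pos != len(text):
        raise ValueError(f"cannot parse duration {text!r}")
    return total_ms / 1000


def session_values(session: dict) -> dict:
    """Numeric sample values for one session; RouterOS returns everything as strings."""
    values = {field: float(session.get(field, 0)) for field in COUNTER_FIELDS}
    values["prefix-count"] = float(session.get("prefix-count", 0))
    # Assumption: anything other than the string "true" means the session is down.
    values["established"] = 1.0 if session.get("established") == "true" else 0.0
    values["uptime_seconds"] = duration_to_seconds(session["uptime"]) if "uptime" in session else 0.0
    return values


print(duration_to_seconds("53m10s720ms"))  # 3190.72
```

An alert on a fresh session then only needs a threshold on uptime_seconds (e.g. below 300).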

@timcole

timcole commented Mar 1, 2024

If you're able to use the router ID (aka local.id) as a label, that would be useful too, since equal router IDs are also used to group peers into one instance, which is used to determine the best path within that instance when the same routes are received from multiple peers.
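A minimal Python sketch of that grouping, keyed on local.id. The session names and router IDs are made-up examples, and the sketch only shows which sessions share an instance; it does not model best-path selection:

```python
from collections import defaultdict


def group_by_router_id(sessions):
    """Map each local router ID (local.id) to the names of the sessions using it."""
    groups = defaultdict(list)
    for session in sessions:
        groups[session.get("local.id", "")].append(session["name"])
    return dict(groups)


sessions = [
    {"name": "upstream-a", "local.id": "10.0.0.1"},
    {"name": "upstream-b", "local.id": "10.0.0.1"},
    {"name": "customer-x", "local.id": "10.0.0.2"},
]
print(group_by_router_id(sessions))
# {'10.0.0.1': ['upstream-a', 'upstream-b'], '10.0.0.2': ['customer-x']}
```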

But prefix-count and uptime are what I'm after the most. 🙏🏻

@akpw
Owner

akpw commented Mar 3, 2024

@frittentheke @timcole OK, so this is the resulting mktxp output so far:

# HELP mktxp_bgp_sessions_info_info BGP sessions info
# TYPE mktxp_bgp_sessions_info_info gauge
mktxp_bgp_sessions_info_info{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 1.0

# HELP mktxp_bgp_remote_messages_total Number of remote messages
# TYPE mktxp_bgp_remote_messages_total counter
mktxp_bgp_remote_messages_total{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 26.0

# HELP mktxp_bgp_local_messages_total Number of local messages
# TYPE mktxp_bgp_local_messages_total counter
mktxp_bgp_local_messages_total{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 27.0

# HELP mktxp_bgp_remote_bytes_total Number of remote bytes
# TYPE mktxp_bgp_remote_bytes_total counter
mktxp_bgp_remote_bytes_total{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 494.0

# HELP mktxp_bgp_local_bytes_total Number of local bytes
# TYPE mktxp_bgp_local_bytes_total counter
mktxp_bgp_local_bytes_total{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 577.0

# HELP mktxp_bgp_prefix_count BGP prefix count
# TYPE mktxp_bgp_prefix_count gauge
mktxp_bgp_prefix_count{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 0.0

# HELP mktxp_bgp_established BGP established
# TYPE mktxp_bgp_established gauge
mktxp_bgp_established{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 1.0

# HELP mktxp_bgp_uptime BGP uptime in milliseconds
# TYPE mktxp_bgp_uptime gauge
mktxp_bgp_uptime{local.afi="",local.as="65532",name="bogon-1",remote.address="10.10.0.56",remote.afi="",remote.as="65533",routerboard_address="10.100.0.57",routerboard_name="MKT-Test"} 1.50265e+06

Let me know if anything needs further tuning. I will also add this to the repo for you guys to play with in a more realistic setup.

@akpw
Owner

akpw commented Mar 3, 2024

also add this to the repo

Done. Before testing, just enable it in your ~/mktxp/mktxp.conf:

bgp = True                     # BGP sessions metrics

@timcole

timcole commented Mar 3, 2024

Hey @akpw, Prometheus is throwing the error expected equal, got "." ("INVALID") while parsing: "mktxp_bgp_sessions_info_info{local." with bgp = True on 1.2.3. It seems it does not like periods in label names; maybe we can replace the period with an underscore?

@akpw
Owner

akpw commented Mar 3, 2024

@timcole interesting, I did expect problems with the '.' in label names, but somehow it seemed to work fine for me.
Anyway, not a big deal; it just needs a translation of the label names.
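Such a translation can be as small as replacing every character outside the classic label-name alphabet with an underscore. A minimal Python sketch with hypothetical helper names, checked against the label-name pattern [a-zA-Z_][a-zA-Z0-9_]* that the Prometheus error above is enforcing:

```python
import re

# Classic Prometheus label-name rule: letters, digits and underscores, no leading digit.
_VALID_LABEL = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_label(name: str) -> bool:
    """True if name satisfies the classic label-name pattern."""
    return _VALID_LABEL.fullmatch(name) is not None


def to_label_name(field: str) -> str:
    """Translate a RouterOS field name such as 'remote.as' into a valid label name."""
    name = re.sub(r"[^a-zA-Z0-9_]", "_", field)
    if not name or name[0].isdigit():
        name = "_" + name  # label names must not start with a digit
    return name


for field in ("local.afi", "local.as", "remote.address", "prefix-count"):
    print(field, "->", to_label_name(field), "| original valid:", is_valid_label(field))
```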

@akpw
Owner

akpw commented Mar 3, 2024

Should be fixed in the latest version.

@timcole

timcole commented Mar 3, 2024

This is great; thank you!! 🎉

@akpw
Owner

akpw commented Mar 5, 2024

@frittentheke anything else to do here, or shall we close?

@frittentheke
Author

@frittentheke anything else to do here, or shall we close?

First of all let me thank you again for implementing this @akpw!
I just ran the new BGP module against our routers and all the metrics look fine!

But I am wondering what your idea behind mktxp_bgp_sessions_info_info is.
Usually, info metrics of this kind (https://prometheus.io/docs/practices/naming/#metric-names or https://www.robustperception.io/why-info-style-metrics-have-a-value-of-1/ ) hold lots of additional labels so that they do NOT have to be attached to every metric individually; they are only joined in (https://www.robustperception.io/left-joins-in-promql/) when needed, via their minimal set of common unique labels, instance and name in this case.
This approach makes it simple to add more info (labels). Simply adding all labels to each and every time series does not scale forever, and adding new ones creates lots of time-series churn.
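The join described above can be sketched in Python to show what happens to the series. This is a rough analogue of a PromQL expression such as mktxp_bgp_prefix_count * on(instance, name) group_left(remote_as) mktxp_bgp_sessions_info_info; join_info is a hypothetical helper, and instance/name as the join keys follow the comment above:

```python
def join_info(series, info, on=("instance", "name"), include=("remote_as",)):
    """Copy selected labels from info series onto matching numeric series.

    series and info are lists of (labels, value) pairs, labels being dicts.
    """
    lookup = {tuple(labels[k] for k in on): labels for labels, _ in info}
    joined = []
    for labels, value in series:
        extra = lookup.get(tuple(labels[k] for k in on))
        if extra is None:
            continue  # as with PromQL multiplication, unmatched series are dropped
        merged = dict(labels)
        merged.update({k: extra[k] for k in include if k in extra})
        joined.append((merged, value))  # the info value is 1, so the value is unchanged
    return joined


prefix_count = [({"instance": "rt-01", "name": "bogon-1"}, 0.0)]
info = [({"instance": "rt-01", "name": "bogon-1", "remote_as": "65533", "local_as": "65532"}, 1.0)]
print(join_info(prefix_count, info))
```

Only the numeric series' minimal labels need to stay stable; new info labels can be added without touching them.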

@akpw
Owner

akpw commented Mar 5, 2024

But I am wondering though, what your idea behind

well, mainly your specification of "Something to identify an individual session:" :)

@frittentheke
Author

But I am wondering though, what your idea behind
well, mainly your specification of "Something to identify an individual session:" :)

As a unique identifier, the name is likely enough; combined with instance to distinguish between routers, this should suffice. But yes, all the other details are important and required to know what kind of session one is looking at.
I was more wondering why you introduced an info metric but then also placed all the labels on the numeric metrics.

But in any case @akpw, the state of things is totally fine; since you asked, I just felt the urge to comment ;-)
Could you maybe make a release at some point? We quite like the pip package (https://pypi.org/project/mktxp/) which does not have the BGP metrics yet :-)

@akpw
Owner

akpw commented Mar 6, 2024

The info metric gathers the essential session labels that do not impose high cardinality, and it can be extended with additional labels of the same kind if needed. The reason for including extra labels in the other metrics, as mentioned above, is your initial specification of what reliably constitutes a unique session ID. If the combination of session name and instance is sufficient for this, I can remove the extra ones in the next update.

@frittentheke
Author

If the combination of session name and instance is sufficient for this, I can remove the other ones in the next update

Let's hear what @timcole says. I mean, it's convenient not having to join in an info metric, but is it also best practice and future-proof?

@timcole

timcole commented Mar 6, 2024

Personally, I'm fine with it either way as long as I have the ability to group by remote AS.

In Grafana, I have a variable that uses label_values(mktxp_bgp_established,remote_as) and rows that repeat on as, so in the rows I can use things like mktxp_bgp_prefix_count{remote_as="$as"} to get our v4 and v6 sessions together.
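Putting v4 and v6 sessions together per remote AS amounts to summing over the remote_as label, roughly what the PromQL sum by (remote_as) (mktxp_bgp_prefix_count) would return. A minimal Python sketch; the helper name, session names and prefix counts are made-up examples:

```python
from collections import defaultdict


def prefixes_by_remote_as(samples):
    """Sum prefix counts per remote AS (hypothetical helper).

    samples is a list of (labels, value) pairs from mktxp_bgp_prefix_count.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels["remote_as"]] += value
    return dict(totals)


samples = [
    ({"name": "transit-v4", "remote_as": "65009"}, 100.0),
    ({"name": "transit-v6", "remote_as": "65009"}, 20.0),
    ({"name": "bogon-1", "remote_as": "65533"}, 13.0),
]
print(prefixes_by_remote_as(samples))  # {'65009': 120.0, '65533': 13.0}
```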

@frittentheke
Author

Whatever you decide @akpw I am fine either way.

May I gently nag you again about doing a release (to pypi)?

@akpw
Owner

akpw commented Mar 7, 2024

Personally I'd leave the info metric as is and remove the extra labels from all other metrics. @timcole would that be OK, or would you rather prefer to have AS in all of them? This should be mostly about complexity of the grouping queries, not sure about potential impact on their performance etc.

@akpw
Owner

akpw commented Mar 7, 2024

a release (to pypi)?

@frittentheke I planned to do a few more small things and then release at the end of this week, if that is good enough for you.

@timcole

timcole commented Mar 7, 2024

It should still be possible to do what I'm after by joining the info metric instead, so whatever you decide works for me 👍🏻

akpw added a commit that referenced this issue Mar 10, 2024
@akpw akpw closed this as completed Mar 13, 2024
@savitarMK

Personally, I'm fine with it either way as long as I have the ability to group by remote AS.

In Grafana, I have a variable that uses label_values(mktxp_bgp_established,remote_as) and rows that repeat on as, so in the rows I can use things like mktxp_bgp_prefix_count{remote_as="$as"} to get our v4 and v6 sessions together.

Hello! How are you doing? I know this may be old news, but I wanted to ask whether you could share the code of the dashboard you mention in your post!

If so, I would appreciate it very much.

@lanrat
Contributor

lanrat commented Sep 29, 2024

Personally, I'm fine with it either way as long as I have the ability to group by remote AS.

In Grafana, I have a variable that uses label_values(mktxp_bgp_established,remote_as) and rows that repeat on as, so in the rows I can use things like mktxp_bgp_prefix_count{remote_as="$as"} to get our v4 and v6 sessions together.

@timcole would you mind sharing your grafana dashboard with the mktxp BGP metrics? Yours looks very nice.

@timcole

timcole commented Sep 29, 2024

@lanrat,

There are two issues with giving it to you. First, I never got around to updating that dashboard after the March 10 update, so it won’t work with the newest version. Second, we only have one site that uses a MikroTik device (we use Arista elsewhere, which is also why we haven't bothered to update the dashboard yet). It’s very specific to the peers’ naming convention, where v6 peers end with "-v6." But here’s the export: https://gist.github.com/timcole/6e90b45d17973714f5d1c1e6c5dab747. Feel free to do whatever you want with it, but no support will be provided—it's as-is.
