BGP Monitoring Overview


Border Gateway Protocol (BGP) is the routing protocol of the Internet. Its main role is to exchange reachability information with other BGP systems so that traffic can be directed efficiently. BGP has two main applications: External Border Gateway Protocol (eBGP), used for routing between autonomous systems, such as across the public Internet, and Internal Border Gateway Protocol (iBGP), used to distribute routing information within a single autonomous system.

BGP monitoring is critical because BGP is highly vulnerable to route leaks and hijacks: misconfigured or malicious sources can propagate bogus routing information widely, with ripple effects across the Internet. As more apps and services are built in the cloud, monitoring BGP has become more important than ever.

Catchpoint's BGP Monitor allows you to monitor your announced prefixes (eBGP) from hundreds of autonomous system (AS) peers, which share their full BGP tables with Catchpoint monitoring nodes located around the world.

Key Concepts

Autonomous System (AS) – a group of connected IP routing prefixes, which fall under the management of a single entity or organization.

Peer – an AS that shares its BGP table from a given geographical location. There can be multiple peers for the same AS in different geographical areas (e.g., Cogent in the UK and Cogent in the US).

Prefix – a subnet, representing a block of IPv4 or IPv6 addresses, that is being announced by an AS via BGP.

Route – a prefix, along with a path of AS numbers indicating the specific ASes that traffic must pass through to reach the announced address block. An IPv4 BGP route looks something like this: 206.24.14.0/24 701 1239 42, where 206.24.14.0/24 is the prefix and 701 1239 42 is the AS path.

Registry – There are five regional Internet registries (RIRs). These are organizations that manage the distribution and registration of Internet number resources within different world regions:

  • AFRINIC – The African Network Information Center, which serves Africa.
  • ARIN – The American Registry for Internet Numbers, which serves Antarctica, Canada, some of the Caribbean and the United States.
  • APNIC – The Asia-Pacific Network Information Centre, which serves East Asia, Oceania, South Asia, and Southeast Asia.
  • LACNIC – The Latin America and Caribbean Network Information Centre, which serves the rest of the Caribbean and the whole of Latin America.
  • RIPE NCC – The Réseaux IP Européens Network Coordination Centre, which serves Europe, Central Asia, Russia and West Asia.

How Catchpoint Collects BGP Data

Catchpoint collects BGP data from peers that share their data with the following organizations:

  • RIPE NCC's Routing Information Service (RIS) – a public data source operated by RIPE NCC, the European regional Internet registry.
  • Route Views Project – a public data source run by the University of Oregon (the oldest running university project), with data published every 15 minutes.
  • Catchpoint – our private data source, through which Catchpoint has established its own BGP connections to receive full BGP tables in real time.

The BGP monitor collects the following information for each monitored prefix:

  • AS – the name and autonomous system number (ASN) of the entity sharing the BGP tables.
  • Registry – the Internet registry the AS of the peer is registered with.
  • Geographical information:
    • Continent – the continent where the router sharing the BGP tables is located.
    • Country – the country where the router sharing the BGP tables is located.
  • Origin – the AS announcing reachability information for the prefix.
  • Neighbor – the AS adjacent to the origin AS, i.e., the next AS in the path.
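To make the Route, Origin, and Neighbor definitions concrete, the following Python sketch splits a route written in the format shown earlier (a prefix followed by its AS path) into its parts. It assumes the AS path is listed from the peer toward the origin, so the last ASN is the origin and the distinct ASN before it is the origin's neighbor. `parse_route` and `Route` are hypothetical names, not part of any Catchpoint API, and the sketch collapses AS-path prepending but does not handle AS sets.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    as_path: List[int]
    origin: int
    neighbor: Optional[int]  # None when the path contains only the origin


def parse_route(line: str) -> Route:
    """Parse 'prefix asn asn ... asn' into a Route (hypothetical helper)."""
    prefix_text, *path_text = line.split()
    prefix = str(ip_network(prefix_text))  # raises ValueError on a malformed prefix
    if not path_text:
        raise ValueError("route has no AS path")
    as_path = [int(asn) for asn in path_text]

    # Collapse AS-path prepending (e.g. "1239 42 42 42") so the neighbor is
    # the next distinct AS rather than a repeat of the origin.
    collapsed = [as_path[0]]
    for asn in as_path[1:]:
        if asn != collapsed[-1]:
            collapsed.append(asn)

    origin = collapsed[-1]
    neighbor = collapsed[-2] if len(collapsed) > 1 else None
    return Route(prefix, as_path, origin, neighbor)


route = parse_route("206.24.14.0/24 701 1239 42")
print(route.origin, route.neighbor)  # 42 1239
```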

Setup

To monitor a prefix, first Create a Test and select the BGP test type. This test type requires only a valid IPv4 prefix and an alert configuration.
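The setup step asks for a "valid IPv4 prefix". A minimal Python sketch of one plausible validity check uses the standard `ipaddress` module: the input must be in CIDR form, must parse as IPv4, and must not have host bits set (so 8.8.8.1/24 is rejected). Whether Catchpoint applies exactly these rules is an assumption, and `is_valid_ipv4_prefix` is a hypothetical helper.

```python
import ipaddress


def is_valid_ipv4_prefix(text: str) -> bool:
    """Return True if text is an IPv4 network in CIDR form with no host bits set."""
    if "/" not in text:
        return False  # require an explicit prefix length such as /24
    try:
        # strict=True (the default) raises ValueError if host bits are set;
        # IPv6 input raises AddressValueError, a subclass of ValueError.
        ipaddress.IPv4Network(text, strict=True)
    except ValueError:
        return False
    return True


for candidate in ["8.8.8.0/24", "8.8.8.1/24", "2001:db8::/32", "8.8.8.0"]:
    print(candidate, is_valid_ipv4_prefix(candidate))
```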

BGP Alerts

The following alert types are available for BGP monitoring.

  • Test Failure – An alert is triggered when the system cannot find a path for the given prefix.
    • Peer Threshold:
      • Runs – the system evaluates the first state change from having a path to having no path, and then at least once an hour while the prefix remains without a path.
      • Peer – the system looks at the number of unique peers that have no path for the given prefix.
  • Availability: Test* – Alert based on the % availability of the prefix, i.e., the percentage of time in which there was a path to the prefix.
    • Peer Threshold:
      • Average Across Peers – the system looks at the data across all peers for the time threshold.
      • Peer – the system looks at availability metrics on a per-peer basis, and you can specify how many peers must breach the trigger.
  • Availability: % Downtime* – Alert based on the % downtime of the prefix, i.e., the percentage of time in which there was no path to the prefix.
    • Peer Threshold:
      • Average Across Peers – the system looks at the data across all peers for the time threshold.
      • Peer – the system looks at availability metrics on a per-peer basis, and you can specify how many peers must breach the trigger.
  • AS Number: Origin AS* – Alert if the origin AS of the prefix does or does not match a given list of ASNs.
    • Peer Threshold:
      • Runs – the system looks at each individual event where the state of the path changes.
      • Peer – the system looks at each peer and the events where the state of the path changes.
  • AS Number: Origin Neighbor* – Alert if the neighbor of the origin AS announcing the prefix does or does not match a given list of ASNs.
    • Peer Threshold:
      • Runs – the system looks at each individual event where the state of the path changes.
      • Peer – the system looks at each peer and the events where the state of the path changes.
    • Example:
      • Prefix monitored: 8.8.8.0/24
      • The alert is set to trigger if the neighbor is not AS 3356 or AS 6939.
      • Peers provide routes showing the following origin neighbors: 3356, 6939, and 6461.
      • In this case, an alert would be triggered for 6461, because it is not in the expected list.
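The Origin Neighbor example above can be expressed as a short Python sketch. It assumes each peer reports one AS path for the prefix, takes the neighbor as the distinct AS immediately before the origin, and, to mimic the Peer threshold option, counts how many unique peers report a neighbor outside the expected list. The function names, the data layout, the `min_peers` parameter, the peer names, and the use of AS 15169 as the origin of 8.8.8.0/24 are illustrative assumptions, not Catchpoint settings; the Runs option is not modeled.

```python
from typing import Dict, List, Set


def origin_neighbor(as_path: List[int]) -> int:
    """Return the AS adjacent to the origin (the last ASN in the path)."""
    # Drop consecutive repeats caused by AS-path prepending.
    distinct = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    if len(distinct) < 2:
        raise ValueError("path has no neighbor AS")
    return distinct[-2]


def neighbor_alert(paths_by_peer: Dict[str, List[int]],
                   expected: Set[int],
                   min_peers: int = 1) -> Set[str]:
    """Return the peers whose origin neighbor is NOT in `expected`,
    or an empty set if fewer than `min_peers` peers breach the trigger."""
    breaching = {peer for peer, path in paths_by_peer.items()
                 if origin_neighbor(path) not in expected}
    return breaching if len(breaching) >= min_peers else set()


# Hypothetical peers seeing 8.8.8.0/24 via neighbors 3356, 6939, and 6461.
paths = {
    "peer-a": [174, 3356, 15169],
    "peer-b": [2914, 6939, 15169],
    "peer-c": [1299, 6461, 15169],
}
print(neighbor_alert(paths, expected={3356, 6939}))               # {'peer-c'}
print(neighbor_alert(paths, expected={3356, 6939}, min_peers=2))  # set()
```

With a threshold of one peer, only the peer that sees AS 6461 breaches the trigger; requiring two breaching peers suppresses the alert.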

Example: AS Number/Prefix Mismatch

The system expects the monitored prefix to be announced exactly as specified. Any more-specific prefix is considered a mismatch. For example, if the monitored prefix is a /23, the system expects an announcement for that /23, and any /24 announcement within it is considered a mismatch and triggers an alert.

  • Trigger:
    • Equals
    • Not Equal to
  • Peer Threshold:
    • Runs – it looks at each individual event where the state of the path changes
    • Peer – it looks at each peer and the events where the state of the path changes
  • Example:
    • Prefix monitored: 8.8.8.0/23
    • Peers provide routes for both 8.8.8.0/23 and 8.8.8.0/24 (a more-specific subnet of the /23).
    • In this case, a “Prefix Mismatch” alert would be triggered.
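The mismatch rule can be sketched in Python with the standard `ipaddress` module: an announced prefix is flagged when it belongs to the same address family, falls inside the monitored block, and has a longer prefix length. This models only the mismatch condition itself, not the Equals/Not Equal to trigger or the peer threshold; `prefix_mismatches` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List


def prefix_mismatches(monitored: str, announced: Iterable[str]) -> List[str]:
    """Return announced prefixes that are strictly more specific than `monitored`."""
    target = ipaddress.ip_network(monitored)
    mismatches = []
    for text in announced:
        net = ipaddress.ip_network(text)
        # Check the family first: subnet_of() raises TypeError across IPv4/IPv6.
        if (net.version == target.version
                and net.prefixlen > target.prefixlen
                and net.subnet_of(target)):
            mismatches.append(text)
    return mismatches


print(prefix_mismatches("8.8.8.0/23", ["8.8.8.0/23", "8.8.8.0/24"]))  # ['8.8.8.0/24']
```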

Analysis

The BGP Monitor collects the following metrics, which can be viewed using Catchpoint's analysis tools:

  • # Announcements – the number of announcement events counted for the given prefix.
  • # Withdrawals – the number of withdrawal events counted for the given prefix.
  • Routing Events – the combined number of announcements and withdrawals counted for the prefix.
  • Availability – the % of time that we detected a path from the peer to the prefix.
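The four metrics above can be derived from a stream of routing events. The Python sketch below assumes a simplified model for a single peer: each event is a timestamp in seconds plus either an announcement (a path exists afterwards) or a withdrawal (no path afterwards), all events fall within the observation window, and availability is the share of that window during which a path existed. The event format, the `initial_has_path` parameter, and the function name are assumptions for illustration, not how Catchpoint stores or computes these values.

```python
from typing import Dict, List, Tuple

# (timestamp in seconds, "A" = announcement, "W" = withdrawal)
Event = Tuple[float, str]


def prefix_metrics(events: List[Event], start: float, end: float,
                   initial_has_path: bool = True) -> Dict[str, float]:
    """Compute routing-event counts and availability (%) for one peer.

    Assumes every event timestamp lies within [start, end].
    """
    announcements = sum(1 for _, kind in events if kind == "A")
    withdrawals = sum(1 for _, kind in events if kind == "W")

    # Walk the events in time order, accumulating time spent with a path.
    has_path = initial_has_path
    last_t = start
    up_time = 0.0
    for t, kind in sorted(events):
        if has_path:
            up_time += t - last_t
        last_t = t
        # An announcement leaves a path in place; a withdrawal removes it.
        has_path = kind == "A"
    if has_path:
        up_time += end - last_t

    return {
        "announcements": announcements,
        "withdrawals": withdrawals,
        "routing_events": announcements + withdrawals,
        "availability_pct": 100.0 * up_time / (end - start),
    }


m = prefix_metrics([(600, "W"), (1500, "A")], start=0, end=3600)
print(m)
```

In this example the path is withdrawn for 900 of 3,600 seconds, so availability is 75.0% (25% downtime) and there are two routing events.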

BGP Overview Dashboard

The BGP Overview Dashboard gives a bird's-eye view of the health of all monitored prefixes. It includes the following sections:

  • Summary – Cards displaying aggregated RPKI Status, Reachability, Hijack, Neighboring Peers, and Prefixes Withdrawn data for all monitored prefixes.
  • BGP Origins – A tabular view of all origins, neighbors, and prefixes.
  • Tile View – Displays color-coded tiles representing each monitored prefix.
  • Map View – A world map, color-coded by country, showing any availability issues for your monitored prefixes.


Coloring Rules:

  • Green: 100% availability
  • Orange: at least 90% but less than 100% availability
  • Red: less than 90% availability
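The coloring rules above map directly onto two thresholds. A minimal Python sketch (the function name is hypothetical) is shown below; note that a computed availability such as 99.99% counts as Orange, so how the dashboard rounds values near 100% is an assumption this sketch does not resolve.

```python
def tile_color(availability_pct: float) -> str:
    """Map availability (%) to the Green/Orange/Red coloring rules."""
    if availability_pct >= 100.0:
        return "Green"   # 100% availability
    if availability_pct >= 90.0:
        return "Orange"  # at least 90% but less than 100%
    return "Red"         # less than 90%


for pct in (100.0, 95.5, 90.0, 89.9):
    print(pct, tile_color(pct))
```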

BGP Smartboard

In a single view, the Smartboard surfaces key issues and trends for a given test. Each test has its own Smartboard. Using the filters and widgets on the page, you can quickly see where performance is changing most, what is causing downtime, who is impacted, and which services are having issues.


Point Consumption

Like other synthetic tests, BGP tests consume points for usage. However, unlike other synthetic tests, whose point consumption depends on frequency and number of runs, a BGP test always consumes a fixed number of points per hour.

This difference exists because BGP tests have no concept of frequency. Instead, we always monitor all data sources to gather routing events for each prefix.
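Because BGP consumption is a fixed hourly rate, estimating it is simple multiplication. The Python sketch below assumes the rate applies per BGP test and uses a placeholder value, since the actual number of points per hour is not stated here; `bgp_points` is a hypothetical helper. Note that frequency is deliberately absent from its parameters.

```python
def bgp_points(num_tests: int, hours: float, points_per_hour: float) -> float:
    """Estimate BGP point consumption as a fixed hourly rate per test.

    There is no frequency argument: BGP tests do not have one.
    """
    return num_tests * hours * points_per_hour


# Placeholder rate of 2 points per hour, 5 monitored prefixes, a 30-day month.
print(bgp_points(num_tests=5, hours=30 * 24, points_per_hour=2))
```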