BGP BFD and Graceful Restart
RouterOS provides two complementary mechanisms to improve BGP reliability: Bidirectional Forwarding Detection (BFD) for detecting unplanned failures in milliseconds, and Graceful Restart (GR) for preserving forwarding state during planned restarts. Used together, they significantly reduce the impact of BGP events on traffic forwarding.
Overview
Standard BGP failure detection relies on the hold timer: if a router stops receiving keepalives, it declares the session down once the hold timer expires (3 minutes by default). This is intentionally conservative to avoid false positives on slow or congested links, but it means that an undetected hardware or link failure can black-hole traffic for minutes.
BFD and Graceful Restart address two distinct scenarios:
| Scenario | Mechanism | Outcome |
|---|---|---|
| Unplanned link or hardware failure | BFD | Session detected down in <1 second; traffic rerouted immediately |
| Planned BGP process restart (software upgrade, config reload) | Graceful Restart | Forwarding preserved using stale routes while BGP reconverges |
Both are optional and independently configurable per BGP peer.
BFD for Fast BGP Failure Detection
How BGP BFD Works
When BFD is enabled on a BGP connection, RouterOS creates a BFD session to the peer address alongside the BGP TCP session. BFD sends lightweight UDP control packets at the configured interval (default 200ms). If the router misses packets for longer than the detection time (interval × multiplier), it immediately marks the BFD session down and notifies BGP, which resets the session without waiting for the hold timer.
This reduces failure detection from minutes to hundreds of milliseconds.
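To make the gap concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the values are the defaults quoted in this guide (200ms interval, multiplier 3, 3-minute hold timer), not values read from a router.

```python
from datetime import timedelta


def bfd_detection_time(interval: timedelta, multiplier: int) -> timedelta:
    """Detection time as described above: interval x multiplier."""
    return interval * multiplier


# Defaults quoted in this guide; substitute your own configuration.
hold_time = timedelta(minutes=3)
bfd_time = bfd_detection_time(timedelta(milliseconds=200), 3)

print(f"BFD detection time: {bfd_time.total_seconds():.1f}s")
print(f"Hold timer expiry:  {hold_time.total_seconds():.0f}s")
print(f"Ratio:              {hold_time / bfd_time:.0f}x")
```

With these inputs the hold timer takes 300 times longer than BFD to notice the same failure.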
Enabling BFD on a BGP Connection
BFD is enabled per BGP connection with the use-bfd parameter, set either directly on a connection or on a BGP template:

```
# Enable BFD on a specific BGP connection
/routing/bgp/connection set [find name=ebgp-isp] use-bfd=yes

# Or set it when creating a new connection
/routing/bgp/connection add name=ebgp-isp \
    remote.address=203.0.113.1 \
    remote.as=65000 \
    use-bfd=yes
```

For iBGP peers where you want consistent BFD behavior across all peers, configure it on a BGP template:

```
# Create a template with BFD enabled
/routing/bgp/template add name=ibgp-peers \
    hold-time=3m \
    use-bfd=yes

# Apply template to connections
/routing/bgp/connection add name=ibgp-rr1 \
    remote.address=10.0.0.2 \
    remote.as=65001 \
    templates=ibgp-peers

/routing/bgp/connection add name=ibgp-rr2 \
    remote.address=10.0.0.3 \
    remote.as=65001 \
    templates=ibgp-peers
```

BFD Timer Configuration
BFD timers are configured under /routing/bfd/peer or via a BFD template. When BGP integration is enabled, RouterOS automatically creates a BFD peer for the BGP neighbor address; you can then adjust the timers for that peer.
Default timers:
- Transmit interval: 200ms
- Minimum receive interval: 200ms
- Detection multiplier: 3
- Detection time: 600ms (200ms × 3)
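The interval × multiplier figure assumes both ends use the same timers. In the general BFD specification (RFC 5880, asynchronous mode), each side computes its detection time from the remote multiplier and the larger of its own minimum receive interval and the remote transmit interval. The sketch below models that rule in Python; the class and function names are hypothetical, and the mapping of RouterOS `interval`, `min-rx`, and `multiplier` onto the RFC fields is an assumption, not something confirmed by this guide.

```python
from dataclasses import dataclass


@dataclass
class BfdTimers:
    """One side's BFD settings in milliseconds (hypothetical container)."""
    desired_min_tx_ms: int   # assumed to correspond to `interval`
    required_min_rx_ms: int  # assumed to correspond to `min-rx`
    detect_mult: int         # assumed to correspond to `multiplier`


def local_detection_time_ms(local: BfdTimers, remote: BfdTimers) -> int:
    """Detection time the local side applies to packets from the remote side.

    RFC 5880, asynchronous mode: the remote detect multiplier times the
    larger of the local required-min-RX and the remote desired-min-TX.
    """
    agreed_interval_ms = max(local.required_min_rx_ms, remote.desired_min_tx_ms)
    return remote.detect_mult * agreed_interval_ms


defaults = BfdTimers(desired_min_tx_ms=200, required_min_rx_ms=200, detect_mult=3)
fast = BfdTimers(desired_min_tx_ms=100, required_min_rx_ms=100, detect_mult=3)

print(local_detection_time_ms(defaults, defaults))  # both sides at defaults
print(local_detection_time_ms(fast, defaults))      # only the local side tuned
print(local_detection_time_ms(fast, fast))          # both sides tuned
```

Under this model, lowering the timers on one router alone does not shorten detection; the slower side's transmit interval still governs, so both ends need matching settings.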
Tuning for faster detection:
```
# View auto-created BFD peer for a BGP neighbor
/routing/bfd/peer/print detail

# Adjust timers for a specific BGP peer
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=100ms \
    min-rx=100ms \
    multiplier=3
```

With interval=100ms and multiplier=3, detection time is 300ms, which suits data center or metro-Ethernet links with low jitter. For WAN links with variable latency, use longer intervals to avoid false positives:

```
# Conservative WAN settings
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=500ms \
    min-rx=500ms \
    multiplier=3
```

Detection time here is 1.5 seconds, still much faster than the default 3-minute hold timer.
Using BFD templates for consistent per-link-type settings:
```
# Fast template for LAN/DC links
/routing/bfd/template add name=fast-lan \
    interval=100ms \
    min-rx=100ms \
    multiplier=3

# Conservative template for WAN links
/routing/bfd/template add name=wan-stable \
    interval=500ms \
    min-rx=500ms \
    multiplier=5

# Apply to peers
/routing/bfd/peer/set [find remote-address=10.0.0.2] template=fast-lan
/routing/bfd/peer/set [find remote-address=203.0.113.1] template=wan-stable
```

Multi-Hop BFD
By default, BFD expects single-hop peers (directly connected). For iBGP peers using loopback addresses, or eBGP multihop sessions, configure multi-hop BFD:
```
# Multi-hop BGP peer via loopback
/routing/bgp/connection add name=ibgp-loopback \
    remote.address=10.255.0.1 \
    remote.as=65001 \
    multihop=yes \
    use-bfd=yes

# Adjust BFD peer for multi-hop
/routing/bfd/peer/set [find remote-address=10.255.0.1] \
    multihop=yes \
    interval=300ms \
    min-rx=300ms
```

Multi-hop BFD uses UDP port 4784 instead of 3784. Use longer intervals to accommodate additional hop latency.
Verifying BFD Status
When BFD is working correctly, the BGP session output will show the BFD state:

```
# View BGP sessions; BFD state is shown when enabled
/routing/bgp/session print detail
```

When BFD detects a failure before the hold timer expires, the session output includes a note:

```
0 E ;;; BFD session down
    name="ebgp-isp-1" remote.address=203.0.113.1 .as=65000 hold-time=infinity use-bfd=yes uptime=3s
```

Check BFD session state directly:

```
# View all BFD sessions
/routing/bfd/peer/print detail

# Monitor BFD session in real time
/routing/bfd/peer/monitor [find remote-address=203.0.113.1]

# View BFD session statistics
/routing/bfd/session/print stats
```

BGP Graceful Restart
What Graceful Restart Does
Standard BGP session teardown causes the peer to immediately withdraw all received routes, triggering a full reconvergence. This means even a brief planned event (a software upgrade or a routing daemon restart) removes traffic forwarding state from the network.
BGP Graceful Restart (RFC 4724) allows a restarting router to signal its intention to restart gracefully. The peer (acting as GR helper) continues forwarding traffic using stale routes until the session re-establishes, or until the restart timer expires. This preserves traffic forwarding across planned restarts.
Two roles in a Graceful Restart:
| Role | Description |
|---|---|
| Initiator (restarting router) | Signals GR capability in OPEN message; requests that peers retain stale routes during restart |
| Helper (peer of restarting router) | Receives GR notification; retains stale routes for the duration of the restart timer instead of withdrawing them |
RouterOS supports both roles simultaneously — it can act as initiator when it restarts, and as helper when its peers restart.
Enabling Graceful Restart
Graceful Restart is enabled per BGP connection:

```
# Enable graceful restart on a connection
/routing/bgp/connection set [find name=ebgp-isp] \
    graceful-restart=yes

# Or set during connection creation
/routing/bgp/connection add name=ebgp-isp \
    remote.address=203.0.113.1 \
    remote.as=65000 \
    graceful-restart=yes
```

Both peers should have Graceful Restart enabled for it to function. When only the helper has it configured, it can still act as a helper for a restarting peer that signals GR capability.
Graceful Restart Timers
The key timers governing Graceful Restart behavior:
Restart time — Advertised by the restarting router in the GR capability; tells the helper how long to retain stale routes. The helper will withdraw stale routes if the session does not re-establish within this period.
Stale routes timer — Controls how long the helper waits after session loss before purging stale forwarding entries.
```
# View current GR configuration
/routing/bgp/connection print detail where graceful-restart=yes
```

Graceful Restart with Planned Maintenance
When performing a planned BGP restart (e.g., RouterOS upgrade), the sequence is:
- Router signals GR capability with a restart time during normal session establishment
- When the router restarts, the TCP session drops (RFC 4724 covers a session lost without a BGP NOTIFICATION, such as a TCP reset or session timeout)
- The helper retains stale routes for up to the advertised restart time
- The restarting router re-establishes the BGP session
- Once the session is re-established and EOR (End-of-RIB) is received, stale routes are replaced with fresh entries
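The helper's side of this sequence can be modeled as a small state sketch. The Python below is a toy illustration, not RouterOS internals: the class, its method names, the prefixes, and the 120-second restart time are all hypothetical, chosen only to show how stale marking, End-of-RIB, and timer expiry interact.

```python
from dataclasses import dataclass, field


@dataclass
class HelperRib:
    """Toy model of the routes a GR helper learned from one peer."""
    restart_time_s: int
    routes: dict = field(default_factory=dict)  # prefix -> "fresh" or "stale"

    def session_lost(self) -> None:
        # Keep forwarding, but mark everything learned from the peer as stale.
        for prefix in self.routes:
            self.routes[prefix] = "stale"

    def route_received(self, prefix: str) -> None:
        # A re-advertised route replaces its stale copy.
        self.routes[prefix] = "fresh"

    def end_of_rib(self) -> None:
        # Anything still stale was not re-advertised, so it is dropped.
        self._purge_stale()

    def timer_tick(self, seconds_since_loss: int, re_established: bool) -> None:
        # No new session within the restart time: stale routes are removed.
        if not re_established and seconds_since_loss >= self.restart_time_s:
            self._purge_stale()

    def _purge_stale(self) -> None:
        self.routes = {p: s for p, s in self.routes.items() if s != "stale"}


# Illustrative values only.
rib = HelperRib(
    restart_time_s=120,
    routes={"192.0.2.0/24": "fresh", "198.51.100.0/24": "fresh"},
)
rib.session_lost()                  # peer restarts, routes become stale
rib.route_received("192.0.2.0/24")  # peer returns and re-advertises one prefix
rib.end_of_rib()                    # EOR received, leftover stale routes removed
print(rib.routes)
```

The second prefix disappears only at End-of-RIB, which is why forwarding continues uninterrupted for routes the restarting router still advertises.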
To verify that GR capability was negotiated:
```
# Check session capabilities: look for "gr" in the capabilities list
/routing/bgp/session print detail where state=established
```

Output showing GR negotiated:

```
remote.capabilities=mp,rr,gr,as4
```

Combining BFD and Graceful Restart
BFD and Graceful Restart serve different purposes and can be used together:
- BFD handles unplanned failures — it detects hardware or link failures fast and immediately resets the BGP session for rapid rerouting.
- Graceful Restart handles planned restarts — it preserves forwarding state so that a controlled restart does not cause a traffic outage.
Using both simultaneously:
```
/routing/bgp/connection set [find name=core-peer] \
    use-bfd=yes \
    graceful-restart=yes
```

Important: During a graceful restart, BFD should continue to see the link as up, because the forwarding path remains available and only the routing process has restarted. If BFD also goes down during the restart, it can conflict with GR by resetting the session before it re-establishes. If you experience this, increase BFD timers or disable BFD on peers where GR is the primary reliability mechanism.
Complete Configuration Example
ISP Edge Router: BFD + GR on eBGP Upstreams

```
# BGP instance
/routing/bgp/instance add name=main as=65001 router-id=10.255.0.1

# eBGP template for ISP uplinks
/routing/bgp/template add name=isp-uplinks \
    hold-time=90s \
    keepalive-time=30s \
    use-bfd=yes \
    graceful-restart=yes

# ISP A connection
/routing/bgp/connection add name=isp-a \
    remote.address=203.0.113.1 \
    remote.as=65000 \
    templates=isp-uplinks \
    nexthop-choice=force-self

# ISP B connection
/routing/bgp/connection add name=isp-b \
    remote.address=198.51.100.1 \
    remote.as=65100 \
    templates=isp-uplinks \
    nexthop-choice=force-self

# Tune BFD timers for WAN links
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=300ms min-rx=300ms multiplier=3

/routing/bfd/peer/set [find remote-address=198.51.100.1] \
    interval=300ms min-rx=300ms multiplier=3

# Verify sessions
/routing/bgp/session print
/routing/bfd/peer/print
```

Data Center: Fast BFD on iBGP Spine Peers
```
# iBGP template with fast BFD for DC fabric
/routing/bgp/template add name=dc-spine \
    hold-time=infinity \
    use-bfd=yes \
    graceful-restart=yes

# Spine connections (loopback-to-loopback iBGP)
/routing/bgp/connection add name=spine-1 \
    remote.address=10.0.0.1 \
    remote.as=65001 \
    multihop=yes \
    templates=dc-spine

/routing/bgp/connection add name=spine-2 \
    remote.address=10.0.0.2 \
    remote.as=65001 \
    multihop=yes \
    templates=dc-spine

# Fast BFD for fabric links (multi-hop via loopback)
/routing/bfd/peer/set [find remote-address=10.0.0.1] \
    multihop=yes interval=100ms min-rx=100ms multiplier=3

/routing/bfd/peer/set [find remote-address=10.0.0.2] \
    multihop=yes interval=100ms min-rx=100ms multiplier=3
```

Troubleshooting
BFD Session Not Establishing
Check that UDP port 3784 (single-hop) or 4784 (multi-hop) is not blocked:

```
# Test reachability to BGP peer
/tool/ping address=203.0.113.1 count=5

# Check if BFD peer exists for the BGP neighbor
/routing/bfd/peer/print

# Confirm use-bfd is set on the BGP connection
/routing/bgp/connection print detail where name=ebgp-isp
```

BFD Flapping on WAN Links
Increase the detection interval to reduce sensitivity to transient jitter:

```
# Increase interval to tolerate higher latency variation
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=500ms min-rx=500ms multiplier=5
```

Detection time becomes 2.5 seconds, still much faster than hold-timer expiry.
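One way to choose between candidate settings is to compare them against stalls you have actually observed on the link. The Python sketch below is a simplification: it treats a stall as tripping BFD whenever it lasts longer than interval × multiplier, and ignores packet timing and jitter. The stall durations are invented sample data and the function name is hypothetical; the three timer pairs are the ones used in this guide.

```python
def false_trips(stall_durations_ms: list, interval_ms: int, multiplier: int) -> int:
    """Count stalls long enough to exceed the BFD detection time."""
    detection_ms = interval_ms * multiplier
    return sum(1 for stall in stall_durations_ms if stall > detection_ms)


# Hypothetical stalls measured on a WAN link (for example, from ping gaps).
observed_stalls_ms = [120, 450, 900, 1800]

# (interval_ms, multiplier) pairs taken from the examples in this guide.
candidates = {
    "default 200ms x 3": (200, 3),
    "WAN 500ms x 3": (500, 3),
    "WAN 500ms x 5": (500, 5),
}

for name, (interval_ms, multiplier) in candidates.items():
    trips = false_trips(observed_stalls_ms, interval_ms, multiplier)
    print(f"{name}: detection {interval_ms * multiplier}ms, {trips} stall(s) would reset BGP")
```

For this sample data, only the 500ms × 5 setting rides out every stall; real measurements from the affected link should drive the final choice.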
Graceful Restart Not Negotiated
Both sides must support GR capability. Verify by checking session capabilities:

```
# Look for "gr" in remote.capabilities
/routing/bgp/session print detail
```

If gr is absent from remote capabilities, the peer may not support GR or may have it disabled.
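If you collect session output in a script, the same check can be automated. The Python below is a minimal sketch that assumes the output contains a `remote.capabilities=` field formatted as a comma-separated list, as in the example earlier in this guide; the function name and the sample line are hypothetical, and real output may be laid out differently.

```python
def remote_capabilities(session_detail: str) -> set:
    """Extract the comma-separated remote.capabilities value from printed output."""
    for token in session_detail.split():
        key, sep, value = token.partition("=")
        if key == "remote.capabilities" and sep:
            return set(value.split(","))
    return set()


# Hypothetical sample assembled from fields shown in this guide.
sample = 'name="ebgp-isp-1" remote.address=203.0.113.1 remote.capabilities=mp,rr,gr,as4'

caps = remote_capabilities(sample)
print("GR negotiated" if "gr" in caps else "GR missing from remote capabilities")
```

Matching the whole token `gr` in a parsed set, rather than searching for the substring, avoids false matches against other capability names.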
Stale Routes Not Cleared After Restart
If stale routes persist after a GR event, check that the restarting router sent an End-of-RIB marker after reconvergence:

```
# View session EOR status
/routing/bgp/session print detail where state=established
```

The eor="" field shows which address families have received EOR. If EOR is missing, stale routes will be cleared only when the restart timer expires.
See Also
- BGP Peering: eBGP and iBGP Configuration — Establishing BGP sessions and route advertisement
- BGP Route Filtering and Communities — Controlling which routes are exchanged
- BFD — Full BFD configuration reference including static routes and OSPF integration