BGP BFD and Graceful Restart

RouterOS provides two complementary mechanisms to improve BGP reliability: Bidirectional Forwarding Detection (BFD) for detecting unplanned failures in milliseconds, and Graceful Restart (GR) for preserving forwarding state during planned restarts. Used together, they significantly reduce the impact of BGP events on traffic forwarding.

Overview

Standard BGP failure detection relies on the hold timer — if a router stops receiving keepalives, it declares the session down after the hold timer expires (default 3 minutes). This is intentionally conservative to avoid false positives on slow or congested links, but it means that undetected hardware or link failures can black-hole traffic for minutes.

BFD and Graceful Restart address two distinct scenarios:

Scenario	Mechanism	Outcome
Unplanned link or hardware failure	BFD	Session detected down in <1 second; traffic rerouted immediately
Planned BGP process restart (software upgrade, config reload)	Graceful Restart	Forwarding preserved using stale routes while BGP reconverges

Both are optional and independently configurable per BGP peer.

BFD for Fast BGP Failure Detection

How BGP BFD Works

When BFD is enabled on a BGP connection, RouterOS creates a BFD session to the peer address alongside the BGP TCP session. BFD sends lightweight UDP control packets at the configured interval (default 200ms). If the router misses packets for longer than the detection time (interval × multiplier), it immediately marks the BFD session down and notifies BGP — which resets the session without waiting for the hold timer.

This reduces failure detection from minutes to hundreds of milliseconds.
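The detection-time arithmetic can be sketched in a few lines of Python. This is an illustrative model, not RouterOS code: the function name `detection_time_ms` and its parameters are hypothetical. The negotiation rule it assumes comes from RFC 5880 asynchronous mode: take the slower of the local receive interval and the peer's transmit interval, then multiply by the peer's multiplier.

```python
def detection_time_ms(local_min_rx_ms: int,
                      remote_min_tx_ms: int,
                      remote_multiplier: int) -> int:
    """Time without BFD packets after which the local side declares the session down.

    Hypothetical helper for illustration; all values are in milliseconds.
    """
    # The peer may not transmit faster than the local side is willing to receive,
    # so the agreed interval is the larger (slower) of the two values.
    agreed_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return agreed_interval_ms * remote_multiplier


if __name__ == "__main__":
    # Both sides at 200 ms with multiplier 3.
    print(detection_time_ms(200, 200, 3))
    # Peer wants to send every 100 ms, but the local side only accepts 500 ms.
    print(detection_time_ms(500, 100, 3))
```

When both ends use the same interval, this reduces to the simple interval × multiplier product used throughout this page. When the two ends differ, the slower setting wins.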

Enabling BFD on a BGP Connection

BFD is enabled with the use-bfd parameter, set either directly on a BGP connection or on a BGP template that connections inherit:

# Enable BFD on a specific BGP connection
/routing/bgp/connection set [find name=ebgp-isp] use-bfd=yes

# Or set it when creating a new connection
/routing/bgp/connection add name=ebgp-isp \
    remote.address=203.0.113.1 \
    remote.as=65000 \
    use-bfd=yes

For iBGP peers where you want consistent BFD behavior across all peers, configure it on a BGP template:

# Create a template with BFD enabled
/routing/bgp/template add name=ibgp-peers \
    hold-time=3m \
    use-bfd=yes

# Apply template to connections
/routing/bgp/connection add name=ibgp-rr1 \
    remote.address=10.0.0.2 \
    remote.as=65001 \
    templates=ibgp-peers

/routing/bgp/connection add name=ibgp-rr2 \
    remote.address=10.0.0.3 \
    remote.as=65001 \
    templates=ibgp-peers

BFD Timer Configuration

BFD timers are configured under /routing/bfd/peer or via a BFD template. When BGP integration is enabled, RouterOS automatically creates a BFD peer for the BGP neighbor address — you can then adjust the timers for that peer.

Default timers:

Transmit interval: 200ms
Minimum receive interval: 200ms
Detection multiplier: 3
Detection time: 600ms (200ms × 3)

Tuning for faster detection:

# View auto-created BFD peer for a BGP neighbor
/routing/bfd/peer/print detail

# Adjust timers for a specific BGP peer
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=100ms \
    min-rx=100ms \
    multiplier=3

With interval=100ms and multiplier=3, detection time is 300ms — suitable for data center or metro-Ethernet links with low jitter. For WAN links with variable latency, use longer intervals to avoid false positives:

# Conservative WAN settings
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=500ms \
    min-rx=500ms \
    multiplier=3

Detection time here is 1.5 seconds — still much faster than the default 3-minute hold timer.

Using BFD templates for consistent per-link-type settings:

# Fast template for LAN/DC links
/routing/bfd/template add name=fast-lan \
    interval=100ms \
    min-rx=100ms \
    multiplier=3

# Conservative template for WAN links
/routing/bfd/template add name=wan-stable \
    interval=500ms \
    min-rx=500ms \
    multiplier=5

# Apply to peers
/routing/bfd/peer/set [find remote-address=10.0.0.2] template=fast-lan
/routing/bfd/peer/set [find remote-address=203.0.113.1] template=wan-stable

Multi-Hop BFD

By default, BFD expects single-hop peers (directly connected). For iBGP peers using loopback addresses, or eBGP multihop sessions, configure multi-hop BFD:

# Multi-hop BGP peer via loopback
/routing/bgp/connection add name=ibgp-loopback \
    remote.address=10.255.0.1 \
    remote.as=65001 \
    multihop=yes \
    use-bfd=yes

# Adjust BFD peer for multi-hop
/routing/bfd/peer/set [find remote-address=10.255.0.1] \
    multihop=yes \
    interval=300ms \
    min-rx=300ms

Multi-hop BFD uses UDP port 4784 instead of 3784. Use longer intervals to accommodate additional hop latency.

Verifying BFD Status

When BFD is working correctly, the BGP session output will show the BFD state:

# View BGP sessions — BFD state shown when enabled
/routing/bgp/session print detail

When BFD detects a failure before the hold timer expires, the session output includes a note:

 0 E ;;; BFD session down
     name="ebgp-isp-1"
     remote.address=203.0.113.1 .as=65000
     hold-time=infinity use-bfd=yes uptime=3s

Check BFD session state directly:

# View all BFD sessions
/routing/bfd/peer/print detail

# Monitor BFD session in real time
/routing/bfd/peer/monitor [find remote-address=203.0.113.1]

# View BFD session statistics
/routing/bfd/session/print stats

BGP Graceful Restart

What Graceful Restart Does

Standard BGP session teardown causes the peer to immediately withdraw all received routes, triggering a full reconvergence. This means even a brief planned event — software upgrade, routing daemon restart — removes traffic forwarding state from the network.

BGP Graceful Restart (RFC 4724) lets a router advertise, when the session is established, that it can preserve its forwarding state across a restart of the BGP process. If the session then drops, the peer (acting as GR helper) continues forwarding traffic using stale routes until the session re-establishes or the restart timer expires. This preserves traffic forwarding across planned restarts.

Two roles in a Graceful Restart:

Role	Description
Initiator (restarting router)	Signals GR capability in OPEN message; requests that peers retain stale routes during restart
Helper (peer of restarting router)	Receives GR notification; retains stale routes for the duration of the restart timer instead of withdrawing them

RouterOS supports both roles simultaneously — it can act as initiator when it restarts, and as helper when its peers restart.

Enabling Graceful Restart

Graceful Restart is enabled per BGP connection:

# Enable graceful restart on a connection
/routing/bgp/connection set [find name=ebgp-isp] \
    graceful-restart=yes

# Or set during connection creation
/routing/bgp/connection add name=ebgp-isp \
    remote.address=203.0.113.1 \
    remote.as=65000 \
    graceful-restart=yes

Graceful Restart takes effect only if the restarting router advertised the GR capability in its OPEN message and its peer is willing to act as helper. Enable it on both ends of the session so that either router can restart without disrupting forwarding.

Graceful Restart Timers

The key timers governing Graceful Restart behavior:

Restart time — Advertised by the restarting router in the GR capability; tells the helper how long to retain stale routes. The helper will withdraw stale routes if the session does not re-establish within this period.

Stale routes timer — Controls how long the helper waits after session loss before purging stale forwarding entries.

# View current GR configuration
/routing/bgp/connection print detail where graceful-restart=yes

Graceful Restart with Planned Maintenance

When performing a planned BGP restart (e.g., RouterOS upgrade), the sequence is:

Router signals GR capability with a restart time during normal session establishment
At restart time, the TCP session drops without a BGP NOTIFICATION (under RFC 4724, a NOTIFICATION ends the session normally and the helper removes the routes)
The helper retains stale routes for up to the advertised restart time
The restarting router re-establishes the BGP session
Once the session is re-established and EOR (End-of-RIB) is received, stale routes are replaced with fresh entries
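The helper side of this sequence can be modeled as a small state machine. The sketch below is a toy Python model for illustration only. The class `GrHelper` and its method names are hypothetical and say nothing about how RouterOS implements the helper role; it tracks a single peer and a single address family.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GrHelper:
    """Toy model of a Graceful Restart helper's view of one peer's routes."""
    restart_time_s: float                                  # restart time advertised by the peer
    routes: Dict[str, str] = field(default_factory=dict)   # prefix -> "fresh" or "stale"
    session_up: bool = True
    down_since_s: Optional[float] = None

    def learn(self, prefix: str) -> None:
        # A route received (or re-received) from the peer is fresh.
        self.routes[prefix] = "fresh"

    def session_lost(self, now_s: float) -> None:
        # Keep forwarding: mark the routes stale instead of removing them.
        self.session_up = False
        self.down_since_s = now_s
        for prefix in self.routes:
            self.routes[prefix] = "stale"

    def tick(self, now_s: float) -> None:
        # Restart timer expired with no new session: give up on the stale routes.
        if not self.session_up and now_s - self.down_since_s >= self.restart_time_s:
            self.purge_stale()

    def session_reestablished(self) -> None:
        self.session_up = True
        self.down_since_s = None

    def end_of_rib(self) -> None:
        # Routes the peer did not re-advertise before End-of-RIB are removed.
        self.purge_stale()

    def purge_stale(self) -> None:
        self.routes = {p: s for p, s in self.routes.items() if s == "fresh"}


if __name__ == "__main__":
    helper = GrHelper(restart_time_s=120)
    helper.learn("192.0.2.0/24")
    helper.learn("198.51.100.0/24")
    helper.session_lost(now_s=0)      # both routes become stale, still usable
    helper.tick(now_s=30)             # timer not yet expired, nothing removed
    helper.session_reestablished()
    helper.learn("192.0.2.0/24")      # peer re-advertises only one prefix
    helper.end_of_rib()               # the other prefix is purged
    print(helper.routes)
```

The model makes the two exit paths explicit: stale routes disappear either when End-of-RIB arrives on the new session or when the restart timer runs out first.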

To verify that GR capability was negotiated:

# Check session capabilities — look for "gr" in capabilities list
/routing/bgp/session print detail where state=established

Output showing GR negotiated:

     remote.capabilities=mp,rr,gr,as4

Combining BFD and Graceful Restart

BFD and Graceful Restart serve different purposes and can be used together:

BFD handles unplanned failures — it detects hardware or link failures fast and immediately resets the BGP session for rapid rerouting.
Graceful Restart handles planned restarts — it preserves forwarding state so that a controlled restart does not cause a traffic outage.

Using both simultaneously:

/routing/bgp/connection set [find name=core-peer] \
    use-bfd=yes \
    graceful-restart=yes

Important: Graceful Restart assumes that the forwarding path stays up while only the routing process restarts. If the BFD session to the restarting peer also goes down during the restart, the BFD-triggered reset can cause the helper to discard the routes that GR was meant to preserve. If you experience this, increase the BFD timers or disable BFD on peers where GR is the primary reliability mechanism.
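The interaction in the note above can be written as a small decision function. This is a sketch of the general guidance in RFC 5882, not a description of RouterOS behavior. The function name and both parameters are hypothetical. `bfd_survives_control_plane_restart` stands for a BFD implementation that keeps running independently of the routing process, which is an assumption about the platform.

```python
def abort_graceful_restart(bfd_down: bool,
                           bfd_survives_control_plane_restart: bool) -> bool:
    """Return True if a GR helper should drop stale routes immediately.

    Hypothetical helper for illustration only.
    """
    if not bfd_down:
        # The forwarding path is still being verified: keep the stale routes.
        return False
    # BFD is down. If BFD normally keeps running while the routing process
    # restarts, its failure means the forwarding path itself is gone.
    # If BFD shares fate with the routing process, the failure is just a
    # side effect of the restart and says nothing about forwarding.
    return bfd_survives_control_plane_restart


if __name__ == "__main__":
    print(abort_graceful_restart(bfd_down=True, bfd_survives_control_plane_restart=True))
    print(abort_graceful_restart(bfd_down=True, bfd_survives_control_plane_restart=False))
```

In the second case BFD adds no information during the restart. That is why the note suggests relaxing the timers or relying on GR alone for such peers.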

Complete Configuration Example

ISP Edge Router: BFD + GR on eBGP Upstreams

# BGP instance
/routing/bgp/instance add name=main as=65001 router-id=10.255.0.1

# eBGP template for ISP uplinks
/routing/bgp/template add name=isp-uplinks \
    hold-time=90s \
    keepalive-time=30s \
    use-bfd=yes \
    graceful-restart=yes

# ISP A connection
/routing/bgp/connection add name=isp-a \
    remote.address=203.0.113.1 \
    remote.as=65000 \
    templates=isp-uplinks \
    nexthop-choice=force-self

# ISP B connection
/routing/bgp/connection add name=isp-b \
    remote.address=198.51.100.1 \
    remote.as=65100 \
    templates=isp-uplinks \
    nexthop-choice=force-self

# Tune BFD timers for WAN links
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=300ms min-rx=300ms multiplier=3

/routing/bfd/peer/set [find remote-address=198.51.100.1] \
    interval=300ms min-rx=300ms multiplier=3

# Verify sessions
/routing/bgp/session print
/routing/bfd/peer/print

Data Center: Fast BFD on iBGP Spine Peers

# iBGP template with fast BFD for DC fabric
/routing/bgp/template add name=dc-spine \
    hold-time=infinity \
    use-bfd=yes \
    graceful-restart=yes

# Spine connections (loopback-to-loopback iBGP)
/routing/bgp/connection add name=spine-1 \
    remote.address=10.0.0.1 \
    remote.as=65001 \
    multihop=yes \
    templates=dc-spine

/routing/bgp/connection add name=spine-2 \
    remote.address=10.0.0.2 \
    remote.as=65001 \
    multihop=yes \
    templates=dc-spine

# Fast BFD for fabric links (multi-hop via loopback)
/routing/bfd/peer/set [find remote-address=10.0.0.1] \
    multihop=yes interval=100ms min-rx=100ms multiplier=3

/routing/bfd/peer/set [find remote-address=10.0.0.2] \
    multihop=yes interval=100ms min-rx=100ms multiplier=3

Troubleshooting

BFD Session Not Establishing

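Check that UDP port 3784 (single-hop) or 4784 (multi-hop) is not blocked by a firewall on either router, then confirm that the peer is reachable and that a BFD peer exists for it: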

# Test reachability to BGP peer
/tool/ping address=203.0.113.1 count=5

# Check if BFD peer exists for the BGP neighbor
/routing/bfd/peer/print

# Confirm use-bfd is set on the BGP connection
/routing/bgp/connection print detail where name=ebgp-isp

BFD Flapping on WAN Links

Increase the interval and the multiplier to reduce sensitivity to transient jitter and occasional packet loss:

# Increase interval to tolerate higher latency variation
/routing/bfd/peer/set [find remote-address=203.0.113.1] \
    interval=500ms min-rx=500ms multiplier=5

Detection time becomes 2.5 seconds — still much faster than hold-timer expiry.

Graceful Restart Not Negotiated

Both sides must support GR capability. Verify by checking session capabilities:

# Look for "gr" in remote.capabilities
/routing/bgp/session print detail

If gr is absent from remote capabilities, the peer may not support GR or may have it disabled.

Stale Routes Not Cleared After Restart

If stale routes persist after a GR event, check that the restarting router sent an End-of-RIB marker after reconvergence:

# View session EOR status
/routing/bgp/session print detail where state=established

The eor="" field shows which address families have received EOR. If EOR is missing, stale routes will be cleared only when the restart timer expires.
