Multi-WAN Failover with Recursive Routing
RouterOS supports several multi-WAN strategies ranging from simple distance-based failover to advanced recursive routing with load balancing. This guide covers the full spectrum: recursive next-hop resolution, Netwatch-driven failover scripts, Per Connection Classifier (PCC) Mangle marking, and ECMP load balancing, including how to combine them.
Summary
| Strategy | Use Case | Failover Speed | Complexity |
|---|---|---|---|
| Distance-based + check-gateway | Simple primary/backup | ~30 seconds | Low |
| Recursive anchor routing | Many routes failing over together | ~30 seconds | Medium |
| Netwatch + distance scripting | Detect upstream failures, not just gateway | Configurable | Medium |
| ECMP load balancing | Distribute load across WANs | N/A (active/active) | Low |
| PCC Mangle + per-table routing | Session-aware load balancing | New sessions only | High |
Network Topology
Examples in this guide use the following assumptions:

    LAN (bridge, 192.168.1.0/24)
        ↕
    MikroTik Router
    ├── ether1 → ISP1, gateway 203.0.113.1 (WAN1, primary)
    └── ether2 → ISP2, gateway 198.51.100.1 (WAN2, backup / secondary)

:::info Diagram
See failover-wan-backup for a visual topology diagram.
:::
Recursive Routing
How Recursive Next-Hop Resolution Works
In RouterOS, a route’s gateway can be an IP address that is not directly on a connected subnet. To forward traffic, the router looks up that gateway address in its own routing table to find a directly-reachable path. This is recursive resolution.

    Packet destined for 10.1.0.0/16
      → Route: dst=10.1.0.0/16 gateway=172.16.0.1 (not directly connected)
      → Route: dst=172.16.0.0/24 gateway=192.168.1.1 via ether1 (directly connected)
      → FIB entry: forward out ether1 to 192.168.1.1

RouterOS installs the final resolved result into the Forwarding Information Base (FIB) for full-speed forwarding. Verify resolution with:

    # Show routes with resolution state
    /ip/route print detail

    # Show only active (resolved) routes
    /ip/route print where active=yes

    # Show the next-hop cache
    /routing/nexthop/print

A route whose gateway cannot be resolved is inactive and is not used for forwarding, so matching traffic falls through to a less specific route or is dropped. Always verify routes are active after configuration changes.
scope and target-scope
scope and target-scope (0–255) control which routes RouterOS uses to resolve a recursive next-hop, preventing resolution loops.
| Route type | scope | target-scope |
|---|---|---|
| Connected (interface) route | 10 | 0 |
| Static route (default) | 30 | 10 |
Resolution rule: Route B resolves Route A’s gateway when B.scope ≤ A.target-scope.
With defaults, connected routes resolve static routes (10 ≤ 10 ✓), but static routes cannot resolve other static routes (30 ≤ 10 ✗).
To chain static routes, raise target-scope on the route being resolved:

    # Route A: resolved via a static route (raise target-scope to allow it)
    /ip/route add dst-address=10.1.0.0/16 gateway=172.16.0.1 target-scope=30

    # Route B: intermediate static route (scope=30 satisfies A's target-scope=30)
    /ip/route add dst-address=172.16.0.0/24 gateway=192.168.1.1

Recursive Anchor Routing for Failover
Recursive anchor routing lets you fail over many routes simultaneously using a single /32 host route as a “master switch”. When the anchor route’s gateway probe fails, every route that resolves through it goes inactive at once.
Use case: 50 customer VPN routes that must all fail over together when WAN1 goes down.

    # Step 1: Anchor routes, /32 host routes with check-gateway.
    # scope=10 lets dependent routes resolve through them: by the resolution rule,
    # anchor.scope=10 ≤ dependent.target-scope (default 10).
    /ip/route add dst-address=203.0.113.1/32 gateway=203.0.113.1%ether1 \
        check-gateway=ping distance=1 scope=10 target-scope=20
    /ip/route add dst-address=198.51.100.1/32 gateway=198.51.100.1%ether2 \
        check-gateway=ping distance=1 scope=10 target-scope=20

    # Step 2: Default routes resolve through the anchors.
    # They keep the default target-scope=10, which the anchors' scope=10 satisfies.
    # scope=20 sets these routes' own scope; it does not affect how their gateway is resolved.
    /ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 scope=20
    /ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 scope=20

    # Step 3: VPN routes also resolve through the same anchors
    /ip/route add dst-address=10.10.0.0/16 gateway=203.0.113.1 distance=1 scope=20
    /ip/route add dst-address=10.10.0.0/16 gateway=198.51.100.1 distance=10 scope=20

When the WAN1 anchor goes inactive (3 consecutive ping failures, ~30 seconds), all routes with gateway=203.0.113.1 that resolved through it also go inactive. WAN2 routes at distance=10 activate automatically.
Failover with check-gateway
For simple primary/backup failover without anchor routing:

    # Primary route, actively probed every 10s
    /ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping

    # Backup route, activates automatically if primary fails
    /ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10

check-gateway options:
| Value | Probe method | When to use |
|---|---|---|
| ping | ICMP echo | PPPoE, static IP connections |
| arp | ARP request | When ICMP is filtered; CPE is directly attached |
| (none) | No probing | Route stays active unless interface goes down |
After 3 consecutive failures (~30 seconds) the route is marked inactive. RouterOS automatically restores it when probes succeed again.
:::warning Gateway reachability vs. internet reachability
check-gateway=ping tests only that the gateway IP responds. If your ISP’s gateway is always reachable but its upstream is broken, failover will not trigger. Use Netwatch in that case to test an external IP.
:::
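One way to make check-gateway itself look past the ISP gateway, without Netwatch, is to probe an external host that is reached recursively through that gateway. The sketch below is a minimal illustration under stated assumptions: 8.8.8.8 and 208.67.222.222 are example probe hosts (not part of this guide's topology), and the default routes raise target-scope to 11 so that the scope=10 host routes satisfy the resolution rule B.scope ≤ A.target-scope described earlier.

```routeros
# Host routes: each probe host is reachable only through one ISP gateway
# (probe host addresses are assumptions; pick hosts that reliably answer ICMP)
/ip/route add dst-address=8.8.8.8/32 gateway=203.0.113.1 scope=10 comment="probe-via-wan1"
/ip/route add dst-address=208.67.222.222/32 gateway=198.51.100.1 scope=10 comment="probe-via-wan2"

# Default routes: the gateway is the probe host, resolved recursively
# through the host routes above; check-gateway pings the probe host
/ip/route add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 \
    target-scope=11 check-gateway=ping
/ip/route add dst-address=0.0.0.0/0 gateway=208.67.222.222 distance=2 \
    target-scope=11 check-gateway=ping
```

Because each probe host is routed through a single gateway, traffic to that host always uses that link, so it is usually better to choose probe hosts that LAN clients do not depend on.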
Failover with Netwatch Scripts
Netwatch monitors any host and runs scripts on state changes. Use it when you need to test an external IP (not just the gateway) to detect upstream failures.
Distance-toggling Netwatch

    # Pin the probe host to WAN1; otherwise the probe succeeds via WAN2 after
    # failover and the up-script restores the primary route prematurely
    /ip/route add dst-address=8.8.8.8/32 gateway=203.0.113.1

    # Monitor an external IP for WAN1 reachability
    /tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
        up-script="/ip/route set [find gateway=203.0.113.1 dst-address=0.0.0.0/0] distance=1" \
        down-script="/ip/route set [find gateway=203.0.113.1 dst-address=0.0.0.0/0] distance=20"

    # Primary route (distance=1 when WAN1 healthy; Netwatch raises to 20 on failure)
    /ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1

    # Backup route (distance=10, wins whenever the WAN1 route is at distance=20)
    /ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10

Disable/Enable Route Approach
For cleaner intent (route is explicitly absent rather than out-ranked):

    /tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
        up-script={
            :log info "WAN1 upstream UP"
            /ip/route enable [find comment="wan1-default"]
        } \
        down-script={
            :log warning "WAN1 upstream DOWN - failing over"
            /ip/route disable [find comment="wan1-default"]
        }

    /ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 comment="wan1-default"
    /ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 comment="wan2-default"

NAT for Dual-WAN
Both interfaces need masquerade rules so that outbound traffic is source-NATed to the address of whichever link it leaves through, and replies return on that same link:

    /interface/list add name=WAN
    /interface/list/member add interface=ether1 list=WAN
    /interface/list/member add interface=ether2 list=WAN

    /ip/firewall/nat add chain=srcnat out-interface-list=WAN action=masquerade

ECMP Load Balancing
ECMP (Equal-Cost Multi-Path) distributes traffic across multiple WANs simultaneously by installing multiple default routes at the same distance. RouterOS hashes flows to gateways using source/destination addresses (and optionally ports).

    # Both gateways at distance=1, so RouterOS creates an ECMP group
    /ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
    /ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=1 check-gateway=ping

ECMP Hash Policy
Configure how RouterOS distributes flows across gateways:

    # Options: l3 (src+dst IP), l4 (src+dst IP + ports), l3-inner (inner headers for tunnels)
    /ip/settings set ipv4-multipath-hash-policy=l4

l4 hashing distributes individual connections more evenly than l3 when many flows share the same IP pair.
ECMP Failover Behavior
With check-gateway=ping on both routes, RouterOS removes a gateway from the ECMP group when its probe fails. Remaining gateways continue forwarding. New flows are hashed only across active gateways; existing flows on the failed path must re-establish.

    # Verify ECMP group: both routes should show as active
    /ip/route print where dst-address=0.0.0.0/0

Expected output (both healthy):

    Flags: A - ACTIVE, S - STATIC, E - ECMP
     #     DST-ADDRESS   GATEWAY        DISTANCE
     0 ASE 0.0.0.0/0     203.0.113.1    1
     1 ASE 0.0.0.0/0     198.51.100.1   1

:::note ECMP and NAT
ECMP performs per-flow (not per-connection) distribution in the routing table. Masquerade on out-interface-list=WAN handles NAT correctly for whichever interface each flow uses.
:::
PCC Mangle — Session-Aware Load Balancing
Pure ECMP distributes at the routing level, which can split the same TCP session across gateways on asymmetric paths (e.g., when the router is also handling the return path). PCC (Per Connection Classifier) avoids this by pinning each connection to a specific WAN using Mangle marks.
How PCC Works
PCC hashes connection tuples (src IP + dst IP + ports) and assigns a remainder. For two WANs, flows with remainder 0 go to WAN1, remainder 1 to WAN2. Once marked, the connection stays on the same WAN for its lifetime.
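To make the remainder mechanism concrete, the sketch below extends the classifier to a hypothetical third WAN. The wan3_conn mark is an assumption that is not part of this guide's topology; a real three-link setup would also need a matching routing table, per-table default route, and mark-routing rules. The denominator becomes 3 and each rule claims one remainder, so every new connection matches exactly one rule.

```routeros
/ip/firewall/mangle
# Denominator 3: the remainder (0, 1 or 2) of each new connection's hash
# selects which connection mark it receives
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:3/0 \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:3/1 \
    action=mark-connection new-connection-mark=wan2_conn passthrough=yes
# wan3_conn is hypothetical: no third WAN exists in this guide's topology
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:3/2 \
    action=mark-connection new-connection-mark=wan3_conn passthrough=yes
```

Weighted splits follow the same pattern: with a denominator of 3, assigning remainders 0 and 1 to wan1_conn and remainder 2 to wan2_conn sends roughly two thirds of new connections to WAN1.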
Configuration

    # 1) Create routing tables for policy routing
    /routing/table
    add name=to_wan1 fib
    add name=to_wan2 fib

    # 2) Per-table default routes (for marked traffic)
    /ip/route
    add dst-address=0.0.0.0/0 gateway=203.0.113.1 routing-table=to_wan1 distance=1 check-gateway=ping
    add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-table=to_wan2 distance=1 check-gateway=ping

    # 3) Main table routes (for unmarked traffic and fallback)
    /ip/route
    add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
    add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 check-gateway=ping

    # 4) Mangle: mark new connections from LAN using PCC (2-way split)
    /ip/firewall/mangle
    add chain=prerouting in-interface=bridge connection-state=new \
        dst-address-type=!local \
        per-connection-classifier=both-addresses-and-ports:2/0 \
        action=mark-connection new-connection-mark=wan1_conn passthrough=yes
    add chain=prerouting in-interface=bridge connection-state=new \
        dst-address-type=!local \
        per-connection-classifier=both-addresses-and-ports:2/1 \
        action=mark-connection new-connection-mark=wan2_conn passthrough=yes

    # 5) Apply routing marks to LAN packets of marked connections
    add chain=prerouting in-interface=bridge \
        connection-mark=wan1_conn \
        action=mark-routing new-routing-mark=to_wan1 passthrough=no
    add chain=prerouting in-interface=bridge \
        connection-mark=wan2_conn \
        action=mark-routing new-routing-mark=to_wan2 passthrough=no

    # 6) Router-originated packets that belong to an already-marked connection
    add chain=output connection-mark=wan1_conn \
        action=mark-routing new-routing-mark=to_wan1 passthrough=no
    add chain=output connection-mark=wan2_conn \
        action=mark-routing new-routing-mark=to_wan2 passthrough=no

Mangle Rule Order
The rule order matters. The PCC classification (mark-connection) must happen before the routing mark assignment (mark-routing):
- PCC classifier marks new connections → wan1_conn or wan2_conn
- Routing mark rules read the connection mark → set to_wan1 or to_wan2
- RouterOS uses the routing mark to look up the correct per-table route
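The output-chain rules in the configuration only act on connections that already carry wan1_conn or wan2_conn. A common companion, sketched here as an assumption rather than as part of the configuration above, marks connections that arrive on each WAN interface, so that replies from the router (and from LAN hosts reached through port forwarding) leave by the link the connection came in on.

```routeros
/ip/firewall/mangle
# Mark connections initiated from the internet by the WAN they arrived on.
# Interface names follow this guide's topology (ether1 = WAN1, ether2 = WAN2).
add chain=prerouting in-interface=ether1 connection-state=new \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes
add chain=prerouting in-interface=ether2 connection-state=new \
    action=mark-connection new-connection-mark=wan2_conn passthrough=yes
```

These rules do not compete with the PCC rules, which match only in-interface=bridge; the existing mark-routing rules then pick up the connection mark on the reply packets.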
PCC Failover Behavior
When a WAN fails and its route becomes inactive via check-gateway:
- New flows: PCC still assigns connections to both WANs, but flows assigned to the failed WAN will route via the main table (fallback to the other WAN, since the per-table route is inactive).
- Existing flows: Sessions already pinned to the failed WAN break and must re-establish.
For smoother failover, add Netwatch to rebalance active connections by flushing connection tracking on the failed WAN:

    /tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
        down-script={
            :log warning "WAN1 down - clearing wan1 connections"
            /ip/firewall/connection remove [find connection-mark=wan1_conn]
        }

Comparison: When to Use Each Approach
| Scenario | Recommended Approach |
|---|---|
| Simple primary/backup, single office | Distance-based + check-gateway=ping |
| Primary/backup with upstream failures (ISP CPE always up) | Netwatch to external IP |
| 50+ routes that must fail over atomically | Recursive anchor routing |
| Maximize aggregate WAN bandwidth, stateless traffic | ECMP |
| Session-persistent load balancing, stateful traffic | PCC Mangle |
| Load balancing with failover | PCC Mangle + main-table fallback routes |
Troubleshooting
Route is inactive: gateway not resolved

    # Check if gateway address has a route
    /ip/route print where dst-address=203.0.113.1/32

    # Inspect next-hop cache
    /routing/nexthop/print

    # Flush cache to force re-resolution
    /routing/nexthop/flush

Failover not triggering
- Verify check-gateway is set on the primary route:

      /ip/route print detail where dst-address=0.0.0.0/0

- Confirm ping to gateway works from the router:

      /ping 203.0.113.1 count=5

- Check that the backup route has a higher distance than the primary and is not disabled (it becomes active only once the primary goes inactive):

      /ip/route print where dst-address=0.0.0.0/0
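If these checks pass and you still want to confirm that the backup path itself carries traffic, a manual drill can help. This sketch assumes the primary default route carries comment="wan1-default", as in the Disable/Enable Route Approach; adjust the find expression if your routes are labeled differently. It exercises the backup route only and does not test probe-based detection.

```routeros
# Take the primary default route out of service (assumes comment="wan1-default")
/ip/route disable [find comment="wan1-default"]

# The WAN2 default route should now be the active one
/ip/route print where dst-address=0.0.0.0/0

# Confirm that traffic actually flows over the backup link
/ping 8.8.8.8 count=5

# Put the primary route back
/ip/route enable [find comment="wan1-default"]
```

Existing sessions on WAN1 are interrupted while the route is disabled, so this is best run in a maintenance window.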
Recursive route not resolving (scope mismatch)
If a route shows as inactive despite the gateway being reachable:

    # Inspect scope/target-scope values
    /ip/route print detail where gateway=203.0.113.1

Fix: raise target-scope on the route with the unresolvable gateway so that the resolving route’s scope satisfies B.scope ≤ A.target-scope:

    /ip/route set [find dst-address=10.10.0.0/16] target-scope=30

PCC traffic all going to one WAN
Check that both PCC rules are active and the denominator/remainder values are correct:

    /ip/firewall/mangle print where action=mark-connection

Verify connection marks are being assigned:

    # count-only goes before where; everything after where is parsed as the filter
    /ip/firewall/connection print count-only where connection-mark=wan1_conn
    /ip/firewall/connection print count-only where connection-mark=wan2_conn

Flapping during brief outages
Add confirmation delay in Netwatch to avoid reacting to transient failures:

    /tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
        down-script={
            :delay 10
            :if ([/ping 8.8.8.8 count=3] = 0) do={
                :log error "Confirmed WAN1 failure"
                /ip/route set [find gateway=203.0.113.1] distance=20
            }
        }

Related Topics
- Failover (WAN Backup): Basic failover and check-gateway reference
- Mangle: Packet marking, PCC, connection marks
- Bonding: Link-level redundancy and aggregation
- VRRP: Router-level redundancy for LAN gateway failover