
Multi-WAN Failover with Recursive Routing

RouterOS supports several multi-WAN strategies ranging from simple distance-based failover to advanced recursive routing with load balancing. This guide covers the full spectrum: recursive next-hop resolution, Netwatch-driven failover scripts, Per Connection Classifier (PCC) Mangle marking, and ECMP load balancing — including how to combine them.

| Strategy | Use Case | Failover Speed | Complexity |
| --- | --- | --- | --- |
| Distance-based + check-gateway | Simple primary/backup | ~30 seconds | Low |
| Recursive anchor routing | Many routes failing over together | ~30 seconds | Medium |
| Netwatch + distance scripting | Detect upstream failures, not just gateway | Configurable | Medium |
| ECMP load balancing | Distribute load across WANs | N/A (active/active) | Low |
| PCC Mangle + per-table routing | Session-aware load balancing | New sessions only | High |

Examples in this guide use the following assumptions:

LAN (bridge, 192.168.1.0/24)
MikroTik Router
├── ether1 → ISP1, gateway 203.0.113.1 (WAN1 — primary)
└── ether2 → ISP2, gateway 198.51.100.1 (WAN2 — backup / secondary)

:::info Diagram
See failover-wan-backup for a visual topology diagram.
:::


In RouterOS, a route’s gateway can be an IP address that is not directly on a connected subnet. To forward traffic, the router looks up that gateway address in its own routing table to find a directly-reachable path. This is recursive resolution.

Packet destined for 10.1.0.0/16
→ Route: dst=10.1.0.0/16 gateway=172.16.0.1 (not directly connected)
→ Route: dst=172.16.0.0/24 gateway=192.168.1.1 via ether1 (directly connected)
→ FIB entry: forward out ether1 to 192.168.1.1

RouterOS installs the final resolved result into the Forwarding Information Base (FIB) for full-speed forwarding. Verify resolution with:

# Show routes with resolution state
/ip/route print detail
# Show only active (resolved) routes
/ip/route print where active=yes
# Show the next-hop cache
/routing/nexthop/print

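A route whose gateway cannot be resolved stays inactive and is not used for forwarding. Matching traffic then falls through to a less specific route, or is dropped if no other route matches. Always verify routes are active after configuration changes.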

scope and target-scope (0–255) control which routes RouterOS uses to resolve a recursive next-hop, preventing resolution loops.

| Route type | scope | target-scope |
| --- | --- | --- |
| Connected (interface) route | 10 | 0 |
| Static route (default) | 30 | 10 |

Resolution rule: Route B resolves Route A’s gateway when B.scope ≤ A.target-scope.

With defaults, connected routes resolve static routes (10 ≤ 10 ✓), but static routes cannot resolve other static routes (30 ≤ 10 ✗).

To chain static routes, raise target-scope on the route being resolved:

# Route A: resolved via a static route (raise target-scope to allow it)
/ip/route add dst-address=10.1.0.0/16 gateway=172.16.0.1 target-scope=30
# Route B: intermediate static route (scope=30 satisfies A's target-scope=30)
/ip/route add dst-address=172.16.0.0/24 gateway=192.168.1.1
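
To confirm that the chain resolves, one option is to print both routes in detail and compare their scope values against the resolution rule. This is a minimal check that assumes the two routes above are the only ones for these prefixes.

```routeros
# Route A should list target-scope=30 and carry the active flag once it resolves
/ip/route print detail where dst-address=10.1.0.0/16
# Route B should list scope=30, which satisfies B.scope ≤ A.target-scope
/ip/route print detail where dst-address=172.16.0.0/24
```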

Recursive anchor routing lets you fail over many routes simultaneously using a single /32 host route as a “master switch”. When the anchor route’s gateway probe fails, every route that resolves through it goes inactive at once.

Use case: 50 customer VPN routes that must all fail over together when WAN1 goes down.

# Step 1: Anchor routes — /32 host routes with check-gateway
# scope=10 lets dependent routes (default target-scope=10) resolve through them
/ip/route add dst-address=203.0.113.1/32 gateway=203.0.113.1%ether1 \
check-gateway=ping distance=1 scope=10 target-scope=20
/ip/route add dst-address=198.51.100.1/32 gateway=198.51.100.1%ether2 \
check-gateway=ping distance=1 scope=10 target-scope=20
# Step 2: Default routes resolve through the anchors
# They keep the default target-scope=10, which the anchors satisfy
# (anchor.scope=10 ≤ route.target-scope=10); scope=20 applies to the routes themselves
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 scope=20
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 scope=20
# Step 3: VPN routes also resolve through the same anchors
/ip/route add dst-address=10.10.0.0/16 gateway=203.0.113.1 distance=1 scope=20
/ip/route add dst-address=10.10.0.0/16 gateway=198.51.100.1 distance=10 scope=20

When the WAN1 anchor goes inactive (3 consecutive ping failures, ~30 seconds), all routes with gateway=203.0.113.1 that resolved through it also go inactive. WAN2 routes at distance=10 activate automatically.
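
For a larger set of prefixes, the per-prefix route pairs can be generated with a short script rather than typed by hand. The sketch below is illustrative only: the prefix list and the comment labels are hypothetical placeholders, and it assumes the anchor routes from Step 1 already exist.

```routeros
# Wrap in braces so the local variable stays in scope when pasted into a terminal
{
# Hypothetical prefix list; replace with the real customer networks
:local prefixes {"10.20.0.0/16";"10.30.0.0/16";"10.40.0.0/16"}
:foreach p in=$prefixes do={
    # Primary path through the WAN1 anchor
    /ip/route add dst-address=$p gateway=203.0.113.1 distance=1 scope=20 comment="vpn-wan1"
    # Backup path through the WAN2 anchor
    /ip/route add dst-address=$p gateway=198.51.100.1 distance=10 scope=20 comment="vpn-wan2"
}
}
```

Tagging the generated routes with a comment makes it easier to list or remove them later with `[find comment="vpn-wan1"]`.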


For simple primary/backup failover without anchor routing:

# Primary route — actively probed every 10s
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
# Backup route — activates automatically if primary fails
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10

check-gateway options:

| Value | Probe method | When to use |
| --- | --- | --- |
| ping | ICMP echo | PPPoE, static IP connections |
| arp | ARP request | When ICMP is filtered; CPE is directly attached |
| (none) | No probing | Route stays active unless interface goes down |

After 3 consecutive failures (~30 seconds) the route is marked inactive. RouterOS automatically restores it when probes succeed again.

:::warning Gateway reachability vs. internet reachability
check-gateway=ping tests only that the gateway IP responds. If your ISP’s gateway is always reachable but its upstream is broken, failover will not trigger. Use Netwatch in that case to test an external IP.
:::


Netwatch monitors any host and runs scripts on state changes. Use it when you need to test an external IP (not just the gateway) to detect upstream failures.

# Monitor an external IP for WAN1 reachability
/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
up-script="/ip/route set [find gateway=203.0.113.1] distance=1" \
down-script="/ip/route set [find gateway=203.0.113.1] distance=20"
# Primary route (distance=1 when WAN1 healthy; Netwatch raises to 20 on failure)
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1
# Backup route (distance=10 — always wins when WAN1 distance=20)
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10
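
One caveat with this pattern: after failover, 8.8.8.8 is reachable through WAN2, so the probe can succeed again and fire the up-script while WAN1 is still broken. A common way to avoid this is to pin the probe target to WAN1 with a host route. The sketch below assumes 8.8.8.8 is used only as a probe target (LAN clients lose access to it while the WAN1 upstream is down), and the comment value is a hypothetical label.

```routeros
# Force probes to 8.8.8.8 out through WAN1 only
/ip/route add dst-address=8.8.8.8/32 gateway=203.0.113.1 comment="wan1-probe"
# Confirm the pinned route is active
/ip/route print where dst-address=8.8.8.8/32
```

Because the scripts above select routes with `[find gateway=203.0.113.1]`, they also change the distance of this host route. That is harmless as long as it remains the only route to 8.8.8.8/32.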

To make the intent explicit, disable the primary route instead of out-ranking it, so that it is absent from the routing table rather than merely less preferred:

/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
up-script={
:log info "WAN1 upstream UP"
/ip/route enable [find comment="wan1-default"]
} \
down-script={
:log warning "WAN1 upstream DOWN - failing over"
/ip/route disable [find comment="wan1-default"]
}
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 comment="wan1-default"
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 comment="wan2-default"
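
After adding the Netwatch entry, it helps to check that the probe and the scripts behave as intended. The commands below are a minimal verification sketch that relies on the log messages and route comments defined in the example above.

```routeros
# Current probe state for each monitored host
/tool/netwatch print
# Messages written by the up and down scripts
/log print where message~"WAN1 upstream"
# Route state: a disabled route is shown with the X flag
/ip/route print where comment="wan1-default"
```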

Both WAN interfaces need a masquerade rule so that outbound traffic is source-NATed to the address of whichever link carries it:

/interface/list add name=WAN
/interface/list/member add interface=ether1 list=WAN
/interface/list/member add interface=ether2 list=WAN
/ip/firewall/nat add chain=srcnat out-interface-list=WAN action=masquerade
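
To check that the rule matches traffic on both links, one approach is to watch its counters while generating traffic before and after a failover. This is a sketch, assuming the WAN list and masquerade rule above are the only NAT configuration.

```routeros
# Packet and byte counters for each NAT rule
/ip/firewall/nat print stats
# Members of the WAN interface list
/interface/list/member print where list=WAN
```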

ECMP (Equal-Cost Multi-Path) distributes traffic across multiple WANs simultaneously by installing multiple default routes at the same distance. RouterOS hashes flows to gateways using source/destination addresses (and optionally ports).

# Both gateways at distance=1 — RouterOS creates an ECMP group
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=1 check-gateway=ping

Configure how RouterOS distributes flows across gateways:

# Options: l3 (src+dst IP), l4 (src+dst IP + ports), l3-inner (inner headers for tunnels)
/ip/settings set ipv4-multipath-hash-policy=l4

l4 hashing distributes individual connections more evenly than l3 when many flows share the same IP pair.

With check-gateway=ping on both routes, RouterOS removes a gateway from the ECMP group when its probe fails. Remaining gateways continue forwarding. New flows are hashed only across active gateways; existing flows on the failed path must re-establish.

# Verify ECMP group — both routes should show as active
/ip/route print where dst-address=0.0.0.0/0

Expected output (both healthy; flag letters and columns vary by RouterOS version):

Flags: A - ACTIVE, S - STATIC, E - ECMP
# DST-ADDRESS GATEWAY DISTANCE
0 ASE 0.0.0.0/0 203.0.113.1 1
1 ASE 0.0.0.0/0 198.51.100.1 1

:::note ECMP and NAT
ECMP selects a gateway per flow by hashing packet headers in the routing path; it does not consult connection tracking. Masquerade on out-interface-list=WAN handles NAT correctly for whichever interface each flow uses.
:::


PCC Mangle — Session-Aware Load Balancing

Pure ECMP distributes at the routing level, which can split the same TCP session across gateways on asymmetric paths (e.g., when the router is also handling the return path). PCC (Per Connection Classifier) avoids this by pinning each connection to a specific WAN using Mangle marks.

PCC hashes connection tuples (src IP + dst IP + ports) and assigns a remainder. For two WANs, flows with remainder 0 go to WAN1, remainder 1 to WAN2. Once marked, the connection stays on the same WAN for its lifetime.

# 1) Create routing tables for policy routing
/routing/table
add name=to_wan1 fib
add name=to_wan2 fib
# 2) Per-table default routes (for marked traffic)
/ip/route
add dst-address=0.0.0.0/0 gateway=203.0.113.1 routing-table=to_wan1 distance=1 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-table=to_wan2 distance=1 check-gateway=ping
# 3) Main table routes (for unmarked traffic and fallback)
/ip/route
add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 check-gateway=ping
# 4) Mangle: mark incoming connections from LAN using PCC (2-way split)
/ip/firewall/mangle
add chain=prerouting in-interface=bridge connection-state=new \
dst-address-type=!local \
per-connection-classifier=both-addresses-and-ports:2/0 \
action=mark-connection new-connection-mark=wan1_conn passthrough=yes
add chain=prerouting in-interface=bridge connection-state=new \
dst-address-type=!local \
per-connection-classifier=both-addresses-and-ports:2/1 \
action=mark-connection new-connection-mark=wan2_conn passthrough=yes
# 5) Translate connection marks into routing marks for every LAN packet of a marked connection
add chain=prerouting in-interface=bridge \
connection-mark=wan1_conn \
action=mark-routing new-routing-mark=to_wan1 passthrough=no
add chain=prerouting in-interface=bridge \
connection-mark=wan2_conn \
action=mark-routing new-routing-mark=to_wan2 passthrough=no
# 6) Output chain: packets the router itself sends on an already-marked connection follow the same WAN
add chain=output connection-mark=wan1_conn \
action=mark-routing new-routing-mark=to_wan1 passthrough=no
add chain=output connection-mark=wan2_conn \
action=mark-routing new-routing-mark=to_wan2 passthrough=no

The rule order matters. The PCC classification (mark-connection) must happen before the routing mark assignment (mark-routing):

  1. PCC classifier marks new connections → wan1_conn or wan2_conn
  2. Routing mark rules read the connection mark → set to_wan1 or to_wan2
  3. RouterOS uses the routing mark to look up the correct per-table route
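
The denominator/remainder pair also allows unequal splits. As a sketch, assuming WAN1 has roughly twice the capacity of WAN2 (an assumption, not part of the topology above), a denominator of 3 with two remainders assigned to WAN1 sends about two thirds of new connections there. These rules would replace the two mark-connection rules in step 4.

```routeros
/ip/firewall/mangle
# Remainders 0 and 1 of 3 go to WAN1
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:3/0 \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:3/1 \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes
# Remainder 2 of 3 goes to WAN2
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:3/2 \
    action=mark-connection new-connection-mark=wan2_conn passthrough=yes
```

New rules are appended to the end of the chain, so they need to be moved above the mark-routing rules to preserve the order described above.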

When a WAN fails and its route becomes inactive via check-gateway:

  • New flows: PCC still assigns connections to both WANs, but flows assigned to the failed WAN will route via the main table (fallback to the other WAN, since the per-table route is inactive).
  • Existing flows: Sessions already pinned to the failed WAN break and must re-establish.

For smoother failover, add Netwatch to rebalance active connections by flushing connection tracking on the failed WAN:

/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
down-script={
:log warning "WAN1 down - clearing wan1 connections"
/ip/firewall/connection remove [find connection-mark=wan1_conn]
}

| Scenario | Recommended Approach |
| --- | --- |
| Simple primary/backup, single office | Distance-based + check-gateway=ping |
| Primary/backup with upstream failures (ISP CPE always up) | Netwatch to external IP |
| 50+ routes that must fail over atomically | Recursive anchor routing |
| Maximize aggregate WAN bandwidth, stateless traffic | ECMP |
| Session-persistent load balancing, stateful traffic | PCC Mangle |
| Load balancing with failover | PCC Mangle + main-table fallback routes |

Route is inactive — gateway not resolved

# Check if gateway address has a route
/ip/route print where dst-address=203.0.113.1/32
# Inspect next-hop cache
/routing/nexthop/print
# Flush cache to force re-resolution
/routing/nexthop/flush

If failover does not trigger when the primary link goes down:

  1. Verify check-gateway is set on the primary route:
    /ip/route print detail where dst-address=0.0.0.0/0
  2. Confirm ping to gateway works from the router:
    /ping 203.0.113.1 count=5
  3. Check that the backup route has a higher distance and is active:
    /ip/route print where dst-address=0.0.0.0/0

Recursive route not resolving (scope mismatch)

If a route shows as inactive despite the gateway being reachable:

# Inspect scope/target-scope values
/ip/route print detail where gateway=203.0.113.1

Fix — raise target-scope on the route with the unresolvable gateway so that the resolving route’s scope satisfies B.scope ≤ A.target-scope:

/ip/route set [find dst-address=10.10.0.0/16] target-scope=30

If PCC is not distributing traffic across both WANs, check that both PCC rules are active and that the denominator/remainder values are correct:

/ip/firewall/mangle print where action=mark-connection

Verify connection marks are being assigned:

/ip/firewall/connection print where connection-mark=wan1_conn count-only
/ip/firewall/connection print where connection-mark=wan2_conn count-only
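
If connection marks are assigned but traffic still leaves through one link, the routing marks or the per-table routes are worth checking next. The commands below are a minimal sketch that assumes the table names to_wan1 and to_wan2 from the configuration above.

```routeros
# Per-rule counters: both mark-routing rules should increase under load
/ip/firewall/mangle print stats where action=mark-routing
# Each policy table should hold an active default route
/ip/route print where routing-table=to_wan1
/ip/route print where routing-table=to_wan2
```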

If failover flaps on brief packet loss, add a confirmation delay in Netwatch so the script reacts only to sustained failures:

/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
down-script={
:delay 10s
:if ([/ping 8.8.8.8 count=3] = 0) do={
:log error "Confirmed WAN1 failure"
/ip/route set [find gateway=203.0.113.1] distance=20
}
}

  • Failover (WAN Backup) — Basic failover and check-gateway reference
  • Mangle — Packet marking, PCC, connection marks
  • Bonding — Link-level redundancy and aggregation
  • VRRP — Router-level redundancy for LAN gateway failover