Multi-WAN Failover with Recursive Routing

RouterOS supports several multi-WAN strategies ranging from simple distance-based failover to advanced recursive routing with load balancing. This guide covers the full spectrum: recursive next-hop resolution, Netwatch-driven failover scripts, Per Connection Classifier (PCC) Mangle marking, and ECMP load balancing — including how to combine them.

Summary

| Strategy | Use Case | Failover Speed | Complexity |
| --- | --- | --- | --- |
| Distance-based + check-gateway | Simple primary/backup | ~30 seconds | Low |
| Recursive anchor routing | Many routes failing over together | ~30 seconds | Medium |
| Netwatch + distance scripting | Detect upstream failures, not just gateway | Configurable | Medium |
| ECMP load balancing | Distribute load across WANs | N/A (active/active) | Low |
| PCC Mangle + per-table routing | Session-aware load balancing | New sessions only | High |

Network Topology

Examples in this guide use the following assumptions:

LAN (bridge, 192.168.1.0/24)
  ↕
MikroTik Router
  ├── ether1 → ISP1, gateway 203.0.113.1  (WAN1 — primary)
  └── ether2 → ISP2, gateway 198.51.100.1 (WAN2 — backup / secondary)

:::info Diagram
See failover-wan-backup for a visual topology diagram.
:::

Recursive Routing

How Recursive Next-Hop Resolution Works

In RouterOS, a route’s gateway can be an IP address that is not directly on a connected subnet. To forward traffic, the router looks up that gateway address in its own routing table to find a directly-reachable path. This is recursive resolution.

Packet destined for 10.1.0.0/16
  → Route: dst=10.1.0.0/16  gateway=172.16.0.1   (not directly connected)
      → Route: dst=172.16.0.0/24 gateway=192.168.1.1  via ether1  (directly connected)
          → FIB entry: forward out ether1 to 192.168.1.1

RouterOS installs the final resolved result into the Forwarding Information Base (FIB) for full-speed forwarding. Verify resolution with:

# Show routes with resolution state
/ip/route print detail

# Show only active (resolved) routes
/ip/route print where active=yes

# Show the next-hop cache
/routing/nexthop/print

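A route whose gateway cannot be resolved stays inactive and is not used for forwarding: matching traffic falls through to a less specific route, or is rejected as unroutable if none exists. Always verify routes are active after configuration changes.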
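The walkthrough above can be sketched as a small lookup loop. This is a toy model, not RouterOS internals: the `ROUTES` list, the `lookup` and `resolve` helpers, and the connected 192.168.1.0/24 route on ether1 are assumptions added for illustration, and scope checks are ignored here.

```python
import ipaddress

# Toy routing table mirroring the walkthrough; the connected route is assumed.
ROUTES = [
    {"dst": "10.1.0.0/16", "gateway": "172.16.0.1", "interface": None},
    {"dst": "172.16.0.0/24", "gateway": "192.168.1.1", "interface": None},
    {"dst": "192.168.1.0/24", "gateway": None, "interface": "ether1"},  # connected
]


def lookup(address, routes):
    """Return the longest-prefix route covering address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [r for r in routes if addr in ipaddress.ip_network(r["dst"])]
    return max(
        matches,
        key=lambda r: ipaddress.ip_network(r["dst"]).prefixlen,
        default=None,
    )


def resolve(destination, routes, max_depth=8):
    """Follow gateways until a connected route is reached.

    Returns (interface, next_hop), or None when resolution fails,
    which corresponds to an inactive route.
    """
    target = destination
    next_hop = None
    for _ in range(max_depth):
        route = lookup(target, routes)
        if route is None:
            return None
        if route["interface"] is not None:
            return route["interface"], (next_hop or destination)
        next_hop = route["gateway"]
        target = next_hop
    return None  # depth limit reached: treat as unresolved
```

Calling `resolve("10.1.2.3", ROUTES)` walks both static routes and ends at the connected one, giving the same interface and next hop as the FIB entry in the walkthrough. A destination with no covering route returns `None`.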

scope and target-scope

scope and target-scope (0–255) control which routes RouterOS uses to resolve a recursive next-hop, preventing resolution loops.

| Route type | scope | target-scope |
| --- | --- | --- |
| Connected (interface) route | 10 | 0 |
| Static route (default) | 30 | 10 |

Resolution rule: Route B resolves Route A’s gateway when B.scope ≤ A.target-scope.

With defaults, connected routes resolve static routes (10 ≤ 10 ✓), but static routes cannot resolve other static routes (30 ≤ 10 ✗).

To chain static routes, raise target-scope on the route being resolved:

# Route A: resolved via a static route (raise target-scope to allow it)
/ip/route add dst-address=10.1.0.0/16 gateway=172.16.0.1 target-scope=30

# Route B: intermediate static route (scope=30 satisfies A's target-scope=30)
/ip/route add dst-address=172.16.0.0/24 gateway=192.168.1.1
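The resolution rule is a single comparison, so it is easy to check by hand. The sketch below encodes it with the default values from the table above; the `Route` class and `can_resolve` helper are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    scope: int = 30         # static-route default from the table above
    target_scope: int = 10  # static-route default from the table above


def can_resolve(resolver: Route, dependent: Route) -> bool:
    """True when resolver may resolve dependent's gateway.

    Implements the rule B.scope <= A.target-scope, with B = resolver
    and A = dependent.
    """
    return resolver.scope <= dependent.target_scope


connected = Route("connected", scope=10, target_scope=0)
static_default = Route("static with defaults")
route_a = Route("10.1.0.0/16", target_scope=30)  # raised, as in the example
route_b = Route("172.16.0.0/24")                 # defaults: scope=30

print(can_resolve(connected, static_default))  # 10 <= 10
print(can_resolve(route_b, static_default))    # 30 <= 10
print(can_resolve(route_b, route_a))           # 30 <= 30
```

The three calls correspond to the three cases discussed above: a connected route resolving a static route, a static route failing to resolve another static route with defaults, and the chained case after target-scope is raised.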

Recursive Anchor Routing for Failover

Recursive anchor routing lets you fail over many routes simultaneously using a single /32 host route as a “master switch”. When the anchor route’s gateway probe fails, every route that resolves through it goes inactive at once.

Use case: 50 customer VPN routes that must all fail over together when WAN1 goes down.

# Step 1: Anchor routes (/32 host routes with check-gateway)
# scope=10 lets dependent routes resolve through them (10 ≤ default target-scope=10)
/ip/route add dst-address=203.0.113.1/32 gateway=203.0.113.1%ether1 \
    check-gateway=ping distance=1 scope=10 target-scope=20
/ip/route add dst-address=198.51.100.1/32 gateway=198.51.100.1%ether2 \
    check-gateway=ping distance=1 scope=10 target-scope=20

# Step 2: Default routes resolve through the anchors (scope=20, target-scope default=10)
# Note: no target-scope change is needed here (anchor.scope=10 ≤ route.target-scope=10)
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 scope=20
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 scope=20

# Step 3: VPN routes also resolve through the same anchors
/ip/route add dst-address=10.10.0.0/16 gateway=203.0.113.1 distance=1 scope=20
/ip/route add dst-address=10.10.0.0/16 gateway=198.51.100.1 distance=10 scope=20

When the WAN1 anchor goes inactive (3 consecutive ping failures, ~30 seconds), all routes with gateway=203.0.113.1 that resolved through it also go inactive. WAN2 routes at distance=10 activate automatically.
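The "master switch" effect can be modelled in a few lines. This is a simplified sketch, assuming a dependent route is usable only while the anchor for its gateway is up. The `ROUTES` list and the `active_routes` helper are illustrative names, not RouterOS features.

```python
# (destination, gateway, distance) tuples copied from the steps above.
ROUTES = [
    ("0.0.0.0/0", "203.0.113.1", 1),
    ("0.0.0.0/0", "198.51.100.1", 10),
    ("10.10.0.0/16", "203.0.113.1", 1),
    ("10.10.0.0/16", "198.51.100.1", 10),
]


def active_routes(routes, anchors_up):
    """Pick, per destination, the lowest-distance route whose anchor is up."""
    best = {}
    for dst, gateway, distance in routes:
        if not anchors_up.get(gateway, False):
            continue  # anchor inactive, so the dependent route is inactive too
        if dst not in best or distance < best[dst][1]:
            best[dst] = (gateway, distance)
    return {dst: gw for dst, (gw, _) in best.items()}


healthy = active_routes(ROUTES, {"203.0.113.1": True, "198.51.100.1": True})
wan1_down = active_routes(ROUTES, {"203.0.113.1": False, "198.51.100.1": True})
```

With both anchors up, every destination in `healthy` uses 203.0.113.1. Marking only the WAN1 anchor as down moves every destination in `wan1_down` to 198.51.100.1 in one step, which is the behaviour the anchor design is meant to provide.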

Failover with check-gateway

For simple primary/backup failover without anchor routing:

# Primary route — actively probed every 10s
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping

# Backup route — activates automatically if primary fails
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10

check-gateway options:

| Value | Probe method | When to use |
| --- | --- | --- |
| `ping` | ICMP echo | PPPoE, static IP connections |
| `arp` | ARP request | When ICMP is filtered; CPE is directly attached |
| (none) | No probing | Route stays active unless interface goes down |

After 3 consecutive failures (~30 seconds) the route is marked inactive. RouterOS automatically restores it when probes succeed again.

:::warning Gateway reachability vs. internet reachability
check-gateway=ping tests only that the gateway IP responds. If your ISP’s gateway is always reachable but its upstream is broken, failover will not trigger. Use Netwatch in that case to test an external IP.
:::

Failover with Netwatch Scripts

Netwatch monitors any host and runs scripts on state changes. Use it when you need to test an external IP (not just the gateway) to detect upstream failures.
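The key property is that scripts run on state changes, not on every probe. The sketch below models that idea only. `HostWatch` and its callbacks are hypothetical names, and treating the first probe result as a transition from an unknown state is an assumption of this model.

```python
class HostWatch:
    """Toy model of an up/down monitor that fires callbacks on transitions only."""

    def __init__(self, on_up, on_down):
        self.state = None  # unknown until the first probe result arrives
        self.on_up = on_up
        self.on_down = on_down

    def record_probe(self, reachable):
        new_state = "up" if reachable else "down"
        if new_state == self.state:
            return  # no state change, so no script runs
        self.state = new_state
        (self.on_up if reachable else self.on_down)()


events = []
watch = HostWatch(
    on_up=lambda: events.append("up-script"),
    on_down=lambda: events.append("down-script"),
)
for result in [True, True, False, False, True]:
    watch.record_probe(result)
```

Five probe results produce three script runs in `events`: one when the host is first seen up, one when it goes down, and one when it recovers. Repeated identical results are ignored.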

Distance-toggling Netwatch

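# Pin the probe target to WAN1; otherwise the probe succeeds via WAN2 after
# failover and the up-script immediately switches back to the broken link
/ip/route add dst-address=8.8.8.8/32 gateway=203.0.113.1 comment="wan1-probe"

# Monitor an external IP for WAN1 reachability
/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
    up-script="/ip/route set [find dst-address=0.0.0.0/0 gateway=203.0.113.1] distance=1" \
    down-script="/ip/route set [find dst-address=0.0.0.0/0 gateway=203.0.113.1] distance=20"

# Primary route (distance=1 when WAN1 healthy; Netwatch raises to 20 on failure)
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1

# Backup route (distance=10, so it wins whenever the WAN1 distance is 20)
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10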

Disable/Enable Route Approach

To make the intent explicit (the failed route is disabled rather than merely out-ranked):

/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
    up-script={
        :log info "WAN1 upstream UP"
        /ip/route enable [find comment="wan1-default"]
    } \
    down-script={
        :log warning "WAN1 upstream DOWN - failing over"
        /ip/route disable [find comment="wan1-default"]
    }

/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 comment="wan1-default"
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 comment="wan2-default"

NAT for Dual-WAN

Both WAN interfaces need source NAT so that outbound traffic is translated to the address of whichever link it leaves through, and replies return on that same link:

/interface/list add name=WAN
/interface/list/member add interface=ether1 list=WAN
/interface/list/member add interface=ether2 list=WAN

/ip/firewall/nat add chain=srcnat out-interface-list=WAN action=masquerade

ECMP Load Balancing

ECMP (Equal-Cost Multi-Path) distributes traffic across multiple WANs simultaneously by installing multiple default routes at the same distance. RouterOS hashes flows to gateways using source/destination addresses (and optionally ports).

# Both gateways at distance=1 — RouterOS creates an ECMP group
/ip/route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
/ip/route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=1 check-gateway=ping

ECMP Hash Policy

Configure how RouterOS distributes flows across gateways:

# Options: l3 (src+dst IP), l4 (src+dst IP + ports), l3-inner (inner headers for tunnels)
/ip/settings set ipv4-multipath-hash-policy=l4

l4 hashing distributes individual connections more evenly than l3 when many flows share the same IP pair.
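The difference between the two policies can be illustrated with a stand-in hash. The sketch below uses SHA-256 purely for illustration; it is not the hash RouterOS uses. The `pick_gateway` function, the client address and the 192.0.2.50 server address are assumptions.

```python
import hashlib


def pick_gateway(flow, active_gateways, policy="l3"):
    """Map a flow to one of the active gateways with a stable hash.

    flow is (src_ip, dst_ip, src_port, dst_port). Only gateways that are
    currently active are passed in, so a failed gateway receives no flows.
    """
    src_ip, dst_ip, src_port, dst_port = flow
    key = f"{src_ip}|{dst_ip}"
    if policy == "l4":
        key += f"|{src_port}|{dst_port}"  # ports take part only under l4
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(active_gateways)
    return active_gateways[index]


GATEWAYS = ["203.0.113.1", "198.51.100.1"]
# 64 connections between the same two hosts, differing only in source port.
flows = [("192.168.1.10", "192.0.2.50", port, 443) for port in range(40000, 40064)]
l3_choices = {pick_gateway(f, GATEWAYS, "l3") for f in flows}
l4_choices = {pick_gateway(f, GATEWAYS, "l4") for f in flows}
```

Because every flow shares one IP pair, `l3_choices` always contains a single gateway, while `l4_choices` will normally contain both. Passing a one-element gateway list models a failed WAN: every flow then maps to the remaining gateway.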

ECMP Failover Behavior

With check-gateway=ping on both routes, RouterOS removes a gateway from the ECMP group when its probe fails. Remaining gateways continue forwarding. New flows are hashed only across active gateways; existing flows on the failed path must re-establish.

# Verify ECMP group — both routes should show as active
/ip/route print where dst-address=0.0.0.0/0

Expected output (both healthy):

Flags: A - ACTIVE, S - STATIC, E - ECMP
 #    DST-ADDRESS   GATEWAY        DISTANCE
 0  ASE 0.0.0.0/0  203.0.113.1    1
 1  ASE 0.0.0.0/0  198.51.100.1   1

:::note ECMP and NAT
ECMP performs per-flow (not per-connection) distribution in the routing table. Masquerade on out-interface-list=WAN handles NAT correctly for whichever interface each flow uses.
:::

PCC Mangle — Session-Aware Load Balancing

Pure ECMP selects a gateway at the routing level, independently of connection tracking, so packets belonging to one TCP session can end up on different gateways when paths are asymmetric (e.g., when the router is also handling the return path). PCC (Per Connection Classifier) avoids this by pinning each connection to a specific WAN using Mangle marks.

How PCC Works

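PCC hashes the selected connection fields (here source and destination addresses plus ports), divides the hash by a denominator, and matches on the remainder. With a denominator of 2, connections with remainder 0 go to WAN1 and connections with remainder 1 go to WAN2. Because the result is stored as a connection mark, the connection stays on the same WAN for its lifetime.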
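A minimal sketch of the classify-then-pin idea follows. The hash is a SHA-256 stand-in rather than the real PCC hash, and `classify`, `mark_for` and the `conntrack` dictionary are hypothetical names that mimic connection tracking only loosely.

```python
import hashlib

MARKS = {0: "wan1_conn", 1: "wan2_conn"}  # remainder -> connection mark
conntrack = {}  # connection tuple -> mark, standing in for connection tracking


def classify(conn, denominator=2):
    """Return the remainder of a stable hash of the connection tuple."""
    key = "|".join(str(field) for field in conn).encode()
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return value % denominator


def mark_for(conn):
    """Mark a connection on its first packet and reuse the mark afterwards."""
    if conn not in conntrack:
        conntrack[conn] = MARKS[classify(conn)]
    return conntrack[conn]


conn = ("192.168.1.10", "192.0.2.50", 40000, 443)
first_packet_mark = mark_for(conn)
later_packet_mark = mark_for(conn)
```

The second call does not rehash: it reads the stored mark, which is why a connection never changes WAN midway. Only the first packet of a connection goes through classification.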

Configuration

# 1) Create routing tables for policy routing
/routing/table
add name=to_wan1 fib
add name=to_wan2 fib

# 2) Per-table default routes (for marked traffic)
/ip/route
add dst-address=0.0.0.0/0 gateway=203.0.113.1  routing-table=to_wan1 distance=1 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-table=to_wan2 distance=1 check-gateway=ping

# 3) Main table routes (for unmarked traffic and fallback)
/ip/route
add dst-address=0.0.0.0/0 gateway=203.0.113.1  distance=1  check-gateway=ping
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 check-gateway=ping

# 4) Mangle: mark incoming connections from LAN using PCC (2-way split)
/ip/firewall/mangle
add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:2/0 \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes

add chain=prerouting in-interface=bridge connection-state=new \
    dst-address-type=!local \
    per-connection-classifier=both-addresses-and-ports:2/1 \
    action=mark-connection new-connection-mark=wan2_conn passthrough=yes

# 5) Apply routing marks for established connections
add chain=prerouting in-interface=bridge \
    connection-mark=wan1_conn \
    action=mark-routing new-routing-mark=to_wan1 passthrough=no

add chain=prerouting in-interface=bridge \
    connection-mark=wan2_conn \
    action=mark-routing new-routing-mark=to_wan2 passthrough=no

# 6) Router-generated packets on connections that already carry a WAN mark
#    (unmarked router traffic such as DNS lookups follows the main table)
add chain=output connection-mark=wan1_conn \
    action=mark-routing new-routing-mark=to_wan1 passthrough=no
add chain=output connection-mark=wan2_conn \
    action=mark-routing new-routing-mark=to_wan2 passthrough=no

Mangle Rule Order

The rule order matters. The PCC classification (mark-connection) must happen before the routing mark assignment (mark-routing):

1. The PCC classifier marks new connections as wan1_conn or wan2_conn.
2. The routing-mark rules read the connection mark and set to_wan1 or to_wan2.
3. RouterOS uses the routing mark to look up the route in the matching table.

PCC Failover Behavior

When a WAN fails and its route becomes inactive via check-gateway:

- New flows: PCC still assigns connections to both WANs, but flows assigned to the failed WAN will route via the main table (fallback to the other WAN, since the per-table route is inactive).
- Existing flows: Sessions already pinned to the failed WAN break and must re-establish.
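The fallback described for new flows can be sketched as a two-stage lookup. This is a simplified model using the tables and gateways from the configuration above. `lookup_gateway` and `build_tables` are illustrative helpers, and the model assumes no routing rules restrict the lookup to a single table.

```python
def lookup_gateway(routing_mark, tables):
    """Return the gateway for a packet, falling back to the main table.

    tables maps a table name to a list of (gateway, distance, active) routes.
    """
    for table in (routing_mark, "main"):
        candidates = [route for route in tables.get(table, []) if route[2]]
        if candidates:
            return min(candidates, key=lambda route: route[1])[0]
    return None  # no active route anywhere


def build_tables(wan1_up, wan2_up):
    """Recreate the per-table and main-table routes from the configuration."""
    return {
        "to_wan1": [("203.0.113.1", 1, wan1_up)],
        "to_wan2": [("198.51.100.1", 1, wan2_up)],
        "main": [("203.0.113.1", 1, wan1_up), ("198.51.100.1", 10, wan2_up)],
    }


normal = lookup_gateway("to_wan1", build_tables(True, True))
wan1_failed = lookup_gateway("to_wan1", build_tables(False, True))
```

Here `normal` is the WAN1 gateway, while `wan1_failed` is the WAN2 gateway: the to_wan1 table has no active route, so the lookup continues in the main table, where only the distance=10 route is still active.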

For smoother failover, add a Netwatch entry that removes the tracked connections marked for the failed WAN, so that clients re-establish their sessions promptly instead of waiting for timeouts:

/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
    down-script={
        :log warning "WAN1 down - clearing wan1 connections"
        /ip/firewall/connection remove [find connection-mark=wan1_conn]
    }

Comparison: When to Use Each Approach

| Scenario | Recommended Approach |
| --- | --- |
| Simple primary/backup, single office | Distance-based + `check-gateway=ping` |
| Primary/backup with upstream failures (ISP CPE always up) | Netwatch to external IP |
| 50+ routes that must fail over atomically | Recursive anchor routing |
| Maximize aggregate WAN bandwidth, stateless traffic | ECMP |
| Session-persistent load balancing, stateful traffic | PCC Mangle |
| Load balancing with failover | PCC Mangle + main-table fallback routes |

Troubleshooting

Route is inactive — gateway not resolved

# Check if gateway address has a route
/ip/route print where dst-address=203.0.113.1/32

# Inspect next-hop cache
/routing/nexthop/print

# Flush cache to force re-resolution
/routing/nexthop/flush

Failover not triggering

Verify check-gateway is set on the primary route:

/ip/route print detail where dst-address=0.0.0.0/0

Confirm ping to gateway works from the router:

```
/ping 203.0.113.1 count=5
```

Check that the backup route has a higher distance and is active:

```
/ip/route print where dst-address=0.0.0.0/0
```

Recursive route not resolving (scope mismatch)

If a route shows as inactive despite the gateway being reachable:

# Inspect scope/target-scope values
/ip/route print detail where gateway=203.0.113.1

Fix — raise target-scope on the route with the unresolvable gateway so that the resolving route’s scope satisfies B.scope ≤ A.target-scope:

/ip/route set [find dst-address=10.10.0.0/16] target-scope=30

PCC traffic all going to one WAN

Check that both PCC rules are active and the denominator/remainder values are correct:

/ip/firewall/mangle print where action=mark-connection

Verify connection marks are being assigned:

/ip/firewall/connection print where connection-mark=wan1_conn count-only
/ip/firewall/connection print where connection-mark=wan2_conn count-only

Flapping during brief outages

Add a confirmation delay in the Netwatch down-script so that the router re-tests the host before reacting to a transient failure:

/tool/netwatch add host=8.8.8.8 interval=10s timeout=3s \
    down-script={
        :delay 10
        :if ([/ping 8.8.8.8 count=3] = 0) do={
            :log error "Confirmed WAN1 failure"
            /ip/route set [find gateway=203.0.113.1] distance=20
        }
    }
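The confirm-before-acting logic in the script above can be expressed independently of RouterOS. In this sketch, `confirm_failure` is a hypothetical helper. The `ping` and `sleep` callables are injected so the logic can be exercised without a network, and the 10-second wait and three pings mirror the script.

```python
import time


def confirm_failure(ping, host, wait_seconds=10, count=3, sleep=time.sleep):
    """Re-test host after a delay; report failure only if every ping fails.

    ping(host) must return True when a reply is received.
    """
    sleep(wait_seconds)  # give a transient outage time to clear
    replies = sum(1 for _ in range(count) if ping(host))
    return replies == 0


# Simulated outcomes; no real packets are sent.
still_down = confirm_failure(lambda host: False, "8.8.8.8", sleep=lambda s: None)
recovered = confirm_failure(lambda host: True, "8.8.8.8", sleep=lambda s: None)
```

Only `still_down` is true, so only that case would go on to change the route distance. A single reply during the re-test is enough to cancel the failover, which is the trade-off to tune: a longer wait or more pings reduces flapping but delays a genuine failover.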

Related Topics

- Failover (WAN Backup): Basic failover and check-gateway reference
- Mangle: Packet marking, PCC, connection marks
- Bonding: Link-level redundancy and aggregation
- VRRP: Router-level redundancy for LAN gateway failover
