Failover (WAN Backup)

Failover (WAN Backup) provides automatic network redundancy by switching to a backup internet connection when the primary connection fails. This is essential for businesses and deployments where network uptime is critical. RouterOS offers multiple approaches to failover, from simple distance-based routing to advanced recursive gateway detection, each with different trade-offs in reliability, speed, and complexity.

This guide covers the fundamental concepts of WAN failover and practical configurations ranging from basic to advanced. It also addresses common issues, reported in community discussions, that cause failover to fail when you need it most.

Summary

WAN failover addresses the single point of failure inherent in networks with a single internet connection. When the primary WAN link becomes unavailable, traffic automatically routes through the backup connection, maintaining network connectivity without manual intervention.

RouterOS failover typically uses one of three approaches:

Distance-based failover - Primary route has lower distance, backup has higher. When primary becomes unreachable, the backup route activates.
Check-gateway - Router actively monitors gateway reachability using ARP or ICMP ping. The route becomes inactive when the gateway fails the check.
Recursive routing - Resolves the default route through a host route to a remote IP, so the health check tests a host beyond the ISP gateway rather than the gateway itself. This gives more reliable detection than direct gateway checks.

The appropriate method depends on network topology, ISP behavior, and required failover speed. This guide covers all three approaches with their advantages and limitations.

Introduction

Understanding how RouterOS evaluates routes is essential for configuring reliable failover. The router’s decision process follows a specific order that determines which route gets used under various conditions.

How RouterOS Selects Routes

When forwarding traffic, RouterOS selects routes based on the following priority:

Most specific destination match - The route with the longest prefix matching the destination wins.
Lowest distance - Among routes to the same destination, the one with lowest distance is preferred.
Tie-breaking - If distances are equal, which route becomes active is not guaranteed, so always give primary and backup routes distinct distances.

This means failover works by manipulating route distances. When the primary route becomes invalid (gateway unreachable), RouterOS automatically uses the backup route with higher distance. However, this only works reliably when the router correctly detects that the primary gateway is unreachable.
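
As a minimal illustration of the first two rules, consider the sketch below. The 198.51.100.0/24 prefix is an assumed example network and is not part of this guide's topology.

# A more specific route wins for destinations inside its prefix
/ip route add dst-address=198.51.100.0/24 gateway=192.168.100.1 distance=1

# The default route handles everything else
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1

# Traffic to 198.51.100.10 matches the /24 and leaves via 192.168.100.1,
# even though both routes have the same distance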

The Gateway Detection Problem

The most common cause of failover failure is incorrect gateway detection. Many users configure what appears to be correct failover setup, only to find it doesn’t work when the primary link actually fails. Understanding why this happens is critical.

When using DHCP-based WAN connections (common with fiber and cable modems), the ISP’s DHCP server provides a default route. This route often points to a gateway that is actually the next-hop router on the ISP’s network, not the final internet gateway. If that intermediate router remains reachable even when the ISP’s upstream connection fails, the MikroTik sees the gateway as “reachable” and never activates the backup route.

This is why simply having two default routes with different distances doesn’t always work. The router must actively verify that traffic can actually reach the internet through the primary gateway, not just that the directly-connected gateway IP responds to ARP requests.
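
One rough way to see this difference on a live router is to compare a ping to the directly connected gateway with a ping to a host beyond the ISP. This is only a sketch: it assumes the primary default route is currently active and uses 1.1.1.1 as an example internet host.

# The directly connected gateway may still answer during an upstream outage
/ping 203.0.113.1 count=3

# A host beyond the ISP shows whether traffic actually gets through
/ping 1.1.1.1 count=3

If the first ping succeeds while the second times out, the gateway is reachable but the path behind it is not, which is the case plain distance-based failover cannot detect.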

Setup Overview

The basic failover configuration involves three key components: primary and backup WAN interfaces, default routes with distinct distances, and gateway health monitoring suited to your network topology.

Network Topology Example

This guide uses a common dual-WAN topology:

Internet <-- WAN1 (PPPoE/Ethernet) <-- Router <-- LAN
         <-- WAN2 (DHCP/LTE) <--------|

WAN1: Primary connection (ether1), typically PPPoE or static IP
WAN2: Backup connection (ether2), typically DHCP or LTE
LAN: Internal network (bridge or ether3+)

Configuration

Basic Failover (Distance-Based)

The simplest failover method uses route distances. The primary route has distance=1, and the backup route has distance=2. When the primary gateway becomes unreachable, RouterOS automatically uses the backup route.

# Assign the primary WAN (WAN1) address; skip if already configured
/ip address add address=203.0.113.2/30 interface=ether1

# Assign the backup WAN (WAN2) address; skip if already configured
/ip address add address=192.168.100.2/30 interface=ether2

# Add primary default route (lower distance = preferred)
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1

# Add backup default route (higher distance = used when primary fails)
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2

Verify the routes:

/ip route print

Expected output:

Flags: X - disabled, I - inactive, D - dynamic, C - connect, S - static, r - rip, o - ospf, b - bgp, U - unreachable
 #      DST-ADDRESS        GATEWAY         DISTANCE
 0 S  0.0.0.0/0          203.0.113.1              1
 1 S  0.0.0.0/0          192.168.100.1            2

This configuration works when the primary gateway becomes completely unreachable (link down or gateway not responding to ARP). However, as discussed in the introduction, it may not detect failures where the gateway IP responds but internet connectivity is lost.
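
A simple way to exercise this configuration is to take the primary interface down and watch the routes. This is a sketch of a manual test, assuming you are not managing the router through ether1 (otherwise you will lose access while it is disabled).

# Simulate a link failure on the primary WAN
/interface disable ether1

# The backup default route should now be the active one
/ip route print where dst-address=0.0.0.0/0

# Restore the primary WAN and confirm traffic switches back
/interface enable ether1
/ip route print where dst-address=0.0.0.0/0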

Check-Gateway

The check-gateway parameter adds active gateway monitoring. When enabled, RouterOS periodically tests gateway reachability and disables the route if the gateway fails the check. This provides more reliable failover than distance alone.

# Primary route with ping-based gateway check
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping

# Backup route
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2

The check-gateway options are:

Option	How It Works	Pros	Cons
`ping`	ICMP echo request to gateway	Tests IP-level reachability	Can be blocked by firewall; may give false positives
`arp`	ARP request to gateway	Faster, lower overhead	May show gateway reachable even when internet is down

# View route status with gateway checks
/ip route print detail

When check-gateway detects failure, the route is marked as inactive:

Flags: X - disabled, I - inactive, D - dynamic, C - connect, S - static, r - rip, o - ospf, b - bgp, U - unreachable
 0 S  dst-address=0.0.0.0/0 gateway=203.0.113.1 gateway-status=203.0.113.1 unreachable check-gateway=ping distance=1 scope=30 target-scope=10

Improving Detection Reliability

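RouterOS sends a check-gateway probe every 10 seconds and treats the gateway as unreachable after two consecutive timeouts, so detection typically takes up to about 20 seconds. This interval is not adjustable on the route itself. Where faster detection is needed, use Netwatch with a shorter interval (see Script-Based Failover with Netwatch). ARP mode lowers probe overhead compared to ICMP ping but does not shorten detection time:

# ARP-based gateway check: lower overhead than ping, same 10-second interval
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 \
    check-gateway=arp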

For more reliable detection, use recursive routing as described in the next section.

Recursive Routing

Recursive routing solves the gateway detection problem by monitoring an IP address that is actually reachable through the internet, not just the directly-connected gateway. This prevents false positives where the gateway responds but internet connectivity is down.

# First, add routes to monitoring IPs (these should be stable public IPs)
# Cloudflare DNS
/ip route add dst-address=1.1.1.1/32 gateway=203.0.113.1 distance=1 scope=10
# Google DNS (backup)
/ip route add dst-address=8.8.8.8/32 gateway=192.168.100.1 distance=2 scope=10

# Add recursive default routes
/ip route add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=1 \
    scope=30 target-scope=20 check-gateway=ping

/ip route add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=2 \
    scope=30 target-scope=20 check-gateway=ping

The key parameters:

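scope (30 on the default routes, 10 on the host routes): A value used during next-hop resolution, not a hop count. A route can resolve its gateway only through routes whose scope is less than or equal to its own target-scope.
target-scope (20): The highest scope another route may have and still be used to resolve this route's gateway. The host routes use scope=10, which is within 20, so 1.1.1.1 and 8.8.8.8 resolve through them.
gateway: An IP that is pingable and represents actual internet connectivity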

This works by:

The default route points to a public IP (e.g., 1.1.1.1) instead of the local gateway
RouterOS recursively resolves this to an actual gateway by looking for a route to 1.1.1.1
The default route has check-gateway=ping enabled, so RouterOS pings 1.1.1.1 through the host route, which pins the probe to WAN1
When 1.1.1.1 becomes unreachable, the recursive resolution fails and the default route becomes inactive
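
To confirm the recursive routes resolved as intended, the following commands can be used as a quick check. This is a sketch; the exact wording of the gateway status differs between RouterOS versions, so look for the resolved next hop (203.0.113.1 or 192.168.100.1) rather than a specific string.

# Show how each default route resolved its next hop
/ip route print detail where dst-address=0.0.0.0/0

# Confirm each monitoring host answers through its pinned WAN
/ping 1.1.1.1 count=3
/ping 8.8.8.8 count=3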

Script-Based Failover with Netwatch

For advanced control, Netwatch monitors hosts and runs scripts on state changes. This provides the most flexibility for complex failover scenarios:

# Add Netwatch entry to monitor the primary gateway and toggle the primary route
/tool netwatch add host=203.0.113.1 interval=10s timeout=3s \
    up-script=":log info \"Primary gateway UP\"; \
        /ip route set [find gateway=203.0.113.1] disabled=no" \
    down-script=":log error \"Primary gateway DOWN - failing over\"; \
        /ip route set [find gateway=203.0.113.1] disabled=yes; \
        /ip route set [find gateway=192.168.100.1] disabled=no"

# Alternative: brace-delimited script that handles only the failure direction
/tool netwatch add host=203.0.113.1 interval=10s timeout=3s \
    down-script={
        :log error "Primary gateway DOWN"
        /ip route set [find gateway=203.0.113.1] disabled=yes
    }

When the primary comes back up:

# Add second Netwatch to detect recovery and switch back
/tool netwatch add host=203.0.113.1 interval=30s timeout=5s \
    up-script={
        :log info "Primary gateway restored"
        /ip route set [find gateway=203.0.113.1] disabled=no
    }
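
The entries above probe the directly connected gateway, which has the same blind spot described in The Gateway Detection Problem. A possible variation, sketched below, probes an internet host that is pinned to WAN1 with a host route and handles both directions in a single entry. The probe target 1.0.0.1 and the comment label wan1-default are assumptions chosen for illustration; matching on the comment avoids also disabling the host route, which shares the same gateway.

# Tag the primary default route so scripts can find it unambiguously
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 comment=wan1-default

# Pin the probe target to WAN1 so the check always tests the primary path
/ip route add dst-address=1.0.0.1/32 gateway=203.0.113.1 distance=1

# One Netwatch entry that disables and re-enables the tagged route
/tool netwatch add host=1.0.0.1 interval=10s timeout=3s \
    down-script=":log error \"WAN1 path DOWN\"; \
        /ip route set [find comment=\"wan1-default\"] disabled=yes" \
    up-script=":log info \"WAN1 path UP\"; \
        /ip route set [find comment=\"wan1-default\"] disabled=no"

Because the probe host is always routed through WAN1, it stays unreachable during an outage and only comes back when the primary path actually recovers.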

Common Issues and Troubleshooting

Issue: DHCP Default Routes Conflict with Failover

When using DHCP on WAN interfaces, the ISP-provided default route can interfere with failover configuration. By default the DHCP client creates a dynamic default route with distance=1, which competes with a manual primary route at the same distance and takes precedence over any route with a higher distance.

# Check for DHCP client routes
/ip route print where dynamic=yes

# Remove or lower priority of DHCP default route
# Option 1: Set DHCP client to not add default route
/ip dhcp-client add interface=ether2 add-default-route=no

# Option 2: Increase distance of the DHCP-installed default route
# (dynamic routes cannot be edited directly, so change it on the client)
/ip dhcp-client set [find interface=ether2] default-route-distance=10

Issue: Check-Gateway False Positives

Some ISPs (for example, those using CGNAT) block or do not answer ICMP on the gateway, so check-gateway=ping reports the gateway as down even though connectivity exists:

# If ping fails but connectivity works, use ARP mode instead
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=arp

# Or use recursive routing with DNS servers that respond
/ip route add dst-address=1.1.1.1/32 gateway=203.0.113.1 distance=1 scope=10
/ip route add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=1 scope=30 target-scope=20 check-gateway=ping

Issue: Flapping During Brief Outages

Brief outages can cause rapid failover cycling. Add hysteresis to prevent this:

# Use script-based Netwatch with confirmation
/tool netwatch add host=1.1.1.1 interval=10s timeout=3s \
    down-script={
        :delay 5
        :if ([/ping 1.1.1.1 count=1] = 0) do={
            :log error "Confirmed failure - failing over"
            /ip route set [find gateway=203.0.113.1] disabled=yes
        }
    }

Issue: PCC Load Balancing Prevents Failover

If you’re using PCC (Per Connection Classifier) for load balancing, failover requires additional configuration. PCC marks connections and routes them based on those marks, which bypasses the failover routes:

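# Give each marked routing table a primary and a fallback route, so marked
# traffic can leave through the other WAN when its own link fails
# (routing-mark is RouterOS v6 syntax; v7 uses routing-table instead)
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 routing-mark=wan1 check-gateway=ping
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2 routing-mark=wan1
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=1 routing-mark=wan2 check-gateway=ping
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=2 routing-mark=wan2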

# Accept already-marked connections to skip further mangle processing
/ip firewall mangle add chain=prerouting action=accept \
    connection-mark=wan1_conn passthrough=no
/ip firewall mangle add chain=prerouting action=accept \
    connection-mark=wan2_conn passthrough=no

# Mark connections for each WAN
/ip firewall mangle add chain=prerouting in-interface=ether1 action=mark-connection \
    new-connection-mark=wan1_conn passthrough=yes
/ip firewall mangle add chain=prerouting in-interface=ether2 action=mark-connection \
    new-connection-mark=wan2_conn passthrough=yes

Issue: Dual WAN with Mixed Connection Types

Combining DHCP and static/PPPoE connections requires special handling:

# For DHCP WAN - ensure it doesn't add default route
/ip dhcp-client add interface=ether2 add-default-route=no

# Add explicit routes for each WAN
/ip route add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=1 \
    routing-mark=isp_dhcp check-gateway=ping
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=2 \
    routing-mark=isp_pppoe check-gateway=ping

Quick Reference Commands

# View all default routes and their status
/ip route print where dst-address=0.0.0.0/0

# View route gateway status
/ip route print detail

# Test gateway connectivity
/ping 203.0.113.1

# Monitor Netwatch status
/tool netwatch print

# View interface link status
/interface print

# Check active routes
/ip route print

Related Topics

VRRP - For router-level redundancy
Bonding - For link-level redundancy
Netwatch - For advanced host monitoring
Routing Settings - For ECMP and routing configuration
