Failover (WAN Backup)
Failover (WAN Backup) provides automatic network redundancy by switching to a backup internet connection when the primary connection fails. This is essential for businesses and deployments where network uptime is critical. RouterOS offers multiple approaches to failover, from simple distance-based routing to advanced recursive gateway detection, each with different trade-offs in reliability, speed, and complexity.
This guide covers the fundamental concepts of WAN failover, practical configurations ranging from basic to advanced, and addresses common issues discovered through community discussions that cause failover to fail when you need it most.
Summary
WAN failover addresses the single point of failure inherent in networks with a single internet connection. When the primary WAN link becomes unavailable, traffic automatically routes through the backup connection, maintaining network connectivity without manual intervention.
RouterOS failover typically uses one of three approaches:
- Distance-based failover - Primary route has lower distance, backup has higher. When primary becomes unreachable, the backup route activates.
- Check-gateway - Router actively monitors gateway reachability using ARP or ICMP ping. A route becomes inactive when its gateway fails the check.
- Recursive routing - Resolves the default route through a monitored remote host, so the check verifies reachability beyond the directly connected gateway and is more reliable than a direct gateway check.
The appropriate method depends on network topology, ISP behavior, and required failover speed. This guide covers all three approaches with their advantages and limitations.
Introduction
Understanding how RouterOS evaluates routes is essential for configuring reliable failover. The router’s decision process follows a specific order that determines which route gets used under various conditions.
How RouterOS Selects Routes
When forwarding traffic, RouterOS selects routes based on the following priority:
- Most specific destination match - The route with the longest prefix matching the destination wins.
- Lowest distance - Among routes to the same destination, the one with lowest distance is preferred.
- First route in routing table - If distances are equal, the route appearing first in the table wins.
This means failover works by manipulating route distances. When the primary route becomes invalid (gateway unreachable), RouterOS automatically uses the backup route with higher distance. However, this only works reliably when the router correctly detects that the primary gateway is unreachable.
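To make the ordering above concrete, here is a minimal sketch. The destination network 198.51.100.0/24 is a hypothetical documentation-range prefix, and the gateway addresses are the example addresses used in the configuration sections of this guide. It illustrates the selection rules only and is not a recommended configuration.

```routeros
# Default route via WAN1
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1

# More specific route via WAN2 (hypothetical destination network).
# Longest-prefix match is evaluated before distance, so traffic to
# 198.51.100.0/24 uses this route even though its distance is higher.
/ip route add dst-address=198.51.100.0/24 gateway=192.168.100.1 distance=10

# Second default route: same prefix as the first, so distance decides.
# It stays inactive while the distance=1 route is usable.
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2

# Inspect which routes are active
/ip route print
```

Only the two default routes take part in failover. The /24 route is included to show that distance is compared only between routes for the same destination prefix.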
The Gateway Detection Problem
The most common cause of failover failure is incorrect gateway detection. Many users configure what appears to be a correct failover setup, only to find that it doesn’t work when the primary link actually fails. Understanding why this happens is critical.
When using DHCP-based WAN connections (common with fiber and cable modems), the ISP’s DHCP server provides a default route. This route often points to a gateway that is actually the next-hop router on the ISP’s network, not the final internet gateway. If that intermediate router remains reachable even when the ISP’s upstream connection fails, the MikroTik sees the gateway as “reachable” and never activates the backup route.
This is why simply having two default routes with different distances doesn’t always work. The router must actively verify that traffic can actually reach the internet through the primary gateway, not just that the directly-connected gateway IP responds to ARP requests.
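A quick manual check can show whether a link is in this state. The sketch below assumes the example addressing used later in this guide (WAN1 address 203.0.113.2, gateway 203.0.113.1) and uses 1.1.1.1 as an arbitrary public test host; substitute your own values.

```routeros
# Step 1: does the directly connected gateway answer?
/ping 203.0.113.1 count=3

# Step 2: does a host beyond the ISP answer?
# Run this while the WAN1 default route is the active one, so the
# probe actually leaves through WAN1. src-address is set to the
# assumed WAN1 address so replies return on the same link.
/ping 1.1.1.1 src-address=203.0.113.2 count=3
```

If step 1 succeeds while step 2 fails, the gateway is reachable but the path behind it is not, which is exactly the condition that a plain distance-based setup cannot detect.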
Setup Overview
The basic failover configuration involves three key components: primary and backup WAN interfaces, default routes with suitable distances, and gateway health monitoring that matches your network topology.
Network Topology Example
This guide uses a common dual-WAN topology:

Internet <-- WAN1 (PPPoE/Ethernet) <-- Router <-- LAN
         <-- WAN2 (DHCP/LTE) <--------|

- WAN1: Primary connection (ether1), typically PPPoE or static IP
- WAN2: Backup connection (ether2), typically DHCP or LTE
- LAN: Internal network (bridge or ether3+)
Configuration
Basic Failover (Distance-Based)
The simplest failover method uses route distances. The primary route has distance=1, and the backup route has distance=2. When the primary gateway becomes unreachable, RouterOS automatically uses the backup route.
# Configure primary WAN (WAN1) - skip if the address is already configured
/ip address add address=203.0.113.2/30 interface=ether1

# Configure backup WAN (WAN2) - skip if the address is already configured
/ip address add address=192.168.100.2/30 interface=ether2

# Add primary default route (lower distance = preferred)
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1

# Add backup default route (higher distance = used when primary fails)
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2

Verify the routes:

/ip route print

Expected output:

Flags: X - disabled, I - inactive, D - dynamic, C - connect, S - static, r - rip, o - ospf, b - bgp, U - unreachable
 #      DST-ADDRESS        GATEWAY            DISTANCE
 0  S   0.0.0.0/0          203.0.113.1        1
 1  S   0.0.0.0/0          192.168.100.1      2

This configuration works only when the router itself sees the primary path disappear, typically because the WAN1 interface link goes down. Without check-gateway, the route stays active as long as the interface is up, so a gateway that stops responding is not detected. As discussed in the introduction, it also cannot detect failures where the gateway IP responds but internet connectivity is lost.
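One way to confirm that the backup route takes over is to simulate a link failure. The sketch below assumes ether1 is WAN1, as in this guide's topology. Do not run it over a management session that depends on ether1, because that session will drop.

```routeros
# Simulate a primary link failure
/interface disable ether1

# The distance=2 route via 192.168.100.1 should now be the active default route
/ip route print where dst-address=0.0.0.0/0

# Restore the primary link; the distance=1 route should become active again
/interface enable ether1
/ip route print where dst-address=0.0.0.0/0
```

This only exercises the link-down case. It does not show how the router reacts when the link stays up but the upstream path fails.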
Check-Gateway
The check-gateway parameter adds active gateway monitoring. When enabled, RouterOS periodically tests gateway reachability and disables the route if the gateway fails the check. This provides more reliable failover than distance alone.
# Primary route with ping-based gateway check
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping

# Backup route
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2

The check-gateway options are:
| Option | How It Works | Pros | Cons |
|---|---|---|---|
| ping | ICMP echo request to gateway | Tests IP-level reachability | Can be blocked by firewall; may give false positives |
| arp | ARP request to gateway | Faster, lower overhead | May show gateway reachable even when internet is down |
# View route status with gateway checks
/ip route print detail

When check-gateway detects failure, the route is marked as inactive:

Flags: X - disabled, I - inactive, D - dynamic, C - connect, S - static, r - rip, o - ospf, b - bgp, U - unreachable
 0  S  dst-address=0.0.0.0/0 gateway=203.0.113.1 gateway-status=203.0.113.1 unreachable check-gateway=ping distance=1 scope=30 target-scope=10

Improving Detection Reliability
check-gateway probes the gateway every 10 seconds and treats it as unreachable after two consecutive timeouts, so detection typically takes 20 seconds or more. This timing is fixed and cannot be tuned on the route itself. ARP mode has lower overhead than ICMP ping, but it only confirms that the directly connected gateway is present:

# ARP-based gateway check (lower overhead than ICMP ping)
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=arp

If faster detection is required, a Netwatch probe with a shorter interval (covered later in this guide) is an option. For more reliable detection, use recursive routing as described in the next section.
Recursive Routing
Recursive routing solves the gateway detection problem by monitoring an IP address that is actually reachable through the internet, not just the directly-connected gateway. This prevents false positives where the gateway responds but internet connectivity is down.
# First, add routes to monitoring IPs (these should be stable public IPs)
# Cloudflare DNS
/ip route add dst-address=1.1.1.1/32 gateway=203.0.113.1 distance=1 scope=10
# Google DNS (backup)
/ip route add dst-address=8.8.8.8/32 gateway=192.168.100.1 distance=2 scope=10

# Add recursive default routes
/ip route add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=1 \
    scope=30 target-scope=20 check-gateway=ping

/ip route add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=2 \
    scope=30 target-scope=20 check-gateway=ping

The key parameters:
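- scope (10 on the host routes, 30 on the default routes): A value RouterOS uses during next-hop resolution to decide which routes may be used to resolve another route's gateway. It is not a hop count.
- target-scope (20): A route can resolve its gateway only through routes whose scope is less than or equal to its target-scope. The host routes use scope=10, so default routes with target-scope=20 can resolve through them.
- gateway: An IP that is pingable and represents actual internet connectivity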
This works by:
- The default route points to a public IP (e.g., 1.1.1.1) instead of the local gateway
- RouterOS recursively resolves this to an actual gateway by looking for a route to 1.1.1.1
- The default route has check-gateway=ping enabled, so RouterOS pings 1.1.1.1 through the gateway found in the previous step
- When 1.1.1.1 stops responding, the default route becomes inactive and the backup default route takes over
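The resolution steps above can be checked on a running router. This is a minimal sketch that assumes the host and default routes from this section are in place; the exact wording of the gateway status field differs between RouterOS versions, so no sample output is shown.

```routeros
# Show how each default route's gateway was resolved and whether it is reachable
/ip route print detail where dst-address=0.0.0.0/0

# Show the host route that the primary default route resolves through
/ip route print detail where dst-address=1.1.1.1/32

# Probe the monitoring host by hand; the /32 host route sends this via WAN1
/ping 1.1.1.1 count=3
```

One design consequence to keep in mind: the /32 host routes pin all traffic for 1.1.1.1 to WAN1 and all traffic for 8.8.8.8 to WAN2, including client traffic, so choose monitoring addresses accordingly.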
Script-Based Failover with Netwatch
For advanced control, Netwatch monitors hosts and runs scripts on state changes. This provides the most flexibility for complex failover scenarios:
# Add Netwatch entry to monitor primary gateway
/tool netwatch add host=203.0.113.1 interval=10s timeout=3s \
    up-script=":log info \"Primary gateway UP\"" \
    down-script=":log error \"Primary gateway DOWN - failing over\"; \
        /ip route set [find gateway=203.0.113.1] disabled=yes; \
        /ip route set [find gateway=192.168.100.1] disabled=no"

# Alternative: brace-delimited script (handles the down direction only)
/tool netwatch add host=203.0.113.1 interval=10s timeout=3s \
    down-script={
        :log error "Primary gateway DOWN"
        /ip route set [find gateway=203.0.113.1] disabled=yes
    }

When the primary comes back up:

# Add second Netwatch to detect recovery and switch back
/tool netwatch add host=203.0.113.1 interval=30s timeout=5s \
    up-script={
        :log info "Primary gateway restored"
        /ip route set [find gateway=203.0.113.1] disabled=no
    }

Common Issues and Troubleshooting
Issue: DHCP Default Routes Conflict with Failover
When using DHCP on WAN interfaces, the ISP-provided default route can interfere with failover configuration. The DHCP client automatically creates a route with distance=1, which takes precedence over your manual routes.
# Check for DHCP client routes
/ip route print where dynamic=yes

# Dynamic routes cannot be edited directly; change the DHCP client instead.
# Option 1: Set DHCP client to not add a default route
/ip dhcp-client set [find interface=ether2] add-default-route=no

# Option 2: Keep the DHCP default route but increase its distance
/ip dhcp-client set [find interface=ether2] default-route-distance=10

Issue: Check-Gateway False Positives
Some ISPs, for example those using CGNAT, block ICMP on the gateway, causing check-gateway=ping to mark the route as failed even when connectivity exists:

# If ping fails but connectivity works, use ARP mode instead
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=arp

# Or use recursive routing with DNS servers that respond
/ip route add dst-address=1.1.1.1/32 gateway=203.0.113.1 distance=1 scope=10
/ip route add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=1 scope=30 target-scope=20 check-gateway=ping

Issue: Flapping During Brief Outages
Brief outages can cause rapid failover cycling. Add hysteresis to prevent this:

# Use script-based Netwatch with confirmation
/tool netwatch add host=1.1.1.1 interval=10s timeout=3s \
    down-script={
        :delay 5
        :if ([/ping 1.1.1.1 count=1] = 0) do={
            :log error "Confirmed failure - failing over"
            /ip route set [find gateway=203.0.113.1] disabled=yes
        }
    }

Issue: PCC Load Balancing Prevents Failover
If you’re using PCC (Per Connection Classifier) for load balancing, failover requires additional configuration. PCC marks connections, and marked traffic is routed according to its mark rather than through the main routing table, which bypasses the failover routes:

# Ensure failover routes are checked before PCC rules
# Add routing-mark for failover traffic
# (RouterOS v7 uses routing-table= in place of routing-mark= on routes)
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 routing-mark=wan1
/ip route add dst-address=0.0.0.0/0 gateway=192.168.100.1 distance=2 routing-mark=wan2

# Accept already-marked connections to skip further mangle processing
/ip firewall mangle add chain=prerouting action=accept \
    connection-mark=wan1_conn passthrough=no
/ip firewall mangle add chain=prerouting action=accept \
    connection-mark=wan2_conn passthrough=no

# Mark connections for each WAN
/ip firewall mangle add chain=prerouting in-interface=ether1 action=mark-connection \
    new-connection-mark=wan1_conn passthrough=yes
/ip firewall mangle add chain=prerouting in-interface=ether2 action=mark-connection \
    new-connection-mark=wan2_conn passthrough=yes

Issue: Dual WAN with Mixed Connection Types
Combining DHCP and static/PPPoE connections requires special handling:

# For DHCP WAN - ensure it doesn't add a default route
/ip dhcp-client set [find interface=ether2] add-default-route=no

# Add explicit routes for each WAN
/ip route add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=1 \
    routing-mark=isp_dhcp check-gateway=ping
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=2 \
    routing-mark=isp_pppoe check-gateway=ping

Quick Reference Commands
# View all default routes and their status
/ip route print where dst-address=0.0.0.0/0

# View route gateway status
/ip route print detail

# Test gateway connectivity
/ping 203.0.113.1

# Monitor Netwatch status
/tool netwatch print

# View interface link status
/interface print

# Check active routes
/ip route print

Related Topics
- VRRP - For router-level redundancy
- Bonding - For link-level redundancy
- Netwatch - For advanced host monitoring
- Routing Settings - For ECMP and routing configuration