Health Monitoring Scripts

Health monitoring scripts automate failover and recovery decisions in RouterOS by detecting link and service failures and triggering configuration changes. Combining Netwatch event hooks with scripted VRRP priority changes and route state control provides end-to-end automation without manual intervention.

RouterOS provides two complementary approaches to health monitoring: Netwatch, a built-in tool that tracks host reachability and executes scripts on state transitions, and scheduled scripts using /tool ping for custom multi-target or hysteresis logic. These triggers drive downstream changes such as adjusting VRRP election priority, toggling route entries, or sending notifications.

Health monitoring is especially important in WAN failover deployments where the directly connected ISP gateway remains reachable while the upstream Internet path is broken. In those scenarios, check-gateway=ping alone is insufficient, and explicit probes to internet-reachable hosts are required.

Netwatch monitors host reachability and fires up-script and down-script commands on state transitions. It supports ICMP, TCP, HTTP, and DNS probe types with configurable thresholds for packet loss, round-trip time, and jitter.

# Monitor an internet host via ICMP
/tool/netwatch
add name=wan1-probe host=8.8.8.8 type=icmp interval=10s timeout=1s \
up-script="/system script run wan1-up" \
down-script="/system script run wan1-down"

The interval controls how often the probe runs. The timeout controls how long to wait for a reply before marking the probe as failed.

Commonly used Netwatch probe types:

| Type | Description | Additional parameters |
| --- | --- | --- |
| icmp | ICMP echo (ping) | thr-loss-percent, thr-rtt-jitter-ms, thr-rtt-stdev-ms, thr-rtt-avg-ms, thr-rtt-max-ms |
| tcp-conn | TCP connection to a port | port |
| http-get (https-get for TLS) | HTTP GET and status check | http-code-min, http-code-max, http-code-redirect |
| dns | DNS resolution check | No extra parameters |
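As a rough illustration of the non-ICMP types, the sketch below adds a TCP probe and an HTTP probe. The host 192.0.2.10, the probe names, and the script names svc-up and svc-down are placeholders, not values from this guide. Type and parameter names can differ between RouterOS releases, so confirm them with tab completion on your version.

```routeros
# Hypothetical service probes; 192.0.2.10 and the svc-* scripts are placeholders
/tool/netwatch
# TCP probe: succeeds when a connection to port 443 can be opened
add name=svc-tcp host=192.0.2.10 type=tcp-conn port=443 interval=10s timeout=2s \
    up-script="/system script run svc-up" \
    down-script="/system script run svc-down"
# HTTP probe: succeeds when the response code falls inside the accepted range
add name=svc-http host=192.0.2.10 type=http-get port=80 \
    http-code-min=200 http-code-max=299 interval=30s timeout=3s \
    up-script="/system script run svc-up" \
    down-script="/system script run svc-down"
```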

For WAN quality monitoring, Netwatch can trigger on degraded links before they completely fail:

/tool/netwatch
add name=wan1-quality host=8.8.8.8 type=icmp interval=10s timeout=2s \
thr-loss-percent=20 \
thr-rtt-avg-ms=200 \
up-script="/system script run wan1-up" \
down-script="/system script run wan1-down" \
test-script="/system script run wan1-test"

The test-script runs on every probe regardless of state change, allowing logging of quality metrics over time.
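A minimal sketch of what wan1-test could contain, assuming the probe entry is named wan1-quality as above and that its status property is readable from a script. It only logs the current state. Per-probe statistics such as average RTT are exposed under version-dependent property names, so they are left out here.

```routeros
/system script
add name=wan1-test source={
    # Read the current probe state (up, down, or unknown)
    :local st [/tool netwatch get [find name=wan1-quality] status]
    :log info ("wan1-quality status: " . $st)
}
```

Because this runs on every probe, keep the script short and avoid configuration changes in it.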

Netwatch scripts run as the *sys user. This has two important implications:

  • Scripts cannot read global variables set by other users or sessions.
  • Scripts are subject to the policy restrictions applied to the *sys user.

Write self-contained scripts that do not rely on global variables set elsewhere. To share state between scripts, store it somewhere every script can read, such as an item comment or a route property, or read the current configuration directly inside the script.

# Bad: global variable set elsewhere is not visible to *sys
:global wanState "up"
# Better: read current state directly inside the Netwatch script
/system script
add name=wan1-down source={
:local currentPrio [/interface vrrp get [find name=vrrp-lan] priority]
:if ($currentPrio != 90) do={
/interface vrrp set [find name=vrrp-lan] priority=90
:log warning "WAN1 down: reduced VRRP priority to 90"
}
}
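One way to share state without globals is to write it into an item comment that any script can read back. The sketch below is an assumption about how this could be done, not part of the configuration above. It uses the comment on the vrrp-lan interface as the store; the wan1=down string and the script names wan1-mark-down and wan1-report are hypothetical.

```routeros
/system script
add name=wan1-mark-down source={
    # Record the state alongside the priority change
    /interface vrrp set [find name=vrrp-lan] priority=90 comment="wan1=down"
    :log warning "WAN1 down: reduced VRRP priority to 90"
}
# Any other script, whichever user runs it, can read the state back
add name=wan1-report source={
    :local state [/interface vrrp get [find name=vrrp-lan] comment]
    :if ($state = "wan1=down") do={
        :log info "WAN1 is currently marked down"
    }
}
```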

VRRP Master election is priority-based. Scripts can reduce a router’s VRRP priority when upstream health fails, causing the Backup router (with a stable higher priority) to assume Master role. When the upstream recovers, restoring priority allows the primary to reclaim Master status if preemption is enabled.

On the primary router, configure VRRP with a high priority and preemption enabled:

/interface vrrp add \
name=vrrp-lan \
interface=bridge-lan \
vrid=10 \
priority=150 \
preemption-mode=yes

On the standby router, use a lower priority that sits between the primary’s normal and degraded values:

/interface vrrp add \
name=vrrp-lan \
interface=bridge-lan \
vrid=10 \
priority=100 \
preemption-mode=yes

Back on the primary router, add the scripts that lower and restore the priority, and bind them to a Netwatch probe:

/system script
add name=wan1-down source={
/interface vrrp set [find name=vrrp-lan] priority=90
:log warning "WAN1 down: VRRP priority reduced to 90, standby will take over"
}
add name=wan1-up source={
/interface vrrp set [find name=vrrp-lan] priority=150
:log info "WAN1 up: VRRP priority restored to 150"
}
/tool/netwatch
add name=wan1-probe host=8.8.8.8 type=icmp interval=10s timeout=1s \
down-script="/system script run wan1-down" \
up-script="/system script run wan1-up"

Priority ladder: Design the priority values so that the standby router’s normal priority (100) is higher than the primary’s degraded priority (90). This ensures deterministic failover:

| State | Primary priority | Standby priority | Master |
| --- | --- | --- | --- |
| Normal | 150 | 100 | Primary |
| WAN1 down | 90 | 100 | Standby |
| WAN1 recovered | 150 | 100 | Primary (preempts) |
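
To check where a router currently sits on the ladder, the configured priority can be read back and the interface flags inspected. This is a small console sketch; it assumes the interface name vrrp-lan from the examples above.

```routeros
# Print the priority currently configured on this router
:put [/interface vrrp get [find name=vrrp-lan] priority]
# List VRRP interfaces; the flags column shows whether this router is master or backup
/interface vrrp print
```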

An alternative to VRRP manipulation is toggling route entries based on health. This pattern is useful when failover is handled by routing distance or when BGP conditional advertisement is needed.

RouterOS route distance failover works automatically when check-gateway=ping detects the primary gateway is down. However, for deeper upstream health checks, scripts can disable the primary default route explicitly:

# Primary and backup default routes
/ip route
add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 comment=wan1-primary
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 comment=wan2-backup
/system script
add name=wan1-down source={
/ip route set [find comment=wan1-primary] disabled=yes
:log warning "WAN1 down: primary default route disabled"
}
add name=wan1-up source={
/ip route set [find comment=wan1-primary] disabled=no
:log info "WAN1 up: primary default route re-enabled"
}
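
The scripts above still need a probe to trigger them, and that probe has to keep using WAN1 even while the primary default route is disabled. Otherwise it would reach the target over WAN2 and report a recovery that has not happened. A possible sketch, assuming 8.8.8.8 as the target and the hypothetical names wan1-probe-pin and wan1-route-probe:

```routeros
# Pin the probe target to the WAN1 gateway so the probe never follows the backup route
/ip route
add dst-address=8.8.8.8/32 gateway=203.0.113.1 comment=wan1-probe-pin
/tool/netwatch
add name=wan1-route-probe host=8.8.8.8 type=icmp interval=10s timeout=1s \
    down-script="/system script run wan1-down" \
    up-script="/system script run wan1-up"
```

A side effect of the host route is that client traffic to 8.8.8.8 also stays on WAN1, so pick a target that users do not depend on.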

For multi-WAN setups with ECMP, use recursive routes with per-ISP health targets:

# Health probes — specific hosts reachable only via each ISP
/ip route
add dst-address=9.9.9.9/32 gateway=203.0.113.1 scope=10 comment=isp1-health-target
add dst-address=1.1.1.1/32 gateway=198.51.100.1 scope=10 comment=isp2-health-target
# ECMP default routes that recursively resolve through health targets
/ip route
add dst-address=0.0.0.0/0 gateway=9.9.9.9 check-gateway=ping comment=isp1-default
add dst-address=0.0.0.0/0 gateway=1.1.1.1 check-gateway=ping comment=isp2-default

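With check-gateway=ping, RouterOS probes 9.9.9.9 through the ISP1 host route and marks the ISP1 default route inactive when the target stops answering, so the entry drops out of the ECMP set without a script. If Netwatch drives the decision instead, have its down-script disable the isp1-default route rather than the health target route: removing the host route would let the next probe reach 9.9.9.9 through the other ISP and report a false recovery.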
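
If Netwatch is used for this instead of, or in addition to, check-gateway, one option is to toggle the default route entry and leave the health target route in place so the probe keeps testing the ISP1 path. A sketch under that assumption; the script and probe names are hypothetical:

```routeros
/system script
add name=isp1-down source={
    /ip route set [find comment=isp1-default] disabled=yes
    :log warning "ISP1 health target unreachable: isp1-default disabled"
}
add name=isp1-up source={
    /ip route set [find comment=isp1-default] disabled=no
    :log info "ISP1 health target reachable: isp1-default re-enabled"
}
/tool/netwatch
add name=isp1-probe host=9.9.9.9 type=icmp interval=10s timeout=1s \
    down-script="/system script run isp1-down" \
    up-script="/system script run isp1-up"
```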

Scripts can enable or disable static/blackhole routes that BGP redistributes. This controls what prefixes are announced to upstream peers based on health:

# A blackhole route used as a BGP advertisement gate
# (RouterOS 7 syntax; RouterOS 6 uses type=blackhole instead of the blackhole flag)
/ip route
add dst-address=203.0.113.0/24 blackhole disabled=yes comment=bgp-health-gate
/system script
add name=service-down source={
/ip route set [find comment=bgp-health-gate] disabled=no
:log warning "Service down: advertising fallback prefix via BGP"
}
add name=service-up source={
/ip route set [find comment=bgp-health-gate] disabled=yes
:log info "Service up: withdrew fallback BGP prefix"
}

The BGP instance must be configured to redistribute connected or static routes matching this prefix for the announce/withdraw to take effect.
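
What that looks like depends on the RouterOS version and the existing BGP setup. As a rough RouterOS 7 sketch, assuming a BGP connection named upstream and an output filter chain named bgp-out (both hypothetical), static redistribution limited to the gate prefix might be configured as follows. Check the property names against your release before relying on it.

```routeros
# Assumed names: connection "upstream", filter chain "bgp-out"
/routing/bgp/connection
set [find name=upstream] output.redistribute=static output.filter-chain=bgp-out
/routing/filter/rule
# Announce only the health-gate prefix; reject every other redistributed route
add chain=bgp-out rule="if (dst == 203.0.113.0/24) { accept } else { reject }"
```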

VRRP interfaces support on-master and on-backup script hooks that fire when the interface transitions state. These complement Netwatch-driven priority changes by allowing additional automation after a role change occurs.

# Execute scripts on VRRP state transitions
/interface vrrp set vrrp-lan \
on-master="/system script run became-master" \
on-backup="/system script run became-backup"
/system script
add name=became-master source={
:log info "VRRP: became master — enabling NAT masquerade"
/ip firewall nat set [find action=masquerade] disabled=no
}
add name=became-backup source={
:log info "VRRP: became backup — disabling NAT masquerade"
/ip firewall nat set [find action=masquerade] disabled=yes
}
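
The find action=masquerade selector toggles every masquerade rule on the router. Where only one rule should follow the VRRP role, a narrower variant can match on a comment instead. The comment value vrrp-nat is a hypothetical tag, and out-interface=ether1 is only an assumed way of picking out the rule to tag:

```routeros
# One-time step: tag the rule that should follow the VRRP role
/ip firewall nat set [find action=masquerade out-interface=ether1] comment=vrrp-nat
# Inside became-master: enable only the tagged rule
/ip firewall nat set [find comment=vrrp-nat] disabled=no
# Inside became-backup: disable only the tagged rule
/ip firewall nat set [find comment=vrrp-nat] disabled=yes
```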

Append logging or email notifications to health scripts to alert operators of state changes:

/system script
add name=wan1-down source={
/interface vrrp set [find name=vrrp-lan] priority=90
:log warning "WAN1 down: failover activated"
/tool/e-mail send to="[email protected]" subject="WAN1 down" body="Probe to 8.8.8.8 failed; VRRP priority reduced."
}

Email requires /tool/e-mail to be configured with an SMTP server. For environments without SMTP access, log entries are sufficient for monitoring systems that ingest RouterOS syslog.
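
A minimal configuration sketch, assuming RouterOS 7 property names and a submission server at smtp.example.com on port 587. The server, sender address, and credentials are placeholders:

```routeros
# Placeholder SMTP settings; replace every value with your own
/tool/e-mail
set server=smtp.example.com port=587 tls=starttls \
    from=router@example.com user=router@example.com password=CHANGE-ME
```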

For multi-target logic or custom hysteresis beyond Netwatch’s single-host model, use the RouterOS scheduler with a scripted ping loop:

/system script
add name=multi-probe source={
:local wan1ok false
:local wan2ok false
# Check WAN1 health target
:if ([/tool ping 9.9.9.9 count=3 interval=200ms] = 3) do={ :set wan1ok true }
# Check WAN2 health target
:if ([/tool ping 1.1.1.1 count=3 interval=200ms] = 3) do={ :set wan2ok true }
:if ($wan1ok = false) do={
/ip route set [find comment=isp1-default] disabled=yes
:log warning "Multi-probe: ISP1 unreachable"
} else={
/ip route set [find comment=isp1-default] disabled=no
}
:if ($wan2ok = false) do={
/ip route set [find comment=isp2-default] disabled=yes
:log warning "Multi-probe: ISP2 unreachable"
} else={
/ip route set [find comment=isp2-default] disabled=no
}
}
/system scheduler
add name=health-check interval=30s on-event="/system script run multi-probe"

Scheduler-based probes are suitable when you need to check multiple destinations or apply conditional logic across several targets before triggering an action.
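
Hysteresis can be approximated by counting consecutive failed rounds before acting. The following is a sketch rather than a tested recipe: the script name wan1-hysteresis, the global wan1Fail, and the threshold of three rounds are all assumptions. A global is acceptable here because the script always runs under the same scheduler owner.

```routeros
/system script
add name=wan1-hysteresis source={
    # Consecutive failed rounds; an undeclared global has type nothing
    :global wan1Fail
    :if ([:typeof $wan1Fail] = "nothing") do={ :set wan1Fail 0 }

    # A round fails when fewer than 2 of 3 echo requests are answered
    :if ([/tool ping 9.9.9.9 count=3 interval=200ms] < 2) do={
        :set wan1Fail ($wan1Fail + 1)
    } else={
        :set wan1Fail 0
    }

    :local isDisabled [/ip route get [find comment=isp1-default] disabled]
    # Disable only after 3 failed rounds in a row, and only if not already disabled
    :if (($wan1Fail >= 3) && ($isDisabled = false)) do={
        /ip route set [find comment=isp1-default] disabled=yes
        :log warning "Hysteresis: ISP1 failed 3 rounds, route disabled"
    }
    # Re-enable on the first successful round
    :if (($wan1Fail = 0) && ($isDisabled = true)) do={
        /ip route set [find comment=isp1-default] disabled=no
        :log info "Hysteresis: ISP1 recovered, route re-enabled"
    }
}
/system scheduler
add name=wan1-hysteresis interval=30s on-event="/system script run wan1-hysteresis"
```

With a 30s interval, the route is withdrawn roughly 90 seconds after the first failed round and restored after the first good one. Requiring several good rounds before re-enabling would make recovery equally conservative.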

Scripts not running: Verify that the script name matches exactly the name referenced in the Netwatch down-script/up-script fields. Script names are case-sensitive. Run the script manually with /system script run <name> to confirm it executes without errors.

Netwatch stays in “down” state after recovery: Check that the up-script correctly reverses the changes made by the down-script, and that the down-script does not disable the route the probe itself depends on. Review /log print where message~"VRRP" or /log print where message~"route" for evidence of state restoration; the match is case-sensitive, so the pattern must use the same case as the :log messages.

VRRP Master role not moving: Confirm that preemption-mode=yes is set on the Backup router. Without preemption, the Backup will not become Master even when the primary’s priority drops below its own. Also confirm with /interface vrrp print that the priority on the primary actually changed.

Probe targets unreachable via one ISP: Use per-ISP health target routes with a specific scope value (scope=10) to ensure the probe traffic is routed through the correct gateway. Without this, probes may take the default route and produce misleading results.

# Verify Netwatch probe state
/tool/netwatch print detail
# View recent health script log entries
/log print where topics~"script" && message~"WAN"
# Test script execution manually
/system script run wan1-down