Health Monitoring Scripts
Health monitoring scripts automate failover and recovery decisions in RouterOS by detecting link and service failures and triggering configuration changes. Combining Netwatch event hooks with scripted VRRP priority changes and route state control provides end-to-end automation without manual intervention.
Summary
RouterOS provides two complementary approaches to health monitoring: Netwatch, a built-in tool that tracks host reachability and executes scripts on state transitions, and scheduled scripts using /tool ping for custom multi-target or hysteresis logic. These triggers drive downstream changes such as adjusting VRRP election priority, toggling route entries, or sending notifications.
Health monitoring is especially important in WAN failover deployments where the directly connected ISP gateway remains reachable while the upstream Internet path is broken. In those scenarios, check-gateway=ping alone is insufficient, and explicit probes to internet-reachable hosts are required.
Netwatch
Netwatch monitors host reachability and fires up-script and down-script commands on state transitions. It supports ICMP, TCP, HTTP, and DNS probe types with configurable thresholds for packet loss, round-trip time, and jitter.
Basic Configuration
    # Monitor an internet host via ICMP
    /tool/netwatch
    add name=wan1-probe host=8.8.8.8 type=icmp interval=10s timeout=1s \
        up-script="/system script run wan1-up" \
        down-script="/system script run wan1-down"

The interval controls how often the probe runs. The timeout controls how long to wait for a reply before marking the probe as failed.
Probe Types
Netwatch supports four probe types:
| Type | Description | Additional Parameters |
|---|---|---|
| icmp | ICMP echo (ping) | thr-loss-percent, thr-jitter, thr-stdev, thr-avg, thr-max |
| tcp-conn | TCP connection to port | port |
| http-get | HTTP GET and status check | http-code-min, http-code-max, http-code-redirect |
| dns | DNS resolution check | No extra parameters |
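As a sketch of the non-ICMP probe types, the entries below check a TCP port and an HTTP status range. The host 192.0.2.10, the entry names, and the intervals are hypothetical. The type values tcp-conn and http-get are the spellings used by RouterOS v7 and should be verified against the installed version.

```routeros
/tool/netwatch
# TCP probe: succeeds when a connection to port 443 can be opened
add name=web-tcp host=192.0.2.10 type=tcp-conn port=443 interval=10s

# HTTP probe: succeeds when the response status is within 200-299
add name=web-http host=192.0.2.10 type=http-get port=80 \
    http-code-min=200 http-code-max=299 interval=30s
```

Service-level probes like these suit the case where the host still answers ping but the application behind it has stopped responding.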
Advanced ICMP Thresholds
For WAN quality monitoring, Netwatch can trigger on degraded links before they completely fail:
    /tool/netwatch
    add name=wan1-quality host=8.8.8.8 type=icmp interval=10s timeout=2s \
        thr-loss-percent=20 \
        thr-avg=200ms \
        up-script="/system script run wan1-up" \
        down-script="/system script run wan1-down" \
        test-script="/system script run wan1-test"

The test-script runs on every probe regardless of state change, allowing logging of quality metrics over time.
Execution Context
Netwatch scripts run as the *sys user. This has two important implications:
- Scripts cannot read global variables set by other users or sessions.
- Scripts are subject to the policy restrictions applied to the *sys user.
Write self-contained scripts that do not rely on global variables set elsewhere. Pass data via system comments, route properties, or dedicated script variables instead.
    # Bad: global variable set elsewhere is not visible to *sys
    :global wanState "up"

    # Better: read current state directly inside the Netwatch script
    /system script
    add name=wan1-down source={
        :local currentPrio [/interface vrrp get [find name=vrrp-lan] priority]
        :if ($currentPrio != 90) do={
            /interface vrrp set [find name=vrrp-lan] priority=90
            :log warning "WAN1 down: reduced VRRP priority to 90"
        }
    }

VRRP Priority Manipulation
VRRP Master election is priority-based. Scripts can reduce a router’s VRRP priority when upstream health fails, causing the Backup router (with a stable higher priority) to assume the Master role. When the upstream recovers, restoring the priority allows the primary to reclaim Master status if preemption is enabled.
On the primary router, configure VRRP with a high priority and preemption enabled:
    /interface vrrp add \
        name=vrrp-lan \
        interface=bridge-lan \
        vrid=10 \
        priority=150 \
        preemption-mode=yes

On the standby router, use a lower priority that sits between the primary’s normal and degraded values:
    /interface vrrp add \
        name=vrrp-lan \
        interface=bridge-lan \
        vrid=10 \
        priority=100 \
        preemption-mode=yes

Scripts
    /system script
    add name=wan1-down source={
        /interface vrrp set [find name=vrrp-lan] priority=90
        :log warning "WAN1 down: VRRP priority reduced to 90, standby will take over"
    }

    add name=wan1-up source={
        /interface vrrp set [find name=vrrp-lan] priority=150
        :log info "WAN1 up: VRRP priority restored to 150"
    }

Netwatch Integration
    /tool/netwatch
    add name=wan1-probe host=8.8.8.8 type=icmp interval=10s timeout=1s \
        down-script="/system script run wan1-down" \
        up-script="/system script run wan1-up"

Priority ladder: Design the priority values so that the standby router’s normal priority (100) is higher than the primary’s degraded priority (90). This ensures deterministic failover:
| State | Primary priority | Standby priority | Master |
|---|---|---|---|
| Normal | 150 | 100 | Primary |
| WAN1 down | 90 | 100 | Standby |
| WAN1 recovered | 150 | 100 | Primary (preempts) |
Route State Control
An alternative to VRRP manipulation is toggling route entries based on health. This pattern is useful when failover is handled by routing distance or when BGP conditional advertisement is needed.
Distance-Based WAN Failover
RouterOS route distance failover works automatically when check-gateway=ping detects the primary gateway is down. However, for deeper upstream health checks, scripts can disable the primary default route explicitly:
    # Primary and backup default routes
    /ip route
    add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 comment=wan1-primary
    add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 comment=wan2-backup

    /system script
    add name=wan1-down source={
        /ip route set [find comment=wan1-primary] disabled=yes
        :log warning "WAN1 down: primary default route disabled"
    }

    add name=wan1-up source={
        /ip route set [find comment=wan1-primary] disabled=no
        :log info "WAN1 up: primary default route re-enabled"
    }

Recursive Routing Health Targets
For multi-WAN setups with ECMP, use recursive routes with per-ISP health targets:
    # Health probes: specific hosts reachable only via each ISP
    /ip route
    add dst-address=9.9.9.9/32 gateway=203.0.113.1 scope=10 comment=isp1-health-target
    add dst-address=1.1.1.1/32 gateway=198.51.100.1 scope=10 comment=isp2-health-target

    # ECMP default routes that recursively resolve through health targets
    /ip route
    add dst-address=0.0.0.0/0 gateway=9.9.9.9 check-gateway=ping comment=isp1-default
    add dst-address=0.0.0.0/0 gateway=1.1.1.1 check-gateway=ping comment=isp2-default

When Netwatch detects 9.9.9.9 is unreachable, the script disables the ISP1 health target route, which removes the gateway resolution for the ISP1 ECMP entry and automatically withdraws it from active use.
BGP Conditional Advertisement
Scripts can enable or disable static/blackhole routes that BGP redistributes. This controls what prefixes are announced to upstream peers based on health:
    # A blackhole route used as a BGP advertisement gate
    /ip route
    add dst-address=203.0.113.0/24 type=blackhole disabled=yes comment=bgp-health-gate

    /system script
    add name=service-down source={
        /ip route set [find comment=bgp-health-gate] disabled=no
        :log warning "Service down: advertising fallback prefix via BGP"
    }

    add name=service-up source={
        /ip route set [find comment=bgp-health-gate] disabled=yes
        :log info "Service up: withdrew fallback BGP prefix"
    }

The BGP instance must be configured to redistribute connected or static routes matching this prefix for the announce/withdraw to take effect.
Combining Netwatch with VRRP Event Hooks
VRRP interfaces support on-master and on-backup script hooks that fire when the interface transitions state. These complement Netwatch-driven priority changes by allowing additional automation after a role change occurs.
    # Execute scripts on VRRP state transitions
    /interface vrrp set vrrp-lan \
        on-master="/system script run became-master" \
        on-backup="/system script run became-backup"

    /system script
    add name=became-master source={
        :log info "VRRP: became master, enabling NAT masquerade"
        /ip firewall nat set [find action=masquerade] disabled=no
    }

    add name=became-backup source={
        :log info "VRRP: became backup, disabling NAT masquerade"
        /ip firewall nat set [find action=masquerade] disabled=yes
    }

Notification Scripts
Append logging or email notifications to health scripts to alert operators of state changes:
    /system script
    add name=wan1-down source={
        /interface vrrp set [find name=vrrp-lan] priority=90
        :log warning "WAN1 down: failover activated"
        /tool/e-mail send to="[email protected]" subject="WAN1 down" body="Probe to 8.8.8.8 failed; VRRP priority reduced."
    }

Email requires /tool/e-mail to be configured with an SMTP server. For environments without SMTP access, log entries are sufficient for monitoring systems that ingest RouterOS syslog.
Scheduler-Based Probes
For multi-target logic or custom hysteresis beyond Netwatch’s single-host model, use the RouterOS scheduler with a scripted ping loop:
    /system script
    add name=multi-probe source={
        :local wan1ok false
        :local wan2ok false

        # Check WAN1 health target
        :if ([/tool ping 9.9.9.9 count=3 interval=200ms] = 3) do={
            :set wan1ok true
        }
        # Check WAN2 health target
        :if ([/tool ping 1.1.1.1 count=3 interval=200ms] = 3) do={
            :set wan2ok true
        }

        :if ($wan1ok = false) do={
            /ip route set [find comment=isp1-default] disabled=yes
            :log warning "Multi-probe: ISP1 unreachable"
        } else={
            /ip route set [find comment=isp1-default] disabled=no
        }

        :if ($wan2ok = false) do={
            /ip route set [find comment=isp2-default] disabled=yes
            :log warning "Multi-probe: ISP2 unreachable"
        } else={
            /ip route set [find comment=isp2-default] disabled=no
        }
    }

    /system scheduler
    add name=health-check interval=30s on-event="/system script run multi-probe"

Scheduler-based probes are suitable when you need to check multiple destinations or apply conditional logic across several targets before triggering an action.
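The multi-probe script acts on the result of a single run. Hysteresis can be layered on by counting consecutive failed runs before disabling a route. The following is a minimal sketch rather than a tested configuration: the script name wan1-hysteresis, the global counter wan1Fails, the scheduler entry name, and the threshold of 3 are all hypothetical. It assumes the script is always started by the same scheduler owner, so the global variable persists between runs, and that 9.9.9.9 is pinned to ISP1 by the health target route shown earlier.

```routeros
/system script
add name=wan1-hysteresis source={
    # Hypothetical counter of consecutive failed runs; persists between runs
    :global wan1Fails
    :if ([:typeof $wan1Fails] = "nothing") do={ :set wan1Fails 0 }

    # Count the run as failed only when no reply comes back at all
    :if ([/tool ping 9.9.9.9 count=3 interval=200ms] = 0) do={
        :set wan1Fails ($wan1Fails + 1)
    } else={
        :set wan1Fails 0
    }

    # Disable the route once, on the third consecutive failed run (assumed threshold)
    :if ($wan1Fails = 3) do={
        /ip route set [find comment=isp1-default disabled=no] disabled=yes
        :log warning "Hysteresis: ISP1 failed 3 consecutive checks"
    }

    # Re-enable on the first successful run, only if currently disabled
    :if ($wan1Fails = 0) do={
        /ip route set [find comment=isp1-default disabled=yes] disabled=no
    }
}

/system scheduler
add name=wan1-hysteresis-check interval=30s on-event="/system script run wan1-hysteresis"
```

With a 30-second interval and a threshold of 3, the route is withdrawn on the third failed check in a row, while recovery takes effect on the first successful check. Raising the threshold trades detection speed for resistance to flapping.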
Troubleshooting
Scripts not running: Verify that the script name exactly matches the name referenced in the Netwatch down-script/up-script fields. Script names are case-sensitive. Run the script manually with /system script run <name> to confirm it executes without errors.
Router stays in the failover state after Netwatch recovers: Check that the up-script reverses every change made by the down-script. Review /log print where message~"vrrp" or /log print where message~"route" for evidence of state restoration.
VRRP failover not happening after a priority change: Confirm that preemption-mode=yes is set on the Backup router. Without preemption, the Backup will not become Master even when the primary’s priority drops below its own.
Probe targets unreachable via one ISP: Use per-ISP health target routes with a specific scope value (scope=10) to ensure the probe traffic is routed through the correct gateway. Without this, probes may take the default route and produce misleading results.
    # Verify Netwatch probe state
    /tool/netwatch print detail

    # View recent health script log entries
    /log print where topics~"script" and message~"WAN"

    # Test script execution manually
    /system script run wan1-down

Related Topics
- VRRP: Virtual Router Redundancy Protocol configuration and state machine
- Failover (WAN Backup): Route distance and check-gateway based WAN failover
- Multi-WAN Failover with Recursive Routing: Advanced multi-WAN using recursive routes and Netwatch
- Resilient Network Topologies: Designing redundant network architectures