System Health Monitoring in RouterOS: A Complete Guide
RouterOS Version: 7.x+
Difficulty: Beginner
Estimated Time: 20 minutes
Overview
System health monitoring provides real-time hardware status information including temperature, voltage, current, fan speed, and power consumption. Monitoring these values helps you:
- Detect overheating before hardware damage occurs
- Verify power supplies are functioning correctly
- Monitor fan operation to prevent thermal throttling
- Track power consumption for capacity planning
- Integrate with external monitoring via SNMP
Health data is available via CLI, WinBox, API, and SNMP, making it easy to integrate with monitoring systems like MRTG, Cacti, Zabbix, or The Dude.
Key limitation: Health monitoring requires hardware support. CHR (Cloud Hosted Router) and some low-end devices have no sensors and will show an empty health menu.
Menu Reference
| Menu | Purpose |
|---|---|
| /system/health | View sensor readings |
| /system/health/settings | Configure fan control (v7.9+) |
Understanding Health Readings
Common Sensors
| Sensor | Type | Description |
|---|---|---|
| cpu-temperature | C | CPU die temperature |
| board-temperature | C | Board/ambient temperature |
| temperature | C | General temperature sensor |
| voltage | V | Input voltage |
| current | A | Input current draw |
| power-consumption | W | Total power draw |
| fan1-speed | RPM | Fan rotation speed |
| psu1-state | ok/fail | Power supply status |
What’s Normal?
| Sensor | Normal Range | Warning |
|---|---|---|
| CPU Temperature | 40-70°C | Above 80°C |
| Board Temperature | 20-50°C | Above 60°C |
| Voltage | 10-28V (varies) | Check device specs |
| Fan Speed | 2000-6000 RPM | 0 RPM when should be running |
Note: CPU temperatures of 70-80°C are normal under load. The “operating temperature” in device specs (-20°C to +70°C) refers to room/ambient temperature, not CPU temperature.
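On a monitoring host, the ranges in the table above can be turned into a simple check. The following Python sketch is illustrative only: the `classify` function and the `LIMITS` layout are hypothetical names, and the thresholds are copied from the table, so adjust them to your hardware.

```python
# Thresholds copied from the "What's Normal?" table (degrees Celsius).
# Readings between the normal maximum and the warning level are
# reported as "elevated" (for example, a CPU at 70-80 C under load).
LIMITS = {
    "cpu-temperature": {"normal_max": 70, "warning_above": 80},
    "board-temperature": {"normal_max": 50, "warning_above": 60},
}


def classify(sensor: str, value: float) -> str:
    """Return 'normal', 'elevated', 'warning', or 'unknown' for a reading."""
    limits = LIMITS.get(sensor)
    if limits is None:
        return "unknown"  # no thresholds defined for this sensor
    if value > limits["warning_above"]:
        return "warning"
    if value > limits["normal_max"]:
        return "elevated"
    return "normal"


if __name__ == "__main__":
    samples = [("cpu-temperature", 43), ("cpu-temperature", 75),
               ("board-temperature", 61)]
    for name, reading in samples:
        print(name, reading, classify(name, reading))
```

Keeping a separate "elevated" band avoids raising alarms for CPU temperatures that are normal under load.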
Configuration Examples
Example 1: View Current Health Status

    /system/health/print

Example output on a CCR device:

     #  NAME                VALUE  TYPE
     0  power-consumption   50.8   W
     1  cpu-temperature     43     C
     2  fan1-speed          5654   RPM
     3  board-temperature1  29     C
     4  voltage             24.5   V

Example 2: Filter Specific Readings
View only temperatures:

    /system/health/print where type="C"

View only fan speeds:

    /system/health/print where type="RPM"

Find a specific sensor:

    /system/health/print where name="cpu-temperature"

Example 3: Configure Fan Control (v7.9+)
Control when fans start and reach full speed:

    # View current settings
    /system/health/settings/print

    # Set temperature where fans start spinning
    /system/health/settings/set fan-target-temp=55

    # Set temperature where fans reach maximum speed
    /system/health/settings/set fan-full-speed-temp=65

    # Prevent fans from completely stopping (reduces cycling)
    /system/health/settings/set fan-min-speed-percent=15

Fan control is available on: CRS3xx, CRS5xx, CCR2xxx (v7.9+), CCR1036/CCR1016 (v7.14+)
Example 4: Enable CPU Overtemperature Protection
For ARM/ARM64 devices, enable automatic protection if the CPU overheats:

    # Enable overtemperature monitoring
    /system/health/settings/set cpu-overtemp-check=yes

    # Set threshold (default 105°C)
    /system/health/settings/set cpu-overtemp-threshold=100

    # Delay after boot before monitoring (prevents false triggers)
    /system/health/settings/set cpu-overtemp-startup-delay=2m

Example 5: Create Temperature Alert Script
Send an email when the temperature exceeds a threshold:

    # First configure email under /tool/e-mail (required)

    # Create alert script
    /system/script/add name=temp-alert policy=read,write,test source={
        :local cpuTemp [/system/health/get [find name="cpu-temperature"] value]
        :local threshold 75
        :if ($cpuTemp > $threshold) do={
            :log warning "CPU temperature high: $cpuTemp C"
            # The recipient is a placeholder; replace it with your own address
            /tool/e-mail/send to="admin@example.com" \
                subject="[ALERT] Router temperature high" \
                body="CPU temperature: $cpuTemp C (threshold: $threshold C)"
        }
    }

    # Schedule to run every 5 minutes
    /system/scheduler/add name=temp-check interval=5m on-event=temp-alert

Example 6: Temperature Alert with Rate Limiting
Prevent email spam by limiting alerts to one per cooldown period. The script measures elapsed time with the system uptime, which keeps increasing across midnight, unlike the time of day:

    /system/script/add name=temp-alert-limited policy=read,write,test source={
        :global lastTempAlert
        :local cpuTemp [/system/health/get [find name="cpu-temperature"] value]
        :local threshold 75
        # Minimum time between two alerts
        :local cooldown 1h

        :if ($cpuTemp > $threshold) do={
            :local now [/system/resource/get uptime]
            # Alert if none was sent since boot, or if the cooldown has elapsed
            :local due true
            :if ([:typeof $lastTempAlert] = "time") do={
                :set due (($now - $lastTempAlert) > $cooldown)
            }

            :if ($due) do={
                :log warning "CPU temperature high: $cpuTemp C"
                # The recipient is a placeholder; replace it with your own address
                /tool/e-mail/send to="admin@example.com" \
                    subject="[ALERT] Router temperature high" \
                    body="CPU temperature: $cpuTemp C"
                :set lastTempAlert $now
            }
        }
    }

Example 7: Monitor via SNMP
Get SNMP OIDs for external monitoring tools:

    /system/health/print oid

The output shows the OID for each sensor:

     #  NAME             VALUE  TYPE  OID
     0  cpu-temperature  43     C     .1.3.6.1.4.1.14988.1.1.3.11
     1  voltage          24.5   V     .1.3.6.1.4.1.14988.1.1.3.8

Poll from an external system (Linux example):

    # Get CPU temperature (returns decidegrees - divide by 10)
    snmpget -v2c -c public 192.168.1.1 .1.3.6.1.4.1.14988.1.1.3.11.0

Example 8: Configure SNMP Temperature Traps
Send an SNMP trap when the temperature exceeds a threshold:

    # Enable SNMP
    /snmp set enabled=yes

    # Configure trap destination
    /snmp set trap-target=192.168.1.100 trap-community=public trap-version=2

    # Enable temperature exception traps
    /snmp set trap-generators=temp-exception

The trap triggers at 100°C or the cpu-overtemp-threshold value.
Example 9: Check Power Supply Status (Dual-PSU Devices)
    # View PSU status
    /system/health/print where name~"psu"

Expected output:

     #  NAME          VALUE  TYPE
     0  psu1-state    ok
     1  psu2-state    ok
     2  psu1-voltage  24.2   V
     3  psu2-voltage  24.3   V

If a PSU fails, psu1-state or psu2-state will show fail.
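If the readings are collected by an external script, the PSU check can be automated. The Python sketch below assumes the readings have already been fetched into a dictionary keyed by sensor name. How they are fetched (API, SSH, SNMP) is outside its scope, and `failed_psus` is a hypothetical helper name.

```python
import re

# Matches state sensors such as "psu1-state" and "psu2-state".
PSU_STATE = re.compile(r"^psu\d+-state$")


def failed_psus(readings: dict) -> list:
    """Return the sorted names of PSU state sensors that do not report 'ok'."""
    return sorted(
        name for name, value in readings.items()
        if PSU_STATE.match(name) and value != "ok"
    )


if __name__ == "__main__":
    # Sample data shaped like the expected output above, with PSU 2 failed.
    sample = {
        "psu1-state": "ok",
        "psu2-state": "fail",
        "psu1-voltage": 24.2,
        "psu2-voltage": 0.0,
    }
    print(failed_psus(sample))
```

Voltage sensors are ignored on purpose, because only the state sensors carry the ok/fail status.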
SNMP Integration
Common Health OIDs
| Reading | OID | Notes |
|---|---|---|
| voltage | .1.3.6.1.4.1.14988.1.1.3.8.0 | Decivolts (÷10) |
| temperature | .1.3.6.1.4.1.14988.1.1.3.10.0 | Decidegrees (÷10) |
| cpu-temperature | .1.3.6.1.4.1.14988.1.1.3.11.0 | Decidegrees (÷10) |
| power-consumption | .1.3.6.1.4.1.14988.1.1.3.12.0 | Deciwatts (÷10) |
| fan-speed | .1.3.6.1.4.1.14988.1.1.3.17.0 | RPM |
| psu1-state | .1.3.6.1.4.1.14988.1.1.3.15.0 | 0=fail, 1=ok |
| psu2-state | .1.3.6.1.4.1.14988.1.1.3.16.0 | 0=fail, 1=ok |
Important: SNMP returns values multiplied by 10. CLI shows 24.5V; SNMP returns 245.
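The scaling rules in the table can be applied in one place on the polling side. The following Python sketch assumes the raw integer has already been retrieved (for example with `snmpget`). The `decode` function and its reading names are illustrative and mirror the table above.

```python
# Readings that SNMP reports multiplied by 10, per the table above.
SCALED_BY_TEN = {"voltage", "temperature", "cpu-temperature", "power-consumption"}

# PSU state encoding from the table above.
PSU_STATES = {0: "fail", 1: "ok"}


def decode(reading: str, raw: int):
    """Convert a raw SNMP integer into the value the CLI would show."""
    if reading in SCALED_BY_TEN:
        return raw / 10  # e.g. decivolts to volts
    if reading in ("psu1-state", "psu2-state"):
        return PSU_STATES.get(raw, "unknown")
    return raw  # fan speed is already in RPM


if __name__ == "__main__":
    print(decode("voltage", 245))          # CLI shows 24.5 V
    print(decode("cpu-temperature", 430))  # CLI shows 43 C
    print(decode("fan-speed", 5654))
    print(decode("psu1-state", 1))
```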
Use in Monitoring Tools
Zabbix/Cacti/MRTG: Use the OIDs above with a multiplier of 0.1 for voltage/temperature/power.
The Dude: Supports MikroTik health OIDs natively; no configuration needed.
Common Problems and Solutions
Problem 1: Health Menu is Empty
Cause: Device has no hardware monitoring support.
Solution: Check device specifications at mikrotik.com. CHR and some low-end RouterBOARDs have no sensors.
Problem 2: Scripts Broken After v7 Upgrade
Cause: RouterOS v7 changed the health menu structure.
v6 syntax (broken):

    :local temp [/system health get temperature]

v7 syntax (correct):

    :local temp [/system/health/get [find name="cpu-temperature"] value]

Problem 3: SNMP Values Look Wrong
Cause: SNMP returns decivolts/decidegrees (multiplied by 10).
Solution: Divide SNMP values by 10 in your monitoring tool.
Problem 4: Fan Shows 0 RPM
Possible causes:
- Temperature below fan-target-temp (fans not needed)
- Device doesn’t support fan control
- Fan hardware failure
Check:

    /system/health/settings/print
    # If fan-target-temp is higher than current temp, fans won't spin

Problem 5: Cannot Control Fan Speed Directly
Cause: MikroTik doesn’t allow direct RPM control.
Solution: Use temperature thresholds to influence behavior:

    /system/health/settings/set fan-target-temp=50 fan-full-speed-temp=60

Problem 6: Temperature Email Spam
Cause: Script runs repeatedly while temperature is high.
Solution: Add rate limiting (see Example 6) or track recovery:

    # Only alert once until temperature recovers
    :global tempAlertSent
    :if ($cpuTemp > 75) do={
        :if ($tempAlertSent != true) do={
            # Send alert
            :set tempAlertSent true
        }
    } else={
        :set tempAlertSent false
    }

Problem 7: PoE Voltage Reads Lower Than Expected
Cause: PoE-powered devices have protection circuitry causing voltage drop in readings.
Solution: This is expected behavior. Actual input voltage is higher than displayed.
Fan Control Settings Reference
| Setting | Default | Description |
|---|---|---|
| fan-target-temp | 58°C | Temperature where fans start |
| fan-full-speed-temp | 65°C | Temperature where fans reach max |
| fan-min-speed-percent | 12% | Minimum fan speed (prevents cycling) |
| fan-control-interval | 30s | Seconds between temp readings |
Supported devices (v7.9+): CRS3xx, CRS5xx, CCR2xxx
Additional support (v7.14+): CCR1036, CCR1016
Verification Commands
    # Check if health monitoring is available
    /system/health/print
    # Empty = no hardware support

    # View all temperature sensors
    /system/health/print where type="C"

    # Check fan status
    /system/health/print where type="RPM"

    # View fan control settings
    /system/health/settings/print

    # Get SNMP OIDs
    /system/health/print oid

    # Check PSU status (dual-PSU devices)
    /system/health/print where name~"psu.*state"

Summary
Health monitoring in RouterOS provides essential hardware status information:
- View readings with /system/health/print
- Configure fans with /system/health/settings (v7.9+)
- Create alerts using scripts and scheduler
Key points:
- Available sensors vary by device model
- SNMP values are multiplied by 10 (divide for actual values)
- Fan control is indirect via temperature thresholds
- v7 changed health menu structure (update scripts accordingly)
- Alert scripts need rate limiting to prevent spam
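For alerting from an external poller rather than an on-router script, the rate-limiting point can be sketched in Python. This is a minimal illustration, not part of RouterOS: `AlertGate` is a hypothetical class that combines a cooldown with a reset on recovery, and the caller is assumed to supply the temperature and a timestamp in seconds.

```python
import time


class AlertGate:
    """Decide whether a high-temperature alert should be sent."""

    def __init__(self, threshold: float, cooldown_s: float):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.last_alert = None  # timestamp of the last alert, or None

    def should_alert(self, temp: float, now: float) -> bool:
        if temp <= self.threshold:
            # Recovered: the next excursion alerts immediately.
            self.last_alert = None
            return False
        if self.last_alert is None or now - self.last_alert >= self.cooldown_s:
            self.last_alert = now
            return True
        return False  # still hot, but within the cooldown window


if __name__ == "__main__":
    gate = AlertGate(threshold=75, cooldown_s=3600)
    # time.monotonic() is not affected by wall-clock changes or midnight.
    print(gate.should_alert(80, time.monotonic()))
```

Using a monotonic clock avoids miscounting the cooldown when the wall-clock time wraps or is adjusted.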
Related Topics
Monitoring Integration
- SNMP Configuration - export health data to monitoring systems
- Netwatch - complement with connectivity monitoring
Alerting
- Email Tool - send temperature alerts
- Scheduler - run health checks periodically
- Scripts - custom health monitoring logic
System Reliability
- Watchdog - automatic recovery on system issues
- System Backup - backup before potential hardware issues
Configuration and Logging
- Logging - log health-related events