Hard drives often announce their failure in advance: reallocated and pending sector counts rise, and temperatures climb. TrueNAS collects all this SMART data reliably, but the web UI only shows snapshots. To detect an impending disk failure early, you need time series, dashboards and alerts.
That is exactly what Prometheus and Grafana deliver. In this article we show how to export SMART data from TrueNAS, scrape it with Prometheus, visualize it in Grafana and configure alerts on pre-failure attributes — so you can plan disk replacements instead of reacting to emergencies.
Why SMART Monitoring on a Dashboard?
The TrueNAS web UI shows SMART values as a per-disk table — a snapshot without history. Three problems arise:
- No trends visible: Whether Reallocated_Sector_Ct has been slowly rising for three weeks is invisible. You only see the current value.
- No pool-wide overview: With 24 disks per pool, manually inspecting each drive is unrealistic.
- No alerts on changes: TrueNAS only warns when a SMART test fails — not when pre-fail values are getting worse.
A Grafana dashboard solves all three: temperature curves over 30 days, reallocated-sector trends at pool level, power-on hours per disk, and alerts the moment a critical attribute crosses a threshold. You see not only that a disk has problems right now, but also when the deterioration began.
Architecture: From TrueNAS to a Grafana Panel
The data path runs from TrueNAS through an exporter and Prometheus to Grafana:
```text
+------------+      +-------------+       +------------+      +---------+
| TrueNAS    | ---> | Exporter    | <---- | Prometheus | ---> | Grafana |
| smartctl   |      | (netdata    |       | Scrape +   |      | Panels  |
| /dev/sda   |      |  or         |       | TSDB       |      | Alerts  |
|            |      |  textfile)  |       |            |      |         |
+------------+      +-------------+       +------------+      +---------+
```
Two field-proven exporter variants have established themselves:
- Netdata as collector: Netdata runs as an app on TrueNAS SCALE, collects SMART data plus dozens of system metrics and exposes them in Prometheus format at /api/v1/allmetrics?format=prometheus. Low setup effort, many out-of-the-box metrics.
- Textfile exporter: A cron job invokes smartctl, writes the values to a .prom file, and the node exporter's textfile collector reads it. Maximum control over the exported fields, ideal for dedicated SMART dashboards.
For small and mid-sized environments we recommend Netdata because the overhead is minimal. In larger setups with dozens of disks and tailored alerts, the textfile approach is often the better fit.
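To make the textfile approach concrete, here is a minimal Python sketch of such a collector. It assumes the classic attribute table that smartctl -A prints for ATA/SATA drives, and it reuses the smartmon_attr_value / smartmon_attr_raw_value names from the metric examples in this article; those names are an assumption for illustration, not necessarily what smartmon.sh itself emits.

```python
"""Sketch of a SMART textfile collector: parse `smartctl -A` output and render
Prometheus text format. Metric and label names are assumptions that mirror the
examples in this article."""
import re
import subprocess

# ATA attribute rows look like:
#   5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"  # raw: leading digits only, e.g. "38 (Min/Max 20/45)"
)


def parse_attributes(smartctl_output: str) -> list[dict]:
    """Return one dict per attribute row; header and other lines are skipped."""
    rows = []
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m["id"]),
                "name": m["name"],
                "value": int(m["value"]),  # normalized value (e.g. 100)
                "raw": int(m["raw"]),      # raw counter / reading
            })
    return rows


def to_prometheus(disk: str, rows: list[dict]) -> str:
    """Render rows in the text format, keeping each metric family contiguous."""
    lines = []
    for metric, key in (("smartmon_attr_value", "value"),
                        ("smartmon_attr_raw_value", "raw")):
        lines.append(f"# TYPE {metric} gauge")
        for r in rows:
            lines.append(f'{metric}{{disk="{disk}",attribute_name="{r["name"]}"}} {r[key]}')
    return "\n".join(lines) + "\n"


def collect(disk: str) -> str:
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # attribute table was printed, so the return code is deliberately not checked.
    out = subprocess.run(["smartctl", "-A", disk],
                         capture_output=True, text=True).stdout
    return to_prometheus(disk, parse_attributes(out))
```

Called from a cron job, collect("/dev/sda") returns the text you would redirect into a temporary .prom file before renaming it. NVMe drives print a different health report and are not handled by this parser.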
Variant A: Netdata on TrueNAS SCALE 25.10
On TrueNAS SCALE you install Netdata from the apps catalog. The Prometheus endpoint is then immediately reachable:
```bash
# Quote the URL so the shell does not treat "?" as a glob character
curl -s "http://truenas.lan:19999/api/v1/allmetrics?format=prometheus" | grep smart
```
The output contains metrics like the following (exact metric and label names vary with the Netdata version and its SMART collector):
```text
smart_log_attribute_value{device="sda",attribute="reallocated_sector_ct"} 0
smart_log_attribute_value{device="sda",attribute="current_pending_sector"} 0
smart_log_attribute_raw{device="sda",attribute="temperature_celsius"} 38
smart_log_attribute_raw{device="sda",attribute="power_on_hours"} 18432
```
In the Prometheus configuration you add a job:
```yaml
scrape_configs:
  - job_name: 'truenas-netdata'
    metrics_path: /api/v1/allmetrics
    params:
      format: ['prometheus']
    scrape_interval: 60s
    static_configs:
      - targets: ['truenas.lan:19999']
        labels:
          host: 'truenas-prod'
```
After a Prometheus reload, the metrics appear in the browser at http://prometheus.lan:9090/graph.
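Instead of the browser, you can also check the new job through the Prometheus HTTP API. This Python sketch runs the instant query up{job="truenas-netdata"} against /api/v1/query; the base URL and job name are the examples from above, and a value of 1 means the last scrape of that target succeeded.

```python
"""Sketch: verify a scrape job via the Prometheus HTTP API (/api/v1/query).
Base URL and job name are the example values from this article."""
import json
import urllib.parse
import urllib.request


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> dict:
    """Map each target's instance label to its value for an instant-vector result."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return {
        r["metric"].get("instance", ""): float(r["value"][1])  # value is [ts, "string"]
        for r in payload["data"]["result"]
    }


def job_up(base: str = "http://prometheus.lan:9090", job: str = "truenas-netdata") -> dict:
    url = query_url(base, f'up{{job="{job}"}}')
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

An empty dict from job_up() usually means the job is not loaded yet, for example because the configuration reload has not happened.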
Variant B: The smartmon Textfile Exporter
If you already run node exporter, you can use the official smartmon.sh collector. On TrueNAS SCALE (Debian-based), you install the script once and create a cron job:
```bash
# install /usr/local/sbin/smartmon.sh first (simplified)
mkdir -p /var/lib/node_exporter/textfile
# cron does not support line continuation, so the entry must stay on one line
cat > /etc/cron.d/smartmon <<'EOF'
*/5 * * * * root /usr/local/sbin/smartmon.sh > /var/lib/node_exporter/textfile/smartmon.prom.$$ && mv /var/lib/node_exporter/textfile/smartmon.prom.$$ /var/lib/node_exporter/textfile/smartmon.prom
EOF
```
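The write-to-temp-then-mv pattern matters because node_exporter could otherwise read a half-written file; it only picks up files ending in .prom, so the temporary file is ignored until the rename. If you write the collector in Python instead of shell, the same idea looks roughly like this. The function name is hypothetical, and os.replace() is only atomic when source and target live on the same filesystem, which is why the temp file is created in the target directory.

```python
"""Sketch: atomically publish a .prom file for node_exporter's textfile collector."""
import os
import tempfile


def write_textfile(directory: str, name: str, content: str) -> str:
    target = os.path.join(directory, name)
    # Same directory => same filesystem => os.replace() is an atomic rename.
    # The ".tmp" suffix keeps node_exporter from reading the file early.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; node_exporter may run as another user
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)  # do not leave stale temp files behind
        raise
    return target
```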
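Node exporter is started with --collector.textfile.directory=/var/lib/node_exporter/textfile and then delivers metrics like the following (exact metric and label names depend on the collector script and its version, so check the generated .prom file before writing queries and alert rules):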
```text
smartmon_attr_value{disk="/dev/sda",attribute_name="Reallocated_Sector_Ct"} 0
smartmon_attr_value{disk="/dev/sda",attribute_name="Current_Pending_Sector"} 0
smartmon_attr_raw_value{disk="/dev/sda",attribute_name="Temperature_Celsius"} 38
smartmon_device_smart_healthy{disk="/dev/sda",model="WDC WD80EFAX"} 1
```
The advantage: you control which attributes get exported and can add labels like model, serial number or pool per disk.
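Extra labels must be escaped correctly in the text format: a backslash, double quote or newline inside a label value has to be escaped. A minimal Python sketch, assuming a hypothetical hard-coded DISK_INFO mapping in place of a real lookup from zpool status or the TrueNAS API:

```python
"""Sketch: enrich SMART samples with model, serial and pool labels.
DISK_INFO is a hypothetical inventory used for illustration only."""


def escape_label(value: str) -> str:
    # Text format rule: escape backslash first, then double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(metric: str, labels: dict, value: float) -> str:
    body = ",".join(f'{k}="{escape_label(str(v))}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{body}}} {value:g}"


# Hypothetical inventory; in practice derive it from zpool status or the TrueNAS API.
DISK_INFO = {
    "/dev/sda": {"model": "WDC WD80EFAX", "serial": "EXAMPLE123", "pool": "tank"},
}


def labelled(metric: str, disk: str, attribute: str, value: float) -> str:
    labels = {"disk": disk, "attribute_name": attribute, **DISK_INFO.get(disk, {})}
    return format_sample(metric, labels, value)
```

Sorting the label keys is only a convenience for stable output; Prometheus does not require a particular label order.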
The Most Important SMART Metrics for the Dashboard
Not every SMART value is relevant. Focus on the pre-fail attributes that actually correlate with drive failures in practice (Backblaze's published drive statistics are a useful reference), plus a few attributes that add context:
| Attribute | ID | What it shows | Grafana panel |
|---|---|---|---|
| Reallocated_Sector_Ct | 5 | Defective, replaced sectors | Stat + time series, highlight > 0 |
| Current_Pending_Sector | 197 | Unstable sectors, not yet replaced | Stat, > 0 = alert |
| Offline_Uncorrectable | 198 | Uncorrectable sectors | Stat, > 0 = alert |
| UDMA_CRC_Error_Count | 199 | Cable or controller errors | Time series, watch slope |
| Temperature_Celsius | 194 | Current disk temperature | Heatmap, warn at > 45 °C |
| Power_On_Hours | 9 | Operating hours | Stat, lifecycle context |
| Wear_Leveling_Count | 173 or 177 (vendor-specific) | SSD wear | Gauge, for SATA SSD pools (NVMe drives report "Percentage Used" instead) |
A good dashboard combines pool overview (count of disks with pending sectors > 0), per-disk detail panels (temperature curve, reallocation trend) and a top-N view (e.g. “5 hottest disks”).
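In PromQL, the pool overview and the top-N view map to aggregations such as count(smartmon_attr_raw_value{attribute_name="Current_Pending_Sector"} > 0) and topk(5, smartmon_attr_raw_value{attribute_name="Temperature_Celsius"}). The Python sketch below only illustrates what those two panels compute, on made-up per-disk values; it is not a Grafana or Prometheus API.

```python
"""Sketch of what the pool-overview and top-N panels compute.
Input dicts map disk -> latest raw value; the sample data is invented."""
import heapq


def disks_with_pending(pending: dict[str, float]) -> int:
    """Equivalent of count(... > 0): number of disks with pending sectors."""
    return sum(1 for v in pending.values() if v > 0)


def hottest(temps: dict[str, float], n: int = 5) -> list[tuple[str, float]]:
    """Equivalent of topk(n, ...): the n disks with the highest temperature."""
    return heapq.nlargest(n, temps.items(), key=lambda kv: kv[1])
```

One PromQL detail differs from the sketch: count() over an empty result returns no data rather than 0, so appending or vector(0) to the query keeps the stat panel at 0 when no disk has pending sectors.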
Alerting on Pre-Failure Attributes
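Prometheus alerting rules belong in a separate rule file such as rules.yml, which prometheus.yml loads via rule_files. The three rules below use the textfile exporter's metric names from Variant B (adapt the names when using Netdata) and cover the most important cases: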
```yaml
groups:
  - name: truenas-smart
    interval: 60s
    rules:
      - alert: SmartPendingSectorsDetected
        expr: smartmon_attr_raw_value{attribute_name="Current_Pending_Sector"} > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Pending sectors on {{ $labels.disk }} -- plan disk replacement"
      - alert: SmartReallocatedSectorsRising
        expr: increase(smartmon_attr_raw_value{attribute_name="Reallocated_Sector_Ct"}[24h]) > 0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Reallocated sectors rising on {{ $labels.disk }}"
      - alert: SmartDiskTemperatureHigh
        expr: smartmon_attr_raw_value{attribute_name="Temperature_Celsius"} > 50
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Disk temperature {{ $value }} C on {{ $labels.disk }}"
```
The trend rule with increase(...[24h]) > 0 is key: it fires on any deterioration — even a reallocation from 0 to 1. That way you catch the start of degradation, not just the late stage.
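To see why the trend rule reacts to a single new reallocation, here is a simplified Python model of increase(): it sums the rises between consecutive samples and treats a drop like a counter reset. Unlike the real function, it does not extrapolate to the edges of the window, so Prometheus may report a slightly non-integer value where this sketch returns a whole number; for the rule, only "> 0" matters.

```python
"""Simplified model of PromQL increase() over one window of samples.
No extrapolation; a decrease is treated as a counter reset, as Prometheus does."""


def simple_increase(samples: list[float]) -> float:
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # Normal case: add the rise. After a reset, count the new value from 0.
        total += cur - prev if cur >= prev else cur
    return total


def reallocation_rule_matches(samples_24h: list[float]) -> bool:
    """Mirrors the expression increase(...[24h]) > 0 (the 'for: 1h' is not modeled)."""
    return simple_increase(samples_24h) > 0
```

For example, a disk whose Reallocated_Sector_Ct goes from 0 to 1 within the window yields 1.0, so the expression matches, while a disk stuck at a constant 3 yields 0.0 and stays quiet even though its absolute value is non-zero.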
Real-World Workflow: From Alert to Disk Replacement
A typical lifecycle of an alert in a managed environment:
- Day 0: Grafana shows the first pending sectors on /dev/sdf. Prometheus fires an alert to Alertmanager.
- Day 0: The alert lands in our ticketing system via webhook, status: “disk under observation”.
- Days 1-3: Compare with the pool status (zpool status), check whether ZFS already reports CKSUM errors, and trigger a long SMART test (smartctl -t long /dev/sdf).
- Days 3-5: A replacement disk is ordered, and the resilver window is scheduled with the customer.
- Days 5-7: Disk is replaced during operation, the pool resilvers automatically — with zero downtime.
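The Day 0 hand-off from Alertmanager to the ticketing system uses a webhook: Alertmanager POSTs a JSON document with an alerts array whose entries carry status, labels and annotations. The Python sketch below turns firing alerts into ticket drafts. The ticket fields, port 9095 and the print() standing in for a real ticketing API call are assumptions for illustration.

```python
"""Sketch: minimal Alertmanager webhook receiver that drafts tickets.
Ticket fields and port are hypothetical; no real ticketing API is called."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def alerts_to_tickets(payload: dict) -> list[dict]:
    """Convert firing alerts from a webhook payload into ticket drafts."""
    tickets = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":  # ignore resolved notifications here
            continue
        labels = alert.get("labels", {})
        tickets.append({
            "title": alert.get("annotations", {}).get("summary", labels.get("alertname", "")),
            "disk": labels.get("disk", "unknown"),
            "severity": labels.get("severity", "warning"),
            "status": "disk under observation",
            "fingerprint": alert.get("fingerprint"),
        })
    return tickets


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for ticket in alerts_to_tickets(payload):
            print("would create ticket:", ticket)  # replace with your ticketing API call
        self.send_response(200)
        self.end_headers()


def serve(port: int = 9095) -> None:
    HTTPServer(("", port), Handler).serve_forever()
```

To try it, start serve() and point a webhook_configs receiver in Alertmanager at the listener's URL.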
Without monitoring, the disk would probably only have surfaced at the next zpool scrub, possibly at a point when the second disk in the mirror was also failing, which could have caused data loss.
Conclusion
SMART values are the most honest signal a hard drive emits, but only if you measure them continuously. With Netdata or the smartmon exporter, Prometheus as the time-series database and Grafana as the dashboard, you build a disk-health platform that surfaces deteriorating pre-failure attributes, often weeks before a failure.
The setup effort is manageable, the benefit measurable: fewer emergency call-outs, planned disk swaps, higher availability of your ZFS storage.
DATAZONE supports you in building a complete monitoring stack for your TrueNAS environment — from SMART exporters through Prometheus rules to Grafana dashboards and Alertmanager routing. We also operate the solution continuously as part of our Linux and storage managed services. Contact us for an initial consultation.