ZFS Scrub and SMART: Proactive Data Integrity Protection

Many hard drives do not die suddenly; they announce their failure in advance. The problem is that without active monitoring, you only notice the warning signs when it is too late. ZFS scrub and SMART monitoring are the two tools that make silent data corruption and impending hardware failures visible before production data is affected.

This article shows how to properly configure both mechanisms on TrueNAS, interpret their output, and combine them into a proactive disk replacement strategy.

What Is Bit Rot and Why Does It Matter?

Bit rot refers to the gradual corruption of stored data on hard drives or SSDs — without error messages, without warnings. Causes include magnetic degradation, cosmic radiation, or firmware bugs. The result: a single flipped bit can render a file unusable, a database backup unreadable, or a VM image corrupt.

Conventional filesystems like ext4 or NTFS do not checksum file data, so they cannot detect bit rot. The data sits on the disk, the filesystem reports “all good”, and during the next restore you discover that your backup has been defective for months.

ZFS Checksumming: Every Block Is Verified

ZFS solves this problem at a fundamental level. Every data block receives a checksum (fletcher4 by default; SHA-256 and other algorithms can be selected per dataset) that is stored in the metadata tree, separate from the actual data. When reading a block, ZFS compares the stored checksum against the calculated one. If they do not match, an error is detected.
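The verify-on-read idea can be sketched in a few lines of shell. This is a conceptual illustration only, not how ZFS is implemented: the POSIX `cksum` utility stands in for the real block checksum, and `block_checksum` and `verify_block` are hypothetical helper names.

```shell
# Conceptual sketch: cksum (a CRC) stands in for the ZFS block checksum.

# block_checksum FILE: print the checksum of FILE's contents.
block_checksum() {
    cksum < "$1" | cut -d' ' -f1
}

# verify_block FILE EXPECTED: succeed only if the contents still match
# the checksum that was recorded earlier and stored elsewhere.
verify_block() {
    [ "$(block_checksum "$1")" = "$2" ]
}

# Hypothetical usage:
#   stored=$(block_checksum block.bin)
#   verify_block block.bin "$stored" || echo "checksum mismatch"
```

The important design point is that the expected checksum lives apart from the data it protects, so damage to the block cannot silently damage its own checksum as well.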

In a redundant pool (mirror or RAIDZ), ZFS can automatically repair the corrupted block from an intact copy — completely transparent to the user. This is self-healing built into the filesystem layer.

What ZFS Scrub Does

A scrub is the systematic verification of every data block in the pool. ZFS reads each block, compares the checksum, and automatically repairs errors from redundancy copies.

The critical difference from normal operation: during daily use, ZFS only verifies blocks that are actually read. Blocks that remain untouched for months stay unchecked. A scrub ensures that even those blocks are intact.

Setting Up Scrub in TrueNAS

TrueNAS creates a monthly scrub task by default. For production environments, we recommend a shorter interval:

Data Protection > Scrub Tasks > Add
  Pool:        tank
  Threshold:   14 (days)
  Schedule:    Every Sunday, 02:00 AM

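Alternatively via cron on the TrueNAS shell. Cron has no native two-week interval, so this entry runs every Sunday. Manual changes under /etc may also not survive a TrueNAS update, so the GUI task is the more durable option:

# Scrub every Sunday at 02:00 AM
echo "0 2 * * 0 root zpool scrub tank" >> /etc/cron.d/zfs-scrub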
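A cron schedule of `0 2 * * 0` fires every Sunday. One way to get a two-week rhythm is to let the weekly job call a small guard that only succeeds in every second week. The following is a sketch under that assumption; `is_even_week` is a hypothetical helper name, not part of TrueNAS or ZFS.

```shell
# is_even_week EPOCH_SECONDS
# Succeeds when the number of whole weeks since the Unix epoch is even.
# Consecutive Sundays fall into consecutive epoch weeks, so the result
# alternates from one Sunday to the next.
is_even_week() {
    [ $(( $1 / 604800 % 2 )) -eq 0 ]
}

# Hypothetical body of a wrapper script called by the weekly cron entry:
#   is_even_week "$(date +%s)" && zpool scrub tank
```

Keeping the guard in a wrapper script instead of the crontab line avoids having to escape `%`, which cron treats as a newline.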

Interpreting Scrub Results

After a scrub completes, check the status with zpool status:

zpool status tank

A healthy pool shows:

  scan: scrub repaired 0B in 04:32:15 with 0 errors on Sun Mar 22 06:32:15 2026
config:

        NAME                                  STATE     READ WRITE CKSUM
        tank                                  ONLINE       0     0     0
          mirror-0                            ONLINE       0     0     0
            da0                               ONLINE       0     0     0
            da1                               ONLINE       0     0     0

The critical columns are READ, WRITE, and CKSUM. Any value greater than 0 requires attention:

| Column | Meaning | Action |
| --- | --- | --- |
| READ | Read errors on the device | Check disk SMART, replace if recurring |
| WRITE | Write errors on the device | Investigate immediately; possible controller or cable defect |
| CKSUM | Checksum errors (bit rot, but also cabling, controller, or memory faults) | ZFS repairs the data where redundancy exists, but the root cause must be found |
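Checking the three counters by eye does not scale beyond a few disks. As a minimal sketch, assuming the column layout shown in the `zpool status` output above, the following filter prints every pool, vdev, or device whose READ, WRITE, or CKSUM counter is not 0. `pool_error_devices` is a hypothetical helper name.

```shell
# pool_error_devices: read `zpool status` output on stdin and print each
# config row with a non-zero READ, WRITE, or CKSUM counter.
pool_error_devices() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the device table.
        NF == 0 { in_config = 0 }
        # Compare as strings so abbreviated counts such as "1.2K" also count.
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "READ=" $3, "WRITE=" $4, "CKSUM=" $5
        }
    '
}

# Hypothetical usage:
#   zpool status tank | pool_error_devices
```

An empty result means all counters are 0; any printed line is a candidate for the actions in the table.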

SMART Monitoring: Watching the Hardware

While ZFS secures data integrity at the logical level, SMART (Self-Monitoring, Analysis and Reporting Technology) monitors the physical condition of hard drives. SMART values reveal mechanical wear, defective sectors, and temperature issues — often weeks before a drive fails completely.

Critical SMART Attributes

| Attribute | ID | Meaning | Threshold |
| --- | --- | --- | --- |
| Reallocated_Sector_Ct | 5 | Replaced defective sectors | > 0 monitor, > 10 critical |
| Current_Pending_Sector | 197 | Unstable sectors awaiting reallocation | > 0 investigate immediately |
| Offline_Uncorrectable | 198 | Uncorrectable sectors | > 0 plan disk replacement |
| UDMA_CRC_Error_Count | 199 | Transfer errors (cable/controller) | > 0 check cables |
| Temperature_Celsius | 194 | Operating temperature | > 45 °C improve cooling, > 55 °C critical |
| Power_On_Hours | 9 | Total operating hours | Context for wear assessment |
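These thresholds can be applied mechanically to `smartctl -A` output. The following is a minimal sketch, assuming the ATA attribute table layout in which the raw value is the tenth column. `smart_check` is a hypothetical helper name, the WARN and CRITICAL labels are an illustrative mapping of the thresholds above, and NVMe or SAS drives report their health in a different format.

```shell
# smart_check: read `smartctl -A` output on stdin and print one line per
# attribute that exceeds the thresholds from the table above.
smart_check() {
    awk '
        # Adding 0 coerces raw values such as "47 (Min/Max 20/52)" to 47.
        { raw = $10 + 0 }
        $1 == "5"   && raw > 10 { print "CRITICAL", $2, raw; next }
        $1 == "5"   && raw > 0  { print "WARN", $2, raw; next }
        $1 == "197" && raw > 0  { print "WARN", $2, raw; next }
        $1 == "198" && raw > 0  { print "CRITICAL", $2, raw; next }
        $1 == "199" && raw > 0  { print "WARN", $2, raw; next }
        $1 == "194" && raw > 55 { print "CRITICAL", $2, raw; next }
        $1 == "194" && raw > 45 { print "WARN", $2, raw; next }
    '
}

# Hypothetical usage:
#   smartctl -A /dev/da0 | smart_check
```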

Setting Up SMART Tests in TrueNAS

TrueNAS offers several test types; the two used for routine monitoring are SHORT and LONG:

Data Protection > S.M.A.R.T. Tests > Add
  Type:     SHORT (usually a few minutes, weekly)
  Disks:    All Disks
  Schedule: Every Monday, 03:00 AM

Data Protection > S.M.A.R.T. Tests > Add
  Type:     LONG (several hours or more depending on capacity, monthly)
  Disks:    All Disks
  Schedule: First Saturday of the month, 01:00 AM

Short tests verify basic functionality and read the error log. Long tests scan the entire disk surface and find errors that short tests miss.

smartctl on the Command Line

Get detailed SMART information directly via CLI:

# Retrieve full SMART status
smartctl -a /dev/da0

# Show only critical attributes
smartctl -A /dev/da0 | grep -E "Reallocated|Pending|Uncorrectable|CRC|Temperature"

# Start a long test manually
smartctl -t long /dev/da0

# Retrieve test results
smartctl -l selftest /dev/da0
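The self-test log is easy to overlook because a failed test does not stop the disk from working. As a sketch, assuming the ATA log format in which entries start with `#` and failed tests carry a status such as "Completed: read failure", the following filter prints only the failed entries. `selftest_failures` is a hypothetical helper name.

```shell
# selftest_failures: read `smartctl -l selftest` output on stdin and print
# the log entries whose status reports a failure.
# The exit status is 0 if at least one failed entry was found, 1 otherwise.
selftest_failures() {
    grep -E '^# *[0-9]+ .*failure'
}

# Hypothetical usage:
#   smartctl -l selftest /dev/da0 | selftest_failures && echo "replace disk"
```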

Alerting on SMART Failures

TrueNAS raises alerts on SMART warnings by default. To receive them by email, make sure outgoing mail and the alert configuration are set up:

System > Alert Settings > Email
  Recipient:  admin@example.com
  SMART:      Warning + Critical

Combining Scrub and SMART: Proactive Disk Replacement

The real strength lies in combining both mechanisms. ZFS scrub detects logical errors (bit rot, checksum failures), SMART detects physical degradation (defective sectors, mechanical wear). Together, they form an early warning system.

When to Replace a Disk

| Situation | Urgency | Action |
| --- | --- | --- |
| CKSUM errors in scrub, SMART OK | Medium | Wait for next scrub, replace if recurring |
| Reallocated_Sector_Ct rising | High | Order replacement disk, swap within 1-2 weeks |
| Current_Pending_Sector > 0 | High | Monitor disk closely, ensure resilver capacity |
| CKSUM errors + rising SMART values | Critical | Replace immediately; drive will fail soon |
| Offline_Uncorrectable > 0 | Critical | Replace immediately; data loss risk with further degradation |
| SMART self-test failed | Critical | Replace immediately |

Rule of thumb: A single checksum error in a scrub is worth monitoring. Rising SMART values combined with scrub errors are a clear signal for timely disk replacement.
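The decision table can be written down as a small function so that a monitoring script reaches the same verdict every time. This is a sketch under stated assumptions: `replacement_urgency` and its five numeric arguments are hypothetical, and "rising SMART values" is interpreted here as a growing reallocated-sector count or any pending sectors.

```shell
# replacement_urgency CKSUM REALLOC_DELTA PENDING UNCORRECTABLE SELFTEST_FAILED
#   CKSUM            checksum errors reported by the last scrub
#   REALLOC_DELTA    increase in Reallocated_Sector_Ct since the previous check
#   PENDING          Current_Pending_Sector raw value
#   UNCORRECTABLE    Offline_Uncorrectable raw value
#   SELFTEST_FAILED  1 if the latest SMART self-test failed, else 0
# Prints OK, Medium, High, or Critical.
replacement_urgency() {
    cksum_errors=$1 realloc_delta=$2 pending=$3 uncorrectable=$4 selftest_failed=$5
    smart_rising=0
    if [ "$realloc_delta" -gt 0 ] || [ "$pending" -gt 0 ]; then
        smart_rising=1
    fi

    if [ "$selftest_failed" -eq 1 ] || [ "$uncorrectable" -gt 0 ] ||
       { [ "$cksum_errors" -gt 0 ] && [ "$smart_rising" -eq 1 ]; }; then
        echo "Critical"
    elif [ "$smart_rising" -eq 1 ]; then
        echo "High"
    elif [ "$cksum_errors" -gt 0 ]; then
        echo "Medium"
    else
        echo "OK"
    fi
}

# Hypothetical usage: 2 scrub checksum errors, 3 newly reallocated sectors
#   replacement_urgency 2 3 0 0 0
```

The critical conditions are tested first so that a combination of symptoms is never downgraded to the urgency of a single one.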

Monitoring with DATAZONE Control

In a production TrueNAS environment, manual checks are not sufficient. With DATAZONE Control, we monitor scrub and SMART status automatically around the clock:

  • Scrub monitoring: Last scrub time, duration, error count, overdue scrubs
  • SMART trends: Historical development of critical attributes over weeks and months
  • Threshold alerts: Automatic notification when reallocated sectors or checksum errors increase
  • Disk lifecycle tracking: Operating hours and wear trends for predictive replacement planning
  • Pool health: Overall status of all ZFS pools at a glance

Through trend analysis, we detect degradation not when the drive fails, but weeks in advance — and replace disks proactively before a rebuild under load becomes necessary.
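The basic idea of trend analysis can be illustrated without any monitoring product. The sketch below is not how DATAZONE Control works internally; it assumes a hypothetical log file with one "DATE VALUE" line per check (oldest first) for a single attribute, and `attr_increase` is a hypothetical helper name.

```shell
# attr_increase: read "DATE VALUE" lines (oldest first) on stdin and print
# how much VALUE grew between the first and the last sample.
attr_increase() {
    awk 'NR == 1 { first = $2 } { last = $2 } END { if (NR > 0) print last - first }'
}

# Hypothetical usage, with a log of weekly Reallocated_Sector_Ct readings:
#   attr_increase < /var/log/smart/da0-reallocated.log
```

A result above 0 means the attribute has grown over the logged period, which is the signal the replacement table treats as "rising".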

Conclusion

Data integrity does not happen by itself. ZFS scrubs find silent data errors that filesystems without data checksums would never detect. SMART monitoring reveals physical wear before it leads to failure. Both mechanisms together form the foundation for a proactive storage strategy that prevents data loss rather than reacting to it.

The effort to set this up is minimal — the protection you gain is substantial.


Want to secure your TrueNAS environment with professional scrub and SMART monitoring? Contact us — we set up proactive disk health monitoring and ensure that disk failures never catch you off guard again.
