ZFS replication is one of the most powerful features of TrueNAS: at the block level, only changed data is transferred between two systems. Combined with a virtual air-gap — a dedicated, isolated backup network — this creates a backup solution that is resilient even against targeted ransomware attacks. This article covers the complete setup, inspired by the discussion in the T3 Podcast (Episode 058).
Fundamentals: ZFS Send/Receive
ZFS replication is based on the ZFS send/receive mechanism. A snapshot is serialized and sent to a target system, where it is restored as an identical copy.
Initial Replication (Full Send)
# Create the first snapshot
zfs snapshot tank/data@initial
# Send the full snapshot to the remote system
zfs send -Rv tank/data@initial | ssh backup-nas zfs recv -Fv backup-pool/data
The -R option (replication stream) includes all child datasets, their snapshots, and dataset properties in the stream. -v prints verbose progress information.
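For very large initial transfers, an interruption would otherwise mean starting from scratch. OpenZFS supports resumable receives; a sketch using the article's dataset names:

```shell
# Start the receiver with -s so it keeps partial state on abort:
zfs send -Rv tank/data@initial | ssh backup-nas zfs recv -s -Fv backup-pool/data

# After an interruption, read the resume token from the target dataset...
TOKEN=$(ssh backup-nas zfs get -H -o value receive_resume_token backup-pool/data)

# ...and continue exactly where the transfer stopped:
zfs send -t "$TOKEN" | ssh backup-nas zfs recv -s -Fv backup-pool/data
```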
Incremental Replication
After the initial replication, only the differences between two snapshots are transferred:
# Create a new snapshot
zfs snapshot tank/data@2026-04-13
# Send only the delta
zfs send -Rvi tank/data@initial tank/data@2026-04-13 | \
ssh backup-nas zfs recv -Fv backup-pool/data
Incremental transfers are dramatically faster and more bandwidth-efficient than a full send. For a 10 TB dataset with 50 GB of daily changes, only 50 GB is transferred instead of 10 TB.
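The size of an incremental stream can be checked before sending it: `zfs send -n` performs a dry run and `-P` prints machine-readable sizes. The `bytes_to_gib` helper below only converts the reported byte count; it is not a zfs feature.

```shell
# Small helper: convert a byte count to GiB with one decimal place.
bytes_to_gib() { awk -v b="$1" 'BEGIN { printf "%.1f", b / 1073741824 }'; }

# Dry run against the article's snapshots (run on the source NAS):
# zfs send -nvPRi tank/data@initial tank/data@2026-04-13
# The dry run's last line reports e.g. "size 53687091200"; then:
bytes_to_gib 53687091200   # prints "50.0" (GiB) - vs. 10 TiB for a full send
```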
TrueNAS Replication Tasks
TrueNAS SCALE provides a convenient GUI for setting up replication tasks that automate ZFS send/receive.
Push vs Pull: Two Approaches
| Mode | Description | Initiated By | Data Flow |
|---|---|---|---|
| Push | Source NAS actively sends to the target NAS | Source | Source → Target |
| Pull | Target NAS actively fetches from the source NAS | Target | Source → Target |

In both modes the data flows from source to target; the difference is which side initiates the connection.
Push is the simpler approach: the production system sends its data to the backup system.
Pull offers a security advantage: the backup system controls the process. The production system needs no access to the backup system whatsoever. For air-gap scenarios, pull is the better choice.
SSH Key Configuration
Replication uses SSH as the transport channel. For automated tasks, SSH keys are used instead of passwords.
On the source system (for push):
# Generate SSH key (on TrueNAS via GUI or CLI)
ssh-keygen -t ed25519 -f /root/.ssh/replication_key -N ""
# Copy public key to the target system
ssh-copy-id -i /root/.ssh/replication_key.pub root@backup-nas
On the target system (for pull):
# Generate SSH key
ssh-keygen -t ed25519 -f /root/.ssh/replication_key -N ""
# Copy public key to the source system
ssh-copy-id -i /root/.ssh/replication_key.pub root@production-nas
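Either way, it pays to confirm the key works non-interactively before a scheduled task depends on it. BatchMode makes ssh fail instead of prompting for a password; hostname and key path match the article's example:

```shell
# Non-interactive connectivity check; should list the remote pools
# without any password prompt.
ssh -i /root/.ssh/replication_key -o BatchMode=yes root@backup-nas zfs list
```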
In the TrueNAS GUI under Credentials > Backup Credentials > SSH Keypairs:
Create SSH Keypair:
├── Name: replication-to-backup
├── Private Key: (auto-generated)
└── Public Key: (deploy on target host)
Create SSH Connection:
├── Name: backup-nas
├── Method: Semi-automatic or Manual
├── Host: 10.0.99.10 (backup network)
├── Port: 22
├── Username: root
├── Private Key: replication-to-backup
└── Cipher: AES-256-GCM (faster than default)
Setting Up a Replication Task
In the TrueNAS GUI under Data Protection > Replication Tasks:
Replication Task:
├── Direction: Push (or Pull)
├── Transport: SSH
├── SSH Connection: backup-nas
├── Source:
│ ├── Dataset: tank/data
│ └── Recursive: ✓ (include child datasets)
├── Destination:
│ └── Dataset: backup-pool/data
├── Scheduling:
│ ├── Frequency: Daily
│ ├── Time: 02:00
│ └── Begin/End: 02:00 - 06:00
├── Snapshot:
│ ├── Naming Schema: auto-%Y-%m-%d_%H-%M
│ ├── Lifetime: 30 days (on source)
│ └── Lifetime (Remote): 90 days (on target)
├── Replication:
│ ├── Incremental: ✓
│ ├── Compressed: ✓ (LZ4 for transport)
│ └── Speed Limit: 100 MB/s (optional)
└── Retention Policy:
└── Same as source / Custom
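The same task can also be created from the CLI via the middleware client. The field names below follow the TrueNAS SCALE middleware API (`replication.create`) but may differ slightly between releases; treat this as a sketch, not a reference payload:

```shell
# Create a push replication task via the TrueNAS middleware (sketch).
# "ssh_credentials": 1 refers to the SSH connection created earlier.
midclt call replication.create '{
  "name": "tank-data-to-backup",
  "direction": "PUSH",
  "transport": "SSH",
  "ssh_credentials": 1,
  "source_datasets": ["tank/data"],
  "target_dataset": "backup-pool/data",
  "recursive": true,
  "auto": true,
  "retention_policy": "CUSTOM",
  "lifetime_value": 90,
  "lifetime_unit": "DAY"
}'
```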
Scheduling Strategies
| Strategy | Frequency | Suited For |
|---|---|---|
| Hourly | Every 1–4 hours | Business-critical data with low RPO |
| Daily | Once nightly | Standard for most environments |
| Weekly | Once per week (weekend) | Large datasets with little change |
| Tiered | Hourly (weekdays) + daily (weekends) | Flexible requirements |
# Example: Tiered scheduling
Replication Task 1 (business data):
├── Mon-Fri: Every 4 hours (08:00, 12:00, 16:00, 20:00)
└── Sat-Sun: Once at 02:00
Replication Task 2 (media data):
└── Sunday 03:00 (weekly)
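TrueNAS schedules map to cron expressions under the hood; the tiered plan above translates roughly to:

```shell
# Task 1 (business data), weekdays every 4 business hours:
#   0 8,12,16,20 * * 1-5
# Task 1, weekend nightly run:
#   0 2 * * 0,6
# Task 2 (media data), weekly on Sunday:
#   0 3 * * 0
```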
Virtual Air-Gap Setup
Network Architecture
┌──────────────┐
│ OPNsense │
│ Firewall │
└──┬───────┬───┘
│ │
VLAN 10 │ │ VLAN 99
┌───────────┘ └───────────┐
│ │
┌──────┴──────┐ ┌──────┴──────┐
│ Production │ │ Backup NAS │
│ TrueNAS │ │ TrueNAS │
│ 10.0.10.20 │ Replication │ 10.0.99.20 │
│ │ ──────────────► │ │
└─────────────┘ └─────────────┘
Firewall Rules (OPNsense)
# VLAN 10 → VLAN 99 (Production → Backup):
ALLOW TCP 10.0.10.20 → 10.0.99.20 Port 22 (SSH/Replication)
DENY ANY 10.0.10.0/24 → 10.0.99.0/24 (block everything else)
# VLAN 99 → VLAN 10 (Backup → Production):
DENY ANY 10.0.99.0/24 → 10.0.10.0/24 (block ALL)
# VLAN 99 → Internet:
DENY ANY 10.0.99.0/24 → 0.0.0.0/0 (block ALL)
# VLAN 99 → VLAN 99 (Management):
ALLOW TCP 10.0.99.1 → 10.0.99.20 Port 443 (Web GUI from firewall only)
With these rules in place, the backup NAS:
- accepts incoming SSH connections from the production NAS (replication)
- cannot communicate with the production network
- cannot reach the internet
- can only be managed from the firewall IP
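The rules are easy to spot-check from the backup NAS itself; every probe here should fail (time out or be refused) if the isolation is active:

```shell
# Run on the backup NAS. Successful connections mean the air-gap leaks.
nc -vz -w 3 10.0.10.20 445          # no SMB into the production network
nc -vz -w 3 10.0.10.20 22           # no SSH into production (push mode)
curl -m 5 -s https://truenas.com >/dev/null || echo "internet blocked as intended"
```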
Pull Mode for Maximum Security
In pull mode, the backup NAS initiates the replication. The firewall rules change accordingly:
# VLAN 99 → VLAN 10 (Backup → Production, pull only):
ALLOW TCP 10.0.99.20 → 10.0.10.20 Port 22 (SSH/Replication)
DENY ANY 10.0.99.0/24 → 10.0.10.0/24 (block everything else)
# VLAN 10 → VLAN 99:
DENY ANY 10.0.10.0/24 → 10.0.99.0/24 (block ALL)
Advantage: the production system has zero network access to the backup system. Even with complete compromise of the production NAS, the attacker cannot reach the backup.
Dedicated Replication User with Reduced Privileges
Many tutorials use root for the SSH connection between replication partners. That is convenient but dangerous: if one system is compromised, the attacker can use the stored key to run arbitrary commands on the counterpart as root — including zfs destroy against the backup snapshots.
The correct approach is a dedicated user on the target system that receives only the ZFS privileges strictly required for replication — and nothing else.
# 1. Create the user on the backup NAS (no shell login).
#    TrueNAS SCALE is Linux-based; on FreeBSD-based TrueNAS CORE use `pw useradd` instead.
useradd -m -s /usr/sbin/nologin -c "Replication Receiver" replrecv
# 2. Place the SSH public key in ~replrecv/.ssh/authorized_keys
mkdir -p /home/replrecv/.ssh
echo "ssh-ed25519 AAAA... replication-source" >> /home/replrecv/.ssh/authorized_keys
chown -R replrecv:replrecv /home/replrecv/.ssh
chmod 700 /home/replrecv/.ssh
chmod 600 /home/replrecv/.ssh/authorized_keys
# 3. Delegate the minimum ZFS privileges on the target dataset
zfs allow replrecv create,mount,receive,hold backup-pool/data
# 4. Verify the delegated permissions
zfs allow backup-pool/data
What this user cannot do:
- zfs destroy — delete snapshots or datasets
- zfs rollback — roll back snapshots
- zfs send — send data back to the production system
- Log in via shell — no interactive access
- Modify files outside the delegated dataset
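A quick way to confirm the lockdown is to attempt a destructive command as the delegated user (the snapshot name is illustrative); it should fail with a permission error:

```shell
# Run on the backup NAS; expected to be refused, because the delegation
# grants create/mount/receive/hold but not destroy.
sudo -u replrecv zfs destroy backup-pool/data@auto-2026-04-12_02-00
```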
Even if the production system is compromised and the attacker gains the private SSH key, the backup NAS will only accept new snapshots — existing ones cannot be deleted. Retention on the target is controlled exclusively by a local snapshot task on the backup NAS, not by the replication user.
# On the backup NAS: independent retention policy (separate from source).
# Note: the com.sun:auto-snapshot properties are honored by zfs-auto-snapshot-style
# tooling, not by TrueNAS itself; on TrueNAS, use the Periodic Snapshot Task below.
zfs set com.sun:auto-snapshot:daily=true backup-pool/data
zfs set com.sun:auto-snapshot:weekly=true backup-pool/data
# Periodic Snapshot Task in the GUI:
# Data Protection > Periodic Snapshot Tasks > Add
# Dataset: backup-pool/data
# Lifetime: 90 days (independent of source lifetime)
# Naming: local-%Y-%m-%d_%H-%M
For an additional layer, restrict the user’s SSH access inside ~/.ssh/authorized_keys:
command="/sbin/zfs receive backup-pool/data",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... replication-source
With this line, the key can only execute zfs receive on this specific dataset — any other command is rejected by sshd. This is the strongest form of lockdown: even if an attacker holds the key, they can only trigger a data receive and nothing else.
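If a bare forced command is too rigid (a replication task may pass flags such as -F to zfs recv), a small wrapper can validate SSH_ORIGINAL_COMMAND instead. This is a sketch, not a hardened implementation; the dataset name is the article's and the script path is hypothetical:

```shell
# Gate logic for a forced-command wrapper: only "zfs receive" (or "recv")
# targeting the delegated dataset passes; everything else is rejected.
# A hardened wrapper would additionally whitelist individual flags.
zfs_recv_gate() {
  case "$1" in
    "zfs receive "*"backup-pool/data"*|"zfs recv "*"backup-pool/data"*)
      return 0 ;;                 # the real wrapper would exec the command here
    *)
      echo "rejected: $1" >&2
      return 1 ;;
  esac
}

# In authorized_keys (script path is hypothetical):
# command="/usr/local/bin/zfs-recv-gate.sh",no-port-forwarding,no-pty ssh-ed25519 AAAA...
# where the script runs: zfs_recv_gate "$SSH_ORIGINAL_COMMAND" && exec $SSH_ORIGINAL_COMMAND
```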
Snapshot Retention
On the Source System (Production NAS)
Snapshots on the production system serve primarily for fast recovery (e.g., accidentally deleted files):
Snapshot Retention (Source):
├── Hourly: 24 (last 24 hours)
├── Daily: 7 (last week)
├── Weekly: 4 (last month)
└── Monthly: 0 (handled by backup NAS)
On the Target System (Backup NAS)
The backup NAS retains snapshots for long-term archival:
Snapshot Retention (Target):
├── Daily: 30 (last month)
├── Weekly: 12 (last quarter)
├── Monthly: 12 (last year)
└── Yearly: 3 (3-year archive)
Configuring Retention in TrueNAS
Data Protection > Periodic Snapshot Tasks:
├── Dataset: tank/data
├── Recursive: ✓
├── Schedule: Hourly
├── Lifetime: 24 Hours
├── Naming Schema: auto-%Y-%m-%d_%H-%M
└── Enabled: ✓
Data Protection > Periodic Snapshot Tasks (2):
├── Dataset: tank/data
├── Schedule: Daily (00:00)
├── Lifetime: 7 Days
└── Naming Schema: daily-%Y-%m-%d
Disaster Recovery Workflow
Scenario 1: Recover Individual Files
# On the production NAS: list available snapshots
zfs list -t snapshot tank/data | tail -10
# Mount snapshot and copy files
mkdir /mnt/restore
mount -t zfs tank/data@daily-2026-04-12 /mnt/restore
# Copy the file back
cp /mnt/restore/documents/important.docx /mnt/tank/data/documents/
# Unmount snapshot
umount /mnt/restore
Easier in the TrueNAS GUI: Datasets > data > Snapshots > Browse
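There is also a mount-free alternative on the CLI: ZFS exposes every snapshot read-only under the hidden .zfs directory of the dataset:

```shell
# Browse and copy snapshot contents directly - no mount/umount needed.
ls /mnt/tank/data/.zfs/snapshot/daily-2026-04-12/documents/
cp /mnt/tank/data/.zfs/snapshot/daily-2026-04-12/documents/important.docx \
   /mnt/tank/data/documents/
```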
Scenario 2: Restore a Complete Dataset
# On the backup NAS: send dataset to the production NAS
zfs send -Rv backup-pool/data@daily-2026-04-12 | \
ssh production-nas zfs recv -Fv tank/data-restored
# On the production NAS: rename old dataset and activate new one
zfs rename tank/data tank/data-corrupted
zfs rename tank/data-restored tank/data
# Check and restart SMB/NFS shares
systemctl restart smbd
Scenario 3: Complete NAS Failure (Bare-Metal Recovery)
- Install fresh TrueNAS on replacement hardware
- Import the existing ZFS pool (if the disks survived), or
- Replicate back from the backup NAS to the new system
# On the backup NAS: full replication to the new system
zfs send -Rv backup-pool/data@daily-2026-04-12 | \
ssh new-production-nas zfs recv -Fv tank/data
- Restore SMB/NFS shares, users, and configuration
- Reconfigure replication tasks to point to the new system
Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
| Scenario | RPO | RTO |
|---|---|---|
| File recovery (local) | Last snapshot (1–4 hours) | Minutes |
| Dataset restore from backup | Last replicated snapshot (4–24 hours) | 1–4 hours |
| Bare-metal recovery | Last replicated snapshot | 4–12 hours |
Monitoring and Alerting
Monitor Replication Status
# Check last replication job
midclt call replication.query | python3 -m json.tool | grep -A5 state
# Latest snapshots on the backup NAS
zfs list -t snapshot -o name,creation backup-pool/data | tail -5
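Beyond ad-hoc checks, a cron-able staleness test can flag a silently stalled replication. The comparison logic below is factored out of the zfs call so it runs anywhere; the snapshot name is illustrative:

```shell
# Returns success (0) when the snapshot is older than the allowed age.
snapshot_is_stale() {  # usage: snapshot_is_stale CREATED_EPOCH NOW_EPOCH MAX_AGE_SECONDS
  [ $(( $2 - $1 )) -gt "$3" ]
}

# On the backup NAS: -p makes `zfs get` print the creation time as epoch seconds.
# created=$(zfs get -Hp -o value creation backup-pool/data@auto-2026-04-13_02-00)
# snapshot_is_stale "$created" "$(date +%s)" $((26 * 3600)) && echo "ALERT: replication is stale"
```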
Configure Alerts
In TrueNAS under System > Alert Settings:
Alert Rules:
├── Replication Failed: ✓ Critical (email + notification)
├── Replication Delayed: ✓ Warning (> 2x normal interval)
├── Snapshot Space: ✓ Warning (> 80% of pool)
└── Pool Health: ✓ Critical (degraded/faulted)
Regular Restore Tests
A backup without a restore test is not a backup. Recommendation:
Restore Test Plan:
├── Weekly: Randomly restore individual files
├── Monthly: Restore complete dataset in test environment
├── Quarterly: Simulate bare-metal recovery
└── Documentation: Every test is documented with results
Performance Optimization
Compression During Transfer
# ZFS send with LZ4 compression over SSH
zfs send -Rv tank/data@snap | lz4 | ssh -c aes256-gcm@openssh.com backup-nas "lz4 -d | zfs recv -Fv backup-pool/data"
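If tank/data already uses on-disk compression (LZ4 is the TrueNAS default), the external lz4 round-trip can be skipped entirely: `zfs send -c` transmits blocks in their compressed on-disk form.

```shell
# Compressed send: blocks leave the source exactly as stored on disk,
# so no decompress/recompress happens on either end.
zfs send -Rvc tank/data@snap | ssh -c aes256-gcm@openssh.com backup-nas \
  "zfs recv -Fv backup-pool/data"
```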
SSH Cipher Optimization
The SSH cipher has a significant impact on transfer speed:
| Cipher | Throughput (approx.) | Security |
|---|---|---|
| `aes256-gcm@openssh.com` | 800–1200 MB/s | Very high |
| `chacha20-poly1305@openssh.com` | 400–600 MB/s | Very high |
| `aes128-ctr` | 600–900 MB/s | High |
The cipher can be explicitly configured in the TrueNAS SSH connection settings.
Bandwidth Limiting
For environments where replication must not disrupt production operations:
Replication Task > Speed Limit:
├── Weekdays 08:00-18:00: 50 MB/s (business hours)
└── Nights/weekends: Unlimited
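Outside the GUI, the same throttling can be approximated on the CLI with pv's rate limiter (assuming pv is installed on the sending side; snapshot names are illustrative):

```shell
# Cap the replication stream at 50 MB/s with pv -L.
zfs send -Rvi tank/data@prev tank/data@curr | pv -L 50m | \
  ssh backup-nas zfs recv -Fv backup-pool/data
```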
Conclusion
TrueNAS ZFS replication combined with a virtual air-gap is one of the most effective backup strategies for SMBs and home labs. Block-level, incremental transfer minimizes bandwidth and time, while network isolation protects the backup from ransomware. Pull mode offers the highest security since the production system has no access to the backup. With clearly defined retention policies and regular restore tests, you build a resilient backup system that can save business operations when disaster strikes.