
TrueNAS Replication: Air-Gap Backup with ZFS Send/Receive


ZFS replication is one of the most powerful features of TrueNAS: at the block level, only changed data is transferred between two systems. Combined with a virtual air-gap — a dedicated, isolated backup network — this creates a backup solution that is resilient even against targeted ransomware attacks. This article covers the complete setup, inspired by the discussion in the T3 Podcast (Episode 058).

Fundamentals: ZFS Send/Receive

ZFS replication is based on the ZFS send/receive mechanism. A snapshot is serialized and sent to a target system, where it is restored as an identical copy.

Initial Replication (Full Send)

# Create the first snapshot
zfs snapshot tank/data@initial

# Send the full snapshot to the remote system
zfs send -Rv tank/data@initial | ssh backup-nas zfs recv -Fv backup-pool/data

The -R option (replication stream) includes all child datasets and their snapshots in the stream; -v displays progress.

Incremental Replication

After the initial replication, only the differences between two snapshots are transferred:

# Create a new snapshot
zfs snapshot tank/data@2026-04-13

# Send only the delta
zfs send -Rvi tank/data@initial tank/data@2026-04-13 | \
  ssh backup-nas zfs recv -Fv backup-pool/data

Incremental transfers are dramatically faster and more bandwidth-efficient than a full send. For a 10 TB dataset with 50 GB of daily changes, only 50 GB is transferred instead of 10 TB.
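Before a large incremental run, the stream size can be estimated with a dry run (a sketch using the snapshot names from the example above):

```shell
# -n: dry run, nothing is actually sent; -v prints an estimated stream size
zfs send -nv -Ri tank/data@initial tank/data@2026-04-13
```

The last line of the output reports the total estimated size, which makes it easy to sanity-check a replication window before committing to it.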

TrueNAS Replication Tasks

TrueNAS SCALE provides a convenient GUI for setting up replication tasks that automate ZFS send/receive.

Push vs Pull: Two Approaches

Mode   Description                                 Data Flow
Push   Source NAS actively sends to the target     Source → Target
Pull   Target NAS actively pulls from the source   Source ← Target

Push is the simpler approach: the production system sends its data to the backup system.

Pull offers a security advantage: the backup system controls the process. The production system needs no access to the backup system whatsoever. For air-gap scenarios, pull is the better choice.
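On the CLI, pull simply means the backup NAS runs the send remotely over SSH and receives locally (a sketch using the hostnames and snapshot names from elsewhere in this article):

```shell
# Run ON the backup NAS: it initiates the connection, so the production
# system never needs any credentials for the backup network
ssh production-nas zfs send -Ri tank/data@initial tank/data@2026-04-13 \
  | zfs recv -Fv backup-pool/data
```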

SSH Key Configuration

Replication uses SSH as the transport channel. For automated tasks, SSH keys are used instead of passwords.

On the source system (for push):

# Generate SSH key (on TrueNAS via GUI or CLI)
ssh-keygen -t ed25519 -f /root/.ssh/replication_key -N ""

# Copy public key to the target system
ssh-copy-id -i /root/.ssh/replication_key.pub root@backup-nas

On the target system (for pull):

# Generate SSH key
ssh-keygen -t ed25519 -f /root/.ssh/replication_key -N ""

# Copy public key to the source system
ssh-copy-id -i /root/.ssh/replication_key.pub root@production-nas

In the TrueNAS GUI under Credentials > Backup Credentials > SSH Keypairs:

Create SSH Keypair:
├── Name: replication-to-backup
├── Private Key: (auto-generated)
└── Public Key: (deploy on target host)

Create SSH Connection:
├── Name: backup-nas
├── Method: Semi-automatic or Manual
├── Host: 10.0.99.10 (backup network)
├── Port: 22
├── Username: root
├── Private Key: replication-to-backup
└── Cipher: AES-256-GCM (faster than default)

Setting Up a Replication Task

In the TrueNAS GUI under Data Protection > Replication Tasks:

Replication Task:
├── Direction: Push (or Pull)
├── Transport: SSH
├── SSH Connection: backup-nas
├── Source:
│   ├── Dataset: tank/data
│   └── Recursive: ✓ (include child datasets)
├── Destination:
│   └── Dataset: backup-pool/data
├── Scheduling:
│   ├── Frequency: Daily
│   ├── Time: 02:00
│   └── Begin/End: 02:00 - 06:00
├── Snapshot:
│   ├── Naming Schema: auto-%Y-%m-%d_%H-%M
│   ├── Lifetime: 30 days (on source)
│   └── Lifetime (Remote): 90 days (on target)
├── Replication:
│   ├── Incremental: ✓
│   ├── Compressed: ✓ (LZ4 for transport)
│   └── Speed Limit: 100 MB/s (optional)
└── Retention Policy:
    └── Same as source / Custom

Scheduling Strategies

Strategy   Frequency                              Suited For
Hourly     Every 1–4 hours                        Business-critical data with low RPO
Daily      Once nightly                           Standard for most environments
Weekly     Once per week (weekend)                Large datasets with little change
Tiered     Hourly (weekdays) + daily (weekends)   Flexible requirements

# Example: Tiered scheduling
Replication Task 1 (business data):
├── Mon-Fri: Every 4 hours (08:00, 12:00, 16:00, 20:00)
└── Sat-Sun: Once at 02:00

Replication Task 2 (media data):
└── Sunday 03:00 (weekly)

Virtual Air-Gap Setup

Network Architecture

                    ┌──────────────┐
                    │  OPNsense    │
                    │  Firewall    │
                    └──┬───────┬───┘
                       │       │
              VLAN 10  │       │  VLAN 99
           ┌───────────┘       └───────────┐
           │                               │
    ┌──────┴──────┐                 ┌──────┴──────┐
    │ Production  │                 │ Backup NAS  │
    │ TrueNAS     │                 │ TrueNAS     │
    │ 10.0.10.20  │   Replication   │ 10.0.99.20  │
    │             │ ──────────────► │             │
    └─────────────┘                 └─────────────┘

Firewall Rules (OPNsense)

# VLAN 10 → VLAN 99 (Production → Backup):
ALLOW  TCP  10.0.10.20  →  10.0.99.20  Port 22 (SSH/Replication)
DENY   ANY  10.0.10.0/24 →  10.0.99.0/24  (block everything else)

# VLAN 99 → VLAN 10 (Backup → Production):
DENY   ANY  10.0.99.0/24 →  10.0.10.0/24  (block ALL)

# VLAN 99 → Internet:
DENY   ANY  10.0.99.0/24 →  0.0.0.0/0     (block ALL)

# VLAN 99 → VLAN 99 (Management):
ALLOW  TCP  10.0.99.1    →  10.0.99.20  Port 443 (Web GUI from firewall only)

With these rules in place, the backup NAS:

  • accepts incoming SSH connections from the production NAS (replication)
  • cannot communicate with the production network
  • cannot reach the internet
  • can only be managed from the firewall IP
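The isolation can be spot-checked from a shell on the backup NAS (a sketch; addresses are from the diagram above, and nc/curl are assumed to be available):

```shell
# All three checks should FAIL if the air-gap rules are in place (push setup)
ping -c 2 -W 2 10.0.10.20          # production NAS: must be unreachable
nc -z -w 2 10.0.10.20 22           # SSH into production: must time out
curl -m 5 https://example.com      # internet: must fail
```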

Pull Mode for Maximum Security

In pull mode, the backup NAS initiates the replication. The firewall rules change accordingly:

# VLAN 99 → VLAN 10 (Backup → Production, pull only):
ALLOW  TCP  10.0.99.20  →  10.0.10.20  Port 22 (SSH/Replication)
DENY   ANY  10.0.99.0/24 →  10.0.10.0/24  (block everything else)

# VLAN 10 → VLAN 99:
DENY   ANY  10.0.10.0/24 →  10.0.99.0/24  (block ALL)

Advantage: the production system has zero network access to the backup system. Even with complete compromise of the production NAS, the attacker cannot reach the backup.

Dedicated Replication User with Reduced Privileges

Many tutorials use root for SSH connections between replication partners. It is convenient, but dangerous: if one system is compromised, the attacker can execute any command on the counterpart over the existing SSH session — including zfs destroy against the backup snapshots.

The correct approach is a dedicated user on the target system that receives only the ZFS privileges strictly required for replication — and nothing else.

# 1. Create the user on the backup NAS (no shell login; TrueNAS SCALE/Linux)
useradd -m -s /usr/sbin/nologin -c "Replication Receiver" replrecv

# 2. Place the SSH public key in ~replrecv/.ssh/authorized_keys
mkdir -p /home/replrecv/.ssh
echo "ssh-ed25519 AAAA... replication-source" >> /home/replrecv/.ssh/authorized_keys
chown -R replrecv:replrecv /home/replrecv/.ssh
chmod 700 /home/replrecv/.ssh
chmod 600 /home/replrecv/.ssh/authorized_keys

# 3. Delegate the minimum ZFS privileges on the target dataset
zfs allow replrecv create,mount,receive,hold backup-pool/data

# 4. Verify the delegated permissions
zfs allow backup-pool/data

What this user cannot do:

  • zfs destroy — delete snapshots or datasets
  • zfs rollback — roll back snapshots
  • zfs send — send data back to the production system
  • Log in via shell — no interactive access
  • Modify files outside the delegated dataset

Even if the production system is compromised and the attacker gains the private SSH key, the backup NAS will only accept new snapshots — existing ones cannot be deleted. Retention on the target is controlled exclusively by a local snapshot task on the backup NAS, not by the replication user.

# On the backup NAS: independent retention policy (separate from source)
zfs set com.sun:auto-snapshot:daily=true backup-pool/data
zfs set com.sun:auto-snapshot:weekly=true backup-pool/data

# Periodic Snapshot Task in the GUI:
# Data Protection > Periodic Snapshot Tasks > Add
#   Dataset: backup-pool/data
#   Lifetime: 90 days (independent of source lifetime)
#   Naming: local-%Y-%m-%d_%H-%M

For an additional layer, restrict the user’s SSH access inside ~/.ssh/authorized_keys:

command="/sbin/zfs receive backup-pool/data",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... replication-source

With this line, the key can only execute zfs receive on this specific dataset — any other command is rejected by sshd. This is the strongest form of lockdown: even if an attacker holds the key, they can only trigger a data receive and nothing else.
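The lockdown can be tested from the source side: whatever command is requested with the restricted key, sshd substitutes the forced zfs receive (a sketch):

```shell
# Attempt an arbitrary command with the restricted key
ssh -i /root/.ssh/replication_key replrecv@backup-nas "zfs destroy backup-pool/data"
# sshd ignores the requested command and runs the forced zfs receive instead,
# which exits with an error because no stream arrives on stdin, so the
# destroy never happens
```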

Snapshot Retention

On the Source System (Production NAS)

Snapshots on the production system serve primarily for fast recovery (e.g., accidentally deleted files):

Snapshot Retention (Source):
├── Hourly:   24 (last 24 hours)
├── Daily:    7  (last week)
├── Weekly:   4  (last month)
└── Monthly:  0  (handled by backup NAS)

On the Target System (Backup NAS)

The backup NAS retains snapshots for long-term archival:

Snapshot Retention (Target):
├── Daily:    30  (last month)
├── Weekly:   12  (last quarter)
├── Monthly:  12  (last year)
└── Yearly:   3   (3-year archive)
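Outside the GUI, the target-side retention above can be approximated with a small prune script (a sketch assuming GNU date and the local-%Y-%m-%d_%H-%M naming schema used earlier for the backup NAS's own snapshot task):

```shell
#!/bin/bash
# Prune local-* snapshots on backup-pool/data that are older than 90 days
cutoff=$(date -d "90 days ago" +%s)
zfs list -H -t snapshot -o name backup-pool/data | grep '@local-' | \
while read -r snap; do
    # Extract the date from a name like backup-pool/data@local-2026-01-01_02-00
    d=${snap##*@local-}; d=${d%_*}
    if [ "$(date -d "$d" +%s)" -lt "$cutoff" ]; then
        echo "destroying $snap"
        zfs destroy "$snap"
    fi
done
```

In practice a Periodic Snapshot Task with a lifetime handles this automatically; the script only illustrates what the retention mechanism does.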

Configuring Retention in TrueNAS

Data Protection > Periodic Snapshot Tasks:
├── Dataset: tank/data
├── Recursive: ✓
├── Schedule: Hourly
├── Lifetime: 24 Hours
├── Naming Schema: auto-%Y-%m-%d_%H-%M
└── Enabled: ✓

Data Protection > Periodic Snapshot Tasks (2):
├── Dataset: tank/data
├── Schedule: Daily (00:00)
├── Lifetime: 7 Days
└── Naming Schema: daily-%Y-%m-%d

Disaster Recovery Workflow

Scenario 1: Recover Individual Files

# On the production NAS: list available snapshots
zfs list -t snapshot tank/data | tail -10

# Mount snapshot and copy files
mkdir /mnt/restore
mount -t zfs tank/data@daily-2026-04-12 /mnt/restore

# Copy the file back
cp /mnt/restore/documents/important.docx /mnt/tank/data/documents/

# Unmount snapshot
umount /mnt/restore

Easier in the TrueNAS GUI: Datasets > data > Snapshots > Browse
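Mounting is not strictly necessary: every ZFS dataset also exposes its snapshots read-only under a hidden .zfs directory at the dataset root (with the default snapdir=hidden setting the directory does not appear in ls, but it can still be entered):

```shell
# Browse a snapshot directly, no mount required
ls /mnt/tank/data/.zfs/snapshot/daily-2026-04-12/documents/

# Copy the file straight out of the snapshot
cp /mnt/tank/data/.zfs/snapshot/daily-2026-04-12/documents/important.docx \
   /mnt/tank/data/documents/
```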

Scenario 2: Restore a Complete Dataset

# On the backup NAS: send dataset to the production NAS
zfs send -Rv backup-pool/data@daily-2026-04-12 | \
  ssh production-nas zfs recv -Fv tank/data-restored

# On the production NAS: rename old dataset and activate new one
zfs rename tank/data tank/data-corrupted
zfs rename tank/data-restored tank/data

# Check and restart SMB/NFS shares
systemctl restart smbd

Scenario 3: Complete NAS Failure (Bare-Metal Recovery)

  1. Install fresh TrueNAS on replacement hardware
  2. Import the ZFS pool (if the disks still work), or:
  3. Start replication from the backup NAS to the new system

# On the backup NAS: full replication to the new system
zfs send -Rv backup-pool/data@daily-2026-04-12 | \
  ssh new-production-nas zfs recv -Fv tank/data

  4. Restore SMB/NFS shares, users, and configuration
  5. Reconfigure replication tasks to point to the new system

Recovery Time Objective (RTO) and Recovery Point Objective (RPO)

Scenario                      RPO                                      RTO
File recovery (local)         Last snapshot (1–4 hours)                Minutes
Dataset restore from backup   Last replicated snapshot (4–24 hours)    1–4 hours
Bare-metal recovery           Last replicated snapshot                 4–12 hours

Monitoring and Alerting

Monitor Replication Status

# Check last replication job
midclt call replication.query | python3 -m json.tool | grep -A5 state

# Latest snapshots on the backup NAS
zfs list -t snapshot -o name,creation backup-pool/data | tail -5
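Beyond the built-in alerts, a small cron script on the backup NAS can catch silently stalled replication by checking the age of the newest received snapshot (a sketch; the 26-hour threshold suits the daily schedule above):

```shell
#!/bin/bash
# Warn when the newest snapshot on backup-pool/data is older than 26 hours
max_age=$((26 * 3600))
# -p prints the creation time as epoch seconds; -s creation sorts oldest first
newest=$(zfs list -Hp -t snapshot -o creation -s creation backup-pool/data | tail -n 1)
age=$(( $(date +%s) - newest ))
if [ "$age" -gt "$max_age" ]; then
    echo "WARNING: newest snapshot on backup-pool/data is $((age / 3600)) hours old" >&2
    exit 1
fi
```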

Configure Alerts

In TrueNAS under System > Alert Settings:

Alert Rules:
├── Replication Failed:    ✓ Critical (email + notification)
├── Replication Delayed:   ✓ Warning (> 2x normal interval)
├── Snapshot Space:        ✓ Warning (> 80% of pool)
└── Pool Health:           ✓ Critical (degraded/faulted)

Regular Restore Tests

A backup without a restore test is not a backup. Recommendation:

Restore Test Plan:
├── Weekly:     Randomly restore individual files
├── Monthly:    Restore complete dataset in test environment
├── Quarterly:  Simulate bare-metal recovery
└── Documentation: Every test is documented with results
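The monthly dataset test can be made objective by comparing checksums between the source and the restored copy (a sketch; the restore path in the example is a hypothetical test mount):

```shell
#!/bin/bash
# Compare file checksums of two directory trees (original vs restored copy)
verify_restore() {
    diff <(cd "$1" && find . -type f -exec md5sum {} + | sort) \
         <(cd "$2" && find . -type f -exec md5sum {} + | sort) \
      && echo "Restore test PASSED"
}

# Example: original dataset vs a restored test copy (paths are hypothetical)
# verify_restore /mnt/tank/data /mnt/tank/restore-test
```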

Performance Optimization

Compression During Transfer

# ZFS send with LZ4 compression over SSH
zfs send -Rv tank/data@snap | lz4 | ssh -c aes256-gcm@openssh.com backup-nas "lz4 -d | zfs recv -Fv backup-pool/data"
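On fast local links, inserting a buffer between send and receive also helps, since it decouples the bursty send stream from the network (a sketch, assuming mbuffer is installed on both systems):

```shell
# Buffer the stream on both ends to smooth out throughput spikes
zfs send -Rv tank/data@snap \
  | mbuffer -q -s 128k -m 1G \
  | ssh -c aes256-gcm@openssh.com backup-nas \
      "mbuffer -q -s 128k -m 1G | zfs recv -Fv backup-pool/data"
```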

SSH Cipher Optimization

The SSH cipher has a significant impact on transfer speed:

Cipher                          Throughput (approx.)   Security
aes256-gcm@openssh.com          800–1200 MB/s          Very high
chacha20-poly1305@openssh.com   400–600 MB/s           Very high
aes128-ctr                      600–900 MB/s           High

The cipher can be explicitly configured in the TrueNAS SSH connection settings.

Bandwidth Limiting

For environments where replication must not disrupt production operations:

Replication Task > Speed Limit:
├── Weekdays 08:00-18:00:  50 MB/s  (business hours)
└── Nights/weekends:       Unlimited
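In a manual pipeline, the same cap can be applied with pv (assuming pv is installed; -L limits the rate, here to roughly 50 MB/s):

```shell
# Throttle a manual replication so it does not saturate the link
zfs send -Rv tank/data@snap | pv -L 50m | ssh backup-nas zfs recv -Fv backup-pool/data
```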

Conclusion

TrueNAS ZFS replication combined with a virtual air-gap is one of the most effective backup strategies for SMBs and home labs. Block-level, incremental transfer minimizes bandwidth and time, while network isolation protects the backup from ransomware. Pull mode offers the highest security since the production system has no access to the backup. With clearly defined retention policies and regular restore tests, you build a resilient backup system that can save business operations when disaster strikes.
