A frequently cited industry estimate puts the average cost of an IT outage at around €5,600 per minute, a figure that varies widely with company size and industry. Server failure, ransomware, hardware defect, or natural disaster: the question isn’t if but when an incident occurs. Disaster recovery (DR) defines how quickly systems are restored and how much data loss is acceptable. A well-thought-out DR plan makes the difference between a manageable incident and an existential crisis.
Two Metrics That Determine Everything
RPO — Recovery Point Objective
RPO defines the maximum tolerable data loss. An RPO of 1 hour means: In the event of an outage, at most the last hour’s data may be lost.
- RPO = 0: No data loss tolerated → Synchronous replication, cluster
- RPO = 1 hour: Hourly backups or snapshots
- RPO = 24 hours: Daily backup sufficient
- RPO = 1 week: Weekly backup (for archive data)
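Whether an RPO is actually being met can be checked by comparing the age of the most recent backup against the target. The following is a minimal monitoring sketch; the function name `within_rpo` and the timestamps are illustrative assumptions, not part of any backup tool.

```python
from datetime import datetime, timedelta


def within_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if a failure at `now` would lose no more data than the RPO allows."""
    return now - last_backup <= rpo


# Hypothetical example: last successful backup at 12:00, RPO of 1 hour.
last = datetime(2024, 1, 1, 12, 0)
print(within_rpo(last, datetime(2024, 1, 1, 12, 45), timedelta(hours=1)))
print(within_rpo(last, datetime(2024, 1, 1, 13, 30), timedelta(hours=1)))
```

The first call is inside the one-hour window, the second is not. In practice the timestamp would come from the backup system's job log.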
RTO — Recovery Time Objective
RTO defines the maximum tolerable downtime. An RTO of 4 hours means: Systems must be operational again within 4 hours of the outage.
- RTO ≈ 0: Practically no downtime tolerated → High availability cluster with automatic failover
- RTO = 1 hour: Hot standby systems
- RTO = 4 hours: Restore from local backup
- RTO = 24 hours: Restore from offsite backup
- RTO = 48+ hours: Hardware procurement required
RPO and RTO together determine the cost of the DR strategy: the shorter both values are, the more expensive the infrastructure.
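To make the trade-off concrete, the sketch below picks the cheapest of the three tiers described later in this article that still satisfies a given RPO and RTO. The tier values are taken from the tier headings; treating "RPO ~0" as 0 and "RTO <1h" as 1 hour is a simplifying assumption, and the names `Tier` and `cheapest_tier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_hours: float  # worst-case data loss the tier is designed for
    rto_hours: float  # worst-case downtime the tier is designed for


# Ordered from cheapest to most expensive.
# Assumption: "RPO ~0, RTO <1h" is approximated as 0 and 1.
TIERS = [
    Tier("Tier 1: Basic Protection", 24, 24),
    Tier("Tier 2: Extended", 1, 4),
    Tier("Tier 3: High Availability", 0, 1),
]


def cheapest_tier(rpo_hours: float, rto_hours: float) -> Tier:
    """Return the first (cheapest) tier that meets both requirements."""
    for tier in TIERS:
        if tier.rpo_hours <= rpo_hours and tier.rto_hours <= rto_hours:
            return tier
    raise ValueError("No tier in this sketch meets the requirement")


print(cheapest_tier(rpo_hours=4, rto_hours=8).name)
```

A requirement of RPO 4 hours and RTO 8 hours rules out Tier 1 and lands on Tier 2. Tightening either value pushes the choice toward the more expensive tier.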
Risks and Their Impact
| Risk | Likelihood | Typical Downtime | Data Loss |
|---|---|---|---|
| Hardware failure (disk, PSU) | High | 2–8 hours | Low (with RAID) |
| Ransomware | Medium-High | 1–7 days | High (without offsite backup) |
| Software bug/update | Medium | 1–4 hours | Low |
| Power outage | Medium | 0–2 hours | Minimal (with UPS) |
| Human error | Medium | 1–24 hours | Variable |
| Fire/water damage | Low | 1–4 weeks | Total (without offsite) |
| Provider outage | Low | 2–24 hours | None |
Disaster Recovery Measures by Tier
Tier 1: Basic Protection (RPO 24h, RTO 24h)
The absolute baseline every business should implement:
Daily backup with offsite copy:
- Proxmox Backup Server runs daily incremental backups of all VMs and containers
- TrueNAS replication transfers ZFS snapshots to a second location
- Restore testing at least quarterly
UPS (Uninterruptible Power Supply):
- Servers and network equipment on UPS
- At least 15 minutes of battery runtime for a clean shutdown
- Automatic shutdown on extended outage
RAID redundancy:
- No single-disk systems in production
- RAID-Z2 or mirrored drives for all storage pools
- Spare drives (hot spare or in stock) available
Cost: Minimal — Proxmox Backup Server and TrueNAS are open source; hardware for a backup system starting at approximately €2,000.
Tier 2: Extended (RPO 1h, RTO 4h)
For businesses whose operations depend on IT:
Hourly snapshots:
- ZFS snapshots on the production system (instant, space-efficient)
- Snapshot retention: hourly for 24h, daily for 30 days, weekly for 12 months
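The retention scheme above can be sketched as a pruning function: keep one snapshot per hour for the last 24 hours, one per day for the last 30 days, and one per ISO week for the last 12 months. This is an illustration of the policy only, not how TrueNAS or any ZFS tool implements it; the function name is hypothetical and "12 months" is approximated as 365 days.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now):
    """Return the set of snapshot timestamps the retention policy keeps."""
    keep = set()
    seen = set()  # retention buckets that already have a snapshot
    for ts in sorted(snapshots, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            bucket = ("hour", ts.strftime("%Y-%m-%d %H"))
        elif age <= timedelta(days=30):
            bucket = ("day", ts.date())
        elif age <= timedelta(days=365):  # assumption: 12 months = 365 days
            iso = ts.isocalendar()
            bucket = ("week", iso[0], iso[1])
        else:
            continue  # older than the longest retention window
        if bucket not in seen:
            seen.add(bucket)
            keep.add(ts)
    return keep


# Hypothetical input: one snapshot per hour for 400 days.
now = datetime(2024, 6, 1, 12, 0)
snaps = [now - timedelta(hours=h) for h in range(400 * 24)]
kept = snapshots_to_keep(snaps, now)
print(len(snaps), "snapshots taken,", len(kept), "kept")
```

Because the newest snapshot in each bucket wins, the function can be run repeatedly: older snapshots thin out as they move from the hourly to the daily and weekly windows.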
Prepared replacement server:
- A second server with Proxmox installation on-site
- PBS backups can be restored directly on the replacement server
- Alternatively: Proxmox cluster with 2 nodes plus an external quorum device (QDevice), since two nodes alone cannot keep quorum when one fails
Documentation:
- Runbook with step-by-step instructions for each recovery scenario
- Network topology documented
- Credentials in encrypted password manager (offline copy)
Cost: Moderate — second server (~€3,000–8,000), no additional software budget needed.
Tier 3: High Availability (RPO ~0, RTO <1h)
For mission-critical systems with almost no tolerable downtime:
Proxmox HA cluster:
- 3-node cluster with quorum
- Automatic restart of VMs on a surviving node after a node failure
- Shared storage (Ceph, TrueNAS iSCSI) or replicated local storage
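Why three nodes: a cluster stays operational only while a strict majority of votes is present. The arithmetic below is a small sketch that assumes one vote per node and no external quorum device; the function names are illustrative.

```python
def votes_for_quorum(nodes: int) -> int:
    """Strict majority of votes, assuming one vote per node."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes may fail while the rest still hold quorum."""
    return nodes - votes_for_quorum(nodes)


for n in (2, 3, 5):
    print(f"{n} nodes: quorum={votes_for_quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With two nodes, losing either one also loses quorum. Three nodes are the smallest cluster that survives a single node failure.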
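Near-continuous replication:
- ZFS send/receive every few minutes between nodes (asynchronous, so the RPO equals the replication interval)
- Proxmox storage replication between cluster nodes
- TrueNAS replication with minimal latency
- For a true RPO of 0: synchronously replicated shared storage such as Ceph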
Geo-redundancy:
- Second site with its own cluster
- Asynchronous replication over WAN (RPO: minutes)
- DNS failover or load balancing between sites
Cost: Significant — at least 3 servers, redundant network infrastructure, potentially a second location.
Disaster Recovery with Proxmox and TrueNAS
Proxmox Backup Server as the Backbone
PBS provides everything needed for DR:
- Incremental backups: Only changed blocks are transferred — hourly backups are practical
- Deduplication: Identical data blocks are stored only once — massive space savings
- Encryption: Backups can be client-side encrypted — secure even on remote storage
- Verify: Automatic integrity checking of all backups
- Fast restore: Restore individual VMs or containers in minutes
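How incremental backups and deduplication work together can be shown with a toy content-addressed chunk store: data is split into chunks, each chunk is identified by its SHA-256 digest, and a chunk that is already stored is never written again. This is a conceptual sketch only. The class name, the tiny chunk size, and the in-memory dictionary are assumptions for illustration, and PBS's real chunking and index formats differ.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the demo is easy to follow


class ChunkStore:
    """Toy deduplicating store: digest -> chunk bytes."""

    def __init__(self):
        self.chunks = {}

    def backup(self, data: bytes):
        """Store data; return (index of digests, number of new chunks written)."""
        index, new_chunks = [], 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unknown chunks are stored
                self.chunks[digest] = chunk
                new_chunks += 1
            index.append(digest)
        return index, new_chunks

    def restore(self, index) -> bytes:
        """Reassemble the original data from its chunk index."""
        return b"".join(self.chunks[d] for d in index)


store = ChunkStore()
first, written_first = store.backup(b"AAAABBBBAAAACCCC")
second, written_second = store.backup(b"AAAABBBBAAAADDDD")
print(written_first, written_second, len(store.chunks))
```

The first backup writes three chunks, because the repeated `AAAA` block is stored once. The second backup writes only the one chunk that changed, yet both versions remain fully restorable from their indexes.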
TrueNAS as Offsite Target
TrueNAS with ZFS is excellently suited as an offsite backup target:
- ZFS replication: Efficient block-level replication over SSH
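- Immutable snapshots: ZFS snapshots are read-only by design, so ransomware cannot alter data that has already been captured; a hold (zfs hold) and restricted replication credentials protect them against deletion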
- Compression: LZ4/ZSTD saves bandwidth and storage space
- Alerting: TrueNAS alerts on replication failures
Example Setup for an SMB
Site A (Production):
├── Proxmox VE Cluster (2-3 nodes)
│ ├── Production VMs and containers
│ └── Local PBS → Hourly backups
└── TrueNAS → ZFS snapshots every 15 minutes
Site B (Offsite):
├── TrueNAS → Receives replication from Site A
│ └── Immutable snapshots (14-day retention)
└── PBS Offsite → Receives encrypted backup sync
RPO: 15 minutes (ZFS replication) to 1 hour (PBS)
RTO: 1–4 hours (restore to Proxmox at Site A or B)
The DR Plan: What It Must Include
A written DR plan should cover the following:
- Contact list: Who is reachable in an emergency? (IT service provider, management, hosting or internet provider)
- Escalation levels: When is the DR plan activated?
- Prioritization: Which systems are restored first? (ERP before wiki, email before archive)
- Restore instructions: Step-by-step for each server/service
- Backup locations: Where are the backups? How are they accessed?
- Hardware procurement: Where is replacement hardware ordered in an emergency?
- Communication: Who informs customers and employees?
- Test interval: When is the DR plan tested?
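Prioritization and the RTO can be checked against each other before an emergency. The sketch below orders services by priority and adds up restore times to see which ones come back within the RTO. It assumes services are restored one after another; the service names follow the examples above, while the durations and the names `Service` and `restore_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    priority: int         # 1 = restore first
    restore_minutes: int  # hypothetical estimate, e.g. from the last restore test


def restore_schedule(services, rto_minutes):
    """Return (name, minutes until restored, within RTO?) in restore order."""
    elapsed = 0
    schedule = []
    for service in sorted(services, key=lambda s: s.priority):
        elapsed += service.restore_minutes  # sequential restore assumed
        schedule.append((service.name, elapsed, elapsed <= rto_minutes))
    return schedule


services = [
    Service("Wiki", 3, 45),
    Service("ERP", 1, 90),
    Service("Archive", 4, 120),
    Service("Email", 2, 60),
]
for name, done_after, ok in restore_schedule(services, rto_minutes=240):
    print(f"{name}: restored after {done_after} min, within RTO: {ok}")
```

With these example numbers, ERP, email, and the wiki are back within a 4-hour RTO, while the archive is not. That is acceptable only if the plan explicitly assigns the archive a longer RTO.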
Frequently Asked Questions
What does a disaster recovery plan cost?
The planning itself is an investment of 1–3 days of consulting. The infrastructure (second server, offsite storage) typically adds a one-time cost of €5,000–15,000. Compared to the typical cost of a multi-day outage (€50,000–500,000), this is a worthwhile investment.
How often should the DR plan be tested?
Run a full restore test at least once a year and a partial test (restoring a single VM) every quarter. After every major infrastructure change, update and test the plan.
Is cloud backup sufficient as a DR strategy?
Cloud backup fulfills the offsite criterion of the 3-2-1 rule (three copies of the data, on two different media, one of them offsite). However, restoring from the cloud takes hours to days depending on data volume, because the internet connection limits throughput. For short RTOs, a local restore from PBS is significantly faster.
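The bandwidth limitation is easy to estimate. The following sketch converts data volume and link speed into restore time; the 80% efficiency factor for protocol overhead and the example figures are assumptions, and real restores also depend on the provider's throughput.

```python
def restore_hours(data_gb: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to download `data_gb` (decimal GB) over the link."""
    bits = data_gb * 8 * 1000**3
    usable_bits_per_second = link_mbit_s * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical example: 2 TB of backup data.
print(f"100 Mbit/s: {restore_hours(2000, 100):.1f} h")
print(f"1 Gbit/s:   {restore_hours(2000, 1000):.1f} h")
```

Under these assumptions, 2 TB takes more than two days over a 100 Mbit/s line and still several hours over 1 Gbit/s, which is why an RTO of a few hours usually requires a local backup copy.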
What’s the difference between backup and disaster recovery?
Backup secures data. Disaster recovery encompasses the entire restoration process: servers, network, services, access, communication. A backup without a DR plan is like insurance without knowing where the policy document is.
Want to create a disaster recovery plan for your business? Contact us — we analyze your infrastructure and implement a backup strategy matching your RPO/RTO requirements.