The only test that truly validates a backup is the restore. Nobody disputes that in theory. In practice, restores in many companies only happen when there’s a real fire — and that’s when the surprises arrive: the backup is there but unreadable. The tape library has thrown errors for three months that nobody prioritised. The M365 restore works in theory, but the OAuth tokens have expired.
For years we have recommended a simple format to our clients: the Backup Test Day. Monthly, ideally on a fixed date (e.g., the second Wednesday). 30 minutes per restore scenario, four rotating scenarios, so every scenario comes up once every four months. Documented, with a checklist and named owners.
This article shows how we set up the Backup Test Day in client projects — including the question of what must be documented and why.
Why 30 Minutes
The most common excuse against restore testing: “We don’t have time.” Fair enough, but every IT lead can find 30 minutes a month. The trick: you don’t test a complete disaster-recovery run, you test a single scenario in its natural time frame.
A PBS restore of a small test VM takes 5–15 minutes; with documentation, verification, and cleanup you end up at 30 minutes. A TrueNAS snapshot restore of a single dataset or single file takes minutes. Four scenarios × 30 minutes = two hours per four-month rotation, and after one year you have 12 documented restore tests (three per scenario), a record that insurers and NIS2 auditors can work with.
The Four Standard Scenarios
We rotate through four monthly blocks:
| Month | Scenario | Goal |
|---|---|---|
| Jan / May / Sep | PBS restore of a VM | Verify hypervisor and image backup layer |
| Feb / Jun / Oct | TrueNAS snapshot restore | Verify storage snapshots and ZFS replication |
| Mar / Jul / Nov | M365 restore (Veeam / AvePoint) | Verify cloud backup layer |
| Apr / Aug / Dec | Tape or cloud-tier restore | Verify air-gap / long-term backup |
Each scenario has its own checklist — deliberately limited to 10–15 items. Tick them off and the scenario is validated.
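The rotation in the table can be sketched as a small lookup, for example to put the right scenario into a calendar reminder. This is a minimal sketch; the function name `scenario_for_month` is a hypothetical helper, not part of any tool mentioned here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a calendar month (1-12) to the scenario
# from the rotation table (Jan/May/Sep, Feb/Jun/Oct, ...).
scenario_for_month() {
  local month=$((10#$1))   # force base 10 so "08" and "09" are not read as octal
  case $(( (month - 1) % 4 )) in
    0) echo "PBS restore of a VM" ;;
    1) echo "TrueNAS snapshot restore" ;;
    2) echo "M365 restore" ;;
    3) echo "Tape or cloud-tier restore" ;;
  esac
}
```

Called as `scenario_for_month "$(date +%m)"`, it prints the scenario that is due in the current month.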
Scenario 1: PBS Restore of a VM (30 Minutes)
Preparation (5 minutes)
- Pick a test VM ID — a production VM with a small footprint (e.g., a print server, 8–20 GB)
- Check target storage — where does the restore land? Which storage has enough free space?
- Clarify restore conflict strategy — we usually restore with a new VMID or into an empty storage to leave production alone
Restore (15 minutes)
# On a Proxmox node: list the backup groups in the datastore
# (single quotes keep an interactive bash from treating "!" as history expansion)
proxmox-backup-client list \
  --repository 'pbs-user@pbs!datazone@backup-server:datastore1'
# Snapshots of the test VM, to find the latest one
proxmox-backup-client snapshot list vm/142 \
  --repository 'pbs-user@pbs!datazone@backup-server:datastore1'
Then restore via the Proxmox web UI (easier) or via API/CLI. In the web UI: Storage > Backups > [select backup] > Restore, set a new VMID (999 in the cleanup step below), and disable “Start after restore”.
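For the CLI route, a minimal sketch using `qmrestore` could look like the following. It assumes the PBS datastore is attached to Proxmox VE as a storage; the storage name `pbs-datazone`, the target storage `local-lvm`, and the snapshot timestamp are placeholders, and the exact volume ID should be taken from `pvesm list <storage>`. The wrapper `restore_test_vm` is a hypothetical helper that only prints the command unless `RUN=1` is set.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around qmrestore.
# Arguments: backup volume ID, new VMID, target storage.
restore_test_vm() {
  local volid=$1 new_vmid=$2 target_storage=$3
  # --unique 1 assigns fresh MAC addresses so the copy cannot clash with production
  local cmd=(qmrestore "$volid" "$new_vmid" --storage "$target_storage" --unique 1)
  if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"          # really restore
  else
    echo "${cmd[*]}"     # dry run: show what would be executed
  fi
}
```

Running `restore_test_vm 'pbs-datazone:backup/vm/142/2026-05-14T02:00:00Z' 999 local-lvm` first shows the command for review; repeating it with `RUN=1` in front performs the restore.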
Verify (5 minutes)
- VM boots cleanly (no boot loop)
- Login works
- Most important application on the VM starts (service check)
- Disk usage matches backup-time state
- Network connection optional (“isolated test net” is cleaner)
Cleanup + documentation (5 minutes)
# 999 is the VMID assigned to the restored test VM; stop it before destroying it
qm stop 999
qm destroy 999 --purge
Doc entry in internal wiki / Confluence:
Date: 2026-05-15
Scenario: PBS restore of VM 142 (print server)
Performed by: Florian Schermer
Result: OK
Duration: 28 minutes
Notes: None
Next test: 2026-09-15
Scenario 2: TrueNAS Snapshot Restore (30 Minutes)
Preparation (5 minutes)
- Pick a dataset: a production dataset with non-critical data (e.g., tank/shared/test)
- Check snapshot list: on TrueNAS WebUI or via CLI: zfs list -t snapshot tank/shared/test
- Pick a restore method: clone, rollback, or file-level restore
Restore (10 minutes)
Three variants, depending on the test goal:
Variant A — file level (most common):
# SSH to TrueNAS
zfs list -t snapshot tank/shared/test
# Find the path of the desired snapshot:
ls /mnt/tank/shared/test/.zfs/snapshot/auto-2026-05-14_02-00/
# Copy file back:
cp /mnt/tank/shared/test/.zfs/snapshot/auto-2026-05-14_02-00/important-file.xlsx \
/mnt/tank/shared/test/important-file-restored.xlsx
Variant B — clone (for isolated tests):
zfs clone tank/shared/test@auto-2026-05-14_02-00 tank/shared/test-restore-2026-05-15
# Publish dataset as SMB share via WebUI, then test
Variant C — rollback (destructive, careful!):
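Usually not for test day: a rollback discards every change made since the target snapshot, and rolling back past the most recent snapshot (zfs rollback -r) also destroys all later snapshots.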
Verify (10 minutes)
- File is readable and identical to expected value (hash check)
- ZFS properties correct (compression, atime, etc.)
- For clone: SMB access works
- Audit log on TrueNAS shows the operations
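The hash check from the list above can be sketched as follows. This is a minimal sketch assuming GNU coreutils (`sha256sum`); `same_sha256` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both files exist and have the same SHA-256.
same_sha256() {
  local a b
  a=$(sha256sum "$1" | cut -d' ' -f1)
  b=$(sha256sum "$2" | cut -d' ' -f1)
  # An unreadable file yields an empty hash, which must count as a mismatch
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For Variant A this could be called as `same_sha256 /mnt/tank/shared/test/.zfs/snapshot/auto-2026-05-14_02-00/important-file.xlsx /mnt/tank/shared/test/important-file-restored.xlsx && echo "hash OK"`.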
Cleanup + doc (5 minutes)
Delete restore files, destroy clones (zfs destroy ...), write doc entry.
Scenario 3: M365 Restore (30 Minutes)
Microsoft 365 has become the most important data store for many mid-market clients. Microsoft’s native recovery options are not enough for compliance requirements — Veeam Backup for Microsoft 365, AvePoint Cloud Backup, or comparable solutions back up Exchange, SharePoint, OneDrive, and Teams separately.
Preparation (5 minutes)
- Pick a test user — a dedicated test account in the M365 tenant (no production user)
- Pick a test item — a specific email, a SharePoint document, a OneDrive file
- Pick a restore target — original location, alternative location, or test mailbox
Restore (15 minutes)
In the Veeam M365 console:
- Open the backup job for the tenant
- Start the restore wizard (Exchange / SharePoint / OneDrive — depending on test)
- Search and select the test item
- Pick restore target (for test: alternative mailbox or local export)
- Start restore
Verify (5 minutes)
- Item is visible at the target
- Metadata correct (date, sender, etc.)
- For email: attachments intact
- For SharePoint/OneDrive: version history preserved
Cleanup + doc (5 minutes)
Delete test items from alternative mailbox, write doc.
Scenario 4: Tape or Cloud-Tier Restore (30 Minutes)
This is the most frequently forgotten scenario, and the one with the most hidden problems. Tapes that have sat in a bank vault for months may have block errors. Cloud tiers like S3 Glacier come with egress costs and retrieval wait times of several hours, which feel far longer in an emergency.
Preparation (5–10 minutes)
- Get the backup medium — fetch tape from the vault or prepare cloud restore job
- Pick a restore target — dedicated test volume, never production
- Pick a specific item to restore — a VM, a dataset, a file group
Restore (15–20 minutes)
This is where the stopwatch wobbles: tape restores need spool time. An LTO-8 tape spools to the block offset in 30–90 seconds; the restore itself takes roughly 15–20 minutes net for 200 GB. For Glacier-class cloud tiers the restore staging alone can take several hours, and that wait time is an important test-day finding.
If the test takes longer than 30 minutes, that’s not a failure — it’s exactly the information that helps in an emergency: “Tape restore of our 2 TB VM takes about 2 hours. Plan recovery time objective accordingly.”
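A measured test restore can be scaled up to a first estimate for a larger system. The sketch below assumes throughput stays constant (a simplification) and counts 2 TB as 2000 GB; `estimate_restore_minutes` is a hypothetical helper, and a real full-size restore test should replace the estimate whenever possible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: linear extrapolation of restore time.
# Arguments: target size in GB, measured size in GB, measured duration in minutes.
estimate_restore_minutes() {
  local target_gb=$1 measured_gb=$2 measured_min=$3
  # Integer arithmetic, rounded up to the next full minute
  echo $(( (target_gb * measured_min + measured_gb - 1) / measured_gb ))
}
```

With the tape figures above, `estimate_restore_minutes 2000 200 15` prints 150 and `estimate_restore_minutes 2000 200 20` prints 200, so a 2 TB restore at the same rate lands between two and a half and three and a half hours.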
Verify (5 minutes)
- Restored data readable
- Integrity checked (backup verify additionally if needed)
- Restore time measured and documented
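To get the restore time for the log without a stopwatch, the restore command can be wrapped in a small timer. This is a sketch using bash's built-in `SECONDS` counter; `timed_minutes` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, report its duration in minutes (rounded up)
# on stderr, and pass the command's exit status through unchanged.
timed_minutes() {
  local start=$SECONDS
  "$@"
  local status=$?
  local elapsed=$(( SECONDS - start ))
  echo "duration: $(( (elapsed + 59) / 60 )) min (exit status $status)" >&2
  return "$status"
}
```

Any restore command can be prefixed with it, for example `timed_minutes cp /path/to/source /path/to/target` with the real paths filled in.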
Cleanup + doc (5 minutes)
Delete test data, return tape, doc with time measurement.
The Documentation — Not Bureaucracy, But Proof
Each of the four scenario runs is documented in a restore log. We recommend a simple Markdown document or a Confluence page with a table:
| Date | Scenario | Performed by | Duration | Result | Notes | Next test |
|---|---|---|---|---|---|---|
| 2026-01-14 | PBS VM restore | F. Schermer | 28 min | OK | — | 2026-05-13 |
| 2026-02-11 | TrueNAS snapshot | F. Schermer | 22 min | OK | — | 2026-06-10 |
| 2026-03-11 | M365 Exchange | M. Bauer | 31 min | OK | Renew OAuth token | 2026-07-08 |
| 2026-04-08 | LTO-8 tape | F. Schermer | 47 min | OK | Spool 90 s, acceptable | 2026-08-12 |
This is the document you present to an insurance company, a NIS2 audit, a German GoBD audit, or an ISO 27001 audit. Three lines per quarter, a dozen per year.
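If the log lives in a Markdown file, a row in the format of the table above can be generated rather than typed. This is a minimal sketch; `restore_log_row` and the file name `restore-log.md` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one Markdown table row for the restore log.
# Arguments: date, scenario, performed by, duration in minutes, result, notes, next test.
restore_log_row() {
  local date=$1 scenario=$2 person=$3 minutes=$4 result=$5 notes=${6:--} next=$7
  # An empty notes argument is rendered as "-"
  printf '| %s | %s | %s | %s min | %s | %s | %s |\n' \
    "$date" "$scenario" "$person" "$minutes" "$result" "$notes" "$next"
}
```

Appending a row then looks like `restore_log_row 2026-01-14 'PBS VM restore' 'F. Schermer' 28 OK '' 2026-05-13 >> restore-log.md`.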
What NIS2, GoBD, and ISO 27001 Say
- NIS2 (implemented in § 30 BSIG): “appropriate and proportionate technical, operational, and organisational measures” including backup and recovery procedures — recoverability must be demonstrably tested
- GoBD: procedural documentation must describe “data backup and recovery” — a restore-test log is the most evidence-strong artefact for this
- ISO 27001, Annex A.12.3 (2013 edition; A.8.13 in the 2022 edition): “Information backup”, with a requirement for regular backup tests
- GDPR Article 32(1)(b) and (c): “availability and resilience” of processing systems, and the ability to restore availability and access to personal data after an incident
In all four standards the keyword is the same: regularly tested. Nobody defines “regularly” with a concrete interval, but in our experience monthly has been sufficient and has not been criticised as too infrequent in audits.
What Goes Wrong — And What You Learn From It
From real test days of our customers, the most common findings:
- PBS backup server full — retention was too generous, new backups couldn’t be written
- TrueNAS replication stuck for 17 days — nobody had read the mail reports
- M365 backup had expired app registrations — Veeam hadn’t pulled new data for months
- Tape drive cleaning fault — cleaning counter had exceeded threshold
- Restore performance worse than RTO — backup was OK, but 2 TB restore took 6 instead of 2 hours
Each of these would be fatal in an emergency. On a test day in May they cost 30 minutes and a ticket.
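A finding like the stalled replication can also be caught between test days with a simple age check on the newest snapshot. The sketch below assumes the `auto-YYYY-MM-DD_HH-MM` snapshot naming used in the TrueNAS examples above and GNU `date` (as on Linux-based systems); `snapshot_age_days` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: age in whole days of a snapshot named auto-YYYY-MM-DD_HH-MM.
# Arguments: snapshot name (with or without dataset@ prefix), reference date (YYYY-MM-DD).
snapshot_age_days() {
  local snap=$1 today=$2
  local snap_date=${snap##*auto-}   # drop everything up to and including "auto-"
  snap_date=${snap_date%%_*}        # keep only YYYY-MM-DD
  local snap_s today_s
  snap_s=$(date -u -d "$snap_date" +%s) || return 1
  today_s=$(date -u -d "$today" +%s) || return 1
  echo $(( (today_s - snap_s) / 86400 ))
}
```

A check such as `[ "$(snapshot_age_days "$newest" "$(date +%F)")" -le 2 ]` could then raise a ticket when the newest replicated snapshot is older than two days; the threshold is an example value.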
What We Recommend at DATAZONE
Concrete implementation for a typical SMB with 50–200 staff:
- Fixed calendar slot — second Wednesday of the month, 14:00–14:30, IT leads block the slot
- Four scenarios rotated as above, checklist per scenario in the wiki
- Emergency documentation updated on the same cycle: who does what in an emergency, with phone numbers
- Quarterly review with management: the test logs since the last review as appendix, 15-minute discussion
- External audit every 2 years — provider tests restore capability in a realistic exercise
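The fixed calendar slot can be computed rather than looked up. This is a minimal sketch assuming GNU `date`; `second_wednesday` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the second Wednesday of a month as YYYY-MM-DD.
# Arguments: year, month (1-12, leading zero allowed).
second_wednesday() {
  local year=$((10#$1)) month=$((10#$2))   # base 10, so "08"/"09" are not octal
  local first dow
  printf -v first '%04d-%02d-01' "$year" "$month"
  dow=$(date -u -d "$first" +%u) || return 1   # 1 = Monday ... 7 = Sunday
  # Days from the 1st to the first Wednesday, plus one more week
  local day=$(( 1 + (3 - dow + 7) % 7 + 7 ))
  printf '%04d-%02d-%02d\n' "$year" "$month" "$day"
}
```

For 2026 this gives 2026-01-14 for January and 2026-05-13 for May.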
If you keep that up, you don’t just have a better backup strategy — you have proof that holds up in any audit and any insurance case.
Related DATAZONE Articles
- Proxmox Backup Server 4.1: verify, sync, garbage collection
- Restic: encrypted backups
- Ransomware 2026: protection measures
- TrueNAS data security against ransomware
Conclusion
A backup that hasn’t been tested isn’t a backup; it’s an assumption. The Backup Test Day is the cheapest life insurance for IT that we know: 30 minutes per month. It fits in every calendar. What doesn’t fit is the day in November when the only answer to “can we restore the backup?” is “we’ve never tried”.