Backup Test Day: Prove Restores in 30 Minutes

The only test that truly validates a backup is the restore. Nobody disputes that in theory. In practice, restores in many companies only happen when there’s a real fire — and that’s when the surprises arrive: the backup is there but unreadable. The tape library has thrown errors for three months that nobody prioritised. The M365 restore works in theory, but the OAuth tokens have expired.

For years we have recommended a simple format to our clients: the Backup Test Day. It runs monthly, ideally on a fixed date (e.g., the second Wednesday), with 30 minutes for one restore scenario. Four scenarios rotate, so the full set is covered every four months. Each run follows a checklist, is documented, and has a named owner.

This article shows how we set up the Backup Test Day in client projects — including the question of what must be documented and why.

Why 30 Minutes

The most common excuse against restore testing: “We don’t have time.” Perhaps, but every IT lead can find 30 minutes a month. The trick: you don’t test a complete disaster-recovery run, you test a single scenario in its natural time frame.

A PBS restore of a small test VM takes 5–15 minutes; with documentation, verification, and cleanup you end up at 30 minutes. A TrueNAS snapshot restore of a single dataset or single file takes minutes. Four scenarios × 30 minutes is two hours per four-month cycle, or six hours a year. After one year you have 12 documented restore tests, three per scenario, which in our experience insurers and NIS2 auditors accept as evidence.

The Four Standard Scenarios

We rotate through four monthly blocks:

| Month | Scenario | Goal |
|---|---|---|
| Jan / May / Sep | PBS restore of a VM | Verify hypervisor and image backup layer |
| Feb / Jun / Oct | TrueNAS snapshot restore | Verify storage snapshots and ZFS replication |
| Mar / Jul / Nov | M365 restore (Veeam / AvePoint) | Verify cloud backup layer |
| Apr / Aug / Dec | Tape or cloud-tier restore | Verify air-gap / long-term backup |
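
The rotation is easy to derive from the calendar month, which helps when a reminder or ticket should name the scenario automatically. The following is a minimal sketch; the function name `scenario_for_month` is made up for this example, and the month must be passed without a leading zero.

```shell
# Map a calendar month (1-12, no leading zero) to the scenario in the rotation.
scenario_for_month() {
  case $(( ($1 - 1) % 4 )) in
    0) echo "PBS restore of a VM" ;;
    1) echo "TrueNAS snapshot restore" ;;
    2) echo "M365 restore" ;;
    3) echo "Tape or cloud-tier restore" ;;
  esac
}

scenario_for_month 5    # May: prints "PBS restore of a VM"
scenario_for_month 12   # December: prints "Tape or cloud-tier restore"
```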

Each scenario has its own checklist — deliberately limited to 10–15 items. Tick them off and the scenario is validated.

Scenario 1: PBS Restore of a VM (30 Minutes)

Preparation (5 minutes)

  1. Pick a test VM ID — a production VM with a small footprint (e.g., a print server, 8–20 GB)
  2. Check target storage — where does the restore land? Which storage has enough free space?
  3. Clarify restore conflict strategy — we usually restore with a new VMID or into an empty storage to leave production alone

Restore (15 minutes)

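# On a Proxmox node: list all backup groups in the datastore
# (single quotes stop an interactive bash from treating "!" as history expansion)
proxmox-backup-client list \
  --repository 'pbs-user@pbs!datazone@backup-server:datastore1'

# List the snapshots of the test VM and note the most recent one
proxmox-backup-client snapshot list vm/142 \
  --repository 'pbs-user@pbs!datazone@backup-server:datastore1'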

# Restore via the Proxmox web UI (easier) or via API/CLI, always to a new, unused VMID

In the web UI: Storage > Backups > [select backup] > Restore. Set a new VMID (999 in this example) and disable “Start after restore”, so the production VM stays untouched.
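
For a scripted restore, the CLI route could look like the sketch below. It only prints the `qmrestore` command so it can be reviewed before anyone runs it. The storage name `pbs-datazone`, the snapshot timestamp, the target storage `local-lvm`, and the helper name `build_restore_cmd` are placeholders for illustration, not values from a real system. Check the exact volume ID with `pvesm list <storage>` on your node.

```shell
# Build (but do not execute) a qmrestore call for a PBS-backed storage.
# Arguments: <PVE storage name> <source VMID> <snapshot timestamp> <new VMID> <target storage>
build_restore_cmd() {
  printf 'qmrestore %s:backup/vm/%s/%s %s --storage %s --unique 1\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# All values below are placeholders.
build_restore_cmd pbs-datazone 142 2026-05-14T02:00:00Z 999 local-lvm
```

The `--unique 1` option asks Proxmox to assign new MAC addresses, which avoids a clash with the production VM if the copy is ever attached to a network.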

Verify (5 minutes)

  • VM boots cleanly (no boot loop)
  • Login works
  • Most important application on the VM starts (service check)
  • Disk usage matches backup-time state
  • Network connection optional (“isolated test net” is cleaner)

Cleanup + documentation (5 minutes)

# 999 is the VMID used for the test restore; a running VM must be stopped first
qm stop 999 && qm destroy 999 --purge

Doc entry in internal wiki / Confluence:

Date: 2026-05-15
Scenario: PBS restore of VM 142 (print server)
Performed by: Florian Schermer
Result: OK
Duration: 28 minutes
Notes: None
Next test: 2026-09-15

Scenario 2: TrueNAS Snapshot Restore (30 Minutes)

Preparation (5 minutes)

  1. Pick a dataset — a production dataset with non-critical data (e.g., tank/shared/test)
  2. Check snapshot list — on TrueNAS WebUI or via CLI: zfs list -t snapshot tank/shared/test
  3. Pick a restore method: clone, rollback, or file-level restore

Restore (10 minutes)

Three variants, depending on the test goal:

Variant A — file level (most common):

# SSH to TrueNAS
zfs list -t snapshot tank/shared/test
# Find the path of the desired snapshot:
ls /mnt/tank/shared/test/.zfs/snapshot/auto-2026-05-14_02-00/
# Copy file back:
cp /mnt/tank/shared/test/.zfs/snapshot/auto-2026-05-14_02-00/important-file.xlsx \
   /mnt/tank/shared/test/important-file-restored.xlsx

Variant B — clone (for isolated tests):

zfs clone tank/shared/test@auto-2026-05-14_02-00 tank/shared/test-restore-2026-05-15
# Publish dataset as SMB share via WebUI, then test

Variant C — rollback (destructive, careful!):

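Usually not for test day: zfs rollback discards every change made to the dataset since the target snapshot, and rolling back past newer snapshots requires -r, which destroys those snapshots as well.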

Verify (10 minutes)

  • File is readable and identical to expected value (hash check)
  • ZFS properties correct (compression, atime, etc.)
  • For clone: SMB access works
  • Audit log on TrueNAS shows the operations
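
The hash check from the first item can be sketched as follows, assuming `sha256sum` from GNU coreutils is available on the system where the files are compared. The helper name `same_hash` is hypothetical. It compares the restored copy against a reference, for example the file inside the snapshot directory.

```shell
# Succeeds (exit 0) only if both files exist and have the same SHA-256 digest.
same_hash() {
  a=$(sha256sum "$1" | cut -d ' ' -f 1)
  b=$(sha256sum "$2" | cut -d ' ' -f 1)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example call with the paths from Variant A:
# same_hash /mnt/tank/shared/test/.zfs/snapshot/auto-2026-05-14_02-00/important-file.xlsx \
#           /mnt/tank/shared/test/important-file-restored.xlsx && echo "hash OK"
```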

Cleanup + doc (5 minutes)

Delete restore files, destroy clones (zfs destroy ...), write doc entry.

Scenario 3: M365 Restore (30 Minutes)

Microsoft 365 has become the most important data store for many mid-market clients. Microsoft’s native recovery options (recycle bins, retention policies) are often not enough for compliance requirements, so Veeam Backup for Microsoft 365, AvePoint Cloud Backup, or comparable solutions back up Exchange, SharePoint, OneDrive, and Teams separately.

Preparation (5 minutes)

  1. Pick a test user — a dedicated test account in the M365 tenant (no production user)
  2. Pick a test item — a specific email, a SharePoint document, a OneDrive file
  3. Pick a restore target — original location, alternative location, or test mailbox

Restore (15 minutes)

In the Veeam M365 console:

  1. Open the backup job for the tenant
  2. Start the restore wizard (Exchange / SharePoint / OneDrive — depending on test)
  3. Search and select the test item
  4. Pick restore target (for test: alternative mailbox or local export)
  5. Start restore

Verify (5 minutes)

  • Item is visible at the target
  • Metadata correct (date, sender, etc.)
  • For email: attachments intact
  • For SharePoint/OneDrive: version history preserved

Cleanup + doc (5 minutes)

Delete test items from alternative mailbox, write doc.

Scenario 4: Tape or Cloud-Tier Restore (30 Minutes)

This is the most frequently forgotten scenario, and the one with the most hidden problems. Tapes that have sat in a bank vault for months may have block errors. Cloud tiers like S3 Glacier carry egress costs and retrieval wait times of several hours, which hurt most in an emergency.

Preparation (5–10 minutes)

  1. Get the backup medium — fetch tape from the vault or prepare cloud restore job
  2. Pick a restore target — dedicated test volume, never production
  3. Pick a specific item to restore — a VM, a dataset, a file group

Restore (15–20 minutes)

This is where the stopwatch wobbles: tape restores need spool time. An LTO-8 drive positions the tape at the block offset in 30–90 seconds; the restore itself then takes roughly 15–20 minutes net for 200 GB. For Glacier-class cloud tiers, staging the restore alone can take several hours, and that wait time is an important test-day finding.

If the test takes longer than 30 minutes, that’s not a failure — it’s exactly the information that helps in an emergency: “Tape restore of our 2 TB VM takes about 2 hours. Plan recovery time objective accordingly.”
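
A measured test restore can be extrapolated to larger systems. The sketch below is a deliberately rough linear estimate: it assumes throughput stays constant and ignores spool time, staging, and verification, so treat the result as a lower bound. The function name `estimate_restore_minutes` is made up for this example.

```shell
# Linear estimate of restore time in minutes, rounded up.
# Arguments: <size to restore in GB> <measured size in GB> <measured minutes>
estimate_restore_minutes() {
  echo $(( ($1 * $3 + $2 - 1) / $2 ))
}

# 1000 GB at the measured rate of "200 GB in 15 minutes"
estimate_restore_minutes 1000 200 15   # prints 75
```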

Verify (5 minutes)

  • Restored data readable
  • Integrity checked (backup verify additionally if needed)
  • Restore time measured and documented

Cleanup + doc (5 minutes)

Delete test data, return tape, doc with time measurement.

The Documentation — Not Bureaucracy, But Proof

Each of the four scenario runs is documented in a restore log. We recommend a simple Markdown document or a Confluence page with a table:

| Date | Scenario | Performed by | Duration | Result | Notes | Next test |
|---|---|---|---|---|---|---|
| 2026-01-14 | PBS VM restore | F. Schermer | 28 min | OK | | 2026-05-13 |
| 2026-02-11 | TrueNAS snapshot | F. Schermer | 22 min | OK | | 2026-06-10 |
| 2026-03-11 | M365 Exchange | M. Bauer | 31 min | OK | Renew OAuth token | 2026-07-08 |
| 2026-04-08 | LTO-8 tape | F. Schermer | 47 min | OK | Spool 90 s, acceptable | 2026-08-12 |
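
If the log lives in a Markdown file, appending a row can be scripted so that the entry is written in the same minute the test ends. This is a minimal sketch; the function name `log_restore_test` and the file name in the example call are assumptions, and the column order follows the table above.

```shell
# Append one row to a Markdown restore log, writing the table header on first use.
# Arguments: <log file> <date> <scenario> <performed by> <duration> <result> <notes> <next test>
log_restore_test() {
  log=$1
  if [ ! -s "$log" ]; then
    printf '| Date | Scenario | Performed by | Duration | Result | Notes | Next test |\n' >> "$log"
    printf '|---|---|---|---|---|---|---|\n' >> "$log"
  fi
  printf '| %s | %s | %s | %s | %s | %s | %s |\n' \
    "$2" "$3" "$4" "$5" "$6" "$7" "$8" >> "$log"
}

# Example call (restore-log.md is a placeholder file name):
# log_restore_test restore-log.md 2026-01-14 "PBS VM restore" "F. Schermer" "28 min" OK "" 2026-05-13
```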

This is the document you present to an insurer, a NIS2 audit, a German GoBD audit, or an ISO 27001 audit. Three lines per quarter, a dozen per year.

What NIS2, GoBD, and ISO 27001 Say

  • NIS2 (implemented in § 30 BSIG): “appropriate and proportionate technical, operational, and organisational measures” including backup and recovery procedures — recoverability must be demonstrably tested
  • GoBD: procedural documentation must describe “data backup and recovery” — a restore-test log is the most evidence-strong artefact for this
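  • ISO 27001, Annex A.12.3 (2013 edition; control A.8.13 in the 2022 edition): “Information backup”, with a requirement for regular backup tests
  • GDPR Article 32(1)(b) and (c): “availability and resilience” of processing systems, plus the ability to restore availability and access to personal data in a timely manner after an incident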

In all four frameworks the keyword is the same: regularly tested. None of them defines “regularly” with a concrete interval, but in our projects a monthly cadence has been sufficient and has not been criticised as too infrequent in an audit.

What Goes Wrong — And What You Learn From It

From real test days of our customers, the most common findings:

  • PBS backup server full — retention was too generous, new backups couldn’t be written
  • TrueNAS replication stuck for 17 days — nobody had read the mail reports
  • M365 backup had expired app registrations — Veeam hadn’t pulled new data for months
  • Tape drive cleaning fault — cleaning counter had exceeded threshold
  • Restore performance worse than RTO — backup was OK, but 2 TB restore took 6 instead of 2 hours

Each of these would be fatal in an emergency. On a test day in May they cost 30 minutes and a ticket.
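
A finding like the stuck replication can also be caught between test days with a small age check on the newest snapshot. The sketch below keeps the arithmetic separate from the ZFS query so it can be tested anywhere. The function names and the two-day threshold are assumptions, and the commented `zfs list` call should be verified on your TrueNAS version before use.

```shell
# Age of the newest snapshot in whole days.
# Arguments: <creation time of newest snapshot, epoch seconds> <current time, epoch seconds>
snapshot_age_days() {
  echo $(( ($2 - $1) / 86400 ))
}

# Succeeds (exit 0) if the newest snapshot is older than the allowed number of days.
# Arguments: <newest epoch> <now epoch> <max age in days>
is_stale() {
  [ "$(snapshot_age_days "$1" "$2")" -gt "$3" ]
}

# Example on the replication target (dataset name is a placeholder):
# newest=$(zfs list -H -p -t snapshot -o creation -s creation tank/shared/test | tail -n 1)
# is_stale "$newest" "$(date +%s)" 2 && echo "replication looks stuck"
```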

What We Recommend at DATAZONE

Concrete implementation for a typical SMB with 50–200 staff:

  1. Fixed calendar slot — second Wednesday of the month, 14:00–14:30, IT leads block the slot
  2. Four scenarios rotated as above, checklist per scenario in the wiki
  3. Emergency documentation updated on the same cycle: who does what in an emergency, with phone numbers
  4. Quarterly review with management: the quarter’s three test logs as appendix, 15-minute discussion
  5. External audit every 2 years — provider tests restore capability in a realistic exercise
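
For the fixed calendar slot in step 1, the date of the second Wednesday can be computed rather than looked up. The following sketch uses plain shell arithmetic (Sakamoto's day-of-week method) so it does not depend on a particular `date` implementation. Both function names are made up for this example, and year and month must be passed without leading zeros.

```shell
# Day of week for a date: 0 = Sunday ... 6 = Saturday (Sakamoto's method).
# Arguments: <year> <month 1-12> <day>, all without leading zeros.
day_of_week() {
  y=$1; m=$2; d=$3
  case $m in
    1) t=0 ;; 2) t=3 ;; 3) t=2 ;; 4) t=5 ;; 5) t=0 ;; 6) t=3 ;;
    7) t=5 ;; 8) t=1 ;; 9) t=4 ;; 10) t=6 ;; 11) t=2 ;; 12) t=4 ;;
  esac
  if [ "$m" -lt 3 ]; then y=$((y - 1)); fi
  echo $(( (y + y / 4 - y / 100 + y / 400 + t + d) % 7 ))
}

# Print the date of the second Wednesday of a month as YYYY-MM-DD.
# Arguments: <year> <month 1-12>
second_wednesday() {
  first=$(day_of_week "$1" "$2" 1)
  day=$(( 1 + (3 - first + 7) % 7 + 7 ))
  printf '%04d-%02d-%02d\n' "$1" "$2" "$day"
}

second_wednesday 2026 5   # prints 2026-05-13
```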

If you keep that up, you don’t just have a better backup strategy — you have proof that holds up in any audit and any insurance case.

Conclusion

A backup that hasn’t been tested isn’t a backup; it’s an assumption. The Backup Test Day is the cheapest life insurance for IT that we know: 30 minutes a month, 90 minutes per quarter. It fits in every calendar. What doesn’t fit is the day in November when the only answer to “can we restore the backup?” is “we’ve never tried”.
