In nearly every Proxmox project for SMBs, the same fundamental question comes up sooner or later: Ceph or ZFS? Both solutions are mature, both are open source, both run natively on Proxmox VE 8.x. Yet the concepts differ fundamentally — and so do the requirements for hardware, network and day-2 operations. Get this wrong and you pay twice: once in hardware, once in operational overhead.
This article delivers a decision matrix for the typical SMB use case — three to five hypervisors, 10 to 50 VMs, one or two sites. We compare the architectures, show when each approach makes sense technically and economically, and highlight the pitfalls you will not find in any glossy datasheet.
Two fundamentally different concepts
ZFS and Ceph answer two different questions. ZFS is a filesystem with an integrated volume manager that runs on a single host. It combines the block layer, RAID, snapshots, compression and encryption in a tightly integrated stack. Ceph, by contrast, is a distributed object store that bundles multiple nodes into a single storage pool — with redundancy across hosts rather than just across disks.
| Property | ZFS | Ceph |
|---|---|---|
| Architecture | Single-host, local | Scale-out, distributed |
| Redundancy | Across disks (mirror, RAIDZ) | Across hosts (replication, EC) |
| Minimum nodes | 1 | 3 (better 4–5) |
| Shared storage | No (only via NFS/iSCSI export) | Yes (native) |
| Live migration in Proxmox | With replication every 1–15 min | Immediate, no disk copy needed |
| HA failure scenario | Seconds to minutes of data loss | No data loss |
| Single-thread performance | Very high | Medium |
| Distributed performance | Limited to one host | Scales linearly with nodes |
| Network requirement | 1–10 GbE | 25–100 GbE recommended |
| Minimum RAM per host | 8–16 GB for storage | 32–64 GB for storage |
| Operational complexity | Low | High |
At first glance this table seems to clearly favour Ceph — until you look at the hardware requirements and the operational complexity.
When ZFS is the right choice
For the vast majority of SMB setups, ZFS is the economically and technically appropriate solution. Specifically whenever:
- You run one to three hypervisors and no massive growth is planned
- Your workloads need single-thread performance — databases, ERP, Exchange
- You have tight budgets for storage networking (10 GbE is plenty)
- Your team is Linux-savvy but has no dedicated storage specialist
- Downtimes of a few minutes in a disaster scenario are acceptable
ZFS on Proxmox is in production within an hour. A typical pool layout for a hypervisor looks like this:
```shell
# Mirror of two NVMe for VMs (high IOPS demand)
zpool create -o ashift=12 -O compression=zstd -O atime=off \
    vmpool mirror /dev/nvme0n1 /dev/nvme1n1

# RAIDZ2 across six HDDs for bulk data and backups
zpool create -o ashift=12 -O compression=zstd -O atime=off \
    datapool raidz2 sda sdb sdc sdd sde sdf

# Special VDEV (metadata + small blocks) on NVMe
zpool add datapool special mirror /dev/nvme2n1 /dev/nvme3n1
```
For HA in a 2-node setup, Proxmox uses ZFS replication: every 1 to 15 minutes, an incremental zfs send is shipped to the second host. On failover, the VM starts there from the most recently replicated snapshot, so the data loss is at most one replication interval. For most SMB workloads this is acceptable, especially when the application itself (database WAL, file locking) does not strictly require RPO=0. Keep in mind that a two-node cluster also needs a third quorum vote (for example a QDevice) so that the surviving node is allowed to start the VM.
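As a rough sketch, the replication job described above could be set up with Proxmox's `pvesr` tool. The VM ID `100`, the target node name `pve2` and the 15-minute interval are placeholder assumptions, not values from a real cluster. The helper only prints the command so it can be reviewed before it is run on the source node.

```shell
#!/usr/bin/env bash
# Hypothetical values: VM 100, first replication job for that VM, target node "pve2".
replication_cmd() {
  local vmid="$1" target="$2" minutes="$3"
  # Job IDs have the form <vmid>-<job number>; "*/N" means "every N minutes".
  printf 'pvesr create-local-job %s-0 %s --schedule "*/%s"\n' "$vmid" "$target" "$minutes"
}

# Print the command for review instead of executing it.
replication_cmd 100 pve2 15
```

Once a job exists, `pvesr status` lists it together with its last sync, which is a quick way to check that the effective RPO matches the configured interval.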
We also use ZFS as the backend for TrueNAS — as a pure NAS appliance serving Proxmox via NFS or iSCSI. This gives a clean separation between compute and storage, without the complexity of a distributed system.
When Ceph is the right choice
Ceph plays to its strengths as soon as you need real shared storage across multiple hosts. Typical indicators:
- Four or more hypervisors are planned in the cluster, with further growth on the horizon
- Workloads must be live-migratable without delay (patch windows without VM stops)
- You operate container platforms (Kubernetes, OpenShift) with dynamic PVCs
- RPO=0 is a hard requirement — every write must be redundant
- You can invest in a 25 GbE or 100 GbE cluster network
- The ops team is ready to build Ceph expertise or to permanently buy in external know-how
A minimum-viable Ceph setup in a Proxmox cluster looks like this:
3 x Proxmox node, each:
- 2x NVMe (OS, ZFS mirror)
- 4-8x NVMe as Ceph OSDs (each 1.92-3.84 TB enterprise SSD)
- 64-128 GB RAM
- 2x 25 GbE for Ceph public/cluster network, separated
- 2x 10 GbE for VM traffic and management
The rule of thumb for Ceph RAM: 1 GB per TB of OSD capacity, plus at least 4 GB per OSD daemon. A node with 8 OSDs at 3.84 TB therefore needs roughly 63 GB for storage alone (about 31 GB for capacity plus 32 GB for the daemons); VMs come on top.
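The rule of thumb above is easy to turn into a small sizing helper. This is a minimal sketch; the function name `ceph_ram_gb` is made up for illustration, and the result is a planning estimate rather than a guarantee.

```shell
#!/usr/bin/env bash
# Rule of thumb from the text: 1 GB per TB of OSD capacity + 4 GB per OSD daemon.
ceph_ram_gb() {
  local osds="$1" tb_per_osd="$2"
  LC_ALL=C awk -v n="$osds" -v tb="$tb_per_osd" \
    'BEGIN { printf "%.1f\n", n * tb * 1 + n * 4 }'
}

ceph_ram_gb 8 3.84   # the node with 8 x 3.84 TB OSDs from the text
```

For the example node this yields 62.7 GB, which has to be reserved on top of whatever the VMs on that host need.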
Replication in Ceph is typically configured as size=3, min_size=2. This means three copies of every block on three different nodes; a write is acknowledged once all currently available replicas have it, and the pool keeps accepting I/O as long as at least two replicas are online. Usable capacity is therefore one third of raw capacity. Erasure coding can reduce this overhead but is rarely the right choice for VM workloads in Proxmox.
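To make the capacity cost of size=3 concrete, here is a small sketch that divides raw capacity by the replica count. The helper name `usable_tb` and the pool name `vm-pool` in the comments are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Usable capacity of a replicated pool: raw capacity divided by the replica count.
usable_tb() {
  local nodes="$1" osds_per_node="$2" tb_per_osd="$3" size="$4"
  LC_ALL=C awk -v n="$nodes" -v o="$osds_per_node" -v tb="$tb_per_osd" -v s="$size" \
    'BEGIN { printf "%.2f\n", n * o * tb / s }'
}

usable_tb 3 8 3.84 3   # three nodes, 8 x 3.84 TB each, size=3

# On a live cluster the replica settings are applied per pool
# ("vm-pool" is a placeholder name):
#   ceph osd pool set vm-pool size 3
#   ceph osd pool set vm-pool min_size 2
```

Three such nodes give 92.16 TB raw but only 30.72 TB usable. In practice the plannable figure is lower still, because a cluster should not be filled to the brim: Ceph starts warning at 85 percent OSD utilisation by default.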
The hardware trap
The most common mistake in the SMB space is to deploy Ceph on hardware that was specced for ZFS. Symptoms only emerge under load:
- Latencies above 20 ms in databases
- “Slow ops” warnings in cluster status
- High CPU load on nodes from RocksDB compaction
- Inconsistent performance depending on VM placement
The usual culprits are consumer SSDs without power-loss protection (massive performance degradation under sync writes), shared 10 GbE for VM and Ceph traffic, too few OSDs per node (at least four are sensible), and HDDs as Ceph OSDs without a separate DB device on NVMe.
ZFS, by contrast, is happy with much more modest hardware. A hypervisor with 64 GB RAM, four enterprise NVMe in RAIDZ2 and 10 GbE is perfectly adequate for 20–30 typical SMB VMs.
Performance in comparison
The following figures come from a recent DATAZONE benchmark on identical server hardware (AMD EPYC 9354P, 256 GB RAM, 8x Kioxia CD8-V 3.84 TB NVMe per node, 25 GbE Mellanox):
| Workload | ZFS Mirror (1 host) | Ceph 3-node (size=3) |
|---|---|---|
| 4K random read, 1 thread | 195,000 IOPS | 28,000 IOPS |
| 4K random read, 64 threads | 410,000 IOPS | 1,350,000 IOPS |
| 4K random write, 1 thread | 88,000 IOPS | 11,000 IOPS |
| 4K random write, 64 threads | 195,000 IOPS | 720,000 IOPS |
| Sequential read, 1 MB | 12 GB/s | 18 GB/s (aggregate) |
| Average write latency | 0.4 ms | 1.8 ms |
The message is clear: for individual, latency-critical workloads, ZFS wins. As soon as many parallel workers or distributed applications come into play, Ceph wins through the aggregate bandwidth of multiple nodes.
Operational effort and escalation paths
In day-2 operations, the two solutions differ dramatically. ZFS maintenance generally boils down to monthly zpool scrub, the occasional disk replacement, and monitoring SMART values. A well-built ZFS pool runs for years without intervention.
Ceph, on the other hand, is a living system that requires active care: upgrade paths have to be planned (monitors first, then managers, then OSDs, then MDS), rebalancing storms after node failures need to be steered, PG counts must match cluster size, and diagnosis in failure scenarios is significantly more complex. A forgotten ceph osd set noout before a reboot can trigger an hour of rebalancing, with a noticeable hit to VM performance.
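One way to make the noout step hard to forget is to wrap it in a small script. The following is only a sketch: the function names are made up, and with `DRY_RUN=1` (the default here) the commands are printed rather than executed so the sequence can be reviewed first.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1 commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Before rebooting a node: keep Ceph from marking its OSDs "out".
maintenance_begin() {
  run ceph osd set noout
}

# After the node is back and its OSDs have rejoined: restore normal behaviour.
maintenance_end() {
  run ceph osd unset noout
  run ceph health
}

maintenance_begin
maintenance_end
```

The reboot itself deliberately sits between the two calls and outside the script, since a script running on the node being rebooted would not survive to clear the flag.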
Realistically, plan for at least 4 hours per month of active maintenance for a Ceph cluster, plus regular patch management and capacity planning. For ZFS, 30 minutes per month is closer to reality. This difference, multiplied by your admin’s hourly rate, is part of the total cost of ownership.
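The TCO effect of these maintenance figures can be estimated with a few lines of shell. The hourly rate of 90 is a purely hypothetical number in whatever currency applies; only the 4 hours and 30 minutes per month come from the text.

```shell
#!/usr/bin/env bash
# Annual operations cost: hours per month x 12 months x hourly rate.
annual_ops_cost() {
  local hours_per_month="$1" hourly_rate="$2"
  LC_ALL=C awk -v h="$hours_per_month" -v r="$hourly_rate" \
    'BEGIN { printf "%.0f\n", h * 12 * r }'
}

annual_ops_cost 4 90     # Ceph: 4 h per month
annual_ops_cost 0.5 90   # ZFS: 30 min per month
```

At this assumed rate the two lines print 4320 and 540, a gap of 3780 per year before patch management and capacity planning are counted.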
For customers who do not want to run Ceph themselves, we operate it as part of our virtualization services, including monitoring, patch management and escalation to Proxmox enterprise support.
Decision matrix: short and concrete
| Your situation | Recommendation |
|---|---|
| 1–2 hypervisors, classic SMB | Local ZFS |
| 2 hypervisors, HA wanted, RPO of 5–15 min acceptable | ZFS + Proxmox replication |
| Central NAS, multiple PVE hosts | TrueNAS with ZFS, NFS/iSCSI |
| 3+ hypervisors, RPO=0, container platform | Ceph (3 nodes minimum, better 4–5) |
| High-frequency OLTP database, one main server | ZFS on NVMe, dedicated DB host |
| Growth to 100+ VMs foreseeable | Ceph with a clear scale-out plan |
| Tight budget, no 25 GbE network | ZFS, Ceph makes no sense here |
Hybrid setups are also common: Ceph as shared storage for most VMs, local ZFS for a latency-critical database VM that is explicitly pinned. The backup layout, too, benefits from ZFS on the PBS host even when production runs on Ceph.
Conclusion
There is no blanket answer to “Ceph or ZFS”, but there are clear indicators. ZFS is the right choice for roughly 80 percent of typical SMB setups: simple in operation, predictable performance, low hardware requirements. Ceph only pays off when real scaling, container workloads or RPO=0 are required, and when the organization is prepared to invest in matching hardware and know-how.
The most expensive option is always the one built for the wrong requirement: Ceph for a 2-node setup without 25 GbE brings only headaches; ZFS for a 6-node container cluster blocks sensible workflows. The right choice saves substantially more in total than the pure hardware delta.
DATAZONE supports you in the storage architecture of your Proxmox environment — from an honest requirements analysis through hardware selection to running Ceph clusters or ZFS-based TrueNAS systems. We have been building both worlds for SMBs for years and know the pitfalls that are missing from the glossy datasheet. Get in touch: /en/kontakt/.