
Ceph vs. ZFS: When to Pick Which for SMBs

In nearly every Proxmox project for SMBs, the same fundamental question comes up sooner or later: Ceph or ZFS? Both solutions are mature, both are open source, both run natively on Proxmox VE 8.x. Yet the concepts differ fundamentally — and so do the requirements for hardware, network and day-2 operations. Get this wrong and you pay twice: once in hardware, once in operational overhead.

This article delivers a decision matrix for the typical SMB use case — three to five hypervisors, 10 to 50 VMs, one or two sites. We compare the architectures, show when each approach makes sense technically and economically, and highlight the pitfalls you will not find in any glossy datasheet.

Two fundamentally different concepts

ZFS and Ceph answer two different questions. ZFS is a filesystem with an integrated volume manager that runs on a single host. It combines the block layer, RAID, snapshots, compression and encryption in a tightly integrated stack. Ceph, by contrast, is a distributed object store that bundles multiple nodes into a single storage pool — with redundancy across hosts rather than just across disks.

Property | ZFS | Ceph
--- | --- | ---
Architecture | Single-host, local | Scale-out, distributed
Redundancy | Across disks (mirror, RAIDZ) | Across hosts (replication, EC)
Minimum nodes | 1 | 3 (better 4-5)
Shared storage | No (only via NFS/iSCSI export) | Yes (native)
Live migration in Proxmox | Local disks are copied; with replication (every 1-15 min) only the delta since the last run | Only RAM state is transferred; disks are already shared
HA failure scenario | Writes since the last replication are lost (up to 1-15 min) | No loss of acknowledged writes
Single-thread performance | Very high | Medium
Distributed performance | Limited to one host | Scales with the number of nodes
Network requirement | 1-10 GbE | 25-100 GbE recommended
Minimum RAM per host | 8-16 GB for storage | 32-64 GB for storage
Operational complexity | Low | High

At first glance this table seems to clearly favour Ceph — until you look at the hardware requirements and the operational complexity.

When ZFS is the right choice

For the vast majority of SMB setups, ZFS is the economically and technically appropriate solution, specifically whenever:

  • You run one to three hypervisors and no massive growth is planned
  • Your workloads need single-thread performance — databases, ERP, Exchange
  • You have tight budgets for storage networking (10 GbE is plenty)
  • Your team is Linux-savvy but not a storage specialist
  • Downtimes of a few minutes in a disaster scenario are acceptable

ZFS on Proxmox can be in production within an hour. A typical pool layout for a hypervisor looks like this:

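# Mirror of two NVMe drives for VMs (high IOPS demand)
zpool create -o ashift=12 -O compression=zstd -O atime=off \
  vmpool mirror /dev/nvme0n1 /dev/nvme1n1

# RAIDZ2 across six HDDs for bulk data and backups
# (in production, prefer stable /dev/disk/by-id/ paths over sdX names)
zpool create -o ashift=12 -O compression=zstd -O atime=off \
  datapool raidz2 sda sdb sdc sdd sde sdf

# Special VDEV on NVMe: stores the pool's metadata. Losing it loses the whole
# pool, so it must be redundant; a 3-way mirror would match RAIDZ2's fault tolerance.
zpool add datapool special mirror /dev/nvme2n1 /dev/nvme3n1

# Small blocks only land on the special VDEV if this is set; keep the value
# below the recordsize (default 128K), otherwise all data blocks go there
zfs set special_small_blocks=32K datapool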

For HA in a 2-node setup, Proxmox uses ZFS replication (plus a QDevice as a third vote, so the surviving node keeps quorum when the other fails): every 1 to 15 minutes, an incremental zfs send is shipped to the second host. In a failover, the VM starts there from the latest replicated snapshot, so everything written since that snapshot is lost; the data loss is therefore up to one replication interval. For most SMB workloads this is acceptable, especially when the application itself (database WAL, file locking) does not strictly require RPO=0.
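A minimal sketch of such a replication job, assuming Proxmox's pvesr command-line tool. The VM ID 100, the target node name pve2 and the 5-minute schedule are placeholders, and PVESR_BIN is a hypothetical variable that only exists to allow a dry run with echo. The second helper makes the RPO arithmetic explicit: a failure just before a running sync completes loses everything since the previous snapshot, so the worst case is the interval plus the sync duration.

```shell
#!/usr/bin/env bash
# Sketch only: VM ID, node name and schedule are placeholders.
# PVESR_BIN is a hypothetical override; set PVESR_BIN=echo for a dry run.
PVESR_BIN="${PVESR_BIN:-pvesr}"

# Create a local replication job for one guest to a target node.
# Job IDs follow the pattern <vmid>-<jobnumber>; "-0" is the guest's first job.
setup_replication() {
  local vmid="$1" target="$2" schedule="$3"
  "$PVESR_BIN" create-local-job "${vmid}-0" "$target" --schedule "$schedule"
}

# Worst-case data loss in minutes: a failure just before the running sync
# finishes loses all writes since the previous completed snapshot.
worst_case_rpo_min() {
  local interval_min="$1" sync_duration_min="$2"
  echo $(( interval_min + sync_duration_min ))
}

# Example: replicate VM 100 to node pve2 every 5 minutes
# setup_replication 100 pve2 "*/5"
# worst_case_rpo_min 5 1   # prints 6
```

The job can then be monitored with pvesr status on the source node.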

We also use ZFS in the form of TrueNAS: a dedicated NAS appliance that serves storage to Proxmox via NFS or iSCSI. This cleanly separates compute from storage, without the complexity of a distributed system.
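Registering a TrueNAS NFS export as shared VM storage could look like the following sketch. The storage ID truenas-vm, the address 192.0.2.10 (a documentation range) and the export path /mnt/tank/proxmox are hypothetical, and PVESM_BIN only exists to allow a dry run. iSCSI uses a different pvesm storage type and is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: add an NFS export from a TrueNAS box as Proxmox storage.
# Storage ID, server address and export path are placeholders (hypothetical).
# PVESM_BIN is a hypothetical override; set PVESM_BIN=echo for a dry run.
PVESM_BIN="${PVESM_BIN:-pvesm}"

add_truenas_nfs() {
  local storage_id="$1" server="$2" export_path="$3"
  # "images" = VM disk images. The storage configuration lives in the
  # cluster file system, so running this once makes it visible on all nodes.
  "$PVESM_BIN" add nfs "$storage_id" --server "$server" \
    --export "$export_path" --content images
}

# Example:
# add_truenas_nfs truenas-vm 192.0.2.10 /mnt/tank/proxmox
```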

When Ceph is the right choice

Ceph plays to its strengths as soon as you need real shared storage across multiple hosts. Typical indicators:

  • Four or more hypervisors are planned in the cluster, with further growth on the horizon
  • Workloads must be live-migratable without delay (patch windows without VM stops)
  • You operate container platforms (Kubernetes, OpenShift) with dynamic PVCs
  • RPO=0 is a hard requirement — every write must be redundant
  • You can invest in a 25 GbE or 100 GbE cluster network
  • The ops team is ready to build Ceph expertise or to permanently buy in external know-how

A minimum-viable Ceph setup in a Proxmox cluster looks like this:

3x Proxmox nodes, each with:
  - 2x NVMe (OS, ZFS mirror)
  - 4-8x NVMe as Ceph OSDs (each 1.92-3.84 TB enterprise SSD)
  - 64-128 GB RAM
  - 2x 25 GbE for Ceph public/cluster network, separated
  - 2x 10 GbE for VM traffic and management

The rule of thumb for Ceph RAM: 1 GB per TB of OSD capacity, plus at least 4 GB per OSD daemon. A node with 8 OSDs at 3.84 TB each therefore needs around 60 GB for storage alone (about 31 GB for 30.7 TB of capacity plus 32 GB for eight OSD daemons); VMs come on top.
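The rule of thumb translates into a one-line calculation. This sketch assumes the formula exactly as stated above (capacity in TB, RAM in GB) and uses awk because bash arithmetic cannot handle decimals; the result covers Ceph only, not the VMs.

```shell
#!/usr/bin/env bash
# Rule of thumb from the text:
#   RAM for Ceph ~ 1 GB per TB of OSD capacity + 4 GB per OSD daemon
# Arguments: number of OSDs on the node, capacity per OSD in TB (decimals allowed)
ceph_ram_gb() {
  local osds="$1" tb_per_osd="$2"
  # awk handles decimal capacities that plain bash arithmetic cannot
  awk -v n="$osds" -v tb="$tb_per_osd" 'BEGIN { printf "%.1f\n", n * tb + n * 4 }'
}

ceph_ram_gb 8 3.84   # prints 62.7 (30.72 GB for capacity + 32 GB for daemons)
```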

Replication in Ceph is typically configured as size=3, min_size=2. This means three copies of every object on three different nodes: a write is acknowledged only once all currently available replicas have stored it, and a placement group keeps serving I/O as long as at least two replicas are available. Usable capacity is therefore one third of raw capacity. Erasure coding can reduce this overhead but is rarely the right choice for VM workloads in Proxmox.
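To make the capacity trade-off concrete, the following sketch computes usable space for replicated and erasure-coded pools. The 3-node, 8 x 3.84 TB layout mirrors the minimum-viable setup above, the pool name vm-pool is a placeholder, and the EC 2+1 profile is only an illustrative comparison, not a recommendation. Real clusters also need headroom below Ceph's full ratios, which this calculation ignores.

```shell
#!/usr/bin/env bash
# Replication with size=N keeps N full copies -> usable = raw / N
usable_replicated_tb() {
  awk -v raw="$1" -v size="$2" 'BEGIN { printf "%.2f\n", raw / size }'
}

# Erasure coding k+m stores k data + m coding chunks -> usable = raw * k / (k + m)
usable_ec_tb() {
  awk -v raw="$1" -v k="$2" -v m="$3" 'BEGIN { printf "%.2f\n", raw * k / (k + m) }'
}

# 3 nodes x 8 OSDs x 3.84 TB = 92.16 TB raw
usable_replicated_tb 92.16 3   # prints 30.72
usable_ec_tb 92.16 2 1         # prints 61.44 (illustrative EC 2+1)

# Applying size=3 / min_size=2 to an existing pool ("vm-pool" is a placeholder):
# ceph osd pool set vm-pool size 3
# ceph osd pool set vm-pool min_size 2
```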

The hardware trap

The most common mistake in the SMB space is to deploy Ceph on hardware that was specced for ZFS. Symptoms only emerge under load:

  • Latencies above 20 ms in databases
  • “Slow ops” warnings in cluster status
  • High CPU load on nodes from RocksDB compaction
  • Inconsistent performance depending on VM placement

Concretely problematic: consumer SSDs without power-loss protection (massive performance degradation under sync writes), shared 10 GbE for VM and Ceph traffic, too few OSDs per node (at least four are sensible), and HDDs as Ceph OSDs without a separate DB device on NVMe.
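Whether an SSD copes with sync writes can be checked before it goes into production. The sketch below assumes fio is installed; the device path is a placeholder and FIO_BIN is a hypothetical dry-run hook. Writing to a raw device is destructive, so run it only against an empty drive, and treat the resulting IOPS as a relative indicator rather than a pass/fail threshold.

```shell
#!/usr/bin/env bash
# Sketch: 4K synchronous random-write test to spot SSDs without power-loss protection.
# WARNING: this overwrites data on the target device. Use an empty disk only.
# FIO_BIN is a hypothetical override; set FIO_BIN=echo for a dry run.
FIO_BIN="${FIO_BIN:-fio}"

sync_write_test() {
  local dev="$1"
  # One job, queue depth 1, O_DIRECT + O_SYNC: approximates the small synchronous
  # writes of Ceph's WAL or the ZFS ZIL. Drives with PLP usually sustain far
  # higher IOPS here than consumer SSDs.
  "$FIO_BIN" --name=plp-check --filename="$dev" --rw=randwrite --bs=4k \
    --iodepth=1 --numjobs=1 --direct=1 --sync=1 --runtime=60 --time_based
}

# Example (device name is a placeholder):
# sync_write_test /dev/nvme4n1
```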

ZFS, by contrast, is happy with much more modest hardware. A hypervisor with 64 GB RAM, four enterprise NVMe in RAIDZ2 and 10 GbE is perfectly adequate for 20-30 typical SMB VMs.
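On such a host it is worth capping the ZFS ARC (the in-memory read cache) so it stays within the storage budget and leaves the rest of the RAM for VMs. This sketch assumes an 8 GiB cap, matching the low end of the 8-16 GB from the comparison table; zfs_arc_max is the OpenZFS module parameter and expects bytes.

```shell
#!/usr/bin/env bash
# Build the modprobe option that caps the ZFS ARC at the given size in GiB.
# The 8 GiB used below is an assumption, not a universal recommendation.
arc_max_line() {
  local gib="$1"
  # zfs_arc_max expects bytes: GiB * 1024^3
  echo "options zfs zfs_arc_max=$(( gib * 1024 * 1024 * 1024 ))"
}

arc_max_line 8   # prints: options zfs zfs_arc_max=8589934592

# To apply it as root (this overwrites an existing zfs.conf), then reboot:
# arc_max_line 8 > /etc/modprobe.d/zfs.conf
# update-initramfs -u -k all   # required if the root filesystem is on ZFS
```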

Performance in comparison

The following figures come from a recent DATAZONE benchmark on identical server hardware (AMD EPYC 9354P, 256 GB RAM, 8x Kioxia CD8-V 3.84 TB NVMe per node, 25 GbE Mellanox):

Workload | ZFS Mirror (1 host) | Ceph 3-node (size=3)
--- | --- | ---
4K random read, 1 thread | 195,000 IOPS | 28,000 IOPS
4K random read, 64 threads | 410,000 IOPS | 1,350,000 IOPS
4K random write, 1 thread | 88,000 IOPS | 11,000 IOPS
4K random write, 64 threads | 195,000 IOPS | 720,000 IOPS
Sequential read, 1 MB blocks | 12 GB/s | 18 GB/s (aggregate)
Average write latency | 0.4 ms | 1.8 ms

The message is clear: for individual, latency-critical workloads, ZFS wins. As soon as many parallel workers or distributed applications come into play, Ceph wins through the aggregate bandwidth of multiple nodes.

Operational effort and escalation paths

In day-2 operations, the two solutions differ dramatically. ZFS maintenance generally boils down to a monthly zpool scrub, the occasional disk replacement and monitoring of SMART values. A well-built ZFS pool runs for years without intervention.
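On Proxmox VE the monthly scrub is typically already scheduled by the cron job of the zfsutils-linux package, so the main day-2 task is noticing when a pool degrades. A minimal check, assuming it is fed the output of zpool status -x (which prints exactly "all pools are healthy" when nothing needs attention), could look like this; how you alert on a failure depends on your monitoring setup.

```shell
#!/usr/bin/env bash
# Reads "zpool status -x" output on stdin.
# Exit status 0 only if the output is exactly the all-clear line.
pools_healthy() {
  grep -qx 'all pools are healthy'
}

# Usage, e.g. from cron or a monitoring agent:
# zpool status -x | pools_healthy || echo "ZFS pool needs attention" >&2
# zpool scrub vmpool   # manual scrub, if none is scheduled
```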

Ceph, on the other hand, is a living system that requires active care: upgrade paths have to be planned (monitors, then managers, then OSDs, then MDS), rebalancing storms after node failures need to be steered, PG counts must match cluster size, and diagnosis in failure scenarios is significantly more complex. Forgetting ceph osd set noout before a reboot can trigger an hour of rebalancing, with a noticeable hit to VM performance.
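A small wrapper around planned reboots helps avoid the forgotten-noout scenario. This is a sketch: the function names are hypothetical, CEPH_BIN exists only for dry runs, and the noout flag set here applies cluster-wide, so it must be unset once the node's OSDs are back up.

```shell
#!/usr/bin/env bash
# Bracket a planned node reboot with noout so Ceph does not mark the node's
# OSDs "out" and start rebalancing while the node is down.
# CEPH_BIN is a hypothetical override; set CEPH_BIN=echo for a dry run.
CEPH_BIN="${CEPH_BIN:-ceph}"

ceph_maint_begin() {
  "$CEPH_BIN" osd set noout      # down OSDs stay "in", no data movement
}

ceph_maint_end() {
  # Run only after all OSDs of the rebooted node are back up
  "$CEPH_BIN" osd unset noout
  "$CEPH_BIN" health             # expect HEALTH_OK once recovery has caught up
}

# Typical sequence on the node to be patched:
# ceph_maint_begin && reboot
# ...after the node is back:
# ceph_maint_end
```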

Realistically, plan for at least 4 hours per month of active maintenance for a Ceph cluster, plus regular patch management and capacity planning. For ZFS, 30 minutes per month is closer to reality. This difference, multiplied by your admin’s hourly rate, is part of the total cost of ownership.
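The cost difference is simple arithmetic. This sketch uses the 4 h and 0.5 h per month from the text; the hourly rate of 90 is an assumption in whatever currency you budget in.

```shell
#!/usr/bin/env bash
# Yearly admin cost delta between Ceph and ZFS operations.
# Arguments: Ceph hours/month, ZFS hours/month, hourly rate (assumed)
yearly_ops_delta() {
  local ceph_h="$1" zfs_h="$2" rate="$3"
  # (monthly hours difference) x rate x 12 months
  awk -v c="$ceph_h" -v z="$zfs_h" -v r="$rate" 'BEGIN { printf "%.0f\n", (c - z) * r * 12 }'
}

yearly_ops_delta 4 0.5 90   # prints 3780
```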

For customers who do not want to run Ceph themselves, we operate it as part of our virtualization services, including monitoring, patch management and escalation to Proxmox enterprise support.

Decision matrix: short and concrete

Your situation | Recommendation
--- | ---
1-2 hypervisors, classic SMB | Local ZFS
2 hypervisors, HA wanted, RPO of 5-15 min acceptable | ZFS + Proxmox replication
Central NAS, multiple PVE hosts | TrueNAS with ZFS, NFS/iSCSI
3+ hypervisors, RPO=0, container platform | Ceph (3 nodes minimum, better 4-5)
High-frequency OLTP database, one main server | ZFS on NVMe, dedicated DB host
Growth to 100+ VMs foreseeable | Ceph with a clear scale-out plan
Tight budget, no 25 GbE network | ZFS (Ceph makes no sense here)

Hybrid setups are also common: Ceph as shared storage for most VMs, local ZFS for a latency-critical database VM that is explicitly pinned to its host. The backup layout also benefits from ZFS on the Proxmox Backup Server (PBS) host, even when production runs on Ceph.

Conclusion

There is no blanket answer to “Ceph or ZFS” — but there are clear indicators. ZFS is the right choice for 80 percent of typical SMB setups: simple in operation, predictable performance, low hardware requirements. Ceph only unfolds its value when real scaling, container workloads or RPO=0 are required — and when the organization is prepared to invest in matching hardware and know-how.

The most expensive option is always the one built for the wrong requirement: Ceph for a 2-node setup without 25 GbE brings only headaches; ZFS for a 6-node container cluster leaves you without shared storage for instant live migration and dynamic PVCs. Choosing correctly saves considerably more in total than the pure hardware price difference.

DATAZONE supports you in the storage architecture of your Proxmox environment — from an honest requirements analysis through hardware selection to running Ceph clusters or ZFS-based TrueNAS systems. We have been building both worlds for SMBs for years and know the pitfalls that are missing from the glossy datasheet. Get in touch: /en/kontakt/.
