
Proxmox VE with Ceph: Building Hyperconverged Infrastructure


In traditional data centres, compute and storage are separate: servers on one side, SAN or NAS on the other. Hyperconverged infrastructure (HCI) unifies both on the same nodes — and Proxmox VE brings native Ceph integration to make this a reality. No additional licences, no proprietary appliances, and no vendor lock-in.

This article explains how a Proxmox-Ceph architecture is structured, what to consider when sizing your hardware, and which best practices have proven effective in production environments.

1. What Is Ceph?

Ceph is a distributed storage system that automatically replicates data across multiple nodes. It has no single point of failure — if a node or disk fails, Ceph autonomously reorganises data across the remaining resources. This behaviour is known as self-healing.

The foundation is RADOS (Reliable Autonomic Distributed Object Store), an object-based storage layer. Built on top of RADOS, Ceph provides three storage types:

| Protocol | Purpose | Typical Use |
| --- | --- | --- |
| RBD (RADOS Block Device) | Block storage | VM disks in Proxmox VE |
| CephFS | POSIX file system | Shared file storage between VMs |
| RGW (RADOS Gateway) | Object storage (S3/Swift) | Backups, media archiving |

For Proxmox environments, RBD is the central building block: VM disks and container volumes are stored as block devices directly in the Ceph cluster and are available to all cluster nodes simultaneously. This enables live migration without a shared filesystem.

2. Architecture Overview: Proxmox + Ceph

In a hyperconverged configuration, each node serves a dual role — it provides both compute power (hypervisor) and storage (Ceph OSD). Additionally, each node runs Ceph coordination services.

Services per Node

| Service | Function |
| --- | --- |
| Proxmox VE | Hypervisor (KVM/LXC), management GUI |
| Ceph OSD | Object Storage Daemon: manages local drives |
| Ceph Monitor (MON) | Maintains the cluster map, monitors cluster health |
| Ceph Manager (MGR) | Provides metrics, dashboard, and modules |

Minimum Requirement: 3 Nodes

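Ceph monitors establish quorum by majority vote, so an odd number of monitors is recommended: three monitors tolerate the loss of one, whereas a fourth adds no extra fault tolerance. Three nodes are therefore the minimum for a functional cluster. With three nodes, a replication factor of 3, and the default host-level failure domain, each node holds one copy of every object, so the cluster survives the failure of an entire node without data loss.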
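The quorum arithmetic can be sketched in a few lines of Python. This is only an illustration of majority voting, not Ceph code; the function names are made up for this example.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that may fail while a majority can still be formed."""
    return monitors - quorum_size(monitors)


# Compare monitor counts: an even count adds no extra fault tolerance.
for count in (1, 2, 3, 4, 5):
    print(f"{count} MONs: quorum={quorum_size(count)}, "
          f"tolerated failures={tolerated_failures(count)}")
```

Three and four monitors both tolerate exactly one failure, which is why odd monitor counts are the usual choice.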

3. Hardware Sizing

Proper sizing determines the stability and performance of the entire cluster. Ceph is resource-intensive — underestimated OSD requirements lead to performance problems under load.

| Component | Minimum | Recommended |
| --- | --- | --- |
| Nodes | 3 | 5+ |
| CPU | 1 core per OSD + VM cores | 2 cores per OSD + VM cores |
| RAM | 5 GB per OSD + VM RAM | 8 GB per OSD + VM RAM |
| OSD drives | NVMe SSDs | Enterprise NVMe (DWPD >= 1) |
| Network | 10 GbE | 25 GbE |
| WAL/DB | On the OSD drive | Separate NVMe for WAL/DB with HDD OSDs |
| Boot disk | 64 GB SSD | 128 GB SSD (ZFS mirror) |

Note: The RAM and CPU figures for Ceph come in addition to the requirements of your VMs and containers. Plan both together to avoid overcommitting the nodes.
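As a rough sketch, the per-node figures from the table can be added up in Python. The class and function names are hypothetical, and the calculation deliberately ignores the host OS and the MON/MGR services, so treat the result as a lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CephOverhead:
    """Per-OSD resource figures taken from the sizing table."""
    cores_per_osd: int
    ram_gb_per_osd: int


MINIMUM = CephOverhead(cores_per_osd=1, ram_gb_per_osd=5)
RECOMMENDED = CephOverhead(cores_per_osd=2, ram_gb_per_osd=8)


def node_requirements(osds: int, vm_cores: int, vm_ram_gb: int,
                      profile: CephOverhead = RECOMMENDED) -> tuple[int, int]:
    """Return (cores, ram_gb) one node needs for its OSDs plus its VMs."""
    cores = osds * profile.cores_per_osd + vm_cores
    ram_gb = osds * profile.ram_gb_per_osd + vm_ram_gb
    return cores, ram_gb


# Example node (assumed values): 4 OSDs, VMs needing 16 cores and 96 GB RAM.
print(node_requirements(4, 16, 96))           # (24, 128)
print(node_requirements(4, 16, 96, MINIMUM))  # (20, 116)
```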

4. Network Design

The network is the most critical component of a Ceph environment. Insufficient bandwidth or missing separation leads to latency that directly impacts VM performance.

Two Separate Networks

| Network | Purpose | Description |
| --- | --- | --- |
| Public Network | Client access | VMs access Ceph storage via this network |
| Cluster Network | OSD replication | Internal data replication between OSDs |

Separation is essential: without a dedicated cluster network, OSD replication competes with VM traffic on the same link. During a node failure, replication traffic increases massively — this can overload an unseparated network.

Recommendations

  • VLAN separation or physically separate interfaces for public and cluster networks

  • Jumbo frames (MTU 9000) on all Ceph interfaces — reduces CPU overhead and increases throughput

  • Bonding / LACP for redundancy and bandwidth aggregation

  • MTU values must be configured identically on all switches and nodes
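Two of these recommendations lend themselves to a quick sanity check before deployment. The following Python sketch assumes you have already collected the planned subnets and the MTU of every node interface and switch port by hand; the subnets and device names are invented for illustration.

```python
import ipaddress


def networks_are_separate(public_cidr: str, cluster_cidr: str) -> bool:
    """True if the public and cluster subnets do not overlap."""
    public = ipaddress.ip_network(public_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    return not public.overlaps(cluster)


def mtu_mismatches(mtu_by_device: dict[str, int],
                   expected: int = 9000) -> dict[str, int]:
    """Return every device whose MTU differs from the expected value."""
    return {dev: mtu for dev, mtu in mtu_by_device.items() if mtu != expected}


# Hypothetical plan: two /24 subnets and three devices on the Ceph path.
print(networks_are_separate("10.10.10.0/24", "10.10.20.0/24"))
print(mtu_mismatches({"node1": 9000, "node2": 1500, "switch1": 9000}))
```

A single device left at MTU 1500 is enough to cause fragmentation or dropped frames, so an empty result from `mtu_mismatches` is the goal.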

5. Ceph Pool Configuration

Ceph organises data in pools, each with its own replication and performance parameters.

Replication

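| Parameter | Typical Value | Meaning |
| --- | --- | --- |
| size | 3 | Number of copies of each object |
| min_size | 2 | Minimum number of available copies required for I/O |

With size=3 and min_size=2, the pool continues to accept reads and writes when one copy is missing (e.g. during an OSD failure). If a second copy fails, Ceph blocks I/O to the affected placement groups until at least min_size copies are available again. This protects against data loss and inconsistent writes.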
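The interplay of size and min_size can be expressed as a small decision function. This is a simplified model for illustration only: real Ceph evaluates the state per placement group, and the function name and state labels are invented here.

```python
def pool_state(size: int, min_size: int, available: int) -> str:
    """Classify a replicated pool by the number of copies still available."""
    if not 0 <= available <= size:
        raise ValueError("available must be between 0 and size")
    if available == size:
        return "healthy"
    if available >= min_size:
        return "degraded, writes accepted"
    return "writes blocked"


# Walk through the failure sequence for size=3, min_size=2.
for copies in (3, 2, 1):
    print(copies, "copies:", pool_state(size=3, min_size=2, available=copies))
```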

Placement Groups (PGs)

Placement Groups are Ceph’s internal distribution unit. Too few PGs lead to uneven data distribution; too many strain RAM and CPU. Current Proxmox versions support PG autoscaling, which adjusts the count automatically — use this feature.

Erasure Coding vs. Replication

| Method | Storage Efficiency | Performance | Recommendation |
| --- | --- | --- | --- |
| Replication (3x) | 33% | High (fast reads/writes) | Default for VM disks |
| Erasure Coding (k+m) | 50–75% | Lower (higher CPU load) | Archive data, backups |

For VM workloads, replication is the recommended choice. Erasure coding is suitable for large data volumes with lower IOPS requirements, such as backup pools.
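The efficiency figures follow directly from the pool parameters: replication stores 1/size of the raw capacity as usable data, while erasure coding stores k/(k+m). A short Python sketch, with hypothetical function names and an assumed 60 TB of raw capacity, makes the comparison concrete. It ignores the free-space headroom a real cluster needs.

```python
from fractions import Fraction


def replication_efficiency(size: int) -> Fraction:
    """Usable share of raw capacity with `size` full copies."""
    return Fraction(1, size)


def erasure_coding_efficiency(k: int, m: int) -> Fraction:
    """Usable share of raw capacity with k data and m coding chunks."""
    return Fraction(k, k + m)


def usable_capacity_tb(raw_tb: int, efficiency: Fraction) -> float:
    """Usable capacity in TB for a given raw capacity and efficiency."""
    return float(Fraction(raw_tb) * efficiency)


print(usable_capacity_tb(60, replication_efficiency(3)))        # 20.0
print(usable_capacity_tb(60, erasure_coding_efficiency(2, 2)))  # 30.0
print(usable_capacity_tb(60, erasure_coding_efficiency(4, 2)))  # 40.0
```

The 50–75% range in the table corresponds to profiles between k=2, m=2 (50%) and k=3, m=1 (75%).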

6. Best Practices for Production

  • Homogeneous hardware: Use identical hardware across all nodes. Different disk sizes or CPU generations lead to uneven load distribution.

  • Do not mix HDDs and SSDs: Create separate pools for different drive types. Mixed pools produce unpredictable latency.

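  • Monitor OSD utilisation: Keep occupancy below 85%. This is Ceph's default nearfull ratio, at which the cluster raises a health warning. At the default backfillfull ratio of 90%, Ceph stops backfilling onto the affected OSD, and at the default full ratio of 95% it blocks writes.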

  • Regular scrubbing: Ceph verifies data integrity through scrubbing. Schedule deep scrubs during low-load periods (e.g. overnight).

  • Test failover scenarios: Simulate the failure of OSDs and nodes before going to production. Verify that the cluster rebalances correctly and VMs remain accessible.

  • Use the Proxmox dashboard: The built-in web GUI shows Ceph status, OSD utilisation, pool health, and performance metrics — use it for daily monitoring.

  • Keep Ceph versions in sync: Update all nodes to the same Ceph version. Mixed versions are temporarily possible but not a permanent state.
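The utilisation rule above is easy to automate. The sketch below assumes the per-OSD utilisation percentages have already been collected into a dictionary (for example from the dashboard or your monitoring system); the OSD names and values are made up, and the function name is hypothetical.

```python
UTILISATION_LIMIT_PERCENT = 85.0  # threshold from the best-practice list


def osds_over_limit(utilisation: dict[str, float],
                    limit: float = UTILISATION_LIMIT_PERCENT) -> list[str]:
    """Return the names of OSDs at or above the limit, sorted by name."""
    return sorted(name for name, pct in utilisation.items() if pct >= limit)


# Hypothetical snapshot of four OSDs.
sample = {"osd.0": 62.4, "osd.1": 85.0, "osd.2": 91.3, "osd.3": 70.1}
print(osds_over_limit(sample))  # ['osd.1', 'osd.2']
```

A check like this could run from a scheduled job and raise an alert well before the cluster reaches the point where writes are affected.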

7. When Is Ceph the Right Choice?

Ceph combined with Proxmox is an excellent choice when you:

  • Operate 3 or more nodes and need shared storage without an external SAN

  • Run highly available VMs that should automatically restart on another node after a node failure

  • Want to scale flexibly — additional nodes and OSDs can be added during operation

  • Want to remain independent of proprietary storage vendors

When Is Ceph Not the Best Choice?

  • Single nodes: Ceph requires at least 3 nodes. For single-node setups, local ZFS is the better choice.

  • Very small environments: If 2 nodes with a few VMs suffice, the Ceph overhead is not justified. Use local storage with replication or an external TrueNAS as an iSCSI/NFS backend instead.

  • High sequential throughput requirements: For large files and streaming workloads, a dedicated NAS system may be more performant.

8. DATAZONE: Your Partner for Proxmox and Ceph

We plan, implement, and support Proxmox-Ceph clusters — from initial architecture through network planning to ongoing operations. We bring our experience from numerous production environments to your project.

If a hyperconverged approach does not fit your scenario, we are happy to advise on external storage solutions based on TrueNAS — as an iSCSI, NFS, or SMB backend for your Proxmox environment.


Planning a Proxmox cluster with Ceph or considering hyperconverged infrastructure? Contact us for a no-obligation consultation.
