
High Availability & Clustering with Proxmox VE


Organisations with virtualised infrastructures demand maximum availability and fault tolerance. Proxmox VE combines open-source technologies into a highly available cluster system that continues operating even during hardware failures — without licence costs, but with enterprise-grade performance.

This article explains the architecture of a Proxmox cluster, the HA functionality, typical configurations, and practical recommendations for production environments.

1. Fundamentals: Cluster Architecture in Proxmox VE

A Proxmox cluster is a logical group of at least two nodes that are managed collectively. Communication is handled via the Corosync protocol, which exchanges status information and quorum data.

Component                            | Function                     | Description
Corosync                             | Cluster communication        | Responsible for status & quorum
pmxcfs (Proxmox Cluster File System) | Configuration replication    | Replicates VM configs across all nodes
pve-ha-manager                       | High availability monitoring | Restarts VMs on node failure
Fence mechanism                      | Fault isolation              | Isolates faulty nodes from the cluster

Cluster Network

  • Recommended: dedicated 1 Gbit / 10 Gbit network for Corosync

  • Redundant NICs with bonding (active-backup)

  • MTU value synchronised across all nodes
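The MTU requirement in the last bullet can be verified with a small script. A minimal sketch, assuming the MTU of the Corosync interface has already been collected per node (e.g. via `ip link` over SSH) into a dictionary; the node names and values below are hypothetical sample data:

```python
from collections import Counter

# Sketch: verify that the Corosync interface MTU matches on every node.
# mtu_by_node is hypothetical sample data; in practice you would collect
# it per node, e.g. by running "ip link show bond0" over SSH.

def find_mtu_mismatches(mtu_by_node: dict[str, int]) -> dict[str, int]:
    """Return nodes whose MTU differs from the most common value."""
    expected, _ = Counter(mtu_by_node.values()).most_common(1)[0]
    return {node: mtu for node, mtu in mtu_by_node.items() if mtu != expected}

mtu_by_node = {"pve01": 1500, "pve02": 1500, "pve03": 9000}  # sample data
print(find_mtu_mismatches(mtu_by_node))  # pve03 deviates from the majority
```

A mismatched MTU on the Corosync link can cause intermittent token loss, so a check like this is worth running after any network change.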

2. Building an HA Cluster

Example: 3-Node Cluster

Node  | Role      | Details
pve01 | Primary   | Quorum master
pve02 | Secondary | Failover target
pve03 | Secondary | Backup node / Ceph client

Step-by-Step Configuration

  1. Initialise the cluster: pvecm create clustername

  2. Add nodes (run on each new node, pointing at an existing cluster member): pvecm add <IP-of-existing-node>

  3. Verify quorum: pvecm status

  4. Define HA resource: Via the web GUI or CLI (ha-manager add vm:100)

  5. Test functionality: Simulate shutting down a node — the VM should restart automatically.
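Step 3 can be automated as part of a health check. A sketch that looks for the "Quorate" line in `pvecm status` output; the sample output below is an illustrative excerpt, not captured from a live cluster:

```python
# Sketch: parse "pvecm status" output and confirm the cluster is quorate.
# SAMPLE_OUTPUT is an illustrative excerpt, not taken from a real cluster.

SAMPLE_OUTPUT = """\
Quorum information
------------------
Nodes:            3
Quorate:          Yes
"""

def is_quorate(pvecm_output: str) -> bool:
    """Return True if the 'Quorate:' line reports Yes."""
    for line in pvecm_output.splitlines():
        if line.startswith("Quorate:"):
            return line.split(":", 1)[1].strip().lower() == "yes"
    return False

print(is_quorate(SAMPLE_OUTPUT))  # True
```

In practice you would feed this the output of `subprocess.run(["pvecm", "status"], ...)` on a cluster node and alert when it returns False.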

3. Understanding High Availability (HA)

The HA manager monitors defined resources (e.g. VMs and containers). When a node fails, Corosync detects the loss and the HA manager (pve-ha-manager) restarts the resource on another available host.

Event            | HA Manager Action
Node offline     | VM started on another node
Quorum lost      | No action (cluster freeze)
Node back online | Synchronisation & status recovery
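The behaviour in the table can be condensed into a decision function. A heavily simplified sketch: the node names and the "first online node" placement policy are hypothetical, since the real pve-ha-manager also honours HA groups, priorities, and fencing state:

```python
# Simplified sketch of the HA manager's failover decision.
# The real pve-ha-manager also considers HA groups, priorities and fencing.

def ha_action(vm_node: str, online_nodes: list[str], quorate: bool) -> str:
    if not quorate:
        return "freeze"                         # quorum lost: no action taken
    if vm_node in online_nodes:
        return "keep"                           # node healthy: nothing to do
    if online_nodes:
        return f"restart on {online_nodes[0]}"  # simplistic placement policy
    return "freeze"                             # nowhere to place the VM

print(ha_action("pve01", ["pve02", "pve03"], quorate=True))   # restart on pve02
print(ha_action("pve01", ["pve02", "pve03"], quorate=False))  # freeze
```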

Important: An HA cluster is only as stable as its quorum design. For production systems, the following applies:

  • at least 3 nodes

  • Ceph storage or NFS backend with consistent access
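The three-node minimum follows directly from the quorum formula: a partition must hold a strict majority of votes, i.e. floor(n/2) + 1. A quick sketch of the arithmetic:

```python
# Quorum arithmetic: a partition is quorate with floor(n/2) + 1 votes
# (assuming the default of one vote per node).

def votes_needed(total_nodes: int) -> int:
    return total_nodes // 2 + 1

def survives_failures(total_nodes: int, failed: int) -> bool:
    """True if the remaining nodes still hold quorum."""
    return total_nodes - failed >= votes_needed(total_nodes)

print(votes_needed(3))          # 2
print(survives_failures(3, 1))  # True: 2 of 3 nodes remain a majority
print(survives_failures(2, 1))  # False: 1 of 2 nodes is not a majority
```

This is why a two-node cluster freezes as soon as either node fails, while a three-node cluster tolerates one failure.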

4. Best Practices

Area       | Recommendation
Network    | Dedicated Corosync LAN, LACP bonding
Storage    | Shared storage (Ceph, iSCSI, NFS via TrueNAS) with multipath
Monitoring | Enable syslog, ha-manager status, email alerts
Backup     | Integration with Proxmox Backup Server
Testing    | Perform regular failover simulations
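The monitoring recommendation can be scripted as well. A sketch that scans `ha-manager status` output for resources not in the "started" state; the sample output is illustrative and only loosely follows the real command's format:

```python
# Sketch: flag HA resources that are not in the "started" state.
# SAMPLE_STATUS is illustrative sample text, not real ha-manager output.

SAMPLE_STATUS = """\
quorum OK
master pve01 (active)
service vm:100 (pve01, started)
service vm:101 (pve02, error)
"""

def unhealthy_services(status_output: str) -> list[str]:
    """Return IDs of HA resources whose state is not 'started'."""
    bad = []
    for line in status_output.splitlines():
        if line.startswith("service ") and "started)" not in line:
            bad.append(line.split()[1])  # e.g. "vm:101"
    return bad

print(unhealthy_services(SAMPLE_STATUS))  # ['vm:101']
```

A cron job could run this against the live command output and send the resulting list as an email alert.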

Example: Ceph Cluster Layout

Component | Count | Description
OSDs      | 6     | Storage disks (e.g. two per node)
MONs      | 3     | Cluster monitors
MGRs      | 2     | Manager daemons
MDS       | 1     | Optional, for CephFS
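With the usual replicated pool (size = 3, the default), usable capacity is roughly raw capacity divided by the replication factor. A sketch of the arithmetic, assuming hypothetical 4 TB disks for the six OSDs above:

```python
# Sketch: rough usable capacity of a replicated Ceph pool.
# The 4 TB disk size is a hypothetical assumption; size=3 replication is
# the default for replicated pools. Real clusters also reserve headroom
# for rebalancing, so plan for less than this figure.

def usable_tb(osd_count: int, disk_tb: float, replication: int = 3) -> float:
    return osd_count * disk_tb / replication

print(usable_tb(6, 4.0))  # 8.0 TB usable from 24 TB raw
```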

5. Benefits for Organisations

  • No licence costs: HA functionality included natively

  • Minimal downtime: automated failover

  • Scalability: straightforward node expansion

  • Centralised management: all resources via web GUI and API

Example calculation (with 10 VM hosts): licence cost savings vs. a proprietary solution of approx. 30–40 % annually.

6. Conclusion

Proxmox VE delivers genuine enterprise high availability without additional licences. Thanks to its transparent architecture, open standards, and simple automation, it is ideal for production IT infrastructures with demanding availability requirements.

Request a Consultation or Demo

Your IT should always be running. With Proxmox VE, you ensure high availability without licence costs — our experts assist with design, cluster deployment, and operations.

→ Request a free initial consultation or demo now: Contact

DATAZONE supports you with implementation — contact us for a no-obligation consultation.
