A Proxmox cluster where all services run over a single network works — until it does not. The moment a Ceph rebalance saturates the bandwidth, live migrations stall, and Corosync loses its heartbeat, the entire cluster goes down. Professional network design separates the different traffic types and prevents exactly these scenarios.
The Four Traffic Types in a Proxmox Cluster
Every Proxmox cluster produces four fundamentally different types of network traffic:
| Traffic Type | Characteristics | Requirements |
|---|---|---|
| Corosync | Small packets, extremely latency-sensitive | < 2 ms RTT, dedicated network |
| Migration | Burst traffic, high bandwidth | 10 Gbit/s+, must not impact other networks |
| Storage | Constant throughput, IOPS-sensitive | 10–25 Gbit/s, Jumbo Frames |
| Management | Low volume, but critical | Reachability, security |
The golden rule: Corosync traffic must never compete with storage or migration traffic. A delayed heartbeat can cost a node its quorum; with HA enabled, that node is fenced and its VMs are restarted on other nodes.
Corosync Ring: The Cluster’s Nervous System
Corosync is the cluster communication protocol that monitors the state of all nodes. It regularly sends heartbeat packets and expects responses within tight time windows.
Dedicated Corosync Network
For Corosync, a simple Layer 2 network without routers or firewalls is sufficient:
# /etc/network/interfaces — Corosync interface
auto ens18f1
iface ens18f1 inet static
address 10.10.10.1/24
mtu 1500
On additional nodes, configure analogously with 10.10.10.2/24 and 10.10.10.3/24. Use a separate physical interface or a dedicated VLAN.
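To check a candidate Corosync network against the latency budget from the table above (under 2 ms RTT), a small helper can parse the summary line of `ping`. This is a minimal sketch: the function names `avg_rtt_ms` and `rtt_within_budget` are made up for illustration, and the parser assumes the iputils summary format (`rtt min/avg/max/mdev = ...`).

```shell
# Extract the average RTT in ms from ping's summary line (read on stdin).
# Assumes the iputils format: "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms"
avg_rtt_ms() {
    awk -F'/' '/^rtt / { print $5 }'
}

# Succeed if the RTT ($1, in ms) is below the budget ($2, default 2 ms).
rtt_within_budget() {
    awk -v rtt="$1" -v max="${2:-2}" 'BEGIN { exit !(rtt + 0 < max + 0) }'
}

# Usage on a node (peer address taken from the example above):
#   rtt=$(ping -c 20 -q 10.10.10.2 | avg_rtt_ms)
#   rtt_within_budget "$rtt" || echo "Corosync RTT too high: ${rtt} ms"
```

Measuring while the storage and migration networks are under load gives the most meaningful result, since idle latency says little about contention.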
Corosync with Two Rings (Redundancy)
Proxmox has supported multiple Corosync links since PVE 6. Always configure at least two links on separate physical paths:
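# Show the status of every Corosync link on this node
corosync-cfgtool -s
# New cluster: pass both links when creating it (CLUSTERNAME is a placeholder)
pvecm create CLUSTERNAME --link0 10.10.10.1 --link1 10.10.20.1
# Joining node: run on the new node, pointing at an existing member
pvecm add 10.10.10.1 --link0 10.10.10.2 --link1 10.10.20.2
# Existing cluster: add a ring1_addr to every node entry in
# /etc/pve/corosync.conf and increment config_version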
The first link (Link 0) uses, for example, the 10.10.10.0/24 network via Switch A, while the second link (Link 1) uses 10.10.20.0/24 via Switch B. If one switch fails, the cluster remains functional.
Corosync Tuning
The default timeouts are suitable for most environments but can be adjusted for unstable networks:
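# /etc/pve/corosync.conf (excerpt)
# Comments must be on their own line; increment config_version on every edit.
totem {
# Token timeout in ms (default: 3000 in current Corosync 3.x, 1000 in older releases)
token: 5000
token_retransmits_before_loss_const: 6
join: 60
# Must be at least 1.2 * token
consensus: 6000
}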
Caution: Higher timeouts mean slower failure detection. Only increase values if network latency requires it.
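When choosing `token`, it helps to know the timeout Corosync actually applies. According to the corosync.conf(5) man page, clusters with three or more nodes add `token_coefficient` (default 650 ms) per node beyond the second, and `consensus` must be at least 1.2 times `token`. The sketch below does that arithmetic; the function names are illustrative and not part of any Corosync tooling.

```shell
# Effective token timeout in ms.
# $1 = token, $2 = number of nodes, $3 = token_coefficient (default 650)
effective_token_ms() {
    token=$1 nodes=$2 coeff=${3:-650}
    if [ "$nodes" -gt 2 ]; then
        echo $(( token + (nodes - 2) * coeff ))
    else
        echo "$token"
    fi
}

# Smallest allowed consensus value in ms for a given token ($1): 1.2 * token.
min_consensus_ms() {
    echo $(( $1 * 12 / 10 ))
}

effective_token_ms 3000 3   # prints 3650
min_consensus_ms 3000       # prints 3600
```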
Migration Network: Bandwidth for Live Migration
Live migration moves the entire RAM contents of a VM across the network. A VM with 32 GB of RAM therefore transfers at least 32 GB, plus every memory page the guest modifies while the copy is running. Without a dedicated migration network, this transfer competes with all other services for bandwidth.
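As a rough lower bound, the transfer time is the RAM size divided by the link speed. The helper below is a back-of-the-envelope sketch: `migration_seconds` is a hypothetical name, it treats the RAM size as GiB, and it ignores protocol overhead, encryption cost, and pages dirtied during the copy, all of which make real migrations slower.

```shell
# Minimum time in seconds to copy RAM ($1, in GiB) over a link ($2, in Gbit/s).
migration_seconds() {
    awk -v gib="$1" -v gbit="$2" \
        'BEGIN { printf "%.1f\n", gib * 1024 * 1024 * 1024 * 8 / (gbit * 1e9) }'
}

migration_seconds 32 10   # prints 27.5
migration_seconds 32 1    # prints 274.9
```

On a shared 1 Gbit/s network, the same VM would occupy the link for more than four and a half minutes.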
Configuring the Migration Network
In the Proxmox web interface: Datacenter > Options > Migration Settings:
Migration Network: 10.10.30.0/24
Migration Type: secure (encrypted)
The network configuration on each node:
# /etc/network/interfaces — Migration interface
auto ens20f0
iface ens20f0 inet static
address 10.10.30.1/24
mtu 9000
Bandwidth Limiting
To prevent a migration from saturating the entire network, set a bandwidth limit:
# Limit migration bandwidth to roughly 5 Gbit/s (bwlimit is given in KiB/s)
pvesh set /cluster/options --migration type=secure,network=10.10.30.0/24 --bwlimit migration=610351
The bwlimit value is in KiB/s: 5 Gbit/s is 5 × 10^9 / 8 / 1024 ≈ 610351 KiB/s. On a 10 Gbit/s link, this leaves about half the bandwidth free for other traffic or a second migration.
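Proxmox expresses bandwidth limits (`bwlimit`) in KiB/s, so link speeds given in Gbit/s need converting. A minimal conversion sketch follows; the function name `gbit_to_kib_per_s` is made up for this example.

```shell
# Convert Gbit/s ($1, decimal gigabits) to KiB/s, rounded down.
gbit_to_kib_per_s() {
    awk -v gbit="$1" 'BEGIN { printf "%d\n", int(gbit * 1e9 / 8 / 1024) }'
}

gbit_to_kib_per_s 5     # prints 610351
gbit_to_kib_per_s 2.5   # prints 305175
```

The result can be passed as `--bwlimit migration=<value>` to `pvesh set /cluster/options`.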
Storage Network: Ceph and iSCSI
The storage network carries the continuous data traffic between nodes (for Ceph) or to external storage (iSCSI, NFS). High throughput and low latency are paramount here.
Configuring the Ceph Network
Ceph distinguishes between Public Network (client access) and Cluster Network (OSD replication). The cluster network carries replication traffic and should always be dedicated:
# /etc/pve/ceph.conf
[global]
public_network = 10.10.40.0/24
cluster_network = 10.10.50.0/24
ms_bind_ipv4 = true
The network interfaces:
# Storage Public Network
auto ens19f0
iface ens19f0 inet static
address 10.10.40.1/24
mtu 9000
# Storage Cluster Network (OSD replication)
auto ens19f1
iface ens19f1 inet static
address 10.10.50.1/24
mtu 9000
MTU 9000 (Jumbo Frames) for Storage
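Jumbo Frames with MTU 9000 reduce the per-packet CPU overhead, because the same amount of data travels in roughly one sixth as many frames. For storage traffic consisting of many large sequential blocks, this can yield on the order of 10–20% more throughput, depending on NICs and offloading.
Important: MTU 9000 must be configured along the entire path: interfaces, switches, and any routers. A switch port left at MTU 1500 silently drops oversized frames, and a router either fragments or rejects them. In both cases, storage traffic slows down or stalls.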
# Check MTU on all switches (the "show" command is vendor-specific; adjust it to your switch CLI)
for host in switch1 switch2 switch3; do
ssh "admin@$host" "show interface mtu"
done
# Test MTU (node to node)
ping -M do -s 8972 10.10.40.2
The value 8972 is derived from 9000 (MTU) - 20 (IP header) - 8 (ICMP header). If the ping succeeds, Jumbo Frames are correctly configured.
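The same arithmetic works for any MTU, and the test is worth repeating against every peer on the storage network. A minimal sketch, assuming IPv4 and iputils `ping`; the function names and the peer list in the usage comment are illustrative.

```shell
# ICMP payload size that exactly fills one IPv4 packet at the given MTU ($1).
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))   # minus IPv4 header (20) and ICMP header (8)
}

# Succeed if the peer ($1) answers unfragmented pings at the MTU ($2, default 9000).
jumbo_path_ok() {
    ping -M do -c 3 -q -s "$(icmp_payload_for_mtu "${2:-9000}")" "$1" >/dev/null 2>&1
}

# Usage on a node:
#   for peer in 10.10.40.2 10.10.40.3; do
#       jumbo_path_ok "$peer" || echo "MTU 9000 path to $peer is broken"
#   done
```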
iSCSI Connectivity
For external storage via iSCSI (e.g., TrueNAS as target), use the storage interfaces as two independent paths, each in its own subnet:
# /etc/network/interfaces — iSCSI Multipath
auto ens19f0
iface ens19f0 inet static
address 10.10.40.1/24
mtu 9000
auto ens19f1
iface ens19f1 inet static
address 10.10.41.1/24
mtu 9000
Configure Multipath I/O with multipathd and open-iscsi for redundancy and load balancing.
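The initiator side of such a two-path setup can be sketched as follows. The portal addresses 10.10.40.10 and 10.10.41.10 are hypothetical placeholders for the target's two interfaces, and the `run` wrapper with `DRY_RUN` exists only so the commands can be previewed without touching the system. The `iscsiadm` and `multipath` invocations are the usual open-iscsi and multipath-tools calls, but verify them against your storage vendor's guidance before use.

```shell
# Print the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Discover targets on each portal, log in to all of them, and list the paths.
iscsi_multipath_setup() {
    for portal in "$@"; do
        run iscsiadm -m discovery -t sendtargets -p "$portal" || return 1
    done
    run iscsiadm -m node --login || return 1
    run multipath -ll
}

# Preview without executing anything:
#   DRY_RUN=1 iscsi_multipath_setup 10.10.40.10 10.10.41.10
```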
Management VLAN
The management network provides access to the Proxmox web interface, SSH, and API. It should reside in its own VLAN and be protected by firewall rules.
# /etc/network/interfaces — Management Bridge
auto vmbr0
iface vmbr0 inet static
address 10.10.1.1/24
gateway 10.10.1.254
bridge-ports ens18f0
bridge-stp off
bridge-fd 0
Access Restrictions
Restrict access to the management interface via the Proxmox firewall:
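# Allow the web interface only from the management subnet
# (effective only while the firewall is enabled and no broader ACCEPT rule matches first)
pvesh create /nodes/pve1/firewall/rules \
--action ACCEPT --type in --source 10.10.1.0/24 \
--dest 10.10.1.1 --dport 8006 --proto tcp --enable 1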
Bonding and LACP
For redundancy and increased bandwidth, combine physical interfaces into bonds. LACP (802.3ad) distributes traffic across multiple links:
# /etc/network/interfaces — LACP Bond
auto bond0
iface bond0 inet manual
bond-slaves ens19f0 ens19f1
bond-mode 802.3ad
bond-miimon 100
bond-xmit-hash-policy layer3+4
mtu 9000
# Bridge over the bond
auto vmbr1
iface vmbr1 inet static
address 10.10.40.1/24
bridge-ports bond0
bridge-stp off
bridge-fd 0
mtu 9000
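Hash policy: layer3+4 distributes traffic based on IP addresses and ports, so separate connections can use different links even between the same two hosts. This works well for VM traffic and for Ceph, which opens many TCP connections between only a few IPs. layer2+3 ignores ports and would pin all traffic between two nodes to a single link. With either policy, a single connection never exceeds the speed of one link.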
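To see how the two policies treat traffic between one pair of nodes, the hash can be modelled in a few lines. This is a simplified sketch based on the formula in the Linux kernel bonding documentation, not the exact code path of current kernels: the MAC address bits of layer2+3 are left out, and the function names are made up for illustration.

```shell
# Convert a dotted-quad IPv4 address to an integer (subshell keeps IFS local).
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Simplified transmit hash: prints the index of the chosen bond member.
# $1 = src IP, $2 = dst IP, $3 = src port, $4 = dst port, $5 = number of links
# Pass 0 for both ports to model layer2+3, which ignores ports.
bond_link_for_flow() {
    h=$(( ($3 << 16) | $4 ))
    h=$(( h ^ $(ip_to_int "$1") ^ $(ip_to_int "$2") ))
    h=$(( h ^ (h >> 16) ))
    h=$(( h ^ (h >> 8) ))
    echo $(( h % $5 ))
}

# layer2+3: every flow between the two nodes lands on the same link
bond_link_for_flow 10.10.40.1 10.10.40.2 0 0 2          # prints 1
# layer3+4: flows with different source ports can use different links
bond_link_for_flow 10.10.40.1 10.10.40.2 50000 6800 2   # prints 0
bond_link_for_flow 10.10.40.1 10.10.40.2 50001 6800 2   # prints 1
```

With only two or three Ceph nodes, a policy that ignores ports has very few distinct hash inputs, which is why the port-aware policy spreads the load more evenly.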
Configuring LACP on the Switch
The switch must provide the LACP counterpart for the bond. Example for a managed switch:
# Cisco IOS-style syntax (example; on some platforms jumbo MTU is a global setting)
interface range GigabitEthernet0/1-2
channel-group 1 mode active
interface Port-channel1
switchport mode trunk
mtu 9000
Example Topology: 3-Node Cluster
A proven topology for a production 3-node cluster with Ceph:
Node 1 (pve1):
ens18f0 → Management VLAN 10 (10.10.1.1/24)
ens18f1 → Corosync Link 0 (10.10.10.1/24)
ens19f0 → Ceph Public (10.10.40.1/24, MTU 9000)
ens19f1 → Ceph Cluster (10.10.50.1/24, MTU 9000)
ens20f0 → Migration (10.10.30.1/24, MTU 9000)
ens20f1 → Corosync Link 1 (10.10.20.1/24)
Switch A: Management, Corosync L0, Migration
Switch B: Corosync L1, Ceph Public, Ceph Cluster
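Because each switch carries one Corosync link, the cluster keeps quorum if either switch fails. Note that both Ceph networks depend on Switch B in this layout; if storage must also survive a switch failure, connect the Ceph interfaces redundantly to both switches.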
Example Topology: 2-Node Cluster (Budget)
Not every business has six NICs per node. A minimal but functional configuration with four NICs:
Node 1 (pve1):
ens18f0 → VLAN 10: Management (10.10.1.1/24)
VLAN 11: Corosync L0 (10.10.10.1/24)
ens18f1 → VLAN 12: Corosync L1 (10.10.20.1/24)
bond0 (ens19f0 + ens19f1) → Storage + Migration (10.10.40.1/24, MTU 9000)
Here, management and Corosync L0 share a physical interface via VLANs. This works as long as management traffic is minimal.
Common Mistakes and How to Avoid Them
1. Corosync over the storage network: A Ceph recovery saturates the network, Corosync loses the heartbeat, nodes get fenced. Always separate these.
2. MTU mismatch: A switch port left at MTU 1500 in the storage path drops Jumbo Frames instead of forwarding them. Small packets still pass, so pings and logins work while large transfers hang. There is no obvious error message, just inexplicably slow or stalled storage.
3. No second Corosync link: A single link is a single point of failure. A brief cable wobble can cost a node its quorum and, with HA enabled, get it fenced.
4. Spanning Tree on bridge ports: Proxmox bridges do not need STP (bridge-stp off). Enabled STP causes 30–50 second delays on link-up.
5. Bond without LACP counterpart: A bond in 802.3ad mode without a configured LACP group on the switch results in only one link being active.
Network Monitoring
Monitor network performance continuously:
# Check bandwidth per interface
iftop -i ens19f0
# Corosync status and latency
corosync-cfgtool -s
# Check bond status
cat /proc/net/bonding/bond0
Integrate these checks into your monitoring system (e.g., Checkmk, Zabbix, or DATAZONE Control) to detect problems early.
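For the bond check, one signal worth alerting on is a missing LACP partner, which is how a bond without a matching switch configuration typically shows up. The sketch below scans the contents of `/proc/net/bonding/bond0` for an all-zero partner MAC address. It assumes the `Partner Mac Address:` field that the Linux bonding driver prints in 802.3ad mode; the function name is illustrative.

```shell
# Read /proc/net/bonding/<bond> on stdin; succeed only if an LACP partner
# MAC is reported and it is not the all-zero placeholder.
lacp_partner_present() {
    awk -F': ' '
        /Partner Mac Address/ {
            found = 1
            if ($2 ~ /^00:00:00:00:00:00/) missing = 1
        }
        END { exit !(found && !missing) }
    '
}

# Usage on a node:
#   lacp_partner_present < /proc/net/bonding/bond0 || echo "bond0: no LACP partner"
```

The exit status makes the function easy to wrap in a local check or agent plugin for the monitoring system of your choice.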
Conclusion
Thoughtful network design is the foundation of a stable Proxmox cluster. The investment in dedicated networks for Corosync, storage, and migration pays off during the first Ceph recovery or when migrating multiple VMs simultaneously. Plan the network before cluster installation — retrofitting requires maintenance windows and carries risks.