Proxmox Cluster Network Design: Corosync, Migration, Storage, and Management

A Proxmox cluster in which all services share a single network works, until it doesn't. As soon as a Ceph rebalance saturates the bandwidth, live migrations stall and Corosync misses heartbeats; nodes lose quorum and, in the worst case, the entire cluster goes down. Professional network design separates the different traffic types and prevents exactly these scenarios.

The Four Traffic Types in a Proxmox Cluster

Every Proxmox cluster produces four fundamentally different types of network traffic:

| Traffic Type | Characteristics | Requirements |
|--------------|-----------------|--------------|
| Corosync | Small packets, extremely latency-sensitive | < 2 ms RTT, dedicated network |
| Migration | Burst traffic, high bandwidth | 10 Gbit/s+, must not impact other networks |
| Storage | Constant throughput, IOPS-sensitive | 10–25 Gbit/s, Jumbo Frames |
| Management | Low volume, but critical | Reachability, security |

The golden rule: Corosync traffic must never compete with storage or migration traffic. If Corosync messages are delayed beyond the token timeout, the node drops out of the quorate cluster; with HA enabled, it is then fenced, which means unwanted VM restarts.

Corosync Ring: The Cluster’s Nervous System

Corosync is the cluster communication protocol that monitors the state of all nodes. It regularly sends heartbeat packets and expects responses within tight time windows.

Dedicated Corosync Network

For Corosync, a simple Layer 2 network without routers or firewalls is sufficient:

# /etc/network/interfaces — Corosync interface
auto ens18f1
iface ens18f1 inet static
    address 10.10.10.1/24
    mtu 1500

On additional nodes, configure analogously with 10.10.10.2/24 and 10.10.10.3/24. Use a separate physical interface or a dedicated VLAN.
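Repeating this stanza by hand on every node invites typos. As a minimal sketch, assuming bash and the 10.10.10.0/24 addressing used above, a small helper can print the stanza for a given node number; the function name `corosync_stanza` is hypothetical and not part of Proxmox.

```shell
# Hypothetical helper: print the Corosync interface stanza for one node.
# Arguments: interface name, node number, optional /24 prefix (default 10.10.10).
corosync_stanza() {
    local iface=$1 node=$2 prefix=${3:-10.10.10}
    printf 'auto %s\n' "$iface"
    printf 'iface %s inet static\n' "$iface"
    printf '    address %s.%d/24\n' "$prefix" "$node"
    printf '    mtu 1500\n'
}
```

For example, `corosync_stanza ens18f1 2` prints the stanza for the second node with `address 10.10.10.2/24`; passing `10.10.20` as the third argument produces a Link 1 stanza. Review the output before adding it to /etc/network/interfaces.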

Corosync with Two Rings (Redundancy)

Proxmox has supported multiple Corosync links since PVE 6. Always configure at least two links on separate physical paths:

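# Show the status of all Corosync links on this node
corosync-cfgtool -s

# New cluster: define both links at creation time (first node)
pvecm create <clustername> --link0 10.10.10.1 --link1 10.10.20.1

# Existing cluster: copy /etc/pve/corosync.conf, add ring1_addr to every
# node entry and an interface { linknumber: 1 } block to totem, increment
# config_version, then move the copy back into place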

The first link (Link 0) uses, for example, the 10.10.10.0/24 network via Switch A, while the second link (Link 1) uses 10.10.20.0/24 via Switch B. If one switch fails, the cluster remains functional.
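After editing corosync.conf, it is worth confirming that every node entry really carries a second address. The following bash/awk sketch assumes the one-key-per-line layout Proxmox writes and lists the nodes that have no `ring1_addr`; `missing_link1` is a hypothetical helper name.

```shell
# List nodes in a corosync.conf that have no ring1_addr (no second link).
# Reads /etc/pve/corosync.conf unless another file is given.
missing_link1() {
    awk '
        # Start of a node block (does not match "nodelist")
        /^[[:space:]]*node[[:space:]]*[{]/ { in_node = 1; name = ""; has1 = 0; next }
        in_node && /^[[:space:]]*name:/       { name = $2 }
        in_node && /^[[:space:]]*ring1_addr:/ { has1 = 1 }
        # End of the node block: report it if ring1_addr was never seen
        in_node && /^[[:space:]]*[}]/ {
            if (!has1) print name
            in_node = 0
        }
    ' "${1:-/etc/pve/corosync.conf}"
}
```

Empty output means every node has an address for Link 1.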

Corosync Tuning

The default timeouts are suitable for most environments but can be adjusted for unstable networks:

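# /etc/pve/corosync.conf (excerpt; merge into the existing totem section
# and increment config_version)
totem {
    # token timeout in ms (see corosync.conf(5) for your version's default)
    token: 3000
    token_retransmits_before_loss_const: 6
    join: 60
    # consensus must be at least 1.2 * token
    consensus: 3600
}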

Caution: Higher timeouts mean slower failure detection. Only increase values if network latency requires it.
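The effect of these values on failure detection can be worked out by hand. corosync.conf(5) describes that, with three or more nodes, the real token timeout is `token + (nodes - 2) * token_coefficient` (default coefficient 650 ms), and that `consensus` must be at least 1.2 × `token`. A bash sketch of both rules, with hypothetical function names:

```shell
# Effective token timeout as described in corosync.conf(5):
# with 3 or more nodes, token + (nodes - 2) * token_coefficient (default 650 ms).
effective_token() {
    local token=$1 nodes=$2 coeff=${3:-650}
    if (( nodes >= 3 )); then
        echo $(( token + (nodes - 2) * coeff ))
    else
        echo "$token"
    fi
}

# consensus must be at least 1.2 * token (integer math: consensus*10 >= token*12)
consensus_ok() {
    local token=$1 consensus=$2
    (( consensus * 10 >= token * 12 ))
}
```

With token 3000 on a 3-node cluster, `effective_token 3000 3` prints 3650, so a failed node is declared lost only after roughly 3.7 seconds; on 5 nodes it is 4950 ms. `consensus_ok 3000 3600` succeeds, while `consensus_ok 3000 3500` fails.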

Migration Network: Bandwidth for Live Migration

Live migration copies the entire RAM of a VM across the network and re-sends pages the VM modifies during the copy. A VM with 32 GB of RAM therefore generates at least 32 GB of burst traffic. Without a dedicated migration network, this transfer competes with all other services on the link.
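A quick lower bound shows why this traffic needs its own link. The bash sketch below assumes the full line rate is available and ignores protocol overhead and re-sent dirty pages, so real migrations take longer; it treats the RAM size as GiB, and the function name is illustrative.

```shell
# Lower bound for copying a VM's RAM once over the migration link.
# Assumes the full line rate is usable; ignores protocol overhead and pages
# the VM dirties during the copy, so real migrations take longer.
# Arguments: RAM in GiB, link speed in Gbit/s. Prints whole seconds (rounded down).
min_migration_seconds() {
    local ram_gib=$1 link_gbit=$2
    local bits=$(( ram_gib * 1024 * 1024 * 1024 * 8 ))
    echo $(( bits / (link_gbit * 1000000000) ))
}
```

`min_migration_seconds 32 10` prints 27: even a dedicated 10 Gbit/s link is busy for almost half a minute, and on 1 Gbit/s the copy takes at least 274 seconds.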

Configuring the Migration Network

In the Proxmox web interface: Datacenter > Options > Migration Settings:

Migration Network:  10.10.30.0/24
Migration Type:     secure (encrypted)

The network configuration on each node:

# /etc/network/interfaces — Migration interface
auto ens20f0
iface ens20f0 inet static
    address 10.10.30.1/24
    mtu 9000

Bandwidth Limiting

To prevent a migration from saturating the entire network, set a bandwidth limit:

# Migration network and type (datacenter.cfg: migration)
pvesh set /cluster/options --migration type=secure,network=10.10.30.0/24
# Limit migration bandwidth to about 5 Gbit/s (value in KiB/s)
pvesh set /cluster/options --bwlimit migration=610352

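The bwlimit value is given in KiB/s: 5 Gbit/s is 625,000,000 bytes/s, or about 610,352 KiB/s. The limit applies to each migration individually, so on a 10 Gbit/s link a single migration leaves half the bandwidth free, while two concurrent migrations can still fill the link.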
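Proxmox specifies `bwlimit` values in KiB/s, while link speeds are usually quoted in Gbit/s or Mbit/s. A small bash conversion helper, a sketch with a hypothetical name, avoids unit mistakes:

```shell
# Convert a link rate in Mbit/s to KiB/s, the unit used by bwlimit.
# 1 Mbit/s = 1,000,000 bit/s = 125,000 bytes/s; result rounded to the nearest KiB/s.
mbit_to_kib() {
    local bytes_per_s=$(( $1 * 1000000 / 8 ))
    echo $(( (bytes_per_s + 512) / 1024 ))
}
```

`mbit_to_kib 5000` prints 610352 (about 5 Gbit/s) and `mbit_to_kib 10000` prints 1220703. The result can be passed directly, for example `pvesh set /cluster/options --bwlimit migration=$(mbit_to_kib 5000)`.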

Storage Network: Ceph and iSCSI

The storage network carries the sustained data traffic between nodes (for Ceph) or to external storage (iSCSI, NFS). Throughput and low latency are paramount here.

Configuring the Ceph Network

Ceph distinguishes between Public Network (client access) and Cluster Network (OSD replication). The cluster network carries replication traffic and should always be dedicated:

# /etc/pve/ceph.conf
[global]
public_network = 10.10.40.0/24
cluster_network = 10.10.50.0/24
ms_bind_ipv4 = true

The network interfaces:

# Storage Public Network
auto ens19f0
iface ens19f0 inet static
    address 10.10.40.1/24
    mtu 9000

# Storage Cluster Network (OSD replication)
auto ens19f1
iface ens19f1 inet static
    address 10.10.50.1/24
    mtu 9000
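Each node's storage addresses must fall inside the networks declared in ceph.conf. A quick check in bash, sketched with hypothetical helper names, compares an IPv4 address against a CIDR range:

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    local -a o=($1)
    echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Succeed if IPv4 address $1 lies inside CIDR $2 (e.g. 10.10.50.0/24).
in_cidr() {
    local ip=$1 net=${2%/*} bits=${2#*/}
    local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

`in_cidr 10.10.50.1 10.10.50.0/24` succeeds, while `in_cidr 10.10.40.1 10.10.50.0/24` fails, which catches an address typed into the wrong interface stanza.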

MTU 9000 (Jumbo Frames) for Storage

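Jumbo Frames with MTU 9000 carry about six times as much payload per packet as the standard MTU of 1500, which significantly reduces per-packet CPU and interrupt overhead. For storage traffic consisting of many large sequential blocks, this typically yields 10–20% more throughput, depending on the NICs and their offload features.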
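The per-packet savings can be quantified. Under the assumptions noted in the comments (IPv4 and TCP headers without options, 38 bytes of Ethernet framing overhead), this bash sketch computes how many frames 1 GiB of payload needs and how much of the wire carries payload; the function names are illustrative.

```shell
# Per-frame efficiency for TCP over IPv4 on Ethernet at a given MTU.
# Assumptions: 40 bytes IP+TCP headers (no TCP options) and 38 bytes of
# Ethernet overhead on the wire (14 header + 4 FCS + 8 preamble + 12 gap).
payload_permille() {   # payload bytes per 1000 bytes on the wire
    local mtu=$1
    echo $(( (mtu - 40) * 1000 / (mtu + 38) ))
}

frames_per_gib() {     # frames needed to carry 1 GiB of payload (rounded up)
    local payload=$(( $1 - 40 ))
    echo $(( (1073741824 + payload - 1) / payload ))
}
```

At MTU 1500, 1 GiB needs 735440 frames at about 94.9% payload efficiency; at MTU 9000, it needs 119838 frames at about 99.1%. The wire efficiency barely changes, but the frame count drops by a factor of about six, which is where most of the CPU savings come from.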

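Important: MTU 9000 must be configured along the entire path: interfaces, switches, and any routers. A switch with a smaller MTU silently drops jumbo frames, and a router either fragments or rejects them, so a single device left at MTU 1500 breaks or slows down storage traffic.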

# Check MTU on all switches (the CLI command differs by vendor)
for host in switch1 switch2 switch3; do
    ssh "admin@${host}" "show interface mtu"
done

# Test MTU (node to node)
ping -M do -s 8972 10.10.40.2

The value 8972 is derived from 9000 (MTU) - 20 (IP header) - 8 (ICMP header). If the ping succeeds, Jumbo Frames are correctly configured.
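The same test should pass between every pair of nodes on every jumbo-frame network. The bash sketch below derives the payload size from the MTU and pings a list of peers with the don't-fragment flag (Linux iputils `ping`); the helper names and peer addresses are assumptions.

```shell
# ICMP payload for a DF ping that exactly fills the given MTU:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Ping every peer with "don't fragment" set at full MTU size (Linux iputils ping).
# Prints one OK/FAIL line per peer; returns non-zero if any peer fails.
check_jumbo() {
    local mtu=$1 peer size rc=0
    shift
    size=$(icmp_payload "$mtu")
    for peer in "$@"; do
        if ping -M do -c 3 -W 1 -s "$size" "$peer" >/dev/null 2>&1; then
            echo "OK   $peer (MTU $mtu)"
        else
            echo "FAIL $peer (MTU $mtu)"
            rc=1
        fi
    done
    return "$rc"
}
```

Run `check_jumbo 9000 10.10.40.2 10.10.40.3` and `check_jumbo 9000 10.10.50.2 10.10.50.3` from pve1, then repeat from the other nodes. Any FAIL line points to a device in that path with a smaller MTU.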

iSCSI Connectivity

For external storage via iSCSI (e.g., TrueNAS as target), use the dedicated storage interfaces and place each multipath path in its own subnet:

# /etc/network/interfaces — iSCSI Multipath
auto ens19f0
iface ens19f0 inet static
    address 10.10.40.1/24
    mtu 9000

auto ens19f1
iface ens19f1 inet static
    address 10.10.41.1/24
    mtu 9000

Configure Multipath I/O with multipathd and open-iscsi for redundancy and load balancing.

Management VLAN

The management network provides access to the Proxmox web interface, SSH, and API. It should reside in its own VLAN and be protected by firewall rules.

# /etc/network/interfaces — Management Bridge
auto vmbr0
iface vmbr0 inet static
    address 10.10.1.1/24
    gateway 10.10.1.254
    bridge-ports ens18f0
    bridge-stp off
    bridge-fd 0

Access Restrictions

Restrict access to the management interface via the Proxmox firewall:

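# Only the management subnet may access the web interface (8006/tcp).
# Rules take effect only once the firewall is enabled at datacenter level.
pvesh create /nodes/pve1/firewall/rules \
  --action ACCEPT --type in --source 10.10.1.0/24 \
  --dest 10.10.1.1 --dport 8006 --proto tcp --enable 1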

Bonding and LACP

For redundancy and increased bandwidth, combine physical interfaces into bonds. LACP (802.3ad) distributes traffic across multiple links:

# /etc/network/interfaces — LACP Bond
auto bond0
iface bond0 inet manual
    bond-slaves ens19f0 ens19f1
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer3+4
    mtu 9000

# Bridge over the bond
auto vmbr1
iface vmbr1 inet static
    address 10.10.40.1/24
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    mtu 9000

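Hash policy: layer3+4 distributes traffic based on IP addresses and ports, so different connections between the same two hosts can use different links. layer2+3 hashes only MAC and IP addresses and therefore sends all traffic between two hosts over the same link. For Ceph traffic (few IPs, many connections), layer3+4 is the better choice. Either way, a single TCP connection is limited to the bandwidth of one link.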
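The difference becomes clear when both policies are modeled. This bash sketch is loosely based on the hash formulas in the Linux kernel bonding documentation but simplified (byte order and packet type are ignored), so the link numbers will not match a real system; the helper names and the example addresses and ports are assumptions. It hashes eight connections between the same two Ceph nodes:

```shell
# Simplified model of the Linux bonding transmit hashes. Byte order and the
# packet type ID are ignored, so link numbers will not match a real kernel;
# the point is which header fields feed the hash.
ip_to_int() {
    local IFS=.
    local -a o=($1)
    echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

fold_hash() {
    local h=$1
    h=$(( h ^ (h >> 16) ))
    h=$(( h ^ (h >> 8) ))
    echo "$h"
}

# layer2+3: last MAC bytes and IP addresses only (no ports)
hash_l23() {   # src_mac_byte dst_mac_byte src_ip dst_ip n_links
    local h=$(( $1 ^ $2 ^ $(ip_to_int "$3") ^ $(ip_to_int "$4") ))
    echo $(( $(fold_hash "$h") % $5 ))
}

# layer3+4: TCP/UDP ports and IP addresses
hash_l34() {   # src_ip dst_ip src_port dst_port n_links
    local h=$(( (($3 << 16) | $4) ^ $(ip_to_int "$1") ^ $(ip_to_int "$2") ))
    echo $(( $(fold_hash "$h") % $5 ))
}

# Eight connections between the same two hosts to Ceph OSD port 6800
for sport in $(seq 40000 40007); do
    printf 'sport %d: layer2+3 -> link %d, layer3+4 -> link %d\n' "$sport" \
        "$(hash_l23 1 2 10.10.50.1 10.10.50.2 2)" \
        "$(hash_l34 10.10.50.1 10.10.50.2 "$sport" 6800 2)"
done
```

layer2+3 maps all eight connections to the same link because nothing in its input changes between them, while layer3+4 alternates between both links as the source port changes.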

Configuring LACP on the Switch

The switch must provide the LACP counterpart for the bond. Example for a managed switch:

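# Cisco IOS syntax (example; interface names and jumbo MTU commands vary by
# platform, some Catalyst models set the MTU globally with "system mtu")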
interface range GigabitEthernet0/1-2
  channel-group 1 mode active

interface Port-channel1
  switchport mode trunk
  mtu 9000

Example Topology: 3-Node Cluster

A proven topology for a production 3-node cluster with Ceph:

Node 1 (pve1):
  ens18f0 → Management VLAN 10    (10.10.1.1/24)
  ens18f1 → Corosync Link 0       (10.10.10.1/24)
  ens19f0 → Ceph Public            (10.10.40.1/24, MTU 9000)
  ens19f1 → Ceph Cluster           (10.10.50.1/24, MTU 9000)
  ens20f0 → Migration              (10.10.30.1/24, MTU 9000)
  ens20f1 → Corosync Link 1       (10.10.20.1/24)

Switch A: Management, Corosync L0, Migration
Switch B: Corosync L1, Ceph Public, Ceph Cluster

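Because the two Corosync links run over different switches, the cluster keeps quorum if either switch fails. In this layout, however, both Ceph networks depend on Switch B; if storage must also survive a switch failure, the Ceph networks need bonds spanning both switches, which for LACP requires stacked or MLAG-capable switches.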

Example Topology: 2-Node Cluster (Budget)

Not every business has six NICs per node. A minimal but functional configuration with four NICs:

Node 1 (pve1):
  ens18f0 → VLAN 10: Management    (10.10.1.1/24)
             VLAN 11: Corosync L0   (10.10.10.1/24)
  ens18f1 → VLAN 12: Corosync L1   (10.10.20.1/24)
  bond0 (ens19f0 + ens19f1) → Storage + Migration (10.10.40.1/24, MTU 9000)

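Here, management and Corosync L0 share a physical interface via VLANs. This works as long as management traffic is minimal. Note that with only two nodes, each node holds half of the votes: by default, if one node fails, the other loses quorum. An external QDevice (set up with pvecm qdevice setup) provides the tie-breaking third vote.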

Common Mistakes and How to Avoid Them

1. Corosync over the storage network: A Ceph recovery saturates the network, Corosync loses the heartbeat, nodes get fenced. Always separate these.

2. MTU mismatch: A switch with MTU 1500 in the storage path silently drops Jumbo Frames. Small packets still get through, so there is no obvious error message, just inexplicably slow or hanging storage.

3. No second Corosync link: A single link is a single point of failure. A cable or switch-port failure that outlasts the token timeout makes the node lose quorum; with HA enabled, it is then fenced.

4. Spanning Tree on bridge ports: Proxmox bridges do not need STP (bridge-stp off). Enabled STP causes 30–50 second delays on link-up.

5. Bond without LACP counterpart: A bond in 802.3ad mode without a configured LACP group on the switch results in only one link being active.

Network Monitoring

Monitor network performance continuously:

# Check bandwidth per interface
iftop -i ens19f0

# Corosync link status (per node and link)
corosync-cfgtool -s

# Corosync link latency statistics (knet)
corosync-cmapctl -m stats | grep latency

# Check bond status
cat /proc/net/bonding/bond0

Integrate these checks into your monitoring system (e.g., Checkmk, Zabbix, or DATAZONE Control) to detect problems early.
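Mistake 5 can be detected from /proc/net/bonding as well: in 802.3ad mode, every slave that has negotiated LACP with the switch should report the same Aggregator ID as the bond's active aggregator. The following bash/awk sketch assumes the /proc format of current kernels, with the active aggregator listed before the slave sections; `lacp_check` is a hypothetical helper, and its exit code makes it usable as a monitoring check.

```shell
# Compare each slave's Aggregator ID with the bond's active aggregator.
# Reads /proc/net/bonding/bond0 unless another file is given.
# Exit status 1 if any slave is outside the active aggregator.
lacp_check() {
    awk '
        /^Slave Interface:/ { slave = $3; next }
        /Aggregator ID:/ {
            if (slave == "") { active = $NF }   # Active Aggregator Info section
            else { agg[slave] = $NF; order[++n] = slave }
        }
        END {
            rc = 0
            for (i = 1; i <= n; i++) {
                s = order[i]
                if (agg[s] == active) { print "ok " s }
                else                  { print "NOT AGGREGATED " s; rc = 1 }
            }
            exit rc
        }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

A NOT AGGREGATED line usually means the switch ports lack a matching LACP channel group.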

Conclusion

Thoughtful network design is the foundation of a stable Proxmox cluster. The investment in dedicated networks for Corosync, storage, and migration pays off during the first Ceph recovery or when migrating multiple VMs simultaneously. Plan the network before cluster installation — retrofitting requires maintenance windows and carries risks.
