Multi-WAN setups are among the most misunderstood OPNsense topics. On the data sheet it sounds like “double bandwidth, double availability”; in practice this is often a simplification that can prove expensive. This article explains how to configure clean failover between two WAN connections on OPNsense (typically fibre plus an LTE/5G backup), and why load balancing is not the right choice for most SMBs.
The scenario
A typical SMB setup that we configure at DATAZONE:
- WAN1: fibre, symmetric 500/500 Mbit/s, fixed ISP contract, static IP
- WAN2: LTE or 5G backup, asymmetric, monthly data quota (e.g. 100 GB), changing IP
- Requirements: switch to WAN2 automatically when WAN1 fails, keep VoIP telephony working, keep critical services reachable (mail, ERP, VPN dial-in)
Goal: failover-only, no load balancing. Justification below.
Configure gateways
In OPNsense, multi-WAN is configured at the gateway level. One gateway is defined per WAN interface (System → Gateways → Single). The important fields:
- Monitor IP: do not use the ISP's default gateway IP; use a real public IP on the internet, e.g. 1.1.1.1 (Cloudflare) or 9.9.9.9 (Quad9). Why: the ISP router often still responds to pings when the connection is “down” (technically reachable, but no internet). A real public IP is a more reliable health check.
- Latency threshold: the default values (500 ms warning / 1000 ms alarm) are too generous for fibre. We typically set 200 ms / 500 ms.
- Packet loss threshold: 10% warning / 20% alarm; with lower thresholds, failover triggers on normal fluctuations.
- Probe interval: 1 second
- Time period: 60 seconds; the gateway is considered “down” only after 60 seconds above the threshold
These settings are a compromise between fault tolerance (no flapping) and reaction time (failover within about one minute). If you need faster failover, you can shorten the time period to 30 seconds, at the risk of false positives during short ISP hiccups.
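To make the interplay of these values concrete, here is a simplified Python model of how one time-period window of probe results could be classified. It is a sketch under assumptions: the function name and the plain averaging rule are ours, and dpinger (the monitoring daemon behind OPNsense gateways) evaluates its window in its own way. The thresholds are the 200/500 ms and 10/20% values recommended above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Thresholds:
    # Values recommended in this article: warning / alarm
    latency_warn_ms: float = 200.0
    latency_alarm_ms: float = 500.0
    loss_warn_pct: float = 10.0
    loss_alarm_pct: float = 20.0


def gateway_status(rtts_ms: Sequence[Optional[float]],
                   t: Thresholds = Thresholds()) -> str:
    """Classify a gateway from one time-period window of probes.

    Each entry is the round-trip time of one probe in ms, or None if
    the probe got no answer. Returns "online", "warning" or "down".
    """
    if not rtts_ms:
        return "down"  # no probe results at all
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    avg_ms = mean(answered) if answered else float("inf")
    # Trigger level "Packet Loss or High Latency": either alarm marks it down
    if loss_pct >= t.loss_alarm_pct or avg_ms >= t.latency_alarm_ms:
        return "down"
    if loss_pct >= t.loss_warn_pct or avg_ms >= t.latency_warn_ms:
        return "warning"
    return "online"
```

With a 1-second probe interval and a 60-second time period, the window holds about 60 probes: 6 lost probes (10%) yield "warning", 15 lost probes (25%) yield "down".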
Create gateway group
The gateway group is the central configuration for the failover logic (System → Gateways → Group). Example configuration:
Name: WAN_FAILOVER
Gateway priority:
- WAN1_GW: Tier 1
- WAN2_GW: Tier 2
Trigger level: Packet Loss or High Latency
Tier 1 is used as long as it is “up”. If it fails, OPNsense switches to Tier 2. Placing several gateways on the same tier would enable load balancing, which we deliberately avoid here.
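The tier logic can be sketched in a few lines of Python. This is an illustrative model, not OPNsense code: the function name and data layout are hypothetical, and the sketch only answers the question of which group members carry new traffic.

```python
from typing import Dict, List


def active_members(tiers: Dict[str, int], up: Dict[str, bool]) -> List[str]:
    """Return the group members that carry new traffic.

    tiers maps gateway name -> tier (1 = highest priority),
    up maps gateway name -> current health-check result.
    """
    healthy = [gw for gw in tiers if up.get(gw, False)]
    if not healthy:
        return []  # whole group down: nothing usable
    best_tier = min(tiers[gw] for gw in healthy)
    # All healthy members on the best tier: one entry means failover,
    # several entries mean load balancing across them
    return sorted(gw for gw in healthy if tiers[gw] == best_tier)


WAN_FAILOVER = {"WAN1_GW": 1, "WAN2_GW": 2}
```

Putting both gateways on tier 1 makes the function return two members, which is exactly the load-balancing case this article advises against.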
Switch firewall rules to the gateway group
The real lever: firewall rules that allow outbound traffic must use the gateway group as their gateway, not the individual WAN gateway.
In Firewall → Rules → LAN for the default outbound rule (“from LAN net to any”):
- Advanced features → Gateway: WAN_FAILOVER instead of default
If you forget this, the gateway health checks work, but the traffic still follows the system default route, which never fails over because OPNsense set it statically. A classic configuration mistake.
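The difference between the two rule settings can be modelled as follows. This is a hypothetical sketch that mirrors the behaviour described above; the gateway names come from the example group, everything else is illustrative and not how OPNsense evaluates rules internally.

```python
from typing import Dict, List, Optional

# Example group from above, members in tier order (hypothetical data layout)
GROUPS: Dict[str, List[str]] = {"WAN_FAILOVER": ["WAN1_GW", "WAN2_GW"]}
# System default route as described above: set once, not re-evaluated
STATIC_DEFAULT = "WAN1_GW"


def egress_gateway(rule_gateway: str, up: Dict[str, bool]) -> Optional[str]:
    """Gateway used by traffic matching a LAN rule with the given setting."""
    if rule_gateway == "default":
        # No policy routing: traffic follows the static default route,
        # whatever the health checks report
        return STATIC_DEFAULT
    for gw in GROUPS[rule_gateway]:
        if up.get(gw, False):
            return gw  # first healthy member in tier order
    return None  # whole group down; real behaviour is not modelled here


wan1_down = {"WAN1_GW": False, "WAN2_GW": True}
print(egress_gateway("default", wan1_down))       # WAN1_GW
print(egress_gateway("WAN_FAILOVER", wan1_down))  # WAN2_GW
```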
Outbound NAT per WAN
By default, OPNsense source-NATs outbound traffic to the IP of the respective WAN interface. With multi-WAN this must work cleanly per WAN; otherwise packets leave via WAN2 with the WAN1 source IP, and the LTE provider's gateway drops them.
In Firewall → NAT → Outbound:
- Switch mode to Hybrid Outbound NAT (default is Automatic)
- Manual rules per WAN: “Source LAN net → Translation interface address WAN1” and “Source LAN net → Translation interface address WAN2”
- Rule order is not decisive, because OPNsense applies the NAT rule that matches the chosen outgoing interface
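Why the rule order does not matter becomes clear if the translation is written as a lookup keyed on the outgoing interface. A minimal sketch: the addresses come from the documentation ranges, and the LAN subnet and function name are assumptions.

```python
import ipaddress
from typing import Dict

# Hypothetical addresses from the documentation ranges
WAN_ADDRESS: Dict[str, str] = {
    "WAN1": "203.0.113.10",   # static fibre IP
    "WAN2": "198.51.100.77",  # current LTE IP (changes over time)
}
LAN_NET = ipaddress.ip_network("192.168.1.0/24")  # assumed LAN subnet


def translated_source(src_ip: str, egress_if: str) -> str:
    """Source address after outbound NAT for a packet leaving via egress_if."""
    if ipaddress.ip_address(src_ip) not in LAN_NET:
        return src_ip  # no NAT rule matches this source in the sketch
    # One rule per WAN: the translation address belongs to the egress
    # interface, so rule order cannot put a WAN1 address on WAN2
    return WAN_ADDRESS[egress_if]
```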
Health check with monitor IP — the most important part
We touched on this above; here it is in more detail. The monitor IP decides whether failover works at all. Common mistakes:
- ISP gateway IP as monitor: ISP routers keep answering pings to their own address even when their internet uplink is down. Failover never triggers.
- Own public IP as monitor: makes no sense, since the probe goes over the very interface we want to check; on top of that, asymmetric routing can cause problems.
- Unreachable IP: the same mistake in the other direction; the gateway is permanently detected as down.
Well proven:
- WAN1 monitor IP: 1.1.1.1 (Cloudflare)
- WAN2 monitor IP: 9.9.9.9 (Quad9), or in general any public IP other than WAN1's monitor, so that both monitor targets cannot fail at the same time (e.g. during a large DDoS attack on Cloudflare)
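The rules of thumb above can be turned into a small pre-flight check before the configuration goes live. This is a hypothetical helper, not an OPNsense feature; the inputs are plain dictionaries you would fill in by hand. The third mistake (an IP that does not answer) cannot be caught statically and only shows up in testing.

```python
import ipaddress
from typing import Dict, List


def check_monitor_ips(monitors: Dict[str, str],
                      isp_gateways: Dict[str, str],
                      own_wan_ips: List[str]) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems: List[str] = []
    seen: Dict[str, str] = {}
    for gw, ip in monitors.items():
        if not ipaddress.ip_address(ip).is_global:
            problems.append(f"{gw}: {ip} is not a public internet address")
        if ip == isp_gateways.get(gw):
            problems.append(f"{gw}: {ip} is the ISP gateway itself")
        if ip in own_wan_ips:
            problems.append(f"{gw}: {ip} is one of our own WAN addresses")
        if ip in seen:
            problems.append(f"{gw}: {ip} is already the monitor of {seen[ip]}")
        else:
            seen[ip] = gw
    return problems
```

Called with the recommended pair 1.1.1.1 / 9.9.9.9, the function returns an empty list.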
Sticky connections for VoIP
This is the point where most multi-WAN setups fail in practice. VoIP connections (SIP, RTP) do not tolerate mid-call failover. If the gateway switches mid-call, the source IP for RTP changes, the SIP provider drops the packets, and the call drops.
Solution in OPNsense:
- Reply-To mechanism: in the rules on the WAN interfaces (Advanced → Reply-to), ensure that replies to connections that arrived via a WAN leave through that same gateway, even if the default route has switched.
- Sticky connections (Firewall → Settings → Advanced → Sticky connections active): forces an existing connection to stay on the same gateway.
With sticky connections active, only new connections switch to WAN2. Active calls stay on WAN1 and drop if WAN1 is really down. That is acceptable: a dropped call is better than a call whose packets keep vanishing into the void.
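The behaviour described above (existing flows keep their gateway, new flows take the currently active one) can be illustrated with a toy state table. This models the article's description only; it is not how pf implements states or sticky connections, and the class name and flow tuple layout are hypothetical.

```python
from typing import Dict, Tuple

# (source IP, source port, destination IP, destination port, protocol)
Flow = Tuple[str, int, str, int, str]


class StateTable:
    """Toy model: a flow is bound to the gateway that was active when it started."""

    def __init__(self) -> None:
        self.pinned: Dict[Flow, str] = {}

    def route(self, flow: Flow, active_gateway: str) -> str:
        # setdefault binds only flows seen for the first time;
        # known flows keep their original gateway even after a failover
        return self.pinned.setdefault(flow, active_gateway)
```

A call started before the failover keeps answering with WAN1_GW, while a call set up afterwards gets WAN2_GW; removing states when a call ends is not modelled.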
Load balancing — why we rarely recommend it
A gateway group with two gateways on Tier 1 activates load balancing. That sounds tempting (“double bandwidth!”), but it causes hard problems in practice:
- Sessions break: a TCP session started on WAN1 must stay there. Sticky connections are mandatory, otherwise HTTPS sessions break regularly.
- Asymmetric routing: some web applications react sensitively to changing source IPs (cookies, session tokens, captcha).
- Cloud apps with geo-IP: Microsoft 365 or Google Workspace notice when the source IP jumps between ISPs, which can trigger account security alerts.
- VPN performance: WireGuard and IPsec drop their tunnels on source-IP change.
- Asymmetric WAN bandwidth (e.g. 500 Mbit/s fibre plus 50 Mbit/s LTE) benefits little from load balancing: every connection assigned to the slow line is capped at its speed.
For most SMBs failover-only is the right choice. Load balancing makes sense for high-load setups with two equivalent symmetric lines and workloads that do not need sticky (e.g. backup replication, bulk downloads).
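To put numbers on the bandwidth argument for unequal lines: with per-connection distribution, each gateway receives a share of new connections proportional to its weight. The following sketch assumes simple weighted round-robin (our simplification, not a description of pf internals) and uses the 500/50 Mbit/s example from above; the function name is hypothetical.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict


def distribute(weights: Dict[str, int], connections: int) -> Counter:
    """Assign new connections to gateways in weighted round-robin order."""
    # Each gateway appears in the schedule as often as its weight
    schedule = [gw for gw, w in weights.items() for _ in range(w)]
    return Counter(islice(cycle(schedule), connections))
```

With equal weights, `distribute({"WAN1": 1, "WAN2": 1}, 1000)` puts 500 connections on the 50 Mbit/s line; even a 10:1 weighting still sends every eleventh connection over LTE.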
What happens at failover — expectation management
Even a cleanly configured failover is not an uninterrupted connection. What actually happens:
- Active TCP sessions break: all existing HTTPS, RDP and SSH sessions terminate. Browsers re-establish them on the next request and RDP clients reconnect, mostly within 5–15 seconds.
- VPN tunnels must rebuild: WireGuard is faster than IPsec (typically under 5 seconds), but there is an interruption.
- DNS caches still contain the old public IP: external clients may keep trying your own services (mail, VPN dial-in) on the WAN1 address for a while; dynamic DNS for your own services can mitigate this.
- VoIP: active calls drop (see sticky), new calls go over WAN2.
A good failover plan communicates this internally: “on WAN outage we switch automatically to LTE. There is a 15–30 second interruption. Calls may drop — dial again. ERP web client reloads.”
Monitoring and alerting
Once failover is configured, make sure it does not go unnoticed. OPNsense offers:
- Email alert on gateway switch (System → Settings → Notifications)
- Webhook alert for integration into Slack, Mattermost or MS Teams
- Zabbix/Prometheus polling via the OPNsense plugin/API
Important: once WAN1 recovers, OPNsense switches back automatically. This failback is also an event worth alerting on.
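For the webhook route, a small script triggered by your monitoring system on a gateway state change could post the message. This is a hedged sketch: the URL is a placeholder, how the script gets triggered is out of scope, and the payload shape `{"text": ...}` is what Slack- and Mattermost-style incoming webhooks accept; check your chat tool's documentation before relying on it.

```python
import json
import urllib.request
from datetime import datetime

# Hypothetical incoming-webhook URL of your chat tool
WEBHOOK_URL = "https://chat.example.com/hooks/REPLACE_ME"


def build_alert(gateway: str, old: str, new: str, when: datetime) -> dict:
    """Chat message for a gateway state change (failover and failback alike)."""
    event = "recovered" if new == "online" else "degraded"
    return {"text": f"[{event}] Gateway {gateway}: {old} -> {new} at {when.isoformat()}"}


def send_alert(payload: dict, url: str = WEBHOOK_URL, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Alerting on both directions, not just on "degraded", covers the failback event mentioned above.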
Testing — before the real case
Failover without testing is hope-based. What we always do in DATAZONE setups:
- Controlled WAN1 shutdown: pull the WAN1 cable (or switch off the ISP modem) and start a stopwatch. When does OPNsense detect the failure, and when is the first traffic routed over WAN2?
- VoIP test during failover: keep a call active during the test. Expected: the call drops; a new call over WAN2 works.
- VPN reconnect test: check the home-office VPN during failover. Does the tunnel rebuild on the new WAN IP?
- Failback test: re-enable WAN1 and check whether OPNsense switches back automatically.
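Instead of a manual stopwatch, a LAN client can log the outage while the cable is pulled. A minimal sketch: the target host is a placeholder (ideally a host you control outside your network, and not one of the monitor IPs, since those may be pinned to their gateway by a host route), and the function names are ours. Resolution is one interval plus the probe duration.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def tcp_probe(host: str = "example.com", port: int = 443, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds (placeholder target)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outages(probe: Callable[[], bool], samples: int, interval: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, float]]:
    """Probe `samples` times; return (start, duration) for each period without connectivity."""
    outages: List[Tuple[float, float]] = []
    down_since: Optional[float] = None
    for _ in range(samples):
        now = clock()
        ok = probe()
        if not ok and down_since is None:
            down_since = now  # first failed probe: outage starts
        elif ok and down_since is not None:
            outages.append((down_since, now - down_since))  # connectivity is back
            down_since = None
        sleep(interval)
    if down_since is not None:
        outages.append((down_since, clock() - down_since))  # still down at the end
    return outages
```

Started shortly before the cable is pulled, e.g. `measure_outages(tcp_probe, samples=300)`, it records both the failover gap and, if WAN1 is plugged back in during the run, any gap at failback.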
Document the results. The next time the setup is tested (in 12 months, at the maintenance appointment), you have a baseline.
Realistic recommendation for SMB
For the typical mid-market customers we advise:
- One fast, stable line (fibre, possibly SDSL as bundle) as WAN1
- LTE/5G backup as WAN2 — with a contract with sufficient data quota (on failover-heavy days the backup can consume several hundred GB)
- Failover-only setup, no load balancing
- Sticky connections on, health checks on real public IPs
- Outbound NAT per WAN cleanly configured
- Alerting to the IT distribution list
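How quickly a data quota runs out during a longer outage is simple arithmetic. The sketch below uses the 100 GB quota and the 50 Mbit/s LTE figure from the examples above and assumes decimal units (1 GB = 8000 Mbit) and a sustained average rate; the function name is ours.

```python
def hours_until_quota(quota_gb: float, avg_mbit_s: float) -> float:
    """Hours of sustained traffic at avg_mbit_s until quota_gb (decimal GB) is used up."""
    quota_megabits = quota_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 Mbit
    return quota_megabits / avg_mbit_s / 3600
```

At a sustained 50 Mbit/s, a 100 GB quota lasts roughly four and a half hours (16,000 seconds), which is why the backup contract needs generous headroom.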
This is a setup that helps in emergencies without creating problems in everyday operation. Anyone needing a more complex setup with load balancing should justify this with workload analysis — not because “multi-WAN” is on the data sheet.
DATAZONE recommendation
OPNsense multi-WAN with failover-only is standard repertoire in our firewall setups. We typically configure it with two hours of preparation and a 30-minute test — the result lasts years, as long as ISP contracts and hardware do not change.
Anyone migrating from pfSense or from an old Sophos/Fortinet solution finds the multi-WAN configuration in OPNsense well structured — the UI is clear, the logic comprehensible.
Sources and further reading
- OPNsense documentation — Multi-WAN — official docs
- OPNsense vs. pfSense — fundamental comparison
- WireGuard site-to-site VPN in 30 minutes — for site connections
Anyone who wants their multi-WAN setup configured by an OPNsense expert: please book a meeting — we set this up remotely too.