A single fleet host handles up to 16 concurrent calls. To put multiple fleet hosts behind one carrier-facing hostname for inbound WebSocket calls, an HAProxy WSS ingress sits between the carriers (iCallMate, Exotel) and the fleets. On Aetherix this is calls.vohci.com fronting both fleet.vohci.com and fleet2.vohci.com. All ingress assets live in the VoxCore repo under infra/haproxy/ and are brand-portable — nothing client-specific is baked into the templates.

Routing scope

The ingress fronts only long-lived WebSocket call routes. Short POST routes are deliberately rejected because they return before the call pipeline finishes — connection-count load balancing does not represent worker occupancy for them.
| Route | Behavior |
| --- | --- |
| /ws/{bot_id} | Proxied to a fleet (iCallMate WebSocket) |
| /exotel/{bot_id} | Proxied to a fleet (Exotel Voicebot applet) |
| /haproxy-health | Local probe, returns 200 ok |
| /attach, /livekit/dialout, /livekit/widget, /livekit/dispatch | Rejected (404); these use VoxBridge/VoxDialler app-level fleet selection via /health/fleet |
| anything else | 404 from HAProxy |
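As a quick mental model of the routing scope, the sketch below classifies a request path the way the table describes. The function name classify_route and its output labels are hypothetical; the real ACLs live in the HAProxy template and may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the routing table; not part of infra/haproxy/.
# Prints one of: proxy, local-200, reject-404.
classify_route() {
  local path="$1"
  case "$path" in
    /haproxy-health)    echo "local-200" ;;   # answered by HAProxy itself
    /ws/?*|/exotel/?*)  echo "proxy" ;;       # long-lived WebSocket call routes
    *)                  echo "reject-404" ;;  # /attach, /livekit/*, anything else
  esac
}

classify_route /ws/bot123
classify_route /livekit/dialout
```

The short POST routes fall through to the default branch on purpose: they are selected at the application level, not by the ingress.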

Capacity-aware health check

leastconn plus per-server maxconn 16 mirrors the per-host worker count, but maxconn only counts sessions HAProxy itself routed. Direct fleet-URL traffic, /attach, and /livekit/* bypass the ingress, so real worker occupancy can exceed HAProxy’s count. To avoid routing into a saturated fleet, the health check inspects the /health/fleet JSON body:
http-check expect rstring "\"fleet_available\":([1-9]|1[0-6])[,}]"
When a fleet’s fleet_available reaches 0 the body stops matching, HAProxy marks that backend DOWN regardless of its own session count, and leastconn routes only to fleets with real capacity left.
Pattern anatomy and footguns (PCRE2):
  • The leading " anchors the key without requiring a preceding comma. A pattern that began with a comma would stop matching on the day fleet_available is serialized as the first key in its object (the preceding character is then {, not ,). Because the fleets are identical, both would be marked DOWN at once. The quote alone already rejects decoys like prev_fleet_available / max_fleet_available.
  • Trailing [,}] terminates the value so it matches the whole number.
  • Range 1..16 is intentionally coupled to maxconn 16. If you scale a fleet past 16 workers, fleet_available exceeds 16, the pattern stops matching, and the healthy higher-capacity fleet is marked DOWN — during the exact “add capacity” operation. When you raise maxconn, raise this upper bound in the same change (template + README) and re-validate.
  • haproxy -c only validates syntax — it cannot confirm the expression matches the live body. A wrong anchor validates clean, then black-holes the ingress at runtime. Always validate against a captured body before reload:
curl -sk https://fleet.vohci.com/health/fleet \
  | grep -Pq '"fleet_available":([1-9]|1[0-6])[,}]' && echo OK || echo FAIL
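The footguns above can be exercised offline before touching a live fleet. This is a minimal sketch that runs the same expression against hand-written bodies; the sample JSON (including the workers field) is invented for illustration, and probe_verdict is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Same expression as the health check, in the grep -P form used above.
PATTERN='"fleet_available":([1-9]|1[0-6])[,}]'

# Prints UP if a body would satisfy the check, DOWN otherwise.
probe_verdict() {
  if printf '%s' "$1" | grep -Pq "$PATTERN"; then echo UP; else echo DOWN; fi
}

# Sample bodies are assumptions, not captured output.
probe_verdict '{"fleet_available":3,"workers":16}'             # UP: capacity left, first key
probe_verdict '{"workers":16,"fleet_available":16}'            # UP: last key, closed by }
probe_verdict '{"fleet_available":0,"workers":16}'             # DOWN: saturated
probe_verdict '{"fleet_available":17,"workers":32}'            # DOWN: above the coupled bound
probe_verdict '{"max_fleet_available":16,"fleet_available":0}' # DOWN: decoy key ignored
probe_verdict '{"fleet_available": 3}'                         # DOWN: space after the colon
```

The last case shows that the expression assumes compact serialization with no whitespace after the colon, which is one more reason to validate against a captured body.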

Check timing and flapping tradeoff

Fleet server lines use check inter 1s fall 2 rise 2: a fleet must fail two consecutive 1s probes (~2s) before ejection, and pass two before returning. This is not hair-trigger (fall 1) on purpose:
  • With only two fleets, ejecting on a single transient slow /health/fleet response dumps all carrier load onto the one survivor — manufacturing the exact saturation the check was meant to prevent. Fast ejection + low fleet count produces flapping cascades. fall 2 smooths single-probe blips.
  • The cost: a ~2s window where a newly-saturated fleet can still receive a session or two before ejection. Those land at fleet nginx and return 429, which the carrier sees as-is.
  • With a third fleet, fall 1 becomes safer because a single survivor is no longer the failure mode — revisit then.
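To see why fall 2 absorbs a single slow probe while fall 1 does not, the sketch below replays a sequence of probe results through a simplified fall/rise counter. It is a toy model written for this page, not HAProxy's actual state machine, and simulate_checks is a hypothetical name.

```shell
#!/usr/bin/env bash
# simulate_checks FALL RISE RESULT...
# Each RESULT is p (probe passed) or f (probe failed).
# Prints the server state after every probe, space-separated.
simulate_checks() {
  local fall="$1" rise="$2"; shift 2
  local state=UP fails=0 passes=0 out="" r
  for r in "$@"; do
    if [ "$r" = p ]; then
      fails=0
      if [ "$state" = DOWN ]; then
        passes=$((passes + 1))
        if [ "$passes" -ge "$rise" ]; then state=UP; passes=0; fi
      fi
    else
      passes=0
      if [ "$state" = UP ]; then
        fails=$((fails + 1))
        if [ "$fails" -ge "$fall" ]; then state=DOWN; fails=0; fi
      fi
    fi
    out="${out:+$out }$state"
  done
  echo "$out"
}

# One isolated blip, then a real two-probe outage, then recovery.
simulate_checks 2 2 p f p f f p p   # prints: UP UP UP UP DOWN DOWN UP
simulate_checks 1 2 p f p f f p p   # prints: UP DOWN DOWN DOWN DOWN DOWN UP
```

With fall 1 the isolated blip at the second probe ejects the fleet immediately, and the later failures keep resetting its recovery count, so it stays out for most of the sequence.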

Retries and 429

option redispatch + retry-on conn-failure empty-response response-timeout retries a different fleet on connection-level failures.
HAProxy 2.8 retry-on accepts only the HTTP status codes 401, 403, 404, 408, 425, 500, 501, 502, 503, and 504. 429 is not in that list, so it cannot be retried and is passed through to the carrier as-is. Rewriting 429 → 503 to force a retry would misreport the real status. If 429 churn becomes operationally significant after the health-check tightening above, add another fleet.
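The status-code constraint can be written down as a small lookup. This is only an illustration of the list quoted above; retry_on_accepts is a hypothetical helper, and a code being accepted by the keyword does not mean this deployment lists it in retry-on.

```shell
#!/usr/bin/env bash
# Status codes the text above lists as valid retry-on arguments in HAProxy 2.8.
RETRY_ON_STATUSES="401 403 404 408 425 500 501 502 503 504"

# Prints "yes" if the status code could be named in retry-on, "no" otherwise.
retry_on_accepts() {
  local code="$1" s
  for s in $RETRY_ON_STATUSES; do
    if [ "$s" = "$code" ]; then echo yes; return 0; fi
  done
  echo no
}

retry_on_accepts 503   # yes
retry_on_accepts 429   # no: passed through to the carrier
```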

TLS and cert renewal

The :443 bind asserts a TLS floor of TLSv1.2 explicitly rather than relying on the OS OpenSSL policy. Cipher selection is left to the OpenSSL default to avoid rejecting a carrier’s TLS stack. Cert renewal uses HTTP-01 via HAProxy — no downtime:
  1. Cron triggers certbot renew.
  2. Certbot starts a temporary listener on 127.0.0.1:8888.
  3. HAProxy’s :80 frontend routes /.well-known/acme-challenge/* to that listener.
  4. Let’s Encrypt validates, certbot writes the new cert.
  5. The deploy hook (renewal-hook.sh) atomically rebuilds the combined /etc/haproxy/certs/<domain>.pem and reloads HAProxy.
This requires authenticator = standalone and http01_port = 8888 in /etc/letsencrypt/renewal/<domain>.conf. Verify with certbot renew --dry-run.
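The atomic rebuild in step 5 can be sketched as a write-to-temp-then-rename. This is not the contents of renewal-hook.sh, which are not shown here; build_combined_pem is a hypothetical function, and it assumes certbot's usual fullchain.pem and privkey.pem file names in the lineage directory.

```shell
#!/usr/bin/env bash
# build_combined_pem LIVE_DIR OUT_FILE
# Concatenates certificate chain and private key into the single PEM file
# HAProxy loads. The temp file is created next to OUT_FILE so the final
# rename stays on one filesystem and readers never see a partial bundle.
build_combined_pem() {
  local live_dir="$1" out_file="$2" tmp
  tmp="$(mktemp "${out_file}.XXXXXX")" || return 1
  if ! cat "$live_dir/fullchain.pem" "$live_dir/privkey.pem" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 600 "$tmp"          # the bundle contains the private key
  mv -f "$tmp" "$out_file"
}

# Assumed usage inside a certbot deploy hook, where certbot exports
# RENEWED_LINEAGE; the reload command is an assumption about the host:
#   build_combined_pem "$RENEWED_LINEAGE" /etc/haproxy/certs/<domain>.pem \
#     && systemctl reload haproxy
```

Reloading only after a successful rename means a failed renewal leaves the previous bundle in place.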

Tooling

| Script | Purpose |
| --- | --- |
| render.sh <domain> [fleet-file] [out-dir] | Renders the brand-agnostic templates for one deployment, substituting __INGRESS_DOMAIN__ and __FLEET_SERVERS__ (from fleet-servers.txt). Writes haproxy.cfg + the renewal hook, guards against surviving placeholders, and runs a structural haproxy -c (cert + DNS excluded, validated on the target host at install). Safe to run on CI/laptop. |
| add-fleet.sh [--dry-run] <hostname> | Adds an inbound fleet to the live ingress: pre-checks (root, DNS, /health/fleet), clones the last server line so the new fleet inherits the exact flags, backs up, validates, reloads zero-downtime, and waits for the backend to read UP, rolling back automatically on any failure. |
Do not hand-edit server lines in /etc/haproxy/haproxy.cfg — use add-fleet.sh. It exists precisely so ops don’t have to understand HAProxy syntax, SNI, or the health-check coupling.
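For readers curious what "substituting placeholders and guarding against survivors" amounts to, here is a minimal sketch of the idea. It is not render.sh itself; render_template is a hypothetical function, and it assumes __FLEET_SERVERS__ sits on its own line in the template.

```shell
#!/usr/bin/env bash
# render_template TEMPLATE DOMAIN FLEET_FILE
# Prints the rendered config on stdout. Fails if any __PLACEHOLDER__ survives.
render_template() {
  local template="$1" domain="$2" fleet_file="$3" rendered
  rendered="$(awk -v domain="$domain" -v fleets="$fleet_file" '
    /__FLEET_SERVERS__/ {
      # Replace the placeholder line with the fleet server lines.
      while ((getline line < fleets) > 0) print line
      close(fleets)
      next
    }
    { gsub(/__INGRESS_DOMAIN__/, domain); print }
  ' "$template")" || return 1

  # Guard: refuse to emit a config that still contains a placeholder.
  if printf '%s\n' "$rendered" | grep -Eq '__[A-Z_]+__'; then
    echo "unrendered placeholder left in output" >&2
    return 1
  fi
  printf '%s\n' "$rendered"
}
```

The guard matters because a leftover placeholder can still be a syntactically acceptable token in some positions, so a syntax check alone would not always catch it.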

Add a fleet — two paths

Adding a fleet touches two independent paths. Do both.
  1. Provision and deploy
     Deploy VoxCore on the new host (standard playbook). Confirm https://<new-host>/health/fleet responds.
  2. Path 1, inbound WSS: add-fleet.sh
     On the ingress host, run ./add-fleet.sh fleet3.vohci.com (or --dry-run to preview). The script clones, validates, reloads, and rolls back on failure. Keep fleet-servers.txt in the repo in sync (append the same line) so a future re-render matches the box; the script edits the live config only.
  3. Path 2, outbound: VoxBridge
     Add https://<new-host> to VoxBridge voxcore_fleet (VoxUI → Settings → Fleet). pick_fleet_server() picks it up on the next dialout, with no restart and no code change. The ingress is WSS-only, so this step is what makes the fleet usable for outbound calls.
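The "clone the last server line" step on the inbound path can be pictured as a pair of string substitutions. This is a sketch of the idea only, not the logic inside add-fleet.sh; clone_server_line is a hypothetical function and the sample server line is invented, with only the check and maxconn flags taken from this page.

```shell
#!/usr/bin/env bash
# clone_server_line LINE OLD_NAME OLD_HOST NEW_NAME NEW_HOST
# Copies an existing server line so the new fleet inherits every flag,
# changing only the server name and the hostname.
clone_server_line() {
  local line="$1" old_name="$2" old_host="$3" new_name="$4" new_host="$5"
  # Swap the full hostname first (it may appear more than once, e.g. in SNI).
  line="${line//"$old_host"/$new_host}"
  # Then swap the server name, matched as a whole token after "server".
  line="${line//"server $old_name "/server $new_name }"
  printf '%s\n' "$line"
}

# Invented example line; the real one comes from the live haproxy.cfg.
clone_server_line \
  'server fleet2 fleet2.vohci.com:443 check inter 1s fall 2 rise 2 maxconn 16' \
  fleet2 fleet2.vohci.com fleet3 fleet3.vohci.com
```

Cloning instead of retyping is what keeps the maxconn and health-check flags coupled across fleets.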

Sizing and single point of failure

| Fleets | Concurrent calls | Ingress sizing |
| --- | --- | --- |
| 1 | 16 | n/a (no ingress) |
| 2 | 32 | 4 vCPU / 8 GB single VM |
| 3 | 48 | 4 vCPU / 8 GB single VM |
| 4-5 | 64-80 | 8 vCPU; monitor TLS handshake CPU |
| 6+ | 96+ | second ingress + managed LB or VRRP |
The ingress is a single point of failure until the HA phase. Plan the second ingress before crossing ~3 fleets.
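The sizing rows follow from 16 concurrent calls per fleet. A small sketch of that arithmetic, with hypothetical helper names, makes it easy to answer "how many fleets for N calls" during planning:

```shell
#!/usr/bin/env bash
CALLS_PER_FLEET=16   # per-host worker count stated at the top of this page

# fleet_capacity FLEETS: total concurrent calls across FLEETS hosts
fleet_capacity() { echo $(( $1 * CALLS_PER_FLEET )); }

# fleets_needed PEAK_CALLS: smallest fleet count covering the peak (ceiling division)
fleets_needed() { echo $(( ($1 + CALLS_PER_FLEET - 1) / CALLS_PER_FLEET )); }

# ingress_tier FLEETS: the sizing row that applies
ingress_tier() {
  local fleets="$1"
  if   [ "$fleets" -le 1 ]; then echo "n/a (no ingress)"
  elif [ "$fleets" -le 3 ]; then echo "4 vCPU / 8 GB single VM"
  elif [ "$fleets" -le 5 ]; then echo "8 vCPU; monitor TLS handshake CPU"
  else                           echo "second ingress + managed LB or VRRP"
  fi
}

fleet_capacity 3    # 48
fleets_needed 49    # 4
ingress_tier 4      # 8 vCPU; monitor TLS handshake CPU
```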

Operations

# Backend status
echo "show stat" | socat - /run/haproxy/admin.sock | \
    awk -F, '$1=="voxcore_wss_fleet" && $2 ~ /^fleet/ {print $2, $18, "scur="$5, "slim="$7}'

# Drain one backend (no new sessions, existing keep going)
echo "set server voxcore_wss_fleet/fleet1 state drain" | socat - /run/haproxy/admin.sock

# Restore
echo "set server voxcore_wss_fleet/fleet1 state ready" | socat - /run/haproxy/admin.sock

# Stats UI (SSH tunnel from workstation)
ssh -L 8404:127.0.0.1:8404 <ingress-host>   # then open http://localhost:8404/stats
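
After draining a backend, it is usually worth confirming that its current session count has reached zero before stopping the host. The helpers below are a sketch with hypothetical names; they read "show stat" CSV on stdin and rely only on the scur (field 5) and status (field 18) columns used above.

```shell
#!/usr/bin/env bash
# server_field BACKEND SERVER FIELD_NO
# Reads "show stat" CSV on stdin and prints one field of the matching row.
server_field() {
  awk -F, -v px="$1" -v sv="$2" -v n="$3" '$1 == px && $2 == sv { print $n }'
}

# is_drained BACKEND SERVER
# Succeeds when the server's current session count (scur) is 0.
is_drained() {
  local scur
  scur="$(server_field "$1" "$2" 5)"
  [ "$scur" = "0" ]
}

# Assumed usage on the ingress host:
#   echo "show stat" | socat - /run/haproxy/admin.sock \
#     | is_drained voxcore_wss_fleet fleet1 && echo "fleet1 is idle"
```

Calls are long-lived WebSocket sessions, so a drained server can take as long as its longest in-progress call to reach zero.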