Engineering overview - Vox Systems

This section is for the internal engineering and operations team. It describes implementation details, scaling limits, failure modes, and current trade-offs. Do not treat it as client-facing product collateral.

What Vox is

Vox is a voice AI contact-center platform split into five working repos:

Repo	Plane	Responsibility
`voxcore`	Runtime/media plane	Runs live calls with Pipecat, STT, LLM, TTS, transport adapters, recording, post-call packaging, and Agent Desk handoff hooks.
`voxbridge`	Control plane	Owns auth, bots, campaigns, carriers, settings, runtime config, call records, CRM integrations, API keys, recordings, and fleet routing.
`voxui`	Operator surface	Admin dashboard and agent console for bot building, campaigns, calls, fleet, settings, Agent Desk, and knowledge bases.
`voxdialler`	Campaign execution plane	Reads campaign queues, paces outbound SIP calls, runs answering-machine detection (AMD), reserves VoxCore fleet slots, and tracks retries.
`vohci-widget`	Embeddable web surface	Browser call widget (React + LiveKit client) that opens a WebRTC call into a LiveKit room handled by VoxCore via `/livekit/widget`.

The key design choice is separation of call execution from business state. VoxCore should be able to run a call, emit results, and die without owning durable application data. VoxBridge owns durable state.
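A minimal sketch of that boundary, assuming hypothetical names (`CallResult`, `ResultSink`, `InMemoryBridge`, `run_call`) that are not taken from the Vox codebase. The runtime builds a result in memory, hands it to a sink that stands in for VoxBridge, and keeps nothing afterward.

```python
from dataclasses import dataclass, field, asdict
from typing import Protocol


@dataclass(frozen=True)
class CallResult:
    """What the runtime hands over when a call ends (hypothetical shape)."""
    call_id: str
    bot_id: str
    duration_s: float
    transcript: list[str] = field(default_factory=list)


class ResultSink(Protocol):
    """Stand-in for the control plane's ingest API (assumed, not the real interface)."""
    def store(self, result: dict) -> None: ...


class InMemoryBridge:
    """Plays the VoxBridge role here: the only place durable state lives."""
    def __init__(self) -> None:
        self.call_records: dict[str, dict] = {}

    def store(self, result: dict) -> None:
        self.call_records[result["call_id"]] = result


def run_call(call_id: str, bot_id: str, sink: ResultSink) -> None:
    """Run one call, emit the result, and return without keeping any state."""
    transcript = ["bot: hello", "caller: hi"]  # placeholder for the live pipeline
    result = CallResult(call_id, bot_id, duration_s=12.5, transcript=transcript)
    sink.store(asdict(result))
    # Nothing is retained past this point, so the worker process may exit.


bridge = InMemoryBridge()
run_call("call-001", "bot-42", bridge)
print(bridge.call_records["call-001"]["duration_s"])
```

Because the runtime function only depends on the `ResultSink` protocol, a worker crash after `store()` loses no application data in this model; a crash before it loses only the in-flight call.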

Mental model

Current state

The platform is production-shaped for controlled client deployments and runs as multiple independent multi-fleet deployments today (Aetherix, Ori, Tata AIG, CX Bridge, Pelocal, Novus, Credgenics). See Deployments.

Multiple call transports: WebSocket (iCallMate), LiveKit SIP inbound, LiveKit SIP outbound, Exotel WebSocket, and web widget.
Shared pipeline factory and shared post-call logic across transports.
One-call-per-worker VoxCore capacity model behind nginx `least_conn` and `max_conns=1`.
VoxBridge fleet selection for outbound calls; per-deployment fleets can span multiple hosts behind an HAProxy ingress.
Campaign manager and VoxDialler service.
Knowledge Base RAG, custom tool telemetry, CRM API keys, recording access tokens, and Agent Desk.
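The one-call-per-worker capacity model in the list above can be illustrated with a small simulation. This is a sketch of the selection behavior, not nginx itself: `Worker`, `pick_worker`, and the worker names are hypothetical. It assumes each worker accepts at most one active call (the effect of `max_conns=1`) and that the balancer prefers the worker with the fewest active connections (`least_conn`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    name: str
    active: int = 0      # connections currently held by this worker
    max_conns: int = 1   # one call per worker


def pick_worker(workers: list[Worker]) -> Optional[Worker]:
    """Return the least-loaded worker with spare capacity, or None if all are busy."""
    free = [w for w in workers if w.active < w.max_conns]
    if not free:
        return None  # no worker is oversubscribed; the caller must retry or reject
    chosen = min(free, key=lambda w: w.active)
    chosen.active += 1
    return chosen


fleet = [Worker("worker-a"), Worker("worker-b")]
first = pick_worker(fleet)
second = pick_worker(fleet)
third = pick_worker(fleet)  # both workers are busy at this point
print(first.name, second.name, third)  # worker-a worker-b None
```

The third request gets no worker rather than a second call on a busy one, which is the property the capacity model relies on.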

Shipped cost/quality features

Feature	What it does	Where
Live prompt caching	Gemini/Vertex `CachedContent` for the static policy/system prompt, plus OpenAI `prompt_cache_key` hints, injected before `create_llm_service()`. When active, `factory.py` passes `tools=NOT_GIVEN` and folds tools/system instruction into the cached content.	`voxcore` `pipeline/live_prompt_cache.py`, `pipeline/factory.py`
Post-call caching	Version-based explicit Gemini cache for analysis/QC prompts, gated by `post_call_cache_enabled` / `post_call_cache_version`.	`voxcore` `processors/post_call.py`
Callback scheduling	Two-stage prompt-injection + conditional extraction; double-gated by bot toggle and campaign flag.	`voxbridge` + `voxcore` post-call
TTS caching	`CachedElevenLabsTTSService` reuses generated audio via Redis when `tts.cache_config.enabled`.	`voxcore` services
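The TTS caching row can be illustrated with a cache-aside sketch. It is not the `CachedElevenLabsTTSService` implementation: the key layout, the `CachedTTS` class, and `fake_synthesize` are assumptions, and a plain dict stands in for Redis so the example runs on its own.

```python
import hashlib
from typing import Callable


def cache_key(voice_id: str, model: str, text: str) -> str:
    """Derive a stable key from the inputs assumed to determine the audio."""
    payload = "\x1f".join((voice_id, model, text.strip()))
    return "tts:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedTTS:
    """Cache-aside wrapper around a synthesis function (hypothetical class)."""

    def __init__(self, synthesize: Callable[[str, str, str], bytes], enabled: bool = True):
        self._synthesize = synthesize
        self._enabled = enabled              # mirrors the idea of a cache toggle
        self._store: dict[str, bytes] = {}   # stand-in for Redis
        self.misses = 0

    def speak(self, voice_id: str, model: str, text: str) -> bytes:
        if not self._enabled:
            return self._synthesize(voice_id, model, text)
        key = cache_key(voice_id, model, text)
        audio = self._store.get(key)
        if audio is None:
            # Cache miss: synthesize once and keep the audio for later calls.
            self.misses += 1
            audio = self._synthesize(voice_id, model, text)
            self._store[key] = audio
        return audio


def fake_synthesize(voice_id: str, model: str, text: str) -> bytes:
    """Placeholder for a real TTS request; returns deterministic bytes."""
    return f"{voice_id}|{model}|{text}".encode("utf-8")


tts = CachedTTS(fake_synthesize)
a = tts.speak("voice-1", "model-x", "Thanks for calling.")
b = tts.speak("voice-1", "model-x", "Thanks for calling.")
print(a == b, tts.misses)  # True 1
```

A cache like this pays off mainly for fixed phrases such as greetings and disclaimers; text that varies per caller will rarely produce the same key twice.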

The platform is not yet shaped for cheap 1K-channel scale:

Fleet discovery is still a configured list of URLs.
Capacity reservation is still health-poll based rather than a central atomic slot registry.
VoxDialler runs as a single primary loop process.
Observability is partial.
Dockerization is not the current production deployment path.
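The capacity-reservation limit in the list above comes down to a check-then-act race: two dispatchers can poll the same worker, both see it idle, and both send it a call. A central registry avoids this by making the check and the claim a single step. The sketch below is an assumption about what such a registry could look like (`SlotRegistry`, `reserve`, `release`, and the worker URLs are hypothetical). It uses a lock inside one process, where a multi-host deployment would need a shared store with an equivalent atomic operation.

```python
import threading
from typing import Optional


class SlotRegistry:
    """Tracks which call, if any, owns each worker slot (hypothetical design)."""

    def __init__(self, worker_urls: list[str]) -> None:
        self._lock = threading.Lock()
        self._owner: dict[str, Optional[str]] = {url: None for url in worker_urls}

    def reserve(self, call_id: str) -> Optional[str]:
        """Atomically claim a free worker; return its URL, or None if all are taken."""
        with self._lock:
            for url, owner in self._owner.items():
                if owner is None:
                    self._owner[url] = call_id
                    return url
        return None

    def release(self, url: str, call_id: str) -> bool:
        """Free a slot, but only if this call is the one that holds it."""
        with self._lock:
            if self._owner.get(url) == call_id:
                self._owner[url] = None
                return True
        return False


registry = SlotRegistry(["http://worker-a.internal", "http://worker-b.internal"])
claimed: list[Optional[str]] = []


def dispatch(call_id: str) -> None:
    claimed.append(registry.reserve(call_id))


# Five dispatchers compete for two slots at the same time.
threads = [threading.Thread(target=dispatch, args=(f"call-{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

granted = [url for url in claimed if url is not None]
print(len(granted), len(set(granted)))  # 2 2
```

Exactly two of the five dispatchers receive a worker and no worker is handed out twice, regardless of thread scheduling.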

Engineering priorities

Keep the current product behavior stable.
Document the concepts clearly enough that new engineers can debug calls without oral history.
Containerize without rewriting the runtime.
Replace static fleet routing with capacity-aware worker registration before large-scale expansion.
Build an ops runbook so bot and campaign configuration is repeatable.
