This section is for the internal engineering and operations team. It describes implementation details, scaling limits, failure modes, and current trade-offs. Do not treat it as client-facing product collateral.
What Vox is
Vox is a voice AI contact-center platform split into five working repos:
| Repo | Plane | Responsibility |
|---|---|---|
| voxcore | Runtime/media plane | Runs live calls with Pipecat, STT, LLM, TTS, transport adapters, recording, post-call packaging, and Agent Desk handoff hooks. |
| voxbridge | Control plane | Owns auth, bots, campaigns, carriers, settings, runtime config, call records, CRM integrations, API keys, recordings, and fleet routing. |
| voxui | Operator surface | Admin dashboard and agent console for bot building, campaigns, calls, fleet, settings, Agent Desk, and knowledge bases. |
| voxdialler | Campaign execution plane | Reads campaign queues, paces outbound SIP calls, screens AMD, reserves VoxCore fleet slots, and tracks retries. |
| vohci-widget | Embeddable web surface | Browser call widget (React + LiveKit client) that opens a WebRTC call into a LiveKit room handled by VoxCore via /livekit/widget. |
The key design choice is separation of call execution from business state. VoxCore should be able to run a call, emit results, and die without owning durable application data. VoxBridge owns durable state.
Mental model
Current state
The platform is production-shaped for controlled client deployments and runs as multiple independent multi-fleet deployments today (Aetherix, Ori, Tata AIG, CX Bridge, Pelocal, Novus, Credgenics). See Deployments. The following are in place:
- Multiple call transports: WebSocket (iCallMate), LiveKit SIP inbound, LiveKit SIP outbound, Exotel WebSocket, and web widget.
- Shared pipeline factory and shared post-call logic across transports.
- One-call-per-worker VoxCore capacity model behind nginx least_conn and max_conns=1.
- VoxBridge fleet selection for outbound calls; per-deployment fleets can span multiple hosts behind an HAProxy ingress.
- Campaign manager and VoxDialler service.
- Knowledge Base RAG, custom tool telemetry, CRM API keys, recording access tokens, and Agent Desk.
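The one-call-per-worker model in the list above can be illustrated with a small simulation. This is a minimal sketch of least-connections selection with a per-worker cap of one, written in Python for debugging intuition only; it is not the nginx implementation, and the names Worker, pick_worker, and release are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    name: str
    active: int = 0      # connections currently proxied to this worker
    max_conns: int = 1   # one call per worker, mirroring max_conns=1


def pick_worker(workers: list[Worker]) -> Optional[Worker]:
    """Least-connections choice among workers that still have headroom."""
    free = [w for w in workers if w.active < w.max_conns]
    if not free:
        return None  # every worker is busy; the caller must reject or retry
    chosen = min(free, key=lambda w: w.active)
    chosen.active += 1
    return chosen


def release(worker: Worker) -> None:
    """Call ended: free the worker's single slot."""
    worker.active = max(0, worker.active - 1)


if __name__ == "__main__":
    fleet = [Worker("worker-1"), Worker("worker-2")]
    first = pick_worker(fleet)
    second = pick_worker(fleet)
    third = pick_worker(fleet)  # None: both workers already hold a call
    print(first, second, third)
```

With a cap of one, the least-connections rule reduces to "pick any idle worker", so fleet capacity equals the number of workers and a busy fleet refuses new calls rather than queueing them on a worker that is mid-call.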
Shipped cost/quality features
| Feature | What it does | Where |
|---|---|---|
| Live prompt caching | Gemini/Vertex CachedContent for the static policy/system prompt, plus OpenAI prompt_cache_key hints, injected before create_llm_service(). When active, factory.py passes tools=NOT_GIVEN and folds tools/system instruction into the cached content. | voxcore pipeline/live_prompt_cache.py, pipeline/factory.py |
| Post-call caching | Version-based explicit Gemini cache for analysis/QC prompts, gated by post_call_cache_enabled / post_call_cache_version. | voxcore processors/post_call.py |
| Callback scheduling | Two-stage prompt-injection + conditional extraction; double-gated by bot toggle and campaign flag. | voxbridge + voxcore post-call |
| TTS caching | CachedElevenLabsTTSService reuses generated audio via Redis when tts.cache_config.enabled. | voxcore services |
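The TTS caching row in the table above depends on mapping identical synthesis requests to the same cache entry. The sketch below shows one way such a key could be derived and used. It is an assumption for illustration, not the key scheme of CachedElevenLabsTTSService: the functions tts_cache_key and synthesize_cached are hypothetical, and a plain dict stands in for Redis.

```python
import hashlib
import json
from typing import Callable, MutableMapping


def tts_cache_key(text: str, voice_id: str, model: str, settings: dict) -> str:
    """Derive a deterministic key from everything that changes the audio."""
    payload = json.dumps(
        {"text": text.strip(), "voice": voice_id, "model": model, "settings": settings},
        sort_keys=True,        # stable ordering so equal inputs hash equally
        ensure_ascii=False,
    )
    return "tts:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def synthesize_cached(
    text: str,
    voice_id: str,
    model: str,
    settings: dict,
    store: MutableMapping[str, bytes],      # stand-in for a Redis client
    synthesize: Callable[[str], bytes],     # stand-in for the real TTS call
) -> bytes:
    """Return cached audio when present; otherwise synthesize and store it."""
    key = tts_cache_key(text, voice_id, model, settings)
    audio = store.get(key)
    if audio is None:
        audio = synthesize(text)
        store[key] = audio
    return audio
```

Including the voice, model, and voice settings in the key matters: if only the text were hashed, a bot that switches voices would replay audio generated for the previous voice.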
The platform is not yet shaped for cheap 1K-channel scale:
- Fleet discovery is still a configured list of URLs.
- Capacity reservation is still health-poll based rather than a central atomic slot registry.
- VoxDialler is one primary loop process.
- Observability is partial.
- Dockerization is not the current production deployment path.
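To make the gap between health-poll reservation and a central atomic slot registry concrete, here is a minimal in-process sketch. It assumes a single registry guarded by a lock, with a TTL so that slots held by crashed callers are reclaimed. SlotRegistry and its methods are hypothetical names, not existing VoxBridge code, and a production version would need a shared store rather than process memory.

```python
import threading
import time
from typing import Optional


class SlotRegistry:
    """Hands out each worker slot to at most one call at a time."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self._lock = threading.Lock()
        self._ttl = ttl_seconds
        self._free: set[str] = set()
        self._held: dict[str, tuple[str, float]] = {}  # worker -> (call_id, expiry)

    def register(self, worker_url: str) -> None:
        """Worker announces itself as idle."""
        with self._lock:
            if worker_url not in self._held:
                self._free.add(worker_url)

    def reserve(self, call_id: str, now: Optional[float] = None) -> Optional[str]:
        """Atomically claim one idle worker, or return None if none is free."""
        now = time.monotonic() if now is None else now
        with self._lock:
            # Reclaim reservations whose holder never released them.
            for worker, (_, expiry) in list(self._held.items()):
                if expiry <= now:
                    del self._held[worker]
                    self._free.add(worker)
            if not self._free:
                return None
            worker = self._free.pop()
            self._held[worker] = (call_id, now + self._ttl)
            return worker

    def release(self, worker_url: str, call_id: str) -> bool:
        """Free a slot; only the call that holds it may release it."""
        with self._lock:
            held = self._held.get(worker_url)
            if held is None or held[0] != call_id:
                return False
            del self._held[worker_url]
            self._free.add(worker_url)
            return True
```

The check and the claim happen under one lock, so two dialler threads cannot both see the same worker as idle. A health poll separates those two steps, which is where double-booking can occur under load.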
Engineering priorities
- Keep the current product behavior stable.
- Document the concepts clearly enough that new engineers can debug calls without oral history.
- Containerize without rewriting the runtime.
- Replace static fleet routing with capacity-aware worker registration before large-scale expansion.
- Build an ops runbook so bot and campaign configuration is repeatable.