This section is for the internal engineering and operations team. It describes implementation details, scaling limits, failure modes, and current trade-offs. Do not treat it as client-facing product collateral.

What Vox is

Vox is a voice AI contact-center platform split into five working repos:

| Repo | Plane | Responsibility |
| --- | --- | --- |
| voxcore | Runtime/media plane | Runs live calls with Pipecat, STT, LLM, TTS, transport adapters, recording, post-call packaging, and Agent Desk handoff hooks. |
| voxbridge | Control plane | Owns auth, bots, campaigns, carriers, settings, runtime config, call records, CRM integrations, API keys, recordings, and fleet routing. |
| voxui | Operator surface | Admin dashboard and agent console for bot building, campaigns, calls, fleet, settings, Agent Desk, and knowledge bases. |
| voxdialler | Campaign execution plane | Reads campaign queues, paces outbound SIP calls, runs answering machine detection (AMD) screening, reserves VoxCore fleet slots, and tracks retries. |
| vohci-widget | Embeddable web surface | Browser call widget (React + LiveKit client) that opens a WebRTC call into a LiveKit room handled by VoxCore via /livekit/widget. |

The key design choice is separation of call execution from business state. VoxCore should be able to run a call, emit results, and die without owning durable application data. VoxBridge owns durable state.
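
One way to picture this split: the runtime holds call state only in memory and hands a self-contained result payload to the control plane when the call ends. The sketch below is an illustration only. CallResult, its fields, and the sink callable are hypothetical names and do not describe the actual VoxCore-to-VoxBridge contract.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass(frozen=True)
class CallResult:
    """Hypothetical post-call payload; the field names are assumptions."""
    call_id: str
    bot_id: str
    duration_s: float
    outcome: str
    transcript: list[str] = field(default_factory=list)


def finish_call(result: CallResult, sink: Callable[[str], None]) -> None:
    """Serialize the result and hand it off; the runtime keeps no copy."""
    sink(json.dumps(asdict(result), sort_keys=True))
```

In this picture the sink would be whatever delivers the payload to the control plane. If the worker process exits right after finish_call returns, nothing durable is lost on the runtime side.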

Mental model

Current state

The platform is production-shaped for controlled client deployments. Today it runs as multiple independent deployments, each with its own fleets (Aetherix, Ori, Tata AIG, CX Bridge, Pelocal, Novus, Credgenics). See Deployments. What exists today:
  • Multiple call transports: WebSocket (iCallMate), LiveKit SIP inbound, LiveKit SIP outbound, Exotel WebSocket, and web widget.
  • Shared pipeline factory and shared post-call logic across transports.
  • One-call-per-worker VoxCore capacity model behind nginx least_conn and max_conns=1.
  • VoxBridge fleet selection for outbound calls; per-deployment fleets can span multiple hosts behind an HAProxy ingress.
  • Campaign manager and VoxDialler service.
  • Knowledge Base RAG, custom tool telemetry, CRM API keys, recording access tokens, and Agent Desk.
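
The one-call-per-worker capacity model can be illustrated with a small simulation. This is a sketch of how least-connections selection behaves when every backend is capped at one connection, not the nginx implementation. Worker, pick_worker, and release are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    """Hypothetical stand-in for one VoxCore worker process."""
    name: str
    active: int = 0      # connections currently held
    max_conns: int = 1   # mirrors the nginx max_conns=1 setting


def pick_worker(workers: list[Worker]) -> Optional[Worker]:
    """Pick the least-loaded worker that still has headroom, else None."""
    free = [w for w in workers if w.active < w.max_conns]
    if not free:
        return None  # every worker is busy, so the call cannot be placed
    chosen = min(free, key=lambda w: w.active)
    chosen.active += 1
    return chosen


def release(worker: Worker) -> None:
    """Free the slot when the call ends."""
    worker.active = max(0, worker.active - 1)
```

Under this model a fleet of N workers carries at most N concurrent calls, which is why fleet size and channel count are the same number in capacity planning.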

Shipped cost/quality features

| Feature | What it does | Where |
| --- | --- | --- |
| Live prompt caching | Gemini/Vertex CachedContent for the static policy/system prompt, plus OpenAI prompt_cache_key hints, injected before create_llm_service(). When active, factory.py passes tools=NOT_GIVEN and folds tools/system instruction into the cached content. | voxcore pipeline/live_prompt_cache.py, pipeline/factory.py |
| Post-call caching | Version-based explicit Gemini cache for analysis/QC prompts, gated by post_call_cache_enabled / post_call_cache_version. | voxcore processors/post_call.py |
| Callback scheduling | Two-stage prompt-injection + conditional extraction; double-gated by bot toggle and campaign flag. | voxbridge + voxcore post-call |
| TTS caching | CachedElevenLabsTTSService reuses generated audio via Redis when tts.cache_config.enabled. | voxcore services |

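
The TTS caching feature can be pictured as a read-through cache keyed on everything that affects the generated audio. A minimal sketch, assuming a key built from voice, model, and text: the function names and key layout are hypothetical and are not taken from CachedElevenLabsTTSService, and a plain dict stands in for Redis.

```python
import hashlib
from typing import Callable, MutableMapping


def tts_cache_key(voice_id: str, model: str, text: str) -> str:
    """Build a deterministic key; this layout is an assumption."""
    raw = f"{voice_id}\x00{model}\x00{text}".encode("utf-8")
    return f"tts:{hashlib.sha256(raw).hexdigest()}"


def synthesize_cached(
    cache: MutableMapping[str, bytes],
    synthesize: Callable[[str], bytes],
    voice_id: str,
    model: str,
    text: str,
    enabled: bool = True,
) -> bytes:
    """Return cached audio when present, otherwise synthesize and store it."""
    if not enabled:  # mirrors the tts.cache_config.enabled gate
        return synthesize(text)
    key = tts_cache_key(voice_id, model, text)
    audio = cache.get(key)
    if audio is None:
        audio = synthesize(text)  # cache miss: pay for synthesis once
        cache[key] = audio
    return audio
```

Including the voice and model in the key keeps a voice change from serving stale audio for the same text.
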
The platform is not yet shaped for cost-efficient operation at 1K-channel scale:
  • Fleet discovery is still a configured list of URLs.
  • Capacity reservation is still health-poll based rather than a central atomic slot registry.
  • VoxDialler runs as a single primary loop process.
  • Observability is partial.
  • Dockerization is not the current production deployment path.
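
The gap between health-poll reservation and an atomic slot registry is a check-then-act race: two requests can both observe the same worker as idle before either call lands on it. Below is a minimal sketch of the atomic alternative, assuming an in-process lock as a stand-in for whatever shared store a real registry would use. SlotRegistry and its methods are hypothetical names, not existing Vox code.

```python
import threading
from typing import Optional


class SlotRegistry:
    """Hypothetical central registry with one slot per worker URL."""

    def __init__(self, worker_urls: list[str]) -> None:
        self._lock = threading.Lock()
        self._owner: dict[str, Optional[str]] = {url: None for url in worker_urls}

    def reserve(self, call_id: str) -> Optional[str]:
        """Atomically claim a free worker for call_id, or return None."""
        with self._lock:  # the check and the claim happen as one step
            for url, owner in self._owner.items():
                if owner is None:
                    self._owner[url] = call_id
                    return url
        return None

    def release(self, url: str, call_id: str) -> bool:
        """Free the slot only if call_id still owns it."""
        with self._lock:
            if self._owner.get(url) == call_id:
                self._owner[url] = None
                return True
        return False
```

With this shape, any number of concurrent reservers contending for five workers yields exactly five successful reservations, each on a distinct worker. A multi-host deployment would need the same check-and-claim step to be atomic in a shared store rather than behind a process-local lock.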

Engineering priorities

  1. Keep the current product behavior stable.
  2. Document the concepts clearly enough that new engineers can debug calls without oral history.
  3. Containerize without rewriting the runtime.
  4. Replace static fleet routing with capacity-aware worker registration before large-scale expansion.
  5. Build an ops runbook so bot and campaign configuration is repeatable.