Every call runs through the same pipeline regardless of transport. The pipeline is built by a factory function that assembles Pipecat processors based on the bot configuration.

Pipeline structure

Audio flows left to right. The Audio Buffer sits after transport output so it captures exactly what was played to the customer.
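As a rough illustration of the factory idea, the sketch below builds an ordered list of stage labels from a bot config. The function name `build_stage_order`, the stage labels, the provider strings, and the `record` flag are all hypothetical. The only ordering taken from the text is that the audio buffer follows transport output; the rest assumes a conventional STT, LLM, TTS layout.

```python
from typing import Any


def build_stage_order(bot_config: dict[str, Any]) -> list[str]:
    """Return stage labels in the order audio would flow through them."""
    # Fall back to the documented default providers when a section is missing.
    stt = bot_config.get("stt", {}).get("provider", "deepgram")
    llm = bot_config.get("llm", {}).get("provider", "google")
    tts = bot_config.get("tts", {}).get("provider", "elevenlabs")

    stages = [
        "transport_input",
        f"stt:{stt}",
        f"llm:{llm}",
        f"tts:{tts}",
        "transport_output",
    ]
    # Hypothetical flag: the buffer goes last so it sees what was actually played.
    if bot_config.get("record", True):
        stages.append("audio_buffer")
    return stages
```

A real factory would return processor objects rather than strings, but the ordering logic would look similar.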

Services

STT (speech-to-text)

| Provider | Model | Notes |
| --- | --- | --- |
| Deepgram | Nova-3 | Default. Real-time streaming. |
| Soniox | stt-rt-v4 | Alternative provider. |

Configured via stt.provider and stt.model in bot config. Extra options (endpointing, smart formatting, language hints) pass through stt.extra.
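For orientation, here is what an STT section might look like when the bot config is held as a Python dict. The nesting mirrors the dotted names stt.provider, stt.model, and stt.extra. The exact provider and model strings and every key inside `extra` are assumptions for illustration, not documented option names.

```python
from typing import Any

# Hypothetical bot config fragment; only the stt.provider / stt.model / stt.extra
# layout comes from the text. Key names inside "extra" are illustrative.
bot_config: dict[str, Any] = {
    "stt": {
        "provider": "deepgram",
        "model": "nova-3",
        "extra": {
            "endpointing": 300,
            "smart_format": True,
            "language": "en",
        },
    }
}


def stt_settings(config: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Split the STT section into provider, model, and pass-through options."""
    stt = config.get("stt", {})
    return (
        stt.get("provider", "deepgram"),
        stt.get("model", "nova-3"),
        dict(stt.get("extra", {})),
    )
```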

LLM (language model)

| Provider | Example models | Notes |
| --- | --- | --- |
| Google (Gemini) | gemini-2.5-flash | Default. Via Google Generative AI API. |
| OpenAI | gpt-4.1-mini | Via OpenAI API. |
| Google Vertex AI | gemini-2.5-flash | Requires project_id in llm.extra. |

The LLM receives the system prompt and conversation context. It can call built-in functions (end_call, transfer_call, detected_voicemail) and custom tools defined in the bot config.

TTS (text-to-speech)

| Provider | Example models | Notes |
| --- | --- | --- |
| ElevenLabs | eleven_flash_v2_5 | Default. Multilingual models require language code. |
| Sarvam | bulbul:v2 | Indian language support. |
| Tarang | | Custom HTTP-based TTS. |

VAD and turn detection

Voice Activity Detection (VAD): Silero ONNX model detects when the customer is speaking. Configurable via vad in bot config (confidence threshold, start/stop timing, minimum volume).

Turn detection: SmartTurnV3 ONNX model determines when the customer has finished their turn, triggering the LLM response.

Interruptions: Enabled by default. When the customer speaks over the bot, TTS output is cancelled and the LLM processes the new input. min_words_interruption (default: 3) prevents accidental interruptions from short utterances.
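The word-count gate on interruptions can be pictured as a simple check on the interim transcript. This is a minimal sketch that assumes the threshold is inclusive (an utterance of exactly min_words_interruption words counts) and that words are split on whitespace; the function name `should_interrupt` is hypothetical.

```python
def should_interrupt(transcript: str, min_words_interruption: int = 3) -> bool:
    """Return True when the customer's speech is long enough to cancel TTS."""
    # Whitespace split is a simplification; real tokenization may differ.
    word_count = len(transcript.split())
    return word_count >= min_words_interruption
```

Under these assumptions a backchannel such as "uh huh" leaves the bot speaking, while "wait, I have a question" cancels playback.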

Built-in functions

The LLM has access to three built-in functions:

| Function | Behavior |
| --- | --- |
| end_call() | Bot speaks any final message, then hangs up. Sets disconnected_by = "bot". |
| transfer_call(reason) | Speaks pre-transfer message, then transfers to configured number. |
| detected_voicemail() | Speaks voicemail message (if configured), then hangs up. Sets disconnected_by = "voicemail". |

Custom tools defined in bot config are also registered. Tool calls and their results are logged in call events.
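One way to picture registration and logging together is a single dispatch function that looks up a handler by name, runs it, and appends an event. The function `dispatch_tool_call`, the handler registry, and the event shape are all assumptions made for this sketch, not the actual call-event schema.

```python
from typing import Any, Callable

ToolHandler = Callable[..., Any]


def dispatch_tool_call(
    name: str,
    args: dict[str, Any],
    tools: dict[str, ToolHandler],
    events: list[dict[str, Any]],
) -> Any:
    """Run a built-in or custom tool and record the call and its result."""
    handler = tools.get(name)
    if handler is None:
        # Unknown tools are reported back instead of raising mid-call.
        result: Any = {"error": f"unknown tool: {name}"}
    else:
        result = handler(**args)
    # Hypothetical event shape for the call log.
    events.append({"type": "tool_call", "name": name, "args": args, "result": result})
    return result
```

Built-in and custom tools share one registry here, so both kinds end up in the same event log.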

Re-engagement (dead air handling)

When the customer goes silent mid-call, VoxCore prompts them to respond:
  1. After gap_seconds of silence, speak a re-engagement message (shuffled, non-repeating)
  2. Reduce the gap for subsequent attempts
  3. After max_retries is exceeded, end the call with disconnected_by = "RNR"
Configured via re_engagement in bot config: messages (list of prompts), gap_seconds (int or [first, subsequent]), max_retries.
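The steps above can be sketched as a small planner. It assumes that an integer gap_seconds applies to every attempt, that a two-item list means [first, subsequent], and that exactly max_retries prompts are spoken before the call ends with "RNR". The helper names are hypothetical, and "non-repeating" is interpreted as no repeats within one pass over the message list.

```python
import random
from typing import Iterator, Optional, Sequence, Union

GapSpec = Union[int, Sequence[int]]


def gap_for_attempt(gap_seconds: GapSpec, attempt: int) -> int:
    """Silence gap before a 0-based attempt; a pair means [first, subsequent]."""
    if isinstance(gap_seconds, int):
        return gap_seconds
    first, subsequent = gap_seconds
    return first if attempt == 0 else subsequent


def message_cycle(
    messages: Sequence[str], rng: Optional[random.Random] = None
) -> Iterator[str]:
    """Yield messages in shuffled order, reshuffling once all have been used."""
    if not messages:
        raise ValueError("re_engagement.messages must not be empty")
    rng = rng or random.Random()
    while True:
        batch = list(messages)
        rng.shuffle(batch)
        yield from batch


def plan_re_engagement(
    messages: Sequence[str],
    gap_seconds: GapSpec,
    max_retries: int,
    rng: Optional[random.Random] = None,
) -> list[tuple[int, str]]:
    """Return (gap, message) per attempt; after the last one the call ends as RNR."""
    cycle = message_cycle(messages, rng)
    return [(gap_for_attempt(gap_seconds, i), next(cycle)) for i in range(max_retries)]
```

With gap_seconds set to [10, 5] and max_retries set to 3, the planned gaps are 10, 5, and 5 seconds.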

Recording

Audio is captured by an AudioBufferProcessor placed after transport output. On call end, the buffer is encoded as WAV and uploaded to object storage (DigitalOcean Spaces). The recording URL and storage key are included in call results.
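The WAV encoding step can be approximated with Python's standard `wave` module. The sketch assumes the buffer holds raw 16-bit little-endian PCM and uses an 8 kHz mono default typical of telephony; the real sample rate and channel count depend on the transport. The function name `encode_wav` is hypothetical, and the upload to object storage is left out.

```python
import io
import wave


def encode_wav(pcm: bytes, sample_rate: int = 8000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM audio in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # wave does not close a file object it was handed, so the bytes are still readable.
    return buffer.getvalue()
```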

Max duration

If max_call_duration_seconds is set in bot config (default: 600), the pipeline automatically ends the call when the limit is reached. Sets disconnected_by = "timeout".
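A duration cap like this is commonly implemented by racing the call against a timer. The sketch below uses `asyncio.wait_for` and assumes the call is an awaitable that returns its own disconnected_by value when it ends normally; the wrapper name `run_with_max_duration` is hypothetical.

```python
import asyncio
from typing import Awaitable


async def run_with_max_duration(
    call: Awaitable[str], max_call_duration_seconds: float = 600
) -> str:
    """Return the call's own disconnect reason, or "timeout" if the cap is hit."""
    try:
        # wait_for cancels the call when the timeout expires.
        return await asyncio.wait_for(call, timeout=max_call_duration_seconds)
    except asyncio.TimeoutError:
        return "timeout"
```

Because the call is cancelled on timeout, any cleanup such as flushing the recording would need to run in the call's own cancellation handling.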

Latency tracking

Per-turn latency is measured across three stages, plus an end-to-end total:

| Metric | What it measures |
| --- | --- |
| stt_ms | Time-to-first-byte from STT processor |
| llm_ms | Time-to-first-byte from LLM processor |
| tts_ms | Time-to-first-byte from TTS processor |
| total_ms | End-to-end response latency |

Samples are collected per turn and averaged in the final call results.
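The averaging step might look like the following, assuming each turn produces a dict keyed by the metric names in the table and that a metric missing from a turn is simply skipped. The function name `average_latency` and the sample shape are assumptions for illustration.

```python
from statistics import mean

METRICS = ("stt_ms", "llm_ms", "tts_ms", "total_ms")


def average_latency(samples: list[dict[str, float]]) -> dict[str, float]:
    """Average each latency metric over the turns that reported it."""
    averages: dict[str, float] = {}
    for metric in METRICS:
        values = [sample[metric] for sample in samples if metric in sample]
        if values:  # omit metrics that no turn reported
            averages[metric] = mean(values)
    return averages
```

A mean hides outliers, so a single slow turn can be easy to miss in the final call results.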