For direct-prompt bots on Gemini / Vertex AI, the static system prompt can be served from an explicit Vertex CachedContent so the large static portion is not re-billed on every call. VoxBridge owns the cache lifecycle (create, refresh, delete, audit); VoxCore consumes the cache name and swaps it if it expires mid-call. This page covers the VoxBridge side. For how VoxCore injects the cache into the LLM request, see Caching.

Ownership split

| Concern | Owner |
| --- | --- |
| Create / refresh / delete CachedContent on Vertex | VoxBridge (scheduler + /recreate) |
| Persisting live_prompt_cache_state on the bot | VoxBridge |
| Audit events | VoxBridge (live_prompt_cache_events) |
| Injecting cached_content into the live LLM call | VoxCore |
| Recovering from a mid-call expiry | VoxCore calls back into /recreate, then swaps |

Scheduled lifecycle

start_scheduler() in services/live_prompt_cache_lifecycle.py registers two APScheduler cron jobs (timezone Asia/Kolkata by default):
| Job | Default hour | Setting | Action |
| --- | --- | --- | --- |
| Prewarm | 07:00 | live_prompt_cache_prewarm_hour | run_prewarm: create fresh CachedContent for every enabled bot. |
| Cleanup | 23:00 | live_prompt_cache_cleanup_hour | run_cleanup: delete caches for disabled bots and clear their state. |
The cache TTL is live_prompt_cache_ttl_hours (default 25h).
The TTL deliberately exceeds the 24h prewarm interval. A cache stays valid until the next prewarm replaces it, so an enabled bot is continuously covered. A shorter TTL (e.g. 10h) would leave a multi-hour dead window each day where calls run uncached.
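The arithmetic behind this can be sketched as follows. The helper name is hypothetical (it is not part of VoxBridge), and the sketch assumes each prewarm succeeds exactly on schedule.

```python
def uncached_hours_per_cycle(ttl_hours: float, prewarm_interval_hours: float = 24.0) -> float:
    """Hours in each prewarm cycle during which no valid cache exists.

    Hypothetical helper for illustration only.
    """
    return max(0.0, prewarm_interval_hours - ttl_hours)


print(uncached_hours_per_cycle(25))  # 0.0  (default TTL: no gap)
print(uncached_hours_per_cycle(10))  # 14.0 (a 10h TTL leaves 14 uncached hours)
```

With the default 25h TTL there is also a one-hour overlap in which the old and new caches both exist, which gives a late prewarm some slack.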
Cleanup only targets bots where live_prompt_cache.enabled != true. Enabled bots’ superseded caches are not deleted nightly — they age out by TTL on Vertex. Deleting an enabled bot’s cache nightly would null a cache a running campaign still needs.
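A minimal sketch of that selection rule, assuming the bot document is available as a plain dict with a nested live_prompt_cache.enabled field. The function name is hypothetical, and reading "!= true" as "anything other than a literal true, including a missing field" is an interpretation rather than confirmed behaviour.

```python
def is_cleanup_target(bot: dict) -> bool:
    """True when the nightly cleanup should delete this bot's cache.

    Assumption: a missing, null, or false `enabled` all count as disabled.
    """
    settings = bot.get("live_prompt_cache") or {}
    return settings.get("enabled") is not True


bots = [
    {"id": "a", "live_prompt_cache": {"enabled": True}},
    {"id": "b", "live_prompt_cache": {"enabled": False}},
    {"id": "c"},  # field never set
]
print([b["id"] for b in bots if is_cleanup_target(b)])  # ['b', 'c']
```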

Startup catch-up

On app startup, _catch_up_missed_runs compares the last recorded run (system_runtime doc live_prompt_cache) against the most recent expected cron fire. If a prewarm or cleanup was missed (e.g. the process was down at 07:00), it runs immediately and records a catchup_triggered audit event. Run timestamps are written in a finally block so a crashing job body still advances the timestamp and avoids re-firing on the next boot.
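The comparison and the finally-block bookkeeping might look roughly like this. It is a sketch, not the code in _catch_up_missed_runs: the function names are hypothetical, timestamps are assumed to be timezone-aware, and treating "no recorded run" as missed is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Asia/Kolkata has no DST, so a fixed +05:30 offset is equivalent for this sketch.
IST = timezone(timedelta(hours=5, minutes=30))


def most_recent_expected_fire(now: datetime, hour: int) -> datetime:
    """Latest daily fire at `hour`:00 local time that is not after `now`."""
    local_now = now.astimezone(IST)
    fire = local_now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if fire > local_now:
        fire -= timedelta(days=1)  # today's fire has not happened yet
    return fire


def was_missed(last_run: Optional[datetime], now: datetime, hour: int) -> bool:
    """A run counts as missed if nothing was recorded since the expected fire.

    Assumption: a bot with no recorded run at all is treated as missed.
    """
    return last_run is None or last_run < most_recent_expected_fire(now, hour)


def run_and_record(job: Callable[[], None], record: Callable[[datetime], None]) -> None:
    """Run the job and record the timestamp even if the job body raises."""
    try:
        job()
    finally:
        record(datetime.now(IST))
```

For example, a process that boots at 09:00 with a last prewarm recorded the previous morning would see was_missed(...) return True for hour=7 and run the prewarm immediately.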

Cross-instance dedup

run_prewarm_for_bot takes a per-bot Redis lock (voxbridge:live_prompt_prewarm:{bot_id}, nx, 300s TTL) so only one VoxBridge instance prewarms a given bot per window. The scheduled path leaves the lock to expire; the /recreate path releases it on completion (release_lock_on_completion=True) so a follow-up recreate (e.g. after a version_override bump) is not blocked.
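The locking pattern can be sketched as below. The client is expected to expose redis-py's set(name, value, nx=..., ex=...) and delete(name); InMemoryRedis is a hypothetical stand-in so the example runs without a Redis server, and prewarm_with_lock is an illustrative name, not the real run_prewarm_for_bot.

```python
import time

LOCK_TTL_SECONDS = 300


class InMemoryRedis:
    """Tiny stand-in for a redis-py client, just enough for this sketch."""

    def __init__(self):
        self._store = {}

    def set(self, name, value, nx=False, ex=None):
        now = time.monotonic()
        current = self._store.get(name)
        if nx and current is not None and current[1] > now:
            return None  # redis-py returns None when NX cannot set the key
        expires = now + ex if ex is not None else float("inf")
        self._store[name] = (value, expires)
        return True

    def delete(self, name):
        return 1 if self._store.pop(name, None) is not None else 0


def prewarm_with_lock(client, bot_id, prewarm, release_lock_on_completion=False):
    """Run `prewarm(bot_id)` only if this instance wins the per-bot lock."""
    key = f"voxbridge:live_prompt_prewarm:{bot_id}"
    if not client.set(key, "1", nx=True, ex=LOCK_TTL_SECONDS):
        return {"skipped": "lock_held"}
    try:
        return {"cache_name": prewarm(bot_id)}
    finally:
        # Scheduled path: leave the lock to expire. /recreate path: release it.
        if release_lock_on_completion:
            client.delete(key)
```

One caveat with this simplified version: the unconditional delete could release a lock taken by another instance if a prewarm ever ran longer than the 300s TTL. Storing a unique token and deleting only on a match is the usual way to guard against that.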

Cache state on the bot

Both jobs and /recreate write live_prompt_cache_state on the bot document:
| Field | Meaning |
| --- | --- |
| cache_name | Vertex CachedContent resource name (null when none). |
| created_at / expires_at | Creation time and TTL expiry. |
| last_prewarm_at / last_prewarm_status | Last prewarm outcome. |
| last_cleanup_at / last_cleanup_status | Last cleanup outcome (disabled bots). |
The runtime-config builder slims this to {cache_name, expires_at} before sending it to VoxCore as live_prompt_cache_state.
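As an illustration of that slimming step, here is a sketch with a hypothetical function name. Returning None when the bot has no state is an assumption; the real builder may behave differently in that case.

```python
from typing import Optional


def slim_cache_state(state: Optional[dict]) -> Optional[dict]:
    """Keep only the two fields VoxCore needs from live_prompt_cache_state."""
    if not state:
        return None  # assumption: nothing is sent when no state exists
    return {
        "cache_name": state.get("cache_name"),
        "expires_at": state.get("expires_at"),
    }


full_state = {
    "cache_name": "example-cache-name",  # placeholder value
    "created_at": "2024-01-01T07:00:00+05:30",
    "expires_at": "2024-01-02T08:00:00+05:30",
    "last_prewarm_at": "2024-01-01T07:00:00+05:30",
    "last_prewarm_status": "succeeded",
}
print(sorted(slim_cache_state(full_state)))  # ['cache_name', 'expires_at']
```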

Vertex operations

services/live_prompt_cache_vertex.py wraps the google-genai Vertex client:
| Function | Purpose |
| --- | --- |
| create_cache | caches.create with system_instruction, ttl, optional tools/display_name. Returns the cache name. |
| delete_cache | caches.delete; returns False on 404 (already gone), True on delete. |
| extend_ttl | caches.update to push the TTL out (recovery path). |
Clients are cached per (project_id, location) in _vertex_clients and evicted on auth errors (401/403) so stale credentials get rebuilt. Credentials come from system settings api_keys.google_vertex (service-account JSON); location defaults to us-east4. A google_vertex bot with no project_id is skipped with a prewarm_failed event (reason: missing_project_id).

Internal endpoints (VoxCore-only)

routes/internal_live_cache.py defines two endpoints; both require X-VoxCore-Secret.
| Endpoint | Body | Behaviour |
| --- | --- | --- |
| POST /api/v1/internal/live-prompt-cache/recreate | { bot_id } | Inline single-flight prewarm for one bot. Returns { cache_name }, or 503 recreate_failed / 404 bot not found. Used by VoxCore when a cache expires mid-call. |
| POST /api/v1/internal/live-prompt-cache/event | { bot_id, event_type, cache_name?, details? } | Append-only audit sink so VoxCore can record events like expired_in_call, swap_after_expiry. Unknown event_type returns 400. |
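For orientation, a caller could build the /recreate request like this. The sketch uses only the standard library; the base URL and secret are placeholders, the function name is hypothetical, and sending X-VoxCore-Secret as an HTTP header with a JSON body is an assumption based on the naming.

```python
import json
import urllib.request


def build_recreate_request(base_url: str, secret: str, bot_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the internal /recreate endpoint."""
    body = json.dumps({"bot_id": bot_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/internal/live-prompt-cache/recreate",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-VoxCore-Secret": secret,  # assumed to be sent as a header
        },
    )


req = build_recreate_request("http://voxbridge.internal", "placeholder-secret", "bot-1")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen(req, timeout=...) would send it; the 503 and 404 responses would surface as urllib.error.HTTPError, which the caller would need to handle before deciding whether to continue uncached.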

Audit log

services/live_prompt_cache_audit.py writes append-only docs to live_prompt_cache_events. record_cache_event validates event_type against an allow-list and never propagates insert failures (audit must not break a call). Allowed event types: created, expired_in_call, swap_after_expiry, invalidated_at_shutdown, extended, recreated_after_expiry, prewarm_succeeded, prewarm_failed, cleanup_succeeded, cleanup_failed, catchup_triggered. The collection has indexes on (bot_id, ts) and (event_type, ts), plus a 90-day TTL index on ts.
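The validate-then-swallow behaviour can be sketched as follows. This is not the real record_cache_event: the insert callable stands in for the collection write, the document shape is illustrative, and raising ValueError on an unknown event_type is an assumption (the source only says the type is validated).

```python
from datetime import datetime, timezone
from typing import Callable, Optional

ALLOWED_EVENT_TYPES = frozenset({
    "created", "expired_in_call", "swap_after_expiry", "invalidated_at_shutdown",
    "extended", "recreated_after_expiry", "prewarm_succeeded", "prewarm_failed",
    "cleanup_succeeded", "cleanup_failed", "catchup_triggered",
})


def record_cache_event_sketch(
    insert: Callable[[dict], None],
    bot_id: str,
    event_type: str,
    cache_name: Optional[str] = None,
    details: Optional[dict] = None,
) -> bool:
    """Validate the event type, then write the doc without propagating failures."""
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type}")  # assumed behaviour
    doc = {
        "bot_id": bot_id,
        "event_type": event_type,
        "cache_name": cache_name,
        "details": details or {},
        "ts": datetime.now(timezone.utc),
    }
    try:
        insert(doc)
        return True
    except Exception:
        # Audit must not break a call: swallow the failure and report it.
        return False
```

Returning a boolean instead of raising lets a caller log a dropped audit write without interrupting the prewarm or the live call.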

Operational checks

| Symptom | First place to check |
| --- | --- |
| No cache created for an enabled bot | live_prompt_cache.enabled, static prompt non-empty, prewarm_failed events. |
| google_vertex bot never caches | Missing project_id in llm.extra (reason: missing_project_id). |
| Prewarm skipped | Redis lock held by another instance (skipped: lock_held). |
| Cache deleted while still needed | A bot was disabled; cleanup targets disabled bots only. |
| Mid-call expiry not recovering | VoxCore /recreate call, then swap_after_expiry audit event. |

VoxCore caching

How VoxCore injects cached_content and swaps on expiry.

Runtime config

Where live_prompt_cache_state is sent to VoxCore.

Conversation policy

Policy bots (which do not use the live prompt cache path).

Post-call processing

The separate post-call/QC explicit cache.