Live prompt cache - Vox Systems

For direct-prompt bots on Gemini / Vertex AI, the static system prompt can be served from an explicit Vertex CachedContent so the large static portion is not re-billed on every call. VoxBridge owns the cache lifecycle (create, refresh, delete, audit); VoxCore consumes the cache name and swaps it if it expires mid-call. This page covers the VoxBridge side. For how VoxCore injects the cache into the LLM request, see Caching.

Ownership split

| Concern | Owner |
| --- | --- |
| Create / refresh / delete `CachedContent` on Vertex | VoxBridge (scheduler + `/recreate`) |
| Persisting `live_prompt_cache_state` on the bot | VoxBridge |
| Audit events | VoxBridge (`live_prompt_cache_events`) |
| Injecting `cached_content` into the live LLM call | VoxCore |
| Recovering from a mid-call expiry | VoxCore calls back into `/recreate`, then swaps |

Scheduled lifecycle

`start_scheduler()` in `services/live_prompt_cache_lifecycle.py` registers two APScheduler cron jobs (timezone `Asia/Kolkata` by default):

| Job | Default hour | Setting | Action |
| --- | --- | --- | --- |
| Prewarm | 07:00 | `live_prompt_cache_prewarm_hour` | `run_prewarm`: create a fresh `CachedContent` for every enabled bot. |
| Cleanup | 23:00 | `live_prompt_cache_cleanup_hour` | `run_cleanup`: delete caches for disabled bots and clear their state. |

The cache TTL is set by `live_prompt_cache_ttl_hours` (default 25h).

The TTL deliberately exceeds the 24h prewarm interval: each cache stays valid until the next prewarm replaces it, so an enabled bot is continuously covered. A shorter TTL would leave a daily dead window where calls run uncached. With a 10h TTL and the default 07:00 prewarm, for example, the cache would expire at 17:00 and calls would run uncached for 14 hours until the next prewarm.
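The coverage argument is plain arithmetic. The following is a minimal sketch, assuming a single prewarm per interval and a cache created at prewarm time; the function name is illustrative and does not come from VoxBridge.

```python
def uncached_hours_per_cycle(ttl_hours: float, prewarm_interval_hours: float = 24.0) -> float:
    """Hours in each prewarm cycle during which no valid cache exists.

    Assumes the cache is created at the start of the cycle and is only
    replaced by the next prewarm (hypothetical helper for illustration).
    """
    return max(0.0, prewarm_interval_hours - ttl_hours)


print(uncached_hours_per_cycle(25))  # 0.0
print(uncached_hours_per_cycle(10))  # 14.0
```

With the default 25h TTL the gap is zero; any TTL below the prewarm interval produces a gap equal to the difference.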

Cleanup only targets bots where `live_prompt_cache.enabled != true`. Superseded caches of enabled bots are not deleted nightly; they age out by TTL on Vertex. Deleting the cache of an enabled bot every night would null out a cache that a running campaign still needs.
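The targeting rule can be sketched as a simple predicate. This is an illustration only: the bot document shape (a nested `live_prompt_cache` mapping) and the helper name are assumptions, and the real job presumably applies the filter in its database query.

```python
from typing import Any


def is_cleanup_target(bot: dict[str, Any]) -> bool:
    """True when live_prompt_cache.enabled is anything other than True.

    A missing block or missing flag counts as "not enabled", which is what
    an `enabled != true` filter implies (assumed document shape).
    """
    cfg = bot.get("live_prompt_cache") or {}
    return cfg.get("enabled") is not True


bots = [
    {"_id": "a", "live_prompt_cache": {"enabled": True}},
    {"_id": "b", "live_prompt_cache": {"enabled": False}},
    {"_id": "c"},  # no live_prompt_cache block at all
]
print([b["_id"] for b in bots if is_cleanup_target(b)])  # ['b', 'c']
```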

Startup catch-up

On app startup, `_catch_up_missed_runs` compares the last recorded run (`system_runtime` doc `live_prompt_cache`) against the most recent expected cron fire. If a prewarm or cleanup was missed (e.g. the process was down at 07:00), it runs immediately and records a `catchup_triggered` audit event. Run timestamps are written in a `finally` block so a crashing job body still advances the timestamp and avoids re-firing on the next boot.
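The catch-up decision described above might look roughly like the sketch below. All function names here are hypothetical, the job is assumed to fire once a day on the hour, and a fixed UTC+05:30 offset stands in for `Asia/Kolkata` (which has no daylight saving time).

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Asia/Kolkata is UTC+05:30 all year, so a fixed offset is enough for a sketch.
IST = timezone(timedelta(hours=5, minutes=30))


def most_recent_expected_fire(now: datetime, hour: int) -> datetime:
    """Latest daily fire at `hour`:00 that is not after `now`."""
    fire = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return fire if fire <= now else fire - timedelta(days=1)


def missed_run(last_run: Optional[datetime], now: datetime, hour: int) -> bool:
    """True if nothing has been recorded since the most recent expected fire."""
    return last_run is None or last_run < most_recent_expected_fire(now, hour)


def run_and_record(job: Callable[[], None],
                   record_timestamp: Callable[[datetime], None],
                   now: datetime) -> None:
    """Run the job and record the timestamp even if the job body raises."""
    try:
        job()
    finally:
        record_timestamp(now)


now = datetime(2024, 5, 2, 9, 30, tzinfo=IST)
last = datetime(2024, 5, 1, 7, 0, tzinfo=IST)
print(missed_run(last, now, hour=7))  # True
```

Recording the timestamp in `finally` trades a possibly lost run for protection against a crash loop on every boot.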

Cross-instance dedup

`run_prewarm_for_bot` takes a per-bot Redis lock (`voxbridge:live_prompt_prewarm:{bot_id}`, `NX`, 300s TTL) so only one VoxBridge instance prewarms a given bot per window. The scheduled path leaves the lock to expire; the `/recreate` path releases it on completion (`release_lock_on_completion=True`) so a follow-up recreate (e.g. after a `version_override` bump) is not blocked.
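A minimal sketch of this locking pattern follows. It assumes a client with redis-py style `set(key, value, nx=True, ex=...)` and `delete(key)` methods; the wrapper name, the return shapes, and the in-memory `FakeRedis` stand-in are illustrative and not taken from VoxBridge.

```python
from typing import Any, Callable

LOCK_TTL_SECONDS = 300


def lock_key(bot_id: str) -> str:
    return f"voxbridge:live_prompt_prewarm:{bot_id}"


def prewarm_with_lock(redis_client: Any, bot_id: str,
                      prewarm: Callable[[str], str],
                      release_lock_on_completion: bool = False) -> dict:
    """Run `prewarm(bot_id)` only if this instance wins the per-bot lock."""
    key = lock_key(bot_id)
    # SET key value NX EX 300 succeeds only when the key does not exist yet.
    if not redis_client.set(key, "1", nx=True, ex=LOCK_TTL_SECONDS):
        return {"skipped": "lock_held"}
    try:
        return {"cache_name": prewarm(bot_id)}
    finally:
        if release_lock_on_completion:
            redis_client.delete(key)  # the /recreate-style path
        # Otherwise the lock is simply left to expire (scheduled path).


class FakeRedis:
    """In-memory stand-in that mimics only SET NX and DEL (no expiry)."""

    def __init__(self) -> None:
        self._data: dict = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


r = FakeRedis()
print(prewarm_with_lock(r, "bot-1", lambda b: "cache-A"))  # {'cache_name': 'cache-A'}
print(prewarm_with_lock(r, "bot-1", lambda b: "cache-B"))  # {'skipped': 'lock_held'}
```

The second call is skipped because the first one left its lock in place, which is the scheduled-path behaviour; with `release_lock_on_completion=True` both calls would run.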

Cache state on the bot

Both jobs and `/recreate` write `live_prompt_cache_state` on the bot document:

| Field | Meaning |
| --- | --- |
| `cache_name` | Vertex `CachedContent` resource name (null when none). |
| `created_at` / `expires_at` | Creation time and TTL expiry. |
| `last_prewarm_at` / `last_prewarm_status` | Last prewarm outcome. |
| `last_cleanup_at` / `last_cleanup_status` | Last cleanup outcome (disabled bots). |

The runtime-config builder slims this to `{cache_name, expires_at}` before sending it to VoxCore as `live_prompt_cache_state`.
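The slimming step amounts to a projection onto two keys. The sketch below is an assumption about its shape; the helper name, the handling of a missing state, and the sample values are all illustrative.

```python
from typing import Any, Optional


def slim_cache_state(state: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Keep only the two fields VoxCore needs (hypothetical helper)."""
    state = state or {}
    return {
        "cache_name": state.get("cache_name"),
        "expires_at": state.get("expires_at"),
    }


full_state = {
    "cache_name": "example-cache-name",  # placeholder, not a real resource name
    "created_at": "2024-05-02T07:00:00+05:30",
    "expires_at": "2024-05-03T08:00:00+05:30",
    "last_prewarm_status": "ok",  # illustrative value
}
print(slim_cache_state(full_state))
```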

Vertex operations

`services/live_prompt_cache_vertex.py` wraps the `google-genai` Vertex client:

| Function | Purpose |
| --- | --- |
| `create_cache` | `caches.create` with `system_instruction`, `ttl`, optional `tools`/`display_name`. Returns the cache name. |
| `delete_cache` | `caches.delete`; returns `False` on `404` (already gone), `True` on delete. |
| `extend_ttl` | `caches.update` to push the TTL out (recovery path). |

Clients are cached per `(project_id, location)` in `_vertex_clients` and evicted on auth errors (401/403) so stale credentials get rebuilt. Credentials come from system settings `api_keys.google_vertex` (service-account JSON); location defaults to `us-east4`. A `google_vertex` bot with no `project_id` is skipped with a `prewarm_failed` event (`reason: missing_project_id`).
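The per-key client cache with eviction on auth errors can be sketched as follows. To keep the example self-contained, the client is built by an injected `factory` callable, which is an assumption; in VoxBridge it would presumably construct a google-genai Vertex client from the service-account credentials. Function names are illustrative.

```python
from typing import Any, Callable

DEFAULT_LOCATION = "us-east4"
AUTH_ERROR_CODES = {401, 403}

# One client per (project_id, location) pair.
_vertex_clients: dict[tuple[str, str], Any] = {}


def get_client(project_id: str, location: str = DEFAULT_LOCATION, *,
               factory: Callable[[str, str], Any]) -> Any:
    """Return the cached client for this key, building it on first use."""
    key = (project_id, location)
    if key not in _vertex_clients:
        _vertex_clients[key] = factory(project_id, location)
    return _vertex_clients[key]


def evict_on_auth_error(project_id: str, location: str, status_code: int) -> bool:
    """Drop the cached client after a 401/403 so the next call rebuilds it."""
    if status_code in AUTH_ERROR_CODES:
        return _vertex_clients.pop((project_id, location), None) is not None
    return False


built = []


def fake_factory(project_id: str, location: str) -> object:
    built.append((project_id, location))
    return object()


a = get_client("proj", factory=fake_factory)
b = get_client("proj", factory=fake_factory)
print(a is b, len(built))  # True 1

evict_on_auth_error("proj", DEFAULT_LOCATION, 403)
c = get_client("proj", factory=fake_factory)
print(c is a, len(built))  # False 2
```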

Internal endpoints (VoxCore-only)

Both endpoints live in `routes/internal_live_cache.py` and require the `X-VoxCore-Secret` header.

| Endpoint | Body | Behaviour |
| --- | --- | --- |
| `POST /api/v1/internal/live-prompt-cache/recreate` | `{ bot_id }` | Inline single-flight prewarm for one bot. Returns `{ cache_name }`, or `503 recreate_failed` / `404 bot not found`. Used by VoxCore when a cache expires mid-call. |
| `POST /api/v1/internal/live-prompt-cache/event` | `{ bot_id, event_type, cache_name?, details? }` | Append-only audit sink so VoxCore can record events like `expired_in_call`, `swap_after_expiry`. Unknown `event_type` → `400`. |
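From the caller's side, a `/recreate` request could be assembled as in the sketch below. The base URL and secret value are placeholders, the JSON content type is an assumption, and `X-VoxCore-Secret` is assumed to be sent as an HTTP header; the helper name is hypothetical. The example only builds the request and does not send it.

```python
import json
import urllib.request

RECREATE_PATH = "/api/v1/internal/live-prompt-cache/recreate"


def build_recreate_request(base_url: str, secret: str, bot_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks VoxBridge for a fresh cache."""
    body = json.dumps({"bot_id": bot_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + RECREATE_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed
            "X-VoxCore-Secret": secret,
        },
    )


req = build_recreate_request("http://voxbridge.example", "placeholder-secret", "bot-1")
print(req.get_method(), req.full_url)
```

Sending it would be a call such as `urllib.request.urlopen(req, timeout=...)`, with the caller treating `503` and `404` responses as a failed recreate.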

Audit log

`services/live_prompt_cache_audit.py` writes append-only docs to `live_prompt_cache_events`. `record_cache_event` validates `event_type` against an allow-list and never propagates insert failures (audit must not break a call). Allowed event types: `created`, `expired_in_call`, `swap_after_expiry`, `invalidated_at_shutdown`, `extended`, `recreated_after_expiry`, `prewarm_succeeded`, `prewarm_failed`, `cleanup_succeeded`, `cleanup_failed`, `catchup_triggered`. The collection has indexes on `(bot_id, ts)` and `(event_type, ts)`, plus a 90-day TTL index on `ts`.
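The two guarantees above (allow-list validation, and insert failures that never reach the caller) can be sketched like this. The function signature, the injected `insert` callable, the document fields other than `ts`, and the choice to raise `ValueError` on an unknown type are all assumptions for illustration, not the actual `record_cache_event` implementation.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Callable, Optional

logger = logging.getLogger("live_prompt_cache_audit_sketch")

ALLOWED_EVENT_TYPES = frozenset({
    "created", "expired_in_call", "swap_after_expiry", "invalidated_at_shutdown",
    "extended", "recreated_after_expiry", "prewarm_succeeded", "prewarm_failed",
    "cleanup_succeeded", "cleanup_failed", "catchup_triggered",
})


def record_cache_event_sketch(insert: Callable[[dict], Any], bot_id: str,
                              event_type: str, cache_name: Optional[str] = None,
                              details: Optional[dict] = None) -> bool:
    """Validate the event type, then insert; swallow storage failures."""
    if event_type not in ALLOWED_EVENT_TYPES:
        # Assumed behaviour: reject loudly so the HTTP layer can answer 400.
        raise ValueError(f"unknown event_type: {event_type}")
    doc = {
        "bot_id": bot_id,
        "event_type": event_type,
        "cache_name": cache_name,
        "details": details or {},
        "ts": datetime.now(timezone.utc),
    }
    try:
        insert(doc)
        return True
    except Exception:
        # Audit must not break a call: log and carry on.
        logger.warning("audit insert failed for %s", event_type, exc_info=True)
        return False


events: list = []
print(record_cache_event_sketch(events.append, "bot-1", "created", cache_name="example"))  # True
```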

Operational checks

| Symptom | First place to check |
| --- | --- |
| No cache created for an enabled bot | `live_prompt_cache.enabled`, static prompt non-empty, `prewarm_failed` events. |
| `google_vertex` bot never caches | Missing `project_id` in `llm.extra` (`reason: missing_project_id`). |
| Prewarm skipped | Redis lock held by another instance (`skipped: lock_held`). |
| Cache deleted while still needed | A bot was disabled; cleanup targets disabled bots only. |
| Mid-call expiry not recovering | VoxCore `/recreate` call, then `swap_after_expiry` audit event. |

Related docs

VoxCore caching: How VoxCore injects `cached_content` and swaps on expiry.

Runtime config: Where `live_prompt_cache_state` is sent to VoxCore.

Conversation policy: Policy bots (which do not use the live prompt cache path).

Post-call processing: The separate post-call/QC explicit cache.
