Codex service-level objectives

Published SLOs for codex-pdf. These are targets, not contractual guarantees — but consumers can size their own SLOs against them, and operators should alert when codex falls below the documented bands.

| Surface | Target | Measurement window |
| --- | --- | --- |
| GET /v1/healthz | 99.95 % | 30-day rolling |
| POST /v1/extract | 99.9 % | 30-day rolling |
| GET /v1/documents/{id}/text-regions | 99.9 % | 30-day rolling |
| POST /v1/documents/{id}/conformance/{p} | 99.9 % | 30-day rolling |
| GET /v1/documents/{id}/renders | 99.9 % | 30-day rolling |
| Render / sample / walk POSTs | 99.5 % | 30-day rolling |

Availability is 1 - (error_requests / total_requests), where error_requests is the count of responses with status ≥ 500. 429 Too Many Requests is deliberate load-shedding and does NOT count against availability: it is a contract output, not a failure.
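The availability formula can be sketched in a few lines of Python. This is an illustration only: the function name and the input shape (a mapping of status code to response count) are assumptions, and 429 responses are assumed to stay in total_requests because the formula does not exclude them.

```python
from collections import Counter


def availability(status_counts):
    """Return 1 - (error_requests / total_requests) for one window.

    status_counts maps an HTTP status code to the number of responses
    with that status. Only status >= 500 counts as an error, so 429
    responses never reduce availability.
    """
    total = sum(status_counts.values())
    if total == 0:
        # Assumption: a window with no traffic is treated as fully available.
        return 1.0
    errors = sum(n for status, n in status_counts.items() if status >= 500)
    return 1 - errors / total


# Hypothetical window: 10,000 responses, 10 of them 5xx, 50 shed with 429.
window = Counter({200: 9_940, 429: 50, 500: 6, 503: 4})
print(f"{availability(window):.2%}")
```

In this hypothetical window the 50 shed requests leave the error count untouched; only the ten 5xx responses lower the result.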

Latency is wall-clock time from the request hitting the codex API to the last byte of the response. Numbers assume a warm cache unless a row is marked cold; cold-cache p95 is typically 3-10× higher.

| Endpoint | p50 | p95 | p99 |
| --- | --- | --- | --- |
| GET /v1/healthz | 5 ms | 25 ms | 50 ms |
| POST /v1/probe (warm) | 10 ms | 50 ms | 150 ms |
| POST /v1/extract (warm) | 30 ms | 200 ms | 800 ms |
| POST /v1/extract (cold) | 300 ms | 2 s | 6 s |
| GET .../text-regions (warm) | 5 ms | 30 ms | 100 ms |
| POST .../conformance/{p} (warm) | 5 ms | 25 ms | 80 ms |
| POST .../conformance/{p} (cold, includes parse) | 50 ms | 200 ms | 800 ms |
| GET .../renders | 5 ms | 25 ms | 60 ms |
| POST /v1/render/page (cold, Ghostscript) | 500 ms | 4 s | 12 s |

Cold-path latency includes the upstream PDF parse (extract_document), which dominates the response time. Render endpoints additionally depend on Ghostscript performance.

For each endpoint, we recommend two alert lanes:

  • Slow: histogram_quantile(0.95, sum by (le) (rate(codex_api_request_seconds_bucket{endpoint="<name>"}[5m]))) stays above 2× the endpoint’s p95 from the latency table for 10 minutes.
  • Failing: rate(codex_api_requests_total{endpoint="<name>",status=~"5.."}[5m]) exceeds 1 % of the endpoint’s total request rate for 5 minutes.

429-tagged requests are excluded — they’re shed-on-policy, not errors.
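As a rough sketch, the two lanes could be rendered into PromQL threshold expressions with a small helper. The function names and the "extract" label value are hypothetical; the metric names come from the lanes above. The ratio form of the failing expression is one possible reading of "1 % of total", not a confirmed rule definition, and the 10-minute and 5-minute hold times would be set in the alerting rule itself rather than in the expression.

```python
def slow_expr(endpoint, p95_seconds):
    """PromQL for the slow lane: observed p95 above 2x the documented p95."""
    threshold = 2 * p95_seconds
    return (
        "histogram_quantile(0.95, sum by (le) ("
        f'rate(codex_api_request_seconds_bucket{{endpoint="{endpoint}"}}[5m])))'
        f" > {threshold:g}"
    )


def failing_expr(endpoint):
    """PromQL for the failing lane: 5xx rate above 1 % of the total rate."""
    errors = (
        "sum(rate(codex_api_requests_total"
        f'{{endpoint="{endpoint}",status=~"5.."}}[5m]))'
    )
    total = f'sum(rate(codex_api_requests_total{{endpoint="{endpoint}"}}[5m]))'
    return f"{errors} / {total} > 0.01"


# "extract" is a made-up label value; 0.2 s is the warm p95 for POST /v1/extract.
print(slow_expr("extract", 0.2))
print(failing_expr("extract"))
```

Because the status matcher only selects 5xx codes, 429 responses never enter the numerator of the failing expression.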

Cache hit rate is the warm/total lookup ratio, computed per endpoint:

cache_hit_rate =
  rate(codex_api_cache_lookups_total{outcome="hit"}[5m])
  / rate(codex_api_cache_lookups_total[5m])

| Endpoint | Expected hit rate |
| --- | --- |
| POST /v1/extract | ≥ 80 % during steady-state |
| GET .../text-regions | ≥ 70 % |
| POST .../conformance/{p} | ≥ 90 % (verdicts are idempotent) |
| POST /v1/render/page | ≥ 60 % (more cache-key dimensions) |

Sustained dip below the floor indicates either a key-shape change (check CODEX_VERSION rotation) or a Redis eviction storm.
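A minimal sketch of the floor check, assuming hit and lookup counts have already been collected per endpoint over the same window. The function name and input shape are illustrative; the floors are copied from the expected hit rates above.

```python
# Floors taken from the expected hit-rate table.
HIT_RATE_FLOORS = {
    "POST /v1/extract": 0.80,
    "GET .../text-regions": 0.70,
    "POST .../conformance/{p}": 0.90,
    "POST /v1/render/page": 0.60,
}


def below_floor(hits, lookups, floors=HIT_RATE_FLOORS):
    """Return {endpoint: observed_rate} for endpoints under their floor.

    hits and lookups map endpoint -> counts over the same window.
    Endpoints with no lookups are skipped rather than flagged.
    """
    breaches = {}
    for endpoint, floor in floors.items():
        total = lookups.get(endpoint, 0)
        if total == 0:
            continue
        rate = hits.get(endpoint, 0) / total
        if rate < floor:
            breaches[endpoint] = rate
    return breaches


# Hypothetical counts: extract has dipped to 70 %, text-regions sits at 75 %.
hits = {"POST /v1/extract": 700, "GET .../text-regions": 75}
lookups = {"POST /v1/extract": 1000, "GET .../text-regions": 100}
print(below_floor(hits, lookups))
```

A single low reading is not a breach by itself; the dip has to be sustained, so a check like this would run over consecutive windows before anyone investigates key shape or Redis eviction.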

The AI signal lane is opt-in (CODEX_AI_ENABLED=true); these SLOs apply only on deployments where it is turned on. AI extractors add per-call Claude latency on top of the regular extract pipeline.

| Surface | p50 | p95 | p99 |
| --- | --- | --- | --- |
| POST /v1/extract w/ AI (first hit) | 4.0 s | 12 s | 25 s |
| POST /v1/extract w/ AI (cache hit) | unchanged from non-AI baseline | | |
| GET /v1/documents/{hash}/signals/{kind} (cache hit) | 30 ms | 150 ms | 400 ms |
| GET /v1/documents/{hash}/signals/{kind} (cache miss) | 1.5 s | 6 s | 15 s |

Vision-backed kinds (logos, symbols) dominate p99 because of Claude Sonnet vision latency. Text-only kinds (language, spell, classification) land near the p50.

| SLO | Target | Why |
| --- | --- | --- |
| Per-request spend | CODEX_AI_COST_CAP_USD_PER_REQUEST (default $0.10) | Hard cap enforced by codex_pdf.ai.budget.AiBudget before each call |
| ai_budget_exceeded warning rate | < 0.1 % of AI-enabled requests | Higher rate means the default cap is too tight for the deployment’s typical PDF size |

AI extractor success rate is tracked via the codex_ai_signal_calls_total{kind, model, status} counter (1.13.0+):

ai_success_rate{kind} =
  rate(codex_ai_signal_calls_total{kind=..., status="ok"}[5m])
  / rate(codex_ai_signal_calls_total{kind=...}[5m])

| kind | Expected success rate |
| --- | --- |
| language | ≥ 99 % (text input, Haiku) |
| classification | ≥ 99 % (text input, Haiku) |
| spell | ≥ 99 % (text input, Haiku) |
| barcodes | ≥ 95 % (depends on barcode quality in source PDF) |
| logos | ≥ 90 % (vision; Sonnet occasionally times out on dense pages) |
| symbols | ≥ 90 % (vision; same characteristic as logos) |

A sustained dip below the band indicates a prompt regression or a Claude model rollover. When the prompt changes in response, bump the per-extractor prompt version in codex_pdf.ai.versions so that consumers can invalidate stale caches deliberately.

GET /v1/contract returns ai_model_versions — a map of {kind: {model, prompt, schema}} — so SDK consumers can pin against the exact extractor that produced a signal. Operators who change the prompt MUST bump the per-kind prompt constant in codex_pdf.ai.versions so consumers can invalidate stale caches deliberately.
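On the consumer side, pinning could look like the sketch below. It assumes the ai_model_versions map has already been fetched from GET /v1/contract and parsed into a dict; the function name and the version values are invented for illustration, and the actual value types of model, prompt, and schema are not specified here, so they are compared as opaque values.

```python
def stale_kinds(pinned, current):
    """Return the kinds whose model, prompt, or schema differs from the pin.

    pinned:  {kind: {"model": ..., "prompt": ..., "schema": ...}} stored by
             the consumer when it cached a signal.
    current: the ai_model_versions map from the latest contract response.
    A kind that is missing from current is also reported as stale.
    """
    return sorted(
        kind for kind, versions in pinned.items()
        if current.get(kind) != versions
    )


# Illustrative values only; real model names and version formats may differ.
pinned = {
    "language": {"model": "model-a", "prompt": 3, "schema": 1},
    "logos": {"model": "model-b", "prompt": 7, "schema": 2},
}
current = {
    "language": {"model": "model-a", "prompt": 3, "schema": 1},
    "logos": {"model": "model-b", "prompt": 8, "schema": 2},  # prompt bumped
}
print(stale_kinds(pinned, current))
```

A consumer would drop or refresh cached signals only for the kinds returned, leaving the rest of its cache untouched.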

  • The 1.9.x rc series may not yet hit every band — that’s the “rc” status. Final 1.9.0 ships when these numbers are observed on the deployed surface.
  • SLOs are per replica unless stated otherwise. Multi-replica fleets aggregate. Distributed rate-limit accounting is on the roadmap; see policies.md for the current model.
  • Alert thresholds should track 30-day rolling deployment health, not single-day spikes: codex sits in front of upstream PDF parsers whose performance varies widely with PDF size and complexity. Use percentile-of-percentile alerting where available.