Skip to content

Preflight capability map

This document maps what the lint-pdf/ engine can detect today to the major preflight “surfaces” (print production, packaging, accessibility/PDF-UA, and AI regulatory checks), and to the builtin profiles that enable them.

  • lintpdf-default.json
    • Enabled: LPDF_*, PDFX4-*, PDFX1A-*, PDFA-*, AI_*
    • Conformance target: pdfx4
    • Notes: AI enabled (categories=["all"])
  • lintpdf-strict.json
    • Enabled: LPDF_*, PDFX4-*
    • Conformance target: pdfx4
    • Notes: stricter DPI + small-text thresholds; AI disabled (no AI_*)
  • lintpdf-advisory-only.json
    • Enabled: LPDF_* (with max_severity="advisory")
    • Conformance target: none
    • Notes: good for “non-blocking” runs; explicitly disables PDFX4-*

The main pipeline is PreflightOrchestrator.run() in src/lintpdf/profiles/orchestrator.py:

  1. Parse PDF (pikepdf)
  2. Build semantic model (pages, resources, fonts, images, boxes)
  3. Interpret content streams into events (text/image/path paint events)
  4. Run engine analyzers (deterministic LPDF_*)
  5. Run built-in conformance validators (PDF/X, PDF/A variants) where configured
  6. Run veraPDF conformance (PDF/X, PDF/A, PDF/UA) when configured and opted-in
  7. Run OCR text-region pass (best-effort; enables outlined-text heuristics)
  8. Run AI analyzers (if profile enables AI_* and profile.ai.enabled)
  9. Filter/override findings per profile rules; enrich bboxes from events
  • Page boxes / bleed / safety / dimensions: PageGeometryAnalyzer
    • IDs: LPDF_BOX_* (incl. LPDF_BOX_005, LPDF_BOX_006, LPDF_BOX_010)
  • Fonts: FontAnalyzer
    • IDs: LPDF_FONT_*
  • Images / effective DPI / compression: ImageAnalyzer
    • IDs: LPDF_IMG_*
  • Transparency / blend-space: TransparencyAnalyzer
    • IDs: LPDF_TRANS_*
  • Overprint: OverprintAnalyzer
    • IDs: LPDF_OVER_*
  • ICC profiles / output intent: IccProfileAnalyzer
    • IDs: LPDF_ICC_*
  • Metadata / XMP / language: MetadataAnalyzer
    • IDs: LPDF_META_*, LPDF_LANG_*, LPDF_XMP_*
  • Structure / encryption / interactive features: StructureAnalyzer, DocumentAnalyzer, AnnotationAnalyzer
    • IDs: LPDF_STRUCT_*, LPDF_DOC_*, LPDF_ANNOT_* (varies by module)
  • HairlineAnalyzer: detects strokes and text rendered as paths that are too thin to reproduce reliably on press. Walks the full content stream (CTM, ExtGState, color state) to compute effective line widths and font sizes after all transformations.
    • IDs: LPDF_STROKE_001 (hairline stroke), LPDF_STROKE_002 (very thin stroke), LPDF_PATH_001/LPDF_PATH_002 (thin path fill), LPDF_TEXT_001/LPDF_TEXT_002/LPDF_TEXT_003 (thin text stroke / small text)
  • LegibilityCompositeAnalyzer: catches small outlined text (text rendered in stroke-only mode, e.g., converted to outlines at a tiny point size).
    • ID: LPDF_TEXT_OUTLINED_SMALL

Codex path note (0.1.0b23+): these checks previously produced zero findings on the codex path because the event stream was empty. codex_adapter_events.py now walks pages with pikepdf and emits real PathPaintingEvent / TextRenderedEvent objects so all hairline and legibility checks fire correctly.

Color, ink coverage, and prepress heuristics

Section titled “Color, ink coverage, and prepress heuristics”
  • Ink coverage: InkCoverageAnalyzer
    • IDs: LPDF_INK_*
  • Spot colors: SpotColorAnalyzer + spot-name analyzers
    • IDs: LPDF_SPOT_*, LPDF_SPOT_NAME_*
  • Process/pure-K/rich-black classification:
    • ColorAnalyzer (IDs: LPDF_COLOR_009, LPDF_COLOR_010, …)
    • AdvancedColorAnalyzer (IDs: LPDF_ADV_* incl. LPDF_ADV_005)
  • PackagingAnalyzer (only conditionally added when profile id contains "packaging")
    • IDs: LPDF_PKG_*
  • Dieline family: DielineIso19593Analyzer, DielinePerfIndicatorAnalyzer, plus orchestrator dieline detection attachment
    • IDs: AI_DIE_* (AI) and LPDF_* packaging geometry checks depending on analyzer
  • Seal zone / keepout: SealZoneKeepoutAnalyzer
    • ID: LPDF_BOX_SEAL_ZONE_VIOLATION
  • BarcodeAnalyzer
    • 1D candidate detection + grading: LPDF_BARCODE_001013, 019031
    • 2D fill-grid heuristics: LPDF_BARCODE_014018
  • Built-in validators (engine-side) run based on profile.conformance:
    • pdfx4, pdfx1a, pdfx3, pdfa1b/2b/3b
  • veraPDF runner (src/lintpdf/conformance/verapdf_runner.py) runs when configured and emits:
    • LPDF_PDFX_CONF (PDF/X)
    • LPDF_PDFA_CONF (PDF/A)
    • LPDF_UA_CONF (PDF/UA-1; only when profile opts into LPDF_UA_*)

AI analyzers are registered in src/lintpdf/ai/** and run only when the profile enables AI.

Common families:

  • Regulatory compliance: AI_EU1169_*, AI_PHARMA_*, AI_FDA_*, AI_GHS_*, AI_COSM_*, LPDF_TOBACCO_* (implemented as AI analyzer)
  • Accessibility (contrast): AI_WCAG_*
  • “Enabled pattern” != “actually runs”: some analyzers are conditionally added (e.g. packaging analyzer depends on profile id, not checks pattern).
  • veraPDF “silent skip”: when veraPDF is unreachable, findings are suppressed (good for resilience, but you need metadata to know it didn’t run).
  • Outlined-text / hidden-layer text: hairline and legibility checks now work on the codex path (0.1.0b23+). The content_stream string-presence check (LPDF_BOX_004) was tightened in 0.1.0b22 to require four independent structural signals before flagging a page as empty.