Usage walkthrough
End-to-end: fetch → generate → benchmark → report → validate. For installation, see install.md. For per-flag CLI detail, see cli.md. For common errors, see troubleshooting.md.
1. Fetch vendor assets
    uv run assay fetch    # add --force to re-download even if checksums match

Downloads the GWG 2022 spec docs, GOS 5.0 suites, and the Processing Steps Test Suite into vendor/. Each download is verified against vendor/checksums.json (SHA-256). Files whose checksums already match are skipped on subsequent runs unless --force is passed.
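The verification step can be pictured with a short sketch. It assumes that vendor/checksums.json is a flat JSON object mapping paths relative to vendor/ to hex SHA-256 digests; the real file layout may differ, and `verify_vendor` is a hypothetical helper, not part of assay.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_vendor(vendor_dir: Path) -> list[str]:
    """Return relative paths that are missing or whose digest does not match.

    ASSUMPTION: checksums.json looks like {"relative/path": "<hex sha256>", ...}.
    """
    expected = json.loads((vendor_dir / "checksums.json").read_text())
    bad = []
    for rel_path, want in expected.items():
        target = vendor_dir / rel_path
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(rel_path)
    return sorted(bad)


if __name__ == "__main__":
    mismatches = verify_vendor(Path("vendor"))
    print("all vendor assets verified" if not mismatches else f"mismatched: {mismatches}")
```

An empty result means every listed asset is present and matches its recorded digest.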
2. Generate the corpus
    uv run assay generate                                  # all 175 files
    uv run assay generate --only-rule R0014                # just R0014 negatives
    uv run assay generate --only-variant sheetcmyk-cmyk    # just one variant
    uv run assay generate --seed 42                        # alternate deterministic seed

Writes PDFs into corpus/positive/ and corpus/negative/, plus corpus/manifest.json (per-file SHA-256 and expected outcome). Variant kebab names live in src/assay_pdf/generator/variants.py.
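As a quick sanity check after generation, the manifest can be tallied by expected outcome. The sketch below is only illustrative: the manifest schema is not documented here, so the top-level "files" list and the "expected" key are hypothetical names that would need to be adapted to the real file.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(manifest_path: Path) -> Counter:
    """Count corpus files per expected outcome.

    ASSUMPTION: the manifest looks like
    {"files": [{"path": ..., "sha256": ..., "expected": ...}, ...]};
    these key names are hypothetical.
    """
    manifest = json.loads(manifest_path.read_text())
    return Counter(entry["expected"] for entry in manifest["files"])


if __name__ == "__main__":
    counts = summarize_manifest(Path("corpus/manifest.json"))
    for outcome, count in sorted(counts.items()):
        print(f"{outcome}: {count}")
```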
3. Benchmark an engine
    uv run assay benchmark --engine pdftoolbox                      # run all variants
    uv run assay benchmark --engine pitstop --profile webcmyk-cmyk
    uv run assay benchmark --engine lintpdf                         # stub; emits warnings until the API ships

Each run writes both the raw EngineResult JSON and a confusion-matrix *.score.json to results/. Each engine requires its binary on PATH (or pointed to via an environment variable; see reproducing.md). The command exits with status 2 and a clear message if the runner isn't installed.
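When benchmarking several engines from a script, the documented exit status 2 can be used to tell a missing runner apart from other outcomes. This is a minimal sketch: `describe_exit` and `benchmark` are hypothetical helpers, and treating 0 as success and every other non-zero status as a generic failure is an assumption based on common CLI convention, not on assay's documentation.

```python
import subprocess
import sys

ENGINE_MISSING = 2  # documented exit status when the runner isn't installed


def describe_exit(returncode: int) -> str:
    """Map an `assay benchmark` exit status to a short label."""
    if returncode == 0:
        return "ok"  # ASSUMPTION: 0 means the benchmark completed
    if returncode == ENGINE_MISSING:
        return "engine-not-installed"
    return "failed"  # ASSUMPTION: any other non-zero status is a generic failure


def benchmark(engine: str) -> str:
    """Run one benchmark and classify the result."""
    proc = subprocess.run(["uv", "run", "assay", "benchmark", "--engine", engine])
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Engines come from the command line, defaulting to two from this walkthrough.
    for engine in sys.argv[1:] or ["pdftoolbox", "pitstop"]:
        print(f"{engine}: {benchmark(engine)}")
```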
Aggregate output looks like:
    ✓ pdftoolbox score: TP=143 FP=2 FN=7 TN=3045 (12834ms aggregate runtime)

4. Render a report
    uv run assay report --format md > REPORT.md
    uv run assay report --format html --output REPORT.html

assay report aggregates every results/*.score.json it finds, so to compare engines, run assay benchmark once per engine before rendering.
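To illustrate the kind of aggregation involved, the sketch below globs the score files and derives precision and recall from the confusion-matrix counts. It is not the report implementation: the key names "tp", "fp", and "fn" are hypothetical, since the *.score.json schema is not described here.

```python
import json
from pathlib import Path


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    An empty denominator yields 0.0 instead of raising.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def summarize(results_dir: Path) -> dict[str, tuple[float, float]]:
    """Compute (precision, recall) for every *.score.json in results_dir.

    ASSUMPTION: each score file holds integer keys "tp", "fp", "fn"
    (hypothetical names).
    """
    summary = {}
    for score_file in sorted(results_dir.glob("*.score.json")):
        score = json.loads(score_file.read_text())
        summary[score_file.name] = precision_recall(score["tp"], score["fp"], score["fn"])
    return summary


if __name__ == "__main__":
    for name, (p, r) in summarize(Path("results")).items():
        print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

With the sample counts from the benchmark output (TP=143, FP=2, FN=7), precision is 143/145 ≈ 0.986 and recall is 143/150 ≈ 0.953.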
5. (Optional) Validate
    uv run assay validate                  # full verapdf PDF/X-4 walk
    uv run assay validate --schema-only    # skip verapdf; check schemas only

This runs in CI on every commit and exits with status 1 if any corpus PDF fails verapdf.
Convenience shortcuts (Justfile)
    just install                           # uv sync --all-extras
    just check-deps                        # verify ghostscript/qpdf/mutool/exiftool/imagemagick/verapdf
    just build                             # ingest → generate → validate
    just bench pdftoolbox sheetcmyk-cmyk   # uv run assay benchmark --engine ... --profile ...
    just report md                         # uv run assay report --format md

Run just --list for the full task list.