Anonymous Demo — Retention Posture
Route: POST /v1/demo/extract
Source: apps/api/src/routes/demo-extract.ts
CI gate: scripts/check-demo-no-persistence.mjs
Promise: /promises#anonymous-demo
Date: 2026-05-21 (lands with P1 A2)
The anonymous demo at /demo reads the visitor's invoice once and forgets the bytes. This document is the operational record of what that promise means, what enforces it, and what to do if it ever breaks.
§1 — What the route does NOT do (the privacy invariants)
The route MUST NOT:
- Write the bytes to R2. `r2Put()` is grep-banned in the route file by `check-demo-no-persistence.mjs`.
- Create a `documents` table row. `documentsStore` / `documents-store` are grep-banned.
- Append an entry to the hash-chained audit log. `audit.append()` / `auditLog()` are grep-banned.
- Touch the D1 database directly. `c.env.DB` is grep-banned in the route file.
- Log the file bytes. `JSON.stringify(bytes|buf|file|body|content)` and `console.log(bytes|buf|file|body|content|formData)` are grep-banned.
- Enqueue into the persistent ingest queue. `enqueueIngest()` is grep-banned. The route calls `runExtraction` directly, in-process.
The CI gate runs on every commit. A build that introduces any of these patterns into the route file fails before merge.
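A grep-style gate of this kind can be sketched in a few lines. The version below is only an illustration: the regular expressions are reconstructed from the banned names listed above, while `BANNED_PATTERNS`, `findViolations`, and the labels are hypothetical names. The authoritative list is whatever `scripts/check-demo-no-persistence.mjs` actually contains.

```typescript
// Illustrative sketch only. The real patterns live in
// scripts/check-demo-no-persistence.mjs; these are assumptions based on
// the banned names listed in this section.
interface BannedPattern {
  label: string;
  regex: RegExp;
}

interface Violation {
  line: number; // 1-based line number in the scanned source
  label: string;
  text: string;
}

const BANNED_PATTERNS: BannedPattern[] = [
  { label: "R2 write", regex: /\br2Put\s*\(/ },
  { label: "documents table", regex: /documentsStore|documents-store/ },
  { label: "audit log", regex: /\baudit\.append\s*\(|\bauditLog\s*\(/ },
  { label: "direct D1 access", regex: /\bc\.env\.DB\b/ },
  {
    label: "byte serialisation",
    regex: /\bJSON\.stringify\s*\(\s*(?:bytes|buf|file|body|content)\b/,
  },
  {
    label: "byte logging",
    regex: /\bconsole\.log\s*\(\s*(?:bytes|buf|file|body|content|formData)\b/,
  },
  { label: "persistent queue", regex: /\benqueueIngest\s*\(/ },
];

// Scan the route source line by line and report every banned pattern found.
function findViolations(source: string): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { label, regex } of BANNED_PATTERNS) {
      if (regex.test(text)) {
        violations.push({ line: index + 1, label, text: text.trim() });
      }
    }
  });
  return violations;
}
```

A CI wrapper would read the route file, call `findViolations`, print each hit, and exit non-zero when the returned array is non-empty. A plain-text scan like this also flags matches inside comments, which is arguably the safer failure mode for a privacy gate.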
§2 — What the route DOES retain (intentional, bounded)
The route writes three values to Cloudflare Workers KV (AUTH_KV binding). All three have explicit expirations, and none contains the raw file bytes.
| KV key | Value | TTL | Purpose |
|---|---|---|---|
| `demo:runs:<ip>:<YYYY-MM-DD>` | integer count | 24h | per-IP daily rate-limit (5/day default) |
| `demo:budget:<YYYY-MM-DD>` | integer cents spent | 48h | org-wide $50/day budget breaker |
| `demo:replay:<ip>:<sha256-of-bytes>` | parsed invoice JSON | 1h | replay cache so a re-upload of the same bytes from the same IP doesn't burn another extraction |
The SHA-256 hash in the replay key is a deterministic 256-bit digest of the bytes; it cannot be reversed to recover the original content. The cached JSON IS the parsed invoice envelope the visitor saw: vendor name, total, line items. That cached value is identical to what we returned to the visitor's browser; it represents NO additional data we collected beyond what the visitor saw.
The bytes themselves never reach KV. KV holds only the two counters and the parsed envelope.
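The key shapes and TTLs in the table can be expressed as a small set of pure helpers. This is a sketch under stated assumptions, not the route's code: the helper names (`runsEntry`, `budgetEntry`, `replayEntry`, `utcDay`) are hypothetical, the date in the key is assumed to be the UTC day, and the digest is assumed to be lowercase hex.

```typescript
// Sketch of the three KV entries described in the table above.
// Assumptions: the day in the key is the UTC date, and the SHA-256
// digest is encoded as 64 lowercase hex characters.
type DemoKvEntry = { key: string; ttlSeconds: number };

const HOUR_SECONDS = 60 * 60;

// Format a Date as YYYY-MM-DD in UTC.
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Per-IP daily run counter, 24h TTL.
function runsEntry(ip: string, now: Date): DemoKvEntry {
  return { key: `demo:runs:${ip}:${utcDay(now)}`, ttlSeconds: 24 * HOUR_SECONDS };
}

// Org-wide daily spend counter in cents, 48h TTL.
function budgetEntry(now: Date): DemoKvEntry {
  return { key: `demo:budget:${utcDay(now)}`, ttlSeconds: 48 * HOUR_SECONDS };
}

// Replay cache entry, keyed by IP and the digest of the uploaded bytes, 1h TTL.
// The key carries only the digest, never the bytes themselves.
function replayEntry(ip: string, sha256Hex: string): DemoKvEntry {
  if (!/^[0-9a-f]{64}$/.test(sha256Hex)) {
    throw new Error("expected a 64-character lowercase hex SHA-256 digest");
  }
  return { key: `demo:replay:${ip}:${sha256Hex}`, ttlSeconds: 1 * HOUR_SECONDS };
}
```

On Workers KV the `ttlSeconds` value would typically be passed as the `expirationTtl` option of `put()`, so each entry expires on its own without a cleanup job.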
§3 — What the route logs
One of two events per request (apps/api/src/lib/logger.ts):
- `funnel.demo_upload_parsed` (success): `{ format, parse_ms, vendor_recognised: boolean, ip_run_count }`
- `funnel.demo_upload_failed` (failure): `{ format, reason, parse_ms }`
Neither event contains:
- The file bytes
- The vendor name (only `vendor_recognised: true|false`)
- The visitor's IP
- The visitor's User-Agent
- The visitor's email (we don't have one)
- The SHA-256 of the bytes
scripts/check-funnel-no-pii.mjs (lands in P1 A14) greps these log call-sites for forbidden field names and fails the build if any appear.
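The two event payloads and the forbidden-field rule can be sketched as types plus a runtime key check. Note that this is a runtime analogue for illustration, not the grep over call-sites that `check-funnel-no-pii.mjs` performs. The field types for `format` and `reason`, the `FORBIDDEN_FIELDS` names, and the function `forbiddenFieldsIn` are all assumptions.

```typescript
// Payload shapes taken from the event list above. The string/number types
// for format, reason, parse_ms and ip_run_count are assumptions.
type DemoUploadParsedPayload = {
  format: string;
  parse_ms: number;
  vendor_recognised: boolean;
  ip_run_count: number;
};

type DemoUploadFailedPayload = {
  format: string;
  reason: string;
  parse_ms: number;
};

// Hypothetical forbidden field names, one per item in the exclusion list.
// The real list belongs to scripts/check-funnel-no-pii.mjs.
const FORBIDDEN_FIELDS: string[] = [
  "bytes",
  "vendor_name",
  "ip",
  "user_agent",
  "email",
  "sha256",
];

// Return the forbidden field names present on a payload (empty when clean).
function forbiddenFieldsIn(payload: Record<string, unknown>): string[] {
  return Object.keys(payload).filter(
    (field) => FORBIDDEN_FIELDS.indexOf(field) !== -1,
  );
}
```

The exact-match check lets `ip_run_count` through while rejecting a bare `ip` field, which mirrors the distinction the success event relies on: a count is logged, the address is not.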
§4 — Abuse-safety guarantees
| Threat | Mitigation |
|---|---|
| Budget-burn (cost attack) | $50/day org-wide breaker; route returns 503 + fallback: "static" past cap |
| Per-IP flood | 5/day per IP; returns 429 + retry_after_sec: 86400 |
| Bot abuse | Turnstile required from run-2-of-day (when TURNSTILE_SECRET_KEY provisioned). Until provisioned, route degrades to 1/day-per-IP hard cap. |
| Oversized file | 5MB size cap, checked before reading the body via the Content-Length header and enforced again after the read |
| Decompression bomb | Format restricted via detectFormat: only PDF / JPEG / PNG / WebP / HEIC accepted (no zips, no archives) |
| Replay (cost-amortisation attack) | SHA-256 + per-IP key, 1h TTL — same bytes from same IP within 1h return the cached result without re-extracting |
| Weaponised content (CSAM) | NCMEC hash check — STUBBED in P1 (returns false); founder enrolment + KV import required to activate. Documented in p1-done.md. |
| Emergency rollback | DEMO_ANONYMOUS_EXTRACT=off env-flag drains the route to 503 + fallback: "static" without code revert |
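The checks in this table can be read as one ordered decision. The sketch below is an illustration under assumptions, not the route's logic: the ordering of checks, the 403 status for a missing Turnstile token, the 413 status for an oversized body, the binary reading of 5MB, and every name in it (`DemoGateInput`, `decideDemoGate`, the constants) are hypothetical. Only the 503, 429, `fallback`, and `retry_after_sec` values come from the table.

```typescript
// Inputs the gate needs; all field names here are hypothetical.
interface DemoGateInput {
  flagOff: boolean; // true when DEMO_ANONYMOUS_EXTRACT=off
  budgetSpentCents: number; // org-wide spend so far today
  ipRunsToday: number; // runs this IP has already completed today
  turnstileProvisioned: boolean; // TURNSTILE_SECRET_KEY is set
  turnstilePassed: boolean; // the request carried a valid Turnstile token
  contentLength: number; // declared body size in bytes
}

type DemoGateResult =
  | { status: 200 }
  | { status: 503; fallback: "static" }
  | { status: 429; retry_after_sec: number }
  | { status: 403 } // assumed status for a missing Turnstile token
  | { status: 413 }; // assumed status for an oversized body

const DAILY_BUDGET_CENTS = 50 * 100; // $50/day breaker
const MAX_BYTES = 5 * 1024 * 1024; // 5MB, assuming binary megabytes
const RUNS_PER_DAY = 5;
const RUNS_PER_DAY_WITHOUT_TURNSTILE = 1;

function decideDemoGate(input: DemoGateInput): DemoGateResult {
  // Emergency rollback and budget breaker both drain to the static fallback.
  if (input.flagOff || input.budgetSpentCents >= DAILY_BUDGET_CENTS) {
    return { status: 503, fallback: "static" };
  }
  // Per-IP cap: 5/day, degraded to 1/day until Turnstile is provisioned.
  const cap = input.turnstileProvisioned
    ? RUNS_PER_DAY
    : RUNS_PER_DAY_WITHOUT_TURNSTILE;
  if (input.ipRunsToday >= cap) {
    return { status: 429, retry_after_sec: 86400 };
  }
  // Turnstile is required from the second run of the day onward.
  if (input.turnstileProvisioned && input.ipRunsToday >= 1 && !input.turnstilePassed) {
    return { status: 403 };
  }
  // Pre-body size check; the post-read check would repeat this on actual bytes.
  if (input.contentLength > MAX_BYTES) {
    return { status: 413 };
  }
  return { status: 200 };
}
```

Putting the kill switch and the budget breaker first means a drained route never spends a KV read on per-IP state, which seems the cheapest ordering, though the route may order these differently.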
§5 — What to do if the route breaks the promise
If a future commit accidentally introduces persistence (the CI gate should prevent this; the recovery procedure here is for the case where the gate is somehow bypassed):
- Immediate. Flip `DEMO_ANONYMOUS_EXTRACT=off` in the Worker's secrets; the route returns 503 + static fallback within seconds.
- Within 1 hour. File a security incident at `security@muntin.digital`. Revert the offending commit.
- Within 24 hours. Publish a post-mortem on `/promises` naming the incident, the leak surface, the affected window, and the fix.
- Within 48 hours. Audit-log the incident in the public transparency report (`/legal/transparency`).
- Pre-relaunch. Confirm the CI gate is back to green; add a new grep pattern to `check-demo-no-persistence.mjs` covering whatever the regression was.
The promise is the contract. If we cannot keep it, we say so and we stop the route until we can.
§6 — How to verify yourself (visitor's perspective)
- Open `/demo` and drop a real PDF.
- Watch the parsed row appear.
- Open the browser's DevTools → Network → confirm the only request is `POST /v1/demo/extract` and its response is JSON (not a redirect, not a follow-up to R2).
- Wait 24 hours.
- Refresh `/demo`: the dropzone is back to clean. No history. No "your previous run" surface.
For the privacy-skeptical (a journalist, a regulator, a bookkeeper auditing the operator's stack):
- Read `apps/api/src/routes/demo-extract.ts` end-to-end (~350 LOC).
- Read `scripts/check-demo-no-persistence.mjs` (~140 LOC).
- Confirm the patterns the gate bans match the patterns a persistence regression would introduce.
The whole privacy mechanism for the anonymous demo lives in two files. You can verify it in under five minutes.
§7 — Open items (P1 debt → P2)
These are real, named, dated:
- NCMEC binding: stubbed in P1; requires founder NCMEC enrolment.
- ASN-aware rate-limit: caps per-/24 to prevent DigitalOcean / AWS / GCP egress floods. Deferred to P2.
- Docling-Fly cold-start pre-warm cron: `0 8 *` to warm a docling machine ahead of the morning's first chef-owner. Deferred to P2.
Each of these tightens the safety floor without changing the privacy promise. The promise stands as-is.