Usage Guide¶
This guide covers the day-to-day workflow for using VERA to process scanned documents.
Uploading Documents¶
VERA accepts single images and multi-page PDFs.
Supported formats:
- Images: JPEG, PNG
- Documents: PDF (multi-page supported)
Size limit: Configurable via MAX_UPLOAD_MB (default 25 MB).
To upload, click the upload button in the web UI or use the API directly:
curl -X POST http://localhost:4000/documents/upload \
-F "file=@invoice.pdf" \
-b "vera_session=YOUR_SESSION_COOKIE"
After upload, VERA immediately queues the document for OCR processing. Multi-page PDFs are automatically split into individual pages, each processed independently.
Processing time
OCR processing runs in a background Celery worker. Processing time depends on document complexity and page count. You can monitor progress via the status stream endpoint or the web UI.
Reviewing OCR Results¶
Once processing completes (status ocr_done), the document is ready for review.
Token Display¶
Each page shows its extracted tokens with visual indicators:
| Confidence Label | Meaning | Action Required |
|---|---|---|
trusted |
High confidence (OCR is very sure) | Review optional |
medium |
Moderate confidence | Review recommended |
low |
Low confidence (likely misread) | Review required |
Tokens with forced_review: true must be explicitly reviewed before the page can be validated.
Correcting Tokens¶
To correct a misread token:
- Click on the token in the review UI.
- Edit the text to the correct value.
- The correction is tracked as a diff (original text vs. corrected text).
All corrections are saved to the corrections table and recorded in the audit log.
Bounding Boxes¶
Each token includes a bounding box (bbox) with coordinates [x_min, y_min, x_max, y_max] relative to the page image. The UI overlays these on the scanned image so you can see exactly where each token was detected.
Structured Fields¶
VERA automatically extracts structured key-value pairs from document text based on document type (invoice, receipt, statement). Common fields include:
- Vendor / company name
- Date
- Total / amount due
- Subtotal
You can manually edit structured fields during review or update them via the API:
curl -X POST http://localhost:4000/documents/{id}/fields \
-H "Content-Type: application/json" \
-d '{"structured_fields": {"vendor": "Acme Corp", "total": "142.50"}}' \
-b "vera_session=YOUR_SESSION_COOKIE"
Validating Documents¶
Validation is the core gate in VERA's workflow. A document (or page) cannot be summarized or exported until it is validated.
Single-Page Documents¶
Submit corrections and mark the review as complete:
curl -X POST http://localhost:4000/documents/{id}/validate \
-H "Content-Type: application/json" \
-d '{
"corrections": [
{"token_id": "abc123", "corrected_text": "Invoice"}
],
"reviewed_token_ids": ["abc123", "def456"],
"review_complete": true,
"structured_fields": {"vendor": "Acme Corp"}
}' \
-b "vera_session=YOUR_SESSION_COOKIE"
Multi-Page Documents¶
For multi-page PDFs, validate each page individually:
curl -X POST http://localhost:4000/documents/{doc_id}/pages/{page_id}/validate \
-H "Content-Type: application/json" \
-d '{
"corrections": [],
"reviewed_token_ids": ["tok1", "tok2"],
"review_complete": true,
"page_version": 1
}' \
-b "vera_session=YOUR_SESSION_COOKIE"
Page versioning
Multi-page validation requires a page_version field to prevent conflicts when multiple reviewers work on the same document. If the version does not match, the API returns 409 Conflict.
The parent document transitions to validated only when all pages are validated.
Generating Summaries¶
After validation, request an AI-generated summary via Ollama:
# Document-level summary
curl http://localhost:4000/documents/{id}/summary \
-b "vera_session=YOUR_SESSION_COOKIE"
# Page-level summary
curl http://localhost:4000/documents/{doc_id}/pages/{page_id}/summary \
-b "vera_session=YOUR_SESSION_COOKIE"
You can override the default model with a query parameter:
curl "http://localhost:4000/documents/{id}/summary?model=llama3.2:3b" \
-b "vera_session=YOUR_SESSION_COOKIE"
The response includes:
bullet_summary-- a list of bullet-point stringsstructured_fields-- AI-extracted key-value pairsvalidation_status-- updated document status
Ollama required
Summaries require a running Ollama instance with the configured model pulled. If Ollama is unavailable, the endpoint returns 503.
Exporting¶
Export validated documents in three formats:
| Format | Content-Type | Description |
|---|---|---|
json |
application/json |
Full payload with document ID, validated text, and structured fields |
csv |
text/csv |
Key-value rows: document_id, validated_text, plus each structured field |
txt |
text/plain |
Raw validated text only |
# JSON (default)
curl "http://localhost:4000/documents/{id}/export" \
-b "vera_session=YOUR_SESSION_COOKIE"
# CSV
curl "http://localhost:4000/documents/{id}/export?format=csv" \
-b "vera_session=YOUR_SESSION_COOKIE"
# Plain text
curl "http://localhost:4000/documents/{id}/export?format=txt" \
-b "vera_session=YOUR_SESSION_COOKIE"
Page-level export is also available for multi-page documents:
curl "http://localhost:4000/documents/{doc_id}/pages/{page_id}/export?format=json" \
-b "vera_session=YOUR_SESSION_COOKIE"
Export marks the document
Exporting transitions the document status to exported. This is recorded in the audit log.
Multi-Page Workflow Summary¶
For a multi-page PDF, the typical workflow is:
- Upload the PDF. VERA splits it into individual pages.
- Monitor processing via the status stream or polling endpoint.
- Review each page independently -- correct tokens, fill structured fields.
- Validate each page (with
page_versionfor conflict detection). - Once all pages are validated, the parent document status becomes
validated. - Summarize at the document or page level.
- Export the full document or individual pages.
Canceling Processing¶
If a document is stuck in uploaded or processing status, you can cancel it:
This revokes the Celery task and sets the document status to canceled.
Audit Trail¶
Every document has a complete audit log accessible via:
Events include: ocr_completed, ocr_canceled, fields_updated, validated, exported, and more. Each entry records the event type, actor, detail payload, and timestamp.