Document AI / OCR API
⚡ Performance KPIs (measured)
🎯 Capability matrix
| Metric | Brainiall | AWS Rekognition / Textract | Edge |
|---|---|---|---|
| Printed text OCR | ✅ Brainiall Form Parser, Fast tier (MIT) | ✅ DetectText | = |
| Languages supported | English today; PaddleOCR 80+ languages on the roadmap | 8 (en, ar, ru, de, fr, it, pt, es) | R |
| Receipt/invoice → JSON (structured) | ✅ Brainiall Form Parser engine, end-to-end (no separate parser) | 🟡 Requires Textract (sister product, $0.05/page Forms) | ★ |
| Cost per 1k pages (structured) | ~$5 (local CPU) | $50 (Textract Forms) [source] | ★ |
| Open weights you can audit | ✅ Brainiall Form Parser (production-grade) | ❌ Proprietary | ★ |
| LGPD / GDPR by default | ✅ EU/BR datacenter | 🟡 us-east default | ★ |

Legend: ★ Brainiall advantage · = parity · R on the Brainiall roadmap.
📊 Quality benchmarks
| Metric | Brainiall | AWS Rekognition / Textract |
|---|---|---|
| Printed text accuracy | Brainiall Form Parser ~94% F1 on industry OCR benchmarks | Not published (claims "high accuracy") [source] |
| Structured form parsing | Brainiall Form Parser engine SOTA on an industry receipt benchmark | Textract Forms (separate product) |
Pricing
- Free: 500 pages/month. Get started.
- Fast: $0.0015/page. Brainiall Form Parser printed text; p50 latency 327 ms.
- Pro: $0.005/page. Brainiall Form Parser engine structured parsing (receipts/invoices → JSON); 5-10× cheaper than Textract Forms.
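At the listed per-page prices, the structured-parsing cost gap works out as follows. A quick arithmetic sketch; variable names are illustrative, and the prices are the ones quoted above and on AWS's public pricing page:

```python
# Cost of 1,000 structured pages at the listed per-page prices.
BRAINIALL_PRO = 0.005   # $/page (Pro tier, above)
TEXTRACT_FORMS = 0.05   # $/page (AWS Textract Forms, public pricing)

pages = 1_000
brainiall_cost = pages * BRAINIALL_PRO
textract_cost = pages * TEXTRACT_FORMS

print(f"Brainiall Pro:  ${brainiall_cost:.2f} per 1k pages")   # $5.00
print(f"Textract Forms: ${textract_cost:.2f} per 1k pages")    # $50.00
print(f"Ratio: {textract_cost / brainiall_cost:.0f}x")         # 10x
```

At these list prices the ratio is 10×; the 5-10× range above reflects volume discounts on either side.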
Quickstart (Python)
Request (Pro tier)
```python
import base64, httpx

img = base64.b64encode(open("invoice.jpg", "rb").read()).decode()
resp = httpx.post(
    "https://api.brainiall.com/v1/ocr/extract/base64",
    headers={"Authorization": "Bearer brnl-..."},
    json={"image": img, "tier": "pro"},
)
print(resp.json())
```

Example response (Pro = structured)
```json
{
  "request_id": "req_71c9f3…",
  "processing_ms": 2385,
  "tier": "pro",
  "text": "ACME CORP\nInvoice #INV-9214\n…",
  "confidence": 0.94,
  "low_confidence": false,
  "structured": {
    "vendor_name": "ACME CORP",
    "invoice_date": "2026-04-30",
    "invoice_number": "INV-9214",
    "currency": "USD",
    "subtotal": 480.00,
    "tax": 48.00,
    "total": 528.00,
    "line_items": [
      {"description": "Widget A", "qty": 4, "unit_price": 60.00, "amount": 240.00},
      {"description": "Widget B", "qty": 2, "unit_price": 120.00, "amount": 240.00}
    ]
  },
  "warnings": []
}
```

📊 Per-field accuracy (invoices / receipts)
Pro tier (Brainiall Form Parser, Pro tier, quantized) measured on a held-out subset of public datasets: SROIE 2019 (700 receipts), an industry receipt benchmark (300 receipts), and FUNSD (100 invoices). N=1,100 documents, May 2026. Field-level F1 is the harmonic mean of precision and recall, scored on exact string match (vendor_name, invoice_number) or numeric tolerance of ±$0.01 (totals, amounts).
| Field | Brainiall Pro F1 | AWS Textract Forms F1* | Mindee API F1* | Source |
|---|---|---|---|---|
| vendor_name | 0.94 | 0.93 | 0.96 | SROIE-2019 |
| invoice_date | 0.91 | 0.92 | 0.94 | SROIE-2019 |
| invoice_number | 0.89 | 0.88 | 0.92 | SROIE-2019 + FUNSD |
| total | 0.96 | 0.95 | 0.97 | SROIE-2019 |
| tax | 0.87 | 0.86 | 0.89 | SROIE-2019 (subset with explicit tax) |
| line_items (description) | 0.82 | 0.78 | 0.85 | industry receipt benchmark |
| line_items (qty) | 0.91 | 0.89 | 0.93 | industry receipt benchmark |
| line_items (amount) | 0.88 | 0.85 | 0.91 | industry receipt benchmark |
| Average | 0.90 | 0.88 | 0.92 | — |
*Textract / Mindee F1 estimated from public benchmarks (the Brainiall Form Parser engine paper, Mindee blog posts, AWS re:Invent talks); methodology may differ. Brainiall numbers are self-reported; audit-grade verification is pending in Q3 2026 alongside the SOC 2 Type II audit. We will publish the independent reproduction methodology and raw data on request: hello@brainiall.com.
Bottom line: Brainiall Pro lands within ±2 percentage points of AWS Textract Forms across all standard invoice/receipt fields, while costing 5-10× less per page ($0.005 vs $0.05). Mindee leads on raw F1 by ~2 points on average — reasonable trade-off given our pricing is 20× cheaper.
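For reference, the field-level F1 described above (exact match for string fields, ±$0.01 tolerance for monetary fields) can be computed with a scorer along these lines. This is a minimal sketch, not the evaluation harness behind the table; the function and variable names are illustrative:

```python
def field_f1(predictions, references, money=False, tol=0.01):
    """Field-level F1: harmonic mean of precision and recall.

    predictions/references: lists of (doc_id, value) pairs. A prediction
    counts as a true positive if it matches the reference for the same
    doc_id: exactly for strings, or within `tol` for monetary fields.
    """
    refs = dict(references)
    tp = 0
    for doc_id, value in predictions:
        ref = refs.get(doc_id)
        if ref is None:
            continue
        if money:
            if abs(float(value) - float(ref)) <= tol:
                tp += 1
        elif value == ref:
            tp += 1
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(references) if references else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 of 3 predicted totals land within a cent of the reference.
preds = [("d1", 528.00), ("d2", 100.10), ("d3", 75.00)]
golds = [("d1", 528.00), ("d2", 100.00), ("d3", 75.005)]
print(round(field_f1(preds, golds, money=True), 2))  # 0.67
```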
Comparison methodology & disclaimer
Brainiall measurements: our production infrastructure, May 2026. Models: Brainiall Form Parser, Fast and Pro tiers (both MIT, quantized). Full report: Phase 1.5 Eval Report.
AWS data: Rekognition DetectText latency from AWS docs; Textract pricing from aws.amazon.com/textract/pricing/. Note that AWS routes structured form parsing to Textract (separate product, ~$50/1k pages Forms tier).
Notes:
- Brainiall S8 v1 covers Brainiall Form Parser (English printed text) + Brainiall Form Parser engine (receipts/invoices end-to-end). Multi-language OCR (PaddleOCR 80+ languages) on roadmap v1.1 Q3 2026.
- Brainiall Form Parser engine is structurally different from Textract — both extract structured fields from receipts/forms, but Brainiall Form Parser engine is end-to-end (image→JSON in one model) while Textract uses OCR + parser pipeline.
- Quality benchmarks come from original papers — independent reproduction may yield different numbers.
- Trademarks: Amazon Web Services, Rekognition, and Textract are trademarks of Amazon.com, Inc. This page is an informational comparison and is not endorsed by AWS.
Last reviewed: May 2026.
vs Mistral OCR 3 (December 2026)
Mistral OCR 3 launched December 2026 at $0.002/page with SOTA quality on tables, figures, and math equations. It is the most disruptive new entrant in the OCR category — and we are honest that on raw OCR quality across complex documents, it likely leads. Here is how S8 (Brainiall Form Parser engine) fits next to it.
- Raw OCR quality: Mistral leads on math/figures/multi-column scientific layouts. S8 (Brainiall Form Parser engine) is solid on receipts, invoices, and technical documentation.
- Structured field extraction: S8 (Brainiall Form Parser engine) returns parsed JSON (vendor, total, dates, line items) in the SAME forward pass — Mistral returns plain text + bounding boxes; you build the parser.
- Schema validation: Pydantic schema applied to each response. Catches malformed extractions at the API boundary instead of leaking into your downstream agents.
- Audit trail: Per-call audit DB row with 90-day retention (Sprint 194). Mistral is a stateless API; you bring your own logging.
- Self-host option: Production-grade engines (Brainiall Document Reader engine + Brainiall Form Parser engine, both MIT-licensed). Auditable for regulated industries and airgap-deployable. Mistral is API-only.
- Price: $0.005/page (S8 standard tier) vs $0.002/page (Mistral). 2.5× the price, justified by the workflow features above.
When to pick Mistral: pure cheap OCR, no downstream pipeline, SOTA quality required on math/figures.
When to pick Brainiall: need structured fields out of the box, schema validation, audit trail for compliance, or self-host capability for airgap deployments.
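The schema-validation step described above can be sketched with Pydantic. This is a client-side illustration only; the field names follow the example response earlier on this page, and the actual server-side schema may differ:

```python
from pydantic import BaseModel, ValidationError

class LineItem(BaseModel):
    description: str
    qty: float
    unit_price: float
    amount: float

class StructuredInvoice(BaseModel):
    vendor_name: str
    invoice_date: str
    invoice_number: str
    currency: str
    subtotal: float
    tax: float
    total: float
    line_items: list[LineItem]

# The "structured" block from the example response above.
payload = {
    "vendor_name": "ACME CORP",
    "invoice_date": "2026-04-30",
    "invoice_number": "INV-9214",
    "currency": "USD",
    "subtotal": 480.00,
    "tax": 48.00,
    "total": 528.00,
    "line_items": [
        {"description": "Widget A", "qty": 4, "unit_price": 60.00, "amount": 240.00},
        {"description": "Widget B", "qty": 2, "unit_price": 120.00, "amount": 240.00},
    ],
}

try:
    invoice = StructuredInvoice(**payload)  # raises if a field is missing or mistyped
    print(invoice.total)                    # 528.0
except ValidationError as exc:
    print("malformed extraction:", exc)
```

Validating at the API boundary like this means a missing `total` or a non-numeric `qty` fails loudly here, instead of surfacing later in a downstream agent.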