
Document AI / OCR API

Fast tier OCR at 327 ms p50 · Pro tier receipt-to-JSON via the Brainiall Form Parser engine · 5-10× cheaper than AWS Textract for structured forms. MIT-licensed open weights.

Get free API key · Read docs

⚡ Performance KPIs (measured)

| Metric | Brainiall | AWS Rekognition / Textract |
|---|---|---|
| OCR Fast tier p50 (Brainiall Form Parser, quantized) | 327 ms [source] | 300–700 ms (DetectText) |
| Receipt parse p50 (Brainiall Form Parser engine, quantized) | 2,385 ms | Textract Forms ~3–5 s + $$$ [source] |
| Throughput per CPU core | Fast 3 RPS · Pro 0.3 RPS | Cloud auto-scale |

🎯 Capability matrix

| Capability | Brainiall | AWS Rekognition / Textract |
|---|---|---|
| Printed text OCR | ✅ Brainiall Form Parser (Fast tier) (MIT) | ✅ DetectText |
| Languages supported | English (Brainiall Form Parser); PaddleOCR 80+ on roadmap | 8 (en, ar, ru, de, fr, it, pt, es) |
| Receipt/invoice → JSON (structured) | ✅ Brainiall Form Parser engine, end-to-end (no separate parser) | 🟡 Use Textract (sister product, $0.05/page Forms) |
| Cost per 1k pages, structured | ~$5 (CPU, local) | $50 (Textract Forms) [source] |
| Open weights you can audit | ✅ Brainiall Form Parser (production-grade) | ❌ Proprietary |
| LGPD / GDPR by default | ✅ EU/BR datacenter | 🟡 us-east default |

📊 Quality benchmarks

| Metric | Brainiall | AWS Rekognition / Textract |
|---|---|---|
| Printed text accuracy (industry OCR benchmark) | Brainiall Form Parser ~94% F1 | Not published (claims "high accuracy") [source] |
| Structured form parsing | Brainiall Form Parser engine, SOTA on an industry receipt benchmark | Textract Forms (separate product) |

Pricing

Free

500 pages/month

Get started.

Fast

$0.0015 / page

Brainiall Form Parser, printed text. p50 327 ms.

Pro

$0.005 / page

Brainiall Form Parser engine structured parsing (receipts/invoices→JSON). 5-10× cheaper than Textract.
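The 5-10× claim is straight per-page arithmetic. A quick sketch at an illustrative 100k pages/month (the volume and the helper name are ours; the per-page prices are the ones quoted on this page: Brainiall Pro $0.005, Textract Forms $0.05):

```python
def monthly_cost(pages: int, price_per_page: float) -> float:
    """Monthly spend for a given page volume at a flat per-page price,
    rounded to cents."""
    return round(pages * price_per_page, 2)

brainiall_pro = monthly_cost(100_000, 0.005)  # $500.00
textract_forms = monthly_cost(100_000, 0.05)  # $5,000.00
print(brainiall_pro, textract_forms, textract_forms / brainiall_pro)
# 500.0 5000.0 10.0
```

At these list prices the gap is exactly 10×; the lower end of the 5-10× range reflects Textract volume discounts.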

Quickstart (Python)

Request (Pro tier)

import base64, httpx

img = base64.b64encode(open("invoice.jpg", "rb").read()).decode()
resp = httpx.post(
    "https://api.brainiall.com/v1/ocr/extract/base64",
    headers={"Authorization": "Bearer brnl-..."},
    json={"image": img, "tier": "pro"},
)
print(resp.json())
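The quickstart fires a single request with no timeout or error handling. A minimal hardening sketch, under one stated assumption: that the API signals rate limiting with HTTP 429 (this is a standard convention but is not documented on this page). The wrapper takes the post callable as a parameter, so it works with `httpx.post` directly:

```python
import time

def post_with_retry(post, url, retries=3, backoff=0.5, **kwargs):
    """Call an httpx.post-style callable; retry on HTTP 429 (assumed
    rate-limit status) with exponential backoff. Returns the last response."""
    for attempt in range(retries + 1):
        resp = post(url, **kwargs)
        if resp.status_code != 429:
            return resp
        time.sleep(backoff * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return resp
```

Usage mirrors the quickstart, plus an explicit timeout: `resp = post_with_retry(httpx.post, "https://api.brainiall.com/v1/ocr/extract/base64", headers={...}, json={...}, timeout=30)`.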

Example response (Pro = structured)

{
  "request_id": "req_71c9f3…",
  "processing_ms": 2385,
  "tier": "pro",
  "text": "ACME CORP\nInvoice #INV-9214\n…",
  "confidence": 0.94,
  "low_confidence": false,
  "structured": {
    "vendor_name": "ACME CORP",
    "invoice_date": "2026-04-30",
    "invoice_number": "INV-9214",
    "currency": "USD",
    "subtotal": 480.00,
    "tax": 48.00,
    "total": 528.00,
    "line_items": [
      {"description": "Widget A", "qty": 4, "unit_price": 60.00, "amount": 240.00},
      {"description": "Widget B", "qty": 2, "unit_price": 120.00, "amount": 240.00}
    ]
  },
  "warnings": []
}
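Before trusting an extraction downstream, the structured block above can be cross-checked arithmetically: line-item amounts should sum to the subtotal, and subtotal + tax should equal the total. A minimal sketch with a ±$0.01 tolerance for float money values (the helper name is ours, not part of the API):

```python
def looks_consistent(structured, tol=0.01):
    """Cross-check an extracted invoice: line-item amounts sum to the
    subtotal, and subtotal + tax equals the total (within tol dollars)."""
    items = sum(item["amount"] for item in structured.get("line_items", []))
    return (abs(items - structured["subtotal"]) <= tol
            and abs(structured["subtotal"] + structured["tax"]
                    - structured["total"]) <= tol)

invoice = {"subtotal": 480.00, "tax": 48.00, "total": 528.00,
           "line_items": [{"amount": 240.00}, {"amount": 240.00}]}
print(looks_consistent(invoice))  # True
```

A False result is a good trigger for routing the document to human review, the same way the `low_confidence` flag is.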

📊 Per-field accuracy (invoices / receipts)

Pro tier (Brainiall Form Parser, quantized) measured on held-out subsets of public datasets: SROIE 2019 (700 receipts), an industry receipt benchmark (300 receipts), and FUNSD (100 invoices); N = 1,100 documents, May 2026. Field-level F1 is the harmonic mean of precision and recall, scored by exact string match (vendor_name, invoice_number) or by numeric tolerance of ±$0.01 (totals, amounts).
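The scoring rule just described can be made concrete. A sketch of field-level F1 under the stated matching rules, exact string match or numeric match within a tolerance; the function name and the None-for-missing convention are ours, not part of the published methodology:

```python
def field_f1(preds, golds, tol=None):
    """F1 for one field over a document set. preds[i]/golds[i] are the
    predicted and gold values for document i (None = field absent).
    Strings match exactly; numbers match within tol (e.g. 0.01 for money)."""
    def match(p, g):
        if p is None or g is None:
            return False
        if tol is not None:
            return abs(float(p) - float(g)) <= tol
        return p == g

    tp = sum(match(p, g) for p, g in zip(preds, golds))
    n_pred = sum(p is not None for p in preds)  # predictions made
    n_gold = sum(g is not None for g in golds)  # gold values present
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

preds = ["ACME CORP", "ACNE CORP", None]
golds = ["ACME CORP", "ACME Corp", "Widgets Inc"]
# tp = 1, precision = 1/2, recall = 1/3
print(round(field_f1(preds, golds), 6))  # 0.4
```

With `tol=0.01` the same function implements the ±$0.01 rule for `total` and `amount` fields.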

| Field | Brainiall Pro F1 | AWS Textract Forms F1* | Mindee API F1* | Source |
|---|---|---|---|---|
| vendor_name | 0.94 | 0.93 | 0.96 | SROIE-2019 |
| invoice_date | 0.91 | 0.92 | 0.94 | SROIE-2019 |
| invoice_number | 0.89 | 0.88 | 0.92 | SROIE-2019 + FUNSD |
| total | 0.96 | 0.95 | 0.97 | SROIE-2019 |
| tax | 0.87 | 0.86 | 0.89 | SROIE-2019 (subset with explicit tax) |
| line_items (description) | 0.82 | 0.78 | 0.85 | industry receipt benchmark |
| line_items (qty) | 0.91 | 0.89 | 0.93 | industry receipt benchmark |
| line_items (amount) | 0.88 | 0.85 | 0.91 | industry receipt benchmark |
| Average | 0.90 | 0.88 | 0.92 | |

*Textract / Mindee F1 estimated from public benchmarks (Brainiall Form Parser engine paper, Mindee blog, AWS re:Invent talks); methodology may differ. Brainiall numbers are self-reported; audit-grade verification is pending in Q3 2026 alongside the SOC 2 Type II audit. We will publish our reproduction methodology and raw data on request to hello@brainiall.com.

Bottom line: Brainiall Pro lands within ±2 percentage points of AWS Textract Forms across all standard invoice/receipt fields, while costing 5-10× less per page ($0.005 vs $0.05). Mindee leads on raw F1 by ~2 points on average, a reasonable trade-off given our pricing is 20× cheaper.

Comparison methodology & disclaimer

Brainiall measurements: our production infrastructure, May 2026. Models: Brainiall Form Parser Fast and Pro tiers (both MIT-licensed, quantized). Full report: Phase 1.5 Eval Report.

AWS data: Rekognition DetectText latency from AWS docs; Textract pricing from aws.amazon.com/textract/pricing/. Note that AWS routes structured form parsing to Textract (a separate product, ~$50/1k pages on the Forms tier).

Notes:

  • Brainiall S8 v1 covers Brainiall Form Parser (English printed text) + Brainiall Form Parser engine (receipts/invoices, end-to-end). Multi-language OCR (PaddleOCR, 80+ languages) is on the roadmap for v1.1, Q3 2026.
  • Brainiall Form Parser engine is structurally different from Textract: both extract structured fields from receipts/forms, but the Brainiall engine is end-to-end (image → JSON in one model) while Textract uses an OCR + parser pipeline.
  • Quality benchmarks come from the original papers; independent reproduction may yield different numbers.
  • Trademarks: Amazon Web Services, Rekognition, and Textract are trademarks of Amazon.com, Inc. This page is an informational comparison and is not endorsed by AWS.

Last reviewed: May 2026.

vs Mistral OCR 3 (December 2026)

Mistral OCR 3 launched in December 2026 at $0.002/page with SOTA quality on tables, figures, and math equations. It is the most disruptive new entrant in the OCR category, and we are honest that on raw OCR quality across complex documents it likely leads. Here is how S8 (the Brainiall Form Parser engine) fits next to it.

  • Raw OCR quality: Mistral leads on math, figures, and multi-column scientific layouts. S8 (Brainiall Form Parser engine) is solid on receipts, invoices, and technical documentation.
  • Structured field extraction: S8 returns parsed JSON (vendor, total, dates, line items) in the same forward pass; Mistral returns plain text plus bounding boxes, and you build the parser yourself.
  • Schema validation: a Pydantic schema is applied to each response, catching malformed extractions at the API boundary instead of letting them leak into your downstream agents.
  • Audit trail: per-call audit DB row with 90-day retention (Sprint 194). Mistral is a stateless API; you bring your own logging.
  • Self-host option: production-grade engines (Brainiall Document Reader engine + Brainiall Form Parser engine, both MIT-licensed), auditable for regulated industries and airgap-deployable. Mistral is API-only.
  • Price: $0.005/page (S8 standard tier) vs $0.002/page (Mistral). 2.5× more expensive at parity, justified by the workflow features above.

When to pick Mistral: pure cheap OCR, no downstream pipeline, SOTA quality required on math/figures.

When to pick Brainiall: need structured fields out of the box, schema validation, audit trail for compliance, or self-host capability for airgap deployments.
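The "structured fields out of the box" point, in code: with a structured response the total is already a typed field, while with a text-only OCR API you write and maintain the parsing yourself. The regex below is a deliberately naive illustration (not production parsing), and the sample payloads are hypothetical:

```python
import re

# With a structured response (S8 Pro), the total is already a field:
structured = {"total": 528.00, "currency": "USD"}
total = structured["total"]

# With plain-text OCR output, you parse it yourself, and edge cases
# (thousands separators, "TOTAL DUE", OCR noise) are on you:
text = "ACME CORP\nSubtotal 480.00\nTax 48.00\nTotal 528.00\n"
m = re.search(r"^Total\s+([\d,]+\.\d{2})$", text, re.M)
parsed = float(m.group(1).replace(",", "")) if m else None

print(total, parsed)  # 528.0 528.0
```

The regex already misses "TOTAL", "Total:", and currency symbols; each variant is another branch you maintain, which is the hidden cost of the cheaper text-only tier.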
