Document Intelligence API
Document image → structured fields, answers, and tables: one endpoint family.
Intelligent document processing powered by the Brainiall Document Intelligence engine. Send a document image and get back one of three things: structured fields tuned to the document type (receipt, invoice, ID, contract, form, or generic key-value), a grounded answer to a natural-language question about it, or every table reconstructed into headers and rows. $0.01/page ($0.012/page for table extraction): one simple per-page price instead of a per-feature pricing matrix.
How we compare
The hyperscaler IDP services are powerful but priced per feature and per document type: OCR is one rate, forms another, tables another, expense/ID parsers another again, and you stitch the calls together yourself. Brainiall folds recognition, doc-type-aware field extraction, document Q&A and table extraction into one endpoint family at a single per-page price, self-serve from the first call.
| Provider | Surface | Pricing model | Approx. price | Onboarding |
|---|---|---|---|---|
| Brainiall Document Intelligence | One family: /extract (6 doc types), /query, /tables | Per page, all features included | $0.01 / page ($0.012 / page for table extraction) | Self-serve, instant API key |
| AWS Textract | DetectDocumentText / AnalyzeDocument (Forms, Tables, Queries) / AnalyzeExpense / AnalyzeID | Per page, per feature | ~$0.0015 OCR · ~$0.05 forms · ~$0.015 tables/queries · ~$0.10 expense & ID | Self-serve (AWS account + IAM) |
| Azure AI Document Intelligence | Read (OCR) / prebuilt models (invoice, receipt, ID, …) / custom models | Per page, per model class | ~$0.0015 Read · ~$0.01 prebuilt · ~$0.05 custom | Self-serve (Azure resource) |
| Google Document AI | Document OCR / Form Parser / specialized & custom processors | Per page, per processor | ~$0.0015 OCR · ~$0.03 form parser · ~$0.065 specialized | Self-serve (GCP project + processor setup) |
Prices are list-price approximations for orientation, not quotes; hyperscaler IDP pricing is tiered and feature-specific. Always check each vendor's current pricing page.
Pricing
One per-page price covers OCR, field extraction and Q&A; table extraction is a small step up. The free tier is generous enough to wire up an end-to-end pipeline.
| Plan | Price | Included |
|---|---|---|
| Free | $0/mo | 50 calls/month · extract + query + tables · forever free |
| Starter | $29/mo | ~3,000 pages/month · all 6 doc types · batch-friendly |
| Pro | $99/mo | ~12,000 pages/month · priority queue · 99.5% SLA |
| Business | $399/mo | ~60,000 pages/month · dedicated capacity · email + Slack |
PAYG: $0.01 / page for /extract and /query, $0.012 / page for /tables (Brainiall Document Intelligence engine). One page = one document image. No per-feature surcharges, no minimum spend, no contract.
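To make the per-page arithmetic concrete, here is a minimal Python sketch that estimates a pay-as-you-go bill and the effective per-page rate of each paid plan. The prices and page allowances are the ones listed above; the function names are illustrative, and the sketch assumes that a plan's allowance counts every page equally, which the pricing text does not spell out.

```python
from decimal import Decimal

# PAYG list prices quoted above, in USD per page.
EXTRACT_OR_QUERY_PRICE = Decimal("0.01")
TABLES_PRICE = Decimal("0.012")

# Plan price (USD/month) and approximate monthly page allowance, as listed above.
PLANS = {
    "Starter": (Decimal("29"), 3_000),
    "Pro": (Decimal("99"), 12_000),
    "Business": (Decimal("399"), 60_000),
}


def payg_cost(extract_or_query_pages: int, tables_pages: int = 0) -> Decimal:
    """Return the pay-as-you-go cost in USD for the given page counts."""
    if extract_or_query_pages < 0 or tables_pages < 0:
        raise ValueError("page counts must be non-negative")
    return (extract_or_query_pages * EXTRACT_OR_QUERY_PRICE
            + tables_pages * TABLES_PRICE)


def effective_plan_rate(plan: str) -> Decimal:
    """Return the plan price divided by its approximate page allowance."""
    price, pages = PLANS[plan]
    return price / pages


if __name__ == "__main__":
    # 2,000 /extract or /query pages plus 500 /tables pages on PAYG.
    print("PAYG:", payg_cost(2_000, 500))
    for name in PLANS:
        print(name, "per page:", effective_plan_rate(name))
```

Decimal is used instead of float so that fractions of a cent do not pick up binary rounding noise when page counts get large.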
Three calls: extract, query, tables
# 1. Extract structured fields; doc_type picks the schema
POST https://api.brainiall.com/v1/document/extract
{"image": "<base64 png/jpeg>", "doc_type": "receipt"}
-> {"doc_type": "receipt",
    "fields": {"merchant_name": "Blue Bottle Cafe", "date": "2026-05-12",
               "items": [{"name": "Cappuccino", "quantity": 2, "unit_price": 4.00, "total": 8.00}, ...],
               "subtotal": 15.50, "tax": 1.24, "total": 16.74, "payment_method": null},
    "text": "<recognised plain text>",
    "extraction_engine": "Brainiall Document Intelligence engine"}
# doc_type is one of: receipt | invoice | id | contract | form | generic

# 2. Ask a natural-language question about the document
POST https://api.brainiall.com/v1/document/query
{"image": "<base64 png/jpeg>", "question": "What was the total amount paid?"}
-> {"answer": "16.74", "found": true, "supporting_text": "TOTAL 16.74"}

# 3. Pull every table out as headers + rows
POST https://api.brainiall.com/v1/document/tables
{"image": "<base64 png/jpeg>"}
-> {"table_count": 1,
    "tables": [{"title": "Line items", "headers": ["Item", "Qty", "Price"],
                "rows": [["Cappuccino", "2", "8.00"], ["Croissant", "1", "3.50"]],
                "row_count": 2, "column_count": 3}]}

Pass doc_type: "generic" when you don't know the document kind; you get a short description, a best-guess document type, all labelled key-value pairs, and detected dates, amounts and entities. For multi-page documents, split into page images and call once per page. If a page has no readable text, the API returns a 422 rather than guessing.
What it's for
- Accounts-payable & expense automation: drop in a scanned invoice or receipt and get the vendor, dates, line items, tax and total as JSON, ready to go straight into your ledger with no template configuration.
- Onboarding & KYC document capture: parse the name, document number, dates and MRZ off an ID document into structured fields your verification flow can check.
- Contract & agreement review: pull parties, effective dates, term, governing law and key obligations out of a contract page, or ask a direct question ("what's the notice period?") and get the answer plus the supporting line.
- Form & questionnaire intake: turn a filled-in form into a list of label/value pairs and checkbox states, useful for digitising paper intake at scale.
- Table-heavy reports: lift every table out of a financial statement, price list or lab report into clean headers and rows for downstream analysis.
- One bill, one key: this runs on the same Brainiall API key and usage-based billing as PDF → Markdown and the rest of the catalog, with no separate IDP vendor to procure.
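For the table-heavy use case, a /tables response can be reshaped for downstream analysis with a few lines of Python. The sketch below uses the sample response shown earlier on this page; the helper names are illustrative, and it assumes every row has exactly one cell per header, which may not hold for tables with merged cells.

```python
import csv
import io

# Sample response copied from the /tables example on this page.
SAMPLE_RESPONSE = {
    "table_count": 1,
    "tables": [{
        "title": "Line items",
        "headers": ["Item", "Qty", "Price"],
        "rows": [["Cappuccino", "2", "8.00"], ["Croissant", "1", "3.50"]],
        "row_count": 2,
        "column_count": 3,
    }],
}


def table_to_records(table: dict) -> list[dict]:
    """Turn one entry of the "tables" array into header -> cell dicts."""
    headers = table["headers"]
    records = []
    for row in table["rows"]:
        # ASSUMPTION: one cell per header; fail loudly instead of misaligning.
        if len(row) != len(headers):
            raise ValueError(
                f"row has {len(row)} cells, expected {len(headers)}")
        records.append(dict(zip(headers, row)))
    return records


def table_to_csv(table: dict) -> str:
    """Serialise one table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(table["headers"])
    writer.writerows(table["rows"])
    return buffer.getvalue()


if __name__ == "__main__":
    first_table = SAMPLE_RESPONSE["tables"][0]
    print(table_to_records(first_table))
    print(table_to_csv(first_table))
```

Cell values stay as strings, exactly as they appear in the sample response; converting "8.00" to a number is left to the caller, who knows the currency and locale.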
Latency profile
Document Intelligence is a thin orchestration layer; it does not host a model itself. End-to-end timing is dominated by the underlying document reader and, where applicable, the OCR step. Plan around these envelopes.
- Native-text input (the fast path): a document image with sharp printed text or a PDF page with an embedded text layer is recognised in ~1.3 s/page. /extract, /query and /tables all sit comfortably inside a synchronous HTTP call here.
- Scanned or image-only input (the OCR path): for camera scans, photos of paper and faxed documents, the document model has to run full OCR on CPU. Budget ~20–30 s/page. A two-page scanned contract through /extract or /v1/document/layout can run ~25 s end-to-end.
- Translation flow: /v1/document/translate (available via the Document AI Expansion bundle) routes text through the translation model after extraction; on CPU this is around 12–15 s end-to-end, almost all of it the translation step.
- Recommended pattern for long or scanned input: treat the request as an async job, the same way dubbing and speech-to-speech already do: submit, render a progress indicator, fetch the result when ready.
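The submit, progress, fetch pattern recommended above can be approximated on the client side. The sketch below is one possible shape, assuming nothing about a server-side job endpoint (none is described on this page): it runs one call per page in a thread pool, reports progress as pages finish, and returns results in page order. `run_document_job`, `call_page`, `on_progress` and the default of four workers are hypothetical names and values; keep concurrency within whatever rate limits the API reference documents.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_document_job(pages, call_page, on_progress=None, max_workers=4):
    """Process pages concurrently and return results in page order.

    pages       -- sequence of per-page inputs (for example image bytes)
    call_page   -- function that sends one page to the API and returns
                   its parsed result; supplied by the caller
    on_progress -- optional callback(done_count, total_count), invoked
                   each time a page finishes
    """
    results = [None] * len(pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its page index so that completion
        # order does not affect the order of the returned results.
        futures = {pool.submit(call_page, page): index
                   for index, page in enumerate(pages)}
        for done, future in enumerate(as_completed(futures), start=1):
            # result() re-raises any exception raised inside call_page.
            results[futures[future]] = future.result()
            if on_progress is not None:
                on_progress(done, len(pages))
    return results


if __name__ == "__main__":
    # Stand-in for a real API call, so the sketch runs without a network.
    def fake_call(page):
        return {"page": page, "status": "ok"}

    output = run_document_job(
        ["page-1", "page-2"],
        fake_call,
        on_progress=lambda done, total: print(f"{done}/{total} pages done"),
    )
    print(output)
```

In a web backend, the same function can run inside a background worker, with `on_progress` writing to whatever store the progress indicator polls.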
Press kit & resources
What reviewers, integrators and procurement teams typically ask for.
One-page datasheet
Pricing, doc types and a copy-pasteable curl snippet on one page, built for buyer review.
API reference
OpenAPI spec, request/response shapes, the six doc-type schemas, error codes and rate limits.
Compare the catalog
How Brainiall's specialty APIs line up against AWS, Azure and Google, use case by use case.