PDF-to-Markdown API
Brainiall Document Reader engine. $0.001/page. Built for RAG and LLM ingestion.
Drop a PDF → get clean Markdown with preserved structure: headings, tables, code blocks, math, footnotes. $0.001/page — 5× cheaper than Datalab, 50× cheaper than Adobe Extract.

How we compare
Brainiall Document Reader engine is research SOTA on academic and technical PDFs (~95% structure preservation on multi-column papers). The v1.0 positions below are calibrated against published competitor pricing and metrics; v1.1 will replace them with a 100-document head-to-head benchmark.
| Provider | Quality | Price/page | vs market avg | Position |
|---|---|---|---|---|
| Datalab (Brainiall Document Reader engine on-demand) | 9.4/10 | $0.0050 | 33% | — |
| LlamaIndex Cloud (LlamaParse) | 9.0/10 | $0.0030 | 20% | — |
| Adobe PDF Services Extract | 9.2/10 | $0.050 | 329% | — |
| Microsoft Document Intelligence | 8.8/10 | $0.010 | 66% | — |
| Reducto AI | 9.3/10 | $0.0080 | 53% | — |
| Brainiall FAST | 9.4/10 | $0.0010 (93% cheaper) | 7% | Parity |
Pricing rule: 90% off when inferior · 80% off at parity · 50% off when superior. Position determined by objective benchmark, refreshed quarterly.
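The arithmetic behind the table can be reproduced in a few lines. This is a minimal sketch, not the production pricing logic: it assumes Datalab (same 9.4/10 score) is the reference competitor for the "Parity" position and that "vs market avg" is measured against the mean of the five competitor prices. The function and constant names are illustrative.

```python
# Published per-page prices from the comparison table (USD).
COMPETITOR_PRICES = {
    "Datalab": 0.0050,
    "LlamaParse": 0.0030,
    "Adobe PDF Services Extract": 0.050,
    "Microsoft Document Intelligence": 0.010,
    "Reducto AI": 0.0080,
}

# Discount applied to the reference competitor's price, by quality position.
DISCOUNT_BY_POSITION = {"inferior": 0.90, "parity": 0.80, "superior": 0.50}


def discounted_price(reference_price: float, position: str) -> float:
    """Apply the pricing rule to a reference competitor price."""
    return round(reference_price * (1 - DISCOUNT_BY_POSITION[position]), 4)


def percent_of_market_average(price: float) -> int:
    """Express a price as a whole-number percentage of the competitor average."""
    average = sum(COMPETITOR_PRICES.values()) / len(COMPETITOR_PRICES)
    return round(100 * price / average)


# Assumption: Datalab is the reference competitor for the "parity" position.
brainiall_price = discounted_price(COMPETITOR_PRICES["Datalab"], "parity")
print(brainiall_price)                             # 0.001
print(percent_of_market_average(brainiall_price))  # 7
```

Under these assumptions the sketch reproduces the table: 80% off $0.0050 gives $0.0010, which is about 7% of the $0.0152 competitor average.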
vs Mistral OCR 3 (December 2025)
Mistral OCR 3 launched in December 2025 at $0.002/page with SOTA quality on tables, figures, and math equations. It is the most disruptive new entrant in the category, and on raw OCR quality across complex documents it likely leads. Here is how S4 (Brainiall Document Reader engine) fits next to it.
- Raw OCR quality: Mistral leads on math/figures/multi-column scientific layouts. S4 (Brainiall Document Reader engine) is solid on technical docs with code, tables, and lists — and ships the layout-aware markdown that downstream RAG pipelines actually want.
- Markdown output: S4 (Brainiall Document Reader engine) returns clean GitHub-flavored markdown preserving headings, tables, lists, math, code blocks. Mistral returns plain text + bounding boxes; you build the markdown converter.
- Self-host option: Production-grade engine (Brainiall Document Reader engine, permissive license). Auditable for regulated industries and airgap-deployable. Mistral is API-only.
- Per-page consistency: Brainiall Document Reader engine handles 1000-page documents without context-window splits. Mistral has practical limits on long inputs.
- Audit trail: Per-call audit DB row with 90-day retention. Mistral is stateless.
- Price: $0.001/page (standard tier) vs $0.002 (Mistral). 2× cheaper.
When to pick Mistral: pure raw OCR with SOTA quality on math/scientific layouts, willing to build your own markdown converter.
When to pick Brainiall: need clean markdown out of the box, self-host capability, audit trail, or 1000-page document handling.
Pricing
Discount derived from quality position vs the closest competitor. 90% off when inferior, 80% off at parity, 50% off when superior.
Free
$0/mo
30 pages/month · fast tier · forever free
Starter
$19/mo
1,500 pages/month · markdown + JSON output · all formats
Pro
$99/mo
15,000 pages/month · priority queue · 99.5% SLA
Business
$299/mo
75,000 pages/month · dedicated capacity · email + Slack
PAYG: $0.001/page (Brainiall Document Reader engine). HD tier (Brainiall Document Reader engine + Surya OCR refinement for scanned/multilingual/handwritten) is on the v1.1 roadmap — not yet available.
One endpoint, structured output
# Convert PDF to clean Markdown
POST https://api.brainiall.com/v1/document/pdf-to-markdown/base64
{"pdf": "<base64 pdf>"}
# With page range (skip cover, ToC, etc.)
POST https://api.brainiall.com/v1/document/pdf-to-markdown/base64
{"pdf": "<base64>", "page_range": "3-50"}
# Markdown-only response (no JSON wrapper)
POST https://api.brainiall.com/v1/document/pdf-to-markdown/base64
{"pdf": "<base64>", "output_format": "markdown"}
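The three request bodies above differ only in their optional fields, so they can be assembled by one helper. The following is a sketch using only the Python standard library; it prepares the request without sending it. The endpoint URL and the `pdf`, `page_range`, and `output_format` fields come from the examples above. The function names and the `auth_headers` parameter are hypothetical, since the header that carries the API key is not shown here and should be taken from the API reference.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://api.brainiall.com/v1/document/pdf-to-markdown/base64"


def build_payload(pdf_bytes, page_range=None, output_format=None):
    """Build the JSON body; optional fields are omitted when not set."""
    payload = {"pdf": base64.b64encode(pdf_bytes).decode("ascii")}
    if page_range is not None:
        payload["page_range"] = page_range        # e.g. "3-50"
    if output_format is not None:
        payload["output_format"] = output_format  # e.g. "markdown"
    return payload


def build_request(pdf_bytes, auth_headers=None, **options):
    """Prepare (but do not send) a POST request for the endpoint.

    auth_headers is a hypothetical hook: the caller supplies whatever
    header the API reference specifies for the API key.
    """
    headers = {"Content-Type": "application/json", **(auth_headers or {})}
    body = json.dumps(build_payload(pdf_bytes, **options)).encode("utf-8")
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


# Placeholder bytes stand in for a real PDF file read from disk.
request = build_request(b"%PDF-1.7 example bytes", page_range="3-50")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or credentials.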
# Response includes structure metadata
# { "markdown": "# Title\n\n...", "metadata": {"pages": 48, "char_count": 12048}, "tier": "fast" }
What Brainiall Document Reader engine does well
- Multi-column layouts: academic papers, magazines, technical reports — preserves reading order.
- Tables: extracted as Markdown tables with header/cell preservation, not flattened text.
- Math + code: equations rendered as LaTeX inline; code blocks preserved with monospace fences.
- Headings + structure: H1/H2/H3 hierarchy detected from font sizes + position cues.
- Footnotes + references: kept and linked to the paragraph that cites them, not dropped.
- Multilingual: the planned HD tier (v1.1 roadmap, not yet available) adds Surya OCR for non-English PDFs and scanned documents.
Built for RAG pipelines
Most PDF parsers output JSON or HTML, so your RAG pipeline has to flatten that output back into text-with-structure. Brainiall Document Reader engine outputs Markdown directly, which embedders (OpenAI, Voyage, Cohere) handle natively. Cut the conversion step.
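One way a pipeline might use the heading structure is to chunk the returned Markdown by section before embedding. The sketch below is illustrative and not part of the API: `chunk_by_heading` is a hypothetical helper that splits on ATX headings (`#` to `######`), ignores `#` lines inside fenced code blocks, and keeps each chunk's heading trail as context.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def chunk_by_heading(markdown):
    """Split Markdown into chunks, one per section, keeping the heading path."""
    chunks = []
    path = []   # current heading trail, e.g. ["Title", "Methods"]
    body = []   # lines collected for the section being read

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()                      # close the previous section
            level = len(match.group(1))
            del path[level - 1:]         # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


sample = "# Title\n\nIntro text.\n\n## Methods\n\nWe did things.\n"
for chunk in chunk_by_heading(sample):
    print(chunk["path"], "|", chunk["text"])
```

Each chunk carries its heading path (for example `Title > Methods`), which can be prepended to the text before embedding so that retrieval keeps section context.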
Press kit & resources
Everything reviewers, integrators and procurement teams typically ask for.
One-page datasheet
Pricing, KPIs and a copy-pasteable curl snippet on one page. Ideal for buyer review.
Download PDF
Try it with our sample
Sample academic paper PDF — feed it through the API and compare the output against your own input.
Download sample
API reference
OpenAPI spec, request/response shapes, error codes, rate limits and quota model.
Read docs →
More specialty APIs
Same single API key, same usage-based pricing, different problem solved.



