PDF-to-Markdown API
Brainiall Document Reader engine. $0.001/page. Built for RAG and LLM ingestion.
Drop a PDF → get clean Markdown with preserved structure: headings, tables, code blocks, math, footnotes. $0.001/page — 5× cheaper than Datalab, 50× cheaper than Adobe Extract.

How we compare
Brainiall Document Reader engine is research SOTA on academic and technical PDFs (~95% structure preservation on multi-column papers). v1.0 calibration vs published competitor pricing and metrics; v1.1 will replace with a 100-document head-to-head benchmark.
| Provider | Quality | Price/page | vs market avg | Position |
|---|---|---|---|---|
| Datalab (Brainiall Document Reader engine on-demand) | 9.4/10 | $0.0050 | 33% | — |
| LlamaIndex Cloud (LlamaParse) | 9.0/10 | $0.0030 | 20% | — |
| Adobe PDF Services Extract | 9.2/10 | $0.050 | 329% | — |
| Microsoft Document Intelligence | 8.8/10 | $0.010 | 66% | — |
| Reducto AI | 9.3/10 | $0.0080 | 53% | — |
| Brainiall FAST | 9.4/10 | $0.0010(93% cheaper) | 7% | Parity |
Pricing rule: 90% off when inferior · 80% off at parity · 50% off when superior. Position determined by objective benchmark, refreshed quarterly. Market average excludes retired / free / no-offer entries.
vs Mistral OCR 3 (December 2026)
Mistral OCR 3 launched December 2026 at $0.002/page with SOTA quality on tables, figures, and math equations. It is the most disruptive new entrant in the category — and we are honest that on raw OCR quality across complex documents, it likely leads. Here is how S4 (Brainiall Document Reader engine) fits next to it.
- Raw OCR quality: Mistral leads on math/figures/multi-column scientific layouts. S4 (Brainiall Document Reader engine) is solid on technical docs with code, tables, and lists — and ships the layout-aware markdown that downstream RAG pipelines actually want.
- Markdown output: S4 (Brainiall Document Reader engine) returns clean GitHub-flavored markdown preserving headings, tables, lists, math, code blocks. Mistral returns plain text + bounding boxes; you build the markdown converter.
- Self-host option: Production-grade engine (Brainiall Document Reader engine, permissive license). Auditable for regulated industries and airgap-deployable. Mistral is API-only.
- Per-page consistency: Brainiall Document Reader engine handles 1000-page documents without context-window splits. Mistral has practical limits on long inputs.
- Audit trail: Per-call audit DB row with 90-day retention. Mistral is stateless.
- Price: $0.001/page (standard tier) vs $0.002 (Mistral). 2× cheaper.
When to pick Mistral: pure raw OCR with SOTA quality on math/scientific layouts, willing to build your own markdown converter.
When to pick Brainiall: need clean markdown out of the box, self-host capability, audit trail, or 1000-page document handling.
Pricing
Discount derived from quality position vs the closest competitor. 90% off when inferior, 80% off at parity, 50% off when superior.
Free
$0/mo
30 pages/month · fast tier · forever free
Starter
$19/mo
1,500 pages/month · markdown + JSON output · all formats
Pro
$99/mo
15,000 pages/month · priority queue · 99.5% SLA
Business
$299/mo
75,000 pages/month · dedicated capacity · email + Slack
PAYG: $0.001/page (Brainiall Document Reader engine). HD tier (Brainiall Document Reader engine + Surya OCR refinement for scanned/multilingual/handwritten) is on the v1.1 roadmap — not yet available.
One endpoint, structured output
# Convert PDF to clean Markdown
POST https://api.brainiall.com/v1/document/pdf-to-markdown/base64
{"pdf": "<base64 pdf>"}
# With page range (skip cover, ToC, etc.)
POST https://api.brainiall.com/v1/document/pdf-to-markdown/base64
{"pdf": "<base64>", "page_range": "3-50"}
# Markdown-only response (no JSON wrapper)
POST https://api.brainiall.com/v1/document/pdf-to-markdown/base64
{"pdf": "<base64>", "output_format": "markdown"}
# Response includes structure metadata
# { "markdown": "# Title\n\n...", "metadata": {"pages": 48, "char_count": 12048}, "tier": "fast" }What Brainiall Document Reader engine does well
- Multi-column layouts: academic papers, magazines, technical reports — preserves reading order.
- Tables: extracted as Markdown tables with header/cell preservation, not flattened text.
- Math + code: equations rendered as LaTeX inline; code blocks preserved with monospace fences.
- Headings + structure: H1/H2/H3 hierarchy detected from font sizes + position cues.
- Footnotes + references: linked at-paragraph, not dropped.
- Multilingual: HD tier adds Surya OCR for non-English PDFs and scanned documents.
Built for RAG pipelines
Most PDF parsers output JSON or HTML — your RAG pipeline then has to re-flatten it back to text-with-structure. Brainiall Document Reader engine outputs Markdown directly, which embedders (OpenAI, Voyage, Cohere) handle natively. Cut the conversion step.
Latency profile — two regimes
PDF conversion runs in one of two modes, picked automatically from the input. Plan your integration around the right one — surprises here are the most common reason people call us.
- Native-text PDFs (the fast path): if the file already carries an embedded text layer (most exports from Word, LaTeX, Google Docs, modern report generators), conversion runs at ~1.3 s/page. A 5-page text PDF lands in roughly 6.3 s end-to-end. Synchronous request/response is fine here.
- Scanned or image-only PDFs (the OCR path): if the file has no extractable text (camera scans, fax output, photos of paper, older archives), the document model has to run OCR page-by-page on CPU. Budget ~20–30 s/page. A 5-page scan takes 1.5–2.5 minutes.
- Recommended pattern for scanned input: treat the request like a job, not a synchronous call. Submit, hold the connection or queue the work server-side, render a progress indicator. The same async-job pattern that dubbing and speech-to-speech use applies cleanly to long scanned-PDF conversions.
Press kit & resources
Everything reviewers, integrators and procurement teams typically ask for.
One-page datasheet
Pricing, KPIs and a copy-pasteable curl snippet on one page. Ideal for buyer review.
Download PDFTry it with our sample
Sample academic paper PDF — feed it through the API and compare the output against your own input.
Download sampleAPI reference
OpenAPI spec, request/response shapes, error codes, rate limits and quota model.
Read docs →More specialty APIs
Same single API key, same usage-based pricing, different problem solved.



