Web API
Turn the open web into clean, structured data
Web data extraction powered by the Brainiall Web engine. Point it at a URL and get back clean Markdown: the main content, links and metadata, with navigation, ads and boilerplate stripped out. Crawl a whole site within page and depth limits, map every URL from its sitemap, or search the web for a query. Ethical by design: every request honors the site's robots.txt and crawl-delay, and a bot-detection challenge is reported as blocked, never bypassed.
How we compare
Web scraping is sold as a credit-metered crawl API (Firecrawl), an HTML-fetch API billed per call (ScrapingBee), a compute-unit automation platform (Apify), or an enterprise data-collection suite (Bright Data). The hyperscalers ship no general web scraping or crawling API at all. Brainiall is one REST call per job — scrape, crawl, map or search — priced per operation, self-serve from the first call, and ethical by construction.
| Provider | Shape | Pricing model | Approx. price | Onboarding |
|---|---|---|---|---|
| Brainiall Web | REST: /scrape /crawl /map /search — clean Markdown out | Per operation | $0.002 / operation | Self-serve, instant API key |
| Firecrawl | Scrape + crawl API; Markdown output | Credit bundles / subscription | ~$0.003+ / page equivalent | Self-serve, credit-metered |
| ScrapingBee | HTML / scrape API; credit-metered | Credit bundles / subscription | ~$0.001–0.007 / call | Self-serve, credit-metered |
| Apify | Actor / automation platform; compute-unit billing | Subscription + compute units | Plan-dependent | Self-serve, subscription |
| Bright Data | Enterprise data-collection & proxy suite | Subscription + usage | Enterprise-oriented | Sales-assisted |
| AWS / Azure / GCP | No general web scraping or crawling API | — | Build it yourself | — |
Prices are list-price approximations for orientation, not quotes — most scraping vendors price by credits or subscription, so per-operation figures are estimates. Always check each vendor's current terms.
Pricing
One per-operation price for every job — scrape, each crawled page, map and search all count the same. The free tier is enough to wire web data into a real pipeline before you pay anything.
Free
$0/mo
100 web operations/month · all 4 endpoints · no card
Starter
$19/mo
~12,000 operations/month · clean Markdown output included
Pro
$79/mo
~60,000 operations/month · priority queue · 99.5% SLA
Business
$299/mo
~250,000 operations/month · dedicated capacity · email + Slack support
PAYG: $0.002 / operation (Brainiall Web engine) — a scrape, each page of a crawl, a map and a search are billed the same. No minimum spend, no contract — the same single API key and usage-based billing as the rest of the catalog.
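Because every operation is billed at the same rate, estimating a job is simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the prices are the list prices quoted above, and the plan quotas on this page are approximate, so treat the results as rough guidance.

```python
# Illustrative cost arithmetic. Function names are hypothetical;
# the rate is the pay-as-you-go list price quoted on this page.
PAYG_MILLS_PER_OP = 2  # $0.002 per operation, held as integer tenths of a cent


def payg_cost(operations: int) -> float:
    """Pay-as-you-go cost in dollars for a number of billable operations."""
    return operations * PAYG_MILLS_PER_OP / 1000


def crawl_operations(pages_crawled: int) -> int:
    """Each crawled page is billed as one operation."""
    return pages_crawled


def breakeven_operations(monthly_fee_dollars: int) -> int:
    """Operations per month at which PAYG spend equals a plan's monthly fee."""
    return monthly_fee_dollars * 1000 // PAYG_MILLS_PER_OP
```

For example, `payg_cost(crawl_operations(8))` gives 0.016 for an eight-page crawl, and `breakeven_operations(19)` gives 9500, the monthly volume at which pay-as-you-go spend matches the Starter fee.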
Four endpoints
# 1. Scrape — one URL into clean Markdown + links + metadata
POST https://api.brainiall.com/v1/web/scrape
body: {"url": "https://example.com", "formats": ["markdown","links","metadata"]}
-> {"url": "https://example.com", "title": "Example Domain",
"markdown": "# Example Domain\n\nThis domain is for use in ...",
"links": ["https://www.iana.org/domains/example"],
"metadata": {"description": "...", "lang": "en"},
"engine": "Brainiall Web engine"}
# 2. Crawl — breadth-first within one site, bounded by page + depth caps
POST https://api.brainiall.com/v1/web/crawl
body: {"url": "https://example.com/docs", "max_pages": 8, "max_depth": 2}
-> {"start_url": "...", "pages_crawled": 8,
"pages": [{"url": "...", "title": "...", "markdown": "...", "depth": 0}],
"skipped": [{"url": "...", "reason": "robots.txt disallow"}]}
# 3. Map — enumerate a site's URLs from its sitemap + same-origin links
POST https://api.brainiall.com/v1/web/map
body: {"url": "https://example.com", "limit": 50}
-> {"site": "https://example.com", "url_count": 50,
"urls": ["https://example.com/", "..."], "source": "sitemap"}
# 4. Search — a query into a ranked list of web results
POST https://api.brainiall.com/v1/web/search
body: {"query": "open source web scraping", "max_results": 10}
-> {"query": "...", "result_count": 10,
"results": [{"rank": 1, "title": "...", "url": "...", "snippet": "..."}]}Every endpoint is one synchronous REST call. crawl and map take page and depth limits so a job always finishes in bounded time — and a crawl stays on the origin site, never wandering off-domain.
How the Web API works
One call does the fetching, the cleanup and the structuring — you get data, not raw HTML to untangle.
- Main content, not boilerplate. A main-content pass strips the navigation, ads, cookie banners and footers and keeps the article, doc or post — then converts it to clean Markdown ready to feed an LLM or a search index.
- Bounded crawls. A crawl walks a site breadth-first within page and depth caps and a wall-time budget, stays same-origin, and reports every URL it had to skip and why.
- Sitemap-aware mapping. map reads sitemap.xml, including nested sitemap indexes, and falls back to same-origin link discovery when a site publishes no sitemap.
- Ethical by construction. Every target URL is checked against the site's robots.txt before any fetch, the published crawl-delay is honored, and a CAPTCHA or bot-detection challenge is reported as blocked, never solved or bypassed.
- Safe by default. URLs that resolve to private, loopback or internal addresses are refused, so the endpoint can never be turned against your own infrastructure.
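To make the "ethical by construction" and "safe by default" checks concrete, here is a rough sketch of what such a pre-fetch gate can look like. It is not Brainiall's implementation: the function names, the `ExampleBot` user agent and the `"internal address"` reason string are all hypothetical, and for simplicity it only inspects literal IP addresses and `localhost`, whereas a real service would also resolve hostnames before deciding.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_internal_host(host: str) -> bool:
    """True for 'localhost' or a literal private, loopback or link-local IP."""
    if host.lower() == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; this sketch does not resolve DNS
    return address.is_private or address.is_loopback or address.is_link_local


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def precheck(url: str, robots_txt: str, user_agent: str = "ExampleBot") -> Optional[str]:
    """Return a skip reason, or None when the URL may be fetched."""
    host = urlparse(url).hostname or ""
    if is_internal_host(host):
        return "internal address"  # hypothetical reason string
    if not parse_robots(robots_txt).can_fetch(user_agent, url):
        return "robots.txt disallow"  # same wording as the crawl response above
    return None


def crawl_delay_seconds(robots_txt: str, user_agent: str = "ExampleBot"):
    """Published Crawl-delay for this agent, or None when the site sets none."""
    return parse_robots(robots_txt).crawl_delay(user_agent)
```

A crawler built on this would call `precheck` before every fetch and sleep for `crawl_delay_seconds` between requests to the same site.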
What it's for
- LLM & RAG pipelines: turn documentation sites, articles and knowledge bases into clean Markdown context for retrieval and generation.
- Market & competitive research: pull pricing, catalog and news pages on a schedule and feed them straight into analysis.
- Knowledge bases: crawl a whole documentation site into a single searchable corpus, then keep it fresh.
- Lead & content enrichment: scrape a company's public site for structured facts to enrich a CRM or a dataset.
- Change monitoring: re-scrape a page on a cadence and diff the Markdown to catch what changed.
- One bill, one key: same Brainiall API key and usage-based billing as the rest of the catalog — no separate scraping vendor to procure.
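The change-monitoring use case above comes down to comparing two Markdown snapshots of the same page. A minimal sketch with the standard library's `difflib` follows. The function name and the sample snapshots are made up for illustration, and fetching and storing the snapshots is left out.

```python
import difflib


def markdown_diff(previous: str, current: str) -> list:
    """Unified diff between two Markdown snapshots; an empty list means no change."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical snapshots from two scrapes of the same page.
old_snapshot = "# Pricing\n\nStarter: $19/mo"
new_snapshot = "# Pricing\n\nStarter: $21/mo"

changes = markdown_diff(old_snapshot, new_snapshot)
if changes:
    print("\n".join(changes))
```

Diffing the cleaned Markdown rather than raw HTML keeps markup-only churn, such as rotating ad slots or changed class names, out of the results.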
Press kit & resources
What reviewers, integrators and procurement teams typically ask for.
OpenAPI spec
The machine-readable OpenAPI 3.1 definition for the whole catalog, including all four web endpoints and their schemas.
View spec
API reference
OpenAPI spec, the request/response schema for each endpoint, error codes and rate limits.
Read docs →
Compare the catalog
How Brainiall's specialty APIs line up against AWS, Azure and the specialists, use case by use case.
See the comparison →
More specialty APIs
Same single API key, same usage-based pricing, different problem solved.



