Web API
Turn the open web into clean, structured data

Web data extraction powered by the Brainiall Web engine. Point it at a URL and get back clean Markdown: the main content, links and metadata, with navigation, ads and boilerplate stripped out. Crawl a whole site within page and depth limits, map a site's URLs from its sitemap, or search the web for a query. Ethical by design: every request honors the site's robots.txt and crawl-delay, and a bot-detection challenge is reported as blocked, never bypassed.

How we compare

Web scraping is sold as a credit-metered crawl API (Firecrawl), an HTML-fetch API billed per call (ScrapingBee), a compute-unit automation platform (Apify), or an enterprise data-collection suite (Bright Data). The hyperscalers ship no general web scraping or crawling API at all. Brainiall is one REST call per job — scrape, crawl, map or search — priced per operation, self-serve from the first call, and ethical by construction.

| Provider | Shape | Pricing model | Approx. price | Onboarding |
| --- | --- | --- | --- | --- |
| Brainiall Web | REST: /scrape /crawl /map /search; clean Markdown out | Per operation | $0.002 / operation | Self-serve, instant API key |
| Firecrawl | Scrape + crawl API; Markdown output | Credit bundles / subscription | ~$0.003+ / page equivalent | Self-serve, credit-metered |
| ScrapingBee | HTML / scrape API; credit-metered | Credit bundles / subscription | ~$0.001–0.007 / call | Self-serve, credit-metered |
| Apify | Actor / automation platform; compute-unit billing | Subscription + compute units | Plan-dependent | Self-serve, subscription |
| Bright Data | Enterprise data-collection & proxy suite | Subscription + usage | Enterprise-oriented | Sales-assisted |
| AWS / Azure / GCP | No general web scraping or crawling API | n/a | n/a | Build it yourself |

Prices are list-price approximations for orientation, not quotes — most scraping vendors price by credits or subscription, so per-operation figures are estimates. Always check each vendor's current terms.

Pricing

One per-operation price for every job — scrape, each crawled page, map and search all count the same. The free tier is enough to wire web data into a real pipeline before you pay anything.

Free

$0/mo

100 web operations/month · all 4 endpoints · no card

Starter

$19/mo

~12,000 operations/month · clean Markdown output included

Pro

$79/mo

~60,000 operations/month · priority queue · 99.5% SLA

Business

$299/mo

~250,000 operations/month · dedicated capacity · email + Slack support

PAYG: $0.002 / operation (Brainiall Web engine) — a scrape, each page of a crawl, a map and a search are billed the same. No minimum spend, no contract — the same single API key and usage-based billing as the rest of the catalog.
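
To compare the plans with pay-as-you-go, the effective per-operation price can be worked out from the list prices above. This is a rough sketch: the plan volumes are approximate (they are quoted with "~"), so the results are estimates, and the helper names are illustrative.

```python
# Effective price per operation for each plan, from the list prices above.
# Plan volumes are approximate ("~"), so treat every result as an estimate.
PAYG_PRICE = 0.002  # USD per operation

PLANS = {
    "Starter": (19, 12_000),
    "Pro": (79, 60_000),
    "Business": (299, 250_000),
}


def effective_price(monthly_usd: float, operations: int) -> float:
    """USD per operation if the whole monthly allowance is used."""
    return monthly_usd / operations


def break_even_operations(monthly_usd: float) -> int:
    """Operations per month at which PAYG would cost the same as the plan fee."""
    return round(monthly_usd / PAYG_PRICE)


for name, (fee, ops) in PLANS.items():
    print(
        f"{name}: ${effective_price(fee, ops):.5f}/op, "
        f"break-even vs PAYG at {break_even_operations(fee):,} ops/month"
    )
```

On these figures a plan only beats PAYG once monthly volume passes its break-even point, which is 9,500 operations for Starter.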

Four endpoints

# 1. Scrape — one URL into clean Markdown + links + metadata
POST https://api.brainiall.com/v1/web/scrape
  body: {"url": "https://example.com", "formats": ["markdown","links","metadata"]}
  -> {"url": "https://example.com", "title": "Example Domain",
      "markdown": "# Example Domain\n\nThis domain is for use in ...",
      "links": ["https://www.iana.org/domains/example"],
      "metadata": {"description": "...", "lang": "en"},
      "engine": "Brainiall Web engine"}

# 2. Crawl — breadth-first within one site, bounded by page + depth caps
POST https://api.brainiall.com/v1/web/crawl
  body: {"url": "https://example.com/docs", "max_pages": 8, "max_depth": 2}
  -> {"start_url": "...", "pages_crawled": 8,
      "pages": [{"url": "...", "title": "...", "markdown": "...", "depth": 0}],
      "skipped": [{"url": "...", "reason": "robots.txt disallow"}]}

# 3. Map — enumerate a site's URLs from its sitemap + same-origin links
POST https://api.brainiall.com/v1/web/map
  body: {"url": "https://example.com", "limit": 50}
  -> {"site": "https://example.com", "url_count": 50,
      "urls": ["https://example.com/", "..."], "source": "sitemap"}

# 4. Search — a query into a ranked list of web results
POST https://api.brainiall.com/v1/web/search
  body: {"query": "open source web scraping", "max_results": 10}
  -> {"query": "...", "result_count": 10,
      "results": [{"rank": 1, "title": "...", "url": "...", "snippet": "..."}]}

Every endpoint is one synchronous REST call. crawl takes page and depth caps (max_pages, max_depth) and map takes a URL limit, so a job always finishes in bounded time, and a crawl stays on the origin site, never wandering off-domain.
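
As a rough illustration, the four calls above could be wrapped in a small Python helper built on the standard library. The base URL and request bodies come from the listing above; the `Authorization: Bearer` header is an assumption (this page does not state the auth scheme), and `build_request` and `call` are hypothetical names, not part of any official SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.brainiall.com/v1/web"  # from the endpoint listing above
ENDPOINTS = {"scrape", "crawl", "map", "search"}


def build_request(endpoint: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the four endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth. Check the API reference for the real scheme.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def call(endpoint: str, body: dict, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint, body, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a real key and network access):
# page = call("scrape", {"url": "https://example.com", "formats": ["markdown"]}, api_key)
# print(page["markdown"])
```

Keeping request construction separate from sending makes the helper easy to unit-test without touching the network.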

How the Web API works

One call does the fetching, the cleanup and the structuring — you get data, not raw HTML to untangle.

  • Main content, not boilerplate. A main-content pass strips the navigation, ads, cookie banners and footers and keeps the article, doc or post — then converts it to clean Markdown ready to feed an LLM or a search index.
  • Bounded crawls. A crawl walks a site breadth-first within page and depth caps and a wall-time budget, stays same-origin, and reports every URL it had to skip and why.
  • Sitemap-aware mapping. map reads sitemap.xml — including nested sitemap indexes — and falls back to same-origin link discovery when a site publishes no sitemap.
  • Ethical by construction. Every target URL is checked against the site's robots.txt before any fetch, the published crawl-delay is honored, and a CAPTCHA or bot-detection challenge is reported as blocked — never solved or bypassed.
  • Safe by default. URLs that resolve to private, loopback or internal addresses are refused, so the endpoint can never be turned against your own infrastructure.
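
The crawl rules in the list above can be illustrated with a small local sketch. It is not Brainiall's implementation: `bounded_crawl`, `is_internal_address` and the `get_links` callback are hypothetical names, links come from a callback rather than real fetches, and silently ignoring off-origin links is an assumption. Only `urllib.robotparser`, `urllib.parse` and `ipaddress` from the Python standard library are used.

```python
import ipaddress
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_internal_address(ip: str) -> bool:
    """True for private, loopback, link-local or otherwise non-public IPs."""
    return not ipaddress.ip_address(ip).is_global


def bounded_crawl(start_url, get_links, robots, max_pages, max_depth, user_agent="*"):
    """Breadth-first walk that stays same-origin and respects page and depth caps.

    get_links(url) returns the absolute URLs found on that page. It is a
    hypothetical callback standing in for "fetch the page and extract links".
    """
    origin = urlsplit(start_url)[:2]  # (scheme, host)
    queue = deque([(start_url, 0)])
    seen = {start_url}
    pages, skipped = [], []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        # robots.txt is consulted before any fetch would happen.
        if not robots.can_fetch(user_agent, url):
            skipped.append({"url": url, "reason": "robots.txt disallow"})
            continue
        pages.append({"url": url, "depth": depth})
        if depth == max_depth:
            continue  # depth cap reached: do not follow this page's links
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            # ASSUMPTION: off-origin links are simply not followed.
            if urlsplit(link)[:2] == origin:
                queue.append((link, depth + 1))
    return {
        "start_url": start_url,
        "pages_crawled": len(pages),
        "pages": pages,
        "skipped": skipped,
    }
```

A real crawler would also resolve each hostname and apply `is_internal_address` to the result before fetching, and would wait for `robots.crawl_delay(user_agent)` seconds between requests when the site publishes a crawl-delay.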

What it's for

  • LLM & RAG pipelines: turn documentation sites, articles and knowledge bases into clean Markdown context for retrieval and generation.
  • Market & competitive research: pull pricing, catalog and news pages on a schedule and feed them straight into analysis.
  • Knowledge bases: crawl a whole documentation site into a single searchable corpus, then keep it fresh.
  • Lead & content enrichment: scrape a company's public site for structured facts to enrich a CRM or a dataset.
  • Change monitoring: re-scrape a page on a cadence and diff the Markdown to catch what changed.
  • One bill, one key: same Brainiall API key and usage-based billing as the rest of the catalog — no separate scraping vendor to procure.
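
For the change-monitoring case, the comparison step can be done locally once two Markdown snapshots of the same page are in hand. A minimal sketch using Python's standard `difflib`; `markdown_diff` is an illustrative name, and how the snapshots are stored between runs is left open.

```python
import difflib


def markdown_diff(old: str, new: str) -> list[str]:
    """Unified diff between two Markdown snapshots of the same page."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


previous = "# Pricing\n\nStarter: $19/mo\n"
current = "# Pricing\n\nStarter: $21/mo\n"

# Print only when something changed; an empty diff means the page is unchanged.
changes = markdown_diff(previous, current)
if changes:
    print("\n".join(changes))
```

Diffing the cleaned Markdown rather than raw HTML avoids false alarms from markup-only changes, although dynamic content inside the main text can still show up as noise.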

Press kit & resources

What reviewers, integrators and procurement teams typically ask for.

OpenAPI spec

The machine-readable OpenAPI 3.1 definition for the whole catalog, including all four web endpoints and their schemas.

View spec

API reference

OpenAPI spec, the request/response schema for each endpoint, error codes and rate limits.

Read docs →

Try it now

Free API key in 30 seconds — 100 web operations/month, no card.

Get a key →

Compare the catalog

How Brainiall's specialty APIs line up against AWS, Azure and the specialists, use case by use case.

See the comparison →

More specialty APIs

Same single API key, same usage-based pricing, different problem solved.
