Brainiall Web
Turn the open web into clean, structured data


Web data extraction powered by the Brainiall Web engine. Point it at a URL and get back clean Markdown: the main content, links and metadata, with navigation, ads and boilerplate stripped out. Crawl a whole site within page and depth limits, map a site's URLs from its sitemap and same-origin links, or search the web for a query. Ethical by design: every request honors the site's robots.txt and crawl-delay, and a bot-detection challenge is reported as blocked, never bypassed.


How we compare

Web scraping is sold as a credit-metered crawl API (Firecrawl), an HTML-fetch API billed per call (ScrapingBee), a compute-unit automation platform (Apify), or an enterprise data-collection suite (Bright Data). The hyperscalers ship no general web scraping or crawling API at all. Brainiall is one REST call per job — scrape, crawl, map or search — priced per operation, self-serve from the first call, and ethical by construction.

Provider	Shape	Pricing model	Approx. price	Onboarding
Brainiall Web	REST: `/scrape` `/crawl` `/map` `/search` — clean Markdown out	Per operation	$0.0015 / operation	Self-serve, instant API key
Firecrawl	Scrape + crawl API; Markdown output	Credit bundles / subscription	~$0.003+ / page equivalent	Self-serve, credit-metered
ScrapingBee	HTML / scrape API; credit-metered	Credit bundles / subscription	~$0.001–0.007 / call	Self-serve, credit-metered
Apify	Actor / automation platform; compute-unit billing	Subscription + compute units	Plan-dependent	Self-serve, subscription
Bright Data	Enterprise data-collection & proxy suite	Subscription + usage	Enterprise-oriented	Sales-assisted
AWS / Azure / GCP	No general web scraping or crawling API	—	Build it yourself	—

Prices are list-price approximations for orientation, not quotes — most scraping vendors price by credits or subscription, so per-operation figures are estimates. Always check each vendor's current terms.

Pricing

One per-operation price for every job — scrape, each crawled page, map and search all count the same. The free tier is enough to wire web data into a real pipeline before you pay anything.

Free

$0/mo

100 web operations/month · all 4 endpoints · no card

Starter

$19/mo

~12,000 operations/month · clean Markdown output included

Pro

$79/mo

~60,000 operations/month · priority queue · 99.5% SLA

Business

$299/mo

~250,000 operations/month · dedicated capacity · email + Slack support

PAYG: $0.0015 / operation (Brainiall Web engine) — a scrape, each page of a crawl, a map and a search are billed the same. No minimum spend, no contract — the same single API key and usage-based billing as the rest of the catalog.
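
To see how the plans relate to pay-as-you-go, the arithmetic can be sketched in a few lines of Python. The fees, quotas and PAYG price are the figures listed above; the quotas are approximate ("~"), so treat the results as rough guides. The function names are illustrative, and the crawl ceiling assumes that only crawled pages are billed, which the page implies but does not spell out.

```python
PAYG_PRICE = 0.0015  # USD per operation, as listed above

# (monthly fee in USD, approximate included operations), as listed above
PLANS = {
    "Starter": (19, 12_000),
    "Pro": (79, 60_000),
    "Business": (299, 250_000),
}


def effective_price(monthly_fee, included_ops):
    """USD per operation if the whole monthly quota is used."""
    return monthly_fee / included_ops


def break_even_ops(monthly_fee, payg_price=PAYG_PRICE):
    """Monthly volume at which pay-as-you-go costs the same as the plan fee."""
    return monthly_fee / payg_price


def crawl_cost_ceiling(max_pages, payg_price=PAYG_PRICE):
    """Upper bound on a crawl's PAYG cost, assuming one operation per crawled page."""
    return max_pages * payg_price


for name, (fee, ops) in PLANS.items():
    print(f"{name}: ${effective_price(fee, ops):.5f}/op at full quota, "
          f"PAYG break-even near {break_even_ops(fee):,.0f} ops/month")
print(f"Crawl with max_pages=8: at most ${crawl_cost_ceiling(8):.4f} on PAYG")
```

On these figures, Pro and Business work out cheaper per operation than PAYG when the quota is fully used, while Starter at roughly 12,000 operations sits slightly above the PAYG rate.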

Four endpoints

# 1. Scrape — one URL into clean Markdown + links + metadata
POST https://api.brainiall.com/v1/web/scrape
  body: {"url": "https://example.com", "formats": ["markdown","links","metadata"]}
  -> {"url": "https://example.com", "title": "Example Domain",
      "markdown": "# Example Domain\n\nThis domain is for use in ...",
      "links": ["https://www.iana.org/domains/example"],
      "metadata": {"description": "...", "lang": "en"},
      "engine": "Brainiall Web engine"}

# 2. Crawl — breadth-first within one site, bounded by page + depth caps
POST https://api.brainiall.com/v1/web/crawl
  body: {"url": "https://example.com/docs", "max_pages": 8, "max_depth": 2}
  -> {"start_url": "...", "pages_crawled": 8,
      "pages": [{"url": "...", "title": "...", "markdown": "...", "depth": 0}],
      "skipped": [{"url": "...", "reason": "robots.txt disallow"}]}

# 3. Map — enumerate a site's URLs from its sitemap + same-origin links
POST https://api.brainiall.com/v1/web/map
  body: {"url": "https://example.com", "limit": 50}
  -> {"site": "https://example.com", "url_count": 50,
      "urls": ["https://example.com/", "..."], "source": "sitemap"}

# 4. Search — a query into a ranked list of web results
POST https://api.brainiall.com/v1/web/search
  body: {"query": "open source web scraping", "max_results": 10}
  -> {"query": "...", "result_count": 10,
      "results": [{"rank": 1, "title": "...", "url": "...", "snippet": "..."}]}

Every endpoint is one synchronous REST call. crawl takes page and depth caps (max_pages, max_depth) and map takes a URL limit, so a job always finishes in bounded time, and a crawl stays on the origin site, never wandering off-domain.
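
As a rough illustration of calling these endpoints from Python, here is a minimal standard-library sketch. The base URL, endpoint names and body fields come from the listing above. The `Authorization: Bearer` header is an assumption, since this page does not say how the API key is sent, and the helper names (`build_request`, `call`, `skipped_reasons`) are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.brainiall.com/v1/web"  # from the endpoint listing
ENDPOINTS = {"scrape", "crawl", "map", "search"}


def build_request(endpoint, payload, api_key):
    """Build (but do not send) a POST request for one of the four endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check the docs for the real header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def call(endpoint, payload, api_key, timeout=60):
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def skipped_reasons(crawl_response):
    """Map each skipped URL in a crawl response to its reported reason."""
    return {s["url"]: s["reason"] for s in crawl_response.get("skipped", [])}


# Building a crawl request with the same body as the listing above.
crawl_request = build_request(
    "crawl",
    {"url": "https://example.com/docs", "max_pages": 8, "max_depth": 2},
    api_key="YOUR_API_KEY",
)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network; `call("crawl", ...)` would perform the actual request.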

How Brainiall Web works

One call does the fetching, the cleanup and the structuring — you get data, not raw HTML to untangle.

Main content, not boilerplate. A main-content pass strips the navigation, ads, cookie banners and footers and keeps the article, doc or post — then converts it to clean Markdown ready to feed an LLM or a search index.
Bounded crawls. A crawl walks a site breadth-first within page and depth caps and a wall-time budget, stays same-origin, and reports every URL it had to skip and why.
Sitemap-aware mapping. map reads sitemap.xml — including nested sitemap indexes — and falls back to same-origin link discovery when a site publishes no sitemap.
Ethical by construction. Every target URL is checked against the site's robots.txt before any fetch, the published crawl-delay is honored, and a CAPTCHA or bot-detection challenge is reported as blocked — never solved or bypassed.
Safe by default. URLs that resolve to private, loopback or internal addresses are refused, so the endpoint cannot be used to probe internal networks.
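
The bounded-crawl and pre-fetch checks described above can be sketched as follows. This is an illustration of the general technique, not Brainiall's implementation: it walks an in-memory link graph instead of fetching pages, it only recognises literal IP addresses and `localhost` (a real guard would also check the addresses a hostname resolves to), and it omits the wall-time budget and crawl-delay. The skip reasons "off-origin" and "private address" and the `example-bot` user agent are made-up labels; only "robots.txt disallow" appears in the API listing.

```python
from collections import deque
from ipaddress import ip_address
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def skip_reason(url, origin, robots, agent="example-bot"):
    """Return why a URL must not be fetched, or None if it may be."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == "localhost":
        return "private address"
    try:
        ip = ip_address(host)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return "private address"
    except ValueError:
        pass  # a hostname, not a literal IP address
    if (parts.scheme, parts.netloc) != origin:
        return "off-origin"
    if not robots.can_fetch(agent, url):
        return "robots.txt disallow"
    return None


def bounded_crawl(start_url, links, robots, max_pages, max_depth):
    """Breadth-first walk over `links` (url -> outgoing urls) within both caps."""
    start = urlsplit(start_url)
    origin = (start.scheme, start.netloc)
    queue = deque([(start_url, 0)])
    seen = {start_url}
    pages, skipped = [], []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        reason = skip_reason(url, origin, robots)
        if reason:
            skipped.append({"url": url, "reason": reason})
            continue
        pages.append({"url": url, "depth": depth})
        if depth == max_depth:
            continue  # do not follow links past the depth cap
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return {"pages_crawled": len(pages), "pages": pages, "skipped": skipped}


# A tiny hypothetical site and robots.txt to exercise the walk.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
site = {
    "https://example.com/docs": [
        "https://example.com/docs/a",
        "https://example.com/private/x",
        "https://other.example/page",
        "http://127.0.0.1/admin",
    ],
    "https://example.com/docs/a": ["https://example.com/docs/b"],
    "https://example.com/docs/b": ["https://example.com/docs/c"],
}
result = bounded_crawl("https://example.com/docs", site, robots, max_pages=8, max_depth=2)
```

Because every URL is queued at most once and the loop stops at `max_pages`, the walk terminates regardless of how the site links to itself, and each refused URL lands in `skipped` with its reason.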

What it's for

LLM & RAG pipelines: turn documentation sites, articles and knowledge bases into clean Markdown context for retrieval and generation.
Market & competitive research: pull pricing, catalog and news pages on a schedule and feed them straight into analysis.
Knowledge bases: crawl a whole documentation site into a single searchable corpus, then keep it fresh.
Lead & content enrichment: scrape a company's public site for structured facts to enrich a CRM or a dataset.
Change monitoring: re-scrape a page on a cadence and diff the Markdown to catch what changed.
One bill, one key: the same Brainiall API key and usage-based billing as the rest of the catalog, with no separate scraping vendor to procure.
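
For the change-monitoring case, one possible approach is to keep the Markdown from the previous scrape and compare it with the latest one using Python's standard `difflib`. The sketch below assumes you already hold both Markdown strings (for example from the `markdown` field of two scrape responses); the function name and the sample text are illustrative.

```python
import difflib


def changed_lines(old_markdown, new_markdown):
    """Return (added, removed) lines between two Markdown snapshots."""
    diff = list(difflib.unified_diff(
        old_markdown.splitlines(),
        new_markdown.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))
    body = diff[2:]  # drop the '--- previous' / '+++ current' header lines
    added = [line[1:] for line in body if line.startswith("+")]
    removed = [line[1:] for line in body if line.startswith("-")]
    return added, removed


previous = "# Release notes\n\nCurrent version: 1.2\n"
current = "# Release notes\n\nCurrent version: 1.3\n"
added, removed = changed_lines(previous, current)
```

Because the scrape output is already stripped of navigation and ads, a line-level diff like this tends to surface content changes rather than layout noise.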