Vision Labels API

Object detection at 236 ms p50, plus multi-task captioning from the Brainiall Vision Tagger engine. Capabilities AWS Rekognition does NOT offer: zero-shot detection by custom text prompt and image captioning in one model.

Live demo

Try Florence-2 zero-shot, no signup

The demo calls the Standard tier live at /v1/vision/labels/base64. It is rate-limited to 5 calls per 5 minutes per IP. For unlimited access, sign up for $10 free credit.
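The demo limit of 5 calls per 5 minutes per IP can be respected on the client side. The sketch below assumes the server counts calls over a sliding window; the page does not say whether the window is sliding or fixed. `SlidingWindowLimiter` and its methods are hypothetical names for illustration, not part of any Brainiall SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_calls within any window_s seconds.

    Assumption: the server uses a sliding window. Adjust if it uses fixed buckets.
    """

    def __init__(self, max_calls=5, window_s=300.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock          # injectable so the limiter can be tested
        self.calls = deque()        # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self.calls[0])

    def record(self):
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

Before each demo request, sleep for `wait_time()` seconds and then call `record()`.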

Florence-2 dense caption: describes the entire image in natural language. AWS Rekognition does NOT offer this.

AWS Rekognition Labels would return generic top-5 COCO classes for this image. Florence-2 can caption, OCR, segment, and answer free-text queries; see the full capability matrix.

⚡ Performance KPIs (measured)

| Metric | Brainiall | AWS Rekognition | |
|---|---|---|---|
| Fast tier p50 (Brainiall object detection module, closed-set) | 236 ms [source] | 250–500 ms (DetectLabels) | ★ |
| Standard tier p50 (Brainiall Vision Tagger engine, multi-task) | 1624 ms | Not available (separate calls needed) | ★ |
| Throughput per CPU core | Fast 4 RPS · Standard 0.6 RPS | Cloud auto-scale | = |
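The per-core throughput figures are roughly what the p50 latencies imply for a single worker handling requests back to back, and they can be turned into a rough sizing estimate. This is a minimal sketch; it assumes p50 approximates the average service time and that throughput scales linearly with core count, neither of which the page states. The function names are illustrative.

```python
import math

FAST_P50_MS = 236        # Fast tier p50 from the KPI table
STANDARD_P50_MS = 1624   # Standard tier p50 from the KPI table


def sequential_rps(latency_ms: float) -> float:
    """Requests per second for one worker processing requests one at a time."""
    return 1000.0 / latency_ms


def cores_needed(target_rps: float, per_core_rps: float) -> int:
    """Cores required for a target load, assuming linear scaling across cores."""
    return math.ceil(target_rps / per_core_rps)


print(round(sequential_rps(FAST_P50_MS), 1))      # 4.2, close to the quoted 4 RPS
print(round(sequential_rps(STANDARD_P50_MS), 1))  # 0.6, matches the quoted 0.6 RPS
print(cores_needed(10, 0.6))                      # 17 cores for 10 RPS on Standard
print(cores_needed(10, 4))                        # 3 cores for 10 RPS on Fast
```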

🎯 Capability matrix

MetricBrainiallAWS Rekognition
Closed-set object detection (~80 COCO)βœ… Brainiall object detection module βœ… DetectLabels (~2500 categories)R
Zero-shot detection (custom text prompt)βœ… Brainiall Vision Tagger engine grounding + future OWL-v2❌ Not supported (only pre-trained labels)β˜…
Caption generation (image β†’ natural language)βœ… Brainiall Vision Tagger engine SOTA captioning❌ Not supportedβ˜…
Multi-task (caption + detect + OCR + segment in 1 model)βœ… Brainiall Vision Tagger engine (single model, MIT)❌ Requires separate API calls per taskβ˜…
OCR within imageβœ… Brainiall Vision Tagger engine multi-task includes OCR🟑 Use DetectText (separate)β˜…
Open weights you can auditβœ… Brainiall object detection module permissive license + Brainiall Vision Tagger engine MIT❌ Proprietary closedβ˜…
LGPD / GDPR-by-defaultβœ… EU/BR datacenter🟑 us-east defaultβ˜…

📊 Quality benchmarks

| Metric | Brainiall | AWS Rekognition |
|---|---|---|
| COCO mAP | Brainiall object detection module-Base ~50 mAP | Not published |
| Multi-task SOTA | Brainiall Vision Tagger engine SOTA on caption + grounding + OCR + segment | Single-task per API |

Pricing

Free

500 imgs/month

Get started.

Fast

$0.0015 / image

Brainiall object detection module, closed-set object detection. p50 236 ms.

Standard

$0.008 / image

Brainiall Vision Tagger engine multi-task: caption + detect + OCR + segment in one call. p50 1.6 s.

Quickstart (Python)

Request (Standard tier, multi-task)

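import base64
import httpx

# Read the image and base64-encode it for the JSON body.
with open("photo.jpg", "rb") as f:
    img = base64.b64encode(f.read()).decode()

resp = httpx.post(
    "https://api.brainiall.com/v1/vision/labels/base64",
    headers={"Authorization": "Bearer brnl-..."},
    json={
        "image": img,
        "tier": "standard",
        "task": "<CAPTION>",  # or <OCR>, <OD>, <CAPTION_TO_PHRASE_GROUNDING>
    },
    timeout=30.0,  # Standard tier p50 is about 1.6 s; leave headroom
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
print(resp.json())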

Example response

{
  "request_id": "req_4d8e2a…",
  "processing_ms": 1624,
  "tier": "standard",
  "task": "<CAPTION>",
  "caption": "Two cats laying on a pink couch with remote controls.",
  "labels": [
    {"label": "cat", "confidence": 0.97,
     "box": [120, 240, 380, 520]},
    {"label": "couch", "confidence": 0.94,
     "box": [0, 100, 800, 600]}
  ],
  "output": {
    "<CAPTION>": "Two cats laying on a pink couch…"
  },
  "warnings": []
}
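A response shaped like the example above can be post-processed in a few lines. This is a minimal sketch that keeps only high-confidence labels. It relies solely on the `labels`, `label`, and `confidence` fields shown in the example. The helper name `confident_labels` and the 0.9 default threshold are illustrative choices, not part of the API.

```python
def confident_labels(response: dict, min_confidence: float = 0.9) -> list[tuple[str, float]]:
    """Return (label, confidence) pairs at or above the threshold, best first."""
    kept = [
        (item["label"], item["confidence"])
        for item in response.get("labels", [])
        if item["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the example response above.
sample = {
    "task": "<CAPTION>",
    "caption": "Two cats laying on a pink couch with remote controls.",
    "labels": [
        {"label": "cat", "confidence": 0.97, "box": [120, 240, 380, 520]},
        {"label": "couch", "confidence": 0.94, "box": [0, 100, 800, 600]},
    ],
    "warnings": [],
}

print(confident_labels(sample))        # [('cat', 0.97), ('couch', 0.94)]
print(confident_labels(sample, 0.95))  # [('cat', 0.97)]
```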

💰 Why "per-call parity" is the wrong comparison

A single-call comparison sets Brainiall Standard ($0.008) against one Google Vision feature ($0.0015), but Google Vision charges per feature. To get caption + OCR + object detection on the same image with Google Vision Premium, you need 3 separate features × $0.0015 = $0.0045. With the Brainiall Vision Tagger engine on the Standard tier, all three come in one call.

| Use case | Brainiall | Google Vision | AWS Rekognition | Savings |
|---|---|---|---|---|
| Object detection only (Fast tier) | $0.0015 / img (Brainiall object detection module) | $0.0015 / img | $0.001 / img | parity |
| Caption + OCR + detection (one image) | $0.008 / img (1 call) | $0.0045 / img (3 features) | $0.0035 + Textract $0.0015 (2 APIs) | - |
| Same workload for 1M images / month | $8,000 / mo | $4,500 / mo | $5,000 / mo | - |
| + Zero-shot (find specific object via text query) | ✅ Same call | ❌ Not supported | ❌ Not supported | capability |
| + Caption in 80+ languages (Brainiall Vision Tagger engine multilingual) | ✅ Same call | 🟡 Translation API extra | ❌ English only | capability |

Real-world TCO: at 1M images/mo, Google Vision is ~44% cheaper for the multi-feature use case, but you lose zero-shot prompting (you cannot ask "find a Brazilian flag in this image") and pay extra for caption translation. For PT-BR, ZH, AR, and JP catalogs, the Brainiall Vision Tagger engine ships native multilingual captioning within the same $0.008; Google would charge for the Translation API on top and still lack zero-shot.
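The monthly figures and the ~44% gap follow directly from the per-image prices quoted on this page. A short worked check, using `Decimal` to avoid floating-point rounding in currency arithmetic (the dictionary keys are just labels for this example):

```python
from decimal import Decimal

IMAGES_PER_MONTH = 1_000_000

# Per-image prices for caption + OCR + detection, as quoted in the table.
per_image = {
    "Brainiall Standard (1 call)": Decimal("0.008"),
    "Google Vision (3 features)": 3 * Decimal("0.0015"),
    "AWS Rekognition + Textract": Decimal("0.0035") + Decimal("0.0015"),
}

monthly = {name: price * IMAGES_PER_MONTH for name, price in per_image.items()}
for name, cost in monthly.items():
    print(f"{name}: ${cost:,.0f} / mo")

brainiall = monthly["Brainiall Standard (1 call)"]
google = monthly["Google Vision (3 features)"]
google_discount = (brainiall - google) / brainiall
print(f"Google Vision is {google_discount:.2%} cheaper")  # 43.75%, i.e. ~44%
```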

Volume tier: starting at 500K imgs/mo, Standard tier drops to $0.0056 / img (-30%). See /pricing for the full ladder.
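The volume discount can be folded into a cost estimate as follows. This sketch assumes the discounted rate applies to every image once the monthly volume reaches 500K, rather than only to images above the threshold; the page does not specify which, so check /pricing before relying on it. The constant and function names are illustrative.

```python
from decimal import Decimal

LIST_PRICE = Decimal("0.008")      # Standard tier, per image
VOLUME_PRICE = Decimal("0.0056")   # Standard tier from 500K imgs/mo
VOLUME_THRESHOLD = 500_000


def standard_monthly_cost(images: int) -> Decimal:
    """Monthly Standard-tier cost, assuming the discount covers all images."""
    unit = VOLUME_PRICE if images >= VOLUME_THRESHOLD else LIST_PRICE
    return unit * images


print(1 - VOLUME_PRICE / LIST_PRICE)     # 0.3, the quoted -30%
print(standard_monthly_cost(100_000))    # 800.000
print(standard_monthly_cost(1_000_000))  # 5600.0000
```

Under that assumption, 1M images per month on the Standard tier comes to $5,600 rather than the $8,000 list-price figure used in the comparison table.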

Comparison methodology & disclaimer

Brainiall measurements: our production infrastructure, May 2026. Models: Brainiall object detection module-Base (Roboflow) + Brainiall Vision Tagger engine-base (MIT, Microsoft). Full report: Phase 1.5 Eval Report.

AWS data: Rekognition DetectLabels claims ~2500 categories (a broader catalog than Brainiall object detection module's 80 COCO classes, which is why we mark the capability ✅ for Rekognition there). However, Rekognition does NOT offer zero-shot prompting (e.g., "find a Brazilian flag" on demand) or image captioning; both were confirmed against AWS documentation as of May 2026.

Notes:

  • Brainiall S9 v1 covers Brainiall object detection module (closed-set ~80 COCO classes) + Brainiall Vision Tagger engine (multi-task: caption + detect + OCR + segment). For broader closed-set coverage similar to Rekognition's 2500 categories, use Brainiall Vision Tagger engine multi-task or wait for v1.1 (planned: OWL-v2 zero-shot for arbitrary prompts).
  • Brainiall Vision Tagger engine is permissively licensed and runs entirely on CPU: no API egress, no cloud dependency.
  • Brainiall object detection module mAP from Roboflow benchmark; methodologies may differ vs AWS evaluation.
  • Trademarks: Amazon Web Services and Rekognition are trademarks of Amazon.com, Inc. This page is an informational comparison and is not endorsed by AWS.

Last reviewed: May 2026.
