Streaming STT API
Voice-agent ready WebSocket transcription

Send audio chunks over a WebSocket and receive partial transcripts as they're produced. Built on the Brainiall Speech (Pro tier) backbone with 99 languages; a drop-in for voice agents and live captioning. The first partial transcript arrives in roughly 1.5 s; tighter buffering is on the roadmap (Phase 2: VAD + 500 ms flush).

How it works

  1. Connect to wss://api.brainiall.com/v1/stt/stream with Bearer authentication.
  2. Send binary frames containing 16 kHz int16 PCM mono audio chunks (any size). Server buffers internally.
  3. Receive JSON text frames every ~1.5 s of accumulated audio: {"type":"partial","text":"...","offset_ms":0,"duration_ms":1500}
  4. Send text frame "__END__" to flush remaining buffer. Receive final transcript: {"type":"final","text":"...","chunks_processed":N}
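# Python client (websockets lib)
import asyncio, json, os, wave
import websockets

async def stream(wav_path):
    # Read the key from the environment; Python does not expand "$VAR" inside strings.
    api_key = os.environ["BRAINIALL_API_KEY"]
    async with websockets.connect(
        "wss://api.brainiall.com/v1/stt/stream",
        # websockets 14+ names this additional_headers; older releases use extra_headers.
        additional_headers=[("Authorization", f"Bearer {api_key}")],
    ) as ws:
        # The file must already be 16 kHz int16 mono PCM.
        with wave.open(wav_path, "rb") as wf:
            data = wf.readframes(wf.getnframes())
        # Send in 200ms chunks
        chunk = 16000 * 2 // 5  # int16 @ 16kHz, 200ms = 6400 bytes
        for i in range(0, len(data), chunk):
            await ws.send(data[i:i+chunk])
            await asyncio.sleep(0.2)  # pace the upload like a live source
        # Flush the remaining buffer and ask for the final transcript.
        await ws.send("__END__")
        async for msg in ws:
            event = json.loads(msg)
            print(event["type"], event["text"])
            if event["type"] == "final":
                break

asyncio.run(stream("audio.wav"))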
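
The client above sends the whole file before it starts reading, so partials are only printed once sending has finished. A minimal sketch of running the sender and receiver concurrently follows. It assumes `ws` is an already-open connection object with an async `send` method and async iteration over incoming messages (as the websockets library provides); the helper names `pcm_chunks`, `send_audio`, `receive_events`, and `transcribe` are illustrative and not part of the Brainiall API.

```python
import asyncio
import json

SAMPLE_RATE = 16_000   # Hz, as required by the endpoint
BYTES_PER_SAMPLE = 2   # int16 mono


def pcm_chunks(pcm: bytes, chunk_ms: int = 200):
    """Yield consecutive chunk_ms slices of raw int16 mono PCM."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


async def send_audio(ws, pcm: bytes, chunk_ms: int = 200, realtime: bool = True):
    """Send PCM chunks, optionally paced like a live source, then flush."""
    for chunk in pcm_chunks(pcm, chunk_ms):
        await ws.send(chunk)
        if realtime:
            await asyncio.sleep(chunk_ms / 1000)
    await ws.send("__END__")


async def receive_events(ws, on_event):
    """Pass each JSON event to on_event; return the final event."""
    async for message in ws:
        event = json.loads(message)
        on_event(event)
        if event.get("type") == "final":
            return event
    return None  # connection closed before a final transcript arrived


async def transcribe(ws, pcm: bytes, on_event=print, realtime: bool = True):
    """Run sender and receiver concurrently on one open connection."""
    _, final = await asyncio.gather(
        send_audio(ws, pcm, realtime=realtime),
        receive_events(ws, on_event),
    )
    return final
```

Inside the `async with websockets.connect(...) as ws:` block, `await transcribe(ws, data)` would then print each partial as it arrives while audio is still being sent.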

Use cases

Voice agents (LLM + tool-use)

Real-time conversational agents need transcripts while the user is still speaking. The WebSocket stream emits partial transcripts roughly every 1.5 s, so your agent can reason in parallel while the user finishes their sentence.

Live captioning

Conferences, lectures, accessibility tools. Stream audio in, render partials on screen with low latency. 99 languages out of the box.
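
One way to render partials as captions is to key them by `offset_ms` and join them in time order. This is only a sketch: it assumes each partial covers its own non-overlapping slice of audio, and `CaptionBuffer` is a hypothetical helper, not part of the API.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Collects partial transcripts and renders them in time order."""
    segments: dict = field(default_factory=dict)  # offset_ms -> text

    def add(self, event: dict) -> None:
        # Only partial frames carry offset_ms; a later frame with the same
        # offset replaces the earlier one.
        if event.get("type") == "partial":
            self.segments[event["offset_ms"]] = event["text"].strip()

    def render(self, max_chars: int = 80) -> str:
        """Return the most recent max_chars characters of caption text."""
        text = " ".join(t for _, t in sorted(self.segments.items()) if t)
        return text[-max_chars:]
```

Calling `add(event)` for every decoded frame and redrawing `render()` after each call keeps the on-screen line limited to the latest text.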

Phone-call analytics

Real-time supervisor dashboards for call centers. Stream agent and customer audio and get partials for compliance flagging without waiting for a batch job.
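
Compliance flagging on partials could be as simple as matching each event's text against a watch list. The sketch below assumes a plain phrase match is sufficient; `WATCH_PHRASES` and `flag_partial` are hypothetical names, and the phrases are placeholders.

```python
import re

# Hypothetical watch list; a real deployment would load its own phrases.
WATCH_PHRASES = ["credit card number", "guaranteed return", "cancel my account"]

# One case-insensitive pattern that matches any watch-list phrase literally.
_PATTERN = re.compile("|".join(re.escape(p) for p in WATCH_PHRASES), re.IGNORECASE)


def flag_partial(event: dict) -> list:
    """Return the watch-list phrases found in an event's text, lowercased."""
    return [m.group(0).lower() for m in _PATTERN.finditer(event.get("text", ""))]
```

Because each partial is currently transcribed in isolation, a phrase split across two partials would be missed; matching against the last two partials joined together is one possible workaround.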

Voice search / dictation

Application voice input where users want to see transcripts appear word-by-word. Similar UX to native OS dictation but through your API.

Multi-language live translation chain

Pair with our Translation API (S10): stream STT → translate partial → emit. An end-to-end live translation pipeline with a single Bearer key.
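
The STT → translate → emit chain can be sketched as an async generator. The Translation API's interface is not described on this page, so `translate` here is a caller-supplied coroutine (hypothetical), and `events` is any async iterable of already-decoded STT events.

```python
async def translated_partials(events, translate, target_lang: str):
    """Yield (event_type, translated_text) for each STT event that has text.

    `events` is an async iterable of decoded STT events; `translate` is a
    caller-supplied coroutine (text, target_lang) -> str. Both stand in for
    whatever client code you already have.
    """
    async for event in events:
        text = event.get("text", "").strip()
        if not text:
            continue  # nothing to translate in an empty segment
        yield event["type"], await translate(text, target_lang)
        if event["type"] == "final":
            break  # the final transcript ends the stream
```

Translating every partial adds one translation round trip to each update, so the emitted text trails the STT partials by at least that much.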

Quality & price comparison

Quality on a 0–10 scale derived from published WER benchmarks (clean-speech benchmarks, multilingual benchmarks, CallHome). Price is per audio-minute in USD.

Provider                        Quality  Price/audio-min        vs market avg  Position
Deepgram Nova-3 (streaming)     9.5/10   $0.0077                61%
AssemblyAI Universal-Streaming  9.0/10   $0.0025                20%
AWS Transcribe Streaming        8.0/10   $0.024                 191%
GCP Chirp 3 streaming           9.0/10   $0.016                 127%
Brainiall STREAMING             8.5/10   $0.0050 (60% cheaper)  40%            Parity

Pricing rule: 90% off when inferior · 80% off at parity · 50% off when superior. Position determined by objective benchmark, refreshed quarterly.
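
The "vs market avg" figures are consistent with dividing each price by the mean of the four competitor prices (about $0.01255 per audio-minute). That reading is an inference from the numbers in the table, not a documented formula; the sketch below reproduces it.

```python
# Per-minute prices (USD) as listed in the comparison table.
COMPETITORS = {
    "Deepgram Nova-3 (streaming)": 0.0077,
    "AssemblyAI Universal-Streaming": 0.0025,
    "AWS Transcribe Streaming": 0.024,
    "GCP Chirp 3 streaming": 0.016,
}
BRAINIALL_PRICE = 0.0050

# Assumption: "market avg" is the plain mean of the four competitor prices.
market_avg = sum(COMPETITORS.values()) / len(COMPETITORS)


def vs_market(price: float) -> int:
    """Price as a whole-number percentage of the market average."""
    return round(100 * price / market_avg)


for name, price in {**COMPETITORS, "Brainiall STREAMING": BRAINIALL_PRICE}.items():
    print(f"{name:32s} ${price:.4f}  {vs_market(price):3d}%")
```

Under this assumption the Brainiall price comes out at 40% of the average, which matches the "60% cheaper" label.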

Where we're honest

  • First-partial latency: ~1.5 s today (buffer-by-time MVP). Deepgram Nova-3 streaming is closer to ~300 ms first-partial. We're working on Silero VAD-based segmentation + 500 ms flush in Phase 2.
  • Sliding window: Each partial is transcribed in isolation today (no overlap). The transcription engine occasionally hallucinates on short isolated segments. Phase 2 adds 500 ms overlap to fix this.
  • Speaker diarization: Not available in streaming mode yet (batch only). Our existing batch /transcribe endpoint includes diarization from the Brainiall Speaker ID engine.
  • Languages: Automatic language detection across 99 languages. For tighter latency, a future language= parameter will skip detection.
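
To illustrate the difference between today's back-to-back segments and the planned 500 ms overlap, the sketch below computes window boundaries only. How the server actually segments audio is not documented here; `windows` and its parameters are illustrative.

```python
def windows(total_ms: int, window_ms: int = 1500, overlap_ms: int = 0):
    """Return (offset_ms, duration_ms) pairs covering total_ms of audio.

    overlap_ms=0 models isolated back-to-back segments; a positive overlap
    makes each window start overlap_ms before the previous one ended.
    """
    if not 0 <= overlap_ms < window_ms:
        raise ValueError("overlap_ms must be in [0, window_ms)")
    step = window_ms - overlap_ms
    out = []
    offset = 0
    while offset < total_ms:
        # The last window is shortened to end exactly at total_ms.
        out.append((offset, min(window_ms, total_ms - offset)))
        if offset + window_ms >= total_ms:
            break
        offset += step
    return out
```

For 4 s of audio, `windows(4000)` gives three isolated segments, while `windows(4000, overlap_ms=500)` gives four windows that each share 500 ms with their neighbor, so a word cut at one boundary appears whole in the next window.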

Pricing

Discount derived from quality position. 80% off at parity with the industry leaders on coverage and accuracy.

Free

$0/mo

60 minutes/month · Brainiall Speech (Pro tier) · Forever free

Starter

$19/mo

1,200 minutes/month · 99 languages · Standard latency

Pro

$99/mo

10,000 minutes/month · Priority queue · 99.5% SLA

Business

$299/mo

50,000 minutes/month · Dedicated capacity · Email + Slack
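
For comparing tiers, the effective per-minute rate at full usage is the monthly price divided by the included minutes. The sketch below assumes no overage billing, which this page does not describe; `effective_rate` and `smallest_plan` are illustrative helpers.

```python
# (name, USD per month, included minutes) as listed above.
PLANS = [
    ("Free", 0, 60),
    ("Starter", 19, 1_200),
    ("Pro", 99, 10_000),
    ("Business", 299, 50_000),
]


def effective_rate(price: float, minutes: int) -> float:
    """USD per audio-minute if every included minute is used."""
    return price / minutes


def smallest_plan(minutes_needed: int):
    """Name of the first listed plan whose quota covers minutes_needed, else None."""
    for name, _, quota in PLANS:
        if minutes_needed <= quota:
            return name
    return None
```

At full usage this works out to roughly $0.0158, $0.0099, and $0.0060 per minute for Starter, Pro, and Business respectively.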

Streaming STT, transparent latency, about one-third below the Deepgram price

