Speech-to-Speech API
Translate the spoken word, keep it spoken
Speech translation powered by the Brainiall Speech-to-Speech engine. Submit a spoken-audio clip and a target language; an asynchronous job transcribes the speech, translates the text and synthesizes audio in the target language. Poll the job for the translated audio, plus the source transcript and the translation, so every result is reviewable. Priced per translation job and self-serve from the first call; job-status polling is always free.
How we compare
Elsewhere, speech-to-speech translation ships either as a real-time streaming SDK inside a broad cloud speech service (Azure AI Speech) or not as a single API at all: the other hyperscalers make you chain transcription, translation and speech synthesis into a pipeline yourself. Brainiall is one REST job: submit a clip and a target language, poll the job, and get translated audio back with both transcripts, self-serve from the first call and priced per job.
| Provider | Shape | Pricing model | Approx. price | Onboarding |
|---|---|---|---|---|
| Brainiall Speech-to-Speech | REST: /create /job — translated audio + source & translated transcripts | Per translation job | $0.04 / job | Self-serve, instant API key |
| Azure AI Speech | Real-time speech-translation SDK inside a broad speech service | Per hour of audio | ~$2.50 / audio-hour | Self-serve, Azure account |
| Google Cloud | Translation API + Text-to-Speech, chained by you | Per character, each API | Two metered APIs | Assemble it yourself |
| AWS | Transcribe + Translate + Polly — three separate APIs, no single call | Per minute + per character + per character | Three separate bills | Assemble it yourself |
Prices are list-price approximations for orientation, not quotes — the hyperscalers meter speech by the hour, the minute or the character, so a per-job figure is not directly comparable. Azure's offering is a real-time stream; Brainiall's is an asynchronous job. Always check each vendor's current terms.
Pricing
One per-job price for the whole pipeline — transcription, translation and synthesis included. The free tier is enough to translate real clips end to end and wire speech-to-speech into a workflow.
Free
$0/mo
30 translation jobs/month · both transcripts included · polling always free
Starter
$19/mo
~600 translation jobs/month · pick the voice and speaking speed
Pro
$79/mo
~3,000 translation jobs/month · priority queue · 99.5% SLA
Business
$299/mo
~15,000 translation jobs/month · dedicated capacity · email + Slack support
PAYG: $0.04 / translation job (Brainiall Speech-to-Speech engine). Job-status polling is always free — you only pay when you create a translation. No minimum spend, no contract — the same single API key and usage-based billing as the rest of the catalog.
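As a rough worked example of the figures above, the sketch below computes the monthly job count at which pay-as-you-go spend matches each plan price, and the effective per-job price if a plan's quota is fully used. It takes the approximate ("~") quotas at face value and assumes the same jobs could be billed either way; overage pricing is not stated on this page, so it is not modeled. All function and variable names are illustrative.

```python
# Prices are kept in integer cents so the arithmetic stays exact.
PAYG_CENTS_PER_JOB = 4  # $0.04 / translation job, from the pricing text above

# name: (monthly price in cents, approximate included jobs per month)
PLANS = {
    "Starter": (1900, 600),
    "Pro": (7900, 3000),
    "Business": (29900, 15000),
}


def break_even_jobs(monthly_cents: int) -> int:
    """Smallest monthly job count at which PAYG costs at least the plan price."""
    # Ceiling division on integers: -(-a // b) == ceil(a / b)
    return -(-monthly_cents // PAYG_CENTS_PER_JOB)


def effective_cents_per_job(monthly_cents: int, included_jobs: int) -> float:
    """Per-job price in cents if every included job is used."""
    return round(monthly_cents / included_jobs, 2)


for name, (price, jobs) in PLANS.items():
    print(
        f"{name}: break-even at {break_even_jobs(price)} jobs/month, "
        f"{effective_cents_per_job(price, jobs)} cents/job at full quota"
    )
```

Under these assumptions, Starter matches PAYG spend at 475 jobs a month, Pro at 1,975 and Business at 7,475; below those volumes PAYG is the cheaper option.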
Two endpoints
```
# Submit audio as raw bytes, a multipart `file` upload, or {"audio": "<base64>"}.

# 1. Create: submit a clip + target language, get a job_id back immediately
POST https://api.brainiall.com/v1/s2s/create
body: {"audio": "<base64>", "target_lang": "es", "source_lang": "en"}
-> {"job_id": "5c2b12c1e5d7415b", "status": "queued",
    "poll_url": "/v1/s2s/job/5c2b12c1e5d7415b",
    "engine": "Brainiall Speech-to-Speech engine"}

# 2. Job: poll until the translated audio is ready (polling is free)
GET https://api.brainiall.com/v1/s2s/job/5c2b12c1e5d7415b
-> {"status": "completed", "progress": 1.0,
    "source_lang": "en", "target_lang": "es",
    "source_text": "Hello, this is a speech-to-speech translation test.",
    "translated_text": "Hola, este es un test de traducción de habla a habla.",
    "result": {"audio": "<base64 wav>", "audio_format": "wav",
               "voice": "af_heart"}}
```

`create` is asynchronous: it returns a `job_id` instantly and runs the pipeline in the background, so a long clip never blocks your request. Poll `job` for progress until `status` is `completed`; polling never counts against your quota. Optional parameters let you pin the `source_lang`, pick a voice and set the speaking speed.
How speech-to-speech works
One create call runs the whole pipeline. Every stage is explainable — the finished job hands back the transcripts it produced along the way.
- Transcribe the speech. The spoken clip is transcribed to text; if you do not pass a `source_lang`, it is detected automatically.
- Translate the transcript. The text is translated into the target language you asked for, so meaning is carried before any audio is synthesized.
- Synthesize the target speech. The translation is spoken back in a natural voice — choose the voice and the speaking speed, or take the defaults.
- Both transcripts, always. The finished job returns the source transcript and the translation alongside the audio, so every result is reviewable and easy to caption.
- One async job. `create` returns a `job_id` instantly and the pipeline runs in the background; a short clip is typically ready in well under a minute.
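The create-then-poll flow described above might be driven from Python roughly as follows. This is a minimal sketch, not an official client: the endpoint paths and JSON fields come from the sample on this page, while the helper names, the polling interval and the timeout are assumptions. The authentication header scheme is not given here, so the caller passes `headers` in whatever form the API reference specifies. Only the `queued` and `completed` statuses shown on this page are relied on; anything else is treated as "still running" until the timeout.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.brainiall.com"


def build_create_body(audio: bytes, target_lang: str,
                      source_lang: Optional[str] = None) -> dict:
    """Build the JSON body for POST /v1/s2s/create (base64 variant)."""
    body = {"audio": base64.b64encode(audio).decode("ascii"),
            "target_lang": target_lang}
    if source_lang is not None:  # omit it to let the service detect the language
        body["source_lang"] = source_lang
    return body


def http_json(method: str, url: str, headers: dict,
              body: Optional[dict] = None) -> dict:
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json", **headers})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(fetch_job: Callable[[], dict], interval: float = 2.0,
                 timeout: float = 120.0, sleep=time.sleep,
                 clock=time.monotonic) -> dict:
    """Call fetch_job() until it reports status 'completed' or time runs out."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job.get("status") == "completed":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)


def translate_clip(audio: bytes, target_lang: str, headers: dict,
                   source_lang: Optional[str] = None) -> dict:
    """Create a job, then poll the returned poll_url until it completes."""
    created = http_json("POST", BASE_URL + "/v1/s2s/create", headers,
                        build_create_body(audio, target_lang, source_lang))
    poll_url = BASE_URL + created["poll_url"]
    return wait_for_job(lambda: http_json("GET", poll_url, headers))
```

Passing the fetch function, sleep and clock into `wait_for_job` keeps the polling loop testable without network access; a production client would likely also want backoff and handling for whatever error statuses the API reference defines.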
What it's for
- Support & contact centers: let an agent and a caller who speak different languages hear each other in their own language.
- Localization & media: re-voice an announcement, a message or a short clip into another language without booking a studio.
- Accessibility & reach: pair the translated audio with the source and translated transcripts for captions and searchable copy in both languages.
- Language learning: let learners hear exactly how a phrase sounds spoken aloud in the language they are studying.
- Voice messaging & apps: translate a spoken message into the recipient's language before you deliver it.
- One bill, one key: same Brainiall API key and usage-based billing as the rest of the catalog — no separate speech vendor to procure.
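For the accessibility and captioning use case above, a completed job can be unpacked into an audio file plus two transcript files. The sketch below assumes the job payload has the shape shown in the sample response on this page; the function name and the output file naming are illustrative choices, not part of the API.

```python
import base64
from pathlib import Path


def write_review_bundle(job: dict, out_dir: str) -> dict:
    """Write translated audio and both transcripts from a completed job to disk."""
    if job.get("status") != "completed":
        raise ValueError("job has no result yet")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    result = job["result"]
    stem = f"clip_{job['source_lang']}_to_{job['target_lang']}"

    # The audio arrives base64-encoded; audio_format gives the file extension.
    audio_path = out / f"{stem}.{result['audio_format']}"
    audio_path.write_bytes(base64.b64decode(result["audio"]))

    # Keep both transcripts next to the audio for captions and review.
    source_path = out / f"{stem}.source.txt"
    source_path.write_text(job["source_text"], encoding="utf-8")
    translated_path = out / f"{stem}.translated.txt"
    translated_path.write_text(job["translated_text"], encoding="utf-8")

    return {"audio": audio_path, "source": source_path,
            "translated": translated_path}
```

Writing the transcripts as plain UTF-8 text keeps them searchable and easy to feed into a captioning tool of your choice.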
Press kit & resources
What reviewers, integrators and procurement teams typically ask for.
OpenAPI spec
The machine-readable OpenAPI 3.1 definition for the whole catalog, including both speech-to-speech endpoints and their schemas.
View spec
API reference
OpenAPI spec, the request/response schema, the job lifecycle, error codes and rate limits.
Read docs →
Compare the catalog
How Brainiall's specialty APIs line up against AWS, Azure and the specialists, use case by use case.
See the comparison →
More specialty APIs
Same single API key, same usage-based pricing, different problem solved.



