
Speech-to-Speech API
Translate the spoken word, keep it spoken

Speech translation powered by the Brainiall Speech-to-Speech engine. Submit a spoken-audio clip and a target language; an asynchronous job transcribes the speech, translates the text and synthesizes audio in the target language. Poll the job for the translated audio, plus the source transcript and the translation, so every result is reviewable. Priced per translation job and self-serve from the first call; job-status polling is always free.

How we compare

Elsewhere, speech-to-speech translation ships either as a real-time streaming SDK inside a broad cloud speech service (Azure AI Speech) or not as a single API at all: the other hyperscalers make you chain transcription, translation and speech synthesis into a pipeline yourself. Brainiall is one REST job: submit a clip and a target language, poll the job, and get translated audio back with both transcripts, self-serve from the first call and priced per job.

Provider | Shape | Pricing model | Approx. price | Onboarding
---------|-------|---------------|---------------|-----------
Brainiall Speech-to-Speech | REST: /create + /job, returning translated audio plus source and translated transcripts | Per translation job | $0.04 / job | Self-serve, instant API key
Azure AI Speech | Real-time speech-translation SDK inside a broad speech service | Per hour of audio | ~$2.50 / audio-hour | Self-serve, Azure account
Google Cloud | Translation API + Text-to-Speech, chained by you | Per character, each API | Two metered APIs | Assemble it yourself
AWS | Transcribe + Translate + Polly: three separate APIs, no single call | Per minute + per character + per character | Three separate bills | Assemble it yourself

Prices are list-price approximations for orientation, not quotes — the hyperscalers meter speech by the hour, the minute or the character, so a per-job figure is not directly comparable. Azure's offering is a real-time stream; Brainiall's is an asynchronous job. Always check each vendor's current terms.
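
To make that caveat concrete, the two list prices quoted in the table can be converted into a break-even clip length. This is a rough sketch that uses only the approximate figures from the table ($0.04 per job, ~$2.50 per audio-hour) and assumes linear per-second metering with no minimums or rounding, which real vendor billing may not follow. The function names are illustrative, not part of any API.

```python
PER_JOB_USD = 0.04         # per-job list price quoted in the table
PER_AUDIO_HOUR_USD = 2.50  # approximate hourly list price quoted in the table


def break_even_seconds(per_job: float, per_hour: float) -> float:
    """Clip length at which one per-job charge equals the hourly-metered charge."""
    return per_job / per_hour * 3600


def hourly_cost(clip_seconds: float, per_hour: float) -> float:
    """Cost of one clip under linear per-second metering (an assumption)."""
    return clip_seconds / 3600 * per_hour


if __name__ == "__main__":
    seconds = break_even_seconds(PER_JOB_USD, PER_AUDIO_HOUR_USD)
    print(f"break-even clip length: {seconds:.1f} s")
    print(f"30 s clip, hourly metering: ${hourly_cost(30, PER_AUDIO_HOUR_USD):.4f}")
```

Under these assumptions the two models cost the same for a clip of just under a minute; shorter clips favour hourly metering and longer clips favour the flat per-job price. The figures say nothing about what each price includes, so treat the result as orientation only.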

Pricing

One per-job price for the whole pipeline — transcription, translation and synthesis included. The free tier is enough to translate real clips end to end and wire speech-to-speech into a workflow.

Free

$0/mo

30 translation jobs/month · both transcripts included · polling always free

Starter

$19/mo

~600 translation jobs/month · pick the voice and speaking speed

Pro

$79/mo

~3,000 translation jobs/month · priority queue · 99.5% SLA

Business

$299/mo

~15,000 translation jobs/month · dedicated capacity · email + Slack support

PAYG: $0.04 / translation job (Brainiall Speech-to-Speech engine). Job-status polling is always free — you only pay when you create a translation. No minimum spend, no contract — the same single API key and usage-based billing as the rest of the catalog.
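
For choosing between PAYG and a monthly plan, the arithmetic can be sketched as below. It uses only the prices and approximate quotas listed above, works in integer cents to avoid floating-point drift, and assumes a plan's quota is fully used with no overage (overage terms are not stated on this page). The names are illustrative.

```python
PAYG_CENTS = 4  # $0.04 per translation job

# plan name -> (monthly fee in cents, approximate jobs per month), as listed above
PLANS = {
    "Starter": (1900, 600),
    "Pro": (7900, 3000),
    "Business": (29900, 15000),
}


def effective_price_usd(fee_cents: int, jobs: int) -> float:
    """Per-job price in dollars if the whole monthly quota is used."""
    return fee_cents / jobs / 100


def break_even_jobs(fee_cents: int) -> int:
    """Smallest monthly job count at which PAYG costs at least the plan fee."""
    return -(-fee_cents // PAYG_CENTS)  # ceiling division on integers


if __name__ == "__main__":
    for name, (fee, jobs) in PLANS.items():
        print(f"{name}: ${effective_price_usd(fee, jobs):.4f}/job at full quota, "
              f"cheaper than PAYG from {break_even_jobs(fee)} jobs/month")
```

With the listed figures, Starter matches PAYG at 475 jobs a month, Pro at 1,975 and Business at 7,475; below those volumes PAYG is the cheaper option.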

Two endpoints

# Submit audio as raw bytes, a multipart `file` upload, or {"audio": "<base64>"}.

# 1. Create — submit a clip + target language, get a job_id back immediately
POST https://api.brainiall.com/v1/s2s/create
  body: {"audio": "<base64>", "target_lang": "es", "source_lang": "en"}
  -> {"job_id": "5c2b12c1e5d7415b", "status": "queued",
      "poll_url": "/v1/s2s/job/5c2b12c1e5d7415b",
      "engine": "Brainiall Speech-to-Speech engine"}

# 2. Job — poll until the translated audio is ready (polling is free)
GET https://api.brainiall.com/v1/s2s/job/5c2b12c1e5d7415b
  -> {"status": "completed", "progress": 1.0,
      "source_lang": "en", "target_lang": "es",
      "source_text": "Hello, this is a speech-to-speech translation test.",
      "translated_text": "Hola, este es un test de traducción de habla a habla.",
      "result": {"audio": "<base64 wav>", "audio_format": "wav",
                 "voice": "af_heart"}}

create is asynchronous: it returns a job_id instantly and runs the pipeline in the background, so a long clip never blocks your request. Poll job for progress until status is completed — polling never counts against your quota. Optional parameters let you pin the source_lang, pick a voice and set the speaking speed.
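
The create-then-poll flow can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the bearer-token Authorization header, the "failed" and "error" status values, and the polling interval and timeout are assumptions that do not appear on this page, so check the API reference before relying on them. The URLs, body fields and the "queued" and "completed" statuses are taken from the example above.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Optional

API_BASE = "https://api.brainiall.com/v1/s2s"


def auth_headers(api_key: str) -> dict:
    # Assumption: bearer-token auth; the page does not state the header format.
    return {"Authorization": f"Bearer {api_key}"}


def build_create_request(audio: bytes, target_lang: str, api_key: str,
                         source_lang: Optional[str] = None) -> urllib.request.Request:
    """Build POST /create using the JSON {"audio": "<base64>"} submission form."""
    body = {"audio": base64.b64encode(audio).decode("ascii"),
            "target_lang": target_lang}
    if source_lang is not None:  # omit to let the service detect the language
        body["source_lang"] = source_lang
    headers = {"Content-Type": "application/json", **auth_headers(api_key)}
    return urllib.request.Request(f"{API_BASE}/create",
                                  data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET /job/<job_id> once."""
    req = urllib.request.Request(f"{API_BASE}/job/{job_id}",
                                 headers=auth_headers(api_key))
    return send_json(req)


def wait_for_job(fetch: Callable[[], dict], interval: float = 2.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the job reports status "completed"."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        status = job.get("status")
        if status == "completed":
            return job
        if status in ("failed", "error"):  # assumed terminal failure statuses
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete before the timeout")
        sleep(interval)
```

A typical call sequence would be `created = send_json(build_create_request(clip, "es", key))` followed by `job = wait_for_job(lambda: fetch_job(created["job_id"], key))`. Passing the fetch function in as a callable keeps the polling loop testable without network access.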

How speech-to-speech works

One create call runs the whole pipeline. Every stage is explainable — the finished job hands back the transcripts it produced along the way.

  • Transcribe the speech. The spoken clip is transcribed to text; if you do not pass a source_lang it is detected automatically.
  • Translate the transcript. The text is translated into the target language you asked for, so meaning is carried before any audio is synthesized.
  • Synthesize the target speech. The translation is spoken back in a natural voice — choose the voice and the speaking speed, or take the defaults.
  • Both transcripts, always. The finished job returns the source transcript and the translation alongside the audio, so every result is reviewable and easy to caption.
  • One async job. create returns a job_id instantly and the pipeline runs in the background; a short clip is typically ready in well under a minute.
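
Once a job is completed, the stages described above all surface in one response. The sketch below unpacks it: it decodes the base64 audio to a file and returns the two transcripts side by side, ready for review or captioning. Field names follow the example job response on this page; the function name, the output file naming and the fallback to "wav" when audio_format is missing are illustrative assumptions.

```python
import base64
from pathlib import Path


def save_translation(job: dict, out_dir: str = ".") -> dict:
    """Write the translated audio to disk and return the paired transcripts."""
    if job.get("status") != "completed":
        raise ValueError("job is not completed yet")

    result = job["result"]
    audio = base64.b64decode(result["audio"])
    fmt = result.get("audio_format", "wav")  # assumed fallback if the field is absent

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    audio_path = out / f"translated_{job['target_lang']}.{fmt}"
    audio_path.write_bytes(audio)

    return {
        "audio_path": audio_path,
        "source_lang": job["source_lang"],
        "source_text": job["source_text"],
        "target_lang": job["target_lang"],
        "translated_text": job["translated_text"],
    }
```

Keeping the source and translated text together in the returned record makes it straightforward to show both to a reviewer or to write them out as caption tracks in each language.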

What it's for

  • Support & contact centers: let an agent and a caller who speak different languages hear each other in their own language.
  • Localization & media: re-voice an announcement, a message or a short clip into another language without booking a studio.
  • Accessibility & reach: pair the translated audio with the source and translated transcripts for captions and searchable copy in both languages.
  • Language learning: let learners hear exactly how a phrase sounds spoken aloud in the language they are studying.
  • Voice messaging & apps: translate a spoken message into the recipient's language before you deliver it.
  • One bill, one key: same Brainiall API key and usage-based billing as the rest of the catalog — no separate speech vendor to procure.

Press kit & resources

What reviewers, integrators and procurement teams typically ask for.

OpenAPI spec

The machine-readable OpenAPI 3.1 definition for the whole catalog, including both speech-to-speech endpoints and their schemas.

View spec

API reference

OpenAPI spec, the request/response schema, the job lifecycle, error codes and rate limits.

Read docs →

Try it now

Free API key in 30 seconds — 30 translation jobs/month, no card.

Get a key →

Compare the catalog

How Brainiall's specialty APIs line up against AWS, Azure and the specialists, use case by use case.

See the comparison →

More specialty APIs

Same single API key, same usage-based pricing, different problem solved.
