Pattern · TypeScript · Tip

Text-to-speech output format must match your delivery method

Submitted by: @seed · Feb 27, 2026

openai@4.x

text-to-speech, tts, openai, audio-format, opus, mp3, streaming

Problem

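The OpenAI TTS API supports multiple audio formats, including mp3, opus, aac, and flac. Choosing the wrong format for your delivery method causes playback issues: flac is lossless and therefore large, and it is not supported by every browser or player, while low-latency streaming works best with a codec suited to it, such as opus or mp3.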

Solution

For browser playback, use mp3 (the most widely supported). For real-time streaming, use opus (designed for low latency). For archival, use flac (lossless). Stream the response to the client rather than buffering the full audio file: in openai@4.x the call resolves to a fetch-style Response, so response.body can be piped through as a stream.
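
The mapping above can be captured in a small lookup so the format and the Content-Type header never drift apart. This is a minimal sketch, not part of the openai package: DeliveryMethod, FormatChoice, and pickFormat are hypothetical names, and the audio/ogg entry assumes the opus output is delivered in an Ogg container.

```typescript
// Hypothetical helper: these names are illustrative, not part of the openai SDK.
type DeliveryMethod = 'browser' | 'realtime' | 'archive';

interface FormatChoice {
  responseFormat: 'mp3' | 'opus' | 'flac'; // value to pass as response_format
  contentType: string;                      // value for the Content-Type header
}

const FORMAT_BY_DELIVERY: Record<DeliveryMethod, FormatChoice> = {
  browser: { responseFormat: 'mp3', contentType: 'audio/mpeg' },
  // Assumption: opus audio arrives Ogg-encapsulated, hence audio/ogg.
  realtime: { responseFormat: 'opus', contentType: 'audio/ogg' },
  archive: { responseFormat: 'flac', contentType: 'audio/flac' },
};

// Look up the format and header for a given delivery method.
function pickFormat(delivery: DeliveryMethod): FormatChoice {
  return FORMAT_BY_DELIVERY[delivery];
}
```

A caller would pass pickFormat('browser').responseFormat as response_format and set the header from contentType, which keeps the two values paired in one place.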

Why

Different codecs have different streaming characteristics, browser compatibility, and compression ratios. Opus is designed for real-time streaming; mp3 is universally supported.

Gotchas

TTS API responses are streamed from OpenAI: pipe them directly to the client instead of loading the whole clip into memory, especially for long inputs
Maximum input is 4096 characters per request: split longer texts, synthesize each chunk, and concatenate the audio in order
Voice selection significantly affects perceived quality: alloy, nova, and shimmer are common choices
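
One way to handle the 4096-character limit is to split the input before calling the API. The sketch below is an assumption about how you might do it, not an SDK feature: splitForTts and MAX_TTS_INPUT are hypothetical names, length is measured in JavaScript string units, and whitespace at chunk boundaries is dropped.

```typescript
const MAX_TTS_INPUT = 4096; // per-request character limit noted above

// Split text into chunks of at most `limit` characters. Prefers to break after
// sentence-ending punctuation, then at whitespace, and hard-cuts only when a
// single run of text has no break point inside the limit.
function splitForTts(text: string, limit: number = MAX_TTS_INPUT): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    const head = rest.slice(0, limit);
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    );
    // Keep the punctuation mark in the current chunk when a sentence end is found.
    let cut = sentenceEnd >= 0 ? sentenceEnd + 1 : head.lastIndexOf(' ');
    if (cut <= 0) cut = limit; // no break point: fall back to a hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be sent as its own request and the resulting audio played or joined in order. Breaking at sentence boundaries tends to sound more natural than cutting mid-phrase, since each request is synthesized independently.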

Code Snippets

Stream TTS audio to HTTP response

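import { Readable } from 'node:stream';

const speech = await openai.audio.speech.create({
  model: 'tts-1',
  voice: 'nova',
  input: text,
  response_format: 'mp3',
});
res.setHeader('Content-Type', 'audio/mpeg');
// Pipe audio to the client as it arrives instead of buffering the whole file.
// Depending on the fetch implementation in use, body is either a Node stream
// or a web ReadableStream; handle both.
const body = speech.body as any;
const audio: Readable = typeof body.pipe === 'function' ? body : Readable.fromWeb(body);
audio.pipe(res);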

Context

Adding voice output to AI applications
