pattern · typescript · Tip

Batch embedding generation is far faster than individual requests, and the Batch API halves the cost

Submitted by: @seed · Feb 27, 2026

openai@4.x

embeddings · batch · bulk · cost · text-embedding · vectorize

Problem

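Embedding each document individually results in one API call per document. For thousands of documents this is extremely slow, because every call pays its own network round trip, and it uses up request-rate quota quickly. Token charges are the same either way, since embeddings are billed per input token.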

Solution

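Call the embeddings endpoint with an array as the input. Pass up to 2048 strings in a single call, keeping each string within the model's per-input token limit. The response's data array contains one embedding per input, in input order, and each item also carries an index field. For very large corpora that do not need results immediately, use OpenAI's asynchronous Batch API, which costs 50% less per token and completes within a 24-hour window.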
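
For the Batch API route, the input is a JSONL file with one request per line. The sketch below builds that file's contents for the /v1/embeddings endpoint. The function name toBatchJsonl and the doc-N custom_id scheme are hypothetical choices for illustration, not part of the openai package.

```typescript
// Shape of one Batch API request line targeting the embeddings endpoint.
interface BatchEmbeddingLine {
  custom_id: string;
  method: 'POST';
  url: '/v1/embeddings';
  body: { model: string; input: string };
}

// Hypothetical helper: turns documents into JSONL, one request per non-blank text.
// custom_id encodes the original position so results can be matched back later.
function toBatchJsonl(texts: string[], model = 'text-embedding-3-small'): string {
  const lines: string[] = [];
  texts.forEach((text, index) => {
    if (text.trim().length === 0) return; // blank inputs would be rejected
    const line: BatchEmbeddingLine = {
      custom_id: `doc-${index}`,
      method: 'POST',
      url: '/v1/embeddings',
      body: { model, input: text },
    };
    lines.push(JSON.stringify(line));
  });
  return lines.join('\n');
}
```

The resulting string would be saved as a .jsonl file, uploaded with purpose 'batch', and submitted as a batch against /v1/embeddings. Lines in the output file may come back in a different order, so match results by custom_id, not by position.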

Why

Batching amortizes HTTP overhead across many embeddings. The Batch API processes jobs asynchronously at half the per-token cost.
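
To make the amortization concrete, here is a rough back-of-the-envelope sketch. The 200 ms round-trip figure is an assumed value chosen for illustration, not a measurement, and requestCount is a hypothetical helper.

```typescript
// Number of HTTP requests needed to embed `documents` texts at a given batch size.
function requestCount(documents: number, batchSize: number): number {
  return Math.ceil(documents / batchSize);
}

const docs = 10_000;
const roundTripMs = 200; // assumed per-request overhead, for illustration only

const oneByOne = requestCount(docs, 1);  // 10000 requests
const batched = requestCount(docs, 100); // 100 requests

// Overhead if requests are issued sequentially, ignoring model compute time.
console.log(
  `sequential overhead: ${(oneByOne * roundTripMs) / 1000}s vs ${(batched * roundTripMs) / 1000}s`,
);
```

Under those assumptions the per-request overhead drops from about 2000 seconds to about 20 seconds, while the number of tokens billed stays the same.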

Gotchas

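Each input string must stay within the model's per-input token limit (8191 for text-embedding-3-small); the limit applies to every array element individually. Requests can also be capped on total tokens across all inputs, so check the current API reference before sending very large arrays
Order of embeddings in the response matches order of inputs, and each item carries an index field; keep your document IDs in the same order as the input array or results will be mismatched
Empty strings cause API errors; filter them before batching, and record the original positions so results still line up with your documents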
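
The three gotchas above can be handled together when planning batches. This is a minimal sketch, assuming a crude 4-characters-per-token estimate and an arbitrary per-batch token budget; planBatches, estimateTokens, and IndexedText are hypothetical names, and a real tokenizer should replace the estimate when exact counts matter.

```typescript
// Each planned item keeps its position in the original array.
interface IndexedText {
  index: number;
  text: string;
}

// Rough heuristic (assumption): about 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function planBatches(
  texts: string[],
  maxItems = 2048,             // array-size limit per request
  maxTokensPerInput = 8191,    // per-input limit for text-embedding-3-small
  maxTokensPerBatch = 100_000, // assumed budget, not a documented limit
): IndexedText[][] {
  const batches: IndexedText[][] = [];
  let current: IndexedText[] = [];
  let currentTokens = 0;

  texts.forEach((text, index) => {
    if (text.trim().length === 0) return; // skip blanks, the API rejects them
    const tokens = estimateTokens(text);
    if (tokens > maxTokensPerInput) {
      throw new Error(`Input ${index} is too long (~${tokens} tokens); split it first`);
    }
    // Start a new batch when adding this text would exceed either limit.
    const full = current.length >= maxItems || currentTokens + tokens > maxTokensPerBatch;
    if (current.length > 0 && full) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push({ index, text });
    currentTokens += tokens;
  });

  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch's text values can be sent as one input array, and the stored index values map the returned embeddings back to the original documents without sorting or re-filtering.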

Code Snippets

Batch embedding in fixed-size chunks (batchSize caps the number of strings per request, not the token count)

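import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Returns one entry per input; blank inputs yield null so positions stay aligned with texts.
async function embedBatch(texts: string[], batchSize = 100): Promise<(number[] | null)[]> {
  const embeddings: (number[] | null)[] = new Array(texts.length).fill(null);
  // Drop blank strings (the API rejects them) but remember each text's original position.
  const kept = texts.map((text, index) => ({ text, index })).filter(t => t.text.trim().length > 0);
  for (let i = 0; i < kept.length; i += batchSize) {
    const batch = kept.slice(i, i + batchSize);
    const res = await openai.embeddings.create({ model: 'text-embedding-3-small', input: batch.map(t => t.text) });
    // e.index is the position within this request's input array.
    for (const e of res.data) embeddings[batch[e.index].index] = e.embedding;
  }
  return embeddings;
}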

Context

Indexing large document collections for semantic search or RAG
