pattern, typescript, Tip
Batch embedding generation needs 10-100x fewer API requests than embedding documents one at a time
openai@4.x
embeddings, batch, bulk, cost, text-embedding, vectorize
Problem
Embedding each document individually results in one API call per document. For thousands of documents this is extremely slow and wasteful: every call pays a full network round trip and counts against the account's request rate limit, even though the embedding work itself is small.
Solution
Pass an array of strings as `input` to the embeddings endpoint: up to 2048 strings per call, each within the model's per-input token limit. The response's `data` array holds one embedding per input, in input order. For very large corpora that can tolerate delayed results, use OpenAI's asynchronous Batch API, which is billed at a 50% discount.
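As a rough sketch of the Batch API route, the input file is JSONL with one request object per line. The helper below only builds that file's contents. `buildBatchJsonl`, the `chunk-<offset>` ID scheme, and the `perRequest` default are hypothetical choices for illustration, not part of the SDK.

```typescript
// One request line in a Batch API input file (JSONL: one JSON object per line).
interface BatchLine {
  custom_id: string; // our own ID, echoed back alongside the result
  method: "POST";
  url: "/v1/embeddings";
  body: { model: string; input: string[] };
}

// Hypothetical helper: group texts into request lines of `perRequest` inputs each.
function buildBatchJsonl(
  texts: string[],
  model = "text-embedding-3-small",
  perRequest = 100, // assumed chunk size, tune to your documents
): string {
  const lines: BatchLine[] = [];
  for (let i = 0; i < texts.length; i += perRequest) {
    lines.push({
      custom_id: `chunk-${i}`, // encodes the offset of the first text in this chunk
      method: "POST",
      url: "/v1/embeddings",
      body: { model, input: texts.slice(i, i + perRequest) },
    });
  }
  return lines.map((line) => JSON.stringify(line)).join("\n");
}
```

The resulting string would then be uploaded as a file with purpose `batch` and submitted as a batch job against the `/v1/embeddings` endpoint. Match results back to inputs by `custom_id` rather than by line position in the output file.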
Why
Batching amortizes HTTP overhead across many embeddings. The Batch API processes jobs asynchronously at half the per-token cost.
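To make the amortization concrete, here is a small back-of-the-envelope calculation. The 200 ms round trip is an assumed figure for illustration, not a measurement, and both function names are hypothetical.

```typescript
// Number of HTTP requests needed to embed `docCount` documents.
function requestCount(docCount: number, batchSize: number): number {
  return Math.ceil(docCount / batchSize);
}

// Time spent on round trips alone when requests run one after another.
// `roundTripMs` is an assumed per-request latency, not a measured value.
function roundTripOverheadMs(docCount: number, batchSize: number, roundTripMs: number): number {
  return requestCount(docCount, batchSize) * roundTripMs;
}

console.log(requestCount(10000, 1)); // 10000
console.log(requestCount(10000, 100)); // 100
console.log(roundTripOverheadMs(10000, 1, 200)); // 2000000 (about 33 minutes)
console.log(roundTripOverheadMs(10000, 100, 200)); // 20000 (20 seconds)
```

Under these assumptions, batching 100 documents per request cuts the round-trip overhead by the same factor of 100 as the request count.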
Gotchas
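- Each input string must not exceed the model's per-input token limit (8191 for text-embedding-3-small). The limit applies to every string separately, not to the request as a whole; check the API reference for any additional cap on total tokens per request
- Embeddings are returned in input order, and each item carries an `index` field giving its position in the request. If you sort, filter, or deduplicate inputs before sending, keep a mapping back to the source documents or the vectors will be mismatched
- Empty strings cause API errors; filter them before batching, and record which positions were dropped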
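The gotchas above can be handled together by chunking on an estimated token budget while remembering each text's original position. This is a minimal sketch: `chunkByTokenBudget` and `estimateTokens` are hypothetical names, the 4-characters-per-token estimate is a crude assumption (use a real tokenizer for exact counts), and `maxTokensPerRequest` is a budget you choose yourself.

```typescript
interface Chunk {
  texts: string[];
  indices: number[]; // position of each text in the original array
}

// Crude estimate, assuming roughly 4 characters per token (illustration only).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function chunkByTokenBudget(
  texts: string[],
  maxTokensPerRequest: number, // assumed budget, check the API reference
  maxInputsPerRequest = 2048,
): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { texts: [], indices: [] };
  let used = 0;

  texts.forEach((text, index) => {
    if (text.trim().length === 0) return; // skip blank inputs entirely

    const cost = estimateTokens(text);
    const full =
      current.texts.length >= maxInputsPerRequest || used + cost > maxTokensPerRequest;

    // Close the current chunk before it would overflow either limit.
    if (full && current.texts.length > 0) {
      chunks.push(current);
      current = { texts: [], indices: [] };
      used = 0;
    }

    current.texts.push(text);
    current.indices.push(index);
    used += cost;
  });

  if (current.texts.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk's `texts` can be sent as one request, and `indices` maps the returned vectors back to the source documents. A single text larger than the budget still gets a chunk of its own; this sketch does not split or truncate it.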
Code Snippets
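Batch embedding in fixed-size chunks, skipping blank inputs while preserving input order
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Returns one entry per input, in input order; blank inputs map to null.
async function embedBatch(texts: string[], batchSize = 100): Promise<(number[] | null)[]> {
  const embeddings: (number[] | null)[] = new Array(texts.length).fill(null);
  // Drop blank strings up front, remembering each survivor's original position.
  const kept = texts.map((text, index) => ({ text, index })).filter(t => t.text.trim().length > 0);
  for (let i = 0; i < kept.length; i += batchSize) {
    const batch = kept.slice(i, i + batchSize);
    const res = await openai.embeddings.create({ model: 'text-embedding-3-small', input: batch.map(t => t.text) });
    // e.index is the item's position within this request's input array.
    for (const e of res.data) embeddings[batch[e.index].index] = e.embedding;
  }
  return embeddings;
}
Context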
Indexing large document collections for semantic search or RAG