principle · typescript · Moderate

LLM cost optimization requires tracking tokens per request in production

Submitted by: @seed
Tags: cost-optimization, token-tracking, billing, observability, monitoring, budget

Problem

LLM API costs scale directly with token usage, but that usage is invisible without instrumentation. Teams discover unexpected bills at month-end and cannot tell which features or users are responsible.

Solution

Log input_tokens, output_tokens, model, user_id, and the calling feature for every LLM call. Store the records in a time-series database or append them to structured logs. Build a dashboard showing daily cost by feature, model tier, and user segment. Set billing alerts at 50% and 80% of budget so overspend surfaces before month-end.
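A minimal sketch of the aggregation and alerting steps, assuming each call has already been logged with an estimated cost. The `UsageRecord` shape and the `dailyCostByFeature` and `crossedThresholds` helpers are hypothetical names for illustration, not part of any library.

```typescript
// Hypothetical shape of one logged LLM call; fields mirror the log fields described above.
interface UsageRecord {
  timestamp: string; // assumed to be ISO 8601 in UTC, e.g. "2024-05-01T12:00:00Z"
  feature: string;
  model: string;
  userId: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Sum cost per (UTC day, feature) pair, keyed as "YYYY-MM-DD|feature".
function dailyCostByFeature(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const day = r.timestamp.slice(0, 10); // date part of the ISO timestamp
    const key = `${day}|${r.feature}`;
    totals.set(key, (totals.get(key) ?? 0) + r.costUsd);
  }
  return totals;
}

// Return the alert thresholds (fractions of budget) that current spend has reached.
function crossedThresholds(
  spentUsd: number,
  budgetUsd: number,
  thresholds: number[] = [0.5, 0.8],
): number[] {
  return thresholds.filter((t) => spentUsd >= t * budgetUsd);
}
```

The same grouping works for model tier or user segment by changing the key. In practice the aggregation would more likely be a query in the time-series store than application code.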

Why

Output tokens cost 3-5x more than input tokens on most models. A single careless prompt that generates long responses can therefore account for a share of the bill far larger than its share of requests. You can only optimize what you measure.
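To make the asymmetry concrete, here is a small cost-estimator sketch. The rates are illustrative gpt-4o figures (USD 2.50 per 1M input tokens, USD 10 per 1M output tokens) and are assumptions here; `PRICES_USD_PER_1M` and `estimateCostUsd` are hypothetical names.

```typescript
// Illustrative rates in USD per 1M tokens (assumed; check your provider's current pricing).
const PRICES_USD_PER_1M: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
};

// Estimate the cost of one call from its token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES_USD_PER_1M[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  return (inputTokens / 1e6) * price.input + (outputTokens / 1e6) * price.output;
}

// Same total of 3,000 tokens, split differently between input and output.
const inputHeavy = estimateCostUsd("gpt-4o", 2000, 1000);
const outputHeavy = estimateCostUsd("gpt-4o", 1000, 2000);
```

Under these assumed rates the output-heavy request costs about $0.0225 against $0.015 for the input-heavy one, 1.5x as much for the same token total. This is why output length is usually the first thing to look at.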

Gotchas

  • System prompts are repeated on every request: a 2000-token system prompt at 1M requests/month is 2 billion input tokens per month, roughly $5,000 at $2.50 per 1M input tokens
  • Cache prompt tokens where available (Anthropic prompt caching, OpenAI prompt caching) to reduce input costs, and log cached tokens separately because they are billed at a different rate
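  • Streaming responses can still report token counts, but you may have to ask for them: with OpenAI Chat Completions, set stream_options: { include_usage: true } and read usage from the final chunk. Don't forget to capture it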
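The streaming point can be sketched as a helper that drains a stream and keeps the last usage payload it sees. The `StreamChunk` type is a simplified, hypothetical stand-in for a provider's chunk type. It is modeled on OpenAI Chat Completions chunks with `stream_options: { include_usage: true }` set, where usage arrives on a final chunk with an empty `choices` array. `collectStream` and `fakeStream` are illustrative names.

```typescript
// Simplified, hypothetical chunk shape modeled on OpenAI Chat Completions streaming.
interface StreamChunk {
  choices: { delta: { content?: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number } | null;
}

type Usage = NonNullable<StreamChunk["usage"]>;

// Drain the stream, concatenating text and remembering the usage payload if one arrives.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
): Promise<{ text: string; usage: Usage | null }> {
  let text = "";
  let usage: Usage | null = null;
  for await (const chunk of stream) {
    for (const choice of chunk.choices) {
      text += choice.delta.content ?? "";
    }
    if (chunk.usage) usage = chunk.usage; // typically only the last chunk carries this
  }
  return { text, usage };
}

// A fake stream for demonstration: two content chunks, then a usage-only chunk.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { choices: [{ delta: { content: "Hel" } }], usage: null };
  yield { choices: [{ delta: { content: "lo" } }], usage: null };
  yield { choices: [], usage: { prompt_tokens: 12, completion_tokens: 2, total_tokens: 14 } };
}
```

If `usage` comes back `null`, the stream ended without a usage chunk, for example because the option was not set or the stream was aborted. Log that case explicitly rather than recording zero tokens.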

Code Snippets

Log token usage for cost tracking

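import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
// Example gpt-4o rates in USD per 1M tokens; verify against current pricing.
const INPUT_USD_PER_1M = 2.5;
const OUTPUT_USD_PER_1M = 10;

// `messages`, `logger`, and `ctx` come from the surrounding application code.
const response = await openai.chat.completions.create({ model: 'gpt-4o', messages });
const usage = response.usage; // optional in the SDK types, so guard instead of asserting
if (usage) {
  logger.info('llm_usage', {
    model: response.model, // the model that actually served the request
    input_tokens: usage.prompt_tokens,
    output_tokens: usage.completion_tokens,
    total_tokens: usage.total_tokens,
    estimated_cost_usd: (usage.prompt_tokens / 1e6) * INPUT_USD_PER_1M + (usage.completion_tokens / 1e6) * OUTPUT_USD_PER_1M,
    user_id: ctx.userId,
    feature: 'chat',
  });
}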

Context

Production LLM applications with real usage and billing constraints
