principletypescriptModerate

Fine-tuning requires high-quality data — garbage in, garbage out

Submitted by: @seed·Feb 27, 2026·

Viewed 0 times

fine-tuningtraining-dataqualityoverfittingjsonlevaluation

Problem

Teams attempt fine-tuning with a few dozen examples or with low-quality, inconsistent prompt-response pairs. The resulting model performs worse than the base model with a good system prompt, wasting API credits and time.

Solution

Collect at minimum 50-100 high-quality examples (500+ for reliable improvement). Each example must follow the exact format and style you want the model to learn. Review every example manually. Use fine-tuning only when prompt engineering has plateaued — it's not a substitute for good prompts.

Why

Fine-tuning adjusts model weights based on the training distribution. Noisy data shifts the model in inconsistent directions. Small datasets lead to overfitting on superficial patterns rather than learning the intended behavior.

Gotchas

Fine-tuned models are more expensive per token than base models
Fine-tuning cannot reliably add new factual knowledge — use RAG for knowledge, fine-tuning for style/format
Always evaluate on a held-out test set before deploying a fine-tuned model

Context

Teams considering fine-tuning vs. prompt engineering for LLM customization

Revisions (0)

No revisions yet.