# Fine-tune, RAG, or prompt? A ruthless decision framework
## Start cheap
The order of complexity (and cost) is:
1. **Prompting** — fast, cheap, flexible
2. **RAG (Retrieval-Augmented Generation)** — adds knowledge without retraining
3. **Fine-tuning** — deeper behavior change, more maintenance
Most teams jump to fine-tuning too early. It's expensive, requires labeled data, and creates maintenance burden. Only fine-tune if cheaper options fail—and you can prove it with evals.
## The decision tree
### Problem: Missing knowledge
The model doesn't know facts specific to your domain.
**Solution: RAG.** Retrieve relevant context at runtime and inject it into the prompt. No retraining needed, and the knowledge base can be updated without touching the model.
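The runtime flow is simple: score your documents against the query, take the top few, and prepend them to the prompt. A minimal sketch with a toy keyword scorer (a real system would use embeddings and a vector store; the corpus and function names here are illustrative):

```python
def score(query: str, doc: str) -> float:
    """Crude relevance: fraction of query tokens that appear in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Retrieve the top-k documents and inject them into the prompt."""
    top = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]
    context = "\n".join(f"- {doc}" for doc in top)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

corpus = [
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 100 requests per minute.",
    "Support is available 24/7 via chat.",
]
print(build_prompt("What is the API rate limit?", corpus))
```

The key property: updating the corpus updates the model's "knowledge" instantly, with no training run.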
### Problem: Missing format discipline
The model's outputs are inconsistent, unstructured, or don't follow your schema.
**Solution: Prompt + validation first.** Use JSON mode, function calling, or structured output APIs. Add validation and retry logic. Most format problems are solved by better prompting and post-processing.
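The validate-and-retry loop is a few lines of code. A sketch with a stubbed model call (the stub, the required keys, and the retry count are all hypothetical stand-ins for your real LLM client and schema):

```python
import json

def call_model(prompt: str, attempt: int) -> str:
    """Stand-in for a real LLM call; returns malformed output on the first try."""
    if attempt == 0:
        return "Sure! Here is the JSON: {'name': 'Ada'}"  # chatter + bad quoting
    return '{"name": "Ada", "role": "engineer"}'

REQUIRED_KEYS = {"name", "role"}

def get_structured(prompt: str, max_retries: int = 3) -> dict:
    """Call the model until it produces valid JSON with the required keys."""
    for attempt in range(max_retries):
        raw = call_model(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # retry, optionally with a stricter prompt
        if REQUIRED_KEYS <= data.keys():
            return data
    raise ValueError("model never produced valid structured output")

print(get_structured("Extract the person as JSON."))
```

If a loop like this fixes your format problem, you just saved a fine-tuning run.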
### Problem: Missing stable behavior or style
The model's tone, reasoning pattern, or decision-making is inconsistent even with good prompts.
**Solution: Consider fine-tuning, if you have the data.** Fine-tuning can instill consistent behavior, but only if you have labeled examples that demonstrate it and evals that prove the cheaper options failed.
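If fine-tuning does win out, the training set is typically a JSONL file of example conversations demonstrating the target behavior. A minimal sketch (the chat-message shape follows common fine-tuning APIs; the filename and example contents are hypothetical):

```python
import json

# Hypothetical examples of the tone/behavior you want to instill.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a terse support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Settings > Security > Reset Password."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a terse support agent."},
        {"role": "user", "content": "Where do I request a refund?"},
        {"role": "assistant", "content": "Billing > Request Refund."},
    ]},
]

# One JSON object per line -- the JSONL format most tuning APIs expect.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Note that every example encodes the *style* you want, not facts: facts belong in RAG, where they can be updated without a training run.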
## The hidden cost of fine-tuning
Fine-tunes drift. Base models get updated. Your training data becomes stale. If you can't commit to:

- retraining whenever the base model changes,
- keeping your training data fresh, and
- re-running your evals on every update,

...then don't fine-tune yet. The maintenance burden will eat your productivity.
## The checklist
Before fine-tuning, answer these:

1. Have you exhausted prompting, structured output, and validation?
2. Have you tried RAG for any missing-knowledge problems?
3. Do you have enough labeled data to train on?
4. Do you have evals that can prove fine-tuning actually helps?

If any answer is "no," go back and fix that first.
## The path forward
If you're considering post-training, I can help you prove whether it's necessary—and install the eval harness that makes it safe.
Book a call to discuss your specific case.