Rogue Iteration Studio
Tags: fine-tuning, RAG, prompting, architecture
December 28, 2023

Fine-tune, RAG, or prompt? A ruthless decision framework

Start cheap. Most teams fine-tune too early. Here's a ruthless decision framework for when to use prompting, RAG, or fine-tuning.

Start cheap

The order of complexity (and cost) is:

1. **Prompting** — fast, cheap, flexible

2. **RAG (Retrieval-Augmented Generation)** — adds knowledge without retraining

3. **Fine-tuning** — deeper behavior change, more maintenance

Most teams jump to fine-tuning too early. It's expensive, requires labeled data, and creates maintenance burden. Only fine-tune if cheaper options fail—and you can prove it with evals.

The decision tree

Problem: Missing knowledge

The model doesn't know facts specific to your domain.

Solution: RAG

Retrieve relevant context at runtime and inject it into the prompt. No retraining needed. Knowledge can be updated without touching the model.
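
As a rough illustration, here's a minimal sketch of the pattern in Python, assuming a small in-memory document store and a hypothetical `call_llm` function standing in for your model API; a production setup would swap the keyword-overlap retriever for embeddings and a vector store.

```python
# Minimal RAG sketch: retrieve relevant snippets at request time and inject
# them into the prompt. The keyword-overlap retriever is a stand-in for a
# real embedding + vector-store pipeline.

DOCUMENTS = [
    "Our refund window is 30 days from the delivery date.",
    "Enterprise plans include SSO and a 99.9% uptime SLA.",
    "Support hours are 9am-6pm CET, Monday through Friday.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Score each document by naive keyword overlap and return the top k."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved context into the prompt; the model never retrains."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )

question = "How long do customers have to request a refund?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
# answer = call_llm(prompt)  # hypothetical: replace with your model API call
```

Updating knowledge means updating the documents (or your index), not the model.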

Problem: Missing format discipline

The model's outputs are inconsistent, unstructured, or don't follow your schema.

Solution: Prompt + validation first

Use JSON mode, function calling, or structured output APIs. Add validation and retry logic. Most format problems are solved by better prompting and post-processing.
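
Here's a hedged sketch of that validate-and-retry loop in Python; `call_llm` is any callable that takes a prompt and returns text (your model API wrapper, ideally with JSON mode enabled), and the schema and error-feedback strategy are illustrative.

```python
import json

def validate(raw: str) -> dict:
    """Parse the model's reply and enforce the expected schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if data.get("sentiment") not in {"positive", "neutral", "negative"}:
        raise ValueError("sentiment missing or not an allowed label")
    if not isinstance(data.get("confidence"), (int, float)):
        raise ValueError("confidence missing or not a number")
    return data

def classify_with_retry(text: str, call_llm, max_attempts: int = 3) -> dict:
    """Ask for JSON, validate it, and feed the error back on retry."""
    prompt = (
        "Classify the sentiment of the text. Reply with JSON only, e.g. "
        '{"sentiment": "positive", "confidence": 0.9}\n\nText: ' + text
    )
    for _ in range(max_attempts):
        raw = call_llm(prompt)  # your model call; enable JSON mode if available
        try:
            return validate(raw)
        except ValueError as err:
            prompt += f"\n\nYour previous reply was invalid ({err}). Return valid JSON only."
    raise RuntimeError("no valid JSON after retries")
```

If this loop almost never retries, you didn't have a fine-tuning problem; you had a prompting problem.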

Problem: Missing stable behavior or style

The model's tone, reasoning pattern, or decision-making is inconsistent even with good prompts.

Solution: Consider fine-tuning—if you have the data

Fine-tuning can instill consistent behavior, but only if:

  • You have labeled examples of correct behavior (hundreds to thousands)
  • You have evals that prove the fine-tune is better than prompting
  • You're prepared for ongoing maintenance (models drift, fine-tunes need updates)
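
To make the first requirement concrete, here's a sketch of what "labeled examples of correct behavior" usually look like, using the common chat-style JSONL convention; the exact field names depend on your provider or training stack, so treat this schema as an assumption to verify.

```python
import json

# Each example pairs an input with the exact behavior you want reproduced.
# The chat-style "messages" schema is a common convention, not a universal
# one; check your provider's or trainer's documented format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a terse support triage assistant."},
            {"role": "user", "content": "The dashboard has been down for an hour."},
            {"role": "assistant", "content": "Priority: P1. Route to: infrastructure on-call."},
        ]
    },
    # ...hundreds to thousands more, covering the full range of real inputs
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```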

The hidden cost of fine-tuning

Fine-tunes drift. Base models get updated. Your training data becomes stale. If you can't commit to:

  • Regular evaluation against a golden set
  • Periodic retraining cadence
  • A/B testing fine-tuned vs. base models

...then don't fine-tune yet. The maintenance burden will eat your productivity.
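
Here's a minimal sketch of what the golden-set eval and the base-vs-fine-tune comparison can look like, assuming a frozen golden set, an exact-match grader, and a `call_model` callable wrapping your inference API; real evals want richer graders and far more cases, but the shape is the same.

```python
# Run base and fine-tuned models against the same golden set before (and
# after) every retrain. Exact match is the simplest possible grader; swap
# in whatever metric your evals already use.

GOLDEN_SET = [
    {"input": "Refund request, order delivered 10 days ago.", "expected": "approve"},
    {"input": "Refund request, order delivered 90 days ago.", "expected": "deny"},
    # ...a representative, frozen sample of real traffic
]

def accuracy(call_model) -> float:
    """Run every golden-set case through `call_model` and score exact matches."""
    correct = sum(
        call_model(case["input"]).strip().lower() == case["expected"]
        for case in GOLDEN_SET
    )
    return correct / len(GOLDEN_SET)

# Hypothetical usage, with wrappers around your own inference API:
# base_score = accuracy(lambda x: call_llm("base-model", x))
# tuned_score = accuracy(lambda x: call_llm("fine-tuned-model", x))
# print(f"base={base_score:.2%}  fine-tuned={tuned_score:.2%}")
```

If the fine-tune doesn't clearly beat the base model on this harness, the answer to "should we fine-tune?" is still no.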

The checklist

Before fine-tuning, answer these:

  • [ ] Have I tried better prompting with structured outputs?
  • [ ] Have I added RAG for missing knowledge?
  • [ ] Do I have 500+ labeled examples of correct behavior?
  • [ ] Do I have evals that quantify the gap between current and desired?
  • [ ] Am I prepared for ongoing maintenance and retraining?

If any answer is "no," go back and fix that first.

The path forward

If you're considering post-training, I can help you prove whether it's necessary—and install the eval harness that makes it safe.

Book a call to discuss your specific case.
