Rogue Iteration Studio
Tags: fine-tuning, RAG, prompting, architecture
December 28, 2023

Fine-tune, RAG, or prompt? A ruthless decision framework

Start cheap. Most teams fine-tune too early. Here's a ruthless decision framework for when to use prompting, RAG, or fine-tuning.

Start cheap

The order of complexity (and cost) is:

1. **Prompting** — fast, cheap, flexible

2. **RAG (Retrieval-Augmented Generation)** — adds knowledge without retraining

3. **Fine-tuning** — deeper behavior change, more maintenance

Most teams jump to fine-tuning too early. It's expensive, requires labeled data, and creates maintenance burden. Only fine-tune if cheaper options fail—and you can prove it with evals.

The decision tree

Problem: Missing knowledge

The model doesn't know facts specific to your domain.

Solution: RAG

Retrieve relevant context at runtime and inject it into the prompt. No retraining needed. Knowledge can be updated without touching the model.
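
As a rough illustration, here's a minimal sketch of the pattern in Python, assuming a small in-memory document store and a hypothetical `call_llm` function standing in for your model API; a production setup would swap the keyword-overlap retriever for embeddings and a vector store.

```python
# Minimal RAG sketch: retrieve relevant snippets at request time and inject
# them into the prompt. The keyword-overlap retriever is a stand-in for a
# real embedding + vector-store pipeline.

DOCUMENTS = [
    "Our refund window is 30 days from the delivery date.",
    "Enterprise plans include SSO and a 99.9% uptime SLA.",
    "Support hours are 9am-6pm CET, Monday through Friday.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Score each document by naive keyword overlap and return the top k."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved context into the prompt; the model never retrains."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )

question = "How long do customers have to request a refund?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
# answer = call_llm(prompt)  # hypothetical: replace with your model API call
```

Updating knowledge means updating the documents (or your index), not the model.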

Problem: Missing format discipline

The model's outputs are inconsistent, unstructured, or don't follow your schema.

Solution: Prompt + validation first

Use JSON mode, function calling, or structured output APIs. Add validation and retry logic. Most format problems are solved by better prompting and post-processing.
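
Here's a hedged sketch of that validate-and-retry loop in Python; `call_llm` is any callable that takes a prompt and returns text (your model API wrapper, ideally with JSON mode enabled), and the schema and error-feedback strategy are illustrative.

```python
import json

def validate(raw: str) -> dict:
    """Parse the model's reply and enforce the expected schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if data.get("sentiment") not in {"positive", "neutral", "negative"}:
        raise ValueError("sentiment missing or not an allowed label")
    if not isinstance(data.get("confidence"), (int, float)):
        raise ValueError("confidence missing or not a number")
    return data

def classify_with_retry(text: str, call_llm, max_attempts: int = 3) -> dict:
    """Ask for JSON, validate it, and feed the error back on retry."""
    prompt = (
        "Classify the sentiment of the text. Reply with JSON only, e.g. "
        '{"sentiment": "positive", "confidence": 0.9}\n\nText: ' + text
    )
    for _ in range(max_attempts):
        raw = call_llm(prompt)  # your model call; enable JSON mode if available
        try:
            return validate(raw)
        except ValueError as err:
            prompt += f"\n\nYour previous reply was invalid ({err}). Return valid JSON only."
    raise RuntimeError("no valid JSON after retries")
```

If this loop almost never retries, you didn't have a fine-tuning problem; you had a prompting problem.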

Problem: Missing stable behavior or style

The model's tone, reasoning pattern, or decision-making is inconsistent even with good prompts.

Solution: Consider fine-tuning—if you have the data

Fine-tuning can instill consistent behavior, but only if:

  • You have labeled examples of correct behavior (hundreds to thousands)
  • You have evals that prove the fine-tune is better than prompting
  • You're prepared for ongoing maintenance (models drift, fine-tunes need updates)
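
To make the first requirement concrete, here's a sketch of what "labeled examples of correct behavior" usually look like, using the common chat-style JSONL convention; the exact field names depend on your provider or training stack, so treat this schema as an assumption to verify.

```python
import json

# Each example pairs an input with the exact behavior you want reproduced.
# The chat-style "messages" schema is a common convention, not a universal
# one; check your provider's or trainer's documented format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a terse support triage assistant."},
            {"role": "user", "content": "The dashboard has been down for an hour."},
            {"role": "assistant", "content": "Priority: P1. Route to: infrastructure on-call."},
        ]
    },
    # ...hundreds to thousands more, covering the full range of real inputs
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```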

The hidden cost of fine-tuning

Fine-tunes drift. Base models get updated. Your training data becomes stale. If you can't commit to:

  • Regular evaluation against a golden set
  • Periodic retraining cadence
  • A/B testing fine-tuned vs. base models

...then don't fine-tune yet. The maintenance burden will eat your productivity.
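
Here's a minimal sketch of what the golden-set eval and the base-vs-fine-tune comparison can look like, assuming a frozen golden set, an exact-match grader, and a `call_model` callable wrapping your inference API; real evals want richer graders and far more cases, but the shape is the same.

```python
# Run base and fine-tuned models against the same golden set before (and
# after) every retrain. Exact match is the simplest possible grader; swap
# in whatever metric your evals already use.

GOLDEN_SET = [
    {"input": "Refund request, order delivered 10 days ago.", "expected": "approve"},
    {"input": "Refund request, order delivered 90 days ago.", "expected": "deny"},
    # ...a representative, frozen sample of real traffic
]

def accuracy(call_model) -> float:
    """Run every golden-set case through `call_model` and score exact matches."""
    correct = sum(
        call_model(case["input"]).strip().lower() == case["expected"]
        for case in GOLDEN_SET
    )
    return correct / len(GOLDEN_SET)

# Hypothetical usage, with wrappers around your own inference API:
# base_score = accuracy(lambda x: call_llm("base-model", x))
# tuned_score = accuracy(lambda x: call_llm("fine-tuned-model", x))
# print(f"base={base_score:.2%}  fine-tuned={tuned_score:.2%}")
```

If the fine-tune doesn't clearly beat the base model on this harness, the answer to "should we fine-tune?" is still no.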

The checklist

Before fine-tuning, answer these:

  • [ ] Have I tried better prompting with structured outputs?
  • [ ] Have I added RAG for missing knowledge?
  • [ ] Do I have 500+ labeled examples of correct behavior?
  • [ ] Do I have evals that quantify the gap between current and desired?
  • [ ] Am I prepared for ongoing maintenance and retraining?

If any answer is "no," go back and fix that first.

The path forward

If you're considering post-training, I can help you prove whether it's necessary—and install the eval harness that makes it safe.

Book a call to discuss your specific case.
