7 posts tagged with "economics"

The Real Cost: Fine-tuning vs Prompting
Prompting has high per-call cost but zero upfront investment. Fine-tuning has low per-call cost but significant upfront investment. The crossover point matters.
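The crossover framing reduces to one division: upfront cost over per-call savings. A minimal sketch, with all dollar figures assumed for illustration (they are not from the post):

```python
def breakeven_calls(upfront_cost: float,
                    prompt_cost_per_call: float,
                    finetuned_cost_per_call: float) -> float:
    """Number of calls at which fine-tuning's upfront investment is
    recovered by its cheaper per-call price."""
    savings_per_call = prompt_cost_per_call - finetuned_cost_per_call
    if savings_per_call <= 0:
        return float("inf")  # fine-tuning never pays off
    return upfront_cost / savings_per_call

# Assumed example: $5,000 fine-tuning run, $0.02/call prompted
# vs $0.004/call fine-tuned.
calls = breakeven_calls(5_000, 0.02, 0.004)
print(f"break-even at ~{calls:,.0f} calls")
```

Below that call volume, prompting wins; above it, the fine-tune pays for itself.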
H100 spot at $0.15/1M tokens. A100 on-demand at $0.40/1M. API at $1.00/1M. Here's the full comparison.
GPU cost is just the beginning. Egress, logging, on-call—add 40% to your compute estimate for the real number.
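That rule of thumb is a one-line multiplier. A sketch using the post's 40% figure; the example compute bill is an assumption:

```python
OVERHEAD_FACTOR = 0.40  # egress + logging + on-call, per the rule of thumb

def true_monthly_cost(compute_cost: float) -> float:
    """Compute estimate adjusted to the all-in number."""
    return compute_cost * (1 + OVERHEAD_FACTOR)

# Assumed example: $10k/month of GPU time lands near $14k all-in.
print(true_monthly_cost(10_000))
```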
$4/hour vs $10/hour sounds great. But conversion cost, ecosystem limitations, and operational overhead change the math.
When does self-hosting break even? Here's the formula, the variables, and the 6-month reality check most teams skip.
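One common version of the break-even framing, sketched with assumed variable names and figures (not the post's actual formula):

```python
def months_to_breakeven(setup_cost: float,
                        monthly_self_host_cost: float,
                        monthly_api_cost: float) -> float:
    """Months until self-hosting's setup cost is repaid by the gap
    between the API bill and the self-hosted bill."""
    monthly_savings = monthly_api_cost - monthly_self_host_cost
    if monthly_savings <= 0:
        return float("inf")  # the API is already cheaper; don't self-host
    return setup_cost / monthly_savings

# Assumed example: $30k setup, $8k/month self-hosted vs $15k/month API.
print(months_to_breakeven(30_000, 8_000, 15_000))  # a bit over 4 months
```

The 6-month reality check is then just: is that number comfortably under six, after the 40% overhead adjustment?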
Everyone wants to self-host LLMs to save money. Most shouldn't. Here's the math on when it actually makes sense.
Input tokens are cheap. Output tokens are expensive. The physics of transformer inference explains why, and what you can do about it.
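The billing asymmetry shows up immediately in a per-request cost calc. Prices here are assumptions for illustration; the point is the input/output gap, which is several-fold on most APIs:

```python
PRICE_IN = 1.00   # $ per 1M input tokens (assumed)
PRICE_OUT = 4.00  # $ per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the assumed per-token prices."""
    return (input_tokens / 1e6) * PRICE_IN + (output_tokens / 1e6) * PRICE_OUT

# Same total tokens, opposite shapes:
summarize = request_cost(50_000, 500)   # big prompt, short answer
generate = request_cost(500, 50_000)    # short prompt, long answer
print(f"summarize: ${summarize:.4f}  generate: ${generate:.4f}")
```

The generation-heavy call costs several times more than the summarization call despite identical total token counts, which is why trimming output length pays off far faster than trimming prompts.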