The Techniques That Actually Cut Costs
Not all optimizations are equal. Prefix caching can save ~40%, quantization ~50%, smart routing ~60%. Know which levers move the needle for your workload.
23 posts tagged with "cost"
Prompting has high per-call cost but zero upfront investment. Fine-tuning has low per-call cost but significant upfront investment. The crossover point matters.
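The crossover point falls out of simple algebra: fine-tuning wins once its per-call savings repay the upfront training cost. A minimal sketch, with all dollar figures purely illustrative:

```python
def breakeven_calls(prompt_cost_per_call: float,
                    ft_cost_per_call: float,
                    ft_upfront_cost: float) -> float:
    """Number of calls after which fine-tuning becomes cheaper overall."""
    saving_per_call = prompt_cost_per_call - ft_cost_per_call
    if saving_per_call <= 0:
        return float("inf")  # fine-tuning never pays off
    return ft_upfront_cost / saving_per_call

# e.g. $0.01/call prompting vs $0.002/call fine-tuned, $500 training run:
print(breakeven_calls(0.01, 0.002, 500.0))  # 62500.0
```

Below ~62K calls in this example, prompting is cheaper; above it, the fine-tune pays for itself.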
Most queries don't need the full context. Selecting the right 12% often preserves 95% of quality at a fraction of the cost and latency.
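One way to pick that small slice is a greedy selection: keep the highest-relevance chunks until a token budget runs out. A sketch with hypothetical chunks and scores:

```python
def select_context(chunks, budget_tokens):
    """chunks: list of (relevance_score, token_count, text).
    Greedily keep the highest-scoring chunks that fit the token budget."""
    picked, used = [], 0
    for score, tokens, text in sorted(chunks, reverse=True):
        if used + tokens <= budget_tokens:
            picked.append(text)
            used += tokens
    return picked

chunks = [(0.9, 300, "pricing table"), (0.2, 800, "changelog"),
          (0.7, 400, "API errors"), (0.1, 500, "marketing copy")]
print(select_context(chunks, 1000))  # ['pricing table', 'API errors']
```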
Every configuration lives on a quality-cost curve. Some are on the efficient frontier, most aren't. Map the frontier, then choose your spot deliberately.
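Mapping the frontier is a Pareto filter: a configuration is off the frontier if some other config is at least as cheap and at least as good. A sketch over made-up (cost, quality) points:

```python
def pareto_frontier(configs):
    """configs: list of (name, cost, quality). Lower cost and higher
    quality both win; return names of non-dominated configs."""
    frontier = []
    for name, cost, quality in configs:
        dominated = any(
            c2 <= cost and q2 >= quality and (c2, q2) != (cost, quality)
            for _, c2, q2 in configs
        )
        if not dominated:
            frontier.append(name)
    return frontier

configs = [("small", 1.0, 0.80), ("medium", 3.0, 0.90),
           ("large", 9.0, 0.95), ("bad", 5.0, 0.85)]
print(pareto_frontier(configs))  # ['small', 'medium', 'large']
```

Here "bad" is dominated by "medium" (cheaper and higher quality), so it's off the frontier.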
One runaway bug can burn $50K in a weekend. Rate limits aren't just for abuse prevention. They're your circuit breaker.
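A spend circuit breaker can sit in front of the API client: refuse calls once a rolling-window budget is exhausted. A minimal sketch; the class and error names are hypothetical, and the thresholds are illustrative:

```python
import time

class SpendLimitError(RuntimeError):
    """Raised when a call would exceed the spend budget."""

class SpendBreaker:
    def __init__(self, budget_usd: float, window_s: float = 3600.0):
        self.budget, self.window = budget_usd, window_s
        self.events = []  # (timestamp, cost) pairs inside the window

    def charge(self, cost_usd: float):
        now = time.monotonic()
        # Drop charges that have aged out of the rolling window.
        self.events = [(t, c) for t, c in self.events if now - t < self.window]
        spent = sum(c for _, c in self.events)
        if spent + cost_usd > self.budget:
            raise SpendLimitError(f"would exceed ${self.budget} per window")
        self.events.append((now, cost_usd))

breaker = SpendBreaker(budget_usd=10.0)
breaker.charge(6.0)  # ok
breaker.charge(3.0)  # ok
# breaker.charge(2.0) would raise SpendLimitError
```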
Try the small model first. If it fails or isn't confident, try the large one. Cascade routing gets 80% savings on 80% of requests.
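The cascade can be sketched in a few lines: call the cheap model, escalate only when its confidence falls below a threshold. The model callables and the 0.8 cutoff here are assumptions, with stubs standing in for real API calls:

```python
def cascade(query, small_model, large_model, threshold=0.8):
    """Route to the large model only when the small one isn't confident."""
    answer, confidence = small_model(query)
    if confidence >= threshold:
        return answer, "small"
    answer, _ = large_model(query)
    return answer, "large"

# Stub models for illustration; real ones would be API calls:
small = lambda q: ("positive", 0.95) if "great" in q else ("unsure", 0.3)
large = lambda q: ("negative", 0.99)

print(cascade("this is great", small, large))   # ('positive', 'small')
print(cascade("ambiguous text", small, large))  # ('negative', 'large')
```

Only the low-confidence minority pays the large-model price, which is where the bulk of the savings comes from.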
Send classification to Haiku, reasoning to Opus. Routing requests to the right model saves money without sacrificing quality.
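The simplest version of this routing is a lookup table from task type to model tier. The table below mirrors the Haiku/Opus split from the excerpt; the task-type keys and the Sonnet default are assumptions:

```python
# Hypothetical routing table: cheap tier for mechanical tasks,
# expensive tier for open-ended reasoning.
ROUTES = {
    "classification": "claude-haiku",
    "extraction":     "claude-haiku",
    "reasoning":      "claude-opus",
    "code_review":    "claude-opus",
}

def route(task_type: str, default: str = "claude-sonnet") -> str:
    """Pick a model tier for a task, falling back to a mid-tier default."""
    return ROUTES.get(task_type, default)

print(route("classification"))  # claude-haiku
print(route("reasoning"))       # claude-opus
print(route("unknown_task"))    # claude-sonnet
```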
Quantization saves memory. But does it improve cost per token? The ROI depends on whether you're memory-bound or compute-bound.
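A back-of-envelope roofline check makes the distinction concrete: decode is memory-bound when streaming the weights takes longer than doing the FLOPs. All hardware numbers below are illustrative:

```python
def bound_regime(params_b: float, bytes_per_param: float,
                 mem_bw_gbs: float, flops_tflops: float, batch: int) -> str:
    """params_b: model size in billions of parameters.
    Compare time to stream weights vs time to compute one decode step."""
    mem_time = params_b * bytes_per_param / mem_bw_gbs            # seconds/step
    compute_time = 2 * params_b * batch / (flops_tflops * 1000)   # seconds/step
    return "memory-bound" if mem_time > compute_time else "compute-bound"

# 70B model, fp16 (2 B/param) vs int4 (0.5 B/param), ~2 TB/s, ~300 TFLOPs:
print(bound_regime(70, 2.0, 2000, 300, batch=1))    # memory-bound
print(bound_regime(70, 0.5, 2000, 300, batch=256))  # compute-bound
```

In the memory-bound regime, quantization directly raises tokens/sec (less data to stream); in the compute-bound regime, shrinking the weights helps far less.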
Spot instances are 50-70% cheaper. But they can disappear. Here's how to use them without breaking production.
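The production-safe pattern is spot-with-fallback: attempt the cheap path, and fall back to on-demand when capacity is reclaimed. A sketch with stub backends standing in for real provider calls:

```python
class SpotPreempted(Exception):
    """Signals that the spot instance was reclaimed mid-job."""

def run_job(job, use_spot=True, max_spot_retries=2):
    """Try spot capacity first; fall back to on-demand if it keeps vanishing."""
    if use_spot:
        for _attempt in range(max_spot_retries):
            try:
                return submit_spot(job)   # cheap path
            except SpotPreempted:
                continue                  # resume from checkpoint and retry
    return submit_on_demand(job)          # reliable fallback

# Stub backends for illustration:
def submit_spot(job):
    raise SpotPreempted  # simulate reclaimed capacity
def submit_on_demand(job):
    return f"{job}: done (on-demand)"

print(run_job("train-run-42"))  # train-run-42: done (on-demand)
```

The pattern only works if the job checkpoints often enough that a preemption loses minutes, not hours.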
H100 spot at $0.15/1M tokens. A100 on-demand at $0.40/1M. API at $1.00/1M. Here's the full comparison.
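At those headline rates, the monthly gap is easy to compute. A worked example at a hypothetical 500M tokens/month:

```python
# Rates from the excerpt above, in $ per 1M tokens:
RATES_PER_M = {"H100 spot": 0.15, "A100 on-demand": 0.40, "API": 1.00}

def monthly_cost(tokens_m: float) -> dict:
    """Monthly cost per option, given volume in millions of tokens."""
    return {name: round(rate * tokens_m, 2) for name, rate in RATES_PER_M.items()}

print(monthly_cost(500))
# {'H100 spot': 75.0, 'A100 on-demand': 200.0, 'API': 500.0}
```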