Matching the Right Model to Each Task
8B models handle classification well. 70B models handle summarization. Code-specialized models beat generalists at code. Match the model to the task.
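A minimal sketch of task-based routing under these assumptions; the model names and the task-to-model table below are illustrative placeholders, not recommendations of specific checkpoints.

```python
# Task-to-model routing sketch. Model names and the routing table are
# illustrative assumptions, not specific published checkpoints.
TASK_TO_MODEL = {
    "classification": "small-8b-instruct",   # cheap and fast, accurate enough
    "summarization": "large-70b-instruct",   # benefits from the larger model
    "code": "code-specialist-13b",           # code-tuned models beat generalists here
}

def route(task: str) -> str:
    """Pick a model for a task, falling back to the large generalist."""
    return TASK_TO_MODEL.get(task, "large-70b-instruct")

print(route("classification"))  # -> small-8b-instruct
```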
Not all optimizations are equal. Prefix caching saves 40%. Quantization saves 50%. Smart routing saves 60%. Know which levers move the needle for your workload.
Speculative decoding shines when outputs are predictable. Code completion, structured generation, and templates see 2x+ gains. Creative writing doesn't.
A small model proposes tokens, a large model verifies in parallel. When predictions match, you get 2-3x speedup. When they don't, you're no worse off.
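A toy greedy version of that propose-and-verify loop. `draft_next` and `target_next` are assumed stand-ins for model calls returning the argmax next token; a real implementation scores all k draft tokens in one batched target forward pass rather than looping.

```python
# Toy greedy speculative decoding step. All model calls are assumed stand-ins.
def speculative_step(tokens, draft_next, target_next, k=4):
    # 1. The small draft model proposes k tokens autoregressively (cheap).
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The large target model verifies each position (conceptually in parallel).
    accepted, ctx = [], list(tokens)
    for t in draft:
        verified = target_next(ctx)
        if verified != t:          # first mismatch: keep the target's token...
            accepted.append(verified)
            break                  # ...and discard the rest of the draft.
        accepted.append(t)         # match: the draft token came nearly for free.
        ctx.append(t)

    # Worst case we still advance by one target-verified token, same as plain decoding.
    return tokens + accepted
```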
Transfer cost vs recompute cost. If moving data off GPU costs less than recomputing it, offload. If not, keep it. The math is straightforward.
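A back-of-the-envelope version of that comparison. The bandwidth and throughput constants are illustrative assumptions; plug in measured numbers for your hardware.

```python
# Offload-vs-recompute check. All constants are illustrative assumptions.
def should_offload(tensor_bytes, recompute_flops,
                   pcie_bw=25e9,        # effective host<->GPU bandwidth, bytes/s
                   gpu_flops=150e12):   # sustained GPU throughput, FLOP/s
    transfer_s = tensor_bytes / pcie_bw          # cost to move it over the bus
    recompute_s = recompute_flops / gpu_flops    # cost to just recompute it later
    return transfer_s < recompute_s

# Example: a 2 GB activation that takes 5e12 FLOPs to recompute.
# Transfer ~0.08 s vs recompute ~0.033 s -> False: recompute instead of offloading.
print(should_offload(2e9, 5e12))
```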
KV cache is 40% of memory for long contexts. Compression techniques trade compute for memory without significant quality loss. Know when to use them.
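A quick sizing formula shows why. The model shape below is an assumed 70B-class dense-attention configuration with grouped KV heads, not any specific published model.

```python
# KV cache sizing sketch; the model shape is an illustrative assumption.
def kv_cache_bytes(seq_len, batch, n_layers=80, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):  # fp16/bf16
    # 2x for keys and values, stored per layer, per KV head, per position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# 32k-token context at batch 8: ~86 GB at fp16, before any compression.
print(f"{kv_cache_bytes(32_768, 8) / 1e9:.1f} GB")
```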
LoRA tutorials make it look easy. Production LoRA requires learning rate adjustments, layer selection, rank tuning, and careful validation. Here's what actually works.
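A hedged starting point for the knobs the tutorials skip, written against Hugging Face's `peft.LoraConfig`. The rank, alpha, dropout, learning-rate range, and target module names are assumptions to tune per task, not universal values.

```python
from peft import LoraConfig

# Illustrative production-leaning LoRA settings; every value is an assumption
# to validate against your own eval set, not a universal recipe.
lora_config = LoraConfig(
    r=16,                                   # rank: start low, raise only if eval demands it
    lora_alpha=32,                          # scaling; alpha/r = 2 is a common starting ratio
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention-only first;
    bias="none",                                              # extend to MLP layers if underfitting
    task_type="CAUSAL_LM",
)

# Pair with a lower learning rate than full fine-tuning (e.g. 1e-4 to 2e-4)
# and validate on held-out data that reflects production traffic.
```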
Most queries don't need the full context. Selecting the right 12% often preserves 95% of quality at a fraction of the cost and latency.
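A minimal sketch of budgeted context selection: score chunks against the query and keep the best ones up to a token budget. Lexical overlap here is a naive stand-in for an embedding- or reranker-based relevance score.

```python
# Budgeted context selection sketch; the scoring function is a naive stand-in.
def select_context(query, chunks, budget_tokens):
    q_terms = set(query.lower().split())

    def score(chunk):
        terms = set(chunk.lower().split())
        return len(q_terms & terms) / (len(terms) or 1)

    selected, used = [], 0
    for chunk in sorted(chunks, key=score, reverse=True):
        n_tokens = len(chunk.split())          # crude token count
        if used + n_tokens > budget_tokens:
            continue
        selected.append(chunk)
        used += n_tokens
    return selected  # often a small fraction of the context carries the answer
```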
Optimizing for compute when you're memory bound wastes effort. Optimizing for memory when you're compute bound wastes opportunity. Profile first, then optimize.
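A roofline-style sanity check makes the profiling step concrete: compare a kernel's arithmetic intensity (FLOPs per byte moved) to the hardware's balance point. The hardware numbers below are illustrative assumptions.

```python
# Roofline-style check: memory bound or compute bound? Specs are illustrative.
PEAK_FLOPS = 300e12   # FLOP/s
PEAK_BW = 2e12        # bytes/s of HBM bandwidth
BALANCE = PEAK_FLOPS / PEAK_BW   # 150 FLOP/byte: below this, you are memory bound

def bound(flops, bytes_moved):
    intensity = flops / bytes_moved
    return "compute bound" if intensity > BALANCE else "memory bound"

# Decode-time matrix-vector work: ~2 FLOPs per weight byte read.
print(bound(flops=2e9, bytes_moved=1e9))  # memory bound -> optimize data movement, not math
```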
Every configuration lives on a quality-cost curve. Some are on the efficient frontier; most aren't. Map the frontier, then choose your spot deliberately.
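A small sketch of that mapping step: keep only configurations no other configuration dominates (cheaper and at least as good). The configurations and numbers are made up to show the filtering, not measurements.

```python
# Efficient-frontier sketch; data points are illustrative, not benchmarks.
configs = [
    {"name": "8b-int8",        "cost": 1.0, "quality": 0.78},
    {"name": "8b-fp16",        "cost": 1.6, "quality": 0.78},
    {"name": "70b-int4",       "cost": 4.0, "quality": 0.88},
    {"name": "70b-fp16",       "cost": 9.0, "quality": 0.89},
    {"name": "70b-int4+cache", "cost": 3.0, "quality": 0.88},
]

def frontier(points):
    # A point survives if no other point is both cheaper (or equal) and better (or equal).
    return [p for p in points
            if not any(q["cost"] <= p["cost"] and q["quality"] >= p["quality"]
                       and q is not p for q in points)]

for p in frontier(configs):
    print(p["name"], p["cost"], p["quality"])
# 8b-int8, 70b-int4+cache, and 70b-fp16 survive; 8b-fp16 and 70b-int4 are dominated.
```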