#performance

7 posts tagged with "performance"

Where Speculative Decoding Actually Helps

Speculative decoding shines when outputs are predictable. Code completion, structured generation, and templates see speedups of 2x or more. Creative writing doesn't.
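
To make the link between predictability and speedup concrete, here is a minimal sketch. It assumes each drafted token is accepted independently with the same probability, which is a simplification. The function names, the draft length, and the cost ratio are illustrative assumptions, not figures from the post.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumption: every drafted token is accepted independently with
    probability accept_rate. The target model always contributes one
    token itself (a correction, or a bonus token if all drafts pass).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    # Closed form of the geometric sum 1 + a + a^2 + ... + a^draft_len.
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int, draft_cost: float) -> float:
    """Rough speedup over plain decoding.

    draft_cost is the (hypothetical) cost of one draft-model step
    relative to one target-model step.
    """
    step_cost = 1.0 + draft_len * draft_cost
    return expected_tokens_per_step(accept_rate, draft_len) / step_cost


if __name__ == "__main__":
    # Illustrative acceptance rates only: high for predictable output,
    # low for open-ended text.
    for label, rate in [("predictable", 0.8), ("open-ended", 0.3)]:
        print(label, round(estimated_speedup(rate, draft_len=4, draft_cost=0.05), 2))
```

Under these assumptions a high acceptance rate clears 2x while a low one barely beats plain decoding, which matches the pattern the excerpt describes.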

Adding GPUs Without Linear Speedup

Four GPUs don't give you 4x throughput. Communication overhead, load imbalance, and synchronization eat into gains. Know the scaling curve before you buy.
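
As a rough way to reason about a scaling curve before buying hardware, the sketch below uses Amdahl's law. This is a swapped-in simplification: the serial fraction stands in for communication, synchronization, and imbalance combined. The 10% figure and the function names are assumptions for illustration, and real curves should be measured on the actual workload.

```python
def amdahl_speedup(num_gpus: int, serial_fraction: float) -> float:
    """Speedup predicted by Amdahl's law for num_gpus workers.

    serial_fraction is the share of the work that does not parallelize
    (assumed here to lump together all non-scaling overhead).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_gpus)


def scaling_efficiency(num_gpus: int, serial_fraction: float) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return amdahl_speedup(num_gpus, serial_fraction) / num_gpus


if __name__ == "__main__":
    # Hypothetical 10% non-parallel share.
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(n, 0.10), 2), round(scaling_efficiency(n, 0.10), 2))
```

With a 10% non-parallel share, this model predicts roughly 3.1x on four GPUs rather than 4x, and efficiency keeps falling as more GPUs are added.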