#benchmarks

2 posts tagged with "benchmarks"

Feb 26, 2025

FlashAttention claims a 2-4x speedup. CUDA graphs claim 10x. What actually helps in production, and what's just good marketing?

Feb 22, 2025

That benchmark showing 10,000 tokens/second? It probably used batch size 64 and measured mean latency. Here's how to benchmark for reality.