Choosing Benchmarks That Predict Production
That benchmark showing 10,000 tokens/second? It probably used batch size 64 and measured mean latency. Here's how to benchmark for the conditions your production traffic will actually create.
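To see why mean latency can hide what users experience, here is a minimal sketch that compares the mean with percentile latencies. It assumes you have already collected per-request latencies in milliseconds. `percentile` and `summarize` are hypothetical helpers, `percentile` uses the nearest-rank definition, and the latency list is made-up illustrative data, not a measurement.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to avoid float rounding surprises
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report mean alongside median and tail latency for one benchmark run."""
    return {
        "mean": mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical data: 98 fast requests plus 2 slow ones
# (for example, requests stuck waiting behind a large batch).
latencies = [50.0] * 98 + [2000.0] * 2
print(summarize(latencies))  # {'mean': 89.0, 'p50': 50.0, 'p99': 2000.0}
```

A mean of 89 ms looks healthy, but the p99 shows that 1 in 50 of these requests takes 2 seconds. This is why reporting tail percentiles alongside the mean gives a more honest picture.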