Choosing Benchmarks That Predict Production
That benchmark showing 10,000 tokens/second? It probably used batch size 64 and measured mean latency. Here's how to benchmark for reality.
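To see why a mean latency figure can mislead, here is a minimal sketch comparing the mean with the median and 99th percentile. The sample values are synthetic and purely illustrative (95 fast requests and 5 slow ones), and the helper names `percentile` and `summarize` are hypothetical, not taken from any benchmark tool.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the mean alongside median and tail latency for one run."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }


# Synthetic, illustrative data (an assumption, not a measurement):
# 95 requests finish in 20 ms, 5 requests stall for 400 ms.
latencies_ms = [20.0] * 95 + [400.0] * 5

summary = summarize(latencies_ms)
print(summary)
```

On this made-up data the mean is 39 ms, yet the typical request takes 20 ms and the slowest 5% take 400 ms. A single mean describes neither group, which is why a production-oriented benchmark would usually report percentiles as well.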