#inference

3 posts tagged with "inference"

Jul 12, 2025

When to Use FP8 for Inference

FP8 on the H100 gives near-FP16 quality at near-INT8 speed, and it's becoming the new default. Here's when and how to use it.

Jan 18, 2025

Calculating End-to-End Latency Correctly

End-to-end latency (E2EL) = time to first token (TTFT) + generation time. It sounds simple, but where does that time actually go? Breaking the equation down reveals where to optimize.

Jan 1, 2025

Four Metrics That Actually Matter for LLM Inference

Your monitoring dashboard shows 180 ms average latency. Your users say the app is slow. Both are telling the truth. The disconnect is in what you're measuring.