When to Use FP8 for Inference
H100's FP8 gives near-FP16 quality at near-INT8 speed. It's becoming the new default. Here's when and how to use it.
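To get a feel for why quality holds up, here's a minimal per-tensor FP8 (E4M3) round-trip sketch. It assumes PyTorch 2.1+, which exposes `torch.float8_e4m3fn`; the `quantize_fp8` / `dequantize_fp8` helper names are hypothetical, and production stacks like Transformer Engine use delayed amax-based scaling rather than this one-shot version.

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_fp8(x: torch.Tensor):
    """Per-tensor symmetric scaling into FP8 (E4M3). Hypothetical helper."""
    amax = x.abs().amax().to(torch.float32).clamp(min=1e-12)
    scale = FP8_MAX / amax  # map the largest magnitude near the FP8 max
    x_fp8 = (x.to(torch.float32) * scale).clamp(-FP8_MAX, FP8_MAX)
    return x_fp8.to(torch.float8_e4m3fn), scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (x_fp8.to(torch.float32) / scale).to(torch.float16)

w = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8, scale = quantize_fp8(w)
w_back = dequantize_fp8(w_fp8, scale)

# Mean round-trip error, relative to the tensor's mean magnitude.
print((w - w_back).abs().mean() / w.abs().mean())
```

The key design point is the scale factor: E4M3 only covers roughly ±448, so without per-tensor scaling most weight distributions would collapse into a handful of representable values.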
H100 costs roughly twice as much as A100 but delivers roughly twice the memory bandwidth. For decode-bound inference, that math matters.
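To see why, a back-of-the-envelope sketch: at batch size 1, decoding each token streams every weight through HBM once, so bandwidth sets a hard ceiling of `bandwidth / model_bytes` tokens per second. The bandwidth figures below are published specs (A100 80GB ≈ 2.0 TB/s, H100 SXM ≈ 3.35 TB/s); the 13B parameter count is an illustrative assumption.

```python
# Bandwidth ceiling on decode throughput: tokens/sec <= bandwidth / model_bytes.
PARAMS = 13e9  # illustrative 13B-parameter model
BYTES = {"FP16": 2, "FP8": 1}

GPUS = {
    "A100 80GB (2.0 TB/s)": 2.0e12,
    "H100 SXM (3.35 TB/s)": 3.35e12,
}

for gpu, bandwidth in GPUS.items():
    for fmt, bytes_per_param in BYTES.items():
        ceiling = bandwidth / (PARAMS * bytes_per_param)
        print(f"{gpu} @ {fmt}: ~{ceiling:.0f} tok/s ceiling")
```

The two levers compound: moving from A100 FP16 to H100 FP8 multiplies the ceiling by both the bandwidth ratio and the halved bytes per parameter.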