When to Use FP8 for Inference
H100's FP8 gives near-FP16 quality at near-INT8 speed. It's becoming the new default. Here's when and how to use it.
2 posts tagged with "precision"
FP16 to INT8 is usually safe. INT8 to INT4 requires careful testing. Here's how to choose.