Blog

Deep dives into LLM inference optimization. Practical insights for developers and founders building with AI.

When to Use AWQ vs GPTQ

Both quantize model weights to INT4. AWQ quantizes faster; GPTQ sometimes preserves more quality. When does each win?

When to Move Data Off the GPU

GPU memory is precious; CPU memory is cheap. Offloading the right data at the right time can double your concurrent requests.