Evaluating Custom Inference Hardware
Groq, Cerebras, and other custom silicon promise 10x speed. Here's how to evaluate them without getting burned.
4 posts tagged with "hardware"
H100 spot at $0.15/1M tokens. A100 on-demand at $0.40/1M. API at $1.00/1M. Here's the full comparison.
An H100 costs roughly 2x an A100 but delivers roughly 2x the memory bandwidth. For decode-bound inference, that math matters.
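A minimal sketch of the math behind that excerpt: for decode-bound inference, tokens/sec scales roughly with memory bandwidth, so cost per token is price divided by throughput. All figures below are illustrative assumptions, not vendor benchmarks; the function name and numbers are hypothetical.

```python
def cost_per_mtok(hourly_price, tokens_per_sec):
    """Dollars per 1M tokens for a GPU running at full decode utilization."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_price / (tokens_per_hour / 1e6)

# Illustrative figures only: assume the H100 costs 2x the A100's hourly
# price and, being decode-bound, sustains ~2x the aggregate tokens/sec
# thanks to ~2x the memory bandwidth.
a100_price, a100_tps = 2.00, 3200   # $/hr, tokens/sec across a batch
h100_price, h100_tps = 4.00, 6400   # 2x price, ~2x bandwidth -> ~2x tokens/sec

print(cost_per_mtok(a100_price, a100_tps))  # same cost per 1M tokens...
print(cost_per_mtok(h100_price, h100_tps))  # ...but each token arrives in half the time
```

Under these assumptions the two GPUs tie on cost per token, so the H100's edge is latency (and batch headroom), not unit economics. The ratios, not the absolute numbers, drive the conclusion.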
GPUs dominate LLM inference. TPUs offer interesting economics. Here's how to think about the choice.