Using Spot Instances for Inference Workloads
Spot instances are 50-70% cheaper. But they can disappear. Here's how to use them without breaking production.
7 posts tagged with "infrastructure"
GPU cost is just the beginning. Egress, logging, on-call—add 40% to your compute estimate for the real number.
GPUs dominate LLM inference. TPUs offer interesting economics. Here's how to think about the choice.
When does self-hosting break even? Here's the formula, the variables, and the 6-month reality check most teams skip.
Everyone wants to self-host LLMs to save money. Most shouldn't. Here's the math on when it actually makes sense.
Your code says streaming is enabled. Your load balancer says otherwise. Here's where streaming breaks and how to fix it.
Your code says streaming is enabled. Your monitoring shows 0% actual streams. The bytes are being buffered somewhere between your model and the user's screen.