Deploying and Serving Fine-tuned Models
Merge adapters for single-tenant deployments. Keep them separate for multi-tenant. The serving architecture depends on how many customizations you're running.
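A minimal sketch of the two paths, assuming Hugging Face transformers and peft; the model name and adapter paths are illustrative:

```python
# Sketch: two ways to serve a LoRA fine-tune (transformers + peft).
# Model name and adapter paths are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-2-7b-hf"

# Option A, single-tenant: merge the adapter into the base weights once,
# then serve the merged model like any dense model (no per-request overhead).
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
merged = PeftModel.from_pretrained(base, "./adapters/customer-a").merge_and_unload()

# Option B, multi-tenant: load a fresh base (don't reuse merged weights),
# keep adapters separate, and switch per request.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
multi = PeftModel.from_pretrained(base, "./adapters/customer-a", adapter_name="customer-a")
multi.load_adapter("./adapters/customer-b", adapter_name="customer-b")
multi.set_adapter("customer-b")  # route this request to customer B's fine-tune
```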
Fine-tuning a model is the easy part. Running it in production with checkpoints, evals, rollback, and serving is the hard part. Here's the full picture.
If your optimization breaks an eval, the optimization is wrong. Evals are invariants, not suggestions. Ship nothing that fails them.
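A sketch of what "evals as invariants" looks like as a deploy gate; the eval names, scores, and thresholds are illustrative placeholders for your own harness:

```python
# Sketch: treat evals as hard gates in the deploy pipeline.
# Eval names, thresholds, and scores are illustrative.
import sys

THRESHOLDS = {"factuality": 0.90, "format_compliance": 0.99, "refusal_handling": 0.95}

def gate(model_id: str, scores: dict[str, float]) -> None:
    failures = {name: s for name, s in scores.items() if s < THRESHOLDS[name]}
    if failures:
        # Evals are invariants, not suggestions: any failure blocks the ship.
        sys.exit(f"blocked: {model_id} failed evals: {failures}")
    print(f"{model_id} passed all evals")

gate("ft-support-v7", {"factuality": 0.93, "format_compliance": 0.995, "refusal_handling": 0.97})
```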
Models change. Prompts change. How do you update without breaking clients? Immutable versions and controlled rollout.
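One way to get both properties is a registry of frozen version ids behind a mutable alias: pinned clients never move, and only the alias advances during rollout. A sketch with illustrative names and storage paths:

```python
# Sketch: immutable model versions behind a mutable alias.
# Version ids and registry layout are illustrative, not a real API.
REGISTRY = {
    "support-model@2024-05-01": "s3://models/support/2024-05-01",
    "support-model@2024-06-12": "s3://models/support/2024-06-12",
}
ALIASES = {"support-model@latest": "support-model@2024-06-12"}

def resolve(version: str) -> str:
    # Pinned clients are never moved; only the alias advances.
    pinned = ALIASES.get(version, version)
    return REGISTRY[pinned]

print(resolve("support-model@2024-05-01"))  # pinned client: unchanged forever
print(resolve("support-model@latest"))      # alias: follows the rollout
```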
Model changes are high-risk deployments. Route 1% of traffic to the new model, compare outputs, then gradually expand. Here's the playbook.
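A sketch of the traffic split, using a sticky hash so each caller consistently sees the same model as the canary expands; model names and bucket count are illustrative:

```python
# Sketch: sticky canary routing. Hashing the request key keeps each
# caller on one model while the canary fraction grows. Names illustrative.
import hashlib

def pick_model(request_key: str, canary_fraction: float) -> str:
    bucket = int(hashlib.sha256(request_key.encode()).hexdigest(), 16) % 10_000
    return "model-canary" if bucket < canary_fraction * 10_000 else "model-stable"

# Start at 1%, compare outputs, then expand: 0.01 -> 0.05 -> 0.25 -> 1.0
print(pick_model("user-42", canary_fraction=0.01))
```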
Raw PyTorch inference is 3-5x slower than an optimized serving stack. Here's where the gap comes from and how to close it.
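For a sense of the optimized side, here's a sketch using vLLM as one example of such a stack; the model name is illustrative:

```python
# Sketch: batched inference via vLLM instead of a per-request
# PyTorch generate loop. Model name is an illustrative placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(max_tokens=256, temperature=0.7)

# Continuous batching and paged KV cache serve these concurrently,
# which is where most of the speedup over a raw loop comes from.
outputs = llm.generate(["Summarize our refund policy.", "Draft a status update."], params)
for out in outputs:
    print(out.outputs[0].text)
```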
12 things to check before your LLM goes to production. Most teams skip at least half. That's how incidents happen.
Your API has rate limits. Your database has connection limits. Your LLM endpoints should have token limits. Here's how to add them without breaking production.
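A sketch of a per-client token bucket metered in tokens rather than requests; the rates and client id are illustrative:

```python
# Sketch: per-client token bucket for LLM endpoints, metered in tokens.
# Rates and client ids are illustrative placeholders.
import time

class TokenBucket:
    def __init__(self, tokens_per_minute: int):
        self.rate = tokens_per_minute / 60.0   # refill rate, tokens/sec
        self.capacity = float(tokens_per_minute)
        self.available = self.capacity
        self.last = time.monotonic()

    def allow(self, requested_tokens: int) -> bool:
        now = time.monotonic()
        self.available = min(self.capacity, self.available + (now - self.last) * self.rate)
        self.last = now
        if requested_tokens <= self.available:
            self.available -= requested_tokens
            return True
        return False  # caller should return 429 with a Retry-After hint

buckets = {"client-a": TokenBucket(tokens_per_minute=90_000)}
print(buckets["client-a"].allow(requested_tokens=4_000))  # True until exhausted
```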