Running Multiple Customers on One GPU
One GPU can serve many customers without any of them seeing each other's data. Isolation happens at the request level, not the hardware level. The economics work when you get it right.
4 posts tagged with "multi-tenant"
S-LoRA enables switching adapters in ~10ms without reloading the base model. One deployment serves hundreds of customizations.
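The core idea — keeping one base model resident while swapping small per-customer adapters in and out — can be sketched with a simple LRU cache. This is an illustration only: `AdapterCache` and `load_fn` are made-up names, and S-LoRA's real system goes much further (batching heterogeneous adapters and paging their weights in unified GPU memory).

```python
from collections import OrderedDict

class AdapterCache:
    """Minimal sketch of per-request adapter multiplexing over a shared
    base model. Illustrative only; not S-LoRA's actual implementation."""

    def __init__(self, load_fn, capacity=8):
        self._load_fn = load_fn      # loads adapter weights by id (assumed helper)
        self._cache = OrderedDict()  # LRU cache of resident adapters
        self._capacity = capacity

    def get(self, adapter_id):
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)  # mark as recently used
        else:
            if len(self._cache) >= self._capacity:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[adapter_id] = self._load_fn(adapter_id)
        return self._cache[adapter_id]
```

A cache hit costs a dictionary lookup instead of a weight reload, which is what makes millisecond-scale switching between hundreds of customizations plausible.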
Premium users expect faster responses. Batch jobs can wait. Here's how to implement priority queues that don't starve anyone.
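One standard way to avoid starvation is priority aging: a waiting request's effective priority grows with time, so batch jobs eventually outrank fresh premium traffic. A minimal sketch (names like `AgingPriorityQueue` and `aging_rate` are hypothetical, not from the post):

```python
import itertools
import time

class AgingPriorityQueue:
    """Priority queue where waiting requests gain priority over time,
    so low-priority batch jobs are never starved by premium traffic.
    Illustrative sketch; a real scheduler would use a heap, not a scan."""

    def __init__(self, aging_rate=1.0):
        self.aging_rate = aging_rate       # priority points gained per second waited
        self._counter = itertools.count()  # tie-breaker: older request wins
        self._items = []                   # (base_priority, enqueue_time, seq, request)

    def push(self, request, base_priority):
        self._items.append(
            (base_priority, time.monotonic(), next(self._counter), request)
        )

    def pop(self):
        # Effective priority = base priority + aging bonus for time spent waiting.
        now = time.monotonic()

        def effective(item):
            base, t0, seq, _ = item
            return (base + self.aging_rate * (now - t0), -seq)

        best = max(self._items, key=effective)
        self._items.remove(best)
        return best[3]
```

With `aging_rate=0` this degrades to strict priority (premium always first); any positive rate bounds how long a batch job can wait behind premium traffic.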
A 10,000-token request takes 20 seconds. Behind it, a hundred 50-token requests wait. Is that fair? What even is fair in LLM serving?