All Tags

#serving

4 posts tagged with "serving"

Security Considerations for LLM Serving

Prompt injection, model extraction, data leakage: LLM serving has unique attack vectors, and understanding them is the first step to defending against them.

Running Multiple Customers on One GPU

One GPU can serve many customers without leaking data between them. Isolation happens at the request level, not the hardware level. The economics work when you get it right.

Switching LoRA Adapters at Runtime

S-LoRA enables switching adapters in ~10ms without reloading the base model. One deployment serves hundreds of customizations.

Deploying and Serving Fine-tuned Models

Merge adapters into the base model for single-tenant deployments. Keep them separate for multi-tenant. The serving architecture depends on how many customizations you're running.