All Tags

#serving

4 posts tagged with "serving"

Security Considerations for LLM Serving

Prompt injection, model extraction, data leakage: LLM serving has unique attack vectors, and understanding them is the first step to defending against them.

Running Multiple Customers on One GPU

One GPU can serve many customers without leaking data between them. Isolation happens at the request level, not the hardware level. The economics work when you get it right.

Switching LoRA Adapters at Runtime

S-LoRA enables switching adapters in ~10ms without reloading the base model. One deployment serves hundreds of customizations.

Deploying and Serving Fine-tuned Models

Merge adapters into the base model for single-tenant deployments. Keep them separate for multi-tenant. The serving architecture depends on how many customizations you're running.