4 posts tagged with "serving"

Security Considerations for LLM Serving
Prompt injection, model extraction, data leakage. LLM serving has unique attack vectors, and understanding them is the first step to defending against them.
One GPU can serve many customers without sharing data. Isolation at the request level, not the hardware level. The economics work when you get it right.
S-LoRA enables switching adapters in ~10ms without reloading the base model. One deployment serves hundreds of customizations.
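S-LoRA itself ships as a research codebase, but the same serve-many-adapters-over-one-base pattern is exposed by vLLM's multi-LoRA support. A minimal sketch of that pattern, with the model name, tenant names, and adapter paths as placeholder assumptions; the base model loads once and each request selects its adapter:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model once; adapters attach per request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=8)
params = SamplingParams(max_tokens=64)

# "tenant-a" and its path are hypothetical. Swapping to a different
# LoRARequest does not reload the base weights, which is what makes
# the adapter switch cheap relative to loading a whole model.
output = llm.generate(
    "Summarize the incident report.",
    params,
    lora_request=LoRARequest("tenant-a", 1, "/adapters/tenant-a"),
)
```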
Merge adapters for single-tenant deployments. Keep them separate for multi-tenant. The serving architecture depends on how many customizations you're running.
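A sketch of both deployment shapes using Hugging Face PEFT, with the model name and adapter paths as placeholder assumptions: merging bakes one customization into the weights for single-tenant serving, while keeping adapters separate lets one process route requests between tenants.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Single-tenant: fold the adapter into the base weights, then serve the
# result as a plain model with no adapter bookkeeping at request time.
merged = PeftModel.from_pretrained(base, "/adapters/tenant-a").merge_and_unload()
merged.save_pretrained("/models/tenant-a-merged")

# Multi-tenant: reload a clean base (merging mutated the one above),
# keep adapters separate, and switch between them in place.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "/adapters/tenant-a", adapter_name="tenant-a")
model.load_adapter("/adapters/tenant-b", adapter_name="tenant-b")
model.set_adapter("tenant-b")  # subsequent requests use tenant B's weights
```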