Knowing If You're Memory or Compute Limited
Optimizing for compute when you're memory bound wastes effort. Optimizing for memory when you're compute bound wastes opportunity. Profile first, then optimize.
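One rough way to tell which side you are on before opening a profiler is a roofline-style estimate: compare the kernel's arithmetic intensity (FLOPs per byte moved) to the hardware's ratio of peak compute to peak memory bandwidth. The sketch below assumes you can estimate those four numbers yourself; the function name `classify_bound` and the hardware figures are hypothetical, chosen only for illustration.

```python
def classify_bound(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Roofline-style estimate of whether a kernel is memory- or compute-bound.

    All four inputs are estimates supplied by the caller (assumption).
    """
    # Arithmetic intensity: useful work done per byte of memory traffic.
    intensity = flops / bytes_moved
    # Ridge point: the intensity at which the hardware's compute and
    # memory-bandwidth limits intersect.
    ridge = peak_flops_per_s / peak_bytes_per_s
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BYTES = 1e12

# A kernel doing 2 FLOPs per byte sits far below the ridge point of 100.
low_intensity = classify_bound(2e9, 1e9, PEAK_FLOPS, PEAK_BYTES)
# A kernel doing 1000 FLOPs per byte sits above it.
high_intensity = classify_bound(1e12, 1e9, PEAK_FLOPS, PEAK_BYTES)
print(low_intensity, high_intensity)
```

An estimate like this only narrows the question; measured profiles are still what confirm which limit a workload actually hits.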
3 posts tagged with "profiling"
Memory grows slowly over hours, then OOM. Here's how to find where the bytes are going before they crash your server.
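As a minimal sketch of finding where the bytes go, assuming the server is a Python process, the standard-library `tracemalloc` module can diff two heap snapshots and rank source lines by growth. The `leak` list below is a stand-in for whatever is really accumulating, and `top_growth` is a hypothetical helper name.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocations changed most between snapshots."""
    # compare_to sorts by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulated leak (assumption): roughly 10 MB of buffers that are never released.
leak = [bytes(1024) for _ in range(10_000)]

after = tracemalloc.take_snapshot()
stats = top_growth(before, after)
for stat in stats:
    print(stat)  # file, line number, size delta, allocation count delta

tracemalloc.stop()
```

In a long-running service the two snapshots would typically be taken minutes or hours apart, so that slow growth stands out from normal allocation churn.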
E2EL = TTFT + generation time sounds simple: end-to-end latency is time to first token plus the time spent generating the rest. But where does that time actually go? Understanding the equation reveals where to optimize.
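To make the equation concrete, here is a small sketch that models generation time as the per-token decode latency times the number of tokens after the first. That decomposition and the names `tpot_s` and `e2e_latency` are assumptions for illustration, as are the example numbers.

```python
def e2e_latency(ttft_s, tpot_s, output_tokens):
    """E2EL = TTFT + generation time.

    Assumes generation time is tpot_s (time per output token) multiplied by
    the tokens produced after the first one, which TTFT already covers.
    """
    generation_s = tpot_s * max(output_tokens - 1, 0)
    return ttft_s + generation_s


# Hypothetical request: 0.5 s to first token, 20 ms per token, 101 tokens out.
total_s = e2e_latency(ttft_s=0.5, tpot_s=0.02, output_tokens=101)
ttft_share = 0.5 / total_s
print(f"E2EL: {total_s:.2f} s, TTFT share: {ttft_share:.0%}")
```

With these numbers most of the latency sits in generation, so shaving TTFT would help less than reducing per-token time; a short response with a long prompt can flip that balance.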