#memory

17 posts tagged with "memory"

The Formula for Offloading Decisions

Transfer cost vs. recompute cost. If moving data off the GPU costs less than recomputing it, offload. If not, keep it. The math is straightforward.
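As a rough illustration of that comparison, here is a minimal sketch. The function name, its parameters, and every number in the usage lines are hypothetical, and it assumes the transfer is a full round trip (copy out, then copy back) with no overlap with compute; the post's actual formula may differ.

```python
def should_offload(num_bytes, link_bytes_per_s, recompute_flops, device_flops_per_s):
    """Return True when moving the data is estimated to be cheaper than recomputing it.

    Hypothetical cost model, not taken from the post:
      transfer time  = round trip over the host link (assumption: 2 copies, no overlap)
      recompute time = FLOPs needed to rebuild the data / sustained device throughput
    """
    transfer_s = 2 * num_bytes / link_bytes_per_s
    recompute_s = recompute_flops / device_flops_per_s
    return transfer_s < recompute_s


# Illustrative numbers only: a 25 GB/s link and 100 TFLOP/s of sustained compute.
big_tensor = should_offload(1e9, 25e9, 5e12, 100e12)    # 0.08 s transfer vs 0.05 s recompute
small_tensor = should_offload(1e6, 25e9, 5e12, 100e12)  # 0.00008 s transfer vs 0.05 s recompute
print(big_tensor, small_tensor)
```

With these made-up figures the large tensor stays put and the small one is offloaded; real decisions would also depend on whether the copy can be hidden behind other work.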

Attention That Fits in Memory

Standard attention needs O(n²) memory in the sequence length n. Memory-efficient variants need O(n). Same output, 10x less peak memory.
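One well-known way to get the same output without storing every score is a running ("online") softmax. The sketch below is pure Python for a single query vector and is only an assumption about the kind of variant the post has in mind; the function names are made up for illustration. The naive version keeps all n scores for the query, while the streaming version keeps only a running max, a running denominator, and one accumulator.

```python
import math


def attention_naive(q, keys, values):
    """Softmax attention for one query; materializes all n scores at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attention_streaming(q, keys, values):
    """Same result, but visits keys one at a time and never stores the score list."""
    scale = 1.0 / math.sqrt(len(q))
    m = float("-inf")              # running max of the scores seen so far
    denom = 0.0                    # running softmax denominator, relative to m
    acc = [0.0] * len(values[0])   # running weighted sum of values, relative to m
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        m_new = max(m, s)
        correction = math.exp(m - m_new)  # rescales old state; 0.0 on the first step
        w = math.exp(s - m_new)
        denom = denom * correction + w
        acc = [a * correction + w * x for a, x in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -1.0], [0.5, 0.5], [-0.3, 0.8]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
naive_out = attention_naive(q, keys, values)
streaming_out = attention_streaming(q, keys, values)
print(naive_out, streaming_out)
```

The two results agree up to floating-point rounding. Per query, the extra state drops from n scores to a constant, which is where the O(n²) to O(n) reduction across all n queries comes from.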

What Flash Attention Actually Does

Flash Attention doesn't make attention faster. It makes attention fit in memory. The speedup is a side effect of better memory access.