When to Use Self-Attention vs Cross-Attention
Self-attention lets a sequence talk to itself. Cross-attention lets one sequence attend to another. Knowing when to reach for each is what lets you pick the right mechanism for a given architecture.
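The distinction is easiest to see in code. Here is a minimal sketch using PyTorch's nn.MultiheadAttention (the shapes and dimensions are illustrative assumptions, not taken from the post): the only thing that changes between self- and cross-attention is where the queries, keys, and values come from.

```python
import torch
import torch.nn as nn

# Illustrative sketch: one attention module, two usage patterns.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

x = torch.randn(2, 10, 64)    # e.g. decoder states   (batch, tgt_len, dim)
ctx = torch.randn(2, 20, 64)  # e.g. encoder outputs  (batch, src_len, dim)

# Self-attention: queries, keys, and values all come from the same sequence,
# so every position of x can attend to every other position of x.
self_out, _ = attn(query=x, key=x, value=x)

# Cross-attention: queries come from x, but keys and values come from the
# other sequence, so x attends to ctx instead of to itself.
cross_out, _ = attn(query=x, key=ctx, value=ctx)

print(self_out.shape, cross_out.shape)  # both (2, 10, 64): output length follows the queries
```

Note that the output shape follows the query sequence in both cases; only the source of keys and values decides which sequence is being attended to.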