A researcher's journey from RNN bottlenecks to FlashAttention-3 — with runnable code at every step.
44 min read · March 11, 2025
2025 · attention transformers deep-learning nlp · research