-
Data Quality in the Age of LLMs: A Comprehensive Survey of Synthetic Data Generation
A survey-style blog post covering definitions, risks, current research, and open challenges in data quality for large language model training and fine-tuning.
-
Understanding Attention from the Bottom Up
A researcher's journey from RNN bottlenecks to FlashAttention-3, with runnable code at every step.