Recent Posts

Self-distillation through the ages

21 minute read

A couple of really phenomenal recent papers have brought up new approaches to self-distillation in language models and reinforcement learning, and motivated ...

Moduli Regularization

7 minute read

This post is a brief, intuitive summary of my paper, “Geometric sparsification in recurrent neural networks.” Academic publications emphasize formal descript...