Self-distillation through the ages
A couple of really phenomenal recent papers have brought up new approaches to self-distillation in language models and reinforcement learning, and motivated ...
There have been quite a few interesting takes on what I’m going to call “residual expansion” over the last year, most notably DeepSeek’s Manifold-con...
This post is a brief, intuitive summary of my paper, “Geometric sparsification in recurrent neural networks.” Academic publications emphasize formal descript...