TTT is not linear attention—but it might be something weirder
An interesting paper by Liu et al. came out last week, arguing that we should reconceptualize the variety of test-time training (TTT) with KV-binding archite...
A couple of phenomenal recent papers have brought up new approaches to self-distillation in language models and reinforcement learning, and motivated ...
There have been quite a few interesting takes on what I’m going to call “residual expansion” over the last year, particularly notably Deepseek’s Manifold-con...
This post is a brief, intuitive summary of my paper, “Geometric sparsification in recurrent neural networks.” Academic publications emphasize formal descript...