Posts
- Mar 23, 2023
I explain what makes the RWKV language model unique.
- Mar 23, 2023
I go through and explain a minimal implementation of RWKV in detail.
- Dec 28, 2022
Here are 94 lines of PyTorch code for 94% CIFAR-10 classification accuracy in 94 seconds of training time, and some interesting observations.
- Oct 25, 2022
Large deep learning models can converge in a single epoch. I showcase this phenomenon, and motivate why it is a promising setting for theoretical analysis.
- Aug 29, 2022
I explain what I mean by "implicit biases" in deep learning and my motivations for researching them.
- Aug 10, 2022
I mathematically analyze the implicit regularization in deep linear networks induced by large learning rates and label noise.
- Jul 22, 2022
I demonstrate how large learning rates can lead to implicit biases in a simple regression task.
- Jul 6, 2022
I show an example of implicit bias on a synthetic classification task.
- Jun 29, 2022
I explain kernel methods and the intuition that they're "basically just overparameterized linear regression".