The Good Minima

Posts

Mar 23, 2023
The RWKV language model: An RNN with the advantages of a transformer
I explain what is so unique about the RWKV language model.
Mar 23, 2023
How the RWKV language model works
I go through and explain a minimal implementation of RWKV in detail.
Dec 28, 2022
94% on CIFAR-10 in 94 lines and 94 seconds
Here are 94 lines of PyTorch code for 94% CIFAR-10 classification accuracy in 94 seconds of training time, and some interesting observations.
Oct 25, 2022
Implicit bias in single epoch SGD
Large deep learning models can converge in a single epoch. I showcase this phenomenon, and motivate why it is a promising setting for theoretical analysis.
Aug 29, 2022
Start here: Why I care about implicit biases
I explain what I mean by "implicit biases" in deep learning and my motivations for researching them.
Aug 10, 2022
Technical: Deep Linear Networks with label noise minimize the nuclear norm
I mathematically analyze the implicit regularization in deep linear networks induced by large learning rate and label noise.
Jul 22, 2022
Implicit bias by large learning rate: Noise can be helpful for gradient descent
I demonstrate how large learning rates can lead to implicit biases in a simple regression task.
Jul 6, 2022
Implicit bias by small initialization: A simple classification task where deeper is better
I show an example of implicit bias by small initialization on a synthetic classification task.
Jun 29, 2022
Kernel methods are basically overparameterized linear regression
In this blog post I explain kernel methods and the intuition that they're "basically just overparameterized linear regression".