All of Thomas Dullien's Comments + Replies

Good stuff. A few thoughts:

1. Assuming a model has memorized the training data and still has enough "spare capacity" to play the lottery ticket hypothesis and find generalizing solutions for subsets of the memorized data, you'll eventually end up with a number of partial solutions that each generalize to a subset of that data (obviously assuming some form of regularization towards simplicity). So this may be where the "underparametrized" regime of past ML went wrong: that approach tried to force the model into generalization without memorization, b... (read more)
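
A minimal sketch of that picture (my own illustration, with arbitrary toy data and hyperparameters, not anything from the comment): train an overparameterized MLP with weight decay as the simplicity pressure, then magnitude-prune most of the weights lottery-ticket style and check whether the surviving sparse subnetwork still fits the data.

```python
# Sketch: overparameterized model + simplicity pressure -> sparse subnetwork that still fits.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dataset with a simple underlying rule the model could either memorize or generalize.
X = torch.randn(256, 10)
y = (X[:, 0] - X[:, 1] > 0).float().unsqueeze(1)

model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-3)  # regularization towards simplicity
loss_fn = nn.BCEWithLogitsLoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Magnitude pruning: zero out the 90% smallest-magnitude weights (lottery-ticket style).
with torch.no_grad():
    for p in model.parameters():
        if p.dim() > 1:  # prune weight matrices, leave biases alone
            threshold = p.abs().flatten().kthvalue(int(0.9 * p.numel())).values
            p.mul_((p.abs() > threshold).float())

print("loss with 90% of weights pruned:", loss_fn(model(X), y).item())
```

If the weight-decay pressure did its job, the pruned subnetwork's loss should stay close to the dense model's, which is the "spare capacity found a simpler solution" intuition in miniature.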

Weiahe H
  Interesting idea. The natural next question is: how would you use that second model to determine the Kolmogorov complexity (or a metric similar to Kolmogorov complexity) of the first model's weights? Say you want to use the complexity of the second model itself, on the assumption that it is the simplest possible model that can predict the first model's weights. But to satisfy that assumption, you could train a third model in the same way to minimize the complexity of the second, and so on. Eventually you need to determine the complexity of some model's weights without training another model, using some metric (whether it's weight norms, performance after pruning, or [insert clever method from the future]). Why not just apply that metric to the first model and skip training the additional ones? That said, I could be overlooking something, and empirical results could suggest otherwise, so it could still be worth testing the idea out.
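
  For what it's worth, a rough sketch of what "just apply the metric to the first model" could look like, with two cheap proxies: the L2 norm of the weights and the gzip-compressed size of a coarsely quantized copy as a crude stand-in for Kolmogorov complexity. The bin count and the toy weight vectors below are arbitrary illustrative choices.

```python
# Two cheap complexity proxies applied directly to a weight vector.
import gzip
import numpy as np

def l2_norm(weights: np.ndarray) -> float:
    return float(np.linalg.norm(weights))

def compressed_size_bytes(weights: np.ndarray, n_bins: int = 256) -> int:
    """Quantize weights to n_bins levels and return the gzip-compressed byte count."""
    w = weights.ravel()
    lo, hi = w.min(), w.max()
    q = np.round((w - lo) / (hi - lo + 1e-12) * (n_bins - 1)).astype(np.uint8)
    return len(gzip.compress(q.tobytes()))

# Highly structured weights should compress far better than random ones of the same size.
structured = np.tile(np.linspace(-1.0, 1.0, 100), 100)    # repetitive, "simple"
random_w = np.random.default_rng(0).normal(size=10_000)   # incompressible, "complex"

print("structured:", l2_norm(structured), compressed_size_bytes(structured))
print("random:    ", l2_norm(random_w), compressed_size_bytes(random_w))
```

  None of this captures what the second-model proposal was after, of course; it just makes concrete the kind of direct metric the regress argument bottoms out in.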