All of Qria's Comments + Replies

Qria
-10

Does this framework also explain the grokking phenomenon?

I haven't yet fully understood your hypothesis, beyond the idea that the behaviour gradient is useful for measuring something related to inductive bias, but the above paper seems to touch on a similar topic (generalization) with similar methods (experiments on fully known toy examples such as SO5).

Vivek Hebbar
2
I'm pretty sure my framework doesn't apply to grokking.  I usually think about training as ending once we hit zero training loss, whereas grokking happens much later.