User Comment Replies — AI Alignment Forum

An exploration of GPT-2's embedding weights

Image layout is a little broken. I'll try to fix it tomorrow.

Sorry if this is a spoiler for your next post, but I take issue with the heading "Standard measures of information theory do not work" and the implication that this post contains the pre-Crutchfield state of the art.

The standard approach to this in information theory (which underlies the loss function of autoregressive LMs) isn't to try to match the Shannon entropy of the marginal distribution of bits (a 50-50 distribution in your post); it's to treat the generative model as a distribution for each bit conditional on the previous bits and use the cross-ent... (read more)
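As a rough illustration of the distinction drawn above (not the commenter's own code), here is a minimal Python sketch. It assumes a hypothetical alternating bit string as a stand-in for the post's example, and a first-order (previous-bit) model with add-one smoothing fit on the same string. The function names, the smoothing constant `alpha`, and the test string are all assumptions made for illustration.

```python
import math
from collections import Counter


def marginal_entropy(bits):
    """Shannon entropy (bits per symbol) of the marginal 0/1 frequencies."""
    n = len(bits)
    counts = Counter(bits)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_cross_entropy(bits, alpha=1.0):
    """Average code length (bits per symbol) under a smoothed model of
    p(x_t | x_{t-1}), fit on and evaluated against the same string.

    Assumption: a one-bit context and in-sample evaluation; a real
    autoregressive LM conditions on the whole prefix and is scored on
    held-out data.
    """
    pairs = list(zip(bits, bits[1:]))
    pair_counts = Counter(pairs)
    context_counts = Counter(prev for prev, _ in pairs)
    total = 0.0
    for prev, cur in pairs:
        # Add-alpha smoothing over the two possible next bits.
        p = (pair_counts[(prev, cur)] + alpha) / (context_counts[prev] + 2 * alpha)
        total -= math.log2(p)
    return total / len(pairs)


# Hypothetical example: marginally 50-50, but fully predictable in context.
alternating = [0, 1] * 500
h_marginal = marginal_entropy(alternating)
h_conditional = conditional_cross_entropy(alternating)
print(f"marginal entropy:          {h_marginal:.4f} bits/symbol")
print(f"conditional cross-entropy: {h_conditional:.4f} bits/symbol")
```

On this string the marginal entropy is 1 bit per symbol while the conditional cross-entropy is close to zero, which is the gap between the two measures that the comment points at.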

3 Alexander Gietelink Oldenziel 2y

Yeah, follow-up posts will definitely get into that! To be clear: (1) the initial posts won't be about Crutchfield's work yet, just introducing some background material and overarching philosophy; (2) the claim isn't that standard measures of information theory are bad. To the contrary! If anything, we hope these posts will be somewhat of an ode to information theory as a tool for interpretability. Adam wanted to add a lot of academic caveats. I was adamant that we streamline the presentation to make it short and snappy for a general audience, but it appears I might have overshot! I will make an edit to clarify. Thank you! I agree with you about the importance of Kolmogorov complexity philosophically and would love to read a follow-up post on your thoughts about Kolmogorov complexity and LLM interpretability :)

Richard Ngo 2y 86

A couple of differences between Kolmogorov complexity/Shannon entropy and the loss function of autoregressive LMs (just to highlight them, not trying to say anything you don't already know):

The former are (approximately) symmetric; the latter isn't (it can be much harder to predict a string front-to-back than back-to-front).
The former calculate compression values as properties of a string (up to choice of UTM). The latter calculates compression values as properties of a string, a data distribution, and a model (and even then doesn't strictly determine the r

... (read more)


All of Adam Scherlis's Comments + Replies