User Comment Replies — AI Alignment Forum

Transformers Represent Belief State Geometry in their Residual Stream

That is a fair summary.

On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

This post really helped me make concrete some of the admittedly gut reaction type concerns/questions/misunderstandings I had about alignment research, thank you. I have a few thoughts after reading:

(1) I wonder how different some of these epistemic strategies are from everyday normal scientific research in practice. I do experimental neuroscience and I would argue that we also are not even really sure what the "right" questions are (in a local sense, as in, what experiment should I do next), and so we are in a state where we kinda fumble around using whate...

Adam Shimi (3y)

Thanks for the kind words and thoughtful comment! Glad it helped! That was definitely one goal, the hardest to check with early feedback because I mostly know people who already work in the field or have never been confronted with it, while you're in the middle. :)

Completely! One thing I tried to make clear in this draft (maybe not successfully given your comment ^^) is that many fields, including natural sciences, leverage far more epistemic strategies than the Popperian "make a model, predict something and test it in the real world". My points are more that:

1. Alignment is particularly weird in terms of epistemic strategies because neither the problem nor the technology exists.
2. Given the potential urgency of alignment, it's even more important to clarify these epistemic subtleties.

But I'm convinced that almost all disciplines could be the subject of a deep study in the methods people use to wrest knowledge from the world. That's part of my hopes, since I want to steal epistemic strategies from many different fields and see how they apply to alignment.

Fascinating! Yeah, I agree with you that the analogy definitely exists, particularly with fundamental science. And that's part of the difficulty in alignment. Maybe the best comparison would be trying to cure a neurological pathology without having access to a human brain, but only to brains of individuals of very old species in our evolutionary lineage. It's harder, but linking the experimental results to the concrete object is still part of the problem. (Would be very interested in having a call about the different epistemic strategies you use in experimental neuroscience by the way)

So I disagree, but you're touching on a fascinating topic, one that confused me for the longest time. My claim is that pure maths is fundamentally the study of abstraction (Platonists would disagree, but that's more of a minority position nowadays). "Patterns" is also a word commonly used when mathematicians wax poetic. Wha

Testing The Natural Abstraction Hypothesis: Project Update

Adam Shai (4y)

It's great to see someone working on this subject. I'd like to point you to Jim Crutchfield's work, in case you aren't familiar with it, where he proposes a "calculi of emergence" wherein you start with a dynamical system and via a procedure of teasing out the equivalence classes of how the past constrains the future, can show that you get the "computational structure" or "causal structure" or "abstract structure" (all loaded terms, I know, but there's math behind it), of the system. It's a compressed symbolic representation of what the dynamical system i...
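To make "equivalence classes of how the past constrains the future" a bit more concrete, here is a minimal Python sketch. It is a simplified stand-in, not Crutchfield's actual reconstruction procedure: it only looks at fixed-length histories, and it merges two histories when their empirical next-symbol distributions agree within a tolerance rather than using a proper statistical test. All names (`golden_mean_sample`, `next_symbol_distributions`, `group_histories`) and the choice of the "golden mean" toy process (binary strings with no two consecutive 1s) are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def golden_mean_sample(n, seed=0):
    """Sample n symbols from a toy process: a '1' is always followed by '0',
    and a '0' is followed by '0' or '1' with equal probability."""
    rng = random.Random(seed)
    out, prev = [], "0"
    for _ in range(n):
        if prev == "1":
            sym = "0"
        else:
            sym = "1" if rng.random() < 0.5 else "0"
        out.append(sym)
        prev = sym
    return "".join(out)


def next_symbol_distributions(symbols, history_len):
    """Estimate P(next symbol | last history_len symbols) from the data."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(symbols)):
        history = symbols[i - history_len:i]
        counts[history][symbols[i]] += 1
    alphabet = sorted(set(symbols))
    return {
        h: tuple(c[a] / sum(c.values()) for a in alphabet)
        for h, c in counts.items()
    }


def group_histories(dists, tol=0.05):
    """Greedily merge histories whose predictive distributions differ by at
    most tol in every coordinate. Each group approximates one causal state."""
    groups = []  # list of (representative distribution, member histories)
    for history in sorted(dists):
        d = dists[history]
        for rep, members in groups:
            if max(abs(a - b) for a, b in zip(rep, d)) <= tol:
                members.append(history)
                break
        else:
            groups.append((d, [history]))
    return [members for _, members in groups]


sample = golden_mean_sample(20000, seed=0)
states = group_histories(next_symbol_distributions(sample, history_len=2))
print(states)
```

For this toy process the histories ending in `0` should collapse into one group and the history ending in `1` into another, which is the sense in which the grouping gives a compressed symbolic description of the dynamics.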

All of Adam Shai's Comments + Replies