
AI ALIGNMENT FORUM

Tamsin Leake — AI Alignment Forum


i unlisted some of my posts because of wide concerns about people being careless about exfohazards.

pgp key, email, proof, matrix: @carado4:matrix.org.


Top posts

Orthogonal's Formal-Goal Alignment theory of change

We recently announced Orthogonal, an agent foundations alignment research organization. In this post, I give a thorough explanation of the formal-goal alignment framework, the motivation behind it, and the theory of change it fits in.

The overall shape of what we're doing is:

* Building a formal goal which would lead to good worlds when pursued; our best candidate for this is QACI
* Designing an AI which takes as input a formal goal, and returns actions which pursue that goal in the distribution of worlds we likely inhabit

Backchaining: aiming at solutions

One core aspect of our theory of change is backchaining: come up with an at least remotely plausible story for how the world is saved from AI doom, and try to think about how to get there. This avoids spending lots of time getting confused about concepts that are confusing because they were the wrong thing to think about all along, such as "what is the shape of human values?" or "what does GPT4 want?". Our intent is to study things that fit together to form a full plan for saving the world.

Alignment engineering and agent foundations

Alignment is not just not the default; it's a very narrow target. As a result, there are many bits of non-obvious work which need to be done. Alignment isn't just finding the right weight to sign-flip to get the AI to switch from evil to good; it is the hard work of putting together something which coherently and robustly points in a direction we like.

as yudkowsky puts it:

> The idea with agent foundations, which I guess hasn't successfully been communicated to this day, was finding a coherent target to try to get into the system by any means (potentially including DL ones).

Agent foundations/formal-goal alignment is not fundamentally about doing math or being theoretical or thinking abstractly or proving things. Agent foundations/formal-goal alignment is about building a coherent target which is fully made of math (not of human words with unspecified meaning) and fig...
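The second bullet above describes an AI that takes a formal goal as input and returns actions pursuing that goal across the distribution of worlds we likely inhabit. The Python below is a toy expected-utility sketch of that input/output shape only, under assumptions that do not come from the post: worlds and actions are finite, the formal goal is a real-valued score over resulting worlds, and a hypothetical `outcome` function says which world an action leads to. It is not Orthogonal's design or the QACI formalization, and every name in it (`pursue_formal_goal`, `outcome`, the example worlds) is illustrative.

```python
from typing import Callable, Dict, Hashable, Iterable

# Hypothetical type aliases: a "world" and an "action" can be any hashable value.
World = Hashable
Action = Hashable


def pursue_formal_goal(
    goal: Callable[[World], float],
    world_distribution: Dict[World, float],
    actions: Iterable[Action],
    outcome: Callable[[World, Action], World],
) -> Action:
    """Toy sketch: return the action with the highest expected goal score.

    Assumptions (illustrative, not from the source):
    - `goal` stands in for the formal goal, scoring the world an action leads to;
    - `world_distribution` maps each world we might be in to its probability;
    - `outcome(world, action)` is an assumed, known model of what an action does.
    """

    def expected_score(action: Action) -> float:
        # Weight the goal's score of each resulting world by that world's probability.
        return sum(
            prob * goal(outcome(world, action))
            for world, prob in world_distribution.items()
        )

    # Choose the action whose expected score is largest.
    return max(actions, key=expected_score)


# Usage: two possible worlds, two candidate actions.
worlds = {"calm": 0.7, "stormy": 0.3}
scores = {
    ("calm", "stay"): 1.0,
    ("calm", "go"): 3.0,
    ("stormy", "stay"): 2.0,
    ("stormy", "go"): 0.0,
}
best = pursue_formal_goal(
    goal=lambda w: scores[w],
    world_distribution=worlds,
    actions=["stay", "go"],
    outcome=lambda world, action: (world, action),
)
print(best)  # prints go (expected score about 2.1, versus about 1.3 for "stay")
```

In this sketch the goal and the world distribution are plain inputs, which mirrors the split between the two bullets: one part of the work is building the goal, the other is machinery that pursues whatever goal it is handed.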

How LDT helps reduce the AI arms race

formalizing the QACI alignment formal-goal

Epistemic states as a potential benign prior

Posts

Epistemic states as a potential benign prior

Malignancy in the prior seems like a strong crux of the goal-design part of alignment to me. Whether your prior is going to be used to model:

* processes in the multiverse containing the AI which does said modeling,
* processes which would output all of some blog so we...

Aug 31, 2024 • 31

How LDT helps reduce the AI arms race

(Epistemic status: I think this is right?) Alice is the CEO of ArmaggAI, and Bob is the CEO of BigModelsAI, two major AI capabilities organizations. They're racing to be the first to build a superintelligence aligned to their respective CEV which would take over the universe and satisfy their values....

Dec 10, 2023 • 65

formalizing the QACI alignment formal-goal

this work was done by Tamsin Leake and Julia Persson at Orthogonal. thanks to mesaoptimizer for his help putting together this post. what does the QACI plan for formal-goal alignment actually look like when formalized as math? in this post, we'll be presenting our current formalization, which we believe has...

Jun 10, 2023 • 54

Orthogonal's Formal-Goal Alignment theory of change

We recently announced Orthogonal, an agent foundations alignment research organization. In this post, I give a thorough explanation of the formal-goal alignment framework, the motivation behind it, and the theory of change it fits in. The overall shape of what we're doing is:

* Building a formal goal which would...

May 5, 2023 • 69