AI ALIGNMENT FORUM

All of Tamsin Leake's Comments + Replies

Reposting myself from discord, on the topic of donating 5000$ to EA causes.

if you're doing alignment research, even just a bit, then the 5000$ are probly better spent on yourself
if you have any gears level model of AI stuff then it's better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you're essentially contributing to the "picking what to donate to" effort by thinking about it yourself
if you have no gears level model of AI then it's hard to judge which alignment orgs it's helpful to donate to (or, if gi

... (read more)

Zac Hatfield-Dodds1y812

I agree that there's no substitute for thinking about this for yourself, but I think that morally or socially counting "spending thousands of dollars on yourself, an AI researcher" as a donation would be an appalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it's-good funding arrangements in this space for me, and I think it leads to poor epistemic norms as well as social and organizational dysfunction. I think it's very easy for donating to people or organizations in your social circle to have substantial negative ... (read more)

More people getting into AI safety should do a PhD

Tamsin Leake1y*31

So this option looks unattractive if you think transformative AI systems are likely to be developed within the next 5 years. However, with a 10-year timeframe things look much stronger: you would still have around 5 years to contribute as a researcher.

This phrasing is tricky! If you think TAI is coming in approximately 10 years then sure, you can study for 5 years and then do research for 5 years.

But if you think TAI is coming within 10 years (for example, if you think that the current half-life on worlds surviving is 10 years; if you think 10 years is the ... (read more)

1Stephen McAleese1y

I think this section of the post is slightly overstating the opportunity cost of doing a PhD. PhD students typically spend most of their time on research so ideally, they should be doing AI safety research during the PhD (e.g. like Stephen Casper). If the PhD is in an unrelated field or for the sake of upskilling then there is a more significant opportunity cost relative to working directly for an AI safety organization.

5Richard Ngo1y

Note that these are very different claims, both because the half-life for a given value is below its mean, and because TAI doesn't imply doom. Even if you do have very high P(doom), it seems odd to just assume everyone else does too. So? Your research doesn't have to be useful in every possible world. If a PhD increases the quality of your research by, say, 3x (which is plausible, since research is heavy-tailed) then it may well be better to do that research for half the time. (In general I don't think x-risk-motivated people should do PhDs that don't directly contribute to alignment, to be clear; I just think this isn't a good argument for that conclusion.)
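A worked sketch of the two quantitative points above, under assumptions the comment does not state: TAI arrival is modeled with a constant hazard rate $\lambda$ (an exponential distribution), research value is taken to be quality times years, and the horizon is fixed at 10 years. The symbols $\lambda$, $t_{1/2}$, and $T$ are introduced here for illustration only.

```latex
% Half-life versus mean under a constant hazard rate \lambda (assumed):
t_{1/2} = \frac{\ln 2}{\lambda}, \qquad
\mathbb{E}[T] = \frac{1}{\lambda} = \frac{t_{1/2}}{\ln 2} \approx 1.44\, t_{1/2}
% So a 10-year half-life corresponds to a mean of about 14.4 years.

% Research value over a 10-year horizon (assumed: value = quality x years):
\underbrace{3 \times 5}_{\text{PhD, then 5 years at } 3\times \text{ quality}} = 15
\;>\;
\underbrace{1 \times 10}_{\text{10 years at baseline quality}} = 10
```

Under these assumptions the half-life sits below the mean, and the 3x quality multiplier more than offsets the halved research time; other distributions or value models could change both conclusions.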

0th Person and 1st Person Logic

Tamsin Leake1y42

I'm confused about why 1P-logic is needed. It seems to me like you could just have a variable X which tracks "which agent am I" and then you can express things like sensor_observes(X, red) or is_located_at(X, northwest). Here and Absent are merely a special case of True and False when the statement depends on X.

4Adele Lopez1y

Because you don't necessarily know which agent you are. If you could always point to yourself in the world uniquely, then sure, you wouldn't need 1P-Logic. But in real life, all the information you learn about the world comes through your sensors. This is inherently ambiguous, since there's no law that guarantees your sensor values are unique. If you use X as a placeholder, the statement sensor_observes(X, red) can't be judged as True or False unless you bind X to a quantifier. And this could not mean the thing you want it to mean (all robots would agree on the judgement, thus rendering it useless for distinguishing itself amongst them). It almost works though, you just have to interpret "True" and "False" a bit differently!
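A minimal Python sketch of the ambiguity described above. It is not an implementation of 1P-logic; the robot names, sensor readings, and helper functions are hypothetical and chosen only to show that a quantified statement gets the same verdict from every robot, while a sensor reading alone may not pick out a unique "self".

```python
# Hypothetical world: three robots and what each one's sensor currently reads.
# Names and readings are illustrative assumptions, not taken from the post.
WORLD = {"r1": "red", "r2": "green", "r3": "red"}


def sensor_observes(agent, color, world=WORLD):
    """Third-person fact: does this named agent's sensor read `color`?"""
    return world[agent] == color


def exists_observer(color, world=WORLD):
    """X bound by an existential quantifier: true or false of the whole world."""
    return any(sensor_observes(agent, color, world) for agent in world)


def forall_observers(color, world=WORLD):
    """X bound by a universal quantifier: again a fact about the whole world."""
    return all(sensor_observes(agent, color, world) for agent in world)


def candidates_for_self(my_reading, world=WORLD):
    """Agents I could be, given only my own sensor reading."""
    return sorted(agent for agent in world if world[agent] == my_reading)


# Every robot computes the same answer for the quantified statements,
# so they cannot use them to tell themselves apart.
print(exists_observer("red"), forall_observers("red"))

# A robot seeing red cannot tell whether it is r1 or r3.
print(candidates_for_self("red"))
```

Because `exists_observer` and `forall_observers` take no notion of "who is asking", they return one verdict for the whole world, which is the sense in which binding X to a quantifier loses the first-person content.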

Don't Share Information Exfohazardous on Others' AI-Risk Models

Tamsin Leake1y32

Hence, the policy should have an escape clause: You should feel free to talk about the potential exfohazard if your knowledge of it isn't exclusively caused by other alignment researchers telling you of it. That is, if you already knew of the potential exfohazard, or if your own research later led you to discover it.

In an ideal world, it's good to relax this clause in some way, from a binary to a spectrum. For example: if someone tells me of a hazard that I'm confident I would've discovered on my own one week later, then they only get to dictate me not... (read more)

Tamsin Leake1y10

My current belief is that you do make some update upon observing that you exist, you just don't update as much as if we were somehow able to survive and observe unaligned AI taking over. I do agree that the "no update at all because you can't see the counterfactual" position is wrong, but anthropics is still somewhat filtering your evidence; you should update less.

(I don't have my full reasoning for {why I came to this conclusion} fully loaded rn, but I could probably do so if needed. Also, I only skimmed your post, sorry. I have a post on updating under anthropics with actual math I'm working on, but unsure when I'll get around to finishing it.)

How LDT helps reduce the AI arms race

Tamsin Leake1y31

Due to my timelines being this short, I'm hopeful that convincing just "the current crop of major-AI-Lab CEOs" might actually be enough to buy us the bulk of time that something like this could buy.

5. Risks from preventing legitimate value change (value collapse)

Tamsin Leake1y36

commenting on this post because it's the latest in the sequence; i disagree with the premises of the whole sequence. (EDIT: whoops, the sequence posts in fact discuss those premises so i probably should've commented on those. ohwell.)

the actual, endorsed, axiomatic (aka terminal aka intrinsic) values we have are ones we don't want to change, ones we don't want to be lost or modified over time. what you call "value change" is change in instrumental values.

i agree that, for example, our preferences about how to organize the society we live in should change o... (read more)

Tamsin Leake2y20

one solution to this problem is to simply never use that capability (running expensive computations) at all, or to not use it before the iterated counterfactual researchers have developed proofs that any expensive computation they run is safe, or before they have very slowly and carefully built dath-ilan-style corrigible aligned AGI.

Tamsin Leake2y20

nothing fundamentally, the user has to be careful what computation they invoke.

2Adele Lopez2y

That... seems like a big part of what having "solved alignment" would mean, given that you have AGI-level optimization aimed at (indirectly via a counter-factual) evaluating this (IIUC).

Tamsin Leake2y70

an approximate illustration of QACI:

4Adele Lopez2y

Nice graphic! What stops e.g. "QACI(expensive_computation())" from being an optimization process which ends up trying to "hack its way out" into the real QACI?