All of Gordon Seidoh Worley's Comments + Replies

I'd really like to see more follow-up on the ideas in this post. Our drive to care is arguably why we're willing to cooperate, and making AI that cares the same way we do is a potentially viable path to AI aligned with human values, but I've not seen anyone take it up. Regardless, I think this is an important idea and folks should look at it more closely.

I think this post is important because it brings old insights from cybernetics into a modern frame that relates to how folks are thinking about AI safety today. I strongly suspect that the big idea in this post, that ontology is shaped by usefulness, matters greatly to addressing fundamental problems in AI alignment.

Seems reasonable. I do still worry quite a bit about Goodharting, but perhaps this could be reasonably addressed with careful oversight by some wise humans to do the wisdom equivalent of red teaming.

1Chris_Leong
You mean it might still Goodhart to what we think they might say? Ideally, the actual people would be involved in the process.

According to METR, the organization that audited OpenAI, a dozen tasks indicate ARA capabilities.

Small comment, but @Beth Barnes of METR posted on Less Wrong just yesterday to say "We should not be considered to have ‘audited’ GPT-4 or Claude".

This doesn't appear to be a load-bearing point in your post, but it would still be good to update the language to be more precise.

Ah, I see. I have to admit, I write a lot of my comments between things, and I missed that the context of the post could cause my words to be interpreted this way. These days I'm often in executive mode rather than scholar mode and miss nuance if it's not clearly highlighted; hence my misunderstanding, but it also reflects where I'm coming from with this answer!

I left a comment over in the other thread, but I think Joachim misunderstands my position.

In the above comment I've taken for granted that there's a non-trivial possibility that AGI is near, so I'm not arguing we should say "AGI is near" regardless of whether it is or not. We don't know whether it is; we only have our guesses about it. But so long as there's a non-trivial chance that AGI is near, I think that's the more important message to communicate.

Overall it would be better if we can communicate something like "AGI is probably near", bu... (read more)

2orthonormal
I agree with "When you say 'there's a good chance AGI is near', the general public will hear 'AGI is near'". However, the general public isn't everyone, and the people who can distinguish between the two claims are the most important to reach (per capita, and possibly in sum). So we'll do better by saying what we actually believe, while taking into account that some audiences will round probabilities off (and seeking ways to be rounded closer to the truth while still communicating accurately to anyone who does understand probabilistic claims). The marginal gain by rounding ourselves off at the start isn't worth the marginal loss by looking transparently overconfident to those who can tell the difference.
2Joachim Bartosik
I'm replying only here because spreading discussion over multiple threads makes it harder to follow.

You left a reply on a question asking how to communicate about reasons why AGI might not be near. The question refers to the costs of "the community" thinking that AGI is closer than it really is as a reason to communicate about reasons it might not be so close. So I understood the question as asking about communication with the community (my guess: of people seriously working and thinking about AI-safety-as-in-AI-not-killing-everyone), where it's important to actually try to figure out the truth.

You replied (as I understand) that when we communicate to the general public we can transmit only one idea, so we should communicate that AGI is near (if we assign not-very-low probability to that).

I think the biggest problem is that your posting "general public communication" advice as a reply to a question asking about "community communication" pushes towards less clarity in the community, where I think clarity is important.

I'm also not sold on the "you can communicate only one idea" thing, but I mostly don't care to talk about it right now (it would be nice if someone else worked it out for me, but right now I don't have the capacity to do it myself).

From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.

If we communicate the simple idea that AGI is near, then it pushes people to work on safety projects that would be good to work on even if AGI is not near, while paying some costs in terms of reputation, mental health, and personal wealth.

If we communicate the simple idea that AGI is not near then people ... (read more)

2orthonormal
I reached this via Joachim pointing it out as an example of someone urging epistemic defection around AI alignment, and I have to agree with him there. I think the higher difficulty posed by communicating "we think there's a substantial probability that AGI happens in the next 10 years" vs "AGI is near" is worth it even from a PR perspective, because pretending you know the day and the hour smells like bullshit to the most important people who need convincing that AI alignment is nontrivial.

Fair. For what it's worth, I strongly agree that causality is just one domain where this problem becomes apparent, and we should be worried about it generally for superintelligent agents, much more so than many folks seem (in my estimation) to worry about it today.

Yes, the variables constitute a reference frame, which is to say an ultimately subjective way of viewing the world. Even if there is high inter-observer agreement about the shape of the reference frame, that agreement isn't guaranteed unless you also posit that something like Wentworth's natural abstraction hypothesis is true.

Perhaps a toy example will help explain my point. Suppose the grass should only be watered when there's a violet cube on the lawn. To automate this, a sensor is attached to the sprinklers that turns them on only when the sensor sees a violet cube. I... (read more)
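To make the divergence concrete, here's a minimal sketch; the RGB thresholds and color names are my own hypothetical illustration, not anything from the original example. It shows how the human and the sensor, while both nominally measuring "is there a violet cube on the lawn?", implement different measurement procedures and so define subtly different variables that can disagree on edge cases.

```python
def human_judges_violet(rgb):
    """The human's informal procedure: 'violet' means roughly blue-purple."""
    r, g, b = rgb
    return b > 120 and r > 60 and g < 80

def sensor_judges_violet(rgb):
    """The sensor's hard-coded procedure: narrow thresholds chosen at install time."""
    r, g, b = rgb
    return 100 <= r <= 140 and g <= 40 and 150 <= b <= 200

cubes = {
    "clearly violet cube":         (130, 30, 180),  # both say yes
    "bluish-purple cube in shade": (70, 60, 140),   # human yes, sensor no
    "bright magenta cube":         (200, 20, 160),  # human yes, sensor no
}

for name, rgb in cubes.items():
    print(f"{name}: human={human_judges_violet(rgb)}, sensor={sensor_judges_violet(rgb)}")
```

The sprinkler ends up tracking the sensor's variable rather than the human's, and how much that matters depends on how much of the world lands in the edge cases where the two procedures come apart.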

2Tom Everitt
Sure, I think we're saying the same thing: causality is frame dependent, and the variables define the frame (in your example, you and the sensor have different measurement procedures for detecting the purple cube, so you don't actually talk about the same random variable). How big a problem is it? In practice it seems usually fine, if we're careful to test our sensor / double check we're using language in the same way. In theory, scaled up to super intelligence, it's not impossible it would be a problem. But I would also like to emphasize that the problem you're pointing to isn't restricted to causality, it goes for all kinds of linguistic reference. So to the extent we like to talk about AI systems doing things at all, causality is no worse than natural language, or other formal languages. I think people sometimes hold it to a higher bar than natural language, because it feels like a formal language could somehow naturally intersect with a programmed AI. But of course causality doesn't solve the reference problem in general. Partly for this reason, we're mostly using causality as a descriptive language to talk clearly and precisely (relative to human terms) about AI systems and their properties.

I think there's something big left out of this post, which is accounting for the agent observing and judging the causal relationships. Something has to decide how to carve up the world into parts and calculate counterfactuals. It's something that exists implicitly in your approach to causality but you don't address it here, which I think is unfortunate because although humans generally have the same frame of reference for judging causality, alien minds, like AI, may not.

1Tom Everitt
The way I think about this, is that the variables constitute a reference frame. They define particular well-defined measurements that can be done, which all observers would agree about. In order to talk about interventions, there must also be a well-defined "set" operation associated with each variable, so that the effect of interventions is well-defined. Once we have the variables, and a "set" and "get" operation for each (i.e. intervene and observe operations), then causality is an objective property of the universe. Regardless who does the experiment (i.e. sets a few variables) and does the measurement (i.e. observes some variables), the outcome will follow the same distribution. So in short, I don't think we need to talk about an agent observer beyond what we already say about the variables.
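As a minimal illustration of the set/get framing described above (an illustrative sketch of my own, not code from the comment), here is a two-variable model with explicit observe ("get") and intervene ("set") operations; once the variables and their set/get operations are fixed, anyone running the same intervention samples from the same distribution.

```python
import random

def sample_world(do_rain=None):
    """Sample (rain, wet_grass); `do_rain` implements the 'set' operation on Rain."""
    rain = random.random() < 0.3 if do_rain is None else do_rain
    wet_grass = rain or (random.random() < 0.1)  # sprinkler occasionally wets the grass anyway
    return rain, wet_grass

def p_wet(do_rain=None, n=100_000):
    """Estimate P(WetGrass) under pure observation (do_rain=None) or intervention."""
    return sum(sample_world(do_rain)[1] for _ in range(n)) / n

print("P(wet)              =", round(p_wet(), 3))
print("P(wet | do(rain=1)) =", round(p_wet(do_rain=True), 3))
print("P(wet | do(rain=0)) =", round(p_wet(do_rain=False), 3))
```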

Actually, I kind of forgot what ended up in the paper, but then I remembered so wanted to update my comment.

There was an early draft of this paper that talked about deontology, but because there are so many different forms of deontology it was hard to come up with arguments where there wasn't some version of deontological reasoning that broke the argument, so I instead switched to talking about the question of moral facts independent of ethical system. That said, the argument I make in the paper suggesting that moral realism is more dangerous than moral an... (read more)

I don't see it in the references, so you might find this paper of mine interesting (the link is to a Less Wrong summary, which links to the full thing): in it I include an argument suggesting that building AI that assumes deontology is strictly more risky than building one that does not.

1William D'Alessandro
Excellent, thanks! I was pretty confident that some other iterations of something like these ideas must be out there. Will read and incorporate this (and get back to you in a couple days).

If the mind becomes much more capable than the surrounding minds, it does so by being on a trajectory of creativity: something about the mind implies that it generates understanding that is novel to the mind and its environment.

 

I don't really understand this claim enough to evaluate it. Can you expand a bit on what you mean by it? I'm unsure about the rest of the post because it's unclear to me what the premise your top-line claim rests upon means.

2Tsvi Benson-Tilsen
If a mind comes to understand a bunch of stuff, there's probably some compact reasons that it came to understand a bunch of stuff. What could such reasons be? The mind might copy a bunch of understanding from other minds. But if the mind becomes much more capable than surrounding minds, that's not the reason, assuming that much greater capabilities required much more understanding. So it's some other reason. I'm describing this situation as the mind being on a trajectory of creativity.

to answer my own question:

Level of AI risk concern: high

General level of risk tolerance in everyday life: low

Brief summary of what you do in AI: first tried to formalize what alignment would mean, this led me to work on a program of deconfusing human values that reached an end of what i could do, now have moved on to writing about epistemology that i think is critical to understand if we want to get alignment right

Anything weird about you: prone to anxiety, previously dealt with OCD, mostly cured it with meditation but still pops up sometimes

I think I disagree. Based on your presentation here, I think someone following a policy inspired by this post would be more likely to cause existential catastrophe by pursuing a promising-looking solution that is actually a false positive and destroys all future value in our Hubble volume. I've argued we need to focus on minimizing false positive risk rather than optimizing for maximum expected value, which is what I read this post as proposing we do.

This post brought to mind a thought: I actually don't care very much about arguments about how likely doom is and how pessimistic or optimistic to be since they are irrelevant, to my style of thinking, for making decisions related to building TAI. Instead, I mostly focus on downside risks and avoiding them because they are so extreme, which makes me look "pessimistic" but actually I'm just trying to minimize the risk of false positives in building aligned AI. Given this framing, it's actually less important, in most cases, to figure out how likely somethin... (read more)
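As a toy numerical sketch of the decision rule I'm describing (the plans and numbers here are entirely made up for illustration): instead of picking the plan with the highest expected value, first reject any plan whose probability of a catastrophic false positive exceeds some tolerance, then choose among the survivors.

```python
plans = {
    # name: (expected_value, probability_of_catastrophic_false_positive)
    "ambitious plan": (100.0, 0.05),
    "cautious plan":  (40.0,  0.001),
    "do nothing":     (0.0,   0.0),
}

def best_by_expected_value(plans):
    """Pick whichever plan has the highest expected value, ignoring tail risk."""
    return max(plans, key=lambda name: plans[name][0])

def best_by_bounded_false_positive(plans, tolerance=0.01):
    """First filter out plans whose false-positive risk exceeds the tolerance, then maximize."""
    safe = {n: v for n, v in plans.items() if v[1] <= tolerance}
    return max(safe, key=lambda name: safe[name][0]) if safe else None

print(best_by_expected_value(plans))           # "ambitious plan"
print(best_by_bounded_false_positive(plans))   # "cautious plan"
```

The two rules recommend different plans whenever the highest-expected-value option carries too much catastrophic downside, which is exactly the case I care about here.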

A good specific example of trying to pull this kind of shell game is perhaps HCH. I don't recall if someone made this specific critique of it before, but it seems like there's some real concern that it's just hiding the misalignment rather than actually generating an aligned system.

2Tsvi Benson-Tilsen
That was one of the examples I had in mind with this post, yeah. (More precisely, I had in mind defenses of HCH being aligned that I heard from people who aren't Paul. I couldn't pass Paul's ITT about HCH or similar.)

In classical Chinese philosophy there's the concept of shi-fei, or "this, not that". A key bit of the idea, among other things, is that all knowledge involves making distinctions, and those distinctions are judgments, so if you want to have knowledge and put things into words you have to make this-not-that style judgments of distinction to decide what goes in what category.

More recently here on the forum, Abram has written about teleosemantics, which seems quite relevant to your investigations in this post.

The teleosemantic picture is that epistemic accuracy is a common, instrumentally convergent subgoal; and "meaning" (in the sense of semantic content) arises precisely where this subgoal is being optimized. 

I think this is exactly right. I often say things like "accurate maps are extremely useful to things like survival, so you and every other living thing have strong incentives to draw accurate maps, but this is contingent on the extent to which you care about e.g. survival".

So to see if I have this right, the difference is I'm trying to point at a larger phenomenon and you mean teleosemantics to point just at the way beliefs get constrained to be useful.

3Abram Demski
This doesn't sound quite right to me. Teleosemantics is a purported definition of belief. So according to the teleosemantic picture, it isn't a belief if it's not trying to accurately reflect something.  The additional statement I prefaced this with, that accuracy is an instrumentally convergent subgoal, was intended to be an explanation of why this sort of "belief" is a common phenomenon, rather than part of the definition of "belief".  In principle, there could be a process which only optimizes accuracy and doesn't serve any larger goal. This would still be creating and maintaining beliefs according to the definition of teleosemantics, although it would be an oddity. (How did it get there? How did a non-agentic process end up creating it?)

Cool. For what it's worth, I also disagree with many of my old framings. Basically anything written more than ~1 year ago is probably vaguely but not specifically endorsed.

Oh man I kind of wish I could go back in time and wipe out all the cringe stuff I wrote when I was trying to figure things out (like why did I need to pull in Godel or reify my confusion?). With that said, here's some updated thoughts on holons. I'm not really familiar with OOO, so I'll be going off your summary here.

I think I started out really not getting what the holon idea points at, but I understood enough to get myself confused in new ways for a while. So first off there's only ~1 holon, such that it doesn't make sense to talk about it as anything ot... (read more)

2Abram Demski
OK. So far it seems to me like we share a similar overall take, but I disagree with some of your specific framings and such. I guess I'll try and comment on the relevant posts, even though this might imply commenting on some old stuff that you'll end up disclaiming.

I very much agree and really like the coining of the term "teleosemantics". I might steal it! :-)

I'm not sure how much you've read my work on this topic or how much it influenced you, but in case you're not very aware of it I think it's worth pointing out some things I've been working on in this space for a while that you might find interesting.

I got nervous about how truth works when I tried to tackle the alignment problem head on. I ended up having to write a sequence of posts to sort out my ideas. At the time, I really failed to appreciate how deep telo... (read more)

2Abram Demski
One thing I see as different between your perspective and (my understanding of) teleosemantics, so far: You make a general case that values underlie beliefs. Teleosemantics makes a specific claim that the meaning of semantic constructs (such as beliefs and messages) is pinned down by what it is trying to correspond to. Your picture seems very compatible with, EG, the old LW claim that UDT's probabilities are really a measure of caring - how much you care about doing well in a variety of scenarios.  Teleosemantics might fail to analyze such probabilities as beliefs at all; certainly not beliefs about the world. (Perhaps beliefs about how important different scenarios are, where "importance" gets some further analysis...) The teleosemantic picture is that epistemic accuracy is a common, instrumentally convergent subgoal; and "meaning" (in the sense of semantic content) arises precisely where this subgoal is being optimized.  That's my guess at the biggest difference between our two pictures, anyway.
2Abram Demski
(Following some links...) What's the deal with Holons? Your linked article on epistemic circularity doesn't really try to explain itself, but rather links to this article, which LOUDLY doesn't explain itself. I haven't read much else yet, but here is what I think I get:

* You use Godel's incompleteness theorem as part of an argument that meta-rationalism can't make itself comprehensible to rationalism.
* You think (or thought at the time) that there's a thing, Holons, or Holonic thinking, which is fundamentally really really hard to explain, but which a bunch of people (mainly Buddhists and a few of the best postmodernists) already get. Kensho vibes.

Not something you wrote, but Viliam trying to explain you:

I'm curious whether you see any similarity between holons and object oriented ontology (if you're at all familiar with that). I was vibing with object oriented ontology when I wrote this, particularly the "nontrivial implication" at the end. Here's my terrible summary of OOO:

* Everything is 'objects'.
* For practically every realism debate, OOO lands on the realist side. There is an external world, there are tables, there are chairs, and triangles, and squares, numbers, letters, moral facts, possible worlds, and even fictional characters.
* Philosophy is properly understood as a form of art; and as art, especially closely related to theater.
* Sciences deal with objective (measurable, quantifiable) facts; arts deal with subjective/intersubjective (immeasurable, unquantifiable) facts.
* Objects are a kind of intersection of these two sorts of thing.
* To understand there as being an object is in some sense to be able to put yourself in its place, empathize with it, imagine it were you. This is our way of fruitfully relating to the immeasurable/unquantifiable. So panpsychism is in some sense a true fact about our ontology.

I find OOO to be an odd mix of interesting ideas and very weird ideas. Feel free to ignore the OOO comparison if it

So there are different notions of "more" here.

There's "more" in the sense I'm thinking of, where it's not clear that additional levels of abstraction enable deeper understanding given enough time. If 3 really is all the levels you need, because that's how many it takes to think about any number of levels of depth (again, by swapping out levels in your "abstraction registers"), then additional levels end up being in the same category.

And then there's "more" in the sense of doing things faster, which makes things cheaper. I'm perhaps more skeptical of scaling than you are. I do agree th... (read more)

Alright, fair warning, this is an out there kind of comment. But I think there's some kind of there there, so I'll make it anyway.

Although I don't have much of anything new to say about it lately, I spent several years really diving into developmental psychology, and my take on most of it is that it's an attempt to map changes in the order of complexity of the structure thoughts can take on. I view the stages of human psychological development as building up the mental infrastructure to be able to hold up to three levels of fully-formed structure (yes, this ... (read more)

1Tsvi Benson-Tilsen
Why isn't the answer obviously "yes"? What would it look like for this not to be the case? (I'm generally somewhat skeptical of descriptions like "just faster" if the faster is like multiple orders of magnitude and sure seems to result from new ideas rather than just a bigger computer.)

Why does there need to be structure? We can just have a non-uniform distribution of energy around the universe in order for there to be information to extract. I guess you could call this "structure" but that seems like a stretch to me.

I don't know if I can convince you. You seem pretty convinced that there are natural abstractions or something like them. I'm pretty suspicious that there are natural abstractions; instead I think there are useful abstractions, but they are all contingent on how the minds creating those abstractions are organized and that no... (read more)

Sure, differences are as real as the minds making them. Once you have minds, those minds start perceiving differentiation, since they need to extract information from the environment to function. So I guess I'm saying I don't see what your objection is in this last comment, since as far as I can tell you've not posited anything that actually disagrees with my point. I think it's a bit weird to call the differentiation you're referring to "objective", but you explained what you mean.

1Mitchell_Porter
How can there be information for minds to extract, unless the environment already has some kind of structure?

Isn't "the goals we would want it to have" a special case of "aiming at any target we want"? And whatever goals we'd want it to have would be informed by our ontology. So what I'm saying is I think there's a case where the generality of your claim breaks down.

4Thane Ruthenis
Goals are functions over the concepts in one's internal ontology, yes. But having a concept for something doesn't mean caring about it — your knowing what a "paperclip" is doesn't make you a paperclip-maximizer. The idea here isn't to train an AI with the goals we want from scratch, it's to train an advanced world-model that would instrumentally represent the concepts we care about, interpret that world-model, then use it as a foundation to train/build a different agent that would care about these concepts.

I think that the big claim the post relies on is that values are a natural abstraction, and the Natural Abstractions Hypothesis holds. Now this is admittedly very different from the thesis that value is complex and fragile.

It is not that AI would naturally learn human values, but that it's relatively easy for us to point at human values/Do What I Mean/Corrigibility, and that they are natural abstractions.

This is not a claim that is satisfied by default, but is a claim that would be relatively easy to satisfy if true.


If this is the case, my concern seems ye... (read more)

For what it's worth, I think you're running headlong into an instance of the problem of the criterion, and I enjoy seeing how you're grappling with it. I've tagged this post as such.

Reading this post, I think it insufficiently addresses motivations, purpose, reward functions, etc. to support the bold claim that perfect world-model interpretability is sufficient for alignment. I think this because ontology is not the whole of action: two agents with the same ontology and very different purposes would behave in very different ways.

Perhaps I'm being unfair, but I'm not convinced that you're not making the same mistake as when people claim any sufficiently intelligent AI would be naturally good.

2Thane Ruthenis
I don't understand this objection. I'm not making any claim isomorphic to "two agents with the same ontology would have the same goals". It sounds like maybe you think I'm arguing that if we can make the AI's world-model human-like, it would necessarily also be aligned? That's not my point at all. The motivation is outlined at the start of 1A: I'm saying that if we can learn how to interpret arbitrary advanced world-models, we'd be able to more precisely "aim" our AGI at any target we want, or even manually engineer some structures over its cognition that would ensure the AGI's aligned/corrigible behavior.
2Noosphere89
I think that the big claim the post relies on is that values are a natural abstraction, and the Natural Abstractions Hypothesis holds. Now this is admittedly very different from the thesis that value is complex and fragile.

It is not that AI would naturally learn human values, but that it's relatively easy for us to point at human values/Do What I Mean/Corrigibility, and that they are natural abstractions.

This is not a claim that is satisfied by default, but is a claim that would be relatively easy to satisfy if true.

The robust values hypothesis from DragonGod is worth looking at, too. From the link below, I'll quote:

This is an important hypothesis, since if it has a non-trivial chance of being correct, then AI Alignment gets quite easier. And given the shortening timelines, I think this is an important hypothesis to test.

Here's a link below for the robust values hypothesis:

https://www.lesswrong.com/posts/YoFLKyTJ7o4ApcKXR/disc-are-values-robust

This seems straightforward to me: reification is a process by which our brain picks out patterns/features and encodes them so we can recognize them again and make sense of the world given our limited hardware. We can then think in terms of those patterns and gloss over the details because the details often aren't relevant for various things.

The reason we reify things one way versus another depends on what we care about, i.e. our purposes.

To me this seems obvious: noumena feel real to most people because they're captured by their ontology. It takes a lot of work for a human mind to learn not to jump straight from sensation to reification, and even with training there's only so much a person can do because the mind has lots of low-level reification "built in" that happens prior to conscious awareness. Cf. noticing

Oh, I thought I already explained that. There's at least two different ways "exist" can be meant here, and I think we're talking past each other.

For some thing to exist, it must exist ontologically, i.e. in the map; otherwise it is not yet a thing. So I'm saying there's a difference between what we might call existence and being. You exist, in the sense of being an ontological thing, only by virtue of reification, but you are by virtue of the whole world being.

1Mitchell_Porter
I have a theory that belief in a good God is the main delusion of western religion, and belief in a fundamentally undifferentiated reality is the main delusion of eastern religion.  I see no way around the conclusion that differences are real. Experience is part of reality, and experience contains difference. Also, my experience is objectively distinct from yours - I don't know what you had for breakfast today (or indeed if you had any); that act was part of your experience, and not part of mine.  We can divide up the world in different ways, but the undivided world is already objectively differentiated. 

Yep, so I think this gets into a different question of epistemology, not directly related to things but rather about what we care about, since positing a theory that what looks to me like a table implies something table-shaped about the universe requires caring about parsimony.

(Aside: It's kind of related, because to talk about caring about things we need reifications that enable us to point to what we care about, but I think that's just an artifact of using words—care is patterns of behavior and preference that we can reify and call "parsimonious" or something else,... (read more)

1Tsvi Benson-Tilsen
I'm asking what reification is, period, and what it has to do with what's in reality (the thing that bites you regardless of what you think).

Yes, though note you can observe yourself.

0Mitchell_Porter
How can self-observation be the cause of my existence as a differentiated being? Don't I have to already exist as a differentiated being, in order to be doing that? 

I didn't link it in my original reply, but work on natural abstractions is also related. My take is that if natural abstractions exist they don't actually rehabilitate noumena, but they do explain why it intuitively feels like there are noumena. However, abstractions are still phenomena (except insofar as all phenomena are of course embedded in the world), even if they are picking up on what I might metaphorically describe as the natural contours of the territory.

1Tsvi Benson-Tilsen
How do they explain why it feels like there are noumena? (Also by "feels like" I'd want to include empirical observations of nexusness.)

This is confusing two different notions of exist. There is existence as part of the wholeness of the world that is as yet undifferentiated and there is your existence in the minds of people. "You" exist lots of places in many minds, and also "you" don't have a clearly defined existence separate and independent from the rest of the world.

I realize this is unintuitive to many folks. The thing you have to notice is that the world has an existence independent of ontology and ontology-less existence can't be fathomed in terms of ontology.

1Mitchell_Porter
Are you saying my existence is "undifferentiated" from "the wholeness of the world" so long as no one else is observing me or thinking of me?

I very much appreciate trying to figure out what things are. I think, though, you've added more complication than needed. However, my take depends on a particular view on philosophy.

So, first I think Kant is wrong about noumena. They don't exist. There are no things in themselves, there are only phenomena: things that exist because we reify them into existence to fit some concern we have. Things are reified out of sensory experience of the world (though note that "sensory" is redundant here), and the world is the unified non-thing that we can only reify by... (read more)

1Tsvi Benson-Tilsen
Okay, but the tabley-looking stuff out there seems to conform more parsimoniously to a theory that posits an external table. I assume we agree on that, and then the question is, what's happening when we so posit?
1Gordon Seidoh Worley
I didn't link it in my original reply, but work on natural abstractions is also related. My take is that if natural abstractions exist they don't actually rehabilitate noumena, but they do explain why it intuitively feels like there are noumena. However, abstractions are still phenomena (except insofar as all phenomena are of course embedded in the world), even if they are picking up on what I might metaphorically describe as the natural contours of the territory.
3Mitchell_Porter
Do I only exist because you "reify" me?

On the one hand, cool; on the other, the abstract is deceptive because it claims the trained AI is a "harmless but nonevasive AI assistant", while what the paper in fact shows is that Anthropic trained an AI with higher harmlessness and helpfulness scores, and thus a Pareto improvement over previous models, not one that is definitively past some bar separating harmless from not-harmless or helpful from not-helpful. As much is also stated in the included figure.

The work is cool, don't get me wrong. We should celebrate it. But also I want a... (read more)

3Lawrence Chan
I really do empathize with the authors, since writing an abstract fundamentally requires trading off faithfulness to the paper content and the length and readability of the abstract. But I do agree that they could've been more precise without a significant increase in length. Nitpick: I think instead of expanding on the sentence  My proposed rewrite is to replace that sentence with something like: I think this is ~ the same length and same level of detail but a lot easier to parse. 

These are good intuitive arguments against these sorts of solutions, but I think there's a more formal argument we can make that these solutions are dangerous because they pose excess false positive risk. In particular, I think they fail to fully account for the risks of generalized Goodharting, as do most proposed solutions other than something like agent foundations.

Right. Nothing that happens in the same Hubble volume can really be said to not be causally connected. Nonetheless I like the point of the OP even if it's made in an imprecise way.

I continue to be excited about this line of work. I feel like you're slowly figuring out how to formalize ontology in a way reflective of what we actually do and generalizing it. This is something missing from a lot of other approaches.

This is pretty exciting. I've not really done any direct work to push forward alignment in the last couple years, but this is exactly the sort of direction I was hoping someone would go when I wrote my research agenda for deconfusing human values. What came out of it was that there was some research to do that I wasn't equipped to do myself, and I'm very happy to say you've done the sort of thing I had hoped for.

On first pass this seems to address many of the common problems with traditional approaches to formalizing values. I hope that this proves a fruitful line of research!

Re Project 4, you might find my semi-abandoned (mostly because I wasn't and still am not in a position to make further progress on it) research agenda for deconfusing human values useful.

2Jan Hendrik Kirchner
This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent" And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences" (Extremely cool project ideas btw)

Re: Project 2

This project’s goal is to better understand the bridge principles needed between subjective, first person optimality and objective, third person success.

This seems quite valuable, because there is, properly speaking, no objective, third person perspective from which we can speak, only the inferred sense that there exists something that looks to us like a third person perspective from our first person perspectives. Thus I think this seems like a potentially fruitful line of research, since the proposed premise contains the confusion that needs to... (read more)

As it happens, I think this is a rather important topic. Failure to consider and mitigate the risk of assumptions creates both false negative (less concerning) and false positive (most concerning) risks when attempting to build aligned AI.

AlphaGo is fairly constrained in what it's designed to optimize for, but it still has the standard failure mode of "things we forgot to encode". So for example AlphaGo could suffer the error of instrumental power grabbing in order to get better at winning Go because we misspecified what we asked it to measure. This is a kind of failure introduced into the system by humans failing to make the specified objective adequately evaluate outcomes as we intended, since we cared about winning Go games while also minimizing side effects, but maybe when we cons... (read more)

Really liking this model. It seems to actually deal with the problem of embeddedness for agents and the fact that there is no clear boundary to draw around what we call an agent other than one that's convenient for some purpose.

I've obviously got thoughts on how this is operationalizing insights about "no-self" and dependent origination, but that doesn't seem too important to get into, other than to say it gives me more reason to think this is likely to be useful.

"Error" here is all sources of error, not just error in the measurement equipment. So bribing surveyors is a kind of error in my model.

0Richard Hollerith
Can you explain where there is an error term in AlphaGo, or where an error term might appear in a hypothetical model similar to AlphaGo trained much longer with many more parameters and computational resources?

For what it's worth, I think this is trying to get at the same insight as logical time but via a different path.

For the curious reader, this is also the same reason we use vector clocks to build distributed systems when we can't synchronize the clocks very well. 

And there's something quite interesting about computation as a partial order. It might seem that this only comes up when you have a "distributed" system, but actually you need partial orders to reason about unitary programs when they are non-deterministic (any program with loops and conditiona... (read more)
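For readers who haven't seen them, here is a minimal sketch of the standard vector clock construction (my own illustrative code, not anything from the linked material): events get comparable timestamps only when one causally precedes the other, so "happened-before" is a partial order and concurrent events are simply incomparable.

```python
def new_clock(n):
    """A fresh vector clock for a system of n processes."""
    return [0] * n

def tick(clock, i):
    """Local event on process i: increment that process's own entry."""
    c = clock[:]
    c[i] += 1
    return c

def merge(local, received, i):
    """Receive a message on process i: elementwise max of clocks, then tick locally."""
    c = [max(a, b) for a, b in zip(local, received)]
    c[i] += 1
    return c

def happened_before(a, b):
    """True iff event a causally precedes event b (strict partial order)."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# Two processes: a1 happens on process 0, b0 happens independently on process 1,
# then process 1 receives a1's clock, producing b1.
a1 = tick(new_clock(2), 0)   # [1, 0]
b0 = tick(new_clock(2), 1)   # [0, 1]
b1 = merge(b0, a1, 1)        # [1, 2]

print(happened_before(a1, b1))                            # True: a1 -> b1
print(happened_before(a1, b0), happened_before(b0, a1))   # False False: concurrent
```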
