Towards Evaluating AI Systems for Moral Status Using Self-Reports

by Ethan Perez, Robbo

16th Nov 2023

This is a linkpost for https://arxiv.org/abs/2311.08576

Tags: AI Rights / Welfare, Consciousness, Ethics & Morality, Language Models (LLMs), AI

1 comment

Charlie Steiner (2y)

What you are doing is training the AI to have an accurate model of itself, used with language like "I" and "you". You can use your brain to figure out what will happen if you ask "are you conscious?" without having previously trained in any position on similarly nebulous questions. Training text was written overwhelmingly by conscious things, so maybe it says yes because that's so favored by the training distribution. Or maybe you trained it to answer "you" questions as being about nonfiction computer hardware, and it makes the association that nonfiction computer hardware is rarely conscious.

Basically, I don't think you can start out confused about consciousness and cheat by "just asking it." You'll still be confused about consciousness and the answer won't be useful.

I'm worried this is going to lead, either directly or indirectly, to training foundation models to have situational awareness, which we shouldn't be doing.

And perhaps you should be worried that having an accurate model of oneself, associated with language like "I" and "you", is in fact one of the ingredients in human consciousness, and maybe we shouldn't be making AIs more conscious.
