"my fellow humans get nice stuff" happens to be the weird unpredictable desire that I ended up with at the equilibrium of reflection on the weird unpredictable godshatter that ended up inside me
This may not be what evolution had "in mind" when it created us. But couldn't we copy something like this into a machine so that it "thinks" of us (and our descendants) as its "fellow humans" who should "get nice stuff"? I understand that we don't know how to do that yet. But the fact that Eliezer has some kind of "don't destroy the world from a fellow human perspective"...
I think we (mostly) all agree that we want to somehow encode human values into AGIs. That's not a new idea. The devil is in the details.
A question for Eliezer: If you were superintelligent, would you destroy the world? If not, why not?
If your answer is "yes" and the same would be true for me and everyone else for some reason I don't understand, then we're probably doomed. If it is "no" (or even just "maybe"), then there must be something about the way we humans think that would prevent world destruction even if one of us were ultra-powerful. If we can understand that and transfer it to an AGI, we should be able to prevent destruction, right?
I would "destroy the world" from the perspective of natural selection in the sense that I would transform it in many ways, none of which were making lots of copies of my DNA, or the information in it, or even having tons of kids half resembling my old biological self.
From the perspective of my highly similar fellow humans with whom I evolved in context, they'd get nice stuff, because "my fellow humans get nice stuff" happens to be the weird unpredictable desire that I ended up with at the equilibrium of reflection on the weird unpredictable godshatter that ended up inside me.
I see how my above question seems naive. Maybe it is. But if one potential answer to the alignment problem lies in the way our brains work, maybe we should try to understand that better, instead of (or in addition to) letting a machine figure it out for us through some kind of "value learning". (Copied from my answer to AprilSR:) I stumbled across two papers from a few years ago by a psychologist, Mark Muraven, who thinks that the way humans deal with conflicting goals could be important for AI alignment (https://arxiv.org/abs/1701.01487 and https://arxiv...).