All of Rudi C's Comments + Replies

This problem of human irrelevance seems somewhat orthogonal to the alignment problem: even a maximally aligned AI will strip humans of their agency, since it knows best. Making the AI value human agency will not be enough; humans suck enough that the other objectives will override the agency penalty most of the time, especially in important matters.

Erik Jenner
I agree that aligned AI could also make humans irrelevant, but I'm not sure how that's related to my point. Paraphrasing what I was saying: given that AI makes humans less relevant, unaligned AI would be bad even if no single AI system can take over the world. Whether or not aligned AI would also make humans irrelevant just doesn't seem important to that argument, but maybe I'm misunderstanding what you're saying.

I am skeptical of your premise. I know of zero humans who terminally value “diamonds” as defined by their chemical constitution.

Indeed, diamonds are widely considered an artificially scarce good, elevated to their current position by deceptive marketing and monopolistic practices. So this seems more like a case study in how humans' desire to own scarce symbols of wealth was manipulated into an outcome misaligned with the original objective.

Alex Turner
I just introspected. I am weakly attracted to the idea of acquiring diamonds. I therefore know of at least one human who values diamonds. I never claimed that humans are hardwired to value diamonds. I pointed out that some people do value diamonds, and that true facts have guaranteed-to-exist explanations. If you're interested in building a mind that values diamonds, first ask why some already-existing minds value diamonds.
Logan Riggs Smith
I believe the diamond example is true, but it's not the best example to use. I bet it was mentioned because of the Arbital article linked in the post. The premise isn't dependent on diamonds being terminal goals; it could just as easily be about valuing real-life people, dogs, nature, or real-life anything. Writing an unbounded program that values real-world objects is an open problem in alignment; yet humans are bounded programs that value real-world objects all the time, millions of times a day. The post argues that focusing on the causal explanations behind how humans grow values is far more informative than other sources of information, because humans exist in reality, and anchoring your thoughts to reality makes them more informative about reality.