All of phillchris's Comments + Replies

Hey! Absolutely, I think a lot of this makes sense. I assume you meant this paragraph from the Reverse Engineering Roles and Norms section:

I want to be clear that I do not mean AI systems should go off and philosophize on their own until they implement the perfect moral theory without human consent. Rather, our goal should be to design them in such a way that this will be an interactive, collaborative process, so that we continue to have autonomy over our civilizational future[10].

For both points here, I guess I was getting more at this question by... (read more)

Great post! I'm curious if you could elaborate on when you would feel comfortable allowing an agent to make some kind of "enlightened" decision, as opposed to one based more on "mere compliance"? Especially for an AI system that is perhaps not very interpretable, or that operates in very high-stakes applications, what sort of certificate / guarantee / piece of reasoning would you want from a system before allowing it to enact fundamental social changes? The nice thing about "mere compliance" is that there are benchmarks for 'right' and 'wrong' decisions. But here I would e... (read more)

2Xuan (Tan Zhi Xuan)
I hope the above is at least partially addressed by the last paragraph of the section on Reverse Engineering Roles and Norms! I agree with the worry, and to address it I think we could design systems that mostly just propose revisions or extrapolations to our current rules, or highlight inconsistencies among them (e.g. conflicting laws), thereby aiding a collective-intelligence-like democratic process of updating our rules and norms (of the form described in the Collective Governance section), where AI systems facilitate but do not enact normative change. Note that if AI systems represent uncertainty about the "correct" norms, this will often lead them to make queries to humans about how to extend/update the norms (a la active learning), instead of immediately acting under their best estimates of the extended norms. This could be further augmented by a meta-norm of (generally) requiring consent / approval from the relevant human decision-making body before revising or acting under new rules.

I'm not suggesting that AI systems should simply do what society does! Rather, the point of the contractualist framing is that AI systems should be aligned (in the limit) to what society would agree to after rational / mutually justifiable collective deliberation. Current democratic systems approximate this ideal to a very rough degree, and I hold out hope that under the right kinds of epistemic and social conditions (freedom of expression, equality of interlocutors, non-deluded thinking), the kind of "moral progress" we instinctively view as desirable will emerge from that form of collective deliberation. So my hope is that rather than specify in great detail what the contents of a "superior moral theory" might look like, all we need to align AI systems with is the underlying meta-ethical framework that enables moral change. See Anderson on How Common Sense Can Be Self-Critical for a good discussion of what I think this meta-ethical framework looks like.

I wonder if the implications of this kind of reasoning go beyond AI: indeed, you mention the incentive structure for AI as just a special case of failing to incentivize people properly (e.g. the software executive), the only difference being that AI operates at a scale which has the potential to drive extinction. But even in this respect, AI doesn't really seem unique: take the economic system as a whole, and "green" metrics, as a way to stave off catastrophic climate change. Firms, with the power to extinguish human life through slow processes like gra... (read more)