Cool work and results!
Is there a reason you didn't include GPT-4 among the models you test (apart from cost)? If the results were not as strong for GPT-4, would you take that as evidence that this issue is less important than you originally thought?
As we have seen in the previous post, the latter question is confusing (and maybe confused), because the value change itself implies a change of the evaluative framework.
I’m not sure which part of the previous post you’re referring to, actually – if you could point me to the relevant section, that would be great!
What is more, the change that the population undergoes is shaped in such a way that it tends towards making the values more predictable.
(...)
As a result, a firm’s steering power will specifically tend towards making the predicted behaviour easier to predict, because it is this predictability that the firm is able to exploit for profit (e.g., via increases in advertisement revenues).
A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case. (...)
Thank you for your comments. You pointed out various things that I think are good criticisms, and which we will address:
- Most prominently, after looking more into the standard usage of the word "scheming" in the alignment literature, I agree with you that, AFAICT, it only appears in the context of deceptive alignment (which our paper is not about). In particular, I seemed to remember people using it ~interchangeably with “strategic deception” (which we think our paper gives clear examples of), but that memory seems to be simply incorrect.
- It was a straightf...