Cool work and results!
Is there a reason you didn't include GPT-4 among the models you test (apart from cost)? If the results were not as strong for GPT-4, would you take that as evidence that this issue is less important than you originally thought?
As we have seen in the previous post, the latter question is confusing (and maybe confused), because the value change itself implies a change of the evaluative framework.
I’m not sure which part of the previous post you’re referring to, actually – if you could point me to the relevant section, that would be great!
What is more, the change that the population undergoes is shaped in such a way that it tends towards making the values more predictable.
(...)
As a result, a firm’s steering power will specifically tend towards making the predicted behaviour easier to predict, because it is this predictability that the firm is able to exploit for profit (e.g., via increases in advertisement revenues).
A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case. (...)
Thank you for your comments. You pointed out various things that I think are good criticisms, and which we will address:
- Most prominently, after looking more into the standard usage of the word "scheming" in the alignment literature, I agree with you that, AFAICT, it only appears in the context of deceptive alignment (which our paper is not about). In particular, I seemed to remember people using it ~interchangeably with “strategic deception” (which we think our paper gives clear examples of), but that memory seems to be simply incorrect.
- It was a straightf...