Re whether messy goal-seekers can be schemers: you may have addressed this elsewhere (if so, forgive me, and I'd appreciate you pointing me to where), but I keep wondering what notion of scheming (or deception, etc.) we should be adopting when, in particular:
Related to my point above (and this quoted paragraph), a fundamental nuance here is the distinction between "accidental influence side effects" and "incentivized influence effects". I'm happy to answer more questions on this difference if it's not clear from the rest of my comment.
Thanks for clarifying; I agree it's important to be nuanced here!
I basically agree with what you say. I also want to say something like: whether it's best counted as a side effect or as incentivized depends on what optimizer we're looking at/where you draw the boundary around the...
A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case.
Yes, I'd agree (and didn't make this clear in the post, sorry) -- the pressure towards predictability comes from a combination of the logic of performative prediction AND the "economic logic" that provides the context in which these performative predictors are being used/applied (rough sketch of the former below). This is certainly an important thing to be clear about!
(Though it also can only give us s...
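(For concreteness, the "logic of performative prediction" I'm gesturing at above is roughly the standard setup from the performative prediction literature (à la Perdomo et al.) -- the notation here is mine, a sketch rather than anything from the post: a deployed predictor $\theta$ induces the distribution $\mathcal{D}(\theta)$ over the data it will later be evaluated on, so its performative risk is

$$\mathrm{PR}(\theta) = \mathbb{E}_{Z \sim \mathcal{D}(\theta)}\big[\ell(Z;\theta)\big],$$

and repeated retraining

$$\theta_{t+1} = \arg\min_{\theta} \ \mathbb{E}_{Z \sim \mathcal{D}(\theta_t)}\big[\ell(Z;\theta)\big]$$

tends, under suitable conditions, towards a performatively stable point -- a predictor that is optimal for the very distribution its own deployment induces. The pressure towards predictability then comes from coupling this fixed-point logic with the economic incentive to keep driving the loss down, which is served by steering towards distributions $\mathcal{D}(\theta)$ that are easier to predict.)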
Agree! Examples abound. You can never escape your local ideological context - you can only try to find processes that have some hope of occasionally bumping into the bounds of your current ideology and pressing beyond them. There is no reliable recipe (just as there is no reliable recipe for making yourself notice your own blind spots) - but there is hope for things that, in expectation and intertemporally, can help us with this.
Which poses a new problem (or clarifies the problem we're facing): we don't get to answer the question of value change legitimacy in a...
Yeah, interesting point. I do see the pull of the argument. In particular, the example seems well chosen -- where the general form seems to be something like: we can think of cases where our agent can be said to be better off (according to some reasonable standards/from some reasonable vantage point) if the agent can commit themselves to continuing to do a thing/undergo a change for at least a certain amount of time.
That said, I think there are also some problems with it. For example, I'm wary of reifying "I-as-in-CEV" more than what is war...
Yes, sorry! I'm not making it super explicit, actually, but the point is that, if you read e.g. Paul's or Callard's accounts of value change (via transformative experiences and via aspiration, respectively), a large part of how they even set up their inquiries is with respect to the question of whether value change is irrational or not (or what problem value change poses to rational agency). The rationality problem comes up because it's unclear from what vantage point one should evaluate the rationality (i.e. the "keeping with what expected utility theory tells you t...
The process that invents democracy is part of some telotect, but is it part of a telophore? Or is the telophore only reached when democracy is implemented?
Musing about how (maybe) certain telophemes impose constraints on the structure (logic) of their corresponding telophores and telotects. E.g. democracy, freedom, autonomy, justice, corrigibility, rationality, ... (though plausibly you'd not want to count (some of) those examples as telophemes in the first place?)
Good point! We are planning to gauge time preferences among the participants and fix slots then. Maybe most relevantly, we intend to accommodate all time zones. (We have been doing this with PIBBSS fellows as well, so I am pretty confident we will be able to find time slots that work pretty well across the globe.)
Here is another interpretation of what can cause a lack of robustness to scaling down:
(Maybe this is what you have in mind when you talk about single-single alignment not (necessarily) scaling to multi-multi alignment - but I am not sure that is the case, and even if it is, I feel pulled to state it again more explicitly, as I don't think it comes out as clearly as I would want it to in the original post.)
Taking the example of an "alignment strategy [that makes] the AI find the preferences of values and humans, and then pursu[e] that", robustness to scaling ...
Does it seem like I'm missing something important if I say that "Thing = Nexus" gives a "functional" explanation of what the thing is, i.e. it serves the function of being an "inductive nexus of reference"? This is not a foundational/physicalist/mechanistic explanation, but it is very much a sort of explanation that I can imagine being useful in some cases/for some purposes.
I'm suggesting this as a possibly different angle on "what sort of explanation is Thing=Nexus, and why is it plausibly not fraught despite its somewhat-circularity?" It seems like it maps on to...