All of gsastry's Comments + Replies

I agree with both your claims, but maybe with less confidence than you (I also agree with DanielFilan's point below).

Here are two places I can imagine MIRI's intuitions here coming from, and I'm interested in your thoughts on them:

(1) The "idealized reasoner is analogous to a Carnot engine" argument. It seems like you think advanced AI systems will be importantly disanalogous to this idea, and that's not obvious to me.

(2) 'We might care about expected utility maximization / theoretical rationality because there is an impo...

Rohin Shah
(1) I am unsure whether there exists an idealized reasoner analogous to a Carnot engine (see Realism about rationality). Even if such a reasoner exists, it seems unlikely that we will a) figure out what it is, b) understand it in sufficient depth, and c) successfully use it to understand and improve ML techniques, before we get powerful AI systems through other means. Under short timelines, this cuts particularly deeply, because a) there's less time to do all of these things and b) it's more likely that advanced AI is built out of "messy" deep learning systems that seem less amenable to this sort of theoretical understanding.

(2) I certainly agree that all else equal, advanced agents should act closer to ideal agents. (Assuming there is such a thing as an ideal agent.) I also agree that advanced AI should be less susceptible to money pumps, from which I learn that their "preferences" (i.e. world states that they work to achieve) are transitive. I'm also on board that more advanced AI systems are more likely to be described by some utility function that they are maximizing the expected utility of, per the VNM theorem.

I don't agree that the utility function must be simple, or that the AI must be internally reasoning by computing the expected utility over all actions and then choosing the one that's highest. I would be extremely surprised if we built powerful AI such that when we say the English sentence "make paperclips" it acts in accordance with the utility function U(universe history) = number of paperclips in the last state of the universe history. I would be very surprised if we built powerful AI such that we hardcode in the above utility function and then design the AI to maximize its expected value.
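To make the money-pump point concrete, here is a minimal Python sketch. It is a toy illustration only, not a model of any real AI system: the items, the fee, and the `pump` helper are all hypothetical names introduced for this example. An agent with cyclic preferences accepts a paid trade every time one is offered, while an agent whose preferences come from a utility function (and are therefore transitive) stops after a bounded number of trades.

```python
def pump(prefers, items, holding, fee=1, max_offers=100):
    """Offer the agent paid swaps until it refuses; return total fees paid.

    prefers(x, y) is True when the agent strictly prefers x to y.
    All names here are hypothetical, chosen for illustration.
    """
    paid = 0
    for _ in range(max_offers):
        # Items the agent would pay to swap its current holding for.
        better = [x for x in items if prefers(x, holding)]
        if not better:
            break  # the agent refuses every offer, so the pump stops
        holding = better[0]
        paid += fee
    return paid


items = ["A", "B", "C"]

# Cyclic (intransitive) preferences: A > B, B > C, C > A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

def cyclic_prefers(x, y):
    return (x, y) in cyclic

# Transitive preferences induced by a utility function.
utility = {"A": 3, "B": 2, "C": 1}

def transitive_prefers(x, y):
    return utility[x] > utility[y]


cyclic_loss = pump(cyclic_prefers, items, holding="A", max_offers=100)
transitive_loss = pump(transitive_prefers, items, holding="C")
```

With cyclic preferences the agent pays on every one of the `max_offers` rounds, so its loss grows without bound as offers continue. With utility-based preferences it pays at most `len(items) - 1` times before holding its top-ranked item. Note that this only shows that avoiding money pumps is compatible with some utility function; it says nothing about that function being simple or explicitly computed.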

I'm not sure what it means for this work to "not apply" to particular systems. It seems like the claim is that decision theory is a way to understand AI systems in general and reason about what they will do, just as we use other theoretical tools to understand current ML systems. Can you spell this out a bit more? (Note that I'm also not really sure what it means for decision theory to apply to all AI systems: I can imagine kludgy systems where it seems really hard in some sense to understand their behavior with decision theory, but I'm not confident at all.)

I claim (with some confidence) that Updateless Decision Theory and Logical Induction don't have much to do with understanding AlphaGo or OpenAI Five, and you are better off understanding those systems using standard AI/ML thinking.

I further claim (with less confidence) that in a similar way, at the time that we build our first powerful AI systems, the results of Agent Foundations research at that time won't have much to do with understanding those powerful AI systems.

Does that explain what it means? And if so, do you disagree with either of the claims?