All of ErickBall's Comments + Replies

Suppose [...] you’ve got this AI system with this really, really good intelligence, which maybe we’ll call it a world model or just general intelligence. And this intelligence can take in any utility function, and optimize it, and you plug in the incorrect utility function, and catastrophe happens.

I've seen various people make the argument that this is not how AI works and it's not how AGI will work--it's basically the old "tool AI" vs "agent AI" debate. But I think the only reason current AI doesn't... (read more)

1Isnasene
Yeah, that statement is wrong. I was trying to make a more subtle point about how an AI that learns long-term planning on a shorter time-frame is not necessarily going to be able to generalize to longer time-frames (but in the context of superintelligent AIs capable of doing human-level tasks, I do think it will generalize--so that point is kind of irrelevant). I agree with Rohin's response.
3Rohin Shah
I am not arguing that we'll end up building tool AI; I do think it will be agent-like. At a high level, I'm arguing that the intelligence and agentiness will increase continuously over time, and as we notice the resulting (non-existential) problems we'll fix them, or start over. I agree with your point that long-term planning will develop even with a bunch of heuristics.

This may be a dumb question, but how can you asymptotically guarantee human-level intelligence when the world-models have bounded computation time, and the human is a "computable function" that has no such limit? Is it because the number of Turing machines is infinite?

1michaelcohen
Not a dumb question; bounded computation time here means bounded computation time per episode, so really it's linear computation time.
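A toy sketch of why the infinite enumeration matters here (my own illustration, not the paper's actual construction): because every computable function appears at some finite index in an enumeration of programs, a learner that always follows the first candidate still consistent with the history makes only finitely many mistakes on any computable target, even though each candidate's per-step compute is bounded.

```python
# Toy illustration (hypothetical, not the paper's construction): predict a
# computable bit sequence by enumerating candidate programs and following
# the first one that is still consistent with everything seen so far.
# Since the true generator sits at some finite index, only finitely many
# mistakes occur before the learner locks onto it.

def make_candidates():
    # Stand-ins for a (here truncated) enumeration of all programs;
    # the true generator of the target sequence appears at index 2.
    return [
        lambda t: 0,              # constant 0
        lambda t: 1,              # constant 1
        lambda t: t % 2,          # alternating bits -- the true generator
        lambda t: (t // 2) % 2,   # slower alternation
    ]

def run(true_fn, steps=100):
    candidates = make_candidates()
    alive = list(range(len(candidates)))  # indices not yet contradicted
    mistakes = 0
    for t in range(steps):
        truth = true_fn(t)
        pred = candidates[alive[0]](t)    # follow first consistent candidate
        if pred != truth:
            mistakes += 1
        # Discard every candidate the new observation contradicts.
        alive = [i for i in alive if candidates[i](t) == truth]
    return mistakes

print(run(lambda t: t % 2))  # finitely many mistakes, then perfect: prints 1
```

The same logic goes through for any computable target: the mistake count is bounded by the target's index in the enumeration, which is why the guarantee is asymptotic rather than uniform over time.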

My concern is that since CDT is not reflectively stable, it may have incentives to create non-CDT agents in order to fulfill instrumental goals.

2Wei Dai
If I understand correctly, it's actually updateless within an episode, and the episode is the only thing it cares about, so I don't see how it would fail to be reflectively stable. Plus, even if it had an incentive to create a non-CDT agent, it would have to do so by outputting a message to the operator, and the operator couldn't create a non-CDT agent without leaving the room, which would end the episode. (I guess it could hack the operator's mind and create a non-CDT agent within it, but at that point it might as well just make the operator give it max rewards.)