The relevant point is his latter claim: “in particular with respect to ‘learn “don’t steal” rather than “don’t get caught”’.” I think this is a very strong conclusion, relative to the available data.
I think humans don't steal mostly because society enforces that norm. Toward weaker "other" groups that aren't part of your society (farmed animals, weaker countries, etc.) there's no such norm, and humans often behave badly toward such groups. And to AIs, humans will be a weaker "other" group. So if alignment of AIs to human standards is a complete success - if AIs learn to behave toward weaker "other" groups exactly as humans behave toward such groups - the result will be bad for humans.
It gets even worse because AIs, unlike humans, aren't raised to be moral. They're raised by corporations with a goal to make money, with a thin layer of "don't say naughty words" morality. We already know corporations will break rules, bend rules, and lobby to change rules to make more money, and that they don't really mind if people get hurt in the process. We'll see more of that behavior when corporations can make AIs to further their goals.
Going back to the envelopes example, a nosy neighbor hypothesis would be "the left envelope contains $100, even in the world where the right envelope contains $100". Or if we have an AI that's unsure whether it values paperclips or staples, a nosy neighbor hypothesis would be "I value paperclips, even in the world where I value staples". I'm not sure how that makes sense. Can you give some scenario where a nosy neighbor hypothesis makes sense?
Imagine if we had narrowed down the human prior to two possibilities, P_1 and P_2. Humans can’t figure out which one represents our beliefs better, but the superintelligent AI will be able to figure it out. Moreover, suppose that P_2 is bad enough that it will lead to a catastrophe from the human perspective (that is, from the P_1 perspective), even if the AI were using UDT with 50-50 uncertainty between the two. Clearly, we want the AI to be updateful about which of the two hypotheses is correct.
This seems like the central argument in the post, but I don't understand how it works.
Here's a toy example. Two envelopes, one contains $100, the other leads to a loss of $10000. We don't know which envelope is which, but it's possible to figure out by a long computation. So we make a money-maximizing UDT AI, whose prior is "the $100 is in whichever envelope {long_computation} says". Now if the AI has time to do the long computation, it'll do it and then open the correct envelope. And if it doesn't have time to do the long computation, and is offered the choice to open a random envelope or abstain, it will abstain. So it seems like ordinary UDT solves this example just fine. Can you explain where "updatefulness" comes in?
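To make the toy example concrete, here is a minimal sketch. It assumes a 50-50 prior over which envelope holds the $100, and it models "UDT" simply as picking the observation-to-action policy with the highest prior expected payoff. The names `payoff` and `best_policy` are made up for illustration.

```python
from itertools import product

ACTIONS = ("left", "right", "abstain")
WORLDS = ("left", "right")  # which envelope holds the $100; 50-50 prior is an assumption


def payoff(action, world):
    """Money gained by taking `action` when the $100 is in envelope `world`."""
    if action == "abstain":
        return 0
    return 100 if action == world else -10_000


def best_policy(has_time):
    """Pick the observation -> action map with the highest prior expected payoff.

    With time, the agent observes the output of the long computation
    (modelled here as simply revealing the world). Without time, it observes nothing.
    """
    observations = WORLDS if has_time else (None,)
    best, best_value = None, float("-inf")
    for choice in product(ACTIONS, repeat=len(observations)):
        policy = dict(zip(observations, choice))
        value = sum(
            payoff(policy[w if has_time else None], w) for w in WORLDS
        ) / len(WORLDS)
        if value > best_value:
            best, best_value = policy, value
    return best, best_value


print(best_policy(has_time=True))   # open whichever envelope the computation names
print(best_policy(has_time=False))  # abstain, since a blind pick averages -4950
```

In this sketch the policy is chosen once, from the prior, and no separate updating step is needed to get the sensible behavior in either case.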
If the housing crisis is caused by low-density rich neighborhoods blocking redevelopment of themselves (as seems to be the consensus on the internet now), could it be solved by developers buying out an entire neighborhood or even town in one fell swoop? It'd require a ton of money, but redevelopment would bring even more money, so it could be win-win for everyone. Does it not happen only due to coordination difficulties?
I have maybe a naive question. What information is needed to find the MSP image within the neural network? Do we have to know the HMM to begin with? Or could it be feasible someday to inspect a neural network, find something that looks like an MSP image, and infer the HMM from it?
- I’m worried about centralization of power and wealth in opaque non-human decision-making systems, and those who own the systems.
This has been my main worry for the past few years, and to me it counts as "doom" too. AIs and AI companies playing by legal and market rules (and changing these rules by lobbying, which is also legal) might well lead to most humans having no resources to survive.
Good post. But I thought about this a fair bit and I think I disagree with the main point.
Let's say we talk about two AIs merging. Then the tuple of their expected utilities from the merge had better be on the Pareto frontier, no? Otherwise they'd just do a better merge that gets them onto the frontier. Which specific point on the frontier is a matter of bargaining, but the fact that they want to hit the frontier isn't, it's a win-win. And the merges that get them to the frontier are exactly those that output an EUM agent. If the point they want to hit is in a flat region of the frontier, the merge will involve coinflips to choose which EUM agent to become; and if it's curvy at that point, the merge will be deterministic. For realistic agents who have more complex preferences than just linearly caring about one cake, I expect the frontier will be curvy, so deterministic merge into an EUM agent will be the best choice.
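One way to see the flat versus curvy distinction is a rough sketch under strong assumptions: two agents split a single cake, a merged EUM agent maximizes a weighted sum of their utilities, and we search over splits on a grid. The function `argmax_split` and the specific utility functions are hypothetical choices for illustration, not taken from the post.

```python
import math


def argmax_split(u1, u2, w, steps=1000):
    """Return every grid split x (agent 1's share of the cake) that maximizes
    w * u1(x) + (1 - w) * u2(1 - x), up to a small numerical tolerance."""
    xs = [i / steps for i in range(steps + 1)]
    scores = [w * u1(x) + (1 - w) * u2(1 - x) for x in xs]
    top = max(scores)
    return [x for x, s in zip(xs, scores) if top - s < 1e-12]


def linear(x):
    """Linear utility in cake: the Pareto frontier is a straight line (flat)."""
    return x


def concave(x):
    """Square-root utility in cake: the Pareto frontier is strictly curved."""
    return math.sqrt(x)


# Flat frontier: any unequal weighting pushes the merged agent to a corner.
print(argmax_split(linear, linear, w=0.6))   # [1.0]
print(argmax_split(linear, linear, w=0.4))   # [0.0]

# Curvy frontier: equal weights pick out a single interior split.
print(argmax_split(concave, concave, w=0.5))  # [0.5]
```

With linear utilities, a weighted-sum maximizer with unequal weights lands on a corner, so in this sketch an interior point of the flat frontier is not the unique optimum of any single weighting, which is where mixing between EUM agents comes in. With concave utilities, the optimum is a single interior split, so a deterministic merge suffices.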