All of M. Y. Zuo's Comments + Replies

Thanks, that is more luxurious than I imagined, so families should have no difficulty finding a large enough place. 

I'm not familiar with London housing prices. Is it possible to affordably rent or mortgage a decent 2 bedroom condo within a 5 minute walk of the offices with your compensation package? (By affordable I mean less than 1/3 of total comp spent on housing, stretching to 1/2 if comp is unusually high.)

0 Andrew
Whilst you specified a 5 minute walk in your criteria, I think you should also consider that London generally has very good public transport, and prices would be cheaper a bit further out. The location is close to several rail stations (Blackfriars, Farringdon), the new Crossrail route (Farringdon) opening in a few weeks, and Tube lines (Central, Circle, District). With the new Crossrail route it should be possible to reach zone 4 locations in 20 minutes.
6 Rohin Shah
Almost certainly, e.g. this one meets those criteria and I'm pretty sure it costs < 1/3 of total comp (before taxes), though I don't actually know what typical total comp is. You would find significantly cheaper places if you were willing to compromise on commute, since DeepMind is right in the center of London.

Could the hypothetical AGI be developed in a simulated environment and trained with proportionally lower consequences?

2 Steve Byrnes
I'm all for doing lots of testing in simulated environments, but the real world is a whole lot bigger and more open and different than any simulation. Goals / motivations developed in a simulated environment might or might not transfer to the real world in the way you, the designer, were expecting. So, maybe, but for now I would call that "an intriguing research direction" rather than "a solution".
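
To make the "might or might not transfer" point concrete, here is a minimal toy sketch (my own illustration, not anything from the comment above): a tabular Q-learning agent trained in a "simulator" corridor whose rewarding end differs from the "real" corridor it is later deployed into. Everything in it (the corridor, the function names) is made up for illustration.

```python
# Toy sketch of sim-to-real mismatch: behaviour learned in a simulator
# fails to transfer when the real environment differs in a relevant way.
import random

def q_learning(step_fn, n_states, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Learn a Q-table for a 2-action (left/right) corridor environment."""
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = n_states // 2, False
        while not done:
            a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step_fn(s, a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) * (not done) - q[s][a])
            s = s2
    return q

N = 7  # corridor of 7 cells; both ends are terminal

def sim_step(s, a):
    """Simulator: reward sits at the RIGHT end of the corridor."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 in (0, N - 1)

def real_step(s, a):
    """'Real world': same layout, but the rewarding end is the LEFT one."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    return s2, (1.0 if s2 == 0 else 0.0), s2 in (0, N - 1)

q = q_learning(sim_step, N)

# Greedy rollout of the sim-trained policy in the "real" environment.
s, total, done = N // 2, 0.0, False
while not done:
    a = max((0, 1), key=lambda x: q[s][x])
    s, r, done = real_step(s, a)
    total += r
print("return in the real environment:", total)  # 0.0 -- the learned behaviour didn't transfer
```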

Interesting post!

Query: how do you define ‘feasibly’, as in ‘Incentive landscapes that can’t feasibly be induced by a reward function’?

From my perspective, all possible incentive landscapes can be induced by reward, given sufficient time and energy. Of course, a large set of these are beyond the capacity of present human civilization.

0 Steve Byrnes
Right, the word "feasibly" is referring to the bullet point that starts "Maybe “Reward is connected to the abstract concept of ‘I want to be able to sing well’?”". Here's a little toy example we can run with: teaching an AGI "don't kill all humans". So there are three approaches to reward design that I can think of, and none of them seem to offer a feasible way to do this (at least, not with currently-known techniques): 1. The agent learns by experiencing the reward. This doesn't work for "don't kill all humans" because when the reward happens it's too late. 2. The reward calculator is sophisticated enough to understand what the agent is thinking, and issue rewards proportionate to the probability that the current thoughts and plans will eventually lead to the result-in-question happening. So the AGI thinks "hmm, maybe I'll blow up the sun", and the reward calculator recognizes that merely thinking that thought just now incrementally increased the probability that the AGI will kill all humans, and so it issues a negative reward. This is tricky because the reward calculator needs to have an intelligent understanding of the world, and of the AGI's thoughts. So basically the reward calculator is itself an AGI, and now we need to figure out its rewards. I'm personally quite pessimistic about approaches that involve towers-of-AGIs-supervising-other-AGIs, for reasons in section 3.2 here, although other people would disagree with me on that (partly because they are assuming different AGI development paths and architectures than I am). 3. Same as above, but instead of a separate reward calculator estimating the probability that a thought or plan will lead to the result-in-question, we allow the AGI itself to do that estimation, by flagging a concept in its world-model called "I will kill all humans", and marking it as "very bad and important" somehow. (The inspiration here is a human who somehow winds up with the strong desire "I want to get out of debt". Having assigne