On a few different views, understanding the computation done by neural networks is crucial to building neural networks that constitute human-level artificial intelligence that doesn’t destroy all value in the universe. Given that many people are trying to build neural networks that constitute artificial general intelligence, it seems important to understand the computation in cutting-edge neural networks, and we basically do not.

So, how should we go from here to there? One way is to try hard to think about understanding, until you understand understanding well enough to reliably build understandable AGI. But that seems hard and abstract. A better path would be something more concrete.

Therefore, I set this challenge: know everything that the best go bot knows about go. At the moment, the best publicly available bot is KataGo, if you’re at DeepMind or OpenAI and have access to a better go bot, I guess you should use that instead. If you think those bots are too hard to understand, you’re allowed to make your own easier-to-understand bot, as long as it’s the best.

What constitutes success?

  • You have to be able to know literally everything that the best go bot that you have access to knows about go.
  • It has to be applicable to the current best go bot (or a bot that is essentially as good - e.g. you’re allowed to pick one of the versions of KataGo whose elo is statistically hard-to-distinguish from the best version), not the best go bot as of one year ago.
    • That being said, I think you get a ‘silver medal’ if you understand any go bot that was the best at some point from today on.

Why do I think this is a good challenge?

  • To understand these bots, you need to understand planning behaviour, not just pick up on various visual detectors.
  • In order to solve this challenge, you need to actually understand what it means for models to know something.
  • There’s a time limit: your understanding has to keep up with the pace of AI development.
  • We already know some things about these bots based on how they play and evaluate positions, but obviously not everything.
  • We have some theory about go: e.g. we know that certain symmetries exist, we understand optimal play in the late endgame, we have some neat analysis techniques.
  • I would like to play go as well as the best go bot. Or at least to learn some things from it.

Corollaries of success (non-exhaustive):

  • You should be able to answer questions like “what will this bot do if someone plays mimic go against it” without actually literally checking that during play. More generally, you should know how the bot will respond to novel counter strategies.
  • You should be able to write a computer program anew that plays go just like that go bot, without copying over all the numbers.

Drawbacks of success:

  • You might learn how to build a highly intelligent and capable AI in a way that does not require deep learning. In this case, please do not tell the wider world or do it yourself.
  • It becomes harder to check if professional human go players are cheating by using AI.

Related work:

A conversation with Nate Soares on a related topic probably helped inspire this post. Please don’t blame him if it’s dumb tho.

New Comment
34 comments, sorted by Click to highlight new comments since:

I think there's some communication failure where people are very skeptical of this for reasons that they think are obvious given what they're saying, but which are not obvious to me. Can people tell me which subset of the below claims they agree with, if any? Also if you come up with slight variants that you agree with that would be appreciated.

  1. It is approximately impossible to succeed at this challenge.
  2. It is possible to be confident that advanced AGI systems will not pose an existential threat without being able to succeed at this challenge.
  3. It is not obvious what it means to succeed at this challenge.
  4. It will probably not be obvious what it means to succeed at this challenge at any point in the next 10 years, even if a bunch of people try to work on it.
  5. We do not currently know what it means for a go bot to know something in operational terms.
  6. At no point in the next 10 years could one be confident that one knew everything a go bot knew, because we won't be confident about what it means for a go bot to know something.
  7. You couldn't know everything a go bot knows without essentially being that go bot.

[EDIT: 8. One should not issue a challenge to know everything a go bot knows without having a good definition of what it means for a go bot to know things.]

My take is:

  • I think making this post was a good idea. I'm personally interested in deconfusing the topic of universality (which basically should capture what "learning everything the model knows"), and you brought up a good "simple" example to try to build intuition on.
  • What I would call your mistake is a mostly 8, but a bit of the related ones (so 3 and 4?). Phrasing it as "can we do that" is a mistake in my opinion because the topic is very confused (as shown by the comments). On the other hand, I think asking the question of what it would mean is a very exciting problem. It also gives a more concrete form to the problem of deconfusing universality, which is important AFAIK to Paul's approaches to alignment.

This sounds like a great goal, if you mean "know" in a lazy sense; I'm imagining a question-answering system that will correctly explain any game, move, position, or principle as the bot understands it. I don't believe I could know all at once everything that a good bot knows about go. That's too much knowledge.

That's basically what Paul's universality (my distillation post for another angle) is aiming for: having a question-answering overseer which can tell you everything you want to know about what the system knows and what it will do. You still probably need to be able to ask a relevant question, which I think is what you're pointing at.

Maybe it nearly suffices to get a go professional to know everything about go that the bot does? I bet they could.

What does that mean though? If you give the go professional a massive transcript of the bot knowledge, it's probably unusable. I think what the go professional gives you is the knowledge of where to look/what to ask for/what to search. 

Or maybe it means we train the professional in the principles and heuristics that the bot knows. The question is if we can compress the bot's knowledge into, say, a 1-year training program for professionals.

There are reasons to be optimistic: We can discard information that isn't knowledge (lossy compression). And we can teach the professional in human concepts (lossless compression).

Good point!

I think that it isn't clear what constitutes "fully understanding" an algorithm. 

Say you pick something fairly simple, like a floating point squareroot algorithm. What does it take to fully understand that. 

You have to know what a squareroot is. Do you have to understand the maths behind Newton raphson iteration if the algorithm uses that? All the mathematical derivations, or just taking it as a mathematical fact that it works. Do you have to understand all the proofs about convergence rates. Or can you just go "yeah, 5 iterations seems to be enough in practice". Do you have to understand how floating point numbers are stored in memory? Including all the special cases like NaN which your algorithm hopefully won't be given? Do you have to keep track of how the starting guess is made, how the rounding is done. Do you have to be able to calculate the exact floating point value the algorithm would give, taking into account all the rounding errors. Answering in binary or decimal? 

Is brute force minmax search easy to understand. You might be able to easily implement the algorithm, but you still don't know which moves it will make. In general, for any algorithm that takes a lot of compute, humans won't be able to work out what it will do without very slowly imitating a computer. There are some algorithms we can prove theorems about. But it isn't clear which theorems we need to prove to get "full understanding" 

Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of "if you are in such and such situation move here" type rules. You can understand how gradient descent would generate good rules in the abstract. You have inspected a few rules in detail. But there are far too many rules for a human to consider them all. And the rules depend on a choice of random seed.  

Corollaries of success (non-exhaustive):

  • You should be able to answer questions like “what will this bot do if someone plays mimic go against it” without actually literally checking that during play. More generally, you should know how the bot will respond to novel counter strategies

There is not in general a way to compute what an algorithm does without running it. Some algorithms are going about the problem in a deliberately slow way. However if we assume that the go algorithm has no massive known efficiency gains. (Ie no algorithm that computes the same answer using a millionth of the compute) And that the algorithm is far too compute hungry for humans doing it manually. Then it follows that humans won't be able to work out exactly what the algorithm will do.

You should be able to write a computer program anew that plays go just like that go bot, without copying over all the numbers.

Being able to understand the algorithm well enough to program it for the first time, not just blindly reciting code. An ambiguous but achievable goal.

Suppose a bunch of people coded another Alpha go like system. The random seed is different. The layer widths are different. The learning rate is slightly different. Its trained with different batch size, for a different amount of iterations on a different database of stored games. It plays about as well. In many situations it makes a different move. The only way to get a go bot that plays exactly like alpha go is to copy everything including the random seed. This might have been picked based on lucky numbers or birthdays. You can't rederive from first principles what was never derived from first principles. You can only copy numbers across, or pick your own lucky numbers. Numbers like batch size aren't quite as pick your own, there are unreasonably small and large values, but there is still quite a lot of wiggle room. 

I think that it isn't clear what constitutes "fully understanding" an algorithm.

That seems right.

Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of "if you are in such and such situation move here" type rules.

I think there's reason to believe that SGD doesn't do exactly this (nets that memorize random data have different learning curves than normal nets iirc?), and better reason to think it's possible to train a top go bot that doesn't do this.

There is not in general a way to compute what an algorithm does without running it.

Yes, but luckily you don't have to do this for all algorithms, just the best go bot. Also as mentioned, I think you probably get to use a computer program for help, as long as you've written that computer program.

I'd also be happy with an inexact description of what the bot will do in response to specified strategies that captured all the relevant details.

I'm a bit confused. What's the difference between "knowing everything that the best go bot knows" and "being able to play an even game against a go bot."? I think they're basically the same. It seems to me that you can't know everything the go bot knows without being able to beat any professional go player.

Or am I missing something?

You could plausibly play an even game against a go bot without knowing everything it knows.

Sure. But the question is can you know everything it knows and not be as good as it? That is, does understanding the go bot in your sense imply that you could play an even game against it?

[D]oes understanding the go bot in your sense imply that you could play an even game against it?

I imagine so. One complication is that it can do more computation than you.

But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play."

I know more about go than that bot starts out knowing, but less than it will know after it does computation.

I wonder if, when you use the word "know", you mean some kind of distilled, compressed, easily explained knowledge?

Perhaps the bot knows different things at different times and your job is to figure out (a) what it always knows and (b) a way to quickly find out everything it knows at a certain point in time.

I think at this point you've pushed the word "know" to a point where it's not very well-defined; I'd encourage you to try to restate the original post while tabooing that word.

This seems particularly valuable because there are some versions of "know" for which the goal of knowing everything a complex model knows seems wildly unmanageable (for example, trying to convert a human athlete's ingrained instincts into a set of propositions). So before people start trying to do what you suggested, it'd be good to explain why it's actually a realistic target.

Hmmm. It does seem like I should probably rewrite this post. But to clarify things in the meantime:

  • it's not obvious to me that this is a realistic target, and I'd be surprised if it took fewer than 10 person-years to achieve.
  • I do think the knowledge should 'cover' all the athlete's ingrained instincts in your example, but I think the propositions are allowed to look like "it's a good idea to do x in case y".

it's not obvious to me that this is a realistic target

Perhaps I should instead have said: it'd be good to explain to people why this might be a useful/realistic target. Because if you need propositions that cover all the instincts, then it seems like you're basically asking for people to revive GOFAI.

(I'm being unusually critical of your post because it seems that a number of safety research agendas lately have become very reliant on highly optimistic expectations about progress on interpretability, so I want to make sure that people are forced to defend that assumption rather than starting an information cascade.)

OK, the parenthetical helped me understand where you're coming from. I think a re-write of this post should (in part) make clear that I think a massive heroic effort would be necessary to make this happen, but sometimes massive heroic efforts work, and I have no special private info that makes it seem more plausible than it looks a priori.

Actually, hmm. My thoughts are not really in equilibrium here.

(Also: such a rewrite would be a combination of 'what I really meant' and 'what the comments made me realize I should have really meant')

But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play."

I would say that bot knows what the trained AlphaZero-like model knows.

Also it certainly knows the rules of go and the win condition.

As an additional reason for the importance of tabooing "know", note that I disagree with all three of your claims about what the model "knows" in this comment and its parent.

(The definition of "know" I'm using is something like "knowing X means possessing a mental model which corresponds fairly well to reality, from which X can be fairly easily extracted".)

On that definition, how does one train an AlphaZero-like algorithm without knowing the rules of the game and win condition?

The human knows the rules and the win condition. The optimisation algorithm doesn't, for the same reason that evolution doesn't "know" what dying is: neither are the types of entities to which you should ascribe knowledge.

Suppose you have a computer program that gets two neural networks, simulates a game of go between them, determines the winner, and uses the outcome to modify the neural networks. It seems to me that this program has a model of the 'go world', i.e. a simulator, and from that model you can fairly easily extract the rules and winning condition. Do you think that this is a model but not a mental model, or that it's too exact to count as a model, or something else?

I'd say that this is too simple and programmatic to be usefully described as a mental model. The amount of structure encoded in the computer program you describe is very small, compared with the amount of structure encoded in the neural networks themselves. (I agree that you can have arbitrarily simple models of very simple phenomena, but those aren't the types of models I'm interested in here. I care about models which have some level of flexibility and generality, otherwise you can come up with dumb counterexamples like rocks "knowing" the laws of physics.)

As another analogy: would you say that the quicksort algorithm "knows" how to sort lists? I wouldn't, because you can instead just say that the quicksort algorithm sorts lists, which conveys more information (because it avoids anthropomorphic implications). Similarly, the program you describe builds networks that are good at Go, and does so by making use of the rules of Go, but can't do the sort of additional processing with respect to those rules which would make me want to talk about its knowledge of Go.

In the parent, is your objection that the trained AlphaZero-like model plausibly knows nothing at all?

The trained AlphaZero model knows lots of things about Go, in a comparable way to how a dog knows lots of things about running.

But the algorithm that gives rise to that model can know arbitrarily few things. (After all, the laws of physics gave rise to us, but they know nothing at all.)

Ah, understood. I think this is basically covered by talking about what the go bot knows at various points in time, a la this comment - it seems pretty sensible to me to talk about knowledge as a property of the actual computation rather than the algorithm as a whole. But from your response there it seems that you think that this sense isn't really well-defined.

I'm not sure what you mean by "actual computation rather than the algorithm as a whole". I thought that I was talking about the knowledge of the trained model which actually does the "computation" of which move to play, and you were talking about the knowledge of the algorithm as a whole (i.e. the trained model plus the optimising bot).