All of NickGabs's Comments + Replies

I think this is a very good critique of OpenAI's plan. However, to steelman the plan, I think you could argue that advanced language models will be sufficiently "generally intelligent" that they won't need very specialized feedback in order to produce high-quality alignment research. As e.g. Nate Soares has pointed out repeatedly, the case of humans suggests that in some cases, a system's capabilities can generalize way past the kinds of problems it was explicitly trained to solve. If we assume that sufficiently powerful language...

Alex Flint
Well, even if language models do generalize beyond their training domain in the way that humans can, you still need to be in contact with a given problem in order to solve that problem.

Suppose I take a very intelligent human and ask them to become a world expert at some game X, but I don't actually tell them the rules of game X, nor give them any way of playing out game X. No matter how intelligent the person is, they still need some information about what the game consists of. Now suppose that you have this intelligent person write essays about how one ought to play game X, and have their essays assessed by other humans who have some familiarity with game X but not a clear understanding of it. It is not impossible that this could work, but it does seem unlikely. There are a lot of levels of indirection stacked against this working.

So overall I'm not saying that language models can't be generally intelligent; I'm saying that a generally intelligent entity still needs to be in a tight feedback loop with the problem itself (whatever that is).