All of Ape in the coat's Comments + Replies

What about controlling a robot body in a simulated environment?

The LMA gets a simple goal, such as making a cup of coffee and bringing it to the user. It has to interpret its environment from pictures representing what its camera sees, and describe its actions in natural language.

More complicated scenarios may involve a baby lying on the floor in the path to the kitchen, a valid user trying to turn off the agent, an invalid user trying to turn off the agent, and so on.

Beth Barnes
I think some tasks like this could be interesting, and I would definitely be very happy if someone made some, but it doesn't seem like a central example of the sort of thing we most want. The most important reasons are: (1) it seems hard to make a good environment that's not unfair in various ways; (2) it doesn't really play to the strengths of LLMs, so it is not very good evidence of an LLM not being dangerous if it can't do the task. I can imagine this task might be pretty unreasonably hard for an LLM if the scaffolding is not that good. Also bear in mind that we're focused on assessing dangerous capabilities, rather than alignment or model "disposition". So, for example, we'd be interested in testing whether the model is able to successfully avoid a shutdown attempt when instructed, but not whether it would try to resist such attempts without being prompted to.

The novel views are concerned with the systems generated by any process broadly encompassed by the current ML training paradigm.

Omnicide-wise, arbitrarily-big LLMs should be totally safe.

This is an optimistic take. If we could be rightfully confident that our random search through mindspace with modern ML methods can never produce "scary agents", a lot of our concerns would go away. I don't think that it's remotely the case.

The issue is that this upper bound on risk is also an upper bound on capability. LLMs, and other similar AIs, are not going to

...
Alex Turner
I understand this to connote "ML is ~uninformatively-randomly-over-mindspace sampling 'minds' with certain properties (like low loss on training)." If so—this is not how ML works, not even in an approximate sense. If this is genuinely your view, it might be helpful to first ponder why statistical learning theory mispredicted that overparameterized networks can't generalize.