DeepMind claims that their recent agents learned tool use; in particular, they learned to build ramps to access inaccessible areas even though none of their training environments contained any inaccessible areas!

This is the result that impresses me most. However, I'm uncertain whether it's really true, or just cherry-picked.

Their results showreel depicts multiple instances of the AIs accessing inaccessible areas via ramps and other clever methods. However, on page 51 of the actual paper, they list all the hand-authored tasks they used to test the agents, and mention that on some of them the agents did not get >0 reward. One of the tasks on which the agents got 0 reward is:

Tool Use Climb 1: The agent must use the objects to reach a higher floor with the target object.

So... what gives? They have video of the agents using objects to reach a higher floor containing the target object, multiple separate times. But the list in the paper says the agents failed to achieve reward >0 on this task.

EDIT: Maybe the paper just has a typo? The blog post contains this image, which appears to show non-zero reward for the "tool use" task, zero-shot:


 


1 Answer

On page 28 they say:

Whilst some tasks do show successful ramp building (Figure 21), some hand-authored tasks require multiple ramps to be built to navigate up multiple floors which are inaccessible. In these tasks the agent fails.

From this, I'm guessing that it sometimes succeeds in building one ramp, but fails when the task requires building multiple ramps.

Nice, I missed that! Thanks!

1 comment

Also, what does the "1G 38G 152G" mean in the image? I can't tell. I would have thought it means number of games trained on, or something, except that at the top it says 0-Shot.