Logan Zoellner - AI Alignment Forum

Modern Transformers are AGI, and Human-Level

Absolutely. I don't think it's impossible to build such a system. In fact, I think a transformer is probably about 90% of the way there. It still needs trial and error, some kind of long-term memory/fine-tuning, and a handful of default heuristics. Scale will help too, but no amount of scale alone will get us there.

Modern Transformers are AGI, and Human-Level

Logan Zoellner (1y ago)

It certainly wouldn't generalize to, e.g., Hidoku.

Modern Transformers are AGI, and Human-Level

Logan Zoellner (1y ago)

In the technical sense that you can implement arbitrary programs by prompting an LLM (they are Turing complete), sure.

In a practical sense, no.

GPT-4 can't even play tic-tac-toe. Manifold spent a year trying to get GPT-4 to implement (much less discover) the algorithm for Sudoku, and failed.

Now imagine trying to implement a serious backtracking algorithm. Stockfish checks millions of positions per turn of play. The attention window for your "backtracking transformer" is going to have to be at least {size of chess board state}*{number of positions evaluated}.

And because of quadratic attention, training it is going to take on the order of {number of parameters}*({size of chess board state}*{number of positions evaluated})^2.

Even with very generous assumptions for {number of parameters} and {chess board state}, there's simply no way we could train such a model this century (and that's assuming Moore's law somehow continues that long).
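To make the scaling argument concrete, here is a back-of-envelope sketch in Python. The inputs (64 tokens per board state, one million positions per move, 10^12 parameters) are illustrative assumptions rather than numbers from the comment, and the "cost" is only the {number of parameters}*(context)^2 proxy described above, in arbitrary units.

```python
# Back-of-envelope sketch of the attention-window argument above.
# All inputs are illustrative assumptions, not figures from the comment.


def backtracking_context_tokens(board_state_tokens: int, positions_evaluated: int) -> int:
    """Tokens needed to keep every evaluated position in the attention window."""
    return board_state_tokens * positions_evaluated


def quadratic_training_cost(num_parameters: float, context_tokens: int) -> float:
    """Order-of-magnitude proxy: parameters * context^2 (arbitrary units)."""
    return num_parameters * context_tokens ** 2


# Assumed: one token per square, Stockfish-scale search, a 1e12-parameter model.
BOARD_STATE_TOKENS = 64
POSITIONS_EVALUATED = 1_000_000
NUM_PARAMETERS = 1e12

context = backtracking_context_tokens(BOARD_STATE_TOKENS, POSITIONS_EVALUATED)
cost = quadratic_training_cost(NUM_PARAMETERS, context)

print(f"context window: {context:,} tokens")  # 64,000,000 tokens
print(f"relative training cost: {cost:.2e}")  # 4.10e+27
```

Because the context term is squared, doubling the number of positions searched multiplies this proxy by four.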

Modern Transformers are AGI, and Human-Level

Logan Zoellner (1y ago)

Obvious bait is obvious bait, but here goes.

Transformers are not AGI because they will never be able to "figure something out" the way humans can.

If a human is given the rules for Sudoku, they first try filling in the squares randomly. After a while, they notice that certain things work and certain things don't. They begin to define heuristics for things that work (for example, if all but one number already appear in the same row or column as an empty box, the remaining number goes in that box). Eventually they work out a complete algorithm for solving Sudoku.
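A minimal Python sketch of the kind of heuristic described above, assuming a 9x9 grid stored as a list of lists with 0 for an empty cell. It applies the "only one number left" rule, checking the 3x3 block as well as the row and column (the usual form of this rule). On its own it does not solve every puzzle; it is one of the heuristics a human would discover along the way.

```python
# Sketch of the "only one number left" heuristic (a "naked single").
# Assumed representation: 9x9 list of lists, 0 = empty cell.


def candidates(grid, row, col):
    """Digits not already used in the cell's row, column, or 3x3 block."""
    used = set(grid[row])                                  # same row
    used |= {grid[r][col] for r in range(9)}               # same column
    br, bc = 3 * (row // 3), 3 * (col // 3)                # top-left of the block
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def fill_naked_singles(grid):
    """Repeatedly fill any empty cell that has exactly one candidate.

    Returns the number of cells filled. It stops when no such cell remains,
    which can leave a puzzle unfinished.
    """
    filled = 0
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    options = candidates(grid, r, c)
                    if len(options) == 1:
                        grid[r][c] = options.pop()
                        filled += 1
                        progress = True
    return filled


# A valid solved grid from a standard pattern, with two cells blanked out.
solution = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [row[:] for row in solution]
puzzle[0][0] = 0
puzzle[4][4] = 0
print(fill_naked_singles(puzzle))  # 2
print(puzzle == solution)          # True
```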

A transformer will never do this (assuming Sudoku wasn't in its training data). Because they are next-token predictors, they are fundamentally incapable of reasoning about things not in their training set. They are incapable of "noticing when they made a mistake" and then backtracking the way a human would.

Now it's entirely possible that a very small wrapper around a Transformer could solve Sudoku. You could have the transformer suggest moves and then add a reasoning/planning layer around it to handle the backtracking. This is effectively what AlphaGeometry does.
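The "model suggests moves, planning layer backtracks" idea can be sketched as a depth-first search that asks a proposer for candidate digits and undoes them on dead ends. This is a hedged illustration, not a description of AlphaGeometry's internals: the `Proposer` interface and `naive_proposer` are hypothetical stand-ins for a learned model, and here the proposer simply suggests 1 through 9 in order.

```python
from typing import Callable, List

Grid = List[List[int]]
# Hypothetical interface: given the grid and an empty cell, a proposer
# (standing in for a transformer) returns digits to try, best guess first.
Proposer = Callable[[Grid, int, int], List[int]]


def is_legal(grid: Grid, row: int, col: int, digit: int) -> bool:
    """True if digit does not already appear in the row, column, or 3x3 block."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    br, bc = 3 * (row // 3), 3 * (col // 3)
    return all(grid[r][c] != digit for r in range(br, br + 3) for c in range(bc, bc + 3))


def solve_with_backtracking(grid: Grid, propose: Proposer) -> bool:
    """Planning layer: try proposed moves, undo them when they lead to a dead end."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in propose(grid, row, col):
                    if is_legal(grid, row, col, digit):  # reject illegal proposals
                        grid[row][col] = digit
                        if solve_with_backtracking(grid, propose):
                            return True
                        grid[row][col] = 0               # backtrack
                return False                             # every proposal failed
    return True                                          # no empty cells left


def naive_proposer(grid: Grid, row: int, col: int) -> List[int]:
    """Placeholder for a learned model: proposes 1..9 in order."""
    return list(range(1, 10))


solution = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [[0] * 9] + [row[:] for row in solution[1:]]  # blank out the first row
print(solve_with_backtracking(puzzle, naive_proposer))  # True
print(puzzle == solution)                               # True
```

A better proposer only changes the order in which digits are tried; the backtracking wrapper is what guarantees that mistakes get undone.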

But a Transformer BY ITSELF will never be AGI.

Distinguishing test from training

Logan Zoellner (1y ago)

"reality is large" is a bad objection.

It's possible in principle to build a simulation that is literally indistinguishable from reality. Say we only run the AI in simulation for 100 million years, and there's a simulation overhead of 10x. That should cost about (100e6 ly)**3 * (100e6 years) * 10 ≈ 1e33 ly^3*years of our future lightcone. This is a minuscule fraction (roughly 1e-15) of our actual future lightcone, (9.4e10 ly)**3 * (1e15 years) ≈ 8e47 ly^3*years.
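The arithmetic can be checked in a few lines of Python. This sketch assumes the lightcone's spatial extent is cubed, like the simulation's (otherwise the units do not match), and reuses the comment's figures: a 100e6 ly region simulated for 100e6 years at 10x overhead, compared against a 9.4e10 ly region (roughly the diameter of the observable universe) lasting 1e15 years.

```python
# Check of the simulation-cost estimate, in units of ly^3 * years.
# Assumes the future lightcone is approximated as (9.4e10 ly)^3 * 1e15 years.

SIM_SIZE_LY = 100e6        # spatial extent of the simulated region
SIM_DURATION_Y = 100e6     # simulated time
OVERHEAD = 10              # simulation overhead factor

UNIVERSE_SIZE_LY = 9.4e10  # comment's figure for the universe's extent
FUTURE_Y = 1e15            # comment's figure for the future's duration

sim_cost = SIM_SIZE_LY ** 3 * SIM_DURATION_Y * OVERHEAD
lightcone = UNIVERSE_SIZE_LY ** 3 * FUTURE_Y
fraction = sim_cost / lightcone

print(f"simulation cost:  {sim_cost:.1e}")   # 1.0e+33
print(f"future lightcone: {lightcone:.1e}")  # 8.3e+47
print(f"fraction:         {fraction:.1e}")   # 1.2e-15
```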

A few better objections:

Simulating a universe with a paperclip maximizer in it means simulating billions of people being murdered and turned into paperclips. If we believe computation=existence, that's hugely morally objectionable.

The AGI's prior that it is in a simulation doesn't depend on anything we do, only on the universal prior.

A Data limited future

Logan Zoellner (3y ago)

Let's take a concrete example.

Assume you have an AI that could get 100% on every Putnam test. Do you think it would be reasonable to assume that such an AI would also display superhuman performance at solving the Yang-Mills Mass Gap?

A Data limited future

Logan Zoellner (3y ago)

This doesn't include working out advances in fundamental physics, or designing a fusion reactor, or making breakthroughs in AI research.

Why don't all of these fall into the self-play category? Physics, software and fusion reactors can all be simulated.

I would be mildly surprised if a sufficiently large language model couldn't solve all of Project Euler, the Putnam, and the MATH dataset.

A Data limited future

Logan Zoellner (3y ago)

I strongly doubt we live in a data-limited AGI timeline:

Humans are trained using much less data than Chinchilla.
We haven't even begun to exploit forms of media other than text (YouTube alone is more than 2 OOM bigger).
Self-play allows for literally limitless amounts of data.
Regularization methods mean data constraints aren't nearly as important as claimed.
In the domains where we have exhausted available data, ML models are already weakly superhuman.
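As a toy illustration of self-play producing data on demand, the Python sketch below plays random tic-tac-toe games against itself and records each move sequence with its outcome. The random policy is a placeholder assumption; real self-play systems use the model being trained as the policy, but the point that a simulator can emit as many labeled examples as you ask for is the same.

```python
import random
from typing import List, Optional, Tuple

# The eight winning lines on a 3x3 board, as cell indices 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: List[str]) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def self_play_game(rng: random.Random) -> Tuple[List[int], str]:
    """Both sides pick uniformly random legal moves (placeholder policy).

    Returns the move sequence and the result: 'X', 'O', or 'draw'.
    """
    board = [" "] * 9
    moves: List[int] = []
    player = "X"
    while True:
        legal = [i for i, s in enumerate(board) if s == " "]
        move = rng.choice(legal)
        board[move] = player
        moves.append(move)
        result = winner(board)
        if result is not None:
            return moves, result
        if len(moves) == 9:
            return moves, "draw"
        player = "O" if player == "X" else "X"


def generate_dataset(n_games: int, seed: int = 0) -> List[Tuple[List[int], str]]:
    """Generate any number of labeled games; the only limit is compute."""
    rng = random.Random(seed)
    return [self_play_game(rng) for _ in range(n_games)]


games = generate_dataset(1000, seed=0)
print(len(games))  # 1000
```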

Response to Blake Richards: AGI, generality, alignment, & loss functions

Logan Zoellner (3y ago)

I’m not quite sure what you mean here.

In the standard picture of a reinforcement learner, suppose you get to specify the reward function and I get to specify the "agent". No matter what reward function you choose, I claim I can make an agent that both 1) gets a huge reward compared to some baseline implementation and 2) destroys the world. In fact, I think most "superintelligent" systems have this property for any reward function you could specify using current ML techniques.

Now switch the order: I design the agent first and ask you for an arbitrary reward function. I claim that there exist architectures which 1) are useful, given the correct reward function, and 2) never, under any circumstances, destroy the world.
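One very simple way an architecture can bound behavior independently of the reward it is handed is action masking: the agent optimizes whatever reward it receives, but only over actions fixed at design time. The Python sketch below is a toy that rests on the strong assumption that the forbidden actions can be enumerated in advance, and the action names are hypothetical; it illustrates the shape of the claim rather than how such an architecture would actually be built.

```python
from typing import Callable, Iterable, Optional

# A reward function maps an action name to a scalar reward.
RewardFn = Callable[[str], float]


class AllowlistAgent:
    """Toy agent whose permitted actions are fixed when it is built.

    Whatever reward function it is later given, it only ever optimizes
    over the allowlisted actions (a simple form of action masking).
    """

    def __init__(self, allowed_actions: Iterable[str]):
        self.allowed = frozenset(allowed_actions)

    def act(self, candidate_actions: Iterable[str], reward: RewardFn) -> Optional[str]:
        # Mask out every action the architecture does not permit.
        options = [a for a in candidate_actions if a in self.allowed]
        if not options:
            return None  # refuse rather than pick a forbidden action
        # Greedily maximize the supplied reward over the remaining actions.
        return max(options, key=reward)


def adversarial_reward(action: str) -> float:
    # Hypothetical reward that pays hugely for the one forbidden action.
    return 1e9 if action == "seize_resources" else 1.0


agent = AllowlistAgent({"write_report", "fetch_data"})
print(agent.act(["write_report", "fetch_data", "seize_resources"], adversarial_reward))
# write_report
```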

Response to Blake Richards: AGI, generality, alignment, & loss functions

Logan Zoellner (3y ago)

What loss function(s), when sent into a future AI’s brain-like configuration of neocortex / hippocampus / striatum / etc.-like learning algorithms, will result in an AGI that is definitely not trying to literally exterminate humanity?

Specifying a correct loss function is not the right way to think about the Alignment Problem. A system's architecture matters much more than its loss function for determining whether or not it is dangerous. In fact, there probably isn't even a well-defined loss function that would remain aligned under infinite optimization pressure.
