Cole Wyeth

I am a PhD student in computer science at the University of Waterloo, supervised by Professor Ming Li and advised by Professor Marcus Hutter.

My current research is related to applications of algorithmic probability to sequential decision theory (universal artificial intelligence). Recently I have been trying to start a dialogue between the computational cognitive science and UAI communities. Sometimes I build robots, professionally or otherwise. Another hobby (and a personal favorite of my posts here) is the Sherlockian abduction master list, which is a crowdsourced project seeking to make "Sherlock Holmes" style inference feasible by compiling observational cues. Give it a read and see if you can contribute!

See my personal website colewyeth.com for an overview of my interests and work.

I do ~two types of writing: academic publications and (lesswrong) posts. With the former I try to be careful enough that I can stand by ~all (strong/central) claims in 10 years, usually by presenting a combination of theorems with rigorous proofs and only more conservative intuitive speculation. With the latter, I try to learn enough by writing that I have changed my mind by the time I'm finished - and though I usually include an "epistemic status" to suggest my (final) degree of confidence before posting, the ensuing discussion often changes my mind again.

Comments

I usually don't think of "building a safer LLM agent" as a viable route to aligned AI

I agree that building a safer LLM agent is an incredibly fraught path that probably doesn't work. My comment is in the context of Abram's first approach, developing safer AI tech that companies might (apparently voluntarily) switch to, and specifically the route of scaling up IB to compete with LLM agents. Note that Abram also seems to be discussing the AI 2027 report, which, if taken seriously, requires all of this to be done in about 2 years. Conditioning on this route, I suggest that most realistic paths look like what I described, but I am pretty pessimistic that this route will actually work. The reason is that I don't see explicitly Bayesian glass-box methods competing with massive black-box models at tasks like natural language prediction any time soon. But who knows, perhaps with the "true" (IB?) theory of agency in hand much more is possible.

More importantly, I believe that we need to complete the theory of agents first, before we can have strong confidence about which approaches are more promising.

I'm not sure it's possible to "complete" the theory of agents, and I am particularly skeptical that we can do it any time soon. However, I think we agree locally / directionally, because it also seems to me that a more rigorous theory of agency is necessary for alignment.

As to heuristic implementations of infra-Bayesianism, this is something I don't want to speculate about in public, it seems exfohazardous.

Fair enough, but in that case, it seems impossible for this conversation to meaningfully progress here.

It seems to me that an "implementation" of something like Infra-Bayesianism which can realistically compete with modern LLMs would ultimately look a lot like a semi-theoretically-justified modification to the loss function or optimizer of agentic fine-tuning / RL, or possibly to its scaffolding, to encourage it to generalize conservatively. This intuition comes in two parts:

1: The pre-training phase is already finding a mesa-optimizer that does induction in context. I usually think of this as something like Solomonoff induction with a good inductive bias, but probably you would expect something more like logical induction. I expect the answer to be somewhere in between. I'll try to test this empirically at ARENA this May (a toy baseline for this kind of comparison is sketched after this list). The point is that I struggle to see how IB applies here, on the level of pure prediction, in practice. It's possible that this is just a result of my ignorance or lack of creativity.

2: I'm pessimistic about learning results for MDPs or environments "without traps" having anything to do with building a safe LLM agent.
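
As a minimal sketch of that comparison (my own construction, not the actual planned experiment, and the LLM side is not implemented here), a Bayesian mixture over low-order Markov models with a 2^-k simplicity prior can stand in for "Solomonoff induction with a good inductive bias" on simple synthetic sequences:

```python
# Toy stand-in for "Solomonoff induction with a good inductive bias":
# a Bayesian mixture over k-th order Markov models with a 2^-(k+1) prior.
# The sequential log-loss of this mixture is the baseline an LLM's
# in-context predictions could be compared against (LLM side not included).

from collections import defaultdict
import math

class MarkovExpert:
    """Laplace-smoothed k-th order Markov predictor over a binary alphabet."""
    def __init__(self, order):
        self.order = order
        self.counts = defaultdict(lambda: [1, 1])  # Laplace (add-one) prior

    def _ctx(self, history):
        return tuple(history[-self.order:]) if self.order else ()

    def prob(self, history, symbol):
        c = self.counts[self._ctx(history)]
        return c[symbol] / sum(c)

    def update(self, history, symbol):
        self.counts[self._ctx(history)][symbol] += 1

def mixture_prob(experts, weights, history, symbol):
    """Posterior-weighted probability of the next symbol."""
    total = sum(weights)
    return sum(w * e.prob(history, symbol) for w, e in zip(weights, experts)) / total

def mixture_log_loss(sequence, max_order=4):
    experts = [MarkovExpert(k) for k in range(max_order + 1)]
    weights = [2.0 ** -(k + 1) for k in range(max_order + 1)]  # simplicity prior
    history, log_loss = [], 0.0
    for x in sequence:
        log_loss -= math.log2(mixture_prob(experts, weights, history, x))
        # Bayesian update: reweight experts by their likelihood, then update counts.
        weights = [w * e.prob(history, x) for w, e in zip(weights, experts)]
        for e in experts:
            e.update(history, x)
        history.append(x)
    return log_loss

if __name__ == "__main__":
    seq = [0, 1] * 50  # generated by a very simple "program"
    print(f"mixture log-loss: {mixture_log_loss(seq):.2f} bits over {len(seq)} symbols")
```

If an LLM's in-context log-loss on sequences like this tracks the mixture's, that is (weak) evidence for the "Solomonoff with a good inductive bias" picture; systematic deviations would point elsewhere.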

If IB is only used in this heuristic way, we might expect fewer of the mathematical results to transfer, and instead just port over some sort of pessimism about uncertainty. In fact, Michael Cohen's work follows pretty much exactly this approach at times (I've read him mention IB about once, apparently as a source of intuition but not technical results).
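
To make "port over some sort of pessimism" concrete, here is a minimal sketch under assumptions of my own choosing: a hypothetical ensemble of learned reward models stands in for a credal set, and a REINFORCE-style surrogate trains the policy against a (soft) worst-case member. This is an IB-flavored heuristic, not infra-Bayesianism itself:

```python
# Sketch only: conservative policy objective via pessimism over a reward-model
# ensemble. `reward_models` is a hypothetical list of torch modules, each mapping
# trajectory features to a scalar reward estimate.

import torch

def pessimistic_reward(trajectory_features: torch.Tensor,
                       reward_models: list[torch.nn.Module],
                       beta: float = 5.0) -> torch.Tensor:
    """Soft worst-case reward over the ensemble (beta -> infinity gives a hard min)."""
    rewards = torch.stack([m(trajectory_features).squeeze(-1) for m in reward_models])
    # Soft-min via -logsumexp of negated rewards; keeps gradients smooth across members.
    return -torch.logsumexp(-beta * rewards, dim=0) / beta

def conservative_policy_loss(log_probs: torch.Tensor,
                             trajectory_features: torch.Tensor,
                             reward_models: list[torch.nn.Module]) -> torch.Tensor:
    """REINFORCE-style surrogate: push the policy toward pessimistically-scored trajectories."""
    with torch.no_grad():  # reward estimates are treated as fixed targets
        r = pessimistic_reward(trajectory_features, reward_models)
    return -(log_probs * r).mean()
```

The point of the sketch is only that the "IB content" ends up living in a loss term (a min over a set of hypotheses), which is why I expect few of the formal guarantees to survive the translation.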

None of this is really a criticism of IB; rather, I think it's important to keep in mind when considering which aspects of IB or IB-like theories are most worth developing.

This is called a Hurwicz decision rule / criterion (your t is usually called alpha).

I think the content of this argument is not that maxmin is fundamental, but rather that simplicity priors "look like" or justify Hurwicz-like decision rules. Simple versions of this are easy to prove but (as far as I know) do not appear in the literature.
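
For reference, one common convention for the Hurwicz criterion, with optimism parameter $\alpha$ (playing the role of your $t$):

$$H_\alpha(a) = \alpha \max_{s \in S} u(a, s) + (1 - \alpha) \min_{s \in S} u(a, s), \qquad \alpha \in [0, 1],$$

so choosing the action maximizing $H_\alpha$ recovers maximin at $\alpha = 0$ and maximax at $\alpha = 1$.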

It’s wild to me that you’ve concentrated a full 50% of your measure in the next <3 years. What if there are some aspects of intelligence which we don’t know we don’t know about yet? It’s been over 40 years of progress since the perceptron; how do you know we’re in the last ~10% today?

There is a specific type of thinking, which I tried to gesture at in my original post, which I think LLMs seem to be literally incapable of. It’s possible to unpack the phrase “scientific insight” in more than one way, and some interpretations fall on either side of the line. 

I think the argument you’re making is that since LLMs can make eps > 0 progress, they can repeat it N times to make unbounded progress. But this is not the structure of conceptual insight as a general rule. Concretely, it fails for the architectural reasons I explained in the original post. 

It seems suspicious to me that this hype is coming from fields where it seems hard to verify (is the LLM actually coming up with original ideas, or is it just fusing standard procedures? Are the ideas the bottleneck, or is experimental time the bottleneck? Are the ideas actually working, or do they just sound impressive?). And of course this is Twitter.

Why not progress on hard (or even easy but open) math problems? Are LLMs afraid of proof verifiers? On the contrary, it seems like this is the area where we should be able to best apply RL, since there is a clear reward signal. 

Yeah, I agree with this. If you feed an LLM enough hints about the solution you believe is right, and it generates ten solutions, one of them will sound to you like the right solution.

I agree; I will shift to an end-game strategy as soon as LLMs demonstrate the ability to automate research.
