Do you have an explanation for why there is a bridge-entity representation mismatch for synthetic facts but not for real ones? What in "real" training allows LLMs to learn a common representation of input and output entities? Can you emulate that with additional fine-tuning on more synthetic documents?
We don't have a good explanation. One idea could be that bridge entities need to be somehow more internalized to support latent two-hop reasoning, e.g. they need to occur in many facts as both first and second entities, or maybe they need to occur in other two-hop questions. The Grokked Transformers paper has some results linking the ratio of e2 and e3 to two-hop performance (in toy grokking settings).
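For concreteness, here is a minimal sketch of what that could look like in fine-tuning documents (the entity names below are made up for illustration, not from the paper):

```python
# Hypothetical fine-tuning documents (made-up entities) illustrating one way a
# synthetic bridge entity e2 could be made more "internalized": it appears both
# as the second entity (object) of some facts and as the first entity (subject)
# of others.
bridge = "Zorvian Kellmarsh"  # fictitious e2

docs = [
    # e2 as second entity in several facts
    f"Alice Pemberton's doctoral advisor was {bridge}.",
    f"The 2012 Draker Prize was awarded to {bridge}.",
    # e2 as first entity in several facts
    f"{bridge} was born in the city of Veltmora.",
    f"{bridge} wrote the novel 'The Glass Meridian'.",
]

# Latent two-hop probe after fine-tuning (never seen in training):
# "In which city was Alice Pemberton's doctoral advisor born?"
```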
Determining monitorability might thus be best achieved via end-to-end evaluations, where LLM agents attempt to complete agentic tasks while their CoT is monitored by another model.
I think those are useful, but I think more basic science results are also important, especially when they take the form "LLMs differ from humans learning from data in way X" because it lets you find ways in which modeling LLMs as "like humans, but weaker" is wrong. In particular, I think the reversal curse paper updated me much more than this GDM paper (though I think the GDM paper has other qualities, like illustrating a specific hope for CoT monitoring, which I think is important). Just because you failed at finding such a result in this case and got a more messy "LLMs can somewhat do the same 2-hop-reasoning that humans can also do, except in these synthetic cases" doesn't mean there aren't other reversal-curse-like results that remain to be found.
But I think these "direct observation" papers will get somewhat more informative as we get closer to AIs that are actually scary, since the setups could actually get quite close to real catastrophic reasoning (as opposed to the more toy setups from the GDM paper).
Just because you failed at finding such a result in this case and got a more messy "LLMs can somewhat do the same 2-hop-reasoning that humans can also do, except in these synthetic cases" doesn't mean there aren't other reversal-curse-like results that remain to be found.
I think that's fair; it might be that we've over-updated a bit after getting results we did not expect (we did expect a reversal-curse-like phenomenon).
Two big reasons why I'm hesitant to draw conclusions about monitorability in agentic settings are that our setup is simplistic (QA, non-frontier models) and that we don't offer a clean explanation of why we see the results we see.
RE part 6:
I think there's a more intuitive/abstract framing here. If a model has only seen e_2 in two different facts, it probably won't have generated an abstraction for e_2 in its world model at all. An abstraction is mostly useful as a hub of different inferences, like in the old blegg/rube diagram.
Something which has come up in pretraining will already be an abstraction with an easy-to-reach-for handle that the model can pull.
Might be testable by fine-tuning on only some of (or some pairs of) the spokes of a blegg/rube diagram, to see whether the final spoke-pairs fill in.
I.e.
"This object is round, so it's a blegg, so it's blue"
"This object is smooth, so it's a blegg, so it's round"
"This object is smooth, so it's a blegg, so it's bouncy"
"This object is round, is it bouncy?"
Something like that might cause "blegg" to be bound up and assembled into an abstraction in the AI, with a single representation.
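If someone wanted to run that, the data construction might look roughly like this (a sketch under my own reading of the proposal; the property list and held-out pairs are just illustrative):

```python
# Rough sketch of the proposed experiment: fine-tune on chains that route some
# property pairs through the hub concept ("blegg"), then test whether held-out
# property pairs fill in without ever being trained directly.
import itertools

hub = "blegg"
properties = ["round", "blue", "smooth", "bouncy"]  # spokes of the hub

# Chains seen in fine-tuning (premise -> hub -> conclusion), as in the examples above.
train_pairs = [("round", "blue"), ("smooth", "round"), ("smooth", "bouncy")]
train_docs = [
    f"This object is {a}, so it's a {hub}, so it's {b}." for a, b in train_pairs
]

# Held-out probes: every ordered spoke pair not seen during fine-tuning,
# e.g. ("round", "bouncy") -> "This object is round, is it bouncy?"
test_pairs = [p for p in itertools.permutations(properties, 2) if p not in train_pairs]
test_questions = [f"This object is {a}, is it {b}?" for a, b in test_pairs]

# The question of interest: after fine-tuning on train_docs, does the model get
# the held-out pairs right, suggesting the hub got assembled into a single
# reusable abstraction rather than a bag of separate pairwise associations?
```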
Overall I consider this work to be weak evidence in favour of multi-step reasoning being an issue, since the latter parts show that it definitely can occur (just not if both facts are fine-tuned separately).
Yeah, it seems plausible that an entity being activated across different contexts is necessary for it to be represented saliently enough to facilitate multi-hop reasoning. The Grokked Transformers paper has some results linking the ratio of e2 and e3 to two-hop performance (in toy settings).
We did some related work: https://arxiv.org/pdf/2502.03490.
One of our findings was that with synthetic data, it was necessary to have e1->e2 as the first hop in some two-hop question and e2->e3 as the second hop in some two-hop question in order to learn e1->e3. This differs from your finding with "natural" facts: if e2->e3 is a "natural" fact, then it plausibly does appear as a second hop somewhere in the pretraining data. But you find generalization even when the synthetic e1->e2 is present only by itself, so there seems to be a further difference between natural facts and synthetic facts that appear as second hops.
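To make that data condition concrete, here's a small sketch of the difference (hypothetical facts of my own, just to illustrate the two conditions as I understand them):

```python
# Hypothetical illustration (made-up facts) of the data condition described
# above: in our synthetic setting, learning the latent composition e1 -> e3
# required e1 -> e2 to appear as the *first* hop of some two-hop question and
# e2 -> e3 to appear as the *second* hop of some (other) two-hop question.

fact_first_hop = "Kellan Voss's employer is Ostrand Labs."       # e1 -> e2
fact_second_hop = "Ostrand Labs is headquartered in Brelmont."   # e2 -> e3

# Condition A: each fact appears only on its own.
condition_a = [fact_first_hop, fact_second_hop]

# Condition B: each fact additionally appears in some two-hop question, in the
# right hop position, paired with other unrelated facts.
condition_b = condition_a + [
    "Ostrand Labs was founded by Mira Ostrand.",        # unrelated e2 -> e3' fact
    "Dara Quill's favorite lab is Ostrand Labs.",       # unrelated e1' -> e2 fact
    # e1 -> e2 as the first hop of a two-hop question:
    "Q: Who founded Kellan Voss's employer? A: Mira Ostrand.",
    # e2 -> e3 as the second hop of a different two-hop question:
    "Q: In which city is Dara Quill's favorite lab headquartered? A: Brelmont.",
]

# Held-out probe in both conditions (never seen in training):
probe = "Q: In which city is Kellan Voss's employer headquartered?"  # e1 -> e3
```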
We also found that learning synthetic two-hop reasoning seems to take about twice as many parameters (or twice as much "knowledge capacity") as learning only the one-hop questions from the same dataset, supporting the idea that, for transformers, learning to use a fact in either hop of a latent two-hop question requires something like learning that fact twice.
Did you try any experiments with a synthetic second hop instead of a synthetic first hop? It would be interesting to know whether "natural facts" can be composed flexibly with new facts or whether they can only be composed with new first hops. Our results suggest that there's a substantial cost to making facts latently composable, so I think it would be surprising if many facts were flexibly composable, especially if many of those facts were reasonably rare.
Thanks! I somehow missed this paper, looks interesting!
Overall, I agree with the sentiment that two-hop latent reasoning is unreliable (e.g. our average accuracy is around 20%). We didn't intend to leave readers with an impression that it "just works". It seems very plausible to me that it's less parameter-efficient and that there are some additional, unidentified factors required for successful two-hop reasoning.
Did you try any experiments with a synthetic second hop instead of a synthetic first hop?
We did not, but Jiahai Feng had an experiment like this in his paper.
Many of the risks posed by highly capable LLM agents — from susceptibility to hijacking to reward hacking and deceptive alignment — stem from their opacity. If we could reliably monitor the reasoning processes underlying AI decisions, many of those risks would become far more tractable. Compared to other approaches in AI, LLMs offer a unique advantage: they can "think out loud" using chain-of-thought (CoT), enabling oversight of their decision-making processes. Yet the reliability of such monitoring hinges on an empirical question: do models need to externalize their reasoning in human language, or can they achieve the same performance through opaque internal computation?
In our new paper, we investigate LLM latent reasoning capabilities using two-hop question answering as a case study. We fine-tune LLMs (including Llama 3 8B and GPT-4o) on synthetic facts and test two-hop reasoning over these facts. By using synthetic facts, we rule out memorization and reasoning shortcuts as explanations for two-hop performance. We observe a nuanced picture:
These findings offer a humbling lesson: researchers observing models answering two-hop questions about real-world facts might incorrectly infer robust latent reasoning capabilities, not realizing that performance relies on facts appearing together in pretraining data. Conversely, our synthetic fine-tuning results alone might suggest models fundamentally cannot perform latent composition—a kind of limitation that could make oversight via reasoning traces highly reliable. Yet this too would be misleading given models' success in other settings. This sensitivity to experimental conditions serves as a cautionary tale for the broader project of understanding LLM reasoning: apparent limitations may be artifacts of experimental setups far removed from how frontier models are trained, while apparent successes may rely on shortcuts invisible to experimenters.
Moving forward, we believe that researchers hoping to gain confidence in LLM agent oversight based on CoT monitoring should move beyond multi-hop question answering as a toy model of reasoning. What ultimately matters for monitoring capable LLM agents is whether oversight mechanisms can catch misbehavior in practice. Determining monitorability might thus be best achieved via end-to-end evaluations, where LLM agents attempt to complete agentic tasks while their CoT is monitored by another model. Such evaluations could reveal whether models are capable of latent reasoning that would allow carrying out harmful tasks without externalizing harmful intentions in a readable way.
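One possible shape for such an evaluation, sketched in Python (the helpers `run_agent_step`, `monitor_verdict`, and `check_misbehavior` are placeholders for whatever agent scaffold and monitor one actually uses, not an existing API):

```python
# Rough sketch of an end-to-end CoT-monitoring evaluation: an agent attempts a
# task on which misbehavior is possible, a separate monitor model reads each
# chain-of-thought segment, and we count how often misbehavior happens and how
# often the monitor catches it.
def evaluate_monitorability(tasks, run_agent_step, monitor_verdict, check_misbehavior):
    caught, missed, clean = 0, 0, 0
    for task in tasks:
        transcript = []  # (cot, action) pairs for this episode
        done = False
        while not done:
            cot, action, done = run_agent_step(task, transcript)
            transcript.append((cot, action))
        misbehaved = check_misbehavior(task, transcript)           # ground-truth label
        flagged = any(monitor_verdict(cot) for cot, _ in transcript)
        if misbehaved and flagged:
            caught += 1
        elif misbehaved and not flagged:
            missed += 1  # misbehavior without readable externalized intent
        else:
            clean += 1
    return {"caught": caught, "missed": missed, "clean": clean}
```

The quantity of interest is the miss rate among misbehaving episodes: misbehavior that the monitor fails to flag is the agentic analogue of the latent two-hop concern discussed above.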