AI ALIGNMENT FORUM

Jacob_Hilton's Shortform

by Jacob_Hilton
1st May 2025

I recently gave a talk at the Safety-Guaranteed LLMs workshop.

The talk is about ARC's work on low probability estimation (LPE), covering:

  • Theoretical motivation for LPE and (towards the end) activation modeling approaches (both described here)
  • Empirical work on LPE in language models (described here)
  • Recent work-in-progress on theoretical results
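
To make the LPE problem concrete, here is a minimal toy sketch of my own (it is not the method from the talk; the score function, threshold, and sample count are all made up for illustration). The rare "behaviour" is a scalar score exceeding a high threshold: naive Monte Carlo cannot resolve probabilities much below one over the number of samples, whereas fitting a simple distributional model to the score (in the spirit of the activation modeling approaches mentioned above) gives an analytic tail estimate.

```python
# A toy sketch of the low probability estimation (LPE) problem, under
# assumptions of my own (this is NOT ARC's method): the "behaviour" is a
# scalar score exceeding a high threshold, and we compare naive Monte Carlo
# with fitting a Gaussian to the score distribution and reading off the tail.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Stand-in for a model-internal scalar (e.g. a logit) as a function of inputs.
w = np.array([0.6, -0.3, 0.4])

def score(x):
    return x @ w

# The score is N(0, ||w||^2), so a threshold of 4 standard deviations gives a
# true event probability of about 3e-5 -- far below 1 / n_samples.
threshold = 4.0 * np.linalg.norm(w)
n_samples = 10_000

x = rng.normal(size=(n_samples, len(w)))
s = score(x)

# Naive Monte Carlo: the event almost never occurs in 10k samples, so the
# estimate is typically exactly 0, with no useful error bar.
p_mc = np.mean(s > threshold)

# Distribution-fitting estimate (in the spirit of activation modelling): fit a
# Gaussian to the observed scores and compute the tail mass analytically.
# This can resolve probabilities far below 1/n, but only if the fitted family
# actually matches the score distribution.
mu, sigma = s.mean(), s.std()
p_fit = norm.sf(threshold, loc=mu, scale=sigma)

print(f"true probability     : {norm.sf(4.0):.2e}")
print(f"Monte Carlo estimate : {p_mc:.2e}")
print(f"Gaussian-fit estimate: {p_fit:.2e}")
```

The Gaussian fit is exact here by construction; for real model activations or logits, the quality of the distributional fit, rather than the sample budget, becomes the limiting factor.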