User Comment Replies — AI Alignment Forum

I think timelines (as in, <10 years vs. 10-30 years) are strongly correlated with the answer to "will the first dangerous models look like current models?", which I think matters more for research directions than what you allow in the second paragraph.

For example, interpretability techniques developed for transformers might fail completely on other architectures, for reasons that have nothing to do with deception. The only insight from Anthropic's 2022 interpretability papers that I see having a chance of generalizing beyond transformers is the superposition hypothesis / SoLU discussion.

AGI Timelines Are Mostly Not Strategically Relevant To Alignment

Daniel Paleka · 3y

johnswentworth · 3y

Yup, I definitely agree that something like "will roughly the current architectures take off first" is a highly relevant question. Indeed, I think that gathering arguments and evidence relevant to that question (and the more general question of "what kind of architecture will take off first?" or "what properties will the first architecture to take off have?") is the main way that work on timelines actually provides value. But it is a separate question from timelines, and I think most people trying to do timelines estimates would do more useful work if they instead explicitly focused on what architecture will take off first, or on what properties the first architecture to take off will have.

