Daniel Paleka

AGI Timelines Are Mostly Not Strategically Relevant To Alignment

Daniel Paleka

3y

I think timelines (as in, <10 years vs. 10-30 years) are strongly correlated with the answer to "will the first dangerous models look like current models?", which I think matters more for research directions than what you allow for in the second paragraph.

For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.
