| Comment Author | Post | Deleted By User | Deleted Date | Deleted Public | Reason |
|---|---|---|---|---|---|
| | TurnTrout's shortform feed | TurnTrout | | false | |
| | AI Safety Camp 10 | Raemon | | true | multiple people have told me this person asked to not have their email posted publicly |
| | Base LLMs refuse too | Arthur Conmy | | false | Think I am wrong |
| | Backdoors as an analogy for deceptive alignment | Raemon | | false | |
| | Defining alignment research | DanielFilan | | true | |
| | TurnTrout's shortform feed | habryka | | false | |
| | Understanding and controlling a maze-solving policy network | Linda Linsefors | | true | |
| | Understanding and controlling a maze-solving policy network | Linda Linsefors | | true | |
| | Sycophancy to subterfuge: Investigating reward tampering in large language models | ryan_greenblatt | | true | moved to make it take up less space. |
| | Fabien's Shortform | ryan_greenblatt | | true | I somehow missed "We had the idea a few times to try out a detection-based approach but we didn't get around to it." |

| Author | Post | Banned Users |
|---|---|---|
| | Asymptotically Unambitious AGI | |

| ID | Banned From Frontpage | Banned from Personal Posts |
|---|---|---|

| User | Ended at | Type |
|---|---|---|
| | | allPosts |
| | | allComments |
| | | allPosts |
| | | allComments |