jurgen123

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by

Newest

Why is o1 so deceptive?

jurgen123

6mo23

You write: "Doesn't the model know (in the sense that if you asked it, it would tell you) that the user wanted real references as opposed to faked references? Here, I'll ask it:"

I think what we can conclude here that the model doesn't know what it knows, like we do. It will say it values one thing, but 'forget' that it values the thing when exercising some other task. When executing a string of tokens, it is not leveraging any 'knowledge' stored in the rest of the neural net. That's what makes their knowledge (and alignment for that matter) so shallow and brittle (and honestly dangerous).

Reply