
Comments

Satron

Have there been any proposals for detecting alignment-faking LLMs in the AI control literature?

Satron

Very interesting results, though unfortunately something I was expecting to see as LLMs got better.

Are there any proposed mechanisms for preventing/detecting alignment faking in LLMs?

Satron

"One thing I appreciate about Buck/Ryan's comms around AI control is that they explicitly acknowledge that they believe control will fail for sufficiently intelligent systems."

Does that mean they believe that past a certain capability level we would lose control over AI? I am new to this field, but doesn't that spell doom for humanity?