Comments

I have a pretty fundamental concern with these sorts of techniques as a mechanism for eventually assessing alignment

that would lead to a safety or alignment Goodharting problem.