David James

My top interest is AI safety, followed by reinforcement learning. My professional background is in software engineering, computer science, and machine learning. I have degrees in electrical engineering, liberal arts, and public policy. I currently live in the Washington, DC metro area; before that, I lived in Berkeley for about five years.

Comments

and I don't really see how to do that without directly engaging with the knowledge of the failure modes there.

I agree. To put it another way, even if all training data were scrubbed of all flavors of deception, how could ignorance of it be durable?