In some sense, the Agent Foundations program at MIRI sees the problem this way: human values are currently an informal object, and we can only get meaningful guarantees for formal systems. So we need to work on formalizing concepts like human values; only then will we be able to get formal safety guarantees.
Unless I'm misunderstanding you or MIRI, that's not their primary concern at all:
Another way of putting this view is that nearly all of the effort should be going into solving the technical problem, "How would you get an AI system to do some very modest concrete …"
Good citation. Yeah, I should have flagged harder that my description there was a caricature and not what anyone said at any point. I still need to think more about how to revise the post to be less misleading in this respect.
One thing I can say is that the reason that quote flags that particular failure mode is that, according to the MIRI way of thinking about the problem, it is an easy failure mode to fall into.