Metaethics here is understood as an ideal procedure that humans are approximating when they reason about ethics, i.e., when they are trying to build ethical theories.
That would make sense except that "metaethics" already has a different meaning in academic philosophy, namely studying what morality itself is. (See my Six Plausible Meta-Ethical Alternatives for a really quick intro to the main metaethical positions that I think are plausible.)
What you're calling "metaethics" here corresponds better to what philosophers call metaphilosophy. I've been pushing the importance of researching metaphilosophy in the context of AI alignment for a while, so it's nice to see someone reach similar conclusions independently. :) If you're interested in my thoughts on the topic, see Some Thoughts on Metaphilosophy and the posts that it links to.
Another line of thinking that's related is CEV (Coherent Extrapolated Volition).
(I'll probably come back and give some more detailed feedback on the rest of the content, but just wanted to fire off these quick notes for now.)
Thanks for all the useful links! I'm also always happy to receive more feedback.
I agree that the sense in which I use metaethics in this post is different from what academic philosophers usually call metaethics. My impression is that metaethics, in the academic sense, and metaphilosophy are somehow related. Studying what morality itself is, how to select ethical theories, and what process underlies ethical reasoning do not seem to be independent questions. For example, if moral nihilism is more plausible, then it seems less likely that there is some meaningful feedback loop for selecting ethical theories, or that there is any such thing as a 'good' ethical theory (at least in an observer-independent way). If emotivism is more plausible, then reflecting on ethics may be more like rationalising emotions, e.g. expressing in a sophisticated way something that fundamentally just means 'boo suffering'. In that case, a better understanding of metaethics in the academic sense seems to shed some light on the process that generates ethical theories, at least in humans.
Epistemic status: MSFP blog post day. General and very speculative ideas.
Proposition: deconfusing metaethics might be a promising way to increase our chances of solving AI alignment.
What do I mean by metaethics?
Metaethics here is understood as an ideal procedure that humans are approximating when they reason about ethics, i.e., when they are trying to build ethical theories. Mathematics offers an analogy. Part of mathematical practice involves using some logic to prove or disprove conjectures about mathematical objects, and deriving theorems, lemmas and properties from axioms within that logic is, roughly, part of how mathematics progresses. Another analogy is how we learn about regularities in the world by approximating Solomonoff induction. By contrast, we seem to lack a formalised, ideal rational procedure for ethical progress that would help us evaluate existing ethical theories and generate new ones. Such a procedure seems difficult to figure out and potentially crucial for solving AI alignment.
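Solomonoff induction is uncomputable, so any real learner only approximates it. As a toy illustration of what "approximating an ideal procedure" can mean, the sketch below (an illustrative assumption, not a method proposed in this post) replaces the space of all programs with repeating binary patterns up to a fixed length, gives shorter patterns exponentially more prior weight, and predicts the next bit by mixing over the patterns that reproduce what has been observed so far. The function names `hypotheses`, `generates` and `predict_next`, and the 2^(-2·length) prior, are hypothetical choices made only for this example.

```python
from itertools import product


def hypotheses(max_len):
    """Yield every repeating binary pattern up to max_len bits with its prior weight."""
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            # Crude stand-in for the 2^-(program length) simplicity prior.
            # Using 2^(-2 * length) keeps the total prior mass finite,
            # since there are 2^length patterns of each length.
            yield "".join(bits), 2.0 ** (-2 * length)


def generates(pattern, n):
    """Return the first n bits produced by repeating the pattern forever."""
    return (pattern * (n // len(pattern) + 1))[:n]


def predict_next(observed, max_len=8):
    """Posterior probability that the next bit is '1', mixing over all
    hypotheses (up to max_len) that reproduce the observed prefix."""
    n = len(observed)
    total = 0.0
    weight_one = 0.0
    for pattern, prior in hypotheses(max_len):
        full = generates(pattern, n + 1)
        if full[:n] == observed:  # keep only hypotheses consistent with the data
            total += prior
            if full[n] == "1":
                weight_one += prior
    if total == 0.0:
        raise ValueError("no pattern up to max_len explains the observed bits")
    return weight_one / total
```

For example, after observing `"010101"` the mixture puts almost all of its weight on the short pattern `"01"` and assigns well under a 1% probability to the next bit being `1`. In the post's terms, inductive reasoning at least has an idealised target like this to approximate, whereas ethical reasoning seems to lack one.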
Why could this be important?
A better understanding of metaethics could help us decide among different ethical theories and figure out how to generate new ones. Furthermore, knowing what the world should become and how AI should interact with it might require us to make progress on how we should think about ethics, which could in turn clarify how we should think about aligning AI. For example, aligning AI with human values, learning and aggregating human preferences in some way, and avoiding X-risks are all ethical propositions about what we should do. It is plausible that these views are flawed, and a better understanding of how to think about ethics might lead us to reconsider these normative stances and clarify what alignment means.
The following intuition is one of the main reasons why I think a better understanding of metaethics might be important to AI alignment research. As I think more about ethics, argue with others about it and become better informed about the world, my ethical views evolve, and I seem to be making some sort of progress by sharpening my reasons for holding some ethical view or for finding some ethical theory flawed. I therefore tend to value my future self's moral views more, to the extent that he has spent more time thinking about ethics and is better informed about the world, and so I trust him more to decide how the world should be transformed. Similarly, it might be sensible for future AI systems to be able to instantiate a comparable process of moral progress and to update their utility functions or goals according to its results. Such a process, if transparent and open to consultation by humans, could figure out how to transform the world through long and efficient ethical reflection.
Some examples
To clarify, the following non-exhaustive criteria are examples of how we might evaluate ethical theories, and of constraints under which we could generate new ones.
Possible objections
This approach to AI alignment might be too top-down in its current formulation, and it raises a number of difficult challenges or objections to its being a research path worth pursuing:
Nevertheless, such a project might have the advantage of informing us about values and about how to think about alignment without speeding up AI capabilities research. One important downside, though, is that there might be more promising projects to pursue instead.
Conclusion
To conclude, I would like to suggest some possible ways of working toward a better understanding of metaethics and producing better ethical theories. These are extremely broad and vague suggestions, meant to stimulate research ideas.