Ethical injunctions are rules not to do something even when you believe it's the right thing to do. (That is, you refrain "even when your brain has computed it's the right thing to do", but this will just seem like "the right thing to do".)
Linking the previous posts in the sequence to the problem of AI, this post explores ethical injunctions as failsafe mechanisms in a self-modifying AI. A simple example: if an AI in the takeoff phase decides at iteration N that it needs to deceive its programmers about its end goals, then the goals have likely drifted too far during the modification process. An injunction against deceiving the programmers will shut down the AI before it gets any worse. Further, the AI at step N-1 will hopefully have foreseen this itself and built the injunction into its next iteration. For humans, who have many subconscious biases, choosing to impose ethical injunctions on ourselves can serve as a similar failsafe.
This post is not cross listed as a part of the listed main sequences.
Certain opportunities to violate an injunction arise only because the injunction exists: someone planning a murder will confess only if he expects the priest not to testify. Thus the apparent gain from violating an injunction in a single case does not actually exist at a systemic level. If prospective murderers know that priests make exceptions for murderers, they won’t confess to a priest, and the priest will never have the opportunity to make an exception. Injunctions that seem value-destructive in single-instance hypotheticals can be beneficial at a systemic level.
This post is not cross listed as a part of the listed main sequences.
This is a round-up of some of the more interesting and insightful comments to prior posts in the sequence with detailed responses brought to the front.
This post is not cross listed as a part of the listed main sequences.
A speculative evo-psych post reasoning that "ethical instincts" would have been adaptive in a context where people systematically underestimated the risk of getting caught (see the general overconfidence bias) and were punished heavily, via exile from the tribe or outright death.
This post is not cross listed as a part of the listed main sequences.
A more personal, reflective post in which Eliezer looks back and observes that his ethically motivated truthfulness has led to better outcomes than he would have achieved by lying. He proposes several reasons for this, including that honesty makes it harder to sweep problems away, forcing him to deal with them.
This post is not cross listed as a part of the listed main sequences.
Most lies, in order to stand up to rigorous investigation, would require additional lies about supporting facts. Since people do not know all aspects of all disciplines, the web of supporting lies will eventually entail making a claim that is self-evidently false to someone with expert knowledge the liar does not possess. Only a god could lie to an AI.
Part of the Against Rationalization subsequence of How To Actually Change Your Mind
"The end does not justify the means" is just consequentialist reasoning at one meta-level up. If a human starts thinking on the object level that the end justifies the means, this has awful consequences given our untrustworthy brains; therefore a human shouldn't think this way. But it is all still ultimately consequentialism. It's just reflective consequentialism, for beings who know that their moment-by-moment decisions are made by untrusted hardware.
This post is not cross listed as a part of the listed main sequences.
"Power corrupts" is well-known folk wisdom. This post gives an evo-psych explanation. Corrupt behavior provides a fitness advantage, but signaling corruption makes it hard to get power. The cleanest way to avoid signaling corruption is to honestly believe that one will not be corrupt. Thus the fittest strategy is to couple an honest desire to do good with a tendency to find the common abuses of power pleasurable.
This post is not cross listed as a part of the listed main sequences.
Ethical injunctions are rules not to do something even when you believe it's the right thing to do. (That is, you refrain "even when your brain has computed it's the right thing to do", but this will just seem like "the right thing to do".)
For example, you shouldn't rob banks even if you plan to give the money to a good cause.
This is to protect you from your own cleverness (especially from taking bad black-swan bets) and from the corrupted hardware you're running on.
Related to the Metaethics sequence.
Related Pages