Consequentialist reasoning selects policies on the basis of their predicted consequences: it does action A because A is forecasted to lead to preferred outcome O. Whenever we reason that an agent which prefers outcome X over outcome Y will therefore do action A instead of action B, we're implicitly assuming that the agent has the cognitive ability to do consequentialism, at least about Xs and Ys. It does means-end reasoning; it selects means on the basis of their predicted ends, plus a preference over ends.
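The core selection rule can be sketched in a few lines of Python. This is a minimal sketch, not anyone's actual architecture; `forecast` and `utility` are hypothetical stand-ins for the agent's world-model and its preference over outcomes:

```python
# Minimal sketch of consequentialist action selection. `forecast` and
# `utility` are hypothetical stand-ins for the agent's world-model and
# its preference over outcomes.

def choose_action(actions, forecast, utility):
    """Pick the action whose forecasted outcome is most preferred."""
    return max(actions, key=lambda a: utility(forecast(a)))

# Toy example: the agent prefers outcomes near 10; each action's
# forecasted consequence is twice the action's value.
forecast = lambda a: a * 2
utility = lambda outcome: -abs(outcome - 10)

best = choose_action([1, 3, 5, 7], forecast, utility)
# forecasts are 2, 6, 10, 14, so the agent does 5 because 5 is
# forecasted to lead to the preferred outcome 10
```

The whole idiom lives in that one `max` over predicted ends: swap in any world-model and any preference, and the same rule applies.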
E.g.: When we infer that a paperclip maximizer would try to improve its own cognitive abilities given means to do so, the background assumptions include that the paperclip maximizer can forecast the consequences of self-improvement versus not self-improving, and that it prefers the forecasted future that contains more paperclips.
(Technically, since the forecasts of our actions' consequences will usually be uncertain, a coherent agent needs a utility function over outcomes and not just a preference ordering over outcomes.)
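The criterion in that parenthetical can be written out explicitly: under uncertain forecasts, a coherent agent picks the action that maximizes probability-weighted utility, which requires cardinal utilities over outcomes and not just an ordering of them:

```latex
a^{*} \;=\; \operatorname*{arg\,max}_{a} \; \sum_{o} P(o \mid a)\, U(o)
```

Here $P(o \mid a)$ is the agent's forecast of outcome $o$ given action $a$, and $U(o)$ is its utility for that outcome.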
The related idea of "backward chaining" is one particular way of approaching the cognitive problems of consequentialism: start from a desired outcome/event/future, figure out which intermediate events are likely to have the consequence of bringing about that event/outcome, and repeat the question until the chain arrives back at a particular plan/policy/action.
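The loop just described can be sketched directly. In this toy version, `causes` is a hypothetical map from each event to the intermediate event that brings it about, and `actions` is the set of things the agent can directly do:

```python
# Minimal backward-chaining sketch. `causes` maps each event to the
# intermediate event that brings it about; `actions` is the set of
# directly-performable actions. Both are illustrative placeholders.

def backward_chain(goal, causes, actions):
    """Work backward from a desired outcome until reaching a doable
    action, then return the plan in forward execution order."""
    chain = [goal]
    while chain[-1] not in actions:
        chain.append(causes[chain[-1]])
    return list(reversed(chain))

causes = {"event C": "event B", "event B": "event A"}
plan = backward_chain("event C", causes, actions={"event A"})
# plan == ["event A", "event B", "event C"]
```

Real planners search over many possible causes per event rather than following a single chain, but the shape is the same: the desired future is the input, and the present action is the output.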
Many narrow AI algorithms are consequentialists over narrow domains. A chess program that searches far ahead in the game tree is a consequentialist; it outputs chess moves based on the expected results of those moves and the opponent's replies to them, into the distant future of the board.
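The game-tree idiom can be sketched with a toy minimax: nested lists stand in for positions, numbers for evaluations of final boards. This is the shape of the reasoning, not real chess:

```python
# Toy minimax sketch of game-tree consequentialism. Inner lists are
# positions with possible continuations; leaf numbers are evaluations
# of end positions from the program's point of view.

def minimax(node, maximizing):
    """Value of a position, assuming the opponent picks replies that
    minimize our outcome while we pick moves that maximize it."""
    if isinstance(node, (int, float)):  # leaf: an evaluated end position
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Three candidate moves, each met by two possible replies.
tree = [[3, 12], [2, 8], [1, 14]]
best_value = minimax(tree, maximizing=True)
# the program "does" the first move because its worst-case future (3)
# beats the worst-case futures of the other moves (2 and 1)
```

The output move is selected purely on the basis of the board's forecasted distant future, which is what makes the program a consequentialist about its narrow domain.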
We can see one of the critical aspects of human intelligence as cross-domain consequentialism. Rather than only forecasting consequences within the boundaries of a narrow domain, we can trace chains of events that leap from one domain to another. Making a chess move wins a chess game that wins a chess tournament that wins prize money that can be used to rent a car that can drive to the supermarket to get milk. An Artificial General Intelligence that could learn many domains, and engage in consequentialist reasoning that leaped across those domains, would be a sufficiently advanced agent to be interesting from most perspectives on interestingness. It would start to be a consequentialist about the real world.
Some systems are pseudoconsequentialist: in some ways they behave as if selecting actions on the basis of those actions leading to particular futures, without using an explicit cognitive model and explicit forecasts.
For example, natural selection has a lot of the power of a cross-domain consequentialist; it can design whole organisms around the consequence of reproduction (or rather, inclusive genetic fitness). It's a fair approximation to say that spiders weave webs because the webs will catch prey that the spider can eat. Natural selection doesn't actually have a mind or an explicit model of the world; but millions of years of selecting DNA strands that did in fact previously construct an organism that reproduced, gives an effect sort of like outputting an organism design on the basis of its future consequences. (Although if the environment changes, the difference suddenly becomes clear: natural selection doesn't immediately catch on when humans start using birth control. Our DNA goes on having been selected on the basis of the old future of the ancestral environment, not the new future of the actual world.)
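That selection dynamic can be caricatured in a few lines; genomes here are single numbers and the fitness rule is purely illustrative. Nothing in the loop forecasts anything, yet the population ends up "designed around" reproductive success, so long as the environment (the fitness function) stays constant:

```python
# Toy sketch of selection-as-pseudoconsequentialism. No genome models
# the future; designs that reproduced more in the PAST simply become
# more common, which mimics optimizing for future reproduction as long
# as the environment does not change. All numbers are illustrative.

def select(population, fitness, generations=3):
    """Truncation selection: each generation, the designs that
    reproduced best contribute two copies to the next generation."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        fitter_half = scored[:len(scored) // 2]
        population = fitter_half + fitter_half
    return population

genomes = [0.1, 0.5, 0.9] * 2
evolved = select(genomes, fitness=lambda g: g)
# the population fixates on the fittest design, as if each genome had
# been chosen for its consequences
```

The birth-control point corresponds to changing `fitness` after the loop has run: the evolved population goes on having been selected under the old rule, not the new one.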
Similarly, a reinforcement-learning system learning to play Pong might not actually have an explicit model of "What happens if I move the paddle here?" - it might just be re-executing policies that had the consequence of winning last time. But there's still a future-to-present connection, a pseudo-backwards-causation, based on the Pong environment remaining fairly constant over time, so that we can sort of regard the Pong player's moves as happening because it will win the Pong game.
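That model-free, repeat-what-won dynamic can be sketched in a toy form, with a hypothetical `environment` function standing in for a fixed Pong-like world:

```python
# Toy sketch of pseudoconsequentialist reinforcement: the learner keeps
# no model of "what happens if I move the paddle here"; it just
# re-executes whatever action won last time. `environment` is a
# hypothetical stand-in for a fixed Pong-like world.

def environment(action):
    return action == "up"  # in this constant world, "up" always wins

def learn(trials=10):
    policy, untried = None, ["up", "down"]
    for _ in range(trials):
        action = policy if policy else untried.pop()  # explore until a win
        if environment(action):
            policy = action  # reinforce: keep doing what won last time
    return policy

# The future-to-present connection holds only because the environment
# stays constant; if `environment` changed after training, the policy
# would go on "having won" in the old world, like DNA after birth control.
```

The learner ends up acting as if it chose "up" because "up" will win, when really "up" is repeated because it did win.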
Consequentialism is an extremely basic idiom of optimization:
Anything that Aristotle would have considered as having a "final cause", or teleological explanation, without being entirely wrong about that, is something we can see through the lens of cognitive consequentialism or pseudoconsequentialism. A plan, a design, a reinforced behavior, or selected genes: Most of the complex order on Earth derives from one or more of these.
Consequentialism or pseudoconsequentialism, over various domains, is an advanced agent property that is a key requisite or key threshold in several issues of AI alignment and advanced safety:
Above all: The human ability to think of a future and plan ways to get there, or to think of a desired result and engineer technologies to achieve it, is the source of humans having enough cognitive capability to be dangerous. Most of the magnitude of the impact of an AI, the impact such that we'd want to align it in the first place, would come in a certain sense from that AI being a sufficiently good consequentialist, or from its solving the same cognitive problems that consequentialists solve.
Since consequentialism seems tied up in so many issues, some of the proposals for making alignment easier have in some way tried to retreat from, limit, or subvert consequentialism.
But since consequentialism is so close to the heart of why an AI would be sufficiently useful in the first place, getting rid of it tends not to be that straightforward.
Since 'consequentialism', 'linking up actions to consequences', or 'figuring out how to get to a consequence' is so close to what would make advanced AIs useful in the first place, it shouldn't be surprising if some attempts to subvert consequentialism in the name of safety run squarely into an unresolvable safety-usefulness tradeoff.
Another concern is that consequentialism may to some extent be a convergent or default outcome of optimizing anything hard enough. E.g., although natural selection is a pseudoconsequentialist process, it optimized for reproductive capacity so hard that it eventually spit out some powerful organisms that were explicit cognitive consequentialists (aka humans).
We might similarly worry that optimizing any internal aspect of a machine intelligence hard enough would start to embed consequentialism somewhere - policies/designs/answers selected from a sufficiently general space that "do consequentialist reasoning" is embedded in some of the most effective answers.
Or perhaps a machine intelligence might need to be consequentialist in some internal aspects in order to be smart enough to do sufficiently useful things - maybe you just can't get a sufficiently advanced machine intelligence, sufficiently early, unless it is, e.g., choosing on a consequentialist basis what thoughts to think about, or engaging in consequentialist engineering of its internal elements.
In the same way that expected utility is the only coherent way of making certain choices, or in the same way that natural selection optimizing hard enough on reproduction started spitting out explicit cognitive consequentialists, we might worry that consequentialism is in some sense central enough that it will be hard to subvert - hard enough that we can't easily get rid of instrumental convergence on problematic strategies just by getting rid of the consequentialism while preserving the AI's usefulness.
This doesn't mean that the research avenue of subverting consequentialism is automatically doomed to be fruitless. It does suggest that this is a deeper, more difficult, and stranger challenge than, "Oh, well then, just build an AI with all the consequentialist aspects taken out."