AIXI is a valuable tool in theoretically considering the nature of super-intelligence, yet it has its limitations. From one perspective, its lack of a self-model is a mere detail necessarily left out of a formalized abstraction. Nonetheless, for researchers of a future artificial general intelligence, a correct understanding of self-analysis and self-modification is essential.
First, any Friendly AI must strive to avoid changes in its own goal system, and self-modeling may be valuable for this. The AI must therefore be based on a decision theory that includes reflection, and today's decision theories mostly lack an understanding of reflectivity.
Second, because human values are not well-understood or formalized, the FAI may need to refine its own goal of maximizing human values. "Refining" one's own goal without changing the goal's essentials is another demanding problem in reflective decision theory.
Third, an artificial general intelligence will likely try to enhance its own intelligence to better achieve its goals. It may do so by altering its own implementation, or by creating a new generation of AI. It may even do so without regard for the destruction of the current implementation, so long as the new system can better achieve the goals. All these forms of self-modification again raise central questions about the self-model of the AI, which, as mentioned, is not a part of AIXI.
Though AIXI is an abstraction, any real AI would have a physical embodiment that could be damaged, and an implementation which could be changed or could change its behavior due to bugs. The AIXI formalism completely ignores these possibilities (Yampolskiy & Fox, 2012).
Eliezer Yudkowsky has pointed out that "Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens..., because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations" (Qualitatively Confused, LessWrong, 15 March 2008).
AIXI, the theoretical formalism for the most intelligent possible agent, does not model itself. It is simply a calculation of the best possible action, extrapolating into the future. This calculation at each step chooses the best action, evaluated by recursively calculating the next step, and so on to the time horizon.
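For reference, Hutter's action-selection rule can be written as a nested expectimax over actions and percepts (one standard presentation; notation varies between sources):

$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \; \max_{a_{k+1}} \sum_{o_{k+1} r_{k+1}} \cdots \; \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

Here $m$ is the horizon, the $o_i$ and $r_i$ are observations and rewards, $q$ ranges over programs for a universal Turing machine $U$, and $\ell(q)$ is the length of $q$. The nested $\max$ operators are the point at issue: every future action is assumed to be chosen by the same maximization, with no term for the possibility that the machinery doing the maximizing has been altered or destroyed.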
AIXI is very simple math. AIXI does not consider its own structure in figuring out what actions it will take in the future. Implicit in its definition is the assumption that it will continue, up until its horizon, to choose actions that maximize expected future value. AIXI's definition assumes that the maximizing action will always be chosen, even in futures where the agent's implementation has predictably been destroyed or changed. This is not accurate for real-world implementations, which may malfunction, self-modify, be destroyed, be changed, etc.
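As a loose illustration of that implicit assumption (not AIXI itself, which mixes over all computable environments with a Solomonoff prior and is incomputable), here is a minimal expectimax planner sketch; the names `env_model`, `expected_value`, and `choose_action` are hypothetical and exist only for this example.

```python
# Toy expectimax planner in the spirit of AIXI's definition -- an illustrative
# sketch only. Real AIXI averages over all computable environments with a
# Solomonoff prior and is incomputable; here `env_model` is just some given
# probabilistic model of the environment.

from typing import Callable, Dict, List, Tuple

Action = str
Percept = Tuple[str, float]        # (observation, reward)
History = Tuple                    # sequence of (action, observation, reward) triples
EnvModel = Callable[[History, Action], Dict[Percept, float]]

def expected_value(history: History, env_model: EnvModel,
                   actions: List[Action], horizon: int) -> float:
    """Expected sum of future rewards, assuming every future action maximizes."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in actions:              # max over the current action
        total = 0.0
        for (obs, reward), p in env_model(history, a).items():
            # Recurse: the future agent is assumed to keep maximizing, no matter
            # what the percept implies about the agent's own hardware. There is
            # no branch for "this computation has been destroyed or changed".
            total += p * (reward + expected_value(history + ((a, obs, reward),),
                                                  env_model, actions, horizon - 1))
        best = max(best, total)
    return best

def choose_action(history: History, env_model: EnvModel,
                  actions: List[Action], horizon: int) -> Action:
    """Pick the action whose expectimax value over the horizon is largest."""
    def value(a: Action) -> float:
        return sum(p * (r + expected_value(history + ((a, o, r),),
                                           env_model, actions, horizon - 1))
                   for (o, r), p in env_model(history, a).items())
    return max(actions, key=value)
```

Nothing in the recursion can represent "after this percept, this computation is no longer running"; the predicted future value always assumes the maximizing action will go on being taken.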
This is called the Anvil problem: AIXI would not care if an anvil was about to drop on its head (Yampolskiy & Fox, 2012).
It has been argued that a careful (re)definition of AIXI's off-policy behavior may patch the anvil problem in practice.