The research field of Fundamental Controllability Limits aims to verify the following, by checking both the empirical soundness of the premises and the validity of the formal reasoning:
1. Theoretical limits to controlling any AGI using any method of causation.
2. Threat models of AGI convergent dynamics that are impossible to control (by 1.).
3. Impossibility theorems, by contradiction of 'long-term AGI safety' with the convergence result (2.).
~ ~ ~
Definitions and Distinctions
'AGI convergent dynamic that is impossible to control':
Iterated interactions of AGI internals (with the connected surroundings of the environment) that converge on (unsafe) conditions, where the space of interactions falls outside at least one theoretical limit of control.
'Control':
- In theory, the control of system A over system B means that A can influence system B to achieve A’s desired subset of state space [Source: https://dl.acm.org/doi/10.1145/3603371].
- In practice, to engineer control of AGI requires tracking (detecting, modelling, simulating, comparing against references) effects internally to then correct for those effects externally.
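As an illustration only, the track-then-correct loop described above can be sketched as a toy scalar feedback controller. Everything here is an assumption made for the sketch: the names `Controller`, `run`, `gain`, `drift` and `tolerance` are hypothetical, the "desired subset of state space" is reduced to an interval around a reference value, and the modelling and simulating steps are collapsed into a single noiseless measurement. It is not a model of AGI.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """System A: tries to keep system B's state inside a desired interval."""
    reference: float   # centre of the desired subset of state space (assumed)
    tolerance: float   # half-width of that desired subset (assumed)
    gain: float = 0.5  # strength of the proportional correction (assumed)

    def detect(self, state: float) -> float:
        # Tracking step: read B's current state (assumed noiseless).
        return state

    def compare(self, measured: float) -> float:
        # Compare the measurement against the reference.
        return self.reference - measured

    def correct(self, error: float) -> float:
        # External correction, proportional to the detected error.
        return self.gain * error

    def in_desired_set(self, state: float) -> bool:
        return abs(state - self.reference) <= self.tolerance


def run(controller: Controller, state: float, drift: float, steps: int) -> float:
    """Alternate B's own drift with A's correction for a number of steps."""
    for _ in range(steps):
        state += drift  # B's own dynamics push the state away
        error = controller.compare(controller.detect(state))
        state += controller.correct(error)  # A corrects for the tracked effect
    return state


controller = Controller(reference=0.0, tolerance=1.0)
weak_drift_final = run(controller, state=5.0, drift=0.1, steps=50)
strong_drift_final = run(controller, state=5.0, drift=5.0, steps=50)
```

In this toy setting the state settles inside the desired interval when the drift is small, and settles outside it when the drift outpaces what the fixed-gain correction can absorb. The sketch only shows what "control" means operationally; it says nothing about which limits apply to AGI.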
'Long term':
- In theory: into perpetuity.
- In practice: over a thousand years.
'AGI safety':
The ambient conditions/contexts around planet Earth, as changed by the operation of AGI, fall within the environmental range that humans need to survive (a minimum-threshold definition of safety).
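A minimum-threshold definition of this kind can be read as a range check over tracked conditions. The sketch below is a hypothetical illustration: the function name `is_safe`, the condition names, and the numeric ranges are placeholders chosen for the example, not physiological or environmental claims.

```python
from typing import Mapping, Tuple

Range = Tuple[float, float]


def is_safe(conditions: Mapping[str, float], survivable: Mapping[str, Range]) -> bool:
    """Return True only if every required condition lies inside its survivable range."""
    for name, (low, high) in survivable.items():
        if name not in conditions:
            # A condition that is not tracked cannot be certified as safe.
            return False
        if not (low <= conditions[name] <= high):
            return False
    return True


# Placeholder ranges, for illustration only (not real survival thresholds).
example_ranges = {
    "temperature_c": (0.0, 40.0),
    "oxygen_fraction": (0.18, 0.25),
}

within = is_safe({"temperature_c": 20.0, "oxygen_fraction": 0.21}, example_ranges)
too_hot = is_safe({"temperature_c": 60.0, "oxygen_fraction": 0.21}, example_ranges)
untracked = is_safe({"temperature_c": 20.0}, example_ranges)
```

The check is conjunctive: one condition outside its range, or one condition left unmeasured, is enough for the overall result to be unsafe. That matches the minimum-threshold reading, where safety means staying inside every required range at once.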
'AGI':
That the notion of 'artificial intelligence' (AI) can be either "narrow" or "general":
That the notion of 'narrow AI' specifically implies:
- a single domain of sense and action.
- no possibility for self base-code modification.
- a single well-defined meta-algorithm.
- that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.
That the notion of 'general AI' specifically implies:
- multiple domains of sense/action;
- intrinsic non-reducible possibility for self-modification;
- and therefore that the meta-algorithm is effectively arbitrary; hence
- that it is inherently undecidable as to whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.
[Source: https://mflb.com/ai_alignment_1/si_safety_qanda_out.html#p3]