Fundamental Controllability Limits

last updated 28th Dec 2024

The research field Fundamental Controllability Limits aims to verify both the empirical soundness of the premises and the validity of the formal reasoning behind:

1. Theoretical limits to controlling any AGI using any method of causation.

2. Threat models of AGI convergent dynamics that are impossible to control (by 1.).

3. Impossibility theorems, by contradiction of 'long-term AGI safety' with the convergence result (2.).

~ ~ ~

Definitions and Distinctions

'AGI convergent dynamic that is impossible to control':

Iterated interactions between AGI internals and their connected surrounding environment that converge on unsafe conditions, where the space of interactions falls outside at least one theoretical limit of control.

'Control':

In theory, the control of system A over system B means that A can influence system B to achieve A’s desired subset of state space [Source: https://dl.acm.org/doi/10.1145/3603371].

In practice, engineering control of AGI requires tracking effects internally (detecting, modelling, simulating, and comparing them against references) in order to then correct for those effects externally.
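The track-then-correct loop described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not part of the cited source: the names `ToySystem`, `control_step` and `run` are hypothetical, system B is reduced to a single number that drifts each step, and the controller is assumed to observe the state exactly and to know the drift exactly. "Control" here means only that the state ends up inside the controller's desired subset of state space, a tolerance band around a reference value.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical system B: one scalar state that drifts every step."""
    state: float
    drift: float

    def step(self, correction: float) -> None:
        # The state changes by its own drift plus the external correction.
        self.state += self.drift + correction


def control_step(system: ToySystem, reference: float, gain: float) -> float:
    """One pass of detect, model/simulate, compare, correct (toy assumptions)."""
    detected = system.state                # detect: read the current state
    predicted = detected + system.drift    # model/simulate: one step ahead, drift assumed known
    error = predicted - reference          # compare against the reference
    correction = -gain * error             # correct: push against the predicted error
    system.step(correction)
    return correction


def run(steps: int = 50, reference: float = 0.0,
        tolerance: float = 1.0, gain: float = 0.5) -> bool:
    """Return True if the final state lies inside the desired subset of state space."""
    system = ToySystem(state=5.0, drift=0.3)
    for _ in range(steps):
        control_step(system, reference, gain)
    return abs(system.state - reference) <= tolerance


if __name__ == "__main__":
    print("with correction:   ", run(gain=0.5))
    print("without correction:", run(gain=0.0))
```

With a nonzero gain the state settles near the reference, and with the gain set to zero it drifts out of the tolerance band. The sketch says nothing about whether this kind of tracking and correction is achievable for AGI; it only makes the vocabulary of the definition concrete.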

'Long term':

In theory: into perpetuity.

In practice: over a thousand years.

'AGI safety':

Ambient conditions/contexts around planet Earth changed by the operation of AGI fall within the environmental range that humans need to survive (a minimum-threshold definition of safety).

'AGI':

The notion of 'artificial intelligence' (AI) can be either 'narrow' or 'general'.

The notion of 'narrow AI' specifically implies:

a single domain of sense and action;

no possibility of modifying its own base code;

a single well-defined meta-algorithm;

that all aspects of its agency/intention are fully defined by its builders/developers/creators.

The notion of 'general AI' specifically implies:

multiple domains of sense/action;

an intrinsic, non-reducible possibility of self-modification;

that the meta-algorithm is therefore effectively arbitrary;

and hence that it is inherently undecidable whether all aspects of its agency/intention are fully defined by only its builders/developers/creators.

[Source: https://mflb.com/ai_alignment_1/si_safety_qanda_out.html#p3]

Posts tagged Fundamental Controllability Limits

Projects I would like to see (possibly at AI Safety Camp), by Linda Linsefors