Generalised models as a category

Stuart_Armstrong

Naming the "generalised" models

In this post, I'll apply some mathematical rigour to my ideas of model splintering, and see what they are as a category^[1].

And the first question is... what to call them? I can't keep referring to them as 'the models I use in model splintering'. After a bit of reflection, I decided to call them 'generalised models'. Though that's a bit vague, it describes well what they are, and what I hope to use them for: a formalism to cover all sorts of models.

The generalised models

A generalised model $M$ is given by three objects:

$M = (F, E, Q).$

Here $F$ is a set of features. Each feature $f$ consists of a name or label, and a set in which the feature takes values. For example, we might have the feature "room empty?" with values "true" and "false", or the feature "room temperature?" with values in $\mathbb{R}^{+}$, the positive reals.

We allow these features to sometimes take no values at all (such as the above two features if the room doesn't exist) or multiple values (such as "potential running speed of person $X$", which includes the maximal speed and any speed below it).

Define $\overline{f}$ as the set component of the feature $f$, and $\overline{F}$ as the disjoint union of all the sets of the different features, ie $\overline{F} = \bigsqcup_{f \in F} \overline{f}$.

A world, in the most general sense, is defined by all the values that the different features could take (including situations where features take multiple values or none at all). So the set of worlds, $W$, is the set of functions from $\overline{F}$ to $\{0, 1\}$, with $1$ representing the fact that that feature takes that value, and $0$ the opposite. Hence $W = 2^{\overline{F}}$, the power set of $\overline{F}$.
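The following minimal Python sketch makes $\overline{F}$ and $W$ concrete on a toy, finite set of features. The feature names echo the examples above. Several choices are illustrative assumptions and do not come from the post:

- the finite value set `{"cold", "warm"}` standing in for $\mathbb{R}^{+}$;
- the helper names `disjoint_union` and `worlds`;
- tagging each value with its feature name, which is just one way to realise the disjoint union.

```python
from itertools import chain, combinations

# Hypothetical toy features: feature name -> set of values it can take.
# A coarse {"cold", "warm"} stands in for the positive reals, to keep everything finite.
FEATURES = {
    "room empty?": {True, False},
    "room temperature?": {"cold", "warm"},
}


def disjoint_union(features):
    """F-bar: tag every value with its feature name, so equal values of different features stay distinct."""
    return {(name, value) for name, values in features.items() for value in values}


def worlds(fbar):
    """W = 2^(F-bar): every subset of F-bar is a world.

    A subset may contain zero, one or several values of the same feature,
    matching the allowance for features with no value or multiple values.
    """
    items = sorted(fbar, key=repr)
    all_subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(s) for s in all_subsets]


FBAR = disjoint_union(FEATURES)
W = worlds(FBAR)
print(len(FBAR), len(W))  # 4 16

# A world where the room doesn't exist: neither feature takes any value.
no_room = frozenset()
# A world where "room empty?" takes both values at once (allowed in W, if odd).
both = frozenset({("room empty?", True), ("room empty?", False)})
print(no_room in W, both in W)  # True True
```

The set $E$ of environments would then be some chosen sublist of `W`.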

The set of environments is a specific subset of these worlds: $E \subset W$ . The choice of $E$ is actually more important than that of $W$ , as that establishes which values of the features we are modelling.

The $Q$ is a partial probability distribution. In general, we won't worry about whether $Q$ is normalised (ie whether $Q(E) = 1$) or not; we'll even allow $Q$s with $Q(E) > 1$. So $Q$ could more properly be defined as a partial weight distribution. As long as we only consider terms like $Q(A \mid B)$, the normalisation doesn't matter.
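A quick numerical check shows why normalisation doesn't matter for conditional terms. In this sketch, $Q$ is a hypothetical dictionary of non-negative weights on a finite set of environments. Partiality is not modelled, except that $Q(A \mid B)$ is returned as `None` when $Q(B) = 0$. Scaling every weight by the same factor leaves $Q(A \mid B)$ unchanged.

```python
from fractions import Fraction


def weight(Q, A):
    """Total weight Q(A) of a set A of environments (Q maps env -> weight)."""
    return sum((Q[e] for e in A), Fraction(0))


def conditional(Q, A, B):
    """Q(A | B) = Q(A ∩ B) / Q(B); None (undefined) when Q(B) == 0."""
    qb = weight(Q, B)
    if qb == 0:
        return None
    return weight(Q, set(A) & set(B)) / qb


# Hypothetical, unnormalised weights: total weight 6.
Q = {"e1": Fraction(1), "e2": Fraction(2), "e3": Fraction(3)}
# The same weights scaled by 10: total weight 60.
Q_scaled = {e: 10 * w for e, w in Q.items()}

A, B = {"e1", "e3"}, {"e2", "e3"}
print(conditional(Q, A, B), conditional(Q_scaled, A, B))  # 3/5 3/5
```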

Morphisms: relations

For simplicity, assume there are finitely many features taking values in finite sets, making all sets in the generalised model finite.

If $M_{0} = (F_{0}, E_{0}, Q_{0})$ and $M_{1} = (F_{1}, E_{1}, Q_{1})$ are generalised models, then we want to use binary relations between $E_{0}$ and $E_{1}$ as morphisms between the generalised models.

Let $r$ be a relation between $E_{0}$ and $E_{1}$, written as $e_{0} \sim_{r} e_{1}$. Then it defines a map $r : 2^{E_{0}} \to 2^{E_{1}}$ from subsets of $E_{0}$ to subsets of $E_{1}$: for $A_{0} \subset E_{0}$, we have $e_{1} \in r(A_{0})$ iff there exists an $e_{0} \in A_{0}$ with $e_{0} \sim_{r} e_{1}$. The map $r^{-1} : 2^{E_{1}} \to 2^{E_{0}}$ is defined similarly^[2], seeing $r^{-1}$ as the inverse relation: $e_{0} \sim_{r} e_{1}$ iff $e_{1} \sim_{r^{-1}} e_{0}$.
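On finite sets, the two induced maps are easy to compute. This sketch makes the following illustrative assumptions:

- a relation is represented as a Python set of pairs `(e0, e1)`;
- the names `image`, `inverse` and `preimage` are hypothetical.

The last line shows that $r^{-1}(r(A_{0}))$ need not equal $A_{0}$.

```python
def image(r, A0):
    """r(A0): every e1 related to at least one e0 in A0."""
    return {e1 for (e0, e1) in r if e0 in A0}


def inverse(r):
    """The inverse relation: e1 ~ e0 iff e0 ~_r e1."""
    return {(e1, e0) for (e0, e1) in r}


def preimage(r, A1):
    """r^{-1}(A1): the image of A1 under the inverse relation."""
    return image(inverse(r), A1)


# Hypothetical relation between E0 = {"a", "b", "c"} and E1 = {"x", "y"}; "c" is unrelated.
r = {("a", "x"), ("b", "x"), ("b", "y")}

print(sorted(image(r, {"a"})))               # ['x']
print(sorted(image(r, {"b"})))               # ['x', 'y']
print(sorted(preimage(r, {"x"})))            # ['a', 'b']
print(sorted(preimage(r, image(r, {"a"}))))  # ['a', 'b'], not ['a']
```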

We say that the relation $r$ is a morphism between the generalised models if, for any $A_{0} \subset E_{0}$ and $A_{1} \subset E_{1}$:

$Q_{0}(A_{0}) \leq Q_{1}(r(A_{0}))$, or both measures are undefined.
$Q_{1}(A_{1}) \leq Q_{0}(r^{-1}(A_{1}))$, or both measures are undefined.

The intuition here is that probability flows along the connections: if $e_{0} \sim_{r} e_{1}$, then probability can flow from $e_{0}$ to $e_{1}$ (and vice versa). Thus $r(A_{0})$ must have picked up all the probability that flowed out of $A_{0}$, but it might have picked up more, since there may be connections coming into it from outside $A_{0}$. The same goes for $r^{-1}(A_{1})$ and the probability of $A_{1}$.
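For finite models, both morphism conditions can be checked by brute force over all subsets. This sketch is an illustration under stated assumptions, not part of the post:

- relations are Python sets of pairs;
- each $Q$ is a dictionary of weights;
- "undefined" measures are not modelled;
- all helper names, such as `is_morphism`, are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(S):
    """All subsets of a finite set S (2^|S| of them, so toy sizes only)."""
    items = list(S)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def weight(Q, A):
    """Q(A) for a weight dictionary Q: env -> non-negative number."""
    return sum((Q[e] for e in A), Fraction(0))


def image(r, A0):
    return {e1 for (e0, e1) in r if e0 in A0}


def preimage(r, A1):
    return {e0 for (e0, e1) in r if e1 in A1}


def is_morphism(r, E0, Q0, E1, Q1):
    """Brute-force check of Q0(A0) <= Q1(r(A0)) and Q1(A1) <= Q0(r^{-1}(A1)) for all subsets."""
    forward = all(weight(Q0, A0) <= weight(Q1, image(r, A0)) for A0 in subsets(E0))
    backward = all(weight(Q1, A1) <= weight(Q0, preimage(r, A1)) for A1 in subsets(E1))
    return forward and backward


# Hypothetical example: both environments of M0 are connected to the single environment of M1.
E0, Q0 = {"a", "b"}, {"a": Fraction(1), "b": Fraction(1)}
E1 = {"x"}
r = {("a", "x"), ("b", "x")}

print(is_morphism(r, E0, Q0, E1, {"x": Fraction(2)}))  # True: all the weight flows into x
print(is_morphism(r, E0, Q0, E1, {"x": Fraction(1)}))  # False: x cannot absorb Q0({a, b}) = 2
print(is_morphism(r, E0, Q0, E1, {"x": Fraction(3)}))  # False: x has more weight than can flow back
```

The cost is exponential in the number of environments, so this is only a way of sanity-checking small examples.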

Morphism properties

We now check that these relations obey the requirements of morphisms in category theory.

Let $r$ be a morphism $M_{0} \to M_{1}$ (ie a relation between $E_{0}$ and $E_{1}$), and let $p$ be a morphism $M_{1} \to M_{2}$ (ie a relation between $E_{1}$ and $E_{2}$).

We compose morphisms by composing the relations: $e_{0} \sim_{pr} e_{2}$ iff there exists an $e_{1}$ with $e_{0} \sim_{r} e_{1}$ and $e_{1} \sim_{p} e_{2}$. Composition of relations is associative.

We now need to show that $pr$ is a morphism. Directly from the definition of composition, $(pr)(A_{0}) = p(r(A_{0}))$ and $(pr)^{-1}(A_{2}) = r^{-1}(p^{-1}(A_{2}))$. Applying the morphism properties of $r$ and of $p$ (the latter to the subsets $r(A_{0}) \subset E_{1}$ and $A_{2}$, the former to $A_{0}$ and $p^{-1}(A_{2}) \subset E_{1}$) gives, for any $A_{0} \subset E_{0}$ and $A_{2} \subset E_{2}$:

$Q_{0}(A_{0}) \leq Q_{1}(r(A_{0})) \leq Q_{2}(p r(A_{0}))$, or all three measures are undefined.
$Q_{2}(A_{2}) \leq Q_{1}(p^{-1}(A_{2})) \leq Q_{0}(r^{-1} p^{-1}(A_{2}))$, or all three measures are undefined.

Finally, the identity relation $\mathrm{Id}_{E_{0}}$ relates each $e_{0} \in E_{0}$ only to itself. The induced maps $\mathrm{Id}_{E_{0}}$ and $\mathrm{Id}_{E_{0}}^{-1}$ are then both the identity map on $2^{E_{0}}$. So the morphism conditions from $M_{0}$ to itself hold trivially, and composing any relation with an identity relation leaves it unchanged.
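Composition and the identity laws can also be spot-checked numerically. In this sketch:

- relations are sets of pairs, and the $Q$s are weight dictionaries;
- brute-force helpers are defined inline so the block runs on its own;
- the three-model chain is hypothetical.

It is a sanity check on one toy example, not a proof.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(S):
    items = list(S)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def weight(Q, A):
    return sum((Q[e] for e in A), Fraction(0))


def image(r, A0):
    return {e1 for (e0, e1) in r if e0 in A0}


def preimage(r, A1):
    return {e0 for (e0, e1) in r if e1 in A1}


def is_morphism(r, E0, Q0, E1, Q1):
    forward = all(weight(Q0, A0) <= weight(Q1, image(r, A0)) for A0 in subsets(E0))
    backward = all(weight(Q1, A1) <= weight(Q0, preimage(r, A1)) for A1 in subsets(E1))
    return forward and backward


def compose(p, r):
    """The relation p r (first r, then p): e0 ~ e2 iff some e1 has e0 ~_r e1 and e1 ~_p e2."""
    return {(e0, e2) for (e0, e1) in r for (f1, e2) in p if e1 == f1}


def identity(E):
    """Id_E relates each environment only to itself."""
    return {(e, e) for e in E}


# Hypothetical chain M0 -> M1 -> M2.
E0, Q0 = {"a", "b"}, {"a": Fraction(1), "b": Fraction(1)}
E1, Q1 = {"x", "y"}, {"x": Fraction(1), "y": Fraction(1)}
E2, Q2 = {"z"}, {"z": Fraction(2)}
r = {("a", "x"), ("b", "y")}  # M0 -> M1: a bijection
p = {("x", "z"), ("y", "z")}  # M1 -> M2: merges x and y

pr = compose(p, r)
print(is_morphism(r, E0, Q0, E1, Q1), is_morphism(p, E1, Q1, E2, Q2))  # True True
print(sorted(pr), is_morphism(pr, E0, Q0, E2, Q2))  # [('a', 'z'), ('b', 'z')] True

# Identity laws, and (p r)(A0) = p(r(A0)) on every subset.
print(compose(r, identity(E0)) == r, compose(identity(E1), r) == r)  # True True
print(all(image(pr, A0) == image(p, image(r, A0)) for A0 in subsets(E0)))  # True
```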

So define the category of generalised models as $\mathrm{GM}$.

$r$-stable sets

Say that a set $A_{0} \subset E_{0}$ is $r$-stable if $r^{-1}(r(A_{0})) = A_{0}$.

For such an $r$-stable set, the first morphism condition gives $Q_{0}(A_{0}) \leq Q_{1}(r(A_{0}))$. The second, applied to $A_{1} = r(A_{0})$, gives $Q_{1}(r(A_{0})) \leq Q_{0}(r^{-1}(r(A_{0}))) = Q_{0}(A_{0})$. Thus $Q_{0}(A_{0}) = Q_{1}(r(A_{0}))$.

Hence if $r$ is a morphism, it preserves the probability measure on the $r$-stable sets.

In the particular case where $r$ is a bijective function, every subset of $E_{0}$ is $r$-stable (and every subset of $E_{1}$ is $r^{-1}$-stable). So $r$ is an isomorphism that forces $Q_{0}(A_{0}) = Q_{1}(r(A_{0}))$ for every $A_{0} \subset E_{0}$. Identifying each $e_{0}$ with $r(e_{0})$, this says $Q_{0} = Q_{1}$.
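The $r$-stable sets, and the measure-preservation argument, can be illustrated on a small example. The relation and weights below are hypothetical. The brute-force helpers (relations as sets of pairs, $Q$s as weight dictionaries) are defined inline so the block is self-contained.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(S):
    items = list(S)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def weight(Q, A):
    return sum((Q[e] for e in A), Fraction(0))


def image(r, A0):
    return {e1 for (e0, e1) in r if e0 in A0}


def preimage(r, A1):
    return {e0 for (e0, e1) in r if e1 in A1}


def is_morphism(r, E0, Q0, E1, Q1):
    forward = all(weight(Q0, A0) <= weight(Q1, image(r, A0)) for A0 in subsets(E0))
    backward = all(weight(Q1, A1) <= weight(Q0, preimage(r, A1)) for A1 in subsets(E1))
    return forward and backward


def stable_sets(r, E0):
    """All r-stable subsets A0 of E0, ie those with r^{-1}(r(A0)) == A0."""
    return [set(A0) for A0 in subsets(E0) if preimage(r, image(r, A0)) == set(A0)]


# Hypothetical morphism: a and b both map to x, c maps to y.
E0, Q0 = {"a", "b", "c"}, {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3)}
E1, Q1 = {"x", "y"}, {"x": Fraction(3), "y": Fraction(3)}
r = {("a", "x"), ("b", "x"), ("c", "y")}

stable = stable_sets(r, E0)
print(is_morphism(r, E0, Q0, E1, Q1))  # True
print(sorted(sorted(A) for A in stable))  # [[], ['a', 'b'], ['a', 'b', 'c'], ['c']]
# Q0(A0) == Q1(r(A0)) on every r-stable set...
print(all(weight(Q0, A) == weight(Q1, image(r, A)) for A in stable))  # True
# ...but not on the unstable set {a}: Q0({a}) = 1 while Q1(r({a})) = Q1({x}) = 3.
print(weight(Q0, {"a"}), weight(Q1, image(r, {"a"})))  # 1 3

# For a bijection, every subset is stable.
s = {("a", "x"), ("b", "y"), ("c", "z")}
print(len(stable_sets(s, E0)), 2 ** len(E0))  # 8 8
```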

Morphism example: probability update

Suppose we wanted to update our probability measure $Q_{0}$, for instance on the observation that a particular feature $f$ takes a certain value $x$.

Let $E_{f = x} \subset E_{0}$ be the set of environments where $f$ takes the value $x$. Then updating on $f = x$ is the same as restricting to $E_{f = x}$ and then rescaling.

Since we don't care about the scaling, we can consider updating on $f = x$ as just restricting to $E_{f = x}$ . This morphism is given by:

$M_{1} = (F_{0}, E_{f = x}, Q_{1})$ ,
$Q_{1} = Q_{0}$ on $E_{f = x} \subset E_{0}$ ,
the morphism $r : M_{0} \to M_{1}$ is given by the relation that $e_{0} \sim_{r} e_{0}$ for all $e_{0} \in E_{f = x}$ .
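The claim that updating is restriction followed by rescaling can be checked directly. Restricting leaves the weights on $E_{f = x}$ untouched, and normalising them recovers the ordinary Bayesian conditional. The following are hypothetical:

- the environment table;
- the feature names `f` and `g`;
- the helpers `restrict` and `normalise`.

Environments are also simplified so that each feature takes exactly one value.

```python
from fractions import Fraction

# Hypothetical environments, simplified so that each feature takes exactly one value in each.
ENVIRONMENTS = {
    "e1": {"f": "x", "g": 0},
    "e2": {"f": "x", "g": 1},
    "e3": {"f": "y", "g": 0},
}
Q0 = {"e1": Fraction(1), "e2": Fraction(3), "e3": Fraction(4)}


def restrict(Q, envs, feature, value):
    """Q1 = Q0 restricted to E_{f=x}: keep environments where `feature` == `value`, no rescaling."""
    return {e: w for e, w in Q.items() if envs[e][feature] == value}


def normalise(Q):
    """Rescale weights so they sum to 1 (only needed if we want an actual probability)."""
    total = sum(Q.values())
    return {e: w / total for e, w in Q.items()}


Q1 = restrict(Q0, ENVIRONMENTS, "f", "x")
print(Q1 == {"e1": 1, "e2": 3})  # True: weights unchanged, e3 dropped

# Conditioning Q0 on f = x directly, via Bayes: Q0(e | f=x) = Q0(e) / Q0(E_{f=x}).
mass = sum(w for e, w in Q0.items() if ENVIRONMENTS[e]["f"] == "x")
bayes = {e: Q0[e] / mass for e in Q1}
print(normalise(Q1) == bayes, bayes["e1"])  # True 1/4
```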

Morphism example: surjective partial function

In my previous posts I defined how $M_{1} = (F_{1}, E_{1}, Q_{1})$ could be a refinement of $M_{0} = (F_{0}, E_{0}, Q_{0})$ .

In the language of the present post, $M_{1}$ is a refinement of $M_{0}$ if there exists a generalised model $M_{1}^{'} = (F_{1}, E_{1}, Q_{1}^{'})$ and a surjective partial function $r : E_{1} \to E_{0}$ (functions and partial functions are specific examples of binary relations) that is a morphism from $M_{1}^{'}$ to $M_{0}$ . The $Q_{1}$ is required to be potentially 'better' than $Q_{1}^{'}$ on $E_{1}$ , in some relevant sense.

This means that $M_{1}$ is 'better' than $M_{0}$ in three ways:

- Since $r$ is surjective, $E_{1}$ covers all of $E_{0}$, so its set of environments is at least as detailed.
- Since $r$ is only a partial function, $E_{1}$ might have even more environments that don't correspond to anything in $E_{0}$ (it considers more situations).
- Finally, $Q_{1}$ is better than $Q_{1}^{'}$, by whatever definition of better we're using.
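The refinement condition can also be explored numerically. The post only requires that some suitable $Q_{1}^{'}$ exists. This sketch assumes one particular, hypothetical way of building it: each environment's weight is spread equally over its preimages, with zero weight on environments outside the domain of $r$. Relations are sets of pairs $(e_{1}, e_{0})$, and all helper names are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(S):
    items = list(S)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def weight(Q, A):
    return sum((Q[e] for e in A), Fraction(0))


def image(r, A):
    """r(A): everything related to some element of A."""
    return {b for (a, b) in r if a in A}


def preimage(r, B):
    return {a for (a, b) in r if b in B}


def is_morphism(r, X, QX, Y, QY):
    forward = all(weight(QX, A) <= weight(QY, image(r, A)) for A in subsets(X))
    backward = all(weight(QY, B) <= weight(QX, preimage(r, B)) for B in subsets(Y))
    return forward and backward


def is_surjective_partial_function(r, E1, E0):
    """r (pairs (e1, e0)) relates each e1 to at most one e0, and every e0 is hit."""
    target = {}
    for e1, e0 in r:
        if target.setdefault(e1, e0) != e0:
            return False  # e1 related to two different e0: not a (partial) function
    return set(target) <= set(E1) and set(target.values()) == set(E0)


def split_weights(r, E1, Q0):
    """One hypothetical Q1': share each Q0(e0) equally among its preimages; weight 0 off the domain."""
    Q1 = {e1: Fraction(0) for e1 in E1}
    for e0, w in Q0.items():
        pre = [e1 for (e1, t) in r if t == e0]  # non-empty, by surjectivity
        for e1 in pre:
            Q1[e1] += w / len(pre)
    return Q1


# Hypothetical refinement: "warm" splits into w1 and w2, and "new" is an extra environment.
E0, Q0 = {"warm", "cold"}, {"warm": Fraction(1), "cold": Fraction(1)}
E1 = {"w1", "w2", "c1", "new"}
r = {("w1", "warm"), ("w2", "warm"), ("c1", "cold")}  # partial: "new" maps nowhere

Q1p = split_weights(r, E1, Q0)
print(is_surjective_partial_function(r, E1, E0))  # True
print(Q1p == {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "c1": 1, "new": 0})  # True
print(is_morphism(r, E1, Q1p, E0, Q0))  # True
# Putting weight on "new" breaks the first condition: that weight has nowhere to flow.
print(is_morphism(r, E1, {**Q1p, "new": Fraction(1)}, E0, Q0))  # False
```

The last line suggests one reason to keep $Q_{1}$ separate from $Q_{1}^{'}$. A genuinely better model may want to put weight on the new environments, which $Q_{1}^{'}$ cannot do while remaining a morphism.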

Feature-split relations

The morphisms/relations defined so far use $E$ and $Q$ - but they don't make any use of $F$ . Here is one definition that does make use of the feature structure.

Say that the generalised model $M = (F, E, Q)$ is feature-split if $F = \bigsqcup_{i = 1}^{n} F^{i}$ and $E = \prod_{i = 1}^{n} E^{i}$, such that

$E^{i} \subset 2^{\overline{F^{i}}},$

where $\overline{F^{i}} = \bigsqcup_{f \in F^{i}} \overline{f}$ is the disjoint union of the value sets of the features in $F^{i}$. Note that $F = \bigsqcup_{i = 1}^{n} F^{i}$ implies $W = 2^{\overline{F}} = \prod_{i = 1}^{n} 2^{\overline{F^{i}}}$, so $\prod_{i = 1}^{n} E^{i}$ lies naturally within $W$.

Designate such a generalised model by $M = (\{F^{i}\}, E, Q)$.

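Then a feature-split relation between $M_{0} = (\{F_{0}^{i}\}, E_{0}, Q_{0})$ and $M_{1} = (\{F_{1}^{i}\}, E_{1}, Q_{1})$ (both split into the same number $n$ of parts) is a morphism $r$ defined as $r = (r^{1}, r^{2}, \dots, r^{n})$, with $r^{i}$ a relation between $E_{0}^{i}$ and $E_{1}^{i}$. That is, $(e_{0}^{1}, \dots, e_{0}^{n}) \sim_{r} (e_{1}^{1}, \dots, e_{1}^{n})$ iff $e_{0}^{i} \sim_{r^{i}} e_{1}^{i}$ for every $i$.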
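A feature-split relation is built componentwise, as a product of the relations $r^{i}$. This sketch shows that construction on a hypothetical two-part split. The environment labels and helper names are made up, and each component relation is a set of pairs. The sketch only builds the relation; whether the result is a morphism still depends on $Q_{0}$ and $Q_{1}$.

```python
from itertools import product


def product_relation(components):
    """r = (r^1, ..., r^n): (e0^1..e0^n) ~ (e1^1..e1^n) iff e0^i ~_{r^i} e1^i for every i."""
    return {
        (tuple(a for a, _ in pairs), tuple(b for _, b in pairs))
        for pairs in product(*components)  # one related pair from each component
    }


def image(r, A):
    return {b for (a, b) in r if a in A}


# Hypothetical two-part split: occupancy features and temperature features.
r1 = {("empty", "empty"), ("occupied", "occupied")}         # occupancy: identity
r2 = {("cold", "cold"), ("warm", "mild"), ("warm", "hot")}  # temperature: "warm" is refined

r = product_relation([r1, r2])
print(len(r))  # 6
print((("empty", "warm"), ("empty", "hot")) in r)     # True
print((("empty", "warm"), ("occupied", "hot")) in r)  # False: occupancy must match
print(sorted(image(r, {("empty", "warm")})))          # [('empty', 'hot'), ('empty', 'mild')]
```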

[1] I'm not fully sold on category theory as a mathematical tool, but it's certainly worthwhile to formalise your mathematical structures so that they can fit within the formalism of a category; it makes you think carefully about what you're doing.
[2] There is a slight abuse of notation here: $r : 2^{E_{0}} \to 2^{E_{1}}$ and $r^{-1} : 2^{E_{1}} \to 2^{E_{0}}$ are not generally inverses. They are inverses precisely for the $r$-stable sets that are discussed further down in the post.

sj9999

I think these might be some typos you could correct:

, or both measures are undefined.

The $E_{0}$ should be $E_{1}$ .

For such an $r$ -stable set, $Q_{0} (E_{0}) \leq Q_{1} (r (E_{0}))$ and $Q_{1} (r (E_{1}) \leq Q_{0} (r^{- 1} r (E_{0})) = Q_{0} (E_{0})$ , thus $Q_{0} (E_{0}) = Q_{1} (r (E_{0}))$ .

There is a missing parenthesis and the $E_{1}$ should be $E_{0}$ : $Q_{1} (r (E_{0})) \leq Q_{0} (r^{- 1} r (E_{0})) = Q_{0} (E_{0})$

Koen Holtman

Cross reference: I am not a big fan of stating things in category theory notation, so I made some remarks on the building and interpretation of generalised models in the comment section of this earlier post on model splintering.

Stuart Armstrong

Cheers! My opinion on category theory has changed a bit, because of this post; by making things fit into the category formulation, I developed insights into how general relations could be used to connect different generalised models.

Koen Holtman

Definitely, it has also been my experience that you can often get new insights by constructing mappings to different models or notations.

Morgan Rogers

Re "I'm not fully sold on category theory as a mathematical tool", if someone (e.g. me) were to take the category you've outlined and run with it, in the sense of establishing its general structure and special features, could you be convinced? Are there questions that you have about this category that you currently are only able to answer by brute force computation from the definitions of the objects and morphisms as you've given them? More generally, are there variants of this category that you've considered that it might be useful to study in parallel?

AI ALIGNMENT FORUM