Good idea!
Fix some alphabet . Here's how you make an automaton that checks that the input sequence (an element of ) is a subsequence of some infinite periodic sequence with period . For every in , let be an automaton that checks whether the symbols in the input sequences at places s.t. are all equal (its number of states is ). We can modify it to make a transducer that produces its unmodified input sequence if the test passes and if the test fails. It also produces when the input is . We then chain and get the desired automaton. Alternatively, we can connect the in parallel and then add an automaton with boolean inputs that acts as an AND gate. is a valid multi-input automaton in our language because AND is associative and commutative (so we indeed get a functor on the product category).
Notice that the description size of this automaton in our language is polynomial in . On the other hand, a tabular description would be exponential in (the full automaton has exponentially many states). Moreover, I think that any regular expression for this language is also exponentially large.
We only talked about describing deterministic (or probabilistic, or monadic) automata. What about nondeterministic? Here is how you can implement a nondeterministic automaton in the same language, without incurring the exponential penalty of determinization, assuming non-free categories are allowed.
Let be some category that contains an object and a morphism s.t. and . For example it can be the closed cartesian category freely generated by this datum (which I think is well-defined). Then, we can simulate a non-deterministic automaton on category by a deterministic transducer from to :
However, this trick doesn't give us nondeterministic transducers. A completely different way to get nondeterminism, which works for transducers as well, is using infra-Bayesian Knightian uncertainty to represent it. We can do via the memo monad approach, but then we get the constraint that nondeterministic transitions happen identically in e.g. identical subgraphs. This doesn't matter for ordinary automata (that process words) but it matters for tree/graph automata. If we don't want this constraint, we can probably do it in a framework that uses "infra-Markov" categories (which I don't know how to define atm, but it seems likely to be possible).
Together with Example 1, this implies that (this version of) our language is strictly more powerful than regular expressions.
Suppose we want to take two input sequences (elements of ) and check them for equality. This is actually impossible without the meta-vertex, because of the commutativity properties of category products. With the meta-vertex (the result is not an automaton anymore, we can call it a "string machine"), here is how we can do it. Denote the input sequences and . We apply a transducer to that transforms it into a string diagram. Each symbol in is replaced by a transducer that checks whether the first symbol of its input is and outputs the same sequence with the first symbol removed. These transducers are chained together. In the end of the chain we add a final transducer that checks whether its input is empty. Finally, we use the meta-vertex to feed into this chain of transducers.
Notice that this requires only one level of meta: the string diagram we generate doesn't contain any meta-vertices.
More generally, we can simulate any multi-tape automaton using a succinct string machine with one level of meta: First, there is a transducer that takes the tapes as separate inputs and simulates a single step of the multi-tape automaton. Here, moving the tape head forward corresponds to outputting a sequence with the first symbol omitted. Second, there is a transducer that takes the inputs and produces a string diagram that consists of copies of chained together (this transducer just adds another to the chain per every input symbol). Finally, the desired string machine runs and then uses a meta-vertex to apply the diagram produces to the input.
A string machine with unbounded meta can simulate a 1D cellular automaton, and hence any Turing machine: First, there is a transducer which simulates a single step of the automaton (this is a classical transducer, not even the stronger version we allow). Second, we already know there is an automaton that tests for equality. Third, fix some automaton whose job is to read the desired output off the final state of the cellular automaton. We modify to make a transducer , which (i) when the inputs are equal, produces a string diagram that consists only of (ii) when the inputs are different, produces this string machine. This string macine applies to the input , then applies to and , then uses the meta-vertex to apply to ...
Except that I cheated here because the description of references a string machine that contains . However, with closed machines such quining can be implemented: instead of producing a string diagram, the transducer produces an morphism that converts transducers to string diagrams, and then we feed the same transducer composed in the same way with itself into this morphism.
TLDR: I sketch a language for describing (either Bayesian or infra-Bayesian) hypotheses about computations (i.e. computable[1] logical/analytic facts). This is achieved via a novel (AFAIK) branch of automata theory, which generalizes both ordinary and tree automata by recasting them in category-theoretic language. The resulting theory can be equipped with a intriguing self-referential feature: automata that "build" and run new automata in "runtime".
Epistemic status: this is an "extended shortform" i.e. the explanations are terse, the math is a sketch (in particular the proofs are not spelled out) and there might be errors (but probably not fatal errors). I made it a top-level post because it seemed worth reporting and was too long for a shortform.
Motivation
Finite-state automata are a convenient tool for constructing hypotheses classes that admit efficient learning. A classical example is MDPs in reinforcement learning. Ordinary automata accept strings in input. This is convenient when the input is a time sequence. However, in Turing Reinforcement Learning and infra-Bayesian physicalism we want to equip our agent with beliefs about computations, which don't form a time sequence in the ordinary sense.
Computations are often represented as trees: e.g. syntax trees of lambda calculus, or any other functional programming language, or expressions defining recursive functions. Hence, the obvious guess is using tree automata. Specifically, we will focus on bottom-up tree automata which are more powerful and more convenient for other reasons as well.
One limitation of this is that hypotheses described using (infra-)probabilistic tree automata are missing correlations between computations. Even if two programs contain a literally identical subroutine, or a single program uses multiple copies of an identical subroutine, the uncertainty is about the different copies is uncorrelated.
Another limitation is that sophisticated automata might require exponentially many states, meaning that a straightforward (tabular) description of the transition rules is exponentially large. Instead, we would like a convenient compositional language for constructing such automata (analogously[2] to the compositional language I proposed for MDPs).
Therefore, a generalization is called for.
Deterministic Operadic Automata
Bottom-up deterministic tree automata are algebras over the operad of trees[3] (modulo accepting states, which are not encoded in the algebra structure). More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton, where the arity of each symbol in the alphabet becomes the operadic arity of the corresponding generator. The transition rules of the tree automaton are the operations in the algebra.
So, we can now define a deterministic automaton over a (colored) operad O to be an O-algebra. The automaton is "finite" when the colors are a finite set, and the algebra elements of each color are a finite set. An example application is: take the operad of lambda terms, quotient by η-reduction and α-conversion, and add the renaming of free variables as additional operations into the operad. This leads to hypotheses-about-lambda-expressions which have a better inherent understanding of the semantics. On the other hand, the finiteness condition seems too restrictive now. But, maybe we can get away with, finite number of orbits under the action of the variable-renaming group. More generally, it is not clear which operads lead to a computationally-convenient notion of automaton (similarly how many problems are computationally/statistically tractable for ordinary automata): this seems like a good problem to investigate.
Deterministic Operadic Transducers
The automata-theoretic notion of "transducer" also has a natural operadic analogue. However, in full generality, this analogue turns out to be much more powerful than classical transducers, even if the underlying operad is just the operad of the free monoid over some alphabet (i.e. in the case where operadic automata are just ordinary automata)[4]: ordinary transducers can only add symbols to the end of the output as they process new input symbols, whereas operadic transducers can e.g. add symbols to the beginning, or write into two "working tapes" s.t. the output is their concatenation.
Given an operad O, we define the associated "state operad"[5] OS as follows. Define "O-signature" to be a pair of (i) a list of O-colors (the input types) and (ii) an O-color (the output type).
Given operads O1,2 and A,B respective colors, a determinsitic operadic transducer from A to B can now be defined as an operad morphism F from O1 to OS2 s.t. for any x∈S(F(A)), the tuple V(x) starts with a 0-ary signature whose output color is B. Given such F, we can transform any 0-ary operation with output color A to a 0-ary operation with output color B (exercise for the reader).
It is also possible to make transducers process morphisms of general signature. We leave out the details, since the notation in the operad setting is unwieldy. We do explain it for cartesian transducers below.
Deterministic Cartesian Automata/Transducers
It is often convenient to represent programs as directed acyclic graphs (DAGs) rather than trees. Indeed, we want to be able to define a term once and use it multiple types later. Hence, it is natural to look for a flavor of automaton that can process DAGs. This is achieved by replacing operads with cartesian monoidal categories and algebras by strong monoidal functors from the category to Set. DAGs are thought of as morphisms in a free cartesian category, just like trees are operations in a free operad.
Given a cartesian category C, we can define the "state cartesian category" CS as follows. Here, a "C"-signature" is just a pair of objects (source and target).
Given A,B objects of CS, we have S(A×B)=S(A)×S(B) and V(x,y) is the concatenation of V(x) and V(y).
A determinsitic cartesian transducer is now defined as consisting of:
A transducer can accept an element of X and morphisms of signatures σ0…σn and output an element of Y and morphisms of signatures τ1…τm.
Compositional language
We can now start defining our compositional language for constructing automata. This language describes[6] automata/transducers as string diagrams. There are two types of edges: edges labeled with sets, and edges labeled with a cartesian category C and some C-signature. Vertices are labeled by transducers. Each vertex has incoming edges with labels corresponding to the transducer's input state space and input signatures, and outgoing edges with labels corresponding to the transducer's out state space and output signatures. The edges should be thought of as "wires" that conduct set elements or morphisms of prescribed signature respectively.
There is an interesting phenomenon here: string diagrams are in themselves morphisms of some cartesian category (with signature corresponding to labels of the diagram's external edges)! This allows us to get self-referential, and add a special type of vertex in our diagram, which has one input which is a diagram of the same type (with prescribed external edge labels) and some number of other inputs, and whose semantics is applying the diagram to the inputs (where the outputs and other inputs have labels matching the diagram type).
I think that without further constraints, the resulting language might be Turing complete: at the very least, it is possible to create an infinite loop by coming up with an input α and a diagram D s.t. D contains a transducer T which transforms α into D, and D runs its input through T and runs another copy of the input through the diagram T outputs. [EDIT: Actually, this requires T to produce a diagram D which contains T. Unless we allow this explicitly, it shouldn't be possible with "plain" cartesian automata. However, I think it is possible with closed cartesian automata (see below and comment).]
To avoid infinite loops, we can add a counter for the "level of meta" (i.e. the meta-vertex in diagrams of level k can only accept diagrams of level lower than k). It is also possible to make this counter an ordinal, by representing ordinals in some notation that itself naturally takes the form of trees/DAGs. Although to get a tractable language, we need to impose some strong constraints: even without the meta-vertex, we can get super-exponential blowup because each transducer can increase the size of the content of a variable by a constant factor for every input symbol. Possibly we should require our category to be equipped with some way to measure the "size" of morphisms (i.e. a functor to N, or maybe a lax functor to N regarded as a 2-category using its order), and most transducers to e.g. only increase size by O(1).
Monadic Cartesian Automata (MCA)
So far, everything was determinsitic. But, we want our hypotheses to have probabilistic and maybe also "infra-probabilistic" uncertainty. The obvious way to define probabilistic automata is to replace cartesian categories with Markov categories and replace Set with e.g. Stoch (the category of measurable spaces with Markov kernels as morphisms). However, this is poorly suited for our purpose, because (i) we a joint distribution over the outputs of all programs, and (ii) if an identical subroutine is defined multiple times, we want our uncertainty about those copies to be strongly correlated. Intuitively, want we want is allow all automata read from an additional "tape" filled with random bits. This can be naturally expressed using monads.
Fix a monad M0:Set→Set[7] that is equipped with "lax cartesian structure"[5:1]. By this we mean that it has lax monoidal structure μ0 w.r.t. the monoidal structure of Set given by products, and this structure satisfies that for any sets X,Y,Z and mappings f:X→Y and g:X→Z, we have
M0((f×g)∘ΔX)=μ0Y,Z∘(M0(f)×M0(g))∘ΔM0(X)
Here, Δ stands for the diagonal mapping.
Also, we require that μ0X,Y∘(ηX×idM0(Y)) is a strength of M0, where η is the monad unit. However, we require neither that μ0 arises from this strength nor that it makes M0 commutative (typically neither holds!) We call such a monad "memo monad".
The prototypical example is when M0(X) is the set of binary trees whose leaves are labeled by elements of X[8]. We think of such a tree as a way to select an element of X by reading a stream of bits. Monad multiplication works by attaching the label-trees to the "parent" tree at the corresponding leaves. The monoidal structure works by taking the union of two trees, where each tree is regarded as the subset of {0,1}∗ corresponding to the bitstrings that describe the paths from the root to each vertex (imagine placing one tree such that it overlaps the other), and propagating the labels to the leaves in the obvious way.
Another example is binary trees whose non-leaf nodes are labeled by symbols from some fixed alphabet Σ. We think of each element of Σ as corresponding to its own stream can be be queried indepedently. We won't spell out the monoidal structure, but imagine running two subroutines one after the other while memoizing all the queries the first subroutine made from the benefit of the second. As opposed to the first example, this monoidal structure is not symmetric.
We also need M0 to be equipped with a strong monad morphism to the monad PFM defined as follows. For any set X,
PFM(X):=∞⨆n=1(Xn×D([n]))
Here, [n] is the set {0,1…n−1} and D is the distribution monad.
We think of these as distributions over a list of outcomes of size n, with each outcome labeled by an element of X. Monad multiplication works by combining concatenation of strings with multiplication of probabilities in the natural way. Importantly, there are canonical monad morphisms from PFM to both the free monoid monad FM and the distribution monad D.
The morphism from M0 to PFM means that M0 automatically acquires monad morphisms to FM (we will denote it mFM) and D (we will denote it mD). In the case of binary trees, this morphism is defined by listing the leaves of the tree in horizontal order and assigning probabilities by postulating a fair coin at each non-leaf vertex.
To define automata, we require that our cartesian categories are equipped with a memo monad. Given a cartesian category C with memo monad M, a (C,M)-automaton is a strong monoidal functor F:C→Set together with a natural isomorhism between M0∘F and F∘M which is (i) strong monoidal (ii) makes F a morphism in the 2-category of monads on arbitrary categories (we call this a "memo functor"). Given A,B objects in C and α:A→M(B) a morphism, mD(F(α)) is a Markov kernel from F(A) (the initial state of the automaton) to F(B) (the final state of the automaton).
Notice that, given α1:A1→B1 and α2:A2→B2, mD(F(α1×α2)) has the marginals mD(F(α1×αi)), but the corresponding random variables are not independent. (Because of how mD interacts with the monoidal structure of M0.) This means we get a non-trivial joint distribution over all morphisms, which is why we introduced monads in the first place.
This formalism has an infra-Bayesian analogue as well: it requires replacing D by a the □ monad (and we know it is a monad) and replacing PFM by the corresponding infra-Bayesian monad. We won't spell out the details.
Monadic cartesian transducers can be defined, but we won't spell out the definition. The definition is spelled out for the case of closed cartesian transducers (see below), where the notation is a little simpler.
Processing DAGs with MCAs
As we discussed, programs are DAGs therefore DAGs are the starting point of any chain of transducers we might want to build. DAGs can be naturall regarded as morphisms in a free cartesian category. But how to embed them into a free cartesian category with memo monad? We do it as follows:
It would be nice to formulate this process as evaluating a string diagram in the corresponding Kleisli category, but for a memo monad the Kleisli category is not even monoidal. Maybe there is some way to make it work, but atm I don't know how.
(Monadic) Closed Cartesian Automata/Transducers
It is convenient to require that our cartesian categories are closed, since it allows simplifying the definition of the "state category" 2-functor and improve its properties. We will also require the category to have arbitrary (not just finite) products. Indeed, let C be such a category. Then we define the CS as follows:
Given A,B objects of CS, we have S(A×B)=S(A)×S(B) VA×B(x,y)=VA(x)×VB(y)
Moreover, S(BA)=S(B)S(A) VBA(f)=∏x∈S(A)VB(f(x))VA(x)
So far, we could avoid the infinite products by requiring that S(A) is finite. But, when C is equipped with a memo monad M, we also want CS to have a memo monad MS. This will require infinite sets. We define MS as follows:
S(MS(A)):=M0(S(A)) VMS(A)(x):=∏i∈[|mFM(S(A))|]VA(mFM(S(A))i)
MS is indeed a monad, but to define the memo (monoidal) structure on it, we need additional structure on mPFM. Let X and Y be two sets and consider the following two mappings f1,2:M0(X)×M0(Y)→PFM(X):
f1:=mPFM(X)∘prM0(X) f2:=mPFM(X)∘M0(prX)∘μ0X,Y
In general, this diagram doesn't commute, i.e. f1≠f2. But, we require that it "lax commutes" in the following sense.
For any set Z, PFM(Z) can be regarded as a category, in which a morphism from (u,p)∈Zn×D[n] to (v,q)∈Zm×D[m] is a mapping α:[n]→[m] s.t. the following conditions hold:
This allows us to regard f1 and f2 as functors, and we postulate a natural transformation ϕX,Y:f1→f2. We also postulate another natural transformation ϕ′X,Y related in the same way to prY (it is separate since μ0 is not symmetric). Since f1 and f2 are actually natural transformations in terms of their dependence on X,Y, we require that ϕ and ϕ′ are 2-natural transformations. Possibly there are more conditions we need to satisfy but we won't try to specify them precisely. In the case of trees/queries, ϕ,ϕ′ are defined using the observation that, knowing the result of making more queries is sufficient to predict the result of making less queries.
We can now define the monoidal structure μS on MS given the monoidal structure μ on M. Consider some A,B objects in CS. We define
S(μSA,B):=μ0S(A),S(B) VμSA,B(x,y)i:=idVA(x)ϕS(A),S(B)(i)×idVB(x)ϕ′S(A),S(B)(i)
As before, a (monadic) closed cartesian automaton is a (memo) cartesian closed functor from C to Set. A (monadic) closed cartesian transducer from C to D is a (memo) cartesian closed functor from C to DS. Given an object A of C, a transducer F can be regarded as taking morphisms α:1→A as inputs producing y and β as outputs, where:
Notice that these automata are in general not "finite" in any sense. To have a notion of "finite" automaton/transducer, we need to have a fully faithfull cartesian closed subcategory Cfin of C representing "finite" objects, which (in some sense?) generates all of C if arbitrary products are allowed. We can then define CSfin to consist of A s.t. S(A) is a finite set and for every x∈S(A), VA(x) is in Cfin. A finite automaton is a functor F s.t. F(A) is a finite set whenever A is a finite object, and a finite transducer is a functor F s.t. F(A) is a finite object whenever A is. We should probably also require that, if an infinite product ∏βAβ is a finite object, then all of Aβ are finite objects and all but a finite number of them are isomorphic to the terminal object 1. Notice that the later condition is consistent with the way we defined CSfin.
Curiously, the state category 2-functor seems to be a "commutative"[9] 2-monad. Indeed, the 2-monad multiplication corresponds to the functor P:(CS)S→CS defined as follows. Given A an object of (CS)S:
The 2-monoidal structure corresponds to the functor ν:CS×DS→(C×D)S defined as follows. Given A and object of CS and B an object of D:
This means that transducers are 1-morphisms in the corresponding Kleisli 2-category. In particular, composition of two transducers is a transducer. Moreover, an automaton is just a transducer from C to the terminal closed cartesian category. And, transducer string diagrams are essentially just string diagrams in the Kleisli 2-category.
We can use the same methods to talk about arbitrary describable facts, but I'm content with computable. ↩︎
The analogy is very rough, it is possible that there is some stronger compositional language that contains both as segments. ↩︎
This is obvious, but I haven't seen it in the literature. ↩︎
I don't know whether even this special case is already known under some other name in automata theory. ↩︎
I don't know whether there is a standard name for this in category theory. ↩︎ ↩︎
In full generality, I'm not sure the composition of transducers yields transducers (however, it certainly does for closed cartesian automata, see below), so this language might define a broader class of "automata" rather than just describing them. ↩︎
Instead of Set, we can use a different cartesian category for the state spaces of our automata, e.g. compact Polish spaces. But, we will continue the explanation using Set, for the sake of simplicity. ↩︎
Equivalently, M0(X) is the free magma on X. In fact, algebras over M0 are precisely magmas. ↩︎
Not sure what is the right term for 2-monads. ↩︎