For programmers, I think the simplest way to motivate domain theory is to think about a lazy language like Haskell. In such a language, if you have for example a list of numbers, it doesn't actually contain numbers, but rather "thunks" that when evaluated will give numbers. Some of these thunks, when you try to evaluate them, can go into an infinite loop instead of giving a number. Let's call these "bottoms". Basically any data structure can have some of its parts replaced with bottoms: a subtree of a tree can be bottom, and so on. Then we can say all functions in a lazy language are total functions from things-with-bottoms to things-with-bottoms: if your function goes into an infinite loop, we'll just say it returns a bottom.
Then we can say a piece of data X is "at least as informative" as Y if Y can be obtained from X by replacing zero or more of its parts with bottoms. That's a partial ordering. And it's kind of obvious that any function we can write in our language will be monotone with respect to that ordering: if X is at least as informative as Y, then f(X) will be at least as informative as f(Y).
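To make this concrete, here's a little Haskell sketch (the names are mine, purely for illustration):

```haskell
-- A list whose second element is a bottom (denotationally, a thunk that never
-- produces a number).
xs :: [Int]
xs = [1, undefined, 3]

-- Functions that never force the bottom still return ordinary values:
lenXs :: Int
lenXs = length xs          -- 3, since length never evaluates the elements

headXs :: Int
headXs = head xs           -- 1

-- Forcing the bottom gives you the "no answer ever comes back" behavior:
badSum :: Int
badSum = sum xs            -- bottom: evaluating this never produces a number

-- Monotonicity: [1, undefined, 3] is less informative than [1, 2, 3], and
-- map (+1) preserves that: [2, undefined, 4] is less informative than [2, 3, 4].
ys :: [Int]
ys = map (+ 1) xs
```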
So basically domain theory is what you get if you ask yourself "what if data structures could contain bottoms" and turn the abstractness crank a few times. In a lazy language it applies pretty naturally; in a strict language it's a bit more awkward, because things-with-bottoms don't exist at runtime to be passed around; and in a total language it's unnecessary, because all functions are already total.
So, domain theory is a fairly abstract branch of math which is about giving semantics to weird recursive constructions in computer science, in the form of partially ordered sets with additional structure. I'm still learning the parts of it which regard building a link to explicit computable rules that can be implemented in a programming language, and it's not easy at all to learn. Takes a lot of persistence.
However, the parts I have learned so far seem worth explaining more widely, because domain theory lets you pull off fixpoint constructions that are very hard to do in any other area of math. The classical example is showing that there are nontrivial models of the untyped lambda calculus. Lambda terms can act as functions from lambda terms to lambda terms, but if [Λ→Λ] means the full set-theoretic function space, then by Cantor's theorem no space Λ other than the single-point space can be isomorphic to [Λ→Λ]. We must work with a restricted notion of function space in order to dodge the cardinality constraints.
Also, in game theory, we can view a policy as a function from the opponent's policy to a probability distribution over our own actions. This was the subject of the Lawvere problem a while ago, which was solved by reflective oracles. Specifically, you'd want a pair of spaces π1≅[π2→[0,1]], π2≅[π1→[0,1]] (and then figure out what sort of computable construction corresponds to the thing you just built, if one exists).
If you want to have some seemingly-impossible type signature, it seems worthwhile to see what arsenal of tools domain theory has to let you construct such a space. So this will be a quick rundown of chapters 1-5 in these domain theory notes. http://www.cs.bham.ac.uk/~axj/pub/papers/handy1.pdf
The Basics:
Domains are a special type of partially ordered set, where the standard ≥ order roughly corresponds to "information content". Now, given an arbitrary partially ordered set, sup and inf (sup roughly corresponds to the smallest batch of information that contains all the information present in the elements of the set you're taking the sup of, and inf corresponds to the largest batch of information such that every element of the set has more information than that) may not exist for certain subsets. As a toy example, take the poset of three elements that looks like a V. The set consisting of the top two elements has an inf, but it doesn't have a sup. If we add an element on top to turn the poset into a diamond shape, then all subsets have a sup and inf.
In a domain, we don't require the existence of arbitrary sup and inf (that's called a complete lattice), and we don't even require the existence of sup and inf for arbitrary finite sets (that's called a lattice). What we do require is that every directed set has a sup. A poset which fulfills this is called a dcpo (directed-complete partial order).
What's a directed set? It's a nonempty set A, where, for all x,y∈A, there's a z∈A s.t. z≥x,z≥y. You can always find upper bounds (not necessarily the supremum, though!) for any finite subset of a directed set, within the directed set. Consider the poset given by the natural numbers, where the ≥ ordering is given by the standard ≥ ordering on natural numbers. This looks like a chain that starts at 0 and extends forever upwards. This fails the requirement that every directed set has a sup! Because N is a directed set, but there's no sup for it. If we add a single element on the top corresponding to ω, which is ≥ everything, then every directed set has a sup again.
So, one requirement for being a domain is that every directed set has a sup. Or, in other words, given some arbitrary batch of information-states where for every two information-states, there's a third one incorporating all the information in both of them (and maybe some more info), there must be a minimal information-state incorporating all the information from the whole batch. Any finite poset fulfills the "every directed set has a sup" property, but there may be interesting failures when you move to infinitely many points.
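A hedged Haskell picture of this: the chain of partial values inside the lazy naturals is order-isomorphic to the ℕ-plus-ω chain above, if you read Haskell types denotationally (which I'm doing informally here).

```haskell
-- Lazy naturals. Under the definedness ordering, the partial values form a chain
--   ⊥ ⊑ S ⊥ ⊑ S (S ⊥) ⊑ ...
-- which is a directed set, and its sup is the fully-defined infinite value below.
data Nat = Z | S Nat

omega :: Nat
omega = S omega   -- the "point at infinity" sitting above every partial natural

-- A continuous function on this domain: checking a finite bound only needs
-- finitely much of its input, so it's fine to feed it omega.
atLeast :: Int -> Nat -> Bool
atLeast 0 _     = True
atLeast _ Z     = False
atLeast k (S n) = atLeast (k - 1) n

-- atLeast 3 omega == True, even though omega is "infinite".
```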
The next component is continuity. Besides the standard ≥ ordering, there's a >> ordering, which corresponds to approximation. x approximates y, written x<<y, iff, for all directed sets A where sup(A)≥y, there's a z∈A s.t. z≥x. As an example, if we take the space [0,1] with its usual numerical ordering (so bigger numbers are higher in the information order), x>>y (in the poset) corresponds to x>y (in the standard number ordering), except that 0 approximates itself. Transitivity holds, and antisymmetry holds. But reflexivity doesn't necessarily hold, as the previous example shows. An element that approximates itself, x>>x, is called compact.
A continuous dcpo is one where, for all points x, x=sup{y|y<<x}. This set happens to be directed (though this fact isn't obvious), and in short, it means that any information-state can be described as the sup of information states that approximate it.
An example of a non-continuous dcpo is: Have two copies of the natural numbers, and add an ω that's above both of those chains, so they're "glued together at infinity". No numbers (in either of the chains) approximate ω, because for any point in one of the chains, you can take the entire other chain, and that's a directed set with a sup of ω, but nothing in that directed set is above the point you picked. So ω can't be built by taking the sup of stuff that approximates it, because there is no such stuff.
If, for all points x, x=sup{y|y<<x,y<<y} (ie, every element is the sup of the compact elements below it), then it's called an algebraic dcpo. These are nicer to work with.
A domain (as defined in the linked notes) is a continuous dcpo (every element can be built as the sup of stuff that approximates it, and all directed sets have a sup). But there's one more condition that shows up so often that I think it should just be folded into the definition of a domain, because if you drop it, an awful lot of useful stuff stops working.
Pointedness. Specifically, your domain must have an element ⊥ that is below everything. The typical interpretation of this is something that loops forever. So I'll define a domain as a continuous dcpo with a ⊥.
Functions:
We only consider continuous functions between domains. Specifically, a continuous function is one which fulfills the following two properties.
x≤y→f(x)≤f(y)
f(sup A) = sup_{a∈A} f(a), for every directed set A
In other words, the function should preserve the information ordering (increasing the information content of the input increases information content of the output), and also it shouldn't matter whether you take the sup of a directed set first and then send it through f, or whether you send your directed set through f and take the sup of the image (which is also a directed set by the first condition)
[D→E] is the space of all continuous functions from a domain D to a domain E, and it fulfills a lot of the conditions to be a domain, though not necessarily continuity. The ≥ ordering on functions is given as follows: f≥g↔∀x:f(x)≥g(x). (ie, f always produces a more informative result than g on the same input)
The sup of a directed collection of functions F in the function space is given pointwise: sup(F)(x) = sup_{f∈F} f(x). The set {f(x) | f∈F} is directed because F is a directed set of functions, so the right-hand side is well-defined, and with some extra work you can show that sup(F) is itself a continuous function, so the function space is a dcpo.
The bottom element in the function space is the function f where ∀x:f(x)=⊥E, the maximally uninformative function that just loops forever on all inputs.
The approximation order << in the function space might be quite badly behaved, however. The function space might not even be a domain. In order to get the function space [D→E] to be a domain, you have to impose extra conditions on either D or E. There are several possible extra conditions which force the function space to be a domain (or algebraic domain), and which the function space then inherits, so the condition survives taking function spaces. So when we jump up to the category theory level, we typically work in some subcategory of domains that is closed under taking the function space.
One neat property of domains is that every continuous function from a domain to itself has a least fixed-point below all the other fixed points, and this least fixed-point also has a nice pretty description. The least fixed point of f∈[D→D], is given by the sup of the directed set A with elements: ⊥,f(⊥),f(f(⊥)),f(f(f(⊥)))...
⊥≤f(⊥) because ⊥ is below everything, and since x≥y→f(x)≥f(y), repeatedly applying f shows that each element in the chain is ≥ the element below it. So it's a directed set and has a sup. The image of A under f is just A itself (minus the bottom element). By continuity of f, f(sup A) = sup_{a∈A} f(a) = sup_{n∈ℕ} f^{n+1}(⊥) = sup A. So sup A is a fixed point of f. It's also the least one: if z is any fixed point, then ⊥≤z, so f^n(⊥)≤f^n(z)=z for every n, and therefore sup A ≤ z.
Also, the function fix:[D→D]→D that maps a function to its fixed point is continuous.
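Under the usual denotational reading, this is exactly what Haskell's fix from Data.Function computes, so here's a quick sketch:

```haskell
import Data.Function (fix)

-- fix f = f (fix f); denotationally it's the sup of the chain
--   ⊥ ⊑ f ⊥ ⊑ f (f ⊥) ⊑ ...
-- i.e. the least fixed point of the continuous function f.

-- Example: factorial as the least fixed point of a non-recursive functional.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))

-- fix id, by contrast, has least fixed point ⊥: the chain never gets off the ground.
```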
There's also a category-theory analogue of this same result, but it's quite a bit more complicated.
Quick note on notation:
The standard notation for composition of functions, f∘g, doesn't mesh well with intuition, because you have to read it backwards. First g is applied, then f is applied. This can be dealt with acceptably when you aren't composing that many functions, but later on in domain theory, you work up to some really complicated function compositions, so making up a notation that puts the functions the proper way around empirically made proofs vastly easier for me. From now on, f;g;h will be a notation for "first apply f, then g, then h". It's the same as h∘g∘f, but without the need to take ten seconds to mentally reverse the order whenever it shows up.
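Haskell already ships a left-to-right composition operator, (>>>) from Control.Arrow / Control.Category, which is the same idea:

```haskell
import Control.Arrow ((>>>))

-- f >>> g >>> h means "first f, then g, then h" -- the same as h . g . f.
totalWordLength :: String -> Int
totalWordLength = words >>> map length >>> sum
```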
Embeddings and Projections:
An embedding-projection pair between a pair of domains D and E is a pair of continuous functions e:D→E, p:E→D, s.t. e;p=idD, and p;e≤idE. In other words, E is richer than D and has more information states available, so you can embed D into E. Embedding D into E and then crunching back down via the projection recovers D exactly (ie, the embedding doesn't identify different things, it's injective). However, projecting E down to D, and then embedding back into E, destroys information, so p;e(x)≤x, which is equivalent to p;e≤idE.
These are nice because an embedding uniquely determines a projection, and vice-versa. If you've got an embedding, it's usually pretty easy to come up with a natural candidate for a projection, and once you verify the two defining properties of that pair of functions, you know there's nothing else you could substitute in (note that there may be many embedding/projection pairs! But once you fix one of the two, that uniquely determines the other part).
This comes in handy once we get up to the category theory level, because lots of times when you're drawing a diagram you're like "damn, I need this arrow to go the other way", but if the arrow is an embedding or projection, you can just automatically get an arrow going the other way by going "take the unique projection/embedding corresponding to the embedding/projection I already have". Normally you'd need an isomorphism to pull off the "reverse the arrows" trick, but embeddings and projections also let you do that. Also all isomorphisms are embeddings and projections.
Also, embeddings and projections have the nice property that they have to be strict (ie, they must both map ⊥ to ⊥).
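Here's a toy Haskell illustration of an embedding-projection pair (my own example and names, reading the types denotationally as flat domains):

```haskell
import Numeric.Natural (Natural)

-- Embedding: every natural is an integer.
embedNat :: Natural -> Integer
embedNat = toInteger

-- The matching projection: non-negative integers come back, everything else is ⊥.
projectNat :: Integer -> Natural
projectNat n
  | n >= 0    = fromInteger n
  | otherwise = undefined        -- denotationally ⊥: this integer has no preimage

-- embed-then-project is the identity:        projectNat (embedNat x) = x
-- project-then-embed only loses information: embedNat (projectNat y) ⊑ y
--   (it's y itself when y ≥ 0, and ⊥ when y < 0)
-- Both functions are strict: they map ⊥ to ⊥.
```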
Category Theory Constructions:
So, there's several things you can do to a domain to get another domain. We already know about function-space, but there are others.
Cartesian product, ×, is simple. It's the domain of pairs where (x,y)≥(x′,y′) iff x≥x′ and y≥y′.
Cartesian product together with the ordinary function space gets you a cartesian closed category, if you are working in a sufficiently nice subcategory of domains where the function space is forced to be a domain too and to have the defining property of the subcategory.
Considering the subcategory of domains where the only morphisms are strict functions (functions which map ⊥ to ⊥), our candidate for the function space would be the strict function space, [D→E]⊥!, of all functions which map ⊥ to ⊥.
This doesn't harmonize with the cartesian product. However, there's another product. The smash product, ⊗. It's like cartesian product, but all pairs with a ⊥ element in one of the coordinates are identified as the same point, the ⊥ element of the product. Or, in other words, if one of the components of the pair loops forever, it's just classified as the loop-forever bottom element of the smash product.
Strict function space and smash product get you a monoidal closed category.
There's lifting, which is taking D and making a new domain D⊥ by sticking a single new ⊥ element below everything in D.
There's something called the coalesced sum, ⊕, which plays the role of the coproduct (in the category of domains with strict functions). It's made by taking two domains D and E and identifying their bottom elements, which glues them together at the bottom point. The coalesced sum of the three-element domain that looks like a V with itself is a domain with 4 incomparable points and a bottom element below them all.
So, function space, strict function space, cartesian product, smash product, coalesced sum, and lifting, are our basic building blocks.
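For what it's worth, most of these building blocks have rough Haskell analogues if you read Haskell types denotationally. This is a standard but informal correspondence, so take the following as a sketch rather than gospel:

```haskell
-- Lifting D⊥: a data wrapper with a lazy field adds one new bottom,
-- because ⊥ and Lift ⊥ are distinct values.
data Lift a = Lift a

-- Cartesian product D×E: Haskell's lazy pair (a, b) is close, except that
-- Haskell's pair is itself lifted (there's an extra ⊥ below (⊥, ⊥)).

-- Smash product D⊗E: a pair with strict fields, so any pair containing a ⊥
-- component collapses to ⊥ itself.
data Smash a b = Smash !a !b

-- Coalesced sum: Either a b is roughly the coalesced sum of the *liftings*,
-- a⊥ ⊕ b⊥, since Left ⊥ and Right ⊥ stay distinct from ⊥.

-- Function space [D→E]: the ordinary Haskell function type a -> b.
-- Strict function space: the functions that map ⊥ to ⊥ (no separate Haskell type).
```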
There's one more rather complicated one, the bilimit, which plays a key role in generating fancy domains with seemingly-impossible type signatures.
The Bilimit:
Consider an infinite chain of domains indexed by the natural numbers, D0,D1,D2... where, for all n, there's an embedding en of Dn into Dn+1. Call this an expanding system. In other words, as you go further up the chain, the domains get more complicated and keep adding extra points while preserving the existing structure. This is the category-theory analogue of a chain (a type of directed set), analogous to the chain we made to construct a fixed point of a function. It's then reasonable to ask "what's the analogue of sup in this context?"
A composition of embeddings is an embedding (the projection is given by composing the corresponding projections), so we've also got embeddings from any domain in this sequence to any higher domain, and projections from any domain in this sequence to any lower domain.
There's a very special domain called the bilimit, which is both a limit of the diagram with the projections, and a colimit of the diagram with the embeddings. Or, in other words, embedding Dn into B does the same thing as embedding Dn into Dm, m>n, and then embedding Dm into B. Same with projection: it doesn't matter whether you project B into Dn, or project B into Dm and then into Dn. And if anything else forms a cone or cocone over the diagram, the universal property gives you a unique mediating arrow between it and B.
The bilimit can be thought of as the infinite product of the domains (equipped with the obvious ordering), but you only keep the points of the form <x0,x1,x2...> where pn(xn+1)=xn. In other words, a point in the bilimit corresponds to some infinite sequence of points/ever-more-detailed information states from the ever-more-detailed domains, and crunching the m'th domain (more detailed) down to the n'th domain (less detailed) maps the m'th coordinate (the more detailed information state) down to the n'th coordinate (the less detailed information state). The bilimit of a finite sequence would just be the last domain in the sequence, because once you've got the final coordinate, that fixes everything else. The bilimit of an infinite sequence is like the "completion" of the process of the domains getting more and more detailed.
To project from the bilimit to one of the finite stages, just take the appropriate coordinate, that gives you the point to map to. To embed from one of the finite stages to the bilimit, just project back to get points for the earlier stages, and embed forwards to get points for the later stages.
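Spelling that out in symbols (my notation, with en:Dn→Dn+1 the given embeddings and pn:Dn+1→Dn the matching projections):

```latex
% the bilimit as a subset of the infinite product
B \;=\; \Bigl\{\, \langle x_0, x_1, x_2, \dots\rangle \in \prod_n D_n \;\Bigm|\; p_n(x_{n+1}) = x_n \text{ for all } n \,\Bigr\}

% projection to stage n: take the n'th coordinate
\pi_n\bigl(\langle x_0, x_1, \dots\rangle\bigr) \;=\; x_n

% embedding of y \in D_n: project backwards for earlier coordinates, embed forwards for later ones
\varepsilon_n(y) \;=\; \bigl\langle\, p_0(\cdots p_{n-1}(y)\cdots),\ \dots,\ p_{n-1}(y),\ y,\ e_n(y),\ e_{n+1}(e_n(y)),\ \dots \,\bigr\rangle
```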
If you take a point x in the bilimit B, projecting to Dn and then embedding back into B gets you a point y≤x, because project-then-embed moves points down. We can consider the batch of points in B given by projecting-then-embedding into some finite stage. This is a directed set, and the sup is x itself. (ie, to get a close approximant for x, just project into some really distant domain and then embed back, that gets you a long initial sequence of coordinates that match up with x itself)
This bilimit is the analogue of sup, on a category-theory level.
Since we're working in some subcategory of domains in order to make sure the function spaces work out alright, we'd also want our defining property of the subcategory to be closed under bilimits as well. There are indeed some conditions you can impose so you're working in a subcategory closed under bilimits.
The nicest and most convenient such subcategory is the category of bifinite domains. There's a really complicated definition of them, but after putting in a lot of work, you can show that something is bifinite iff it's the bilimit of some expanding system of finite pointed posets (ie, every bifinite domain can be written as the bilimit of an infinite sequence of finite partially ordered sets which all have a bottom element). Another cool note is that these are all algebraic (every point x can be made by taking the sup of the compact elements, the ones that approximate themselves, below x). The compact elements of such a bilimit are those which can be made by taking a point at some finite stage and embedding it into the bilimit (ie, a point is compact exactly when it takes only finitely much information to specify; you can complete the rest of the information by just embedding it forward forever once you get far enough).
The problem is that, since you've gotta make it as a limit of finite domains, that impairs your ability to incorporate probabilities into things, since there are infinitely many probabilities. Incorporating probabilities into domains is one of the things on my to-learn list.
Category Theory + Domain Theory:
To complete the analogy and get an analogue of the least-fixpoint theorem, but for domains, we need an analogue of a continuous function (we have an analogue of sup in the form of the bilimit).
In this case, the analogue of a continuous function would be a continuous functor (more on what makes a functor continuous later). For example, consider the functor F s.t. F(D)=[D→X]. A fixpoint of F would be a domain D that's isomorphic to [D→X]. We also want our functor to map embeddings to embeddings for the upcoming application, so if there's an embedding e:D→E, we need F(e) to be an embedding from [D→X] to [E→X]. If you've got a functor with multiple inputs, then it needs to map tuples of embeddings to embeddings. Consider the cartesian product functor where F(D,E)=D×E. Given embeddings e:D→D′ and e′:E→E′, F(e,e′) needs to be an embedding from D×E to D′×E′.
Let's do this for the function space, as that's the most complicated one.
Exercise: Do this for all the other constructors, like lifting and cartesian product and coalesced sum.
We have domains D,D′,E,E′, and embeddings e:D→D′ and e′:E→E′. We want F(e,e′) to be an embedding from [D→E] to [D′→E′]. Well... we have a function f from D to E and we're trying to get a function from D′ to E′. Using f, then e′, we can get from D to E′. But we still have to somehow get from D′ to D, and all we have is an embedding going the other way...
But it's an embedding, so we can just use the unique projection p, which goes the other way! This is what I was talking about with restricting to embeddings so you can get your arrows going the other way when the need arises.
Specifically, our candidate for the embedding is: F(e,e′)(f)=p;f;e′, and our candidate for the projection going the other way is to map g:D′→E′ to e;g;p′. Let's call this function P(e,e′) for later use. It's a bit of work to show that these are continuous functions, so I'm gonna skip that part, but I will show the part about these two things fulfilling the relevant conditions for embeddings and projections.
Condition 1: embedding then projecting had better recover your original point.
(F(e,e′);P(e,e′))(f) = P(e,e′)(p;f;e′) = e;p;f;e′;p′ = idD;f;idE = f
by the definition of our embeddings and projections, and embed-then-project giving you identity.
Condition 2: projecting then embedding had better produce a lower point.
(P(e,e′);F(e,e′))(g) = F(e,e′)(e;g;p′) = p;e;g;p′;e′
(p;e;g;p′;e′)(x) ≤ (g;p′;e′)(x) ≤ g(x), so p;e;g;p′;e′ ≤ g
(again, by project-then-embed giving you a lower point, for the inequalities)
So our functor does map pairs of embeddings to embeddings.
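Here's the same construction as a hedged Haskell sketch (the EP type and names are mine; with ordinary . composition, p;f;e′ shows up as e' . f . p):

```haskell
-- An embedding-projection pair between (the denotations of) d and e.
data EP d e = EP
  { embed   :: d -> e
  , project :: e -> d
  }

-- The function-space functor's action on EP pairs:
--   the embedding sends f : d -> e to p;f;e'  (pre-compose the projection, post-compose the embedding)
--   the projection sends g : d' -> e' to e;g;p'
funcEP :: EP d d' -> EP e e' -> EP (d -> e) (d' -> e')
funcEP (EP eD pD) (EP eE pE) = EP
  { embed   = \f -> eE . f . pD
  , project = \g -> pE . g . eD
  }
```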
Now do it with the other building blocks.
Anyways, returning to the original thing, we should go over the definition of continuity for a functor.
For an expanding sequence with bilimit B, applying the functor to the sequence gives you a different expanding sequence (because it maps embeddings to embeddings). This has a bilimit B′.
A functor is continuous if F(B)≅B′. In other words it doesn't matter if you take the bilimit and shove it through the functor, or shove your sequence through the functor, and then take the bilimit of that. Swapping out "functor" for function, and "bilimit" for sup, we can see the close parallel to continuity of functions.
It's kinda hard to verify this condition, but there's a stronger one called local continuity which is a lot easier to check and implies continuity. For all the basic building blocks of domains (product, function space, lifting, coalesced sum, etc...), and compositions of them, the associated functors are continuous, which is quite nice.
Yo Dawg I Heard You Like Fixed Points:
Let's momentarily just look at continuous functors that only take one input, not multiple. Like the functor mapping Λ to [Λ→Λ], as a concrete example. We're going to be trying to complete the analogy to the original fixpoint theorem. We start with ⊥, and keep applying our function to it to build up a chain, and then take the sup of the chain, and that's the minimal fixed point. So what's the analogue of ⊥ in a category-theory sense?
It's just the domain consisting of a single point, call it I. And there's only one possible embedding from that to whatever domain you want, the embedding which maps the single point to the bottom element in the target domain.
So our expanding sequence is I,F(I),F(F(I)),F(F(F(I))).... and for the embeddings, we've got our unique embedding e from I into F(I), so F(e) is an embedding from F(I) into F(F(I)), and F(F(e)) is an embedding from F(F(I)) into F(F(F(I)))... Ok, we have our embeddings, and thus our projections. Now we take the bilimit B. By functor continuity, F(B)=bilimit of the expanding sequence shoved through F. But shoving our expanding sequence through F just gives us the same sequence, but with I clipped off. This doesn't change the bilimit at all.
So, as desired, F(B)≅B. And this is how you cook up mathematical spaces X isomorphic to [X→X]. Or if you want a space X that's isomorphic to X×D, that also can be done.
Now, if you're attentive, you'll have noticed that this construction just gives you a single point as your canonical solution to X≅[X→X]. This can be fixed by starting with something other than a single point (like a domain with two points, a top and a bottom), and then specifying a custom embedding from that into F(2-point domain), giving you a full-blown model of the lambda calculus. But apparently the least fixed-point/canonical solution grants you extra-nice properties. I don't know what those are, since I haven't gotten to that chapter, but apparently you have more powerful tools available if you're working in the least-fixed-point domain instead of some other fixed-point.
There was a paper by Abramsky (here) that looked at something called the lazy lambda calculus, which turns out to be exactly the least-fixed-point solution of X≅[X→X]⊥, which I'm still digesting. This is notable because adding an extra point at the bottom lets you get away from a single point. We start with the one-element domain, take the function space (still one-element domain), add a bottom element (ok, now we have two elements), take the function space of that (now three elements for the three possible continuous functions), add a bottom element, and from there it gets waay more complicated.
The analogue of the isomorphism between X and [X→X]⊥ in the computational sense is that you can take a lambda term (element of X), work on it until you get it into something called weak head normal form (a form like λx.M), and then it's a function from lambda terms to lambda terms. The bottom element comes in because maybe the lambda term doesn't reduce to weak head normal form, and then the reduction process just loops forever, and that's the bottom element (because bottom elements correspond to stuff that just loops forever or doesn't return an output)
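In Haskell you can basically write this solution down (hedged: the claim that the datatype below denotes the least solution of X≅[X→X]⊥ is the standard denotational reading of lazy recursive datatypes, not something I'm proving here):

```haskell
-- D ≅ [D → D]⊥: a value of D is either ⊥ (a term with no weak head normal form)
-- or Fun f, i.e. a term that has reduced to the form λx.M and now acts as a function.
data D = Fun (D -> D)

apply :: D -> D -> D
apply (Fun f) x = f x      -- force to weak head normal form, then apply

lam :: (D -> D) -> D
lam = Fun

-- The classic non-terminating term Ω = (λx. x x)(λx. x x): its denotation is ⊥,
-- and evaluating it loops forever.
omegaTerm :: D
omegaTerm = apply delta delta
  where delta = lam (\x -> apply x x)
```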
Now, with tupling and some more work, it's possible to even solve multiple fixpoint equations simultaneously. If you've got continuous functors F1,F2,F3, then there's a least-fixpoint solution to:
A≅F1(A,B,C)
B≅F2(A,B,C)
C≅F3(A,B,C)
So that's what domain theory can do. You can set up a bunch of isomorphisms that you want, of the form "these domains should be isomorphic to these other domains that can be defined in terms of the basic building blocks like function space and smash product", and cook up a canonical solution. An example I'm particularly interested in is (V is the three-element domain that looks like a V):
A≅[B→A]⊕V
B≅[A→B]⊕V
This corresponds to two domains of strategies, where a strategy is a continuous function from their strategy to your strategy, or it's one of two possible actions (or loop forever). So you can take two points in these, and repeatedly feed them to each other to either get looping forever, or a concrete action (the "repeatedly play the strategies against each other" function is a continuous function from A×B to V×V, though that's a bit tricky to show). This seems kinda like the full domain-theory version of Vanessa's metathreat hierarchy without the restriction to operate on some fixed finite level, though I'm not sure how to get probabilities in there.
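Here's a very rough Haskell sketch of what those domains and the "play them against each other" function might look like (all names are mine, Haskell data declarations don't literally give you coalesced sums, and "repeatedly feed them to each other" is read here as unrolling one side against the other, so treat this as an approximation):

```haskell
-- V: two concrete actions; "loop forever" is the ⊥ of this type.
data V = ActionOne | ActionTwo

-- A ≅ [B → A] ⊕ V and B ≅ [A → B] ⊕ V, approximately:
data A = AFun (B -> A) | AAct V
data B = BFun (A -> B) | BAct V

-- Unroll one strategy against the other until it commits to an action
-- (or loops forever, i.e. returns ⊥ in that coordinate).
resolveA :: A -> B -> V
resolveA (AAct v) _ = v
resolveA (AFun f) b = resolveA (f b) b

resolveB :: B -> A -> V
resolveB (BAct v) _ = v
resolveB (BFun g) a = resolveB (g a) a

play :: (A, B) -> (V, V)
play (a, b) = (resolveA a b, resolveB b a)
```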
More later on how to get those domains into something you can actually work with on a computer, once I learn that.