Animating categories
Change of enrichment along a systems theory.
\[ \def\Cs{\mathscr{C}} \def\Ds{\mathscr{D}} % \def\Ca{\mathcal{C}} \def\Da{\mathcal{D}} \def\Ea{\mathcal{E}} \def\Ia{\mathcal{I}} \def\Ma{\mathcal{M}} \def\Sa{\mathcal{S}} % \def\Lb{\mathbf{L}} % \def\Cat{\mathbf{Cat}} \def\BLens{\mathbf{BayesLens}} \def\BLensP{\BLens_{\rightleftharpoons}} \def\BCilia{\mathbf{BayesCilia}} \def\Lens{\mathbf{Lens}} \def\MonCat{\mathbf{MonCat}} \def\Poly{\mathbf{Poly}} \def\Org{\mathbf{Org}} \def\Set{\mathbf{Set}} % \def\Coalg{\mathsf{Coalg}} \def\Moore{\mathsf{Moore}} % \def\Kl{\mathcal{K}\mspace{-2mu}\ell} \def\coKl{\mathrm{co}\Kl} % \def\deloop{\mathbf{B}} \def\id{\mathsf{id}} \def\mdash{{\hbox{-}}} \def\proj{\mathsf{proj}} \def\wtilde{\widetilde} % \let\op=\relax \def\op{^{\,\mathrm{op}}} \] It can sometimes be helpful to animate categories, to render their morphisms dynamical. This article explains how, given a systems theory and a suitably-enriched category, the latter may be animated by the former, yielding a bicategory whose 1-cells are systems that “emit morphisms”. The construction is exemplified by cilia: dynamical systems that control lenses. A further step of enrichment then yields, and generalizes, the “dynamic categories” of Shapiro and Spivak (2023).
1 The recipe
Open dynamical systems typically collect into systems theories (Myers 2022), categories \(\Sa:\Ia\to\Cat\) indexed by a monoidal category \(\Ia\) describing their interfaces. The simple idea behind categorical animation is that, given a systems theory \(\Sa\) and a category \(\Ca\) enriched in its category of interfaces \(\Ia\), we can change its enrichment along \(\Sa\) to obtain a bicategory \(\wtilde \Ca\), the animation of \(\Ca\).
Definition 1 (Indexed enrichment; animation) Given a monoidal indexed category \(\Sa:(\Ia,\otimes,I)\to(\Cat,\times,\mathbf{1})\), we say that \(\wtilde \Ca\) is \(\Sa\)-enriched if it is obtained from an \(\Ia\)-enriched category \(\Ca\) through change of enrichment along \(\Sa\). If \(\Sa\) indexes categories of dynamical systems, then we say that \(\wtilde \Ca\) is the \(\Sa\)-animation of \(\Ca\) and we call it an animated category.
Remark. Note that systems theories are more generally understood as indexed by a double category, rather than a mere monoidal category. The present recipe could indeed be recapitulated in this setting, using the notion of enrichment in a double category (cf. Leinster 2002), but we will not need the extra generality.
2 Examples
Let us first introduce a couple of systems theories. In each case, we will sketch the category of interfaces, before presenting the systems indexed by them. We will focus here on ‘lens’-like interfaces, but the recipe does not demand this.
2.1 Lenses, Moore machines, and all (monoidal) categories
For our first set of examples, we will see how all (monoidal) categories may be systematically animated, to give bicategories whose 1-cells may ‘learn’ from observing their inputs. We begin with a category of interfaces.
Example 1 (Lenses)
- Objects. The objects of the category \(\Lens(\Set)\) are pairs \((O,I)\) of sets. We can think of these as representing the input-output interface of a system: \(I\) being its set of possible inputs and \(O\) its set of possible outputs.
- Morphisms. A morphism, or lens, \(l:(O,I)\to(O',I')\) consists of a pair \((l_1,l^\sharp)\) of functions: one ‘forwards’ \(l_1:O\to O'\) and one ‘backwards’ \(l^\sharp:O\times I'\to I\).
- Identities. The identity lens \((O,I)\to(O,I)\) is given by the pair \((\id_O,\proj_I)\) where \(\id_O\) is the identity map on \(O\) and \(\proj_I:O\times I\to I\) is the \(I\)-projection from the product \(O\times I\).
- Composition. Lenses \((f_1,f^\sharp):(A,B)\to(S,T)\) and \((g_1,g^\sharp):(S,T)\to(X,Y)\) compose to give \((g_1\circ f_1, f^\sharp\circ g^\sharp_{f_1}):(A,B)\to(X,Y)\), where \(g_1\circ f_1\) is the evident composite map \(A\xrightarrow{f_1}S\xrightarrow{g_1}X\) and \(f^\sharp\circ g^\sharp_{f_1}\) is defined by \[A\times Y \xrightarrow{\delta_A\times\id_Y} A\times A\times Y \xrightarrow{\id_A\times f_1\times\id_X} A\times S\times Y \xrightarrow{\id_A\times g^\sharp} A\times T \xrightarrow{f^\sharp} B\] where \(\delta_A:A\to A\times A\) is the copying map \(a\mapsto(a,a)\).
- Monoidal structure. \(\Lens(\Set)\) is a monoidal category. The monoidal unit is the pair \((1,1)\) of singleton sets. The monoidal product \(\otimes\) is given on objects by taking products of sets: \((O,I)\otimes (O',I') = (O\times O',I\times I')\); on lenses, it is similar (up to a reordering of objects in the domain of the backward map).
- ‘Wiring’. We can understand lenses as wiring diagrams (Schultz, Spivak, and Vasilakopoulou 2020, Remark 2.2.2), describing the nesting of systems. For example, a lens \((A,B)\otimes(S,T)\to(O,I)\) tells us how to wire the interfaces \((A,B)\) and \((S,T)\) together yielding an interface \((O,I)\): the forwards map takes the ‘inner’ outputs in \(A\times S\) and returns an ‘outer’ output in \(O\); the backwards map tells us how to obtain the inner inputs \(B\times T\) from the outer input \(I\) and the inner outputs.
A simple theory of systems on such wiring-diagram-like interfaces is the theory of Moore machines.
Example 2 (Moore machines)
A Moore machine with interface \((O,I)\) is a triple \((S,\vartheta^o,\vartheta^u)\), where \(S\) is a set, and where \(\vartheta^o:S\to O\) and \(\vartheta^u:S\times I\to S\) are functions.
We think of \(\vartheta^u\) as determining the dynamics (or ‘transitions’) of the system, and \(\vartheta^o\) as yielding the system’s outputs.
Moore machines over \((O,I)\) with state space \(S\) are equivalently lenses \((S,S)\to(O,I)\). They form the objects of a category \(\Moore(O,I)\).
If \(\vartheta:(S,S)\to(O,I)\) and \(\vartheta':(S',S')\to(O,I)\) are both objects of \(\Moore(O,I)\), then a morphism \(\varphi:\vartheta\to\vartheta'\) is a function \(\varphi:S\to S'\) that “commutes with the dynamics”, in the sense that the following diagrams commute:
Morphisms of Moore machines compose as functions.
\(\Moore\) constitutes an indexed category \(\Lens(\Set)\to\Cat\). If \(l:(O,I)\to(O',I')\) is a lens, then the reindexing functor \(\Moore(l):\Moore(O,I)\to\Moore(O',I')\) acts by post-composition. That is, since a Moore machine over \((O,I)\) is itself a lens \(\vartheta:(S,S)\to(O,I)\) for some \(S\), the composite \((S,S)\xrightarrow{\vartheta}(O,I)\xrightarrow{l}(O',I')\) is then a Moore machine over \((O',I')\). (Note that \(\Moore(l)\) acts on morphisms as the identity.)
\(\Moore\) is additionally lax monoidal, giving a monoidal indexed category \(\bigl(\Lens(\Set),\otimes,(1,1)\bigr)\to(\Cat,\times,\mathbf{1})\).
- The laxator \(\lambda\) acts to place Moore machines in parallel, with components \(\lambda_{(O,I),(O',I')}:\Moore(O,I)\times\Moore(O',I')\to\Moore\bigl((O,I)\otimes(O',I')\bigr)\) which act by the monoidal product \(\otimes\) of lenses: given systems \(\vartheta:\Moore(O,I)\) and \(\vartheta':\Moore(O',I')\), form their product \(\vartheta\otimes\vartheta'\) and observe that it has the required interface \(\bigl((O,I)\otimes(O',I')\bigr)\); on morphisms \(\varphi\) and \(\varphi'\), take their product \(\varphi\times\varphi'\).
- The unit \(\epsilon:\mathbf{1}\to\Moore(1,1)\) picks the identity lens on \((1,1)\); then both its state space and interface are trivial.
Every (locally small1) category can be trivially turned into a category enriched in \(\Lens(\Set)\), because there is a monoidal embedding \(\Set\hookrightarrow\Lens(\Set)\).
Proposition 1 There is an embedding of \(\Set\) into \(\Lens(\Set)\), given by mapping each set \(A\) to the pair \((A,1)\) and each function \(f:A\to B\) to the lens \((f,!):(A,1)\to(B,1)\), where \(!\) is the unique map \(A\times 1\to 1\) into the terminal set \(1\). This embedding is moreover strong monoidal, since there is an isomorphism \((A,1)\otimes(B,1) \cong (A\times B, 1)\) in \(\Lens(\Set)\) natural in \(A\) and \(B\).
Corollary 1 The monoidal embedding \(\Set\hookrightarrow\Lens(\Set)\) yields a change-of-enrichment functor \(\Cat\to\Lens(\Set)\mdash\Cat\), where \(\Lens(\Set)\mdash\Cat\) is the 2-category of categories enriched in \(\Lens(\Set)\). Given any (locally small) category \(\Ca\), we obtain a \(\Lens(\Set)\)-category \(\Ca_\Lb\) as follows. The objects of \(\Ca_\Lb\) are those of \(\Ca\). For each pair of objects \(a,b:\Ca_\Lb\), the hom interface \(\Ca_\Lb(a,b)\) is \(\bigl(\Ca(a,b),1\bigr)\). Identities are given by the lenses \((1,1)\xrightarrow{(\id_a,!)}\bigl(\Ca(a,a),1\bigr)\). Composition is given, for each triple of objects \(a,b,c:\Ca\), by the lenses \[\bigl(\Ca(b,c),1\bigr)\otimes\bigl(\Ca(a,b),1\bigr)\xrightarrow{(\circ_{a,b,c},!)}\bigl(\Ca(a,c),1\bigr)\] where \(\circ_{a,b,c}\) denotes the composition function in \(\Ca\).
Now that we have categories enriched in our category of interfaces \(\Lens(\Set)\), we can apply the recipe of Definition 1 to animate those categories accordingly.
Corollary 2 Each locally small category \(\Ca\) yields an animated category \(\wtilde\Ca_1\). The objects of \(\wtilde\Ca_1\) are those of \(\Ca\). The hom-categories \(\wtilde\Ca_1(a,b)\) are given by \(\Moore\bigl(\Ca(a,b),1\bigr)\).
We write \(\wtilde\Ca_1\) (with the subscript) because the systems of this bicategory have trivial inputs: all they can do is output morphisms in \(\Ca\) and ‘autonomously’ update their states. For a more interesting animation, we need a different \(\Lens(\Cat)\)-category: we can’t just use the earlier embedding. Let us therefore do something just a little more sophisticated, repurposing the notation \(\Ca_\Lb\). This extra sophistication will require extra structure from \(\Ca\): we will now ask for it to be monoidal2.
Proposition 2 Given a monoidal category \((\Ca,\otimes,I)\), there is a \(\Lens(\Set)\)-category \(\Ca_\Lb\) as follows.
- The objects of \(\Ca_\Lb\) are the objects of \(\Ca\).
- Given objects \(a,b:\Ca_\Lb\), the hom interface \(\Ca_\Lb(a,b)\) is defined to be \(\bigl(\Ca(a,b),\Ca(I,a)\bigr)\).
- Identities are given by the lenses \((1,1)\xrightarrow{(\id_a,!)}\bigl(\Ca(a,a),\Ca(I,a)\bigr)\).
- For each triple of objects \(a,b,c:\Ca_\Lb\), define a composition lens \[\bigl(\Ca(b,c),\Ca(I,b)\bigr) \otimes \bigl(\Ca(a,b),\Ca(I,a)\bigr) \xrightarrow{(\circ_{a,b,c},\circ_{a,b,c}^\sharp)} \bigl(\Ca(a,c),\Ca(I,a)\bigr)\] by taking as the forwards component the composition function \(\circ_{a,b,c}\) from \(\Ca\) and defining \[\begin{aligned} \circ_{a,b,c}^\sharp : \Ca(b,c) \times \Ca(a,b) \times \Ca(I,a) &\to \Ca(I,b) \times \Ca(I,a) \\ (g,f,x) &\mapsto (f\circ x, x) \end{aligned} \; .\]
Remark. We expect the foregoing mapping to form part of a pseudofunctor \(\MonCat^\sharp\to\Lens(\Set)\mdash\Cat\), where \(\MonCat^\sharp\) is the 2-category of monoidal categories, monoidal cofunctors, and monoidal cotransformations.
Applying the recipe to the resulting lens-categories \(\Ca_\Lb\) gives us animated categories whose systems may now change their behaviour in response to their history of inputs; in a rudimentary sense, they may learn.
Corollary 3 Each monoidal category \((\Ca,\otimes,I)\) yields an animated category \(\wtilde\Ca\). The objects of \(\wtilde\Ca\) are those of \(\Ca\). The hom-categories \(\wtilde\Ca(a,b)\) are given by \(\Moore\bigl(\Ca(a,b),\Ca(I,a)\bigr)\). Horizontal composition is given by the functors \(\Moore(\circ_{a,b,c},\circ_{a,b,c}^\sharp)\) for each triple of objects \(a,b,c\).
Remark. Indeed, a monoidal category yields more than a plain bicategory \(\wtilde\Ca\): the monoidal structure on \(\Ca\) also lifts to \(\wtilde\Ca\). Moreover, as indicated above, categorical systems theory may prompt us to upgrade from bicategories to double categories. But we won’t deal with these extra structures here.
A 1-cell \(a\to b\) in \(\wtilde\Ca\) is therefore a Moore machine which outputs morphisms \(a\to b\) in \(\Ca\) and updates on the basis of inputs in \(\Ca(I,a)\): that is, the inputs are states of the domain object \(a:\Ca\). We can therefore think of such 1-cells as morphisms in \(\Ca\) that can observe their “invocation history” and update themselves accordingly, with outputs that may depend on some internal state.
In the special case that \((\Ca,\otimes,I) = (\Set,\times,1)\), a 1-cell \(A\to B\) in \(\wtilde\Set\) can be equivalently presented as a triple \((S,\vartheta^o,\vartheta^u)\) where \(S\) is a set, \(\vartheta^o\) is an ‘output’ function \(S\times A\to B\), and \(\vartheta^u\) is an ‘update’ function \(S\times A\to S\). Hence, we can really see that such a “dynamic function” \(A\to B\) is as sketched in the preceding paragraph. (In fact, this presentation may be obtained for a general Cartesian closed \(\Ca\).)
2.2 Polynomial coalgebras and cilia
Sometimes, it is desirable to work with richer structures than lenses and categories. For example, lenses only allow us to formalize “static” wiring of systems, but we may seek wiring that is itself dynamical; for this purpose we may use dependent lenses, a.k.a. polynomial functors (David I. Spivak 2020). Additionally, we may want our dynamical morphisms to have richer dynamics: for instance, they may themselves be ‘bidirectional’, and want inputs in both directions; or we may want transitions that are not merely deterministic and discrete-time. Fortunately, the recipe is flexible enough for all of this.
In this section, we will make use of this flexibility to define Bayesian cilia: animated categories of hierarchical inference processes.
Remark. Some of this material, in a very slightly different form, may be found in more detail in my DPhil thesis (St Clere Smithe 2023a, chap. 6).
We begin by recalling the category \(\Poly(\Set)\) and the notion of polynomial coalgebra.
Example 3 (Dependent lenses & polynomial functors)
- Objects. The objects of the category \(\Poly(\Set)\) are “polynomials of sets” in one variable, which we denote \(y\); for example, \(Ay^B + Cy + D\), where \(A,B,C,D\) are sets. Noting that \(Ay^B\) can alternatively be written \(\sum_{a:A} y^B\), we will follow David I. Spivak (2020) and write a general polynomial \(p\) as \(\sum_{i:p(1)} y^{p[i]}\), where \(p(1)\) is a set indexing the terms of the polynomial and \(p[i]\) is the exponent on the \(i\)th term \(y^{p[i]}\).
- Morphisms. A morphism \(\varphi:p\to q\) is a dependent lens. It consists of a pair \((\varphi_1,\varphi^\sharp)\) where \(\varphi_1\) is a ‘forwards’ function \(p(1)\to q(1)\) and \(\varphi^\sharp\) is a \(p(1)\)-indexed family of ‘backwards’ functions \(\varphi^\sharp_i:q[\varphi_1(i)]\to p[i]\).
- Identities. The identity morphism \(\id_p:p\to p\) is given in the forwards direction by \(\id_{p(1)}\) and, for each \(i:p(1)\), in the backwards direction by \(\id_{p[i]}\).
- Composition. Given morphisms \(\varphi:p\to q\) and \(\psi:q\to r\), their composite \(\psi\circ\varphi:p\to r\) is defined in the forward direction by \(\psi_1\circ\varphi_1:p(1)\to r(1)\) and, for each \(i:p(1)\), in the backwards direction by \(r[\psi_1\circ\varphi_1(i)] \xrightarrow{\psi^\sharp_{\varphi_1(i)}} q[\varphi_1(i)] \xrightarrow{\varphi^\sharp_i} p[i]\).
- Monoidal structure. The monoidal unit is \(y\). The monoidal product is given on objects \(p\) and \(p'\) by taking products of indexing sets and exponents, so that \[p\otimes p' = \sum_{i:p(1), i':p'(1)} p[i]\times p'[i'] \; .\] On morphisms it is similar.
- ‘Wiring’. If we think of a polynomial \(p\) as the interface to a system, then the indexing set \(p(1)\) may be understood as the set of the system’s possible configurations or outputs, and the exponent sets \(p[i]\) as the ‘inputs’ that the system accepts when it is in configuration \(i\). We can therefore think of a morphism \(p\to q\) as a wiring diagram where the mapping from outer inputs and inner outputs to inner inputs may depend on the configurations of the inner system(s); this is what David I. Spivak (2020) means by mode-dependence.
Remark. Note that there is an embedding \(\Lens(\Set)\hookrightarrow\Poly(\Set)\) given by mapping \((A,B)\) to \(Ay^B\).
Just as lenses are generalized by dependent lenses, Moore machines may be generalized by polynomial coalgebras.
Example 4 (Polynomial coalgebra)
A polynomial coalgebra is a pair \((S,\vartheta)\) where \(S\) is a set and \(\vartheta\) is a dependent lens \(Sy^S\to p\). Observe that \(\vartheta\) may alternatively be written as a pair of an output map \(\vartheta^o:S\to p(1)\) with an update map \(\vartheta^u : \sum_{s:S} p[\vartheta^o(s)] \to S\).
- Hence a Moore machine with interface \((O,I)\) and state space \(S\) is a polynomial coalgebra \(Sy^S\to Oy^I\).
Coalgebras over \(p\) form the objects of a category \(\Coalg(p)\). If \(\vartheta:Sy^S\to p\) and \(\vartheta':S'y^{S'}\to p\) are both objects of \(\Coalg(p)\), then a morphism \(f:\vartheta\to\vartheta'\) is a function \(f:S\to S'\) making the following diagrams commute:
(Note that the right-hand diagram can be read componentwise as enforcing the equality \({\vartheta'}^u_{f(s)} = f\circ\vartheta^u_s\) for each \(s:S\).) Morphisms of coalgebras then compose as functions.
\(\Coalg\) constitutes an indexed category \(\Poly(\Set)\to\Cat\). If \(\varphi:p\to q\) is a dependent lens, then the reindexing function \(\Coalg(\varphi):\Coalg(p)\to\Coalg(q)\) acts by post-composition, taking each coalgebra \(\vartheta:Sy^S\to p\) to the composite \(Sy^S \xrightarrow{\vartheta} p \xrightarrow{\varphi} q\). (On morphisms, \(\Coalg(f)\) acts as the identity.)
\(\Coalg\) is additionally lax monoidal, giving a monoidal indexed category \(\bigl(\Poly(\Set),\otimes,y\bigr) \to (\Cat,\times,\mathbf{1})\).
- The laxator \(\lambda\) places systems in parallel, with components \(\lambda_{p,p'} : \Coalg(p) \times \Coalg(p') \to \Coalg(p\otimes p')\) which act by the monoidal product \(\otimes\) of polynomials: given \(\vartheta:\Coalg(p)\) and \(\vartheta':\Coalg(p')\), form their product \(\vartheta\otimes\vartheta'\); on morphisms \(\varphi\) and \(\varphi'\), take their product \(\varphi\times\varphi'\).
- The unit \(\epsilon : \mathbf{1} \to \Coalg(y)\) picks the identity lens on \(y\).
Remark. Objects in \(\Coalg(p)\) are called coalgebras because they can be equivalently defined as coalgebras for the polynomial functor \(p\): it is not a complicated computation to check that a dependent lens \(Sy^S\to p\) is equivalent to a function \(S\to pS\), and that a morphism \(f:(S,\vartheta)\to(S',\vartheta')\) in \(\Coalg(p)\) is a \(p\)-coalgebra homomorphism, a function \(f:S\to S'\) making the following diagram commute.
Now, because \(\Lens(\Set)\) embeds into \(\Poly(\Set)\), we don’t need to recapitulate the development above of \(\widetilde{\Ca}\) — instead, we can use the extra flexibility of \(\Poly(\Set)\) to do some more interesting things.
Firstly, we note that we are not restricted to deterministic systems here. If we have a monad \(M:\Set\to\Set\), then we can work with systems whose update maps have ‘side-effects’ governed by \(M\).
Example 5 (Coalgebras with ‘effectful’ transitions) Given a monad \(M:\Set\to\Set\), we can lift it to a comonad \(W:\Poly(\Set)\to\Poly(\Set)\) as follows. If \(p := \sum_{i:p(1)} y^{p[i]}\) is a polynomial, then \(Wp\) is defined as \(\sum_{i:p(1)} y^{Mp[i]}\). If \(\varphi : p\to q\) is a dependent lens, then \(W\varphi\) has the same forward map but each component \((W\varphi)^\sharp_i\) of the backward map is defined to be \(M(\varphi^\sharp_i)\). The counit of the comonad structure on \(W\) is the identity in the forward direction and the unit of \(M\) in the backward direction; likewise the comultiplication is forwards the identity and backwards the multiplication of \(M\).
We can think of the coKleisli category \(\coKl(W)\) as being a category of dependent lenses with ‘effectful’ feedback governed by \(M\). Its objects are the objects of \(\Poly(\Set)\), but its morphisms \(p\to q\) are dependent lenses \(Wp\to q\); composition is as usual in the forward direction, but given by Kleisli composition according to \(M\) in the backward direction.
A more detailed exposition of this structure is given in Remark 6.2.19 of my thesis (St Clere Smithe 2023a, 240). In that section (§6.2), it is also shown that this construction generalizes beyond monads on \(\Set\), yielding categories \(\Poly(M)\) of polynomials with effectful feedback for monads \(M:\Ea\to\Ea\) on finitely complete categories \(\Ea\). (In the case that \(\Ea=\Set\), then \(\Poly(M) = \coKl(W)\) as above.)
Finally, by instantiating \(\Coalg\) in \(\Poly(M)\), we obtain coalgebraic systems with \(M\)-effectful transitions: componentwise, the update maps are \(M\)-Kleisli morphisms. The indexed category resulting from this instantiation may be denoted by \(\Coalg_M\) — each fibre \(\Coalg_M(p)\) is equivalent to the category of coalgebras for the composite functor \(pM\).
An important class of monadic side-effects are those governed by the distributions monad \(\Da:\Set\to\Set\), which yields stochastic transitions.
The distributions monad takes a set \(X\) to the set \(\Da X\) of finitely-supported probability distributions on \(X\) — the set of functions \(\nu : X\to[0,1]\) such that \(\nu(x) \neq 0\) for only finitely many \(x:X\) and \(\sum_{x: X} \nu(x) = 1\).
Example 6 (Markov decision process) The objects of \(\Coalg_{\Da}\) are generalized partially observable Markov decision problems (POMDPs): an object \((X,\vartheta)\) of \(\Coalg_{\Da}(p)\) consists of a deterministic output map \(\vartheta^o:X\to p(1)\) which produces the ‘observations’, along with a stochastic update or ‘transition’ map \(\sum_{x:X} p[\vartheta^o(x)] \to \Da X\) that takes a state \(x:X\) and a possibly state-dependent action in \(p[\vartheta^o(x)]\) and returns a distribution over states. (Typically a POMDP includes a ‘reward’ signal; but this may be incorporated into the observed outputs in \(p(1)\).)
Other names for generalized POMDPs include open Markov chain, stochastic dependent Moore machine, and stochastic labelled transition system.
Remark. As well as generalizing \(\Coalg\) from deterministic dynamics, we can also generalize from discrete time, to consider time measured by any monoid. This is the content of much of §6.2 of my thesis and my Applied Category Theory 2022 paper (St Clere Smithe 2023c), and the reader may check those sources for the details. With this further generalization, we obtain notions of open Markov process and continuous-time generalized POMDP.
By using \(\Coalg\), we can define more interesting animated categories: we have a single framework that lets us talk about many different kinds of dynamics. In particular, certain categories animated by \(\Coalg_{\Da}\) (for a general probability monad \(\Da\)) provide the semantics for what I have called approximate inference doctrines (St Clere Smithe 2023a, sec. 7.3).
The idea here is that an approximate inference process is abstractly characterized by a Bayesian lens, a pairing of a forwards ‘predictive’ process (or ‘model’) with a backward process that inverts it. Concretely, however, both the prediction and the inversion may require some nontrivial computation that we may describe as a dynamical system. Approximate inference doctrines formalize the translation of the abstract problem into the concrete system as a particular kind of functor, which we might think of as taking a Bayesian lens and animating it. The resulting animated categories are called cilia, because their morphisms are lenses controlled by dynamical systems. Their construction makes use of both the bidirectionality allowed by lenses and polynomials as well as the rich dynamics allowed by \(\Coalg\); as such, the construction exemplifies the ‘abundance’ of \(\Poly\) (David I. Spivak 2020).
Remark. Note that, in general, cilia do not have to be animated Bayesian lenses. We might consider generalized cilia which are any kind of lens or optic (Riley 2018; Capucci 2022), animated appropriately.
We will end this section by defining (Bayesian) cilia, which means first defining Bayesian lenses, which we will do for finitary (discrete) probability only; for more general settings, and for more exposition, the reader may consult chapter 4 of my thesis.
Definition 2 (Bayesian lenses, for finitary probability) Suppose \(X,A,Y,B\) are sets. A Bayesian lens \((X,A)\to(Y,B)\) is given by a pair \((c,c')\) of functions \(c:X\to\Da Y\) and \(c':\Da X\times B\to\Da A\).
Proposition 3 (Bayesian lenses form a category)
- The objects of \(\BLens\) are pairs \((X,A)\) of sets; the morphisms are Bayesian lenses.
- If \((c,c'):(X,A)\to(Y,B)\) and \((d,d'):(Y,B)\to(Z,C)\) are Bayesian lenses, then their composite \((d,d')\diamond(c,c')\) is given by the pair \((d_\ast c, c'_{\bar\ast} d'_c)\), where \((-)_\ast\) denotes ‘pushforward’, so that
- \(d_\ast c\) is the function \[ \begin{aligned} X &\to \Da Z \\ x &\mapsto \sum_{y:Y} d(y)\, c(x)(y) \end{aligned} \]
- \(c'_{\bar\ast} d'_c\) is the function \[ \begin{aligned} \Da X\times C &\to \Da A \\ (\pi,k) &\mapsto \sum_{h\in B} c'(\pi,h)\, d'(c_\ast\pi, k)(h) \end{aligned} \] and \(c_\ast\pi = \sum_{x\in X} c(x)\,\pi(x)\).
- The identity Bayesian lens \((X,A)\to(X,A)\) is given by \((\mathsf{id}_X,\mathsf{proj}_A)\), where \(\mathsf{proj}_A\) is the projection \((\pi,a)\mapsto a\).
Remark. Bayesian lenses are mathematically interesting because their formalize the compositional structure of Bayesian inference — both ‘exact’ (Braithwaite, Hedges, and St Clere Smithe 2023) and ‘approximate’ (St Clere Smithe 2023b) — and hence the associated “chain rules”.
Since \(\BLens\) is a category, we could use the recipes of the preceding section to animate it — but this would be naive about the information flow: a Bayesian lens \((X,A)\to(Y,B)\) takes a forward input from \(X\) and a backward input from \(B\), and produces a forward output in \(Y\) and a backward output in \(A\). To obtain a useful animation, we need to enrich \(\BLens\) to expose this bidirectionality. We will do this by defining a new category \(\BLensP\) enriched in \(\Poly(\Da)\).
Definition 3 (\(\BLensP\))
- The objects of \(\BLensP\) are those of \(\BLens\): pairs \((X,A)\) of sets.
- For each pair of objects \(\bigl((X,A),(Y,B)\bigr)\), the hom polynomial is given by \[ \BLensP\bigl((X,A),(Y,B)\bigr) := \BLens\bigl((X,A),(Y,B)\bigr)\,y^{\Da X\times B} \; . \]
- For each triple of objects \((X,A),(Y,B),(Z,C)\), there is a dependent lens witnessing composition \(\circ\), with type \[ \BLensP\bigl((Y,B),(Z,C)\bigr) \otimes \BLensP\bigl((X,A),(Y,B)\bigr) \to \BLensP\bigl((X,A),(Z,C)\bigr) \; . \] This has a forward component \(\circ_1\) which is simply given by Bayesian lens composition \[ \begin{matrix} \circ_1 & : & \BLens\bigl((Y,B),(Z,C)\bigr) & \times & \BLens\bigl((X,A),(Y,B)\bigr) & \to & \BLens\bigl((X,A),(Z,C)\bigr) \\ & & \bigl( (d,d') & , & (c,c') \bigr) & \mapsto & (d,d') \diamond (c,c') \end{matrix} \] and, for each such pair \((d,d'),(c,c')\) of Bayesian lenses, a stochastic backward component \[ \begin{matrix} \circ^\sharp_{(d,d'),(c,c')} &: & \Da X & \times & C & \to & \Da\bigl(\Da Y \times C \times \Da X \times B\bigr) \\ & & (\pi & , & k) & \mapsto & \delta_{c_\ast\pi} \otimes \delta_k \otimes \delta_\pi \otimes d'(c_\ast\pi, k) \end{matrix} \; . \] Here, \(\delta_x\) represents a Dirac delta distribution on \(x\) and \(\otimes\) denotes the formation of a joint distribution. Hence we think of \(\circ^\sharp\) as taking the forwards and backwards inputs to a composite Bayesian lens, and returning the forwards and backwards inputs to the constituent lenses. The forwards input for \((d,d')\) is thus sampled from the output of \(c\) (\(c_\ast\pi\)); and the backwards input for \((c,c')\) is sampled from the output of \(d'\), in its compositional context.
- For each object \((X,A)\), we also have an ‘identity’ dependent lens \[ \id_{(X,A)} : y \to \BLensP\bigl((X,A),(X,A)\bigr) \; . \] Its forward component is a function of type \[ 1 \to \BLens\bigl((X,A),(X,A)\bigr) \] which simply picks the identity Bayesian lens \((X,A)\to(X,A)\). Its backward component is trivial.
Proposition 4 \(\BLensP\) is a well defined category enriched in the monoidal category \((\Poly(\Da),\otimes,y)\).
Proof. This is established, in greater generality, by §6.3.1 of my thesis.
Finally, to obtain the bicategory of Bayesian cilia, we follow the process of animation and change the base of \(\BLensP\) along \(\Coalg_{\Da}\).
Definition 4 (Bayesian cilia) The bicategory \(\BCilia\) of (finitary) Bayesian cilia is the \(\Coalg_{\Da}\)-animation of \(\BLensP\).
A 1-cell in \(\BCilia\) is then a POMDP whose outputs are Bayesian lenses and whose inputs are the lenses’ inputs. That is, if \((S,\vartheta):(X,A)\to(Y,B)\) is such a 1-cell, then it is determined by:
- a state space \(S:\Set\);
- a forward output map \(\vartheta_1:S\times X\to\Da Y\);
- a backward output map \(\vartheta':S\times\Da X\times B\to \Da A\); and
- an update map \(\vartheta^u:S\times\Da X\times B\to\Da S\).
Bayesian cilia then compose first by taking the product of state spaces using the monoidal structure of \(\Coalg_{\Da}\) and then by applying the composition laws of \(\BLensP\). More specifically, the composition functors \[ \BCilia\bigl((Y,B),(Z,C)\bigr) \times \BCilia\bigl((X,A),(Y,B)\bigr) \to \BCilia\bigl((X,A),(Z,C)\bigr) \] are obtained by applying \(\Coalg_{\Da}\left(\circ_1,\circ^\sharp\right)\) after the laxator of the monoidal structure of \(\Coalg_{\Da}\).
Remark. Because \(\BLens\), \(\Poly(\Da)\), and \(\Coalg_{\Da}\) are all monoidal, the bicategory \(\BCilia\) inherits a monoidal structure, allowing dynamical approximate inference systems to be placed in parallel. This structure is exhibited in Definitions 6.3.7 and 6.3.8 of my thesis.
Remark. Of course, none of this is restricted to finitary probability or even to discrete-time dynamics: Bayesian cilia may be defined quite generally (and indeed they are in my thesis).
Let us finally adopt this more general perspective, and consider the cilia that underlie the notion of “dynamic category” (Shapiro and Spivak 2023).
3 From animated to dynamic categories
Dynamic categories were introduced by Shapiro and Spivak (2023) to formalize the compositional structure of dynamically adaptive systems, especially in the case that those systems — like ecosystems or large organizations, or even prediction markets or deep learning algorithms — may have a somehow ‘fractal’ form. In that work, dynamic categories are defined to be categories enriched in a certain category of dynamical systems called \(\Org\), but the notion of animated categories allows us to generalize this, because \(\Org\) is not much more than a certain category of cilia.
Remark. Strictly speaking, the \(\Org\) of Shapiro and Spivak (2023) is a double category. Here, we will work only with its underlying (horizontal) bicategory; but as noted above, I expect that the notions of animated category and cilia will extend naturally to a double-categorical setting.
The construction of \(\Org\) begins with the observation that \(\Poly(\Set)\) is enriched in itself: it is a closed category. This means that \(\Poly(\Set)\) may be animated by polynomial coalgebras; and since the morphisms of \(\Poly(\Set)\) are lenses, one thus obtains a bicategory of cilia.
Definition 5 (Self-enrichment of \(\Poly(\Set)\)) We can consider \(\Poly(\Set)\) as enriched in itself as follows.
- The objects are of course the same: polynomials of sets.
- For each pair of polynomials \((p,q)\), there is an internal hom polynomial \[ [p,q] := \sum_{\varphi:\Poly(\Set)(p,q)} y^{\sum_{i:p(1)} q[\varphi_1(i)]} \] whose indexing set is the hom set \(\Poly(\Set)(p,q)\) and whose exponents are the bidirectional inputs for the dependent lenses within.
- For each triple of polynomials \(p,q,r\), there is a dependent lens witnessing composition \[ \circ : [q,r] \otimes [p,q] \to [p,r] \] with forward component given simply by composition in the category \(\Poly(\Set)\) \[ \Poly(\Set)(q,r) \times \Poly(\Set)(p,q) \to \Poly(\Set)(p,r) \] and backward components given by \[ \begin{matrix} \circ^\sharp_{\psi,\varphi} & : & \sum_{i:p(1)} & r\bigl[\psi_1\left(\varphi_1(i)\right)\bigr] & \to & \sum_{(j,i):q(1)\times p(1)} & r\bigl[\psi_1(j)\bigr] & \times & q\bigl[\varphi_1(i)\bigr] \\ & & (i, & z) & \mapsto & \bigl(\varphi_1(i),i, & z, & & \psi^\sharp_{\varphi_1(i)}(z)\bigr) \end{matrix} \; . \]
- For each polynomial \(p\), there is an identity dependent lens \(y \to [p,p]\), defined in the forward direction by the function \(1\to\Poly(\Set)\) picking the identity lens \(\id_p\), and trivial in the backward direction.
Remark. That this gives a well-defined \(\Poly(\Set)\)-enriched category follows from the facts that \([-,=]\) defines an internal hom for \(\Poly(\Set)\) (David I. Spivak and Niu 2021, sec. 4.5) and that every closed category is enriched in itself.
Animating \(\Poly(\Set)\) (so enriched) using \(\Coalg:\Poly(\Set)\to\Cat\) yields \(\Org\), the bicategory of cilia internal to \(\Poly(\Set)\).
Corollary 4 (\(\Org\)) The objects of the bicategory \(\Org\) are polynomials. A 1-cell \(p\to q\) in \(\Org\) is a coalgebra \((S,\vartheta) : Sy^S \to [p,q]\); a 2-cell is thus a coalgebra homomorphism. Horizontal composition is by placing systems in parallel (via the monoidal structure on \(\Coalg\)) and then applying \(\Coalg(\circ)\) (where \(\circ\) denotes the enriched composition operator).
Now, recall that a monoidal category \((\Ma,\otimes,I)\) may be seen equivalently as a one-object bicategory \(\deloop\Ma\), its delooping: the objects of \(\Ma\) become the 1-cells of \(\deloop\Ma\); the morphisms of \(\Ma\) become the 2-cells; and horizontal composition in \(\deloop\Ma\) is given by the monoidal structure \((\otimes,I)\). In the same way, the notion of enrichment may be “horizontally categorified” from monoidal categories to bicategories.
Definition 6 A category \(\Cs\) enriched in a bicategory \(\deloop\) consists of
- a collection of objects \(\Cs_0\);
- for each object \(a:\Cs\), a 0-cell \(p_a:\deloop\);
- for each pair of objects \((a,b)\), a 1-cell \(\Cs(a,b):p_a\to p_b\) in \(\deloop\);
- for each object \(a\), a 2-cell \(\mathit{id}_a : \id_{p_a} \Rightarrow \Cs(a,a)\) witnessing identity;
- and, for each triple of objects \((a,b,c)\), a composition 2-cell \[ \circ_{a,b,c} : \Cs(b,c)\otimes\Cs(a,b) \Rightarrow \Cs(a,c) \] where \(\otimes\) denotes horizontal composition in \(\deloop\)
such that standard axioms of unitality and associativity are satisfied.
Remark. Note that, in the case that \(\deloop\) is indeed the delooping of a monoidal category \(\Ma\), enrichment in \(\deloop\) reduces to enrichment in \(\Ma\).
Following the recipe of this article, animated categories are in fact bicategories, and so we may consider categories enriched in them accordingly. In particular, \(\Org\) is thus a bicategory — and a category enriched in \(\Org\) is a dynamic category in the sense of Shapiro and Spivak (2023). Using more general animated categories, and in particular different categories of cilia, we may then obtain more general notions of dynamic category.
Definition 7 A dynamic category is a category enriched in the bicategory \(\Org\). Thus, a generalized dynamic category may be any category enriched in a bicategory of cilia (or other animated category); accordingly, an \(\Org\)-enriched category is \(\Org\)-dynamic.
Unpacking this definition in the \(\Org\) case, we find that a dynamic category \(\Ds\) is given by a collection of objects \(\Ds_0\), an assigment of a polynomial \(p_a\) to each object \(a\), coalgebras \(\Ds(a,b):S_{ab}\,y^{S_{ab}} \to [p_a,p_b]\) for each pair of objects \((a,b)\), and coalgebra homomorphisms witnessing the compositional structure. This is indeed precisely the data of Definition 3.1 of Shapiro and Spivak (2023).
More intuitively, a dynamic category describes how a collection of parts (‘objects’) comes together (‘composes’), and how this pattern of interaction may change (‘dynamics’). The pattern of interaction of a dynamic category is, unsurprisingly, that of a category: for each pair of parts, there is a system governing the dynamics of their possible interactions, and these systems compose one after another — which we might think of as witnessing a “hierarchy of abstraction of the parts”.
Remark. More generally we may consider, as Shapiro and Spivak (2023) do, other compositional patterns of interaction: for example, we may wish to place interactions side by side, for which we may seek dynamic monoidal categories. Since \(\Poly(\Set)\) and \(\Coalg\) are both monoidal, \(\Org\) inherits a monoidal structure \((\otimes,\id_y)\): this acts to put systems in parallel, so that they emit tensor products of morphisms; its unit is the system with trivial state space that emits the identity \(\id_y\) on the monoidal unit \(y\) in \(\Poly(\Set)\). We can use this to construct dynamic monoidal categories — but we needn’t stop at monoidal structures, as long as the underlying animated category is sufficiently richly structured.
In an upcoming article, I will show how animated and (generalized) dynamic categories may be used to describe the patterns of interaction implied by the theory of the “Bayesian brain” (Knill and Pouget 2004) known as predictive coding (Friston and Kiebel 2009).
References
Footnotes
A locally small category is a category \(\Ca\) in which every hom set \(\Ca(a,b)\) is a ‘proper’ set, meaning an object of \(\Set\). In other words, a locally small category is a category enriched in \(\Set\).↩︎
Alternatively, we could ask for \(\Ca\) to be a “pointed object” in \(\Cat\), hence equipped with a distinguished object \(I:\Ca\).↩︎