Open factor graphs
Constructing decorated cospan double categories of generalized factor graphs.
\[ \def\Ca{\mathcal{C}} \def\Cat{\mathbf{Cat}} \def\CCat{\mathbb{C}\mathbf{at}} \def\CCospan{\mathbb{C}\mathbf{ospan}} \def\Comon{\mathbf{Comon}} \def\FinSet{\mathbf{FinSet}} \def\FinStoch{\mathbf{FinStoch}} \def\Rel{\mathbf{Rel}} \def\Rr{\mathbb{R}} \def\SSpan{\mathbb{S}\mathbf{pan}} \def\Set{\mathbf{Set}} \def\Vect{\mathbf{Vect}} % \def\disc{\mathsf{disc}\,} \def\id{\mathsf{id}} % \let\op=\relax \def\op{^{\,\mathrm{op}}} % \def\FG{\Bbb{FG}} \]
Figure 1 depicts a factor graph: a graphical representation of the factorization of a function, often (but not necessarily) interpreted as the density of a probability distribution. With this interpretation, Figure 1 represents a function \(f\) of the form
\[f(a,b,c,d,e) = f_0(a,b)\, f_1(a,c)\, f_2(b,c,d)\, f_3(d,e)\, f_4(e)\, f_5(d)\]
where the variables have the types \(a:A,b:B,c:C,d:D\).
Figure 1 is a ‘bipartite’ graph, whose two classes of nodes represent on the one hand the factors (in boxes, the functions \(f_i\)) and on the other the variables (in circles, the types \(A,B,\) and so on). This is the form of factor graph described by Wainwright and Jordan (2007, sec. 2.1.3) and adopted by Wikipedia.
Alternatively, we may observe that the circles serve only to label the edges of the graph, and thus obtain a different depiction of the same factorization, using the form adopted by de Vries and Friston (2017) (amongst other authors):
The variation in the form of factor graphs is in part owed to their informality, but it is notable that the form of Figure 2 begins to approach that of a string diagram in a monoidal category. The formalization of this article indicates that factor graphs may in fact be generally understood as undirected wiring diagrams (Yau 2018, chap. 7), justifying their use in statistics to represent undirected graphical models1. We show this by constructing in Section 3 a double category \(\FG\) of open factor graphs; this is the content of Theorem 1.
1 Factors, categorically
The ‘factors’ of a factor graph are functions into some object of scalars; for example, we might interpret \(f_2\) (above) to have the type \(B\times C\times D \to \Rr_+\), for some spaces \(B,C,D\). Should these spaces be \(\mathbb{R}_+\)-vector spaces, then we might suppose \(f_2\) to be a linear functional or covector on their product. \(\Rr_+\)-vector spaces form a category \(\Vect_{\Rr_+}\) which we can equip with a tensor product \(\otimes\). The 1-dimensional space, isomorphic to \(\Rr_+\) itself but denoted \(I\), constitutes a unit for \(\otimes\) and this makes \(\Vect_{\Rr_+}\) into a monoidal category. Covectors on \(X\) then correspond to morphisms \(X\to I\); more generally, we may call morphisms into the monoidal unit costates (since we call morphisms out of the unit states).
This suggests that we may ‘internalize’ the notion of factor in a monoidal category \((\Ca,\otimes,I)\) simply as the costates in \(\Ca\). However, for this to be a nontrivial notion, we need to be careful to choose \(\Ca\) so that it does actually have nontrivial costates. For example, costates in the monoidal category \((\Set,\times,1)\) are trivial, because \(1\) is the terminal object there; but we could instead choose the monoidal category \((\Rel,\times,1)\) of relations between sets.
Likewise, in categorical probability theory, one often works with a category of measurable spaces and Markov kernels between them, but such Markov categories by definition only have trivial costates (Fritz 2019, Equation 2.5). This is the case, for example, in the category \(\FinStoch\) of sets and finitely-supported conditional probability distributions (i.e., the Kleisli category of the ‘distributions’ monad). In order to find nontrivial costates in categorical probability theory, we need to relax the defining assumption that all morphisms emit normalized distributions. Formally, this means working not with Markov categories, but with copy-discard categories (Cho and Jacobs 2017, Definition 2.2): monoidal categories \((\Ca,\otimes,I)\) in which each object \(X\) is equipped with a canonical comonoid structure \((\delta,\epsilon)\), with the comultiplication \(\delta : X\to X\otimes X\) interpreted as ‘copying’, and the counit \(\epsilon : X\to I\) as ‘discarding’2.
Every Markov category is a fortiori a copy-discard category, and so each example of a Markov category typically embeds into a larger copy-discard category. In the explicitly probabilistic cases, these larger categories may be obtained by weakening the assumption above that every distribution must be normalized; as an example, this gives an embedding \(\FinStoch \hookrightarrow \Vect_{\Rr_+}\). This also captures the ‘deterministic’ case of \((\Set,\times,1)\), which is a Markov category, by its embedding into \((\Rel,\times,1)\), which is a copy-discard category; indeed, we may obtain \(\Rel\) as \(\Vect_2\) (the category of vector spaces over the ring of Boolean truth values) and \(\Set\) as its affine subcategory.
Henceforth, we will assume an ambient copy-discard category \((\Ca,\otimes,I)\) and understand ‘factor’ to mean a costate in \(\Ca\) on a monoidal product of objects.
2 Composition of factors
Missing from traditional treatments of factor graphs is an account of their compositional structure. Notably, the factor graph of Figure 2 may be understood in various ways as the composition of certain ‘open’ subgraphs; for example:
More generally, we could seemingly graft another ‘open’ factor graph onto any of the edges in our graphs above, by introducing the mysterious equality node (as long as the types match). How does this work?
In brief, a factor is a costate, a factor graph is a (to-be-defined) composition of factors, and an ‘open’ factor graph will be a factor graph with some variables exposed. Each factor may itself be understood as an atomic factor graph, and we will see that factor graphs (and hence factors) compose by copy-composition3. It will then come as no surprise that the equality node depicted above represents the canonical comultiplication associated to the indicated type by the copy-discard structure. Packaged together, this will yield a decorated cospan double category in the sense of Patterson (2023).
The main observation underlying the construction was already made above: simply that factors are costates in a copy-discard category. This means that, given factors \(f_0:A\otimes B\to I\) and \(f_2:B\otimes C\to I\), we can compose them first by forming their tensor \(f_0\otimes f_2:A\otimes B\otimes B\otimes C\to I\), and then by pre-composing the copier, to give a costate \(A\otimes B\otimes C\to I\). Using the graphical calculus for monoidal categories, we can represent this composite factor with the following string diagram:
The copying makes explicit that this really is copy-composition, and the object \(B\) remains part of the resulting type. We will see below that the comonoid laws entail that this composition law is appropriately (weakly) unital and associative.
3 A decorated cospan double category
When composing \(f_0\) and \(f_2\) above, we glued together the sets of objects on their domains: we formed the pushout of the span \(\{A\; B\} \leftarrow \{B\} \to \{B\; C\}\) of finite sets, yielding the set \(\{A\; B\; C\}\) of objects in the domain of the composite factor. This suggests that we can use the machinery of decorated cospans (Fong 2015) to formalize this composition: in brief, we will take finite sets and decorate them with factors over (monoidal products of) the objects indexed by those sets.
For technical reasons that we will soon encounter, we won’t be able to use Fong’s original machinery. Instead, we will follow Patterson (2023) in observing that adding compositional decorations to categorical constructions is often a sign of a Grothendieck fibration. And since cospans collect into a double category, this means giving a double fibration (Cruttwell et al. 2022). Making use of the associated “double Grothendieck construction”, this is equivalent to specifying an indexed double category (i.e., a lax double pseudo functor) \(\CCospan(\FinSet)\to\SSpan(\Cat)\), where \(\CCospan(\FinSet)\) is the double category of cospans in \(\FinSet\) and \(\SSpan(\Cat)\) is the double (2-)category of spans in the (2-)category \(\Cat\) (Cruttwell et al. 2022, Construction 3.6); see Patterson (2023, sec. 3.1) for a summary. (Note that we use spans of categories for our decorations, rather than spans of sets: having nontrivial morphisms between factors will allow us to manipulate their structure.)
Definition 1 (Lax double functor) A lax double (pseudo) functor \(F:\Bbb{C}\to\Bbb{D}\) consists of a pair of (pseudo) functors \(F_0:\Bbb{C}_0\to\Bbb{D}_0\) (from the vertical4 category of \(\Bbb{C}\) to that of \(\Bbb{D}\)) and \(F_1:\Bbb{C}_1\to\Bbb{D}_1\) (from the horizontal category of \(\Bbb{C}\) to that of \(\Bbb{D}\)), along with (pseudo) natural transformations \(\mu\) and \(\eta\) witnessing the compatibility of \(F_1\) and \(F_0\) with the double category structures of \(\Bbb{C}\) and \(\Bbb{D}\), as in the following diagrams:
Here, \(\diamond\) denotes the ‘external’ (2-cell) composition of a double category structure and \(I\) its unit, so that, in the context of indexed double categories, \(\mu\) implements the indexed external composition, and \(\eta\) implements its unit; compare the notion of lax monoidal functor.
In our case, \(\Bbb{C} = \CCospan(\FinSet)\). Its vertical category \(\CCospan(\FinSet)_0\) is simply \(\FinSet\), and its horizontal category \(\CCospan(\FinSet)_1\) is the category of diagrams in \(\FinSet\) of shape \(\bullet\to\bullet\leftarrow\bullet\), i.e. the functor category \(\FinSet^{\{\bullet\to\bullet\leftarrow\bullet\}}\). Dually, \(\Bbb{D} = \SSpan(\Cat)\); its vertical 2-category is simply \(\Cat\) and its horizontal 2-category is \(\Cat^{\{\bullet\leftarrow\bullet\to\bullet\}}\).
Consequently, we need to define pseudo functors \(F_0 : \FinSet \to \Cat\) and \(F_1 : \FinSet^{\{\bullet\to\bullet\leftarrow\bullet\}} \to \Cat^{\{\bullet\leftarrow\bullet\to\bullet\}}\), along with pseudo natural transformations \(\mu\) and \(\eta\) of the corresponding types. These structures must all satisfy certain standard coherence conditions5, the details of which the reader may find in Cruttwell et al. (2022, Definition 3.12) or summarized in Patterson (2023, sec. 3.1).
The next three sections construct \((F_0,F_1,\mu,\eta)\) for open factor graphs; to see a summary and avoid the details, skip down to Section 3.4 below.
3.1 Vertical decoration: factors’ interfaces
The pseudo functor \(F_0:\FinSet\to\Cat\) assigns categories of decorations to finite sets. For our purposes, finite sets act to index factors’ interfaces — the sets of objects in their domains — but these two classes of set are not formally the same; we need to make this indexing explicit, and this is the job of \(F_0\).
The interface of a factor is a finite set of objects in \(\Ca\), which is equivalently a discrete diagram in \(\Ca\): a functor \(\disc X\to\Ca\), for some finite set \(X\), where \(\disc X\) is the discrete category on \(X\). However, if we were just to define \(F_0X\) to be the functor category \(\Cat(\disc X, \Ca)\), we would run into problems later with the composition of 2-cells. This is because vertical morphisms — decorated by morphisms in the image of \(F_0\) — may be used to transform the interface types, but the composition of factors proceeds by copying. For this to be compatible with morphisms of interfaces, we must restrict the latter to be comonoid homomorphisms.
There is one further restriction we must make. Because finite sets are not ordered and we plan to define factors over accordingly indexed monoidal products, we must assume that the monoidal structure \((\otimes,I)\) on \(\Ca\) is symmetric. Were we not to do so, two different products of the same interface set may have different (and generally incomparable) sets of factors.
These stipulations made, we define \(F_0:\FinSet\to\Cat\) as follows. On objects \(X:\FinSet\), let \(F_0X\) be the functor category \(\Cat(\disc X, \Comon(\Ca,\otimes,I)\) where \(\Comon(\Ca,\otimes,I))\) denotes the (wide) monoidal subcategory of \(\Ca\) whose morphisms are the comonoid homomorphisms. Note that, because the objects \(\chi\) of \(F_0X\) are functors on a discrete category, a morphism \(\varphi:\chi\to\chi'\) (hence a natural transformation) is simply given by an \(X\)-indexed family of comonoid homomorphisms \(\varphi_x:\chi(x)\to\chi'(x)\) in \(\Ca\).
Given a morphism \(f:X\to Y\) in \(\FinSet\), we let \(F_0\) return a functor \(F_0X\to F_0Y\) which we denote by \(f_*\) and define as follows. If \(\chi\) is an object of \(F_0X\) then we define \(f_*\chi\) by the mapping \(y\mapsto\otimes_{x:f^{-1}(y)}\chi(x)\) (for \(y:Y\)). Then, if \(\varphi:\chi\to\chi'\) is a morphism of \(F_0X\), we define each \(y\)-component \((f_*\varphi)_y\) of \(f_*\varphi\) to be \(\otimes_{x:f^{-1}(y)}\varphi_x\). Note that this definition yields components of the correct type \[(f_*\varphi)_y : (f_*\chi)(y) = \otimes_{x:f^{-1}(y)}\chi(x) \xrightarrow{\otimes_{x:f^{-1}(y)}\varphi_x} \otimes_{x:f^{-1}(y)}\chi'(x) = (f_*\chi')(y) \; .\] Naturality is trivially satisfied, because \(\chi\) and \(\chi'\) are discrete functors, and the functoriality of the reindexing \(f_*\) follows from the functoriality of \(\otimes\). Likewise, the pseudofunctoriality of \(F_0\) itself follows from the functoriality of inverse images and the functoriality and symmetry of \(\otimes\), so that \(\otimes_{y:g^{-1}(z)} \otimes_{x:f^{-1}(y)} \cong \otimes_{x:(g\circ f)^{-1}(z)}\).
3.2 Horizontal decoration: ‘open’ factors
The pseudo functor \(F_1 : \FinSet^{\{\bullet\to\bullet\leftarrow\bullet\}} \to \Cat^{\{\bullet\leftarrow\bullet\to\bullet\}}\) assigns spans of categories to cospans of finite sets, compatibly with \(F_0\). We interpret a cospan \(A\xrightarrow{a} X\xleftarrow{b} B\) as follows. The apex set \(X\) will be decorated with two kinds of data: first, a category of interfaces suited to factors in \(\Ca\); and second, for each possible interface, a category of factors on that interface (compatibly with morphisms of interfaces). The legs of the cospan \(a\) and \(b\) will be used to expose those parts of the interface which are open to composition, and in this way the objects \(A\) and \(B\) will be decorated by \(F_0A\) and \(F_0B\) respectively; this is one of the indexed-double-categorical coherence conditions (‘well-definition’).
The apex decoration has a peculiar form, with one category of decorating data (the interfaces) further decorated by a family of others (the factors). This is because the apex decorations are themselves obtained by a Grothendieck construction, defined functorially for each finite set.
The first step in this construction is to define the interface categories. We will want the interfaces again to be described by functors from (the discrete category on) finite sets to \(\Ca\), but now, since we have the cospans determining which parts of the interfaces are exposed for factor composition, there may be some parts which remain ‘latent’ — and transformations of these parts are not constrained to be comonoid homomorphisms. Therefore, we can allow the transformations of the latent parts of the interfaces to be arbitrary morphisms in \(\Ca\), as long as the exposed parts remain constrained as before. (It is this part of the construction that means we cannot use Fong’s original decorated cospans: here, the decoration on the apex necessarily depends on the legs, in order that we have comonoid homomorphisms in the right places.)
This line of thought formalizes as follows. Given a cospan \(m:\bigl(A\xrightarrow{a} X\xleftarrow{b} B\bigr)\), we first define an enlarged version of \(F_0X\). We denote this enlargement by \(\tilde{F_0}m\) and construct it as the following limit of categories:
Note that, because the left and right vertical functors in this diagram are embeddings, so is the induced functor \(\tilde{F_0}m\to\Cat(\disc X,\Ca)\). We can therefore understand \(\tilde{F_0}m\) as the wide subcategory of \(\Cat(\disc X,\Ca)\) whose morphisms are those \(X\)-indexed families of morphisms of \(\Ca\) whose \(A\)- and \(B\)-components are comonoid homomorphisms.
The next step is to decorate \(\tilde{F_0}m\) with factors. Given an interface decoration \(\chi:\disc X\to\Ca\), let \(\chi^\otimes\) denote the monoidal product \(\otimes_{x:X}\,\chi(x)\) in \(\Ca\). Likewise, given a morphism of interfaces \(\varphi:\chi\to\chi'\), let \(\varphi^\otimes\) denote the morphism \(\otimes_{x:X}\,\varphi_x:\chi^\otimes\to{\chi'}^\otimes\). Since \(\Ca\) is a category, factors on the interface \(\chi\) form a set \(\Ca(\chi^\otimes,I)\). We can thus define a presheaf \(\underline{F_1}m:\tilde{F_0}m\op\to\Set\) as follows. On objects (interfaces) \(\chi\), we define \(\underline{F_1}m(\chi)\) to be \(\Ca(\chi^\otimes,I)\). On morphisms of interfaces \(\varphi:\chi\to\chi'\), we set \(\underline{F_1}m(\varphi):\underline{F_1}m(\chi')\to\underline{F_1}m(\chi)\) to be the precomposition functor \(\Ca(\varphi^\otimes,I):\Ca({\chi'}^\otimes,I)\to\Ca(\chi^\otimes,I)\) and denote it by \(\varphi^*\). This yields a well-defined presheaf by the functoriality of \(\otimes\) and of precomposition.
The apex decoration returned by \(F_1\) on \(m\) is then given by the Grothendieck construction of \(\underline{F_1}m\), otherwise known as its category of elements, and denoted \(\int\underline{F_1}m\). The objects of \(\int\underline{F_1}m\) are pairs \((\chi,f)\) where \(\chi:\disc X\to\Ca\) is an interface and \(f:\chi^\otimes\to I\) is a factor on that interface. The morphisms of \(\int\underline{F_1}m\) are factorizations of factors through transformations of interfaces. That is to say, a morphism \((\chi,f)\to(\chi',f')\) in \(\int\underline{F_1}m\) is a morphism of interfaces \(\varphi:\chi\to\chi'\) such that \(f = \varphi^* f'\), i.e. making the following diagram commute in \(\Ca\):
This tells us how to decorate the apices of cospans — but, given the cospan \(m\), \(F_1\) needs to return a whole span of categories \(F_0A \xleftarrow{\pi_a} \int\underline{F_1}m \xrightarrow{\pi_b} F_0B\). We obtain the legs of this span through the universal properties of the construction of \(\int\underline{F_1}m\). First, because \(\int\underline{F_1}m\) is obtained by a Grothendieck construction, it is equipped with a projection functor \(\pi_m : \int\underline{F_1}m \to \tilde{F_0}m\) which acts by ‘forgetting’ the factors; i.e. \(\pi_m(\chi,f) = \chi\). Next, the construction of \(\tilde{F_0}m\) as a limit means that it is equipped with canonical projection functors to \(F_0A\) and \(F_0B\):
Therefore, we obtain the requisite span of categories \[F_1m := \bigl(F_0A \xleftarrow{\pi_a} \int\underline{F_1}m \xrightarrow{\pi_b} F_0B\bigr)\] as the composition of this span of limit projections with \(\pi_m\):
We now need to define the action of \(F_1\) on morphisms of cospans. Suppose then that \(m' := \bigl(A'\xleftarrow{a'}X'\xrightarrow{B'}\bigr)\) is another cospan. A morphism \(\phi:m\to m'\) is a triple \((\phi_l,\phi,\phi_r)\) of maps of finite sets making the following diagram commute:
We know from the treatment above that \(F_0\) is functorial, and this yields the action of \(F_1\) on the left and right components \(\phi_l\) and \(\phi_r\). Moreover, the construction of \(\int\underline{F_1}m\) is functorial: first, the construction of \(\tilde{F_0}m\) is functorial in \(m\), by the functoriality of taking limits. Likewise, the Grothendieck construction is functorial, yielding a functor \(\int\underline{F_1} : \FinSet^{\{\bullet\to\bullet\leftarrow\bullet\}} \to \Cat\). The following diagram then commutes by construction:
Consequently, we define the action of \(F_1\) on the morphism of cospans \(\phi\) to return the morphism of spans \(F_1\phi\) whose components are \((F_0\phi_l,\int\underline{F_1}\phi,F_0\phi_r)\), and we see that this action is functorial by construction.
3.3 Factor composition
The functors \(F_0\) and \(F_1\) describe how to attach factors to interfaces, and how these interfaces (and the factors on them) may be transformed by morphisms in \(\Ca\). But they do not tell us about the operation at the heart of the open factor graph construction: the composition of factors, or the horizontal composition of their morphisms. These operations are formalized by the external composition structure \((\mu,\eta)\) with which we now equip \(F\). Let us begin with \(\mu\), which in the present case must be a pseudo natural transformation filling the following diagram:
The external composition \(\diamond_{\CCospan(\FinSet)}\) acts on composable cospans by pushout, taking \(m:\bigl(A\xrightarrow{a}X\xleftarrow{b}B\bigr)\) and \(n:\bigl(B\xrightarrow{b'}Y\xleftarrow{c}C\bigr)\) to the outer cospan in the pushout diagram
(Psuedo functoriality of \(\diamond_{\CCospan(\FinSet)}\) follows from the functoriality induced by the universal properties of colimits.) The composite pseudo functor along the top and right of the above diagram therefore acts on \(m\) and \(n\) to yield the span of categories \[F_0A \xleftarrow{\pi_{\iota_X\circ a}} \int\underline{F_1}(n\diamond m) \xrightarrow{\pi_{\iota_Y\circ c}} F_0C \; .\] Dually, \(\diamond_{\SSpan(\Cat)}\) acts on composable spans by pullback, so that the composite pseudo functor along the left and bottom of the diagram acts to yield the outer span in the following diagram:
Note that, here and throughout this section, \(\pi_m\) and \(\pi_n\) are the projections out of the pullback, rather than the Grothendieck fibrations of the previous section.
In both cases, the feet of the span are \(F_0A\) and \(F_0C\) on the left and right respectively; this must be the case for \(F\) to be a well-defined double functor. Consequently, the component of \(\mu\) at \((m,n)\), denoted \(\mu_{m,n}\), must be a functor \(\int\underline{F_1}m\times_{F_0B}\int\underline{F_1}n\to\int\underline{F_1}(n\diamond m)\) making this diagram commute:
This means that \(\mu_{m,n}\) takes a pair of factors over \(X\) and \(Y\), that are composable by agreeing over \(B\), and returns a composite factor over \(X+_BY\), all while preserving the exposed interfaces over \(A\) and \(C\). Unsurprisingly, we will define \(\mu_{m,n}\) to act by copy-composition.
First, notice that both \(\int\underline{F_1}(n\diamond m)\) and \(\int\underline{F_1}m\times_{F_0B}\int\underline{F_1}n\) are (discrete) fibrations; the former over \(\tilde{F_0}(n\diamond m)\) by definition and the latter over \(\tilde{F_0}m\times_{F_0B}\tilde{F_0}n\), by the universal property of the pullback:
\(\mu_{m,n}\) should therefore correspond to a morphism of fibrations. By the Grothendieck construction6, such a morphism is equivalent to a pair \((\mu^0_{m,n},\mu^1_{m,n})\) of a functor and a natural transformation as in the following diagram:
To define the functor \(\mu^0_{m,n}: \tilde{F_0}m \times_{F_0B} \tilde{F_0}n \to \tilde{F_0}(n\diamond m)\), note that the objects in its domain are pairs \((\chi,\gamma)\) of functors \(\chi:\disc X\to\Ca\) and \(\gamma:\disc Y\to\Ca\) that agree on \(B\). By the universal properties of the coproduct \(X+Y\) and the pushout \(X+_BY\), there must be copairings \([\chi,\gamma]:\disc(X+Y)\to\Ca\) and \([\chi,\gamma]_B:\disc(X+_BY)\to\Ca\) making the following diagram commute; in particular, \([\chi,\gamma]\) must factor through \([\chi,\gamma]_B\):
Since the image of \((\chi,\gamma)\) under \(\mu^0_{m,n}\) must be a functor \(\disc(X+_BY)\to\Ca\), we make the universal definition \(\mu^0_{m,n}(\chi,\gamma) := [\chi,\gamma]_B\). Functoriality follows from the functoriality of colimits.
The corresponding component of \(\mu^1_{m,n}\) must then be a function \[ \mu^1_{m,n}(\chi,\gamma) : \Ca(\chi^\otimes,I) \times \Ca(\gamma^\otimes,I) \to \Ca([\chi,\gamma]_B^\otimes,I) \; .\] Note that \([\chi,\gamma]^\otimes \cong \chi^\otimes \otimes \gamma^\otimes\). We can thus factor \(\mu^1_{m,n}\) as \[ \Ca(\chi^\otimes,I) \times \Ca(\gamma^\otimes,I) \xrightarrow{\otimes} \Ca(\chi^\otimes\otimes\gamma^\otimes,I) \xrightarrow{\sim} \Ca([\chi,\gamma]^\otimes,I) \to \Ca([\chi,\gamma]_B^\otimes,I)\] so that, by Yoneda, we seek a morphism \(\delta_{\chi,\gamma}:[\chi,\gamma]_B^\otimes \to [\chi,\gamma]^\otimes\) in \(\Ca\).
It is here that copying finally enters. For each \(j:X+_BY\), copying induces a morphism \[\delta_{\chi,\gamma}^j:[\chi,\gamma]_B(j) \to \bigotimes_{i:[\iota^X_B,\iota^Y_B]^{-1}(j)}\, [\chi,\gamma](i)\] in \(\Ca\) which is unique up to coassociativity. Note that the codomain of this morphism is, by the pullback constraint, the monoidal product of \([\iota^X_B,\iota^Y_B]^{-1}(j)\)-many copies of the same object \([\chi,\gamma]_B(j)\) so that this copying is well-defined; and when \(j\) is not in the image of \(B\), then this product is only unary, so that in this case, \(\delta^j_{\chi,\gamma}\) is the corresonding identity morphism. Then \[\bigotimes_{j:X+_BY} \delta_{\chi,\gamma}^j : \bigotimes_{j:X+_BY} [\chi,\gamma]_B(j) \to \bigotimes_{j:X+_BY} \bigotimes_{i:[\iota^X_B,\iota^Y_B]^{-1}(j)} [\chi,\gamma](i)\] has the requisite type, since \(\otimes_{j:X+_BY}\, [\chi,\gamma]_B(j) = [\chi,\gamma]_B^\otimes\) and \[\bigotimes_{j:X+_BY} \bigotimes_{i:[\iota^X_B,\iota^Y_B]^{-1}(j)} [\chi,\gamma](i) = \bigotimes_{i:X+Y} [\chi,\gamma](i) = [\chi,\gamma]^\otimes\] both by definition. Hence we define \(\delta_{\chi,\gamma} := \otimes_{j:X+_BY}\;\delta_{\chi,\gamma}^j\).
We need to verify that \(\delta_{\chi,\gamma}\) is natural in \(\chi\) and \(\gamma\). Since identity morphisms are always natural, we only need to check the case that \([\iota^X_B,\iota^Y_B]^{-1}(j)\) has more than one element — and this in turn only obtains for \(j\) in the image of \(B\). By construction, morphisms of interfaces on those components are necessarily comonoid homomorphisms, and these are the morphisms for which \(\delta^j_{\chi,\gamma}\) is natural. Therefore, \(\delta_{\chi,\gamma}\) is always natural in \(\chi\) and \(\gamma\), and since every factor of \(\mu^1_{m,n}\) is accordingly natural, so is \(\mu^1_{m,n}\) itself, as required.
This means that \((\mu^0_{m,n},\mu^1_{m,n})\) constitutes a well-defined morphism of indexed sets, thereby inducing a well-defined morphism \(\mu_{m,n}\) of discrete fibrations. Given factors \((\chi,f)\) and \((\gamma,g)\) that agree over \(B\), the external composition \(\mu_{m,n}\) acts to return \(\bigl([\chi,\gamma]_B,(f\otimes g)\circ\delta_{\chi,\gamma}\bigr)\). We can depict an example of this string-diagrammatically:
Here we have assumed that \(A\) has \(l\)-many elements, \(C\) has \(n\)-many elements, and the coupled interface \(B\) has \(m\)-many elements, with \(B_1\) shared once between \(f\) and \(g\) but \(B_m\) shared an arbitrary number of times, demonstrating copy-composition of higher arity. (We have also assumed that \(X\cong A+B\) and \(Y\cong B+C\), but this need not be the case, as we will see below.)
The naturality of \(\mu_{m,n}\) means that if we were to have \(f'\) and \(g'\) defined by transforming the interface \(B\) along \(\beta\) like so
then copy-composing \(f'\) and \(g'\) along \(B'\) yields the same as copy-composing \(f\) and \(g\) along \(B\) and then transforming the \(B\) interface of the result accordingly:
However, this only works for ‘determinsitic’ transformations \(\beta\): although the rules of open factor graphs do allow us to transform factors using arbitrary morphisms, we can only pull these transformations ‘outside’ the composites when the transformations are comonoid homomorphisms.
We have almost finished defining the external composition structure \((\mu,\eta)\): it only7 remains to define the unit \(\eta\), a pseudo natural transformation as in the following diagram:
The component \(\eta_X\) at each finite set \(X\) is determined by a functor \(\eta_X : F_0X \to \int\underline{F_1}(\id_X)\) as in the diagram
where \(\id_X\) denotes the identity cospan \(X=X=X\) on \(X\). Note that because \(\int\underline{F_1}(\mathsf{id}_X)\) is defined over the identity span, \(\tilde{F_0}(\id_X) \cong F_0X\), and so \(\pi_X\) simply forgets the factor decorations. Moreover, this means that all components of morphisms of interfaces in \(\tilde{F_0}(\id_X)\) are comonoid homomorphisms in \(\Ca\).
We define \(\eta_X\) using the counits of the comonoid structures on the interfaces on \(X\). That is to say, \(\eta_X\) maps an interface \(\chi:\disc X \to \Comon(\Ca)\) to the pair \((\chi, \epsilon_\chi)\), where here \(\epsilon_\chi\) denotes the canonical discarding map \(\chi^\otimes \to I\). The functoriality of \(\eta_X\) follows from the fact that all morphisms in \(F_0X\) are comonoid homomorphisms, so if \(\varphi\) is a morphism \(\chi \to \chi'\) then \(\epsilon_\chi = \varphi^*\epsilon_{\chi'}\). Pseudo naturality of \(\eta\) obtains likewise, and its unitality with respect to \(\mu\) follows from the counitality of the comonoid counits.
At this final point, we should also validate the remaining laws that a lax double functor should satisfy: in particular, the associativity of \(\mu\). However, we shall at this point simply assert that associativity follows from the associtivity of \(\otimes\), the coassociativity of the relevant comonoid structures, and the universal properties of the (co)limits involved in the constructions, leaving a detailed examination of this claim to future exposition.
3.4 Putting it all together
We have now defined a lax double pseudo functor \(F\) — a doubly-indexed double category of factors on interfaces over cospans of finite sets. Let us now summarize the data involved, before we describe the double category \(\FG\) of open factor graphs as a double Grothendieck construction.
- The functor \(F_0\) associates to each finite set a category of interfaces in \(\Ca\) whose morphisms are deterministic transformations of interfaces; and it acts on maps of finite sets by ‘reindexing’ interfaces according to those maps.
- The functor \(F_1\) associates to each cospan a span of categories whose feet are given by \(F_0\) and whose apex is a category (really, discrete fibration) of factors on interfaces over the apex set of the cospan. Transformations of interfaces in the apex category are allowed to be non-deterministic (as long as they have deterministic components over the feet), and thus morphisms of factors are “factorizations along transformations”.
- The external composition \(\mu\) composes factors by copy-composition: pre-composing them with copiers and thus gluing together the composed interfaces to obtain the resulting factor’s domain. It is this that forces morphisms on composable interfaces to be deterministic: gluing morphisms together means pulling them through the copier, which in turn requires them to be comonoid homomorphisms.
- The unit of external composition \(\eta\) attaches canonical discarding factors to exposed interfaces, so that, by comonoid counitality, composing with these factors is the same as not copying in the first place. Again, we find that morphisms on exposed interfaces must be comonoid homomorphisms, as they must disappear when discarded.
To obtain \(\FG\), we follow Patterson (2023) and apply the double Grothendieck construction (Cruttwell et al. 2022, Theorem 3.51) to \(F\).
The objects (0-cells) of \(\FG\) are pairs \((A,\alpha)\) where \(A\) is a finite set and \(\alpha\) is an object of \(F_0A\) — i.e., an interface in \(\Ca\) on \(A\).
A vertical 1-cell \((A,\alpha)\to(A',\alpha')\) is a pair \((f,\varphi)\) of a map \(f:A\to A'\) and a deterministic transformation of interfaces \(\varphi:f_*\alpha\to\alpha'\) — where \(f_* = F_0(f)\) so that \(\varphi\) is a morphism in \(F_0A'\).
A horizontal 1-cell \((A,\alpha)\nrightarrow(B,\beta)\) is a quintuple \((X,a,b,\chi,p)\) where \(\bigl(A\xrightarrow{a}X\xleftarrow{b}B\bigr)\) is a cospan of finite sets, \(\chi\) is an interface on \(X\), and \(p\) is a factor on \(\chi\) — i.e., \((\chi,p)\) is an object of \(\int\underline{F_1}(X,a,b)\) — such that \(\pi_a(\chi,p) = \alpha\) and \(\pi_b(\chi,p) = \beta\).
A 2-cell as in the diagram
is a pair \((f,\varphi)\) where \(f:X\to X'\) is a map of finite sets and \(\varphi\) is a transformation of interfaces \(f_*\chi\to\chi'\), such that \(f_* p = \varphi^* p'\), \(\pi_{a'}(\varphi) = \varphi^l\), \(\pi_{b'}(\varphi) = \varphi^r\) (with \(\varphi^l\) and \(\varphi^r\) deterministic), and such that \((f,f^l,f^r)\) constitutes a morphism of cospans, meaning that \(a'\circ f^l = f\circ a\) and \(b'\circ f^r = f\circ b\).
Vertical 1-cells \((f,\varphi):(A,\alpha)\to(A',\alpha')\) and \((f',\varphi'):(A',\varphi')\to(A'',\varphi'')\) compose to yield \((f'\circ f, \varphi'\circ f'_*\varphi):(A,\alpha)\to(A'',\alpha'')\); the vertical composition of 2-cells is similar. The vertical identity 1-cell on \((A,\alpha)\) is the pair \((\id_A,\id_\alpha)\) of identity morphisms \(\id_A:A\to A\) and \(\id_\alpha:\alpha\to\alpha\). We denote vertical composition by \(\circ\).
Finally, horizontal 1-cells (and 2-cells, horizontally) compose by cospan- and copy-composition, as described in Section 3.3. Horizontal identities are given by the external unit \(\eta\). We denote horizontal composition by \(\diamond\).
In sum, we have proved the following theorem.
Theorem 1 (Open factor graphs) The foregoing data define a pseudo double category \(\FG\).
4 A lax embedding
If \(\Ca\) is a category of transition kernels, then it is possible to build statistical models in \(\Ca\), the compositional construction of undirected models being one of the major motivations for the work presented here. In computational statistics, often more important than the distribution or measure output by a given kernel is its density function. Cho and Jacobs (2017, Definition 8.1) show us how to characterize density functions abstractly.
Definition 2 (Density function) If a morphism \(c:X\to Y\) can be written in the form
for some state \(\mu:I\to Y\) and costate \(p:X\otimes Y\to I\), then we say that \(p\) is a density function for \(c\) with respect to \(\mu\).
If we restrict \(\Ca\) to the subcategory \(\Ca_d\) all of whose morphisms \(c\) are equipped with chosen densities \(p_c\) (with respect to states \(\mu_c\)), then there is a lax embedding of \(\Ca_d\) into \(\FG\). A little more precisely, we first restrict \(\FG\) to the ‘globular’ bicategory \(\FG_1\) obtained by excluding all but the identity 1-cells from \(\FG\). This yields the following proposition (the full proof of which we leave to the reader).
Proposition 1 There is a lax embedding \(p:\Ca_d\hookrightarrow\FG_1\) mapping each object \(X\) of \(\Ca\) to the 0-cell \((1, X)\) and each morphism \(c:X\to Y\) to the factor \(p_c:(1,X)\nrightarrow(1,Y)\). In this context, \(1\) represents the singleton set, and \(X\) represents the functor \(\mathbf{1}\to\Comon(\Ca)\) picking out the object \(X\) (and its identity). The apex of the cospan of \(p_c\) is the 2-element set \(2\), decorated by the functor mapping those elements to \(X\) and \(Y\) along with the factor \(p_c:X\otimes Y\to I\) given by the density \(p_c\). Given also \(d:Y\to Z\), the laxator \(\lambda_{d,c}:p_d\diamond p_c\to p_{d\circ c}\) is given by pulling \(p_d\diamond p_c\) back along the state \(\mu_c\):
Note that in the composite \(p_d\diamond p_c:X\nrightarrow Z\) the intermediate object \(Y\) is not exposed, so this non-deterministic transformation is legitimate.
For \(\Ca_d\) to be a category, we must have density functions for the identity morphisms, and for these to be well-behaved they should behave like ‘cap’ morphisms: counits for the ‘self-dual’ adjunction thus imposed on each object. One then expects each object not only to be equipped with a canonical comonoid structure but also a canonical monoid structure, with the latter compatible with the former. This is to say that, in this case, every object in \(\Ca_d\) should be a Frobenius algebra, and hence \(\Ca_d\) itself becomes a hypergraph category.
This situation obtains for finite-dimensional vector spaces (and indeed, for our earlier example \(\Vect_{\Rr_+}\)), and leads to the formal setting for quantum information theory (Heunen and Vicary 2019). But it does not obtain for general \(\Ca\). In these cases, \(\Ca_d\) will only be a semicategory (category without identities), and the embedding of Proposition 1 will then not be unital.
5 Graphical calculus
In his original presentation of decorated cospans, in which only the apices of cospans are decorated (typically only by sets of data), and which does not extend to a double-categorical structure, Fong (2015) showed that the resulting symmetric monoidal category is always a hypergraph category. Fong and Spivak (2019) then showed that hypergraph categories can in turn always be understood as algebras for a multicategory of cospans, which licenses presenting decorated cospans as (algebras for) undirected wiring diagrams (Yau 2018, chap. 7; Spivak 2013); this has been used (for example) in the computational context of epidemiological modelling.
One may then expect that a similar algebraic story can be told for higher-dimensional decorated cospans of the kind presented here — we should expect at least for the algebras to be valued in \(\Cat\) rather than \(\Set\) — but a proof of this is not presently known to this author. Even so, we can sketch a graphical syntax for open factor graphs, inspired by undirected wiring diagrams.
The first thing to note is that this line of enquiry demonstrates that factor graphs should be understood as composing operadically (multicategorically): that is to say, by nesting; thus, within each factor of a factor graph may be hiding a whole other factor graph! This formalizes the situation sometimes encountered in the literature — e.g. Koudahl, van de Laar, and de Vries (2023, fig. 20) — in which a collection of factors is grouped into a composite factor, sometimes depicted by drawing an extra box around the collection.
To hew more closely to existing undirected wiring diagrams, as well as the standard string diagrams for monoidal categories, we will adopt a syntax in which factors are depicted with circular boxes, exposed variables by emanating wires, and transformation morphisms by rectangular boxes. We will depict comonoid homomorphisms as ‘diamond’ shaped boxes, both to indicate that they commute with horizontal composition \(\diamond\) and also to suggest that they can pierce through the bubble enclosing a factor; this means we will attach wires to their corners. Conversely, we will depict arbitrary morphisms with standard rectangular boxes, attaching wires to their sides.
Here is a simple example of a composite factor \(f'\), constituted by another factor \(f\) with three exposed variables, two of which are composed with transformations, one deterministic (\(\varphi\)) and one not (\(\psi\)):
Thinking statistically, we can interpret exposed variables (dangling wires) as representing observed random variables. We can draw a bubble around the whole of Figure 3 to obtain a new factor with three unobserved variables; nonetheless these variables are observable, which we depict by terminating them with a black dot. As a horizontal 1-cell in \(\FG\), the resulting factor has the type \(f_0:0\nrightarrow 0\), where \(0\) denotes the empty set.
The drawing of the bubble represents a 2-cell \(f_0\Rightarrow f'\). We can think of this 2-cell, being directed from \(f_0\) to \(f'\), as “reaching into \(f_0\)”, pulling observable wires out to expose them as the wires of \(f'\).
We can compose \(f\) with some factor \(g\) (with a compatible interface), repurposing the black dot — or spider, in the terminology of Coecke and Kissinger (2016) — to represent the locus of composition:
(In this way, we can think of a one-legged spider, representing an observable variable, as a locus of potential composition.)
We can reach inside \(g\diamond f'\) to expose the internal wire:
And this gives us another way to understand the deterministic morphisms: as those that can slide past spiders, multiplying on the way into factors. For example, suppose have another composite factor \(h'\) involving \(\varphi\), such as this:
Then we could compose along \(\varphi\) with \(f'\) to obtain \(h'\diamond f'\):
Notice that here we have made use of the ability (but not necessity!) of copy-composition to couple multiple variables together.
If again we expose the composed variable, we can pull \(\varphi\) outside, too:
Notice that, by using spiders, there would be no ambiguity if we omitted the outer bubble: observed variables are those that are not terminated in a one-legged spider; and deterministic transformations “push forward” from exposed variables to multiply through spiders in the direction of their target factors.
We can use non-deterministic morphisms to render observable variables unobservable, by ‘marginalizing’ them out. For instance, here we have marginalized out one of the observed variables of \(h'\) by transforming it with a state \(\nu\):
Here, we have used the standard string-diagrammatic representation of a state in \(\Ca\).
Finally, let us depict the factor graph of Figure 1 and Figure 2 in this language:
Note how similar this depiction is to the non-compositional form of Figure 1: the major change is the replacement of ‘variable’ nodes with (labelled) two-legged spiders, making the distinction between observed and observable variables. Beyond this distinction, the syntax of \(\FG\) tells us precisely what kinds of compositions are allowed — including supplying notions of transformation of factors.
6 Closing comments
This article presented a formalization of the compositional structure of factor graphs in a general copy-discard category \(\Ca\), yielding a double category of decorated cospans \(\FG\). When \(\Ca\) is interpreted as a category of transition kernels suitable for constructing statistical moels, the horizontal 1-cells of \(\FG\) correspond to undirected graphical models in the standard statistical sense.
We were principally concerned here with defining \(\FG\) (and establishing its well-definedness). In a future article, we will consider directed (or causal) graphical models and their connection to undirected models; and later we will study inference on these models as implemented by message passing algorithms.
References
Footnotes
Again, see Wainwright and Jordan (2007, sec. 2.1.3).
The undirectedness captures an important difference with ‘causal’ (directed) models: undirected models are allowed to contain cycles, as evidenced by both Figure 1 and Figure 2.↩︎
The notion of Markov category is equivalent to that of ‘affine’ copy-discard category: a Markov category is a copy-discard category in which the discarding morphims are the components of a natural transformation, which makes the monoidal unit terminal.↩︎
I introduced the notion of copy-composition in a paper at ACT 2023 (St Clere Smithe 2023), and sketched the form presented in this article in a bonus slide of my presentation.↩︎
We take “double category” to mean pseudo double category, oriented such that composition is strict in the vertical category and weak in the horizontal.↩︎
These enforce things like associativity and unitality, and compatibility between the 2-cells and their vertical and horizontal boundaries.↩︎
Strictly speaking, we also need to validate that \(\mu_{m,n}\) is pseudo natural in the cospans \(m,n\). This is easy to confirm but tedious to communicate, so we omit it here.↩︎