Current research
This page outlines some of my present research questions. Many of them are elaborated in more detail in the final chapter of my DPhil thesis (St Clere Smithe 2023a, chap. 8). They are held together by the common aim of seeking adequate mathematics for understanding life.
Please get in touch if anything here likewise interests you: I am looking for collaborators and students!
I am also interested in computer implementation, particularly in the context of AlgebraicJulia and in relation to compositional approximate and active inference.
Spatial systems theory and ‘ergogenesis’
Categorical systems theory (Myers 2022) tells us how interacting systems come together and how they may influence and relate to each other, but it mostly lacks spatial content: in our world, systems have spatial extent and their interfaces typically have geometry; moreover, their internal structure may be spatial.
- Can we profitably extend categorical systems theories from fibrations to stacks?
- Which physical systems fit into the resulting framework?
- Does this class include Bayesian mechanics (Ramstead et al. 2022)?
- Can we use sheaf-theoretic tools such as cohomology to understand when spatial systems agree about their states or their environment?
- Might this help us design intelligent systems which seek consensus?
- Does the class of such systems include multi-agent active inference?
- Might we hence obtain a compositional account of morphogenesis (perhaps extending K. Friston et al. 2015)?
- More generally, we may thus seek an account not only of the emergence of form, but also of function: a process which one might call ergogenesis.
Compositional inference and Bayesian deep learning
Compositional approaches promise new machine learning tools that are at once more supple (with respect to ease-of-use) and more subtle (with respect to structures implicit in data). Here are some questions on my mind.
On factor graphs and message-passing:
- Factor graphs have an elegant double-categorical structure.
- Can we use this to generalize or improve message-passing algorithms (cf. Wainwright and Jordan 2007, sec. 2.5.1; Yedidia, Freeman, and Weiss 2002; Şenöz et al. 2021)?
- How might this relate to a sheaf-theoretic approach (such as that of Peltre 2020), or to rewriting?
- Is there a type-theoretic perspective on message-passing?
- Does message-passing fit into the framework of spatial systems theory?
- Whereas factor graphs are undirected models, directed models have various compositional formalizations (e.g. Lorenz and Tull 2023).
- What are the relationships between them?
- Are undirected models to directed models as relations are to functions?
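For readers unfamiliar with message-passing, here is a minimal sketch of the standard sum-product algorithm on a chain-shaped factor graph, checked against brute-force marginalization. It is only the textbook algorithm, not the double-categorical generalization asked about above; the function names and the example factors are hypothetical and chosen purely for illustration.

```python
from itertools import product


def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def chain_marginal_last(unaries, pairwise):
    """Sum-product on a chain: pass messages left to right and
    return the marginal of the last variable.

    unaries[k][x]    -- unary factor on variable k
    pairwise[k][a][b] -- factor linking variable k (value a) to k+1 (value b)
    """
    message = [1.0] * len(unaries[0])  # nothing arrives from the left of the first variable
    for k, psi in enumerate(pairwise):
        # Variable-to-factor message: local factor times incoming message.
        belief = [u * m for u, m in zip(unaries[k], message)]
        # Factor-to-variable message: sum out variable k.
        message = [
            sum(belief[a] * psi[a][b] for a in range(len(belief)))
            for b in range(len(unaries[k + 1]))
        ]
    return normalize([u * m for u, m in zip(unaries[-1], message)])


def brute_force_marginal_last(unaries, pairwise):
    """The same marginal, by summing the full joint over every configuration."""
    sizes = [len(u) for u in unaries]
    totals = [0.0] * sizes[-1]
    for xs in product(*(range(n) for n in sizes)):
        weight = 1.0
        for k, x in enumerate(xs):
            weight *= unaries[k][x]
        for k, psi in enumerate(pairwise):
            weight *= psi[xs[k]][xs[k + 1]]
        totals[xs[-1]] += weight
    return normalize(totals)


# Hypothetical example: three binary variables in a chain.
example_unaries = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]
example_pairwise = [
    [[0.9, 0.1], [0.3, 0.7]],
    [[0.8, 0.2], [0.4, 0.6]],
]
```

On a chain (or any tree) the two functions agree up to floating-point error; message-passing simply reorganizes the sum so that each variable is eliminated locally.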
On predictive coding and generalized deep learning:
- Predictive coding is a scheme for inverting certain directed models that may be understood as deep learning in an information geometry.
- How far can this analogy be pushed?
- Is there a satisfactory (weak) dynamic category (Shapiro and Spivak 2023) for predictive coding in this sense?
- How flexible can we make the topology of predictive coding systems (cf. Salvatori et al. 2023, sec. 3)?
On structure and learning:
- Can we usefully formalize the notion of structure-learning using the idea of base change?
- We distinguish structure-learning (learning structure) from structured learning (St Clere Smithe 2023a, sec. 8.1.2).
- Does a stochastic or fuzzy generalization of functorial database theory (Spivak 2012) help us understand both, or either?
- Can we understand generative models as stochastic database instances?
Dynamics and coalgebra
By definition, adaptive systems respond to change, which means we should have some idea of what ‘change’ means: this is the purview of dynamics. In computer science, coalgebra has emerged as a powerful formalism for studying computational dynamics (Jacobs 2017), and it seems there is much scope for cross-pollination between coalgebra and traditional approaches to dynamics, especially where the dynamical systems of interest have information content, as all living systems do.
Can we instantiate compositional inference in a dynamic setting? Which setting is best?
- Does the Kalman filter have a simple expression here? It should!
- How does the resulting account of filtering relate to variational filtering (K. J. Friston, Trujillo-Barreto, and Daunizeau 2008) in “generalized coordinates” (Da Costa et al. 2021), or indeed to other methods such as BFFG (van der Meulen 2022)?
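As a point of reference for the Kalman filter question above, the following is a minimal sketch of one step of the classical scalar Kalman filter, viewed as exact Bayesian inversion in a linear-Gaussian model. It assumes dynamics x' = a·x + w with w ~ N(0, q) and observations y = h·x' + v with v ~ N(0, r). The function name and parameter names are hypothetical, and this is the textbook recursion rather than the compositional expression being sought.

```python
from typing import Tuple


def kalman_step(
    mean: float,
    var: float,
    y: float,
    a: float = 1.0,  # state transition coefficient (assumed scalar)
    q: float = 0.0,  # process noise variance
    h: float = 1.0,  # observation coefficient
    r: float = 1.0,  # observation noise variance
) -> Tuple[float, float]:
    """One predict-update step: Gaussian prior N(mean, var) to posterior."""
    # Predict: push the prior through the linear dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: Bayesian inversion of the Gaussian observation channel.
    gain = var_pred * h / (h * h * var_pred + r)
    mean_new = mean_pred + gain * (y - h * mean_pred)
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new
```

For example, with a standard normal prior, static dynamics, and unit observation noise, observing y = 2 moves the mean halfway to the observation and halves the variance: `kalman_step(0.0, 1.0, 2.0)` returns `(1.0, 0.5)`.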
Coalgebras for a polynomial functor over \(\mathbf{Set}\) can be understood as deterministic discrete-time open dynamical systems, and their category forms a topos (Spivak 2022, sec. 3.1). It is possible to generalize this construction to continuous-time systems (St Clere Smithe 2023b).
- Do the resulting categories also form topoi?
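To make the reading of polynomial coalgebras as open dynamical systems concrete, here is a minimal sketch in the discrete-time case. A coalgebra S → p(S) for a polynomial p is split into an output map (choosing a position of p) and an update map (taking a direction at that position to the next state). The class, field, and function names are hypothetical, and the set of admissible inputs is represented explicitly so that it can depend on the current output.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

S = TypeVar("S")  # states
O = TypeVar("O")  # outputs: positions of the polynomial
I = TypeVar("I")  # inputs: directions at a position


@dataclass(frozen=True)
class Coalgebra(Generic[S, O, I]):
    """A map S -> p(S), presented as an output map plus an update map."""
    output: Callable[[S], O]          # the position exposed by a state
    inputs: Callable[[O], frozenset]  # the directions available at a position
    update: Callable[[S, I], S]       # next state, given an admissible input


def run(system: Coalgebra, state, input_stream: Iterable) -> List:
    """Drive the system with inputs, returning every output observed."""
    outputs = []
    for i in input_stream:
        o = system.output(state)
        outputs.append(o)
        if i not in system.inputs(o):
            raise ValueError(f"input {i!r} is not admissible at output {o!r}")
        state = system.update(state, i)
    outputs.append(system.output(state))
    return outputs


# Hypothetical example: a counter that saturates at 3.
# At output 3 the only admissible input is "reset", so the
# input set genuinely depends on the output.
counter = Coalgebra(
    output=lambda s: s,
    inputs=lambda o: frozenset({"reset"}) if o == 3 else frozenset({"tick", "reset"}),
    update=lambda s, i: 0 if i == "reset" else s + 1,
)

trace = run(counter, 0, ["tick", "tick", "tick", "reset", "tick"])
# trace == [0, 1, 2, 3, 0, 1]
```

When every position has the same set of directions, this reduces to a Moore machine; the dependence of inputs on outputs is what the general polynomial adds.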
What is the relationship between the topoi of coalgebras and topoi of interval sheaves (Schultz, Spivak, and Vasilakopoulou 2020)?
What is the relationship between (i) the internal language of a topos of coalgebras, (ii) coalgebraic logic (Kurz 2006), and (iii) temporal type theory (Schultz and Spivak 2019)?
Is the abstract relationship between Markov processes and random dynamical systems akin to randomness pushback (Fritz 2019, Definition 11.19)?
Compositional active inference
It is natural to think of agent-environment interactions as constituted by two dual processes: one being perception; the other being action. Active inference explains both of these using approximate Bayesian inference: perception involves an agent changing its internal model, while action involves the agent changing its environment; in both cases, the aim is to minimize surprise or prediction error, given beliefs about how the world should be.
Compositional active inference (St Clere Smithe 2021b, 2022a) is a category-theoretic account of active inference and its compositional structure, which aims to expose the fundamental structures underlying multi-agent systems, from multicellular life to ecosystems and economies. It is a work in progress, grounded in my DPhil work and expected to fit into the frameworks of categorical cybernetics (Capucci et al. 2022; St Clere Smithe 2021a) and categorical systems theory. Since it touches on most of the topics on this page, there are many open questions; specific to active inference itself are the following.
Does multi-agent active inference yield a class of spatial systems theories?
- Can we use it to study self-organizing or otherwise cooperative complex systems?
- Can we see systems using action to reach consensus — can they persuade each other?
What is the type of a plan (Botvinick and Toussaint 2012)?
How does the systems theory of active inference relate to a putative systems theory of reinforcement learning?
More generally, can we express the claimed universality of active inference (Parr, Da Costa, and Friston 2019) as a universal property — perhaps using adjunctions of systems theories?
‘Géo-logie’: topos theory to map the universe
There is a profound relationship between topology and logic, exhibited in topos theory and yielding what Anel and Joyal (2020) have called ‘topo-logie’. By the Curry-Howard-Lambek correspondence, this relationship can be extended to computation, and the resulting trinity underlies homotopy type theory (Univalent Foundations Program 2013) and the dependently typed programming languages which instantiate it.
At the same time, there is evidence that the mammalian brain represents its environment with a cognitive map (Behrens et al. 2018), even when that environment is quite abstract (Garvert, Dolan, and Behrens 2017; Bernardi et al. 2018). Following the “logic of space” (Shulman 2017), every category can be seen as an abstract directed space (Grandis 2009). This suggests a formal approach to understanding the cognitive map and its apparent generality: the brain embodies a topos that represents its model of the world.
However, missing from such an account would be geometry. The ability to measure distances is important for inference and learning, just as it is for measuring the effort involved in moving. Extending topo-logie to a geometric context may yield géo-logie, and prompt questions such as the following.
- What is the computational or type-theoretic content of geometric structure? Dually, what is the physical meaning of (stochastic) computation?
- Ideas from information geometry may be relevant here — but perhaps we can follow the classical trinity and use Kolmogorov complexity to be more explicitly computational (Wolpert 2019).
In such a framework, it seems likely that we can understand (directed) generative models semantically as “stochastic sections”, and that this might supply a topos-theoretic account of agents’ internal models (their internal universes).
- What is the syntax of stochastic sections, or of ‘metric’ topoi more generally? How is it related to linear type theory?
- Can we express inference naturally in this type theory?
- Does this yield connections to, or a suitable formal home for, the quantum free energy principle (Fields et al. 2022)?
- Do the resulting model universes bear any relation to topos-theoretic models of (aspects of) the physical universe?
Given a topos-theoretic account of agents’ models, we may return to multi-agent systems and active inference.
Do spatial systems theory and quantitative type theory supply the formal tools for an abstract account of open Bayesian-mechanical systems, ‘particular’ physics (K. Friston 2019; K. Friston et al. 2022), and hence the physics of life?
Is ‘navigation’ a viable proof strategy? Do we find here an abstract cognitive map?
Biosemiotics, metaphysics, and life
Living systems are necessarily open systems, in interaction with their environments. Biosemiotics interprets every process of interaction of a living system as a process of communication (Barbieri 2007): a linguistic process. Pursuing this idea leads to a number of metaphysical-mathematical questions.
Can we understand message-passing for approximate inference semiotically?
Can we understand neural spiking activity (Doursat 2013) as (possibly higher-order) message-passing?
Can we give a ‘local’ compositional account of fundamental physics (or a toy model) via semiosis / message-passing?
- Might the objective universe obtain as the colimit of observers’ internal universes? (Conflict occurs where no consensus may be reached.)
What is a ‘symbol’ or a ‘message’, mathematically? Both are observer-dependent. Does the concept of ‘thing’ admit a similar description?
- Perhaps: observer-dependence comes from working relative to an object (e.g. the base of a topos, or the type-theoretic context); a message is a generalized state; an interpreter is a costate; a symbol is anything that may be substituted.
Can we think of computation as “the dynamics of semiosis” (St Clere Smithe 2023a, sec. 8.4.3)? (The idea being that computation is a ‘meaningful’ dynamical process.)
Amongst open systems, living systems are characterized not only by their propensity to resist disorder (survival), but also by their propensity to remake their environment in their image (autopoiesis). Autopoiesis may be a strategy for survival, but together these properties produce ecosystems: fractal systems of systems, across spatiotemporal scales.
- Do the mathematical tools discussed above allow us to characterize this process mathematically, perhaps following St Clere Smithe (2022b)?
- Can we see messages themselves as ‘proto-living’ systems, like memes or viruses?
- What structure must the substrate have for this to work?
- Does answering these questions indicate how, by a process of consensus-seeking, living systems themselves form the substrate for supervenient meta-systems such as societies or corporations?
- (Presumably, natural selection and the resulting drive to survive push systems to self-organize in this way.)
- Can we use this new understanding to design robust self-organizing systems, or to control biology? To guide “ecosystems of intelligence” (Karl J. Friston et al. 2022) to sustainability?