Here I discuss a lengthy new paper of mine entitled Towards a Geometry and Analysis for Bayesian Mechanics (preprint arXiv:2204.11900, cited here as [Sak22]). I’ll give a broad overview of the paper, contextualising the results and walking through the thought process. A future post will be devoted to an in-depth look at Sections 4 and 6 of [Sak22].¹ A more collaborative paper will follow this one and say more about this line of enquiry, so I’ll give a glimpse of that here too, though it may also get its own post. Lastly, a preliminary account of these results appeared in this talk. The gauge theory material in Section 6 makes more sense when combined with pictures.

In any case, if I were to file this post under an MSC code, it would probably be 51P05: classical or axiomatic geometry and physics. Allow me to explain why.

In the 1900s, mathematical physics became what we know it as today: an exercise in mathematics, tethered loosely to physical motivations. This is a little too self-deprecating to be true, but it’s correct in the sense that most of modern physics is far more abstract than it was prior to Einstein’s great insights into the geometric nature of relativity; general relativity and the equivalence principle mark a real turn in the way theorists do and think about physics. In particular, our understanding of gravity, quantum fields, strings, and even classical physics has been formulated in purely mathematical terms, especially geometric ones. Suspiciously absent from this list? Thermodynamics. That’s not totally fair either: Kolmogorov, the father of modern probability theory, worked quite a lot in what we might today identify as statistical physics. But the fact remains that the complicated systems described by statistical mechanics, such as thermodynamical systems, condensed matter, and soft matter or biological systems, are largely mysteries, at least from the perspective of axiomatic, mathematical formulations of physics: we have little idea how to characterise these systems in any general, rule-based fashion.

These papers gesture at how we might do just that—take a new physics for self-organising statistical systems, Bayesian mechanics, and put some geometry onto it (and some analysis, too).

To zoom way out, there are really two things going on in this paper: one is the aim, and the other is the method. It’s best read with this dichotomy in mind, since otherwise the point of the paper can seem detached from the techniques used. We aim to place an example of the free energy principle (FEP) on solid mathematical ground, and in doing so, move towards an axiomatic picture of biological physics. This has sort of been done in the 2019 monograph and key prior papers (e.g., [Fri12], [PDCF19]), but they can be a bit rough, especially when it comes to questions about ergodicity, stationarity, approximate Bayesian inference, and solenoidal flows, questions which have been famously kicked about in the literature. So, assuming the FEP answers all our questions about biophysics, this paper seeks to answer some questions about the FEP in a rigorous and well-defined way.

The problem with grounding the free energy principle in solid maths and physics is that free energy itself is sort of mysterious. There’s not a lot of work showing what properties it has, especially not in probability theory, which is key to the questions we wish to answer (if we wish to answer them formally). However, there’s some very nice work out there looking at entropy in a potential, whose maximum is identical to a free energy minimum. And there’s a tonne of formal work on entropy. That motivated me to introduce entropy as the object of interest, rather than free energy, without loss of generality. This equivalence is actually proven throughout Section 4 of the paper, and it turns out we can get everything from the FEP back from entropy in a potential. (This is how we think of duality in physics, cf. the holographic principle, S- and T-duality, and double copy gravity: every calculation has a corresponding item on the other side of the duality.) But there’s a second, more subtle transformation, which I’ve tried to call attention to: entropy in a potential only really makes sense if we look at the potential (and thus entropy) as something on an agent’s states, rather than environmental states. Maybe that seems innocuous enough to you, or maybe you object to that; it turns out we need to do quite a lot of work to make sense of this mathematically, but conceptually it’s quite the same thing. Indeed, we get everything from the FEP back, just through a different lens. This is exemplified in an adjunction, a type of mathematical relationship which always hides some highly non-trivial structure behind an apparently simple pair of opposites. The sequence I’m talking about is evident in Section 4, where we explicitly pass from free energy to an entropy functional on beliefs to a self-entropy under a constraint. Sections 2 and 3 are all the backdrop to that sequence, motivating the techniques used, though not necessarily the paper itself.
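To make the claim that an entropy maximum coincides with a free energy minimum a little more concrete, here is a minimal textbook sketch. It is not the statement proven in Section 4 of [Sak22], and every symbol in it is an assumption introduced for illustration: \(p\) is a density over states \(x\), \(J\) is a potential, \(C\) is the value its expectation is constrained to take, and \(\lambda\) is the associated Lagrange multiplier.

```latex
% Constrained maximum entropy and the matching free energy functional.
% All notation here is illustrative and not taken from [Sak22].
\begin{align*}
H[p] &= -\int p(x)\,\log p(x)\,dx, \\
p^{*} &= \arg\max_{p}\; H[p]
  \quad \text{subject to} \quad \mathbb{E}_{p}[J] = C, \quad \int p(x)\,dx = 1, \\
p^{*}(x) &= \frac{e^{-\lambda J(x)}}{Z}, \qquad Z = \int e^{-\lambda J(x)}\,dx, \\
F[p] &= \lambda\,\mathbb{E}_{p}[J] - H[p]
  = D_{\mathrm{KL}}\!\left[\,p \,\|\, p^{*}\right] - \log Z .
\end{align*}
```

Since the Kullback-Leibler divergence is non-negative and vanishes only when \(p = p^{*}\), the functional \(F\) is minimised by exactly the density that maximises the constrained entropy, with minimum value \(-\log Z\).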

So remember: in [Sak22], we don’t set out to rewrite the FEP. We just need some better leverage to make sense of it, so we transform it a little. To paraphrase a quote by Chris Buckley, we’ve left the front, made gains elsewhere, and returned with new weapons. This is a common enough way of working in pure mathematics, but can be a bit tricky to get right. It’s also tough to read without deconstructing a bit, but hopefully I’ve taken care of that here. One of the ultimate takeaways from this paper is that if one buys into maximum entropy (max ent), and most people do, one automatically buys into some form of the free energy principle. Conversely, whenever one uses the FEP, one gets max ent for free. This duality is the method to our aim. It’s ironic that in trying to make the FEP work, we ignore the FEP, but it’s completely sensible that we end up bypassing it to prove statements about an equivalent structure. Moreover, some would say it’s fitting that to get to biophysics we’ve had to go back to maximum entropy. I’d be inclined to agree, but again: if you are the sort of person to say that, this body of work posits that you have to recognise you’re still secretly in favour of (some simple² form of) the FEP.

A partner to this paper is on the way, which does take all of this new stuff and contextualise it within the FEP. Here is where the duality and the passage to maximum entropy get upgraded from a mere technical tool to an actual statement about the FEP and what it means conceptually. Under Maxwell Ramstead’s careful eye and the collective effort of the entire lab, each of whom is doing some phenomenal work to put this together, we aim to tie the construction more firmly to the FEP. My paper gets away from the FEP in several spots where entropy really takes over (there’s a joke somewhere in there), and it has some unexplored implications for what the future of the FEP might look like, which we really wanted to address. Although not the stated aim, in effect I treat the FEP as an interesting question in biophysics that justifies doing some nice maths. It turns out I don’t need to think about free energy at all, and we’re off. The group’s paper actually focusses on the FEP and relates these new results, derived in the context of entropy, to existing (and hopefully future) results in the FEP.

From the very top, one of the contributions of the paper is to define a simple version of Bayesian mechanics. Under the free energy principle, Bayesian mechanics is the physical theory describing what systems that engage in approximate Bayesian inference do. The approximate Bayesian inference lemma in the monograph (here, upgraded to a theorem) is the observation that systems can be modelled as minima of variational free energy, since this records tautologies about what systems do and how they are perturbed by their environments. I define Bayesian mechanics under the FEP by formulating it in terms of an equivalent structure, maximum entropy. This work is in some sense a sequel to this paper by Da Costa et al.; or, as I remark in Section 2.1, perhaps it’s a prequel.
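For readers who have not met variational free energy before, the standard decomposition behind "approximate Bayesian inference" can be sketched as follows. This is the generic textbook identity rather than the theorem as stated in [Sak22], and the notation is an assumption made for illustration: \(\eta\) denotes external states, \(b\) blanket states, \(q_{\mu}(\eta)\) a belief about external states parameterised by the internal state \(\mu\), and \(p\) a generative model.

```latex
% Variational free energy as a bound on surprisal.
% Notation is illustrative and not taken from [Sak22].
\begin{align*}
F(\mu, b)
  &= \mathbb{E}_{q_{\mu}(\eta)}\!\left[\log q_{\mu}(\eta) - \log p(\eta, b)\right] \\
  &= D_{\mathrm{KL}}\!\left[\,q_{\mu}(\eta) \,\|\, p(\eta \mid b)\right] - \log p(b) \\
  &\geq -\log p(b).
\end{align*}
```

Minimising \(F\) over \(\mu\) drives the belief \(q_{\mu}(\eta)\) towards the posterior \(p(\eta \mid b)\), which is the sense in which a free energy minimum can be read as (approximate) Bayesian inference.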

To give an overview of the paper, the global pieces of the argument basically proceed like this:

I first (S2) summarise three important pieces of information, namely the FEP, maximum entropy, and the duality we are interested in. As I said, a key to the construction is that dual pairs are a special sort of equivalence: dualisation preserves a pair of objects (in fact, an adjunction is a natural isomorphism of Hom sets, if that means anything to you) while inverting the morphisms between the objects themselves. So, by dualising, we’re telling the same story from two totally different points of view. This equivalence is important for us because, by establishing that the duality holds, we prove that the FEP is true whenever max ent is true. That’s good for the FEP, and allows us to go even further by using maximum entropy.

Next, I discuss what self-entropy could possibly mean in this context (S3), giving some philosophical or biophysical backdrop against which I can actually do some maths. Already we have fixed one issue with the FEP: the Markov blanket is nothing to fuss about, because it is ultimately a bit of a tautology, and simply evinces a definition of a system with controlled internal states that couple to any set of external states. Clearly we’re on the right track, so constrained self-entropy is the way to go.

In the beginning of S4, I actually construct the duality. This happens in two steps:

  • First, we go to constrained entropy over beliefs. In doing so, we can prove that systems which constrain themselves perform approximate Bayesian inference on average. This is Theorem 4.1.

  • Secondly, we prove that putting a \(\mu\) everywhere we see an \(\eta\) and vice versa changes nothing about the FEP.

Here, it is the second theorem, Theorem 4.2, which proves the duality: that under the right constraints, constrained self-entropy gives us all of the features of the FEP. There’s some technical work which needs doing in order to get there, but we do get there in the end.

In S4.1, I use this to make some sense of the formal structure of Bayesian mechanics. To paraphrase a quote from Lance Da Costa, this paper focusses on the synchronisation between internal and external, and reformulates it in terms of maximising entropy subject to some particular constraint; since maximising constrained entropy has been extensively studied in the past, the literature in this area can be put in the service of Bayesian mechanics.

In S4.2, I focus on how this illuminates the formal nature of random dynamics under the FEP. In particular, how does this affect our notions of ergodicity in the FEP? To that end I formulate a notion of local ergodicity, which is essentially a stand-in for a probability density that arises from paths being constrained to an attractor. The nice thing is that this makes perfect sense mathematically, and patches some concerns about both ergodicity and attractors under the FEP.

In S4.3 I zoom out a little to think about what this buys us in the context of the philosophy behind the FEP. This section is fairly self-explanatory.

Now, on to S6. In S6 I focus on some more esoteric aspects of both max ent and the FEP. One is related to my previous work on what I’ve called a constraint geometry; the interested reader can see what this is about at an older blog post. Three motifs become important.

In S6.1, I discuss gauge theory, giving a brief and probably too-terse-by-half overview of what a gauge theory is. I also talk a little about my parallel transport results, which get a little bit more attention in the post linked above.

In S6.2, I elaborate on the statement that the target density is ‘essentially a stand-in for a probability density that arises from paths being constrained to an attractor.’ This is given in a long proof, Theorem 6.1. I also discuss the infamous Helmholtz decomposition here, which turns out to be less frightening than it seems: it merely arises from a constraint geometry, and so in this simple case, is nothing but a gauge-symmetric aspect of maximum entropy.³
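As a rough illustration of why the solenoidal part of the Helmholtz decomposition is harmless, here is the standard calculation in its simplest form. It is a sketch under assumptions that are mine rather than the paper’s: the dynamics are a diffusion \(dx = f(x)\,dt + \sqrt{2\Gamma}\,dW\), the matrices \(\Gamma\) (symmetric, dissipative) and \(Q\) (antisymmetric, solenoidal) are constant, and \(J = -\log p^{*}\) is the potential of the target density \(p^{*}\). It does not reproduce Theorem 6.1 or the gauge-theoretic argument of [Sak22]; it only shows that \(Q\) drops out of the stationary density.

```latex
% Assumed setting: dx = f(x) dt + sqrt(2 Gamma) dW, with constant matrices
% Gamma (symmetric, positive semi-definite) and Q (antisymmetric, Q^T = -Q).
% Notation is illustrative and not taken from [Sak22].
\begin{align*}
f(x) &= (Q - \Gamma)\,\nabla J(x), \qquad J(x) = -\log p^{*}(x), \\
-\nabla \cdot \big(f\,p^{*}\big) + \nabla \cdot \big(\Gamma\,\nabla p^{*}\big)
  &= \nabla \cdot \big(Q\,\nabla p^{*}\big)
  && \text{(using } \nabla p^{*} = -p^{*}\,\nabla J \text{)} \\
  &= \sum_{i,j} Q_{ij}\,\partial_{i}\partial_{j}\,p^{*} = 0
  && \text{(antisymmetry of } Q \text{)}.
\end{align*}
```

The left-hand side of the second line is the right-hand side of the Fokker-Planck equation evaluated at \(p^{*}\), so \(p^{*}\) is stationary for any choice of antisymmetric \(Q\). In this simple setting the solenoidal flow changes how the system moves along the level sets of \(J\), but not the density it settles into.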

In S6.3 I discuss generalisations of all this to less simple cases like non-equilibria. We can get some results, but not great ones, so it remains a future task to extend this to genuine non-equilibria.

By the end of it, we have essentially proven what Bayesian mechanics is and what it does. In particular, we have proven that (simple) systems which constrain themselves can be modelled as performing inference against those constraints, and that this constructs an attractor of system-like states, in a primitive version of allostasis-by-perception.

Zooming out a little bit once more, some material of independent interest has been discussed, especially a connection between the FEP and max ent (again, some would say it is not surprising at all that we had to ask questions about max ent to uncover the formal structure of the FEP; I would be inclined to agree, but following that, I’d point out that we did get the FEP back in the end). This places the FEP more transparently within both statistics and statistical physics. By connecting the FEP to max ent and both to biophysics, I believe this will be a more circumspect chapter for the FEP, where the maths is more careful, and the statements, less opaque.

  1. As usual, my motivation for this is along the lines of readability. For interdisciplinary audiences, like the target of this work, mathematical papers can be a bit inscrutable on a first read. That is to say, due to the common structure of papers in maths, they can be a quagmire of technical statements without all the perspective one would like—the idea is, this should help orient readers through the paper. Blogging informally or writing more expository material about one’s results has become common in pure maths for this reason. Outside the proofs, this paper is quite reader friendly, though—the roadmap I’ve given here is nothing more or less detailed than the roadmaps in the paper. 

  2. In a few places I refer to this as a simple form of the FEP. The difference in my eyes is that we’re discussing states and not trajectories, and effective equilibria rather than genuine non-equilibria. I have a very brief paper about this forthcoming—in the next week or so, in fact—which is the precursor to doing the whole thing over again over paths to get to the ‘real’ FEP. Throughout, I find the example of a stone to be an interesting edge case that allows us to test the theory. It’s interesting that this simplified version is most informative about boring things—stones and the like—clearly, there’s more work to be done. 

  3. Perhaps that’s not actually less frightening.