A Working Definition of Bayesian Mechanics (Bayesian Mechanics II)

In a previous blog post, I discussed a paper of mine about the maths behind Bayesian mechanics (cited here as [Sak22]) and mentioned that an additional lengthy paper was on its way about the physics behind Bayesian mechanics (the free energy principle, or FEP, in particular). That paper, joint with collaborators of mine in the VERSES Research Lab, is now available on the arXiv as preprint arXiv:2205.11543 (cited here as [RSH+22]). Here, I’ll go over that paper, which builds on previous work and proposes a definition of Bayesian mechanics. The remainder of the post will be devoted to deconstructing the preface of Section 4 (i.e., up to 4.1) of [Sak22], where I prove the approximate Bayesian inference lemma, a crucial result for Bayesian mechanics as it is spoken of in [RSH+22].

The primary aim of these papers is to define Bayesian mechanics. Work in mathematical physics generally falls into one of two broad classes of question: physically inspired mathematics, or mathematically rigorous physics. The former usually means discovering new and interesting mathematical structures using physical intuition or inspiration, like Donaldson theory. The latter means looking at a physical problem through a comprehensive and exacting mathematical lens, like the ongoing efforts to formulate quantum field theory rigorously. This is actually a good example: we have a collection of phenomenological results about what QFT should do and we know roughly what different facets of it ought to look like, but have no set of axioms, or even a definition, for QFT in four space-time dimensions. We simply don’t know enough about QFT to say how it should look from first principles, and the existing maths is insufficient to make it work non-perturbatively (read: without approximations of some sort). Here, at a high level, we want to get an axiomatic framework for some sort of mathematical biophysics. We can do this by using the structure already put in place by the FEP. However, first, we need to better understand the mathematics of the FEP (its detailed geometric and analytic formulation) and the physics of the FEP (a physics of beliefs, and dually, by beliefs).

More particularly, in these papers, we make an effort to define Bayesian mechanics, foreshadowing a hopeful connection to some sort of rigorous physics of complex systems. In [RSH+22] it is proposed that Bayesian mechanics is a consequence of the principle that surprisal is minimised (and thus, necessarily, that variational free energy is minimised) in precisely the same sense as classical mechanics is a consequence of the principle that the classical Lagrangian is stationary. Consider the following analogy:

The equations of motion yielded by classical mechanics are variants of Newton’s second law, a statement that systems accelerate along force gradients in virtue of exchanging kinetic energy for potential energy.
The equations of motion yielded by Bayesian mechanics are variants of approximate Bayesian inference, a statement that systems track modes or paths in virtue of their states parameterising probabilistic beliefs about some dual object.

The different sorts of mechanical and dynamical theories possible under each application build out a typology or taxonomy of incarnations of the FEP, which has previously been unclear in the literature, due to the conflation of different applications in regimes where they are formally distinct—type errors, if you will. In a similar sense as continuum mechanics is a subtype of Newton’s laws of motion applying to fluids, which is a much different subtype from orbital mechanics (applying to satellite motion), active inference \(\neq\) the free energy principle \(\neq\) the sort of mode-tracking appearing in this paper.

After that is fully accounted for in Sections 2 and 3 of [RSH+22], we re-define mode-tracking in gauge-theoretic terms. Sections 4 and 5 of [RSH+22] are an elaboration on the results in Section 6 of [Sak22], and this elaboration is slightly nicer to the reader than its predecessor, mostly thanks to Brennan Klein’s unrivalled artistic ability. As a mathematician first and foremost, I see gauge theory as geometry; in particular, gauge theory is the study of a particular sort of space called a fibre bundle. Fibre bundles happen to be the natural setting for certain classical field theories for the physics of high energy particles, but that turns out to be irrelevant in the general case. I raise this point in case the geometry and analysis paper was tough going for the reader, to flag that this one is a bit more intuitive (thanks primarily to these figures).

The idea behind this is that we can formulate the Helmholtz decomposition in this re-definition of Bayesian mechanics. In particular, there’s a natural interpretation of the Helmholtz decomposition in gauge-theoretic terms (chasing modes is following a gauge force, a Newtonian language for the equivalent Bayesian statement) and vice-versa; that beginning from the point of gauge theory and getting maximum entropy/the FEP, we get a splitting of flows for free. A simple statement, but be forewarned: this just offloads the difficulty to constructing the fibre bundle and defining a gauge theory for max ent. Nonetheless, once we do so, a splitting exists naturally. In future work I’d like to do this last step more formally, since it would be nice to show that the equations from Friston (and others) arise directly from gauge theory. I’ve sketched out (in a manner of speaking) how that might look and it seems promising; stay tuned for more.
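To make the splitting of flows concrete, here is a minimal numerical sketch of the Helmholtz decomposition in the simplest setting available: a linear diffusion with a Gaussian steady state. It is not the gauge-theoretic construction from [RSH+22]; it only illustrates the form usually quoted in the FEP literature, \(f = (Q - \Gamma)\nabla \mathfrak{I}\) with surprisal \(\mathfrak{I} = -\log p\), under the assumption that \(\Gamma\) and \(Q\) are constant matrices. The precision matrix PI, the matrices GAMMA and Q, and all function names are hypothetical choices made for illustration.

```python
# Helmholtz decomposition for a linear diffusion with a Gaussian steady state.
# Assumptions (illustrative only, not taken from the papers):
#   steady state p(x) proportional to exp(-x^T PI x / 2),
#   GAMMA constant, symmetric, positive definite (dissipative part),
#   Q constant and antisymmetric (solenoidal part).

PI = [[2.0, 0.5], [0.5, 1.0]]      # hypothetical precision matrix
GAMMA = [[0.3, 0.0], [0.0, 0.3]]   # hypothetical diffusion matrix
Q = [[0.0, 0.7], [-0.7, 0.0]]      # hypothetical solenoidal matrix


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def grad_surprisal(x):
    # Surprisal is x^T PI x / 2 (up to a constant), so its gradient is PI x.
    return matvec(PI, x)


def dissipative_flow(x):
    # -GAMMA grad(surprisal): descends the surprisal gradient.
    return [-c for c in matvec(GAMMA, grad_surprisal(x))]


def solenoidal_flow(x):
    # Q grad(surprisal): circulates along level sets of the surprisal.
    return matvec(Q, grad_surprisal(x))


def flow(x):
    return [d + s for d, s in zip(dissipative_flow(x), solenoidal_flow(x))]


def stationarity_residual():
    # Writing flow(x) = -B x with B = (GAMMA - Q) PI, the covariance
    # SIGMA = PI^{-1} is stationary iff B SIGMA + SIGMA B^T = 2 GAMMA.
    det = PI[0][0] * PI[1][1] - PI[0][1] * PI[1][0]
    sigma = [[PI[1][1] / det, -PI[0][1] / det],
             [-PI[1][0] / det, PI[0][0] / det]]
    gamma_minus_q = [[GAMMA[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
    b = matmul(gamma_minus_q, PI)
    left = matmul(b, sigma)
    right = matmul(sigma, transpose(b))
    return max(abs(left[i][j] + right[i][j] - 2.0 * GAMMA[i][j])
               for i in range(2) for j in range(2))


x = [1.0, -2.0]
print("solenoidal . gradient:", dot(solenoidal_flow(x), grad_surprisal(x)))
print("dissipative . gradient:", dot(dissipative_flow(x), grad_surprisal(x)))
print("stationarity residual:", stationarity_residual())
```

The solenoidal part is orthogonal to the surprisal gradient because Q is antisymmetric, while the dissipative part always points down it. The residual check confirms that the chosen Gaussian really is the steady state of this particular flow.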

As for [Sak22], the main result of the first half of the paper is in some sense Theorem 4.2. From the previous blog post: “from the very top, one of the contributions of the paper is to define a simple version of Bayesian mechanics. Under the free energy principle, Bayesian mechanics is the physical theory describing what systems that engage in approximate Bayesian inference do. The approximate Bayesian inference lemma in the monograph (here, upgraded to a theorem) is the observation that systems can be modelled as minima of variational free energy, since this records tautologies about what systems do and how they are perturbed by their environments. I define BMech under the FEP by formulating it in terms of an equivalent structure, maximum entropy. This work is in some sense a sequel to this paper by Da Costa et al, or, as I remark in Section 2.1, perhaps it’s a prequel.” What I don’t say there is that the interplay between randomness and regularity is what drove me to postulate the use of constrained maximum entropy instead of variational free energy (VFE), which happens to be a productive alternative viewpoint to take (perhaps this is unsurprising). Nor do I say that the fact that the two are isomorphic, so that we get everything back in a new language with better maths to its name, is a happy accident (although perhaps it was ordained). In the later sections of [RSH+22], this physical motivation (which is in some sense the real story) gets its due.

Another bit of meta-commentary about this: when we have a hard problem in mathematics, it is common to prove things about a simpler problem that we can recover the hard problem from. I resist any insinuation that this work is of the calibre of Wiles’ proof of Fermat’s Last Theorem, but this is how Wiles proved it. Instead of directly proving a difficult number-theoretic result about Diophantine equations, he was able to prove an easier (relatively speaking; perhaps ‘more straightforward’ or ‘less hopeless’ would be better) statement about certain elliptic curves and their modular forms, from which the theorem follows. This got him the result he needed. In fact, it did so by shedding more light on the structure of the problem than previous number-theoretic methods, which were typically brute force, infinite descent arguments. (The interested reader can find a good account of this here: The Biggest Project in Modern Mathematics, Quanta Magazine.)

I’ll be brief in my coverage of Sections 2.3 and 3 of [Sak22], since I think these are fairly self-explanatory. Section 2.3 introduces the idea that we can switch from free energy over beliefs to constrained entropy over states, and that this symmetry is no accident. Section 3 deconstructs what the latter could actually mean.

Section 4 is the first of two big sets of results, and in S4 we meet approximate Bayesian inference and Bayesian mechanics. The first half of S4 (up to Theorem 4.1) is showing that the mathematics of some sort of physics of beliefs posed by the FEP is equivalent to constrained maximum entropy. Placing a constraint that observations of the environment are unsurprising captures the variational inference aspects of the FEP, in Proposition 4.1 and Corollary 4.1. This is equivalent to a constraint that a system whose internal states parameterise such a belief occupy some ideal value of that parameter, i.e., that the system is in harmony with its environment. That is, Lemmas 4.1 (belief parameterisation exists via \(\sigma\)) and 4.2 (variational inference can be formulated under L4.1) combine to make Proposition 4.2 (placing constraints on the parameters of beliefs does variational inference). So far, the argumentative sequence is fairly straightforward, and is a nice view on the FEP as a theory of control.
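The constrained maximum entropy side of this equivalence can be illustrated with a toy calculation. The sketch below is not the construction in [Sak22]; it assumes a small discrete state space and a single expectation constraint \(\mathbb{E}[J] = c\), with the constraint function J, the target value, and every name in the code chosen hypothetically. Maximising entropy under such a constraint gives a Gibbs density \(p(x) \propto \exp(-\lambda J(x))\), and the code finds the Lagrange multiplier \(\lambda\) by bisection.

```python
import math

# Hypothetical toy setting: five states and one expectation constraint.
STATES = [0, 1, 2, 3, 4]
TARGET = 1.0  # assumed target value c for E[J]


def constraint(x):
    # Hypothetical constraint function J(x); here simply the state label.
    return float(x)


def gibbs(lam):
    # Maximum entropy density under E[J] = c has the form exp(-lam * J) / Z.
    weights = [math.exp(-lam * constraint(x)) for x in STATES]
    z = sum(weights)
    return [w / z for w in weights]


def expected_constraint(p):
    return sum(p_x * constraint(x) for p_x, x in zip(p, STATES))


def entropy(p):
    return -sum(p_x * math.log(p_x) for p_x in p if p_x > 0.0)


def solve_multiplier(target, lo=-20.0, hi=20.0, iterations=200):
    # E[J] under gibbs(lam) decreases monotonically in lam
    # (its derivative is minus the variance of J), so bisection applies.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_constraint(gibbs(mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = solve_multiplier(TARGET)
p_maxent = gibbs(lam)

# Another density that satisfies the same constraint, for comparison.
p_other = [0.5, 0.25, 0.0, 0.25, 0.0]

print("multiplier:", lam)
print("E[J] under max-ent density:", expected_constraint(p_maxent))
print("entropy (max-ent):", entropy(p_maxent))
print("entropy (other):", entropy(p_other))
```

Both densities satisfy the constraint, but the Gibbs density has the larger entropy, which is the sense in which the constraint alone fixes the solution. Swapping in a different J changes the shape of the density without changing the procedure.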

Using all these results, we can prove the approximate Bayesian inference lemma for mode-matching:

When conditioned on blanket states, the internal states of a self-evidencing system on average perform approximate Bayesian inference on external states, via a minimisation of variational free energy.
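A small numerical check may help unpack what “via a minimisation of variational free energy” means here. The following sketch assumes a hypothetical discrete generative model over two external states and two blanket states; the joint table JOINT and all names are made up for illustration, and internal states appear only implicitly, through the belief q they parameterise. It uses the standard identity \(F[q] = D_{KL}[q(\eta) \,\|\, p(\eta \mid b)] - \log p(b)\): free energy is minimised exactly when q is the posterior, and at that point it equals the surprisal of the blanket state.

```python
import math

# Hypothetical joint density p(eta, b), indexed as JOINT[eta][b],
# over two external states eta and two blanket states b.
JOINT = [[0.4, 0.1],
         [0.2, 0.3]]


def marginal_blanket(b):
    return sum(JOINT[eta][b] for eta in range(2))


def posterior(b):
    # Exact Bayesian posterior p(eta | b).
    evidence = marginal_blanket(b)
    return [JOINT[eta][b] / evidence for eta in range(2)]


def surprisal(b):
    return -math.log(marginal_blanket(b))


def free_energy(q, b):
    # F[q] = sum_eta q(eta) * (log q(eta) - log p(eta, b)).
    return sum(q_eta * (math.log(q_eta) - math.log(JOINT[eta][b]))
               for eta, q_eta in enumerate(q) if q_eta > 0.0)


def minimise_free_energy(b, grid=1000):
    # Crude grid search over beliefs q = (t, 1 - t); a stand-in for
    # whatever dynamics actually perform the minimisation.
    candidates = [[k / grid, 1.0 - k / grid] for k in range(1, grid)]
    return min(candidates, key=lambda q: free_energy(q, b))


b_observed = 1
q_star = minimise_free_energy(b_observed)

print("free energy minimiser:", q_star)
print("exact posterior:", posterior(b_observed))
print("free energy at minimiser:", free_energy(q_star, b_observed))
print("surprisal of blanket state:", surprisal(b_observed))
```

In this toy model the minimiser of free energy coincides with the exact posterior, so the inference is exact. The “approximate” in the lemma reflects the general case, where q is restricted to a family that need not contain the posterior and the KL term stays positive.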

We then move on to formulate this equivalence in the dual sense, i.e., in terms of constraints on agent states per se, rather than the beliefs carried by internal states. The point of attack for this is the idea that placing constraints on beliefs is placing constraints on states as parameters for those beliefs, which we spoke about up to Proposition 4.2. We have a big technical section around page 25 which just establishes some data that allows me to invert \(\sigma\) under very generic conditions (i.e., almost always). So now we can talk about the environment’s beliefs about the agent—a relational symmetry, in the words of Rosen. This brings us to Theorem 4.2. There, we arrive at the idea that we can redo all the maths we did up to Proposition 4.2, but with \(\sigma\) inverted, and constraints placed on agent states as the domain of our beliefs about the agent—a bit different from constraints on agent states as parameters of its beliefs about us (or the environment, more generally—but here, an observer in the environment). What this means for the FEP is still being explored, but it definitely means we can think of systems doing inference over themselves and what sort of constraints they are obeying (I’m going to be talking about this next week; see here for more). There’s a wealth of other philosophical implications, discussed towards the end of [RSH+22]. The mathematical implications are a bit more straightforward: approximate Bayesian inference, and the FEP more broadly, is true whenever a certain, very generic constraint exists on the entropy of a system. As such, Bayesian mechanics is an apt new description of quite a lot of physics, phrasing physical dynamics in terms of belief mechanics under ABI (rather than physical mechanics).