Classical Physics for the Bayesian Mechanic (Bayesian Mechanics III)
Here I’ll briefly summarise some of the coarser points of a recent preprint of mine, arXiv:2206.12996, and in doing so, survey the state of the art in Bayesian mechanics. The introduction of the paper frames this summary quite well. I’ll get to talk about supersymmetry too, which will be fun.
Everybody likes examples. It’s all well and good to speak in abstractions, but a good example seems to make things ‘click’ better than any statement or restatement of a general principle. One thing the free energy principle and Bayesian mechanics appear to lack is examples. That isn’t quite true, but the impression one gets from the literature is nevertheless that this is a mathematical theory, spoken of in full generality, which only occasionally comes down to earth to meet physics or biology. A page summarising my research interests now has, at the bottom, a Bayesian mechanical bibliography recording some key moments in the story so far. Only two of those key moments are about examples. I’m partially guilty of making that worse, and I recognise that it’s not a terribly reader-friendly state of affairs. This was the beginning of my motivation for writing the paper I will discuss here.
More specifically, while developing the narrative for this paper (we’ll call it [RSH+22]), it became apparent to me that there was a very nice way of tying Bayesian mechanics into the classical stationary action principle, which Karl has emphasised from the beginning is one ‘right’ way of thinking about the free energy principle. Brennan Klein also pointed out to me that, just as we have a few different use cases of Newton’s laws of motion under least action, we might have different use cases of Bayesian inference under what I have presumptuously called ‘least surprisal.’ Between that and my ongoing desire to write Bayesian mechanics as a consequence of the principle of maximum calibre (what Maxwell Ramstead has referred to as \(G\)-theory), there was enough there to actually write something.
The critical Section 2.2 begins with a discussion of what the FEP looks like as a least surprisal principle; that is, I sketch out some definitions of what it means for a trajectory to make a surprisal action stationary. The statements here are partly new, but mostly tidied-up versions of those in this paper. From there it becomes obvious what a stationary action principle for random dynamics is: the FEP, read as least surprisal, is just the statement that the noise in a random process produces fluctuations away from a mean path, and that the path we expect is that mean path, which minimises the action because it has no fluctuations. If this seems a tautology, that’s because it is, but like the tautology of classical least action, it’s a very useful one.
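As a minimal sketch of this tautology (my own toy illustration, not code from the paper), assume a one-dimensional diffusion \(dx = f(x)\,dt + \sqrt{2\Gamma}\,dW\) with a hypothetical linear drift \(f(x) = -(x - 1)\), discretised with Euler-Maruyama steps of size \(dt\). Dropping a path-independent constant, the negative log-probability of a path is then a sum of squared fluctuations away from the drift, \(\sum_k (x_{k+1} - x_k - f(x_k)\,dt)^2 / (4\Gamma\,dt)\). The path that follows the drift exactly has zero action (up to rounding), and any jittered path scores higher.

```python
import random

def f(x):
    # Hypothetical drift: linear relaxation toward a mode at x = 1.0.
    return -(x - 1.0)

def surprisal_action(path, dt, gamma):
    """Negative log-probability of a discretised path under the Euler-Maruyama
    transition density N(x + f(x) dt, 2 gamma dt), up to an additive constant."""
    action = 0.0
    for x_now, x_next in zip(path[:-1], path[1:]):
        fluctuation = x_next - x_now - f(x_now) * dt  # deviation from the drift
        action += fluctuation ** 2 / (4.0 * gamma * dt)
    return action

def expected_path(x_init, n_steps, dt):
    # Noiseless path: every step follows the drift, so every fluctuation is zero.
    path = [x_init]
    for _ in range(n_steps):
        path.append(path[-1] + f(path[-1]) * dt)
    return path

dt, gamma, n_steps = 0.01, 0.5, 500
mean_path = expected_path(0.0, n_steps, dt)

# Perturb every point except the initial condition with small Gaussian jitter.
rng = random.Random(0)
jittered_path = [mean_path[0]] + [x + 0.01 * rng.gauss(0.0, 1.0) for x in mean_path[1:]]

print(surprisal_action(mean_path, dt, gamma))      # zero up to floating-point rounding
print(surprisal_action(jittered_path, dt, gamma))  # strictly positive
```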
The rest of the paper walks through the same typology laid out abstractly in [RSH+22], with the concrete boundary condition that the mode we try to match is a classical equation of motion. Ultimately, the paper demonstrates that minimising a surprisal parameterised by classical equations of motion gets you classical equations of motion. This is precisely the tautology characterising attractor states in Bayesian mechanics, and the idea really is as simple as ‘systems that go to attractors are unsurprising when they go to attractors.’ If \(P(s)\) is ‘system occupies an attractor’ and \(Q(s)\) is ‘system minimises surprisal on that attractor,’ then the tautology is \(P(s)\implies Q(s)\). Its converse (\(Q(s)\implies P(s)\)) and inverse (\(\neg P(s)\implies\neg Q(s)\)) are more interesting, and these are what upgrade BMech to a modelling tool which says something useful about a system. They read, respectively, ‘systems that minimise surprise go to an attractor’ and ‘the surprisal of systems that do not go to attractors increases beyond what is acceptable to a system in such an attractor.’ Consistent with this, we are able to show that a quantum system minimising its surprisal (\(Q(s)\)) goes to a classical attractor (\(P(s)\)), producing classical physics (Section 4). This is the converse of the idea that classical systems (\(P(s)\)) minimise their surprisal, avoiding quantum fluctuations off of classical attractors (\(Q(s)\)). The inverse follows in the obvious way: we know that systems which deviate from classical EoMs must have some sort of fluctuation, here assumed to be contributions due to quantum effects (as opposed to other sorts of stochastic noise, although to what extent those are functionally different is a physical question rather than a mathematical one).
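To make ‘minimising surprisal returns classical equations of motion’ concrete, here is a minimal sketch under assumptions of my own choosing rather than the paper’s setup: the surprisal of a path is taken to be proportional to its classical action, the system is a unit-mass harmonic oscillator \(V(q) = q^2/2\), the endpoints are fixed at \(q(0) = 0\) and \(q(T) = 1\) with \(T = 1\), and the action is discretised as \(S = \sum_k [\tfrac{1}{2}((q_{k+1} - q_k)/dt)^2 - V(q_k)]\,dt\). Because \(T < \pi\), this discrete action is convex in the interior points, so plain gradient descent on the path converges. Its fixed point satisfies the discrete Euler-Lagrange equation \((q_{k+1} - 2q_k + q_{k-1})/dt^2 + V'(q_k) = 0\) and approximates the classical path \(q(t) = \sin t / \sin T\).

```python
import math

M, K = 1.0, 1.0   # hypothetical mass and spring constant, so omega = 1
T, N = 1.0, 20    # total time (T < pi / omega) and number of time steps
DT = T / N

def dV(q):
    # V'(q) for the assumed harmonic potential V(q) = K q^2 / 2
    return K * q

def action_gradient(q):
    """dS/dq_k of S = sum_k [M/2 ((q_{k+1} - q_k)/DT)^2 - V(q_k)] DT.
    Endpoints are fixed boundary conditions, so their gradient stays zero."""
    grad = [0.0] * len(q)
    for k in range(1, len(q) - 1):
        grad[k] = M * (2.0 * q[k] - q[k - 1] - q[k + 1]) / DT - dV(q[k]) * DT
    return grad

def euler_lagrange_residual(q):
    # Discrete M q'' + V'(q) at interior points; it equals -grad / DT.
    return [M * (q[k + 1] - 2.0 * q[k] + q[k - 1]) / DT ** 2 + dV(q[k])
            for k in range(1, len(q) - 1)]

# Initial guess: the straight line from q(0) = 0 to q(T) = 1.
q = [k / N for k in range(N + 1)]
step = 0.02  # below 2 / (largest Hessian eigenvalue, about 79.5 here), so descent is stable
for _ in range(5000):
    g = action_gradient(q)
    q = [qk - step * gk for qk, gk in zip(q, g)]

classical = [math.sin(k * DT) / math.sin(T) for k in range(N + 1)]
print(max(abs(r) for r in euler_lagrange_residual(q)))   # essentially zero
print(max(abs(a - b) for a, b in zip(q, classical)))     # small discretisation error
```

Lengthening \(T\) roughly beyond \(\pi\) turns the classical path into a saddle of the action rather than a minimum, and plain gradient descent then no longer converges to it; this is the familiar caveat that least action is really stationary action.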
Sections 5, 6, and 7 give three examples of what that attractor can be: a set of states around a fixed mode, a set of states around a moving mode, and an attractor in a path space. There’s far more to say about the topology of this last situation, as well as about chaotic attractors in Bayesian mechanics, than we get to cover. We do get to sketch out what it looks like to do gradient descent on a path space, and the functional gradient involved turns out to be the Euler-Lagrange expression, so the descent comes to rest exactly where the Euler-Lagrange equation holds, which is a neat result. These worked examples give us an idea of what mode-matching, mode-tracking, and path-tracking are (as introduced in this paper): the various guises of approximate Bayesian inference under the free energy principle, and hence the different sorts of Bayesian mechanics we could be interested in. In the mode-tracking cases we also recover the Helmholtz decomposition after some partially informal arguments, which is a nice result, and one which appears everywhere in the literature (cf. here, here, and here).
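The Helmholtz decomposition is easiest to see in a linear (Ornstein-Uhlenbeck) system, which is my choice of illustration here rather than the paper’s derivation. Assume \(dx = -Bx\,dt + \sqrt{2\Gamma}\,dW\) with constant \(\Gamma\). The stationary density is then Gaussian with covariance \(\Sigma\) solving \(B\Sigma + \Sigma B^\top = 2\Gamma\), and the surprisal is \(\tfrac{1}{2}x^\top\Sigma^{-1}x\) up to a constant. Setting \(Q = \Gamma - B\Sigma\) gives an antisymmetric matrix, since \(Q + Q^\top = 2\Gamma - (B\Sigma + \Sigma B^\top) = 0\), and the drift splits as \(-Bx = -(\Gamma - Q)\nabla(\tfrac{1}{2}x^\top\Sigma^{-1}x)\): a dissipative part descending the surprisal gradient plus a solenoidal part orthogonal to it. Sign conventions for \(Q\) vary across the literature, and the drift matrix and test point below are hypothetical.

```python
def matmul(A, C):
    return [[sum(A[i][k] * C[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(A, v):
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]

def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def solve3(A, rhs):
    # Cramer's rule: replace column j of A by the right-hand side.
    d = det3(A)
    return [det3([[rhs[i] if c == j else A[i][c] for c in range(3)] for i in range(3)]) / d
            for j in range(3)]

def stationary_covariance(B, G):
    """Solve the 2x2 Lyapunov equation B S + S B^T = 2 G for symmetric S."""
    (b11, b12), (b21, b22) = B
    A = [[b11, b12, 0.0],
         [b21, b11 + b22, b12],
         [0.0, b21, b22]]
    s11, s12, s22 = solve3(A, [G[0][0], 2.0 * G[0][1], G[1][1]])
    return [[s11, s12], [s12, s22]]

B = [[1.0, 2.0], [-1.0, 3.0]]      # hypothetical stable drift matrix, f(x) = -B x
GAMMA = [[1.0, 0.0], [0.0, 1.0]]   # assumed diffusion matrix (noise covariance 2 GAMMA)
SIGMA = stationary_covariance(B, GAMMA)
B_SIGMA = matmul(B, SIGMA)
Q = [[GAMMA[i][j] - B_SIGMA[i][j] for j in range(2)] for i in range(2)]
print(Q)  # approximately [[0, -0.75], [0.75, 0]]

x = [0.3, -1.2]                                   # hypothetical test point
grad_surprisal = matvec(inv2(SIGMA), x)           # gradient of x^T SIGMA^{-1} x / 2
dissipative = [-v for v in matvec(GAMMA, grad_surprisal)]
solenoidal = matvec(Q, grad_surprisal)
drift = [d + s for d, s in zip(dissipative, solenoidal)]
print(drift, [-v for v in matvec(B, x)])          # the two agree
```

With GAMMA left as the identity, a symmetric positive-definite B gives \(Q = 0\): the detailed-balance case, with no solenoidal flow at all.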
Lastly, we find that the degree of exploration within one such attractor (for instance, the exponential divergence of trajectories in ergodic systems that explore every state in an attractor chaotically) has something to do with supersymmetry. The key deliverables of the paper have now been summarised, so what follows about supersymmetry is extra material. In a future post I intend to discuss supersymmetry and this last set of results in much greater depth.
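Exponential divergence of nearby trajectories is conventionally measured by the largest Lyapunov exponent, the long-run average of the log of the local stretching factor. As a stand-in illustration that is not from the paper, the logistic map \(x \mapsto 4x(1 - x)\) is chaotic on \([0, 1]\) and its exponent is known to be \(\ln 2\); a minimal sketch estimates it by averaging \(\ln|f'(x)| = \ln|4(1 - 2x)|\) along a single orbit.

```python
import math

def logistic_lyapunov(r, x0, n_iter, n_transient=1000):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) along one orbit."""
    x = x0
    for _ in range(n_transient):   # discard the transient before averaging
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))  # log of the local stretching |f'(x)|
        x = r * x * (1.0 - x)
    return total / n_iter

lam = logistic_lyapunov(4.0, 0.2, 200_000)
print(lam, math.log(2))  # the estimate should sit close to ln 2
```

A positive exponent means separations grow like \(e^{\lambda t}\) on average, which is the quantitative sense of ‘exponential divergence’ used above.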
Supersymmetry began with the observation that certain quantum field theories are invariant under exchanges of fermionic particles (matter, or ‘stuff’) and bosonic particles (forces). That usually means there is a special operator, called a supercharge (in some settings a BRST operator plays this role), that exchanges fermions and bosons in the theory. The supercharge and its adjoint do not simply commute: the algebra is defined by their anticommutator rather than their commutator, and in supersymmetric quantum mechanics that anticommutator is, up to normalisation, the Hamiltonian. This superalgebra (super-X is just a mathematical word for some X that splits into two sub-X’s, one of which has elements that anti-commute; here we have bosonic and fermionic algebras, and fermions anti-commute) lets us talk about both types of physical degrees of freedom in one theory, and relate them, without an initial ad hoc separation of the two.
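The parenthetical point that fermionic operators anti-commute can be checked on the smallest possible example, a single fermionic mode with basis \((|0\rangle, |1\rangle)\). This is a textbook construction chosen for illustration, not something taken from the paper. The annihilation operator \(c\) and its adjoint \(c^\dagger\) satisfy \(c^2 = 0\) (no double occupancy) and \(\{c, c^\dagger\} = cc^\dagger + c^\dagger c = 1\), while their ordinary commutator is not zero.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B, sign=1):
    # Entrywise A + sign * B.
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Single fermionic mode in the basis (|0>, |1>).
c = [[0, 1], [0, 0]]       # annihilation: sends |1> to |0>, kills |0>
c_dag = [[0, 0], [1, 0]]   # creation: sends |0> to |1>, kills |1>

anticommutator = add(matmul(c, c_dag), matmul(c_dag, c))       # {c, c^dag}
commutator = add(matmul(c, c_dag), matmul(c_dag, c), sign=-1)  # [c, c^dag]

print(matmul(c, c))     # [[0, 0], [0, 0]]  (Pauli exclusion)
print(anticommutator)   # [[1, 0], [0, 1]]  (identity)
print(commutator)       # [[1, 0], [0, -1]] (not zero)
```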
The interesting thing about noticing that a supersymmetry pops out of our Bayesian mechanical path integral is twofold. One, Bayesian mechanics makes the path integral approach to classical physics absolutely natural; it simply makes sense in light of the relation to quantum fluctuations. Two, we know from previous work (cited in the paper) that the fermionic degrees of freedom that pop out correspond to Jacobi fields, which measure how paths that begin close to one another diverge. There is thus a promising connection to classical chaos in \(G\)-theory that deserves more attention, and which is reminiscent of stochastic supersymmetries and of features of complexity in condensed matter. We have yet to experience the full power of supersymmetric Bayesian mechanics, but the glimpse this paper offers is a nice one.
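A Jacobi field is the solution of the equation of motion linearised along a given path, so it tracks, to first order, how a path started slightly differently separates from the original. A minimal sketch, with a pendulum \(V(q) = -\cos q\) chosen purely for illustration (the paper’s setting may differ): the Jacobi equation is \(\ddot J = -V''(q(t))\,J = -\cos(q(t))\,J\), and integrating it alongside the path reproduces the finite-difference separation of two nearby trajectories. In a chaotic system the same fields would grow exponentially, at a rate set by the largest Lyapunov exponent.

```python
import math

def rhs(state):
    # state = (q, p, J, dJ/dt): pendulum q'' = -sin q plus its Jacobi equation
    # J'' = -cos(q(t)) J, i.e. the equation of motion linearised along q(t).
    q, p, j, j_dot = state
    return [p, -math.sin(q), j_dot, -math.cos(q) * j]

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate(state, dt, n_steps):
    for _ in range(n_steps):
        state = rk4_step(state, dt)
    return state

dt, n_steps, eps = 0.01, 1000, 1e-6
q0, p0 = 1.0, 0.0  # hypothetical initial condition

# Reference path, carrying a Jacobi field with J(0) = 1, J'(0) = 0 (a unit shift in q0).
q_ref, _, jacobi, _ = integrate([q0, p0, 1.0, 0.0], dt, n_steps)
# Nearby path started eps away; its Jacobi components are not needed.
q_near, _, _, _ = integrate([q0 + eps, p0, 0.0, 0.0], dt, n_steps)

finite_difference = (q_near - q_ref) / eps
print(jacobi, finite_difference)  # agree to roughly eps
```

Integrating the path and its Jacobi field in one RK4 system keeps the linearisation consistent with the numerical flow, so the only mismatch with the finite difference comes from the second-order term in \(\epsilon\).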