In a recent paper of mine, arXiv:2207.07620, I give a set of proofs that resolve (to the extent that is likely possible, anyway) an open question in the study of complex systems and random dynamics. Here I’ll discuss the conjecture (including relevant background) and my results at a high level. This blog post is slightly lower-effort than usual as I am currently at ACT2022 (say hello if you are as well)—all the important stuff is here though.

Regular readers of this blog will most likely be familiar with the name Karl Friston and will find this paragraph superfluous; folks from a broader audience might want to read it. In a 2021 paper—first appearing a year and a week ago, in fact, although the problem had been discussed for some time prior—a colleague and collaborator of mine, Karl Friston, introduced the sparse coupling conjecture (SCC), a statement motivated by the structure of a particular class of random dynamical systems relevant to complex systems theory (and statistical physics more broadly, in fact). Take two coupled random dynamical systems, interacting in such a way that they are separable, i.e., statistically distinguishable: either (in fact, both) system’s statistics are conditionally independent of the other’s, making them measurable and hence statistically distinct. This sort of structure is called a Markov blanket. The SCC is the conjecture that any sufficiently high-dimensional, non-linear system exhibiting certain features of complexity possesses a Markov blanket. There is a deeper statement that a Markov blanket is the source (or the reflection) of such complexity, and this is one way of thinking about Bayesian mechanics.1 Purely mathematically, however, it is a statement that a certain class of systems with properties that make them useful models of physics also comes with the property of having a Markov blanket.

Markov blankets are, at their core, just about separability. Suppose two systems interact in such a way that they are coupled but do not mix. Then system \(\mu\) ought to be independent of system \(\eta\) given some variable separating them; that is to say, we should be able to infer everything about system \(\mu\) from the interface \(b\) between them. The Markov blanket is the locus of states that keeps one system from another, or, the quantity \(b\) for which \(p(\eta \mid b, \mu) = p(\eta \mid b)\) (and likewise for \(\mu\)). The sparse coupling conjecture is, in particular, a statement that all sufficiently high-dimensional and non-linear random dynamical systems have a Markov blanket.

Some reasons why we might expect this are outlined in the introduction. High-dimensional and non-linear systems are usually the sorts of complex systems that can engage in complex behaviours to remain cohesive—a cohesive whole, separate from its environment. (Imagine a system which begins sharing the fast fluctuations of quantities that characterise the statistics of its environment, such that it starts breaking into quickly changing pieces and dissipating into its environment.) They are usually better at control, computation, and self-organisation; see the high dimensionality of the brain or of deep learning architectures, or the non-linearities in Turing pattern PDEs. Thinking purely mathematically, in larger state spaces a smaller proportion of states will be coupled at all, so as dimension increases the coupling structure becomes sparser: not every state couples to every other state. In high dimensions there are many more orthogonal directions—indeed, there are exponentially many (on the order of \(e^n\)) almost-orthogonal unit vectors in \(n\) dimensions, a fact one proves using exactly the Hoeffding inequality. Dimension reduction also suggests that the dynamics of such systems concentrate on lower-dimensional submanifolds of the state space; since maintaining an MB is exactly enforcing one such system-like attractor (a submanifold for the system), this is something we should take an interest in. We also have the intuition that higher-dimensional systems have more room to be sparse, since they have more directions in which to flow, and more room to generate such submanifolds whilst remaining ‘systemic’—that is, these submanifolds can be larger and exhibit interesting dynamics in higher-dimensional systems. So there are heuristic but mathematically solid arguments that this blanket can be expected to be ubiquitous in high-dimensional systems. These arguments partially formalise the intuition that there is simply too much ‘room’ in the degrees of freedom of a high-dimensional system for it not to be sparsely coupled, and hence, it will (almost) certainly have a weak blanket. (Note also that, by reducing this to a problem of frequency—how many states are blanketed, and in what proportion—we can use new techniques, like probabilistic estimates on the commonality of MBs in dimension \(n\).)
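On that orthogonality point, here is a quick numerical sketch of my own (nothing from the paper, just a standard concentration-of-measure demonstration): inner products of independent random unit vectors pile up around zero as the dimension grows, which is exactly the phenomenon a Hoeffding-type bound makes precise.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 1000, 10000):
    # Sample 200 pairs of independent random unit vectors in R^n.
    u = rng.standard_normal((200, n))
    v = rng.standard_normal((200, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    # |<u, v>| concentrates around zero at rate ~ 1/sqrt(n).
    dots = np.abs((u * v).sum(axis=1))
    print(f"n={n:>5}: mean |<u, v>| = {dots.mean():.4f}")
```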

Having a Markov blanket is a little more complicated than just \(p(\eta \mid b, \mu) = p(\eta \mid b)\): this implies \(\ln p(\eta \mid b, \mu) = \ln p(\eta \mid b)\), and hence \(\partial_{\eta\mu} \ln p(\eta \mid b, \mu) = \partial_{\eta\mu} \ln p(\eta \mid b) = 0\), the latter vanishing because \(\ln p(\eta \mid b)\) does not depend on \(\mu\). There is thus a particular structure to the Hessian matrix of the surprisal of Markov blanketed systems, which implies a further relation uncovered by Conor Heins and Lancelot Da Costa earlier this year. It turns out this condition is pretty stringent, and, as pointed out by Miguel Aguilera and coauthors, strict MBs are a bit difficult to find in low-dimensional systems, and it looks like it might be challenging to get them to generalise. Moreover, physical arguments would have us throw away strict separation anyway: complex systems like biophysical systems often interact with their environment in interesting ways, like eating, excreting, and so on. The solution to this is obvious: weaken the strength of that separation to separation in almost all state variables on some key timescale. Not only is this physically sensible, but as it happens, we can get better results this way (this is probably not a coincidence). We can even prove the SCC.
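To see the Hessian condition concretely, here is a toy Gaussian example of my own devising (not one from the paper): for a Gaussian, the surprisal is quadratic and its Hessian is the precision matrix, so a vanishing \((\eta, \mu)\) entry is exactly the statement \(p(\eta \mid b, \mu) = p(\eta \mid b)\).

```python
import numpy as np

# Toy precision matrix over (eta, b, mu): eta couples to b and b to mu,
# but the (eta, mu) entry is zero -- b plays the role of the blanket.
P = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
Sigma = np.linalg.inv(P)

# Conditional covariance of (eta, mu) given b, via the Schur complement.
s = np.ix_([0, 2], [0, 2])  # (eta, mu) block
c = np.ix_([0, 2], [1])     # (eta, mu) x b block
b = np.ix_([1], [1])        # b block
cond_cov = Sigma[s] - Sigma[c] @ np.linalg.inv(Sigma[b]) @ Sigma[c].T

# The off-diagonal entry is ~0: eta and mu are independent given b,
# exactly because the mixed Hessian (precision) entry vanishes.
print(cond_cov)
```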

So, a quick summary: sparse coupling is essentially a statement that every controlled system must have a Markov blanket in virtue of controlling itself and avoiding mixing with its environment—and more than that, that every distinct system has a blanket distinguishing it. The SCC is largely a mathematical statement, just about high-dimensional and non-linear systems. We observe that strict MBs are a bit too restrictive to make physical sense in full generality, and it just so happens that they are difficult to construct. We need to assume sparsity for them to make physical sense and to get decent results; but sparsely coupled systems still form a set of full measure, whereas counterexamples exist on a set of measure zero. In some sense, this result formalises the intuition that high-dimensional systems are better at control and computation. Further, it sheds light on the ubiquity of Markov blankets in systems that do not mix. Markov blankets can be found almost everywhere, in the literal measure-theoretic sense of almost everywhere, and consequently in the colloquial sense of existing for essentially every system (certainly every interesting, non-trivial, physically realistic system).

Some intuitions for the proofs are as follows. Conor and Lance were able to cook up a particular condition on what is evidently some sort of inner product in their sparse coupling paper; namely, if a certain inner product is zero, there is a Markov blanket. In fact, this inner product is precisely the amount of conditional independence encoded in some $H_{\eta^i\mu^j}$ (verify this by looking at the penultimate equation of Theorem 1). We therefore turn a topological problem of classifying all dynamical systems of a certain sort into a probabilistic one of how frequent certain dynamical systems are, which we turn back into a topological problem about the shape of the state space of such systems. How many inner products are small? Where are they? This subtlety is somewhat hidden, but it is the reason why the whole business works.

Adiabatically, we are thinking of the fact that things do not couple to things that change too fast: certain variables change fast, implying a decoupling in slow variables. This also clarifies our hypothesis that the mean inner product ought to be zero. In high-dimensional systems, not every internal degree of freedom will couple to every external degree of freedom, making products of $Q$ and $H$ entries zero in expectation. Using this result we were able to show that the deviation from this average, when suitably normalised, vanishes in the limit. The whole idea behind the proof is that there are more ways to be sparse, and if the mean is zero there are lots of ways to be nearly zero. That is, there is too much distance to cover for these inner products to be large, whilst on the other hand, there are lots of opportunities to be zero or nearly zero.

Following this, we prove all our statements on the degree of blanketedness, which say that there are more ways to have a weak blanket (a nearly zero inner product) than to not have a blanket at all (a large blanket index). Thus the normalised blanket index seems to be the right metric to use, and we prove that it is small.
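To make this concrete, here is a small Monte Carlo sketch of mine (not a simulation from the paper), borrowing the ternary toy model of the proposition below: the normalised index does indeed shrink towards zero as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (10, 100, 1000, 10000):
    # 200 random ternary "rows" of Q and H, entries uniform on {-1, 0, 1}.
    q = rng.integers(-1, 2, size=(200, n - 1))
    h = rng.integers(-1, 2, size=(200, n - 1))
    # Normalised blanket index: |<q, h>| / (n - 1), which decays ~ 1/sqrt(n).
    ind = np.abs((q * h).sum(axis=1)) / (n - 1)
    print(f"n={n:>5}: mean normalised index = {ind.mean():.4f}")
```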

A simple combinatorial argument gives a nice intuition for the claim that there are more ways to be (nearly) zero:

Proposition. For ternary $Q$ and $H$ and arbitrary $\eta^i\mu^j$, as dimension increases,

\[\left| \frac{\text{Ind}(J_{\eta^i\mu^j})}{\text{Ind}_{\text{max}}(J_{\eta^i\mu^j})} \right| \to 0\]

with probability one under a uniform measure on entries in $Q$ and $H$.
Proof. Let ${\bf F}_3$ be the set $\{-1, 0, 1\}$ (representatives of the field with three elements) and suppose entries in $Q$ and $H$ are valued in ${\bf F}_3$ (i.e., $Q$ and $H$ are ternary matrices), with those values chosen uniformly. Then we have the prefactor given in Lemma 1 by (6), and we seek to prove that $\left| \text{Ind}(J_{\eta^i\mu^j}) \right| \ll n-1$ for arbitrary pairs of $Q$ and $H$ elements with probability approaching one as $n$ increases. The inner product $\text{Ind}(J_{\eta^i\mu^j})$ is a sum over $n-1$ pairs of items in ${\bf F}_3$, since these are the possible values for entries in $Q_{\eta^i}$ and $H_{\mu^j}$; this completely determines the value of $\text{Ind}(J_{\eta^i\mu^j})$ as a function of the coefficients of $Q_{\eta^i}$ and $H_{\mu^j}$ regarded as vectors. Since the items in any such pair are ordered, we can enumerate the possible configurations of $(Q_{\eta^i}, H_{\mu^j})$ by enumerating every way to choose $n-1$ ordered pairs of ${\bf F}_3$ elements: as $Q_{\eta^i}$ and $H_{\mu^j}$ are $(n-1)$-dimensional ternary vectors, there are $3^{2(n-1)} = 9^{n-1}$ such configurations. For the blanket index of almost every ternary $J_{\eta^i\mu^j}$ to be small as $n$ increases, the number of configurations for which $\left| \text{Ind}(J_{\eta^i\mu^j}) \right| \ll n-1$ must approach this total. Consider the maximal configurations, those with $\left| \text{Ind}(J_{\eta^i\mu^j}) \right| = n-1$: these require every pair to have product $-1$ (each pair valued in $(-1, 1)$ or $(1, -1)$) or every pair to have product $+1$ (each pair valued in $(-1, -1)$ or $(1, 1)$), so each family contributes $2^{n-1}$ configurations, and there are $2(2^{n-1}) = 2^n$ maximal configurations in total. This leaves $9^{n-1} - 2^n$ non-maximal configurations, so the probability of a non-maximal blanket index under a uniform distribution is

\[\frac{9^{n-1} - 2^n}{9^{n-1}},\]

which clearly goes to one quickly with increasing $n$. Pushing the same count further below the maximum (or, more efficiently, applying a Hoeffding-type concentration bound to the sum of $n-1$ independent pair products) shows that all but a vanishing fraction of configurations in fact satisfy $\left| \text{Ind}(J_{\eta^i\mu^j}) \right| \ll n-1$. \(\blacksquare\)
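Since this is just counting, the argument can be checked by brute force. Here is a quick enumeration (again my own sketch, not anything from the paper) confirming the $9^{n-1}$ total and $2^n$ maximal configurations for small $n$:

```python
from itertools import product

for n in range(2, 7):
    length = n - 1
    total, maximal = 0, 0
    # Enumerate every ordered pair of ternary vectors of length n-1.
    for q in product((-1, 0, 1), repeat=length):
        for h in product((-1, 0, 1), repeat=length):
            total += 1
            ind = sum(qi * hi for qi, hi in zip(q, h))
            if abs(ind) == length:  # maximal blanket index
                maximal += 1
    assert total == 9 ** length        # 3^{2(n-1)} configurations
    assert maximal == 2 * 2 ** length  # 2^n maximal configurations
    print(f"n={n}: P(|Ind| < n-1) = {(total - maximal) / total:.4f}")
```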

Note also that, relatively speaking, sets of measure zero can still be quite large; see for instance the infamous Cantor set, which is uncountable despite having measure zero. This result only says that Markov blankets are more frequent than non-blanketed systems in the natural world. In some sense we have really given a dual result: weak Markov blankets are exceedingly common, indeed almost tautological. More faithful blankets, where each such index vanishes, are much more difficult to find in general, but are also common—indeed, almost definitional of high-dimensional systems. This latter statement is the sparse coupling conjecture.

Simplicity, however, is almost always an adiabatic approximation to microscopic complexity. There is an interesting remark towards the end of the section on the physics of Markov blankets, which basically asks how system-like subsystems self-assemble into a large, system-like system. For instance, how do stone-like particles self-assemble into a large stone-like MB? This is a problem of scales. At a certain scale those molecules are sparsely coupled. As you zoom out, the Markov blankets between such particles are no longer visible, because that timescale is too fast and that spatial scale too small; at that scale it looks as though those molecules are strongly coupled. They give rise to a scale at which the locus of states they define becomes the states internal to some Markov blanket evincing the measurement of stone-ness at that scale. This is what we discover as a stone in nature.

So, due to this very adiabatic approximation, the SCC can be extended to any system (low- or high-dimensional) with this initial sparsity under adiabatic hypotheses. Moreover, the dimension of the state space is only ever an effective count of the degrees of freedom of a system, since we necessarily do not know what sort of zeroes we could add to the system in terms of states which are too fast to couple to certain other states. This allows for a sort of sleight of hand where anything can be given a Markov blanket by conditioning on sufficiently many hidden states—instant Markov blanket, just add fast external dynamics—but this is actually nicely consistent with what we intuitively know to be true of all of statistical physics and effective field theory, where things are independent of the physics at scales which are too small to apply to those things… not to mention the observation from statistics that everything can be made Markovian if conditioned on sufficiently many hidden states. This basically says that there is no thing which is conditionally dependent on the state of the entire universe, which seems true, given that most of physics happens pretty locally. If Markov blankets are our way of interfacing with the world locally, and thus observing objects, then this result is actually not that surprising.

As a final remark: morally, the SCC is a statement that systems whose flows are decoupled have an MB, which becomes the more precise statement that systems whose Jacobian entries coupling \(\eta\) and \(\mu\) are zero also have zero surprisal coupling between \(\eta\) and \(\mu\), as measured by that Hessian entry. It would be interesting to extend this to path-based formalisms for statistical physics like maximum calibre or stochastic thermodynamics (or Bayesian mechanics, if that’s what unifies the two) and see how the conjecture changes shape in light of the motivation just described.

  1. For more, see the following page: darsakthi.github.io/research