Statistical inference and the parallel transport of probability
I gave the following talk at the Union College Mathematics Conference, in the Stochastic Analysis and Applications track.
Abstract:
Methods in statistics like maximum entropy usually focus on the probability measure associated to a dynamical system or field theory with probabilistic degrees of freedom; in so doing, statistical inference yields the probability of observing any given state of the system under some randomness. We prove that the solution to maximum entropy is parallel transport over the state space of the system. This gives a principled reason for the almost unreasonable success of certain energy-based statistical algorithms, in that finding the probability measure over states is simplified from solving a difficult PDE to an entirely geometric characterisation of a flow along a potential function.
Remark: At the beginning of the talk, when I say that stationary diffusion equations maximise entropy ‘via integration by parts,’ I am referring to the fact that the time evolution of a probability density under Brownian motion can be written as a Fokker–Planck equation using an integration by parts identity; when stationary, this equation naturally minimises the Lagrangian of the entropy (the current).
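For readers who want the remark spelled out, here is a minimal sketch under assumptions that are mine rather than the talk's: an overdamped diffusion $dX_t = -\nabla V(X_t)\,dt + \sqrt{2D}\,dW_t$ with a smooth confining potential $V$, a constant diffusion coefficient $D > 0$, and densities that decay fast enough for boundary terms to vanish. The symbols $V$, $D$, $J$, $\beta$ and $\lambda_0$ are illustrative choices and need not match the notation in the slides.

$$
\begin{aligned}
\frac{d}{dt}\,\mathbb{E}[f(X_t)]
  &= \int \big( -\nabla V \cdot \nabla f + D\,\Delta f \big)\, p \, dx
   = \int f \,\big( \nabla \cdot (p\,\nabla V) + D\,\Delta p \big)\, dx
   && \text{(integration by parts)} \\
\partial_t p
  &= \nabla \cdot \big( p\,\nabla V + D\,\nabla p \big) = -\nabla \cdot J,
   \qquad J = -p\,\nabla V - D\,\nabla p
   && \text{(Fokker–Planck)} \\
J = 0
  &\;\Longrightarrow\; \nabla \log p = -\tfrac{1}{D}\,\nabla V
   \;\Longrightarrow\; p(x) = \tfrac{1}{Z}\, e^{-V(x)/D}
   && \text{(stationary, zero current)} \\
0 &= \frac{\delta}{\delta p} \left[ -\int p \log p \, dx
      - \beta \int V p \, dx - \lambda_0 \int p \, dx \right]
   = -\log p - 1 - \beta V - \lambda_0
   \;\Longrightarrow\; p \propto e^{-\beta V}
   && \text{(maximum entropy)}
\end{aligned}
$$

The last two lines agree when $\beta = 1/D$: the zero-current stationary density of the diffusion is the density that maximises entropy subject to normalisation and a fixed expected value of $V$. Vanishing current is a special case of stationarity, which in general only requires $\nabla \cdot J = 0$.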
Slides: UCMC Jun 2022.pdf
Other details: link to website