That is... an unusual use of regression, to say the least. When I see something sufficiently off the wall, I always wonder how the authors happened to think of that. It also raises a lot of questions like, how much of physics can be usefully approximated by some random forests? Could you replace most of a physics engine with an appropriate small neural network?
Random forests are part of how the LHC found the Higgs, based on my armchair understanding of the slide decks.
Measurement, Monte Carlo, and machine learning form an interesting triangle. My impression of the LHC pipeline is that they used Monte Carlo model simulations to train classifiers that would flag the data relevant for distinguishing alternate models.
The major difference there is that LHC data measures processes which are inherently stochastic (because QM), so using these methods is pretty natural. The OP, however, is applying the same methods to highly complex, but nominally deterministic problems.
I'm curious about this. A lot of us in the computational world have to deal with capable but slow iterative methods for things like fluid dynamics and electromagnetic simulations. I'd be interested to know whether this works with MHD, and how well it does.
It probably would, but it is harder to get performance guarantees. It seems like a Taylor expansion on steroids, but with a Taylor expansion you get analytical tractability (to prove correct convergence) along with speed, which you don't get in this case.
Check out NeuroAnimator[0] from 1997 (also early work from Hinton). It covers some of this, using hierarchies of neural networks in local coordinate spaces that predict the deltas for the next time-step.
Given that ANNs are universal function approximators, it is natural to use them to generate a simplified model of a simulation. What you see here, taking an already accurate model and generating an approximate implementation, is definitely rare but not unexpected. Long-running simulations can have their run time drastically compressed simply by training a neural network that can be parallelized on GPUs, as in this paper. The ANN's run time could be compressed further by generating a single-hidden-layer equivalent. Even though single-hidden-layer networks are known to be exponentially larger than networks with multiple hidden layers, such an implementation can be valuable in real-time systems where analysis latency is at a premium, e.g. image recognition in machine control systems.
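To make the "single hidden layer is enough in principle" point concrete, here's a minimal toy sketch (my own example, nothing to do with the paper): a one-hidden-layer ReLU network can represent |x| exactly with just two hidden units, which is the basic building block behind the universal approximation argument.

```python
# Toy illustration (not from the paper): a single-hidden-layer ReLU network
# representing |x| exactly with two hidden units. Deeper nets can often do
# the same job with far fewer units, which is the "exponentially larger"
# trade-off mentioned above.

def relu(x):
    return max(0.0, x)

def one_layer_net(x, hidden, output_weights, bias=0.0):
    """Evaluate sum_i w_i * relu(a_i * x + b_i) + bias for hidden units (a_i, b_i)."""
    return sum(w * relu(a * x + b) for (a, b), w in zip(hidden, output_weights)) + bias

# |x| = relu(x) + relu(-x): hidden units (1, 0) and (-1, 0), output weights 1 and 1.
hidden = [(1.0, 0.0), (-1.0, 0.0)]
weights = [1.0, 1.0]

for x in (-3.0, -0.5, 0.0, 2.0):
    assert one_layer_net(x, hidden, weights) == abs(x)
```

Adding more hinge units in the same way lets a single hidden layer approximate any continuous function on an interval to arbitrary precision, just with a potentially very wide layer.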
Neural nets are seen as universal function approximators when you study them in detail. While it isn't the usual "neural net news" you see posted online, this is exactly what ANNs are meant to do.
It occurred to me and I work as a web developer. Just think of things in abstract enough terms and possibilities like this pop out. Having the technical sophistication and time to do the work is another matter.
If it isn't based on the Navier-Stokes equations, can I actually use this to reliably simulate fluid behavior? Or is this more useful for creating pseudo-realistic water scenes?
This is the latter. There are a lot of reasons you would not want this for accurate simulation, one being that you basically prime your simulation on your training data. The other one being that SPH is already quite an approximation of Navier Stokes.
But for real-time animation, this is very nice. The problem with particle fluid simulations is that the cost per timestep scales linearly with the number of particles, but the more particles you have, the smaller the timestep needs to be to keep the simulation from exploding. And this is a real problem: you want multi-million-particle simulations for realistic water, but then the number of steps you need to compute per frame goes up as well.
They managed to do a single simulation step per frame.
"Our approach showed the potential to be a good replacement
of standard solvers in settings, where running times is more important than the exactness of a simulation, such as in computer games or interactive design."
"In our future work we want to [...] combine learning methods with standard solvers to obtain both fast and highly accurate simulations."
It appears to be based on Navier-Stokes, from the abstract:
> We designed a feature vector, directly modelling individual forces and constraints from the Navier-Stokes equations, giving the method strong generalization properties to reliably predict positions and velocities of particles in a large time step setting on yet unseen test videos.
The main goal of mechanics is finding solutions for important variables of interest, from which every other dynamical variable can be easily computed. In fluid mechanics, the main objective for a given problem is finding a velocity field (once you have this, you're done). Since the NS equations are non-linear partial differential equations, finding exact solutions is impossible in most cases. Some notable non-trivial exceptions exist [1]. Navier-Stokes is extremely successful in describing flows in many real physical situations. It is believed, but has not been experimentally confirmed, that Navier-Stokes successfully describes turbulent flow. Most physical theories have a fairly well-known domain in which they are and are not applicable (Newtonian physics breaks down at velocities near the speed of light). Currently, we don't even know whether solutions to NS always exist, never mind whether they make physical sense [2]. With this in mind, we have to acknowledge the uncertainty in whether NS correctly describes an arbitrary flow. It is backed by well-grounded theory (parts of the equations can be derived from conservation laws) and works under many different conditions. This comment ended up longer than I expected, but my main point is that whether we can "reliably simulate fluid behavior" for an arbitrary fluid is largely up in the air [3].
However, we should restrict our attention to flows that we very strongly believe are correctly described by NS. From the original paper:
>> The main problem of our method is the same as the weakness of all machine learning approaches; the learning methods are not capable to extrapolate the model far outside the data observed during training.
The training data are existing numerical simulations of specific fluid flows. The learning algorithm learned fluid-flow dynamics from those simulations, which is extremely impressive. It also achieves a significant speed-up in simulation time, which is also very impressive. However, as the authors say, it does not generalise well. So, in short, the answer to your first question is probably "sometimes yes, but in general not really".
edit: I should add that the simulations in the paper are decidedly based on Navier-Stokes.
I think they are using Navier-Stokes to generate the dataset to learn from, and the regression forest learns to reproduce what Navier-Stokes did with a bunch of smaller timesteps, but in a single timestep. Which sounds amazing, so I'm probably wrong.
Not a stupid question at all! Not exactly the same approach, but there's been some recent interesting research on neural net and ML driven global illumination:
Reminds me a lot of various machine learning approaches to molecular dynamics - basically trying to fit a cheap function to the energy landscape. I wonder if the more classical (rather than quantum) nature of these systems makes them more approachable?
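For anyone unfamiliar with that idea, here's a toy sketch of "fit a cheap function to the energy landscape" (my own example, not the OP's method): replace an expensive pair-energy evaluation with a precomputed lookup table plus linear interpolation, which is one of the simplest surrogates used in MD codes.

```python
# Toy surrogate-model sketch (my own example, not the OP's method):
# replace an "expensive" energy function with a cheap fitted approximation,
# here a tabulated Lennard-Jones potential with linear interpolation.

def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Exact Lennard-Jones pair energy: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def build_surrogate(f, r_min, r_max, n):
    """Tabulate f on a uniform grid and return a cheap interpolating function."""
    h = (r_max - r_min) / (n - 1)
    table = [f(r_min + i * h) for i in range(n)]

    def surrogate(r):
        t = (r - r_min) / h
        i = min(int(t), n - 2)   # clamp to the last interval
        frac = t - i
        return table[i] * (1 - frac) + table[i + 1] * frac

    return surrogate

cheap = build_surrogate(lj_energy, 0.9, 3.0, 2000)
for r in (1.0, 1.12, 2.5):
    assert abs(cheap(r) - lj_energy(r)) < 1e-3
```

ML-based potentials do the same thing in spirit, just with a learned model (neural net, Gaussian process, forest) over many-body descriptors instead of a 1D table.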
So, in my layman's understanding of this: instead of a normal physics engine, which simulates particle movement in small timesteps, this treats the whole system as more of a statistical problem? How does machine learning factor into this?
Can anyone explain very roughly what's going on here? I'm familiar with physics engines, but couldn't make heads or tails of the paper or wikipedia's article on random forests...
As one of the authors, I might be able to answer some of the questions raised in this discussion.
cba9 - In theory, given a sufficient amount of data (which can be generated), any physics engine can be approximated by a machine learning algorithm; however, it might be tricky to make it faster than an existing solver. We came to the conclusion that NNs cannot beat the fastest solvers (PBF). This might change with the newest developments in hardware.
krapht - The feature vector is based on the N-S equations, so there is the potential to approximate the solver to any precision.
david_ar - We tried to analyze the model, and as with many learning approaches, it seems pretty much impossible to see what is going on.
fenomas - Instead of solving differential equations, we train a regression model, which predicts the same thing as a numerical solver.
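A toy version of that idea, to make it concrete (my own sketch with a 1-nearest-neighbour regressor, nothing like the paper's actual features or forests): generate training pairs (state now, state one large step later) with a conventional fine-timestep integrator, then use the fitted regressor as the "solver" that jumps a whole large step at once.

```python
# Toy sketch of "regression replaces the solver" (not the paper's method):
# a nearest-neighbour regressor learns the large-timestep map of a damped
# spring from data produced by a fine-timestep numerical integrator.

def fine_step(x, v, dt=0.001, k=1.0, c=0.1):
    """One small explicit-Euler step of a damped harmonic oscillator."""
    a = -k * x - c * v
    return x + v * dt, v + a * dt

def big_step_truth(x, v, n=100):
    """Ground truth: 100 fine steps = one 'large' timestep of 0.1 s."""
    for _ in range(n):
        x, v = fine_step(x, v)
    return x, v

# Training set: (input state) -> (state one large step later), on a grid.
train = []
for i in range(-10, 11):
    for j in range(-10, 11):
        x, v = i / 5.0, j / 5.0
        train.append(((x, v), big_step_truth(x, v)))

def predict(x, v):
    """1-nearest-neighbour regression: return the outcome of the closest training state."""
    _, out = min(train, key=lambda s: (s[0][0] - x) ** 2 + (s[0][1] - v) ** 2)
    return out

# Query a state between grid points: one lookup replaces 100 solver steps.
px, pv = predict(0.52, -0.31)
tx, tv = big_step_truth(0.52, -0.31)
assert abs(px - tx) < 0.2 and abs(pv - tv) < 0.2
```

The real method is far more sophisticated (per-particle features derived from the N-S terms, regression forests, corrections for incompressibility), but the shape of the idea is the same: learn the solver's input-to-output map and evaluate it in one cheap step.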
lifeformed - any regression is a machine learning approach by definition
daniel-levin - At the time we wrote the paper, we were not sure about the generalization properties. We didn't find any failure case either (if we had, we would just have included such simulations in the training set). However, in the last week before the conference we tried various additional forces to simulate sand/ash/whatever, using the model trained only on water simulations. Surprisingly, it generalized quite well. However, not extrapolating outside the training set has some advantages - our method is stable and simply never diverges, even if we try really hard.
irascible - After we do some optimizations (I guess it can be sped up by a factor of 10 or so) and finish/publish all kinds of other materials (smoke, fire, foam, elastic bodies, fractures, cloth, ...), we plan to build a library, a plugin for Unity, etc., and license it.