Machine learning, when used in climate science, builds an actual understanding of the climate system. This means we can trust machine learning and extend its applications in climate science, say the authors.
Man or machine
Large, complex climate models are often impractical to work with as they need to run for months on supercomputers. As an alternative, climate scientists often study simplified models.
Generally, two different approaches are used to simplify climate models: a top-down approach, where climate experts estimate what impact the left-out functions will have on the parts kept in the reduced model, and a bottom-up approach, where climate data is fed into a machine learning program, which then simulates the climate system.
The two methods produce comparable results. It is challenging, however, to understand data-driven (bottom-up) approaches well enough in physical terms to fully trust them. Do machine learning programs ‘understand’ that they are dealing with a complex dynamical system, or are they simply good at statistically guessing the right answers?
Intelligent solution
Now, a group of scientists has shown, through analytical proof and computer simulations, that a machine learning program called Empirical Model Reduction (EMR) in fact knows what it is doing. The study shows that this program reaches results comparable to the top-down reductions of larger models because it constructs its own version of a climate model in its software.
“I think what we do in this investigation is give some sort of physical evidence of why this particular data-driven protocol works. And that to me is quite meaningful, because the method has been in the atmospheric sciences for quite a long time. Yet there was still quite a lot of gaps in the understanding of the methodologies,” says Ph.D. student Manuel Santos Gutiérrez.
Encouraging and useful
The study indicates that the machine learning method is dynamically and physically sound and produces robust simulations. According to the authors, this should motivate the further use of data-driven methods in climate science as well as in other sciences.
“It is a very encouraging step. Because in some sense, it means the data-driven method is intelligent. It is not an emulator of data. It is a model that captures the dynamical processes. It is able to reconstruct what lies behind the data. And that indicates these theoretical derivations give you an object which is algorithmically useful,” says Valerio Lucarini, professor of statistical mechanics at the University of Reading.
The result is important in a range of fields: applied mathematics, statistical physics, data science, climate science, and complex systems science. It will also have implications in a range of industrial contexts where complex dynamical systems are studied but only partial information is accessible, such as the engineering of airplanes, ships, and wind turbines, or the modeling of traffic, energy grids, and distribution networks.