Using the visual cortex of the human brain as a model, the research group led by ERC award winner Thomas Pock has developed new mathematical models and algorithms as the basis for faster and more intelligent image-processing programs.
Our visual cortex can capture images and recognize objects in a fraction of a second, even if they are barely visible or only fragmentary. One reason for this remarkable performance is the highly efficient hierarchical layer architecture of the visual cortex: it filters the visual information, recognizes connections, and completes the image using familiar patterns. In its full complexity, the process behind this is still poorly understood. Deep-learning algorithms now exist that can match or, in some cases, exceed human performance on certain pattern-recognition tasks. One disadvantage of these algorithms, however, is that it is hard to understand what they have learned, how they work, and when they make mistakes.
Thomas Pock from the Institute of Computer Graphics and Vision at Graz University of Technology (TU Graz) pursued this knowledge in his ERC Starting Grant project HOMOVIS (High-Level Prior Models for Computer Vision). He worked intensively on the question of how known operating principles of the visual cortex can be described with mathematical models and transferred to image-processing applications. After five years of research, 41 publications, and one patent, the researcher and his group have accumulated extensive knowledge that enables new image-processing algorithms for a wide variety of applications.
The main founder of Gestalt psychology used the Gestalt laws to try to explain the process of human vision, in which stimuli and sensory impressions are assembled into a larger whole. “Humans can already correctly recognize partial or incomplete objects on the basis of single points or subjective contours (illusory contours). The human brain automatically fills in the missing image information, for example by connecting the points via curves that are as smooth as possible,” says Pock. Pock and his team described this phenomenon of shape completion for the first time using mathematical models based on Euler’s elastic curves (elastica): curves studied by the mathematician Leonhard Euler that minimize bending energy, that is, the integral of the squared curvature along the curve.
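For reference, the textbook form of the elastica energy penalizes both the length of a curve and its bending; the exact functional used in the project is not given here, so the following is only the standard formulation.

```latex
% Textbook Euler elastica energy of a curve C, parameterized by arc length s.
% a, b > 0 weight curve length against bending; \kappa(s) is the curvature.
E(C) = \int_C \left( a + b\,\kappa(s)^2 \right) \mathrm{d}s
```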
Representation in a higher-dimensional space
Based on Euler’s elastic curves, Pock’s group developed new algorithms to solve certain curvature-dependent image-processing problems. It turns out that such problems become much easier to solve if the 2D images and their features are represented as data points in three-dimensional space. “In the third dimension, we get an additional variable: the orientation of the object edges,” Pock explains. This, too, is modeled on human vision and goes back to the pioneering work of the two Nobel laureates David Hubel and Torsten Wiesel, who showed in 1959 that the visual cortex is built up of layers of orientation-sensitive cells.
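The minimal sketch below illustrates the general idea of such a lifting: each edge pixel of a 2D image is placed into one of several orientation slices of a 3D volume according to its local gradient direction. The function name, the binning scheme, and the use of the image gradient are illustrative assumptions, not the group’s actual construction.

```python
import numpy as np

def lift_to_orientation_space(image, n_orientations=16, eps=1e-6):
    """Illustrative lifting of a 2D image into a 3D (x, y, orientation) volume.

    Each pixel's gradient direction selects one of `n_orientations` bins, so
    edge points with different local orientations end up in different slices.
    This is only a sketch of the general idea, not the project's exact model.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Edge orientation (direction of the level line), mapped to [0, pi).
    theta = (np.arctan2(gy, gx) + np.pi / 2.0) % np.pi
    bins = np.minimum((theta / np.pi * n_orientations).astype(int),
                      n_orientations - 1)

    volume = np.zeros(image.shape + (n_orientations,))
    rows, cols = np.indices(image.shape)
    mask = magnitude > eps            # lift only actual edge points
    volume[rows[mask], cols[mask], bins[mask]] = magnitude[mask]
    return volume
```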
From a mathematical and computer-science point of view, the biggest advantage of this three-dimensional embedding is that the resulting image-processing problems can be solved with convex optimization algorithms. In mathematical optimization, the boundary between convex and non-convex problems is considered the great barrier separating problems that can be solved efficiently with guaranteed global optimality from those that, in general, cannot. “Thus, we are guaranteed to be able to calculate the best image for any given input image, though of course only with respect to the mathematical model used,” says Pock.
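As a concrete example of the class of convex image-processing problems meant here, the sketch below denoises an image with the classical total-variation (ROF) model using a primal-dual iteration; because the problem is convex, the iterates approach the globally best image with respect to this model. The model choice and parameter values are illustrative and not the specific formulation developed in HOMOVIS.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary conditions."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of `grad`."""
    dx = np.zeros_like(px)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def tv_denoise(f, lam=10.0, n_iter=300):
    """ROF model  min_u  sum |grad u| + lam/2 * ||u - f||^2,
    solved with a primal-dual iteration. The problem is convex, so the
    iterates approach the unique global minimizer for any input f."""
    tau = sigma = 0.25   # tau * sigma * ||grad||^2 <= 0.5 < 1  (||grad||^2 <= 8)
    u = f.astype(float).copy()
    u_bar = u.copy()
    px = np.zeros_like(u)
    py = np.zeros_like(u)
    for _ in range(n_iter):
        # Dual step: gradient ascent, then projection onto the unit ball.
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.hypot(px, py))
        px, py = px / norm, py / norm
        # Primal step: closed-form proximal map of the quadratic data term.
        u_old = u
        u = (u + tau * div(px, py) + tau * lam * f) / (1.0 + tau * lam)
        u_bar = 2.0 * u - u_old   # over-relaxation
    return u
```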
Future Outlook
Now, Pock and his team are working on improved models that combine the known structural properties of the visual cortex with deep-learning algorithms. The goal is to develop models that perform as well as current deep-learning algorithms but also allow a deeper understanding of the structures they have learned. Initial successes have already been achieved in the reconstruction of computed tomography and magnetic resonance images. “With the newly developed algorithms, it is now possible to reconstruct images at the highest quality even though less data is recorded. This saves time and computing power, and thus also costs,” explains Pock.
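The hypothetical sketch below shows only the general principle behind such reconstruction methods: a physics-based data-consistency step is alternated with a step driven by a learned regularizer. Here `forward_op`, `adjoint_op`, and `reg_grad` are placeholders for the scanner model and a trained network; this is not the group’s published architecture.

```python
import numpy as np

def unrolled_reconstruction(y, forward_op, adjoint_op, reg_grad,
                            n_steps=10, step=0.5):
    """Sketch of an unrolled reconstruction scheme (all names are placeholders).

    y           measured raw data (e.g. undersampled k-space or a sinogram)
    forward_op  linear measurement model A of the scanner
    adjoint_op  its adjoint A^T, used for the gradient of the data term
    reg_grad    gradient of a (learned) regularizer, pulling the image
                toward plausible structures
    """
    x = adjoint_op(y)                                # simple initial guess
    for _ in range(n_steps):
        data_grad = adjoint_op(forward_op(x) - y)    # grad of 0.5*||A x - y||^2
        x = x - step * (data_grad + reg_grad(x))     # combined gradient step
    return x
```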