Patterns and anomalies in big data can help businesses target likely customers, reveal fraud or even predict drug interactions. Unfortunately, these patterns are often not easily observable. To extract the needles of useful information out of haystacks of data, data scientists need increasingly powerful methods of machine learning.
This lab's contribution is to expand the set of tools and techniques available for that task.
Many machine learning and data mining algorithms use graphs, which are simply lists of connections between people, groups, or objects. Examples include “friend,” “like” or “follow” relationships in social networks, or the list of videos streamed or marked as favorites in a streaming subscription service.
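To make the idea of a graph as a list of connections concrete, here is a minimal Python sketch. The names and the "follow" relationships are invented for illustration, and the helper `build_adjacency` is hypothetical; it is not taken from the lab's work.

```python
from collections import defaultdict

# Hypothetical "follow" relationships, stored as (follower, followed) pairs.
# This plain list of connections is already a complete description of the graph.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "alice"),
]


def build_adjacency(edge_list):
    """Group a list of directed connections by the node they start from."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        # Touch the target so nodes with no outgoing connections still appear.
        adjacency[target]
    return dict(adjacency)


adjacency = build_adjacency(edges)

# Who follows carol? Scan the connection list for edges that end at "carol".
followers_of_carol = sorted(s for s, t in edges if t == "carol")
```

The same list of pairs could just as well describe "like" relationships or videos marked as favorites; only the meaning attached to each connection changes.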
These mountains of data hide useful information whose extraction belongs to an area known as graph inference. Graph inference has many interesting and useful applications, for example suggesting movies in a streaming service based on viewing history, or suggesting purchases in online shopping. It can also reveal patterns in the spread of epidemics or provide insights into the folding of proteins, which is important for understanding how proteins function.
This work is the first to propose and analyze techniques that improve graph inference by incorporating nongraph information; how to blend such information efficiently with graph information was previously not well understood. Examples of nongraph information include individual attributes such as a person’s age and residential ZIP code.
In almost every practical application involving graphs, there exist nongraph data of great relevance. The kind of work these researchers are doing is further upstream, developing the mathematical models, theory, and techniques, but it has widespread applications.
The second component of the research addresses data security. This work harnesses the natural variations of wireless channels to provide layers of security for data transmission. This area of work, known as physical layer security, aims to leverage the imperfections of the communication channel as a tool for security. Part of this research is aimed at developing techniques for making the presence of electronic communication undetectable to cybercriminals.
To give a simple example, a password works by leveraging the difference between what a legitimate user knows and what cybercriminals who want to steal information know. This work creates, amplifies, and analyzes statistical asymmetries of information against adversaries in ways that do not involve passwords or keys, and uses them to secure communications.