CMU-CS-22-103 Computer Science Department School of Computer Science, Carnegie Mellon University
Mining and Learning With Graphs and Tensors Namyong Park Ph.D. Thesis May 2022
Data generated in diverse contexts can be modeled as graphs. Examples are numerous, from citation and social networks to the World Wide Web. Many real-world networks are multi-aspect, where multiple types of entities interact with each other via various relations. Also, many of them are dynamic, modeling relationships among entities and their features that evolve over time. These real-world networks with rich side information (e.g., node and edge types, and edge timestamps) are naturally modeled as tensors (i.e., multi-dimensional arrays). Given graphs and tensors, how can we understand them, and utilize them for downstream tasks? Specifically, how can we analyze and model large real-world networks, and gain a better understanding of how they form and evolve? Also, how can we design algorithms that leverage graphs and tensors for important applications such as recommendation and ranking? This thesis focuses on these fundamental problems by developing effective and efficient methods for mining and learning with graphs and tensors. In the first part of the thesis, we focus on addressing important mining and learning tasks for static graphs and tensors. We first propose novel graph-regularized semi-supervised algorithms for estimating node importance in a knowledge graph, which achieve up to 25% higher accuracy than the best baseline. Then we develop distributed frameworks for large-scale tensor factorization, which decompose and summarize large tensors up to 180x faster than existing methods, with near-linear scalability. We also design a meta-learning based approach for automatic graph learning model selection, which is up to 15x more accurate than using popular methods consistently. In addition, we develop a method that explains product recommendations, up to 21% more accurately than the best baseline, by performing personalized inference over a product graph. In the second part of the thesis, we focus on modeling and reasoning with dynamic graphs and tensors, which represent various types of time-evolving networks and dynamic real-world phenomena. We propose a framework to learn differential equations (DEs) that model the observed phenomena (such as weather and water quality), which produces interpretable and physically plausible DEs that achieve up to 34% higher forecasting accuracy than related baselines. We then tackle the task of finding communities in networks and tracking their evolution by designing a contrastive graph clustering framework, which shows up to 27% higher clustering accuracy than existing methods. Further, we develop a method for reasoning over temporal knowledge graphs (TKGs), which infers new knowledge from the given TKG, up to 116% more accurately than the best baseline, while being 30x faster in model training. Throughout the thesis, a strong emphasis is placed on developing effective, accurate, and scalable tools. To this end, we use mathematical techniques (e.g., approximation), exploit the characteristics of real-world networks, incorporate prior knowledge and experience, and employ powerful theoretical and practical frameworks, including graph neural networks, latent variable modeling, temporal point processes, and distributed computing. We successfully apply these tools to a large array of real-world datasets and applications, establishing new state-of-the-art results.
286 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department
| |
Return to:
SCS Technical Report Collection This page maintained by [email protected] |