HotCSE Seminar
Computational Science & Engineering
Wednesday September 23, 11am-12pm, 1116-W Klaus

Sparse Hierarchical Tucker Factorization and its Application to Healthcare

Ioakeim (Kimis) Perros
Advisor: Prof. Jimeng Sun


This work introduces a new tensor factorization method for sparse, high-order data tensors, called Sparse Hierarchical Tucker (Sparse H-Tucker). Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which computes a tree-structured factorization of an input data set that a domain expert can readily interpret. However, Sparse H-Tucker uses a nested sampling technique to overcome a key scalability problem in Hierarchical Tucker, namely the creation of an unwieldy intermediate dense core tensor. The result is a faster, more space-efficient, and more accurate method.
We extensively evaluate our method on a real healthcare dataset collected from 30K patients, which yields an 18th-order sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the full data set on a single multi-threaded machine. It also does so more accurately and in less time than the state of the art: on a 12th-order subset of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than the previous state-of-the-art method. Even for low-order tensors (e.g., 4th-order), our method requires close to an order of magnitude less time and over two orders of magnitude less memory than traditional tensor factorization methods such as CP and Tucker. Moreover, we observe that Sparse H-Tucker scales nearly linearly in the number of non-zero tensor elements. The resulting model also provides an interpretable disease hierarchy, which was confirmed by a clinical expert.
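To give a sense of scale for why a dense core tensor becomes unwieldy at high order, the sketch below compares a dense Tucker-style core, which holds r^d entries for an order-d tensor with rank r in every mode, against the small transfer tensors stored at the internal nodes of a Hierarchical Tucker dimension tree. This is illustrative arithmetic only, not the Sparse H-Tucker algorithm: the balanced binary tree, the uniform rank r, the rank-1 root, and the rank value 5 are all assumptions, and leaf factor matrices are not counted.

```python
from typing import List, Optional


class DimNode:
    """A node of a dimension tree: the set of tensor modes it covers."""

    def __init__(self, modes: List[int]):
        self.modes = modes
        self.left: Optional["DimNode"] = None
        self.right: Optional["DimNode"] = None


def build_dimension_tree(modes: List[int]) -> DimNode:
    """Balanced binary tree (an assumption): recursively split the modes in half."""
    node = DimNode(modes)
    if len(modes) > 1:
        mid = len(modes) // 2
        node.left = build_dimension_tree(modes[:mid])
        node.right = build_dimension_tree(modes[mid:])
    return node


def count_internal(node: DimNode) -> int:
    """Number of internal (non-leaf) nodes; a binary tree over d leaves has d - 1."""
    if node.left is None:
        return 0
    return 1 + count_internal(node.left) + count_internal(node.right)


def dense_core_entries(order: int, rank: int) -> int:
    """Entries in a dense Tucker-style core with the same rank in every mode."""
    return rank ** order


def transfer_tensor_entries(order: int, rank: int) -> int:
    """Entries in the transfer tensors, assuming rank r at every non-root
    node and rank 1 at the root (so the root tensor is r x r)."""
    if order < 2:
        raise ValueError("need at least two modes")
    internal = count_internal(build_dimension_tree(list(range(order))))
    return (internal - 1) * rank ** 3 + rank ** 2


if __name__ == "__main__":
    d, r = 18, 5  # order taken from the abstract; rank 5 is hypothetical
    print("dense core entries:     ", dense_core_entries(d, r))
    print("transfer tensor entries:", transfer_tensor_entries(d, r))
```

With d = 18 and r = 5, the dense core would have 5^18 = 3,814,697,265,625 entries, while the 17 transfer tensors together hold 16 x 125 + 25 = 2,025 entries.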


Ioakeim (Kimis) Perros earned the Diploma and M.Sc. degrees in Electronic & Computer Engineering from the Technical University of Crete, Greece, in 2012 and 2014, respectively. He is currently a Research Assistant in the SunLab group and is pursuing a Ph.D. in Computational Science & Engineering at the Georgia Institute of Technology. His general research interests span Data Mining, Machine Learning, and Healthcare Analytics. His current research focuses on developing knowledge extraction methods for high-dimensional healthcare and biological data.