Wednesday November 04, 11am-12pm, 1116-W Klaus
An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply.
Advisor: Prof. Richard Vuduc
This talk describes a novel framework, called InTensLi ("intensely"), for producing fast single-node implementations of dense tensor-times-matrix multiply (Ttm) of arbitrary dimension. Whereas conventional implementations of Ttm rely on explicitly converting the input tensor operand into a matrix, in order to use a fast, readily available general matrix-matrix multiply (Gemm) implementation, our framework's strategy is to carry out the Ttm in-place, avoiding this copy. As the resulting implementations expose tuning parameters, this talk also describes a heuristic empirical model for selecting an optimal configuration based on the Ttm's inputs. When compared to widely used single-node Ttm implementations available in the Tensor Toolbox and Cyclops Tensor Framework (Ctf), InTensLi's in-place and input-adaptive Ttm implementations achieve 4× and 13× speedups, respectively, showing Gemm-like performance on a variety of input sizes.
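To make the contrast concrete, the conventional approach the abstract mentions can be sketched in a few lines of NumPy: matricize (unfold) the tensor along the mode being contracted, perform one large Gemm, and fold the result back. This is an illustrative sketch of the baseline strategy only, not InTensLi's in-place method, and the function name `ttm_unfold` is our own.

```python
import numpy as np

def ttm_unfold(X, U, mode):
    """Mode-`mode` tensor-times-matrix product via explicit
    matricization: the conventional, copy-based approach that
    InTensLi avoids. U has shape (J, I_mode)."""
    # Unfold: bring the contracted mode to the front, then flatten
    # the remaining modes, giving X_(n) of shape (I_mode, rest).
    Xn = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    # One large GEMM: (J x I_mode) @ (I_mode x rest) -> (J x rest).
    Yn = U @ Xn
    # Fold back: mode-`mode` dimension is now J.
    new_shape = (U.shape[0],) + tuple(
        d for i, d in enumerate(X.shape) if i != mode)
    return np.moveaxis(Yn.reshape(new_shape), 0, mode)

# Example: a 3 x 4 x 5 tensor times a 2 x 4 matrix along mode 1.
X = np.arange(60.0).reshape(3, 4, 5)
U = np.ones((2, 4))
Y = ttm_unfold(X, U, mode=1)
assert Y.shape == (3, 2, 5)
```

The `moveaxis`/`reshape` steps are exactly the explicit data reorganization (and extra memory traffic) that an in-place formulation eliminates.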
Jiajia Li is a third-year Ph.D. student in Computational Science & Engineering at the Georgia Institute of Technology. She works in Prof. Richard Vuduc's hpcgarage group. Her research area is high-performance computing, with a focus on optimizing linear algebra kernels, such as dense and sparse matrix and tensor operations, on a variety of platforms.