Wednesday Mar 24, 12pm-1pm, [Meeting Link]
Dense Semiring Linear Algebra on Modern CUDA Hardware
Advisor: Prof. Richard Vuduc
We present a new open-source library for dense semiring matrix multiply (SRGEMM) operations on GPUs. Built as a fork of NVIDIA CUTLASS, our approach achieves close to peak instruction throughput on modern CUDA hardware. We also describe the model-based tuning strategy we use to attain this level of performance. Hardware vendors and their linear algebra libraries traditionally report peak theoretical and achieved performance (Rpeak) as the maximum number of floating-point Fused Multiply-Accumulate (FMA) operations per second. For semiring linear algebra, however, Rpeak is not an insightful metric, because the hardware exposes no way to reach that peak with non-FMA instruction mixes. Finally, we discuss the use of this library in a large-scale knowledge discovery application on ORNL Summit, whose bottleneck is a dense, 2D-distributed All-Pairs Shortest Path algorithm.
Vijay is a second-year computer science master's student at Georgia Tech working on accelerated computing and parallel algorithms with Dr. Rich Vuduc. He first fell in love with HPC during his junior year at Boston University, when he started the student cluster competition team there. He continues to mentor a cluster competition team at Georgia Tech and is the graduate mentor for Team Phoenix at SC20 and SC21. Apart from his personal research and SCC involvement, Vijay enjoys volunteering at conferences, serving as a student volunteer at SC19 and Hot Chips 32 and as a lead student volunteer at SC20 and SC21. Currently, he's an intern on the CUTLASS team at NVIDIA.