Tuesday, October 19, 2021 | 3:00pm – 4:15pm ET
Speakers: Yujun Lin and Song Han, MIT
Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy. Compared to projecting the point cloud onto 2D space, directly processing the 3D point cloud yields higher accuracy and fewer #MACs. However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors (the mapping operation), which is unsupported in existing accelerators. Furthermore, explicit gather and scatter of sparse features are required, resulting in large data-movement overhead. In this paper, we comprehensively analyze the performance bottlenecks of modern point cloud networks on CPUs/GPUs/TPUs. To address these challenges, we then present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves a 3.7X speedup and 22X energy savings over an RTX 2080 Ti GPU. Co-designed with lightweight neural networks, PointAcc outperforms the prior accelerator Mesorasi by a 100X speedup with 9.1% higher accuracy on segmentation of the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.
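To make the abstract's terminology concrete, here is a minimal NumPy sketch (not PointAcc's hardware kernel or the paper's code) of the mapping operation for a submanifold sparse convolution on a voxelized point cloud, cast as a sort-plus-binary-search (ranking-based) problem, followed by the explicit gather–matmul–scatter that the abstract identifies as a data-movement bottleneck. All function names, shapes, and parameters below are illustrative assumptions.

```python
# Illustrative sketch: mapping operation + gather/scatter for sparse 3D conv.
import numpy as np

def flatten_coords(coords, grid=1024):
    # Encode integer (x, y, z) voxel coordinates into one sortable key.
    return (coords[:, 0] * grid + coords[:, 1]) * grid + coords[:, 2]

def build_maps(in_coords, offsets, grid=1024):
    """Mapping operation: for each kernel offset, find which nonzero input
    site (if any) is the neighbor of each nonzero output site.

    in_coords : (N, 3) int array of nonzero input voxel coordinates.
    offsets   : (K, 3) int array of kernel offsets, e.g. a 3x3x3 neighborhood.
    Returns a list of (out_idx, in_idx) index pairs, one per offset.
    """
    keys = flatten_coords(in_coords, grid)
    order = np.argsort(keys)                 # ranking step: sort keys once
    sorted_keys = keys[order]
    maps = []
    for off in offsets:
        nbr = in_coords + off
        valid = np.all((nbr >= 0) & (nbr < grid), axis=1)   # stay on the grid
        query = flatten_coords(nbr, grid)
        pos = np.searchsorted(sorted_keys, query)            # binary search
        pos = np.clip(pos, 0, len(sorted_keys) - 1)
        hit = valid & (sorted_keys[pos] == query)             # neighbor exists
        out_idx = np.nonzero(hit)[0]          # output sites (same as inputs)
        in_idx = order[pos[hit]]              # matching input sites
        maps.append((out_idx, in_idx))
    return maps

def sparse_conv(feats, weights, maps):
    # Explicit gather -> matmul -> scatter-add of sparse features.
    out = np.zeros((feats.shape[0], weights.shape[2]), dtype=feats.dtype)
    for (out_idx, in_idx), w in zip(maps, weights):
        out[out_idx] += feats[in_idx] @ w     # scatter the partial sums
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = np.unique(rng.integers(0, 32, size=(64, 3)), axis=0)
    feats = rng.standard_normal((coords.shape[0], 16)).astype(np.float32)
    offs = np.array([(dx, dy, dz) for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1) for dz in (-1, 0, 1)])
    w = rng.standard_normal((len(offs), 16, 32)).astype(np.float32)
    maps = build_maps(coords, offs)
    print(sparse_conv(feats, w, maps).shape)  # (num_nonzero_sites, 32)
```

On a CPU or GPU, the sort, search, gather, and scatter above are separate memory-bound passes; the paper's contribution is to support such mapping operations natively with one versatile ranking-based kernel and to stream the sparse computation so that this data movement is largely hidden.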
Speaker Bio: Yujun Lin is a 4th-year Ph.D. student at MIT, advised by Prof. Song Han. He received his B.Eng. from Tsinghua University. His research is at the intersection of computer architecture and machine learning, especially software and hardware co-design for deep learning and its applications.
Speaker Bio: Song Han is an assistant professor in MIT's Department of Electrical Engineering and Computer Science. His research focuses on efficient deep learning computing. He proposed "deep compression" as a way to reduce neural network size by an order of magnitude, and the hardware implementation "efficient inference engine" that first exploited model compression and weight sparsity in deep learning accelerators. His team's work on hardware-aware neural architecture search has been integrated into PyTorch and AutoGluon and has won six low-power computer vision contest awards at flagship AI conferences. He has received best paper awards at the International Conference on Learning Representations (ICLR) and the Symposium on Field-Programmable Gate Arrays (FPGA). He is also a recipient of an NSF CAREER Award and MIT Technology Review's 35 Innovators Under 35 award. Many of his pruning, compression, and acceleration techniques have been integrated into commercial artificial intelligence chips. He earned a PhD in electrical engineering from Stanford University.
Appeared at the International Symposium on Microarchitecture (MICRO) 2021, held October 18–22. View the Session Video and Project Details.