Principal Investigators: Song Han, Anantha Chandrakasan
Natural language processing (NLP) models built on transformers and the attention mechanism are now widely used for recommendation systems, language modeling, question answering, sentiment analysis, and machine translation. In recent years, most acceleration research has focused on convolutional neural networks (CNNs) for image processing, and several works have accelerated recurrent neural networks (RNNs), but transformer models and the attention mechanism remained largely neglected until recently. As more devices rely on voice commands, it becomes increasingly important to run language processing efficiently on the edge device itself to ensure privacy, low latency, and long battery life.
Our goal is to accelerate the entire transformer model (as opposed to just the attention mechanism) to further reduce data movement across layers. We will exploit domain-specific properties of NLP models using techniques such as token pruning, piecewise linear quantization, and low-precision softmax to reduce memory and computation requirements and enable high-performance NLP models to be deployed on edge devices.
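To give a flavor of one of these techniques, the sketch below illustrates attention-based token pruning in NumPy: tokens that receive little cumulative attention are dropped before the next layer, shrinking every downstream matrix multiplication. The function names, tensor shapes, and the keep_ratio heuristic are illustrative assumptions for this sketch, not the project's actual implementation.

```python
# Minimal, hypothetical sketch of attention-based token pruning (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prune_tokens(hidden, q_w, k_w, keep_ratio=0.5):
    """Drop the least-attended tokens after a self-attention layer.

    hidden:     (seq_len, d_model) token representations
    q_w, k_w:   (d_model, d_head) query/key projection weights
    keep_ratio: fraction of tokens to retain (assumed hyperparameter)
    """
    q = hidden @ q_w                                     # (seq_len, d_head)
    k = hidden @ k_w                                     # (seq_len, d_head)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))       # (seq_len, seq_len)

    # Score each token by the total attention it receives from all queries.
    importance = attn.sum(axis=0)

    n_keep = max(1, int(keep_ratio * hidden.shape[0]))
    keep = np.sort(np.argsort(importance)[-n_keep:])     # preserve original order
    return hidden[keep], keep

# Example usage: prune half of a 128-token sequence with d_model = 64.
rng = np.random.default_rng(0)
h = rng.standard_normal((128, 64))
qw = rng.standard_normal((64, 32))
kw = rng.standard_normal((64, 32))
pruned, kept_idx = prune_tokens(h, qw, kw, keep_ratio=0.5)
print(pruned.shape)   # (64, 64): subsequent layers process half as many tokens
```

Because the pruned sequence is shorter, every later attention and feed-forward layer sees a smaller workload, which is the source of the memory and computation savings; piecewise linear quantization and low-precision softmax reduce the cost of the arithmetic that remains.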
In collaboration with: Alex Ji, Hanrui Wang