MIT AI Hardware Program

Fall Research Update

October 14, 2022
10:00 AM - 1:00 PM ET
Virtual Event


About

This online meeting gave progress updates on research projects supported by the program. Principal investigators, students, and researchers gave a 12-minute talk on each project, followed by a short Q&A. The program co-leads also gave a brief presentation with an overview of the program and upcoming activities.

Presentations were recorded and are available below.


Agenda


Each talk is 12 minutes, followed by a 3-minute Q&A session.

10:00-10:10 AM ET Greetings and Program Update

Aude Oliva and Jesús del Alamo

10:10-10:25 AM ET Introduction to Photonic Deep Learning

Dirk Englund

10:25-10:40 AM ET Delocalized photonic deep learning on the internet’s edge

Alex Sludds [Dirk Englund]

10:40-10:55 AM ET Single chip photonic deep neural network with accelerated training

Saumil Bandyopadhyay [Dirk Englund]

10:55-11:10 AM ET MCUNet: Tiny Deep Learning on IoT Devices

Song Han

11:10-11:25 AM ET On-Device Training under 256KB Memory

Song Han

11:25-11:40 AM ET Ferroelectric Synapse Technology for Analog Deep Neural Networks

Jesús del Alamo

11:40-11:50 AM ET Break
11:50 AM-12:15 PM ET Keynote: Devices and Algorithms for Analog Deep Learning

Murat Önen [Jesús del Alamo]

12:15-12:30 PM ET Exploring Energy Efficient Analog Neural Network Accelerator Designs

Joel Emer and Vivienne Sze

12:30-12:45 PM ET A 50nJ/Token Transformer Accelerator with Adaptive Model Configuration, Word Elimination, and Dynamic Memory Gating

Alex Ji [Anantha Chandrakasan]

12:45-1:00 PM ET A Threshold-Implementation-Based Neural-Network Accelerator Securing Model Parameters and Inputs Against Power and Electromagnetic Side-Channel Attacks

Saurav Maji [Anantha Chandrakasan]

Introduction to Photonic Deep Learning

Dirk Englund

The world of quantum mechanics holds enormous potential to address unsolved problems in communications, computation, and precision measurements. Efforts are underway across the globe to develop such technologies in various physical systems, including atoms, superconductors, and topological states of matter. The Englund group pursues experimental and theoretical research towards quantum technologies using photons and semiconductor spins, combining techniques from atomic physics, optoelectronics, and modern nanofabrication.

Download Slides

Delocalized photonic deep learning on the internet's edge

Alex Sludds [Dirk Englund]

Advances in deep neural networks (DNNs) are transforming science and technology. However, the increasing computational demands of the most powerful DNNs limit deployment on low-power devices, such as smartphones and sensors, and this trend is accelerated by the simultaneous move toward Internet-of-Things (IoT) devices. Numerous efforts are underway to lower power consumption, but a fundamental bottleneck remains due to energy consumption in matrix algebra, even for analog approaches including neuromorphic computing, analog memory, and photonic meshes. Here we introduce and demonstrate a new approach that sharply reduces the energy required for matrix algebra by doing away with weight memory access on edge devices, enabling orders-of-magnitude reductions in energy and latency. At the core of our approach is a new concept that decentralizes the DNN for delocalized, optically accelerated matrix algebra on edge devices. Using a silicon photonic smart transceiver, we demonstrate experimentally that this scheme, termed Netcast, dramatically reduces energy consumption. We demonstrate operation in a photon-starved environment with 40 aJ/multiply of optical energy for 98.8%-accurate image recognition and <1 photon/multiply using single-photon detectors. Furthermore, we show realistic deployment of our system, classifying images with 3 THz of bandwidth over 86 km of deployed optical fiber in a Boston-area fiber network. Our approach enables computing on a new generation of edge devices with speeds comparable to modern digital electronics and power consumption that is orders of magnitude lower.
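
To make the delocalization idea concrete, here is a minimal, purely illustrative Python sketch (not the Netcast hardware or the authors' code): a central server streams weight rows to an edge client that holds only its activation vector, so the client computes a matrix-vector product without ever storing the weight matrix. In the real system the weights are encoded optically and the multiply-accumulate happens in a time-integrating receiver; here everything is emulated with floating-point arithmetic.

# Conceptual sketch of delocalized matrix algebra: the edge device never
# stores the weight matrix. A "weight server" streams one output row at a
# time; the client, which only holds its local activation vector, multiplies
# and accumulates as the stream arrives.
import numpy as np

rng = np.random.default_rng(0)

def weight_server(W):
    """Server side: stream the weight matrix one output row at a time."""
    for row in W:
        yield row  # in the real system this row rides on an optical carrier

def edge_client(x, weight_stream):
    """Client side: holds only x; accumulates one output per streamed row."""
    return np.array([np.dot(row, x) for row in weight_stream])

W = rng.standard_normal((10, 784))   # model weights live only on the server
x = rng.standard_normal(784)         # activations live only on the edge device

y = edge_client(x, weight_server(W))
assert np.allclose(y, W @ x)         # same result as a local matrix-vector product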

Download Slides

Single chip photonic deep neural network with accelerated training

Saumil Bandyopadhyay [Dirk Englund]

As deep neural networks (DNNs) revolutionize machine learning, energy consumption and throughput are emerging as fundamental limitations of CMOS electronics. This has motivated a search for new hardware architectures optimized for artificial intelligence, such as electronic systolic arrays, memristor crossbar arrays, and optical accelerators. Optical systems can perform linear matrix operations at exceptionally high rates and efficiency, motivating recent demonstrations of low-latency linear algebra and optical energy consumption below one photon per multiply-accumulate operation. However, demonstrating systems that co-integrate both linear and nonlinear processing units in a single chip remains a central challenge. Here we introduce such a system in a scalable photonic integrated circuit (PIC), enabled by several key advances: (i) high-bandwidth and low-power programmable nonlinear optical function units (NOFUs); (ii) coherent matrix multiplication units (CMXUs); and (iii) in situ training with optical acceleration. We experimentally demonstrate this fully integrated coherent optical neural network (FICONN) architecture for a three-layer DNN comprising 12 NOFUs and three CMXUs operating in the telecom C-band. Using in situ training on a vowel classification task, the FICONN achieves 92.7% accuracy on a test set, identical to the accuracy obtained on a digital computer with the same number of weights.
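
As a rough conceptual aid only (an assumption-laden toy model, not the FICONN device physics), the sketch below simulates a coherent optical layer as a unitary transform on complex field amplitudes, standing in for a CMXU, followed by an element-wise toy saturable nonlinearity standing in for a NOFU. The programmed MZI meshes and the actual electro-optic NOFU response differ from these stand-ins.

# Toy simulation of a coherent optical neural network layer: unitary linear
# transform (CMXU stand-in) followed by an element-wise nonlinearity on the
# complex field amplitudes (NOFU stand-in).
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(n):
    """Random unitary matrix, a stand-in for a programmed interferometer mesh."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))  # absorb phases so columns stay unitary

def nofu_like(z, saturation=1.0):
    """Toy saturable nonlinearity applied to the optical field amplitude."""
    return z / (1.0 + np.abs(z) / saturation)

def coherent_layer(z, U):
    return nofu_like(U @ z)

n = 6                                            # optical modes per layer
x = rng.standard_normal(n) + 0j                  # input field amplitudes
layers = [random_unitary(n) for _ in range(3)]   # a three-layer toy network

z = x
for U in layers:
    z = coherent_layer(z, U)
output_power = np.abs(z) ** 2                    # detected intensities at the output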

Download Slides

MCUNet: Tiny Deep Learning on IoT Devices

Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of a microcontroller is 2-3 orders of magnitude smaller than even that of a mobile phone. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived.
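
For readers unfamiliar with the constraint that drives this work, the following hedged Python sketch shows the kind of hard feasibility check a memory-aware architecture search must apply: weights must fit in flash and the peak activation footprint must fit in SRAM. The budgets and candidate numbers are invented for illustration; this is not the released MCUNet code.

# Illustrative MCU feasibility filter: a candidate architecture is only worth
# evaluating if its int8 weights fit in flash and its peak live activations
# fit in SRAM. Budgets below are typical MCU numbers, used purely as an example.
from dataclasses import dataclass

FLASH_BUDGET_BYTES = 1024 * 1024   # e.g. 1 MB flash for weights
SRAM_BUDGET_BYTES = 256 * 1024     # e.g. 256 KB SRAM for activations

@dataclass
class Candidate:
    name: str
    weight_bytes: int           # total parameter storage
    peak_activation_bytes: int  # largest live activation set during inference

def fits_on_mcu(c: Candidate) -> bool:
    return (c.weight_bytes <= FLASH_BUDGET_BYTES
            and c.peak_activation_bytes <= SRAM_BUDGET_BYTES)

candidates = [
    Candidate("wide-resolution", 900_000, 410_000),   # activations overflow SRAM
    Candidate("narrow-deep",     950_000, 230_000),   # fits both budgets
    Candidate("oversized",     1_500_000, 200_000),   # weights overflow flash
]

feasible = [c.name for c in candidates if fits_on_mcu(c)]
print(feasible)  # ['narrow-deep']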

On-Device Training under 256KB Memory

Song Han

On-device training enables a model to adapt to new data collected from the sensors. We propose an algorithm-system co-design framework that makes on-device training possible with only 256KB of memory. We propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize 8-bit quantized training, and Sparse Update to skip the gradient computation of less important layers and sub-tensors. The algorithmic innovations are implemented in a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offloads runtime auto-differentiation to compile time. Our framework uses less than 1/1000 of the training memory of existing frameworks while matching their accuracy.
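
A minimal PyTorch sketch of the sparse-update idea follows, under the assumption that the "less important" parameters have already been identified (here, arbitrarily, everything except the biases and the final layer). This illustrates the concept only; it is not the Tiny Training Engine, which additionally prunes the backward graph and moves auto-differentiation to compile time.

# Sparse update sketch: compute gradients only for a small, pre-selected
# subset of parameters so the backward pass and optimizer state stay within
# a tight memory budget.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for a deployed tiny network
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# 1) Freeze everything.
for p in model.parameters():
    p.requires_grad_(False)

# 2) Sparse update: re-enable only the biases and the final classifier layer,
#    mimicking the "important layers/sub-tensors" selection in the abstract.
for name, p in model.named_parameters():
    if name.endswith("bias") or name.startswith("4."):
        p.requires_grad_(True)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-2)

x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                     # gradients exist only for the sparse subset
optimizer.step()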

Ferroelectric Synapse Technology for Analog Deep Neural Networks

Jesús del Alamo

Analog crossbar arrays based on programmable non-volatile resistors are under intense investigation for accelerating deep neural network training. Analog accelerators promise many-fold improvements in energy efficiency and throughput compared with digital implementations. The resistive device requirements for this application, in terms of speed, switching energy, linearity, and endurance, are relatively well understood. To date, however, no device technology has been identified that meets them all. Addressing this important need is the goal of this research project. We aim to develop a CMOS-compatible, non-volatile, programmable resistor technology that exploits gradual ferroelectric switching in a multi-domain thin film. It consists of a 3D thin-film ferroelectric HfZrO/InGaZnO vertical nanowire or vertical nanosheet transistor structure with a nanoscale footprint to enable high-density 3D arrays.

Download Slides

Keynote: Devices and Algorithms for Analog Deep Learning

Murat Önen [Jesús del Alamo]

Efforts to realize analog processors have skyrocketed over the last decade, as energy-efficient deep learning accelerators have become imperative for the future of information processing. However, the absence of two intertwined components stands in the way of their practical implementation: devices that satisfy algorithm-imposed requirements, and algorithms that run on nonideality-tolerant routines. This thesis demonstrates a near-ideal device technology and a superior neural network training algorithm that, combined, can ultimately propel analog computing. The CMOS-compatible nanoscale protonic devices demonstrated here show unprecedented characteristics, combining the benefits of nanoionics with extreme acceleration of ion transport and reactions under strong electric fields. Enabled by a material-level breakthrough, the use of phosphosilicate glass (PSG) as a proton electrolyte, this operating regime achieves controlled shuttling and intercalation of protons in nanoseconds at room temperature in an energy-efficient manner. A theoretical analysis is then carried out to explain the well-known incompatibility between asymmetric device modulation and conventional neural network training algorithms. By establishing a powerful analogy with classical mechanics, a novel method, Stochastic Hamiltonian Descent, is developed to exploit device asymmetry as a useful feature. Overall, the devices and algorithms developed in this thesis have immediate applications in analog deep learning, while the overarching methodology provides further insight for future advancements.

Download Slides

Exploring Energy Efficient Analog Neural Network Accelerator Designs

Vivienne Sze & Joel Emer

There is a large design space for tensor accelerators for applications such as deep neural networks, including both analog and digital accelerator designs supporting dense and sparse workloads. Even within these categories, the design space is large. In this talk, we will discuss methods to model this design space, enabling apples-to-apples comparisons and facilitating design space exploration.

Download Slides

A 50nJ/Token Transformer Accelerator with Adaptive Model Configuration, Word Elimination, and Dynamic Memory Gating

Alex Ji [Anantha Chandrakasan]

We present a transformer accelerator adopting a SuperTransformer model that enables adaptive model configuration on-chip, word elimination to prune redundant tokens, and dynamic memory gating with power switches to minimize leakage power. We achieve 5.8x scalability in network latency and energy by varying the model size. Word elimination can further reduce network energy by 16%, with some accuracy loss. At 0.68 V and 80 MHz, processing a 32-token input takes 0.6 ms and 1.6 uJ, i.e., about 50 nJ per token.

A Threshold-Implementation-Based Neural-Network Accelerator Securing Model Parameters and Inputs Against Power and Electromagnetic Side-Channel Attacks

Saurav Maji [Anantha Chandrakasan]

Neural network (NN) hardware accelerators are being widely deployed on low-power IoT nodes for energy-efficient decision making. Embedded NN implementations can use locally stored proprietary models and may operate over private inputs (e.g., health monitors with patient-specific biomedical classifiers), which must not be disclosed. Side-channel attacks (SCA) are a major concern in embedded systems, where physical access to the operating hardware can allow attackers to recover secret data by exploiting information leakage through power consumption, timing, and electromagnetic emissions. SCA on embedded NN implementations can reveal the model parameters as well as the inputs. To address these concerns, we present an energy-efficient ASIC solution for protecting both the model parameters and the input data against power-based SCA.
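
As background on the countermeasure named in the title, the sketch below illustrates the share-splitting principle that threshold implementations build on. It is a generic Boolean-masking example, not the presented ASIC: a secret value is split into random shares so that any proper subset of shares is statistically independent of the secret, and the hardware is arranged so that no single intermediate combines all shares.

# Generic Boolean-masking illustration of the sharing idea behind threshold
# implementations: split a secret byte into three XOR shares; any one or two
# shares are uniformly random on their own, so leakage of a proper subset
# reveals nothing about the secret.
import secrets

def split_into_shares(secret: int, n_shares: int = 3) -> list[int]:
    shares = [secrets.randbelow(256) for _ in range(n_shares - 1)]
    last = secret
    for s in shares:
        last ^= s
    return shares + [last]

def recombine(shares: list[int]) -> int:
    out = 0
    for s in shares:
        out ^= s
    return out

secret_weight = 0xA7                      # stand-in for a secret model parameter
shares = split_into_shares(secret_weight)
assert recombine(shares) == secret_weight
# Only the XOR of all shares reconstructs the secret value.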

Download Slides

Researchers