MIT AI Hardware Program
2023 Symposium
Thursday, April 6, 2023 | 9:30am - 2:30pm ET
MIT Building 34-401, Grier Room
50 Vassar St, Cambridge, MA
Hybrid Event

About
The MIT AI Hardware Program is an academia-industry initiative between the MIT School of Engineering and MIT Schwarzman College of Computing. We work with industry to define and bootstrap the development of translational technologies in hardware and software for the AI and quantum age.
The symposium included project reviews of the current MIT AI Hardware Program portfolio as well as introductions to new projects.

Register
This event is open to the MIT community and AI Hardware Program members.
Registration is now closed. Email mcshane@mit.edu with any questions.
Agenda
9:30 – 10:00
Registration and Breakfast
10:00 – 10:15
Year in Review & the Year Ahead
Program Co-Leads
Jesús del Alamo, Donner Professor, Professor of Electrical Engineering and Computer Science, MacVicar Faculty Fellow
Aude Oliva, Director of Strategic Industry Engagement, MIT Schwarzman College of Computing; CSAIL Senior Research Scientist

10:15 – 10:45 | Keynote
Efficient AI Computing: from TinyML to Large Language Model Acceleration
Song Han, Associate Professor of Electrical Engineering and Computer Science
Rapid developments in silicon technology mean that what constitutes a large model today may become a tiny model tomorrow. Our work spans both tinyML research and large language model acceleration, and our techniques have been adopted by industry.

10:45 – 11:30 | Lightning Talks
Accelerated Photonic Deep Learning: Decentralized and Single Chip Solutions
Ryan Hamerly, Research Scientist at MIT and Senior Scientist at NTT Research
The field of quantum mechanics presents significant opportunities to tackle unresolved issues in communication, computation, and precision measurement. We are working to advance these technologies in diverse physical systems, such as atoms, superconductors, and topological states of matter.
In collaboration with Dirk Englund, Associate Professor of Electrical Engineering and Computer Science

Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Nellie Wu, PhD Candidate in Electrical Engineering and Computer Science
Optimization methods for deep neural networks (DNNs) often produce models with varied sparsity characteristics. In our work, we propose a software-hardware co-design methodology to efficiently support DNNs whose weights and activations range from fully dense to sparse across a wide range of sparsity levels.
In collaboration with Vivienne Sze, Associate Professor of Electrical Engineering and Computer Science; Joel S. Emer, Professor of the Practice, Electrical Engineering and Computer Science
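For readers less familiar with structured sparsity, the sketch below illustrates the simplest such pattern, N:M sparsity, in which every group of M weights keeps only its N largest-magnitude entries. It is a generic illustration under our own simplifying assumptions, not the authors' hierarchical scheme or its hardware support.
```python
import torch

def prune_n_m(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m along the
    input dimension (e.g., 2:4 structured sparsity)."""
    out_f, in_f = weight.shape
    assert in_f % m == 0, "input dimension must be divisible by the group size"
    groups = weight.reshape(out_f, in_f // m, m)
    drop = groups.abs().argsort(dim=-1)[..., : m - n]   # smallest-magnitude entries
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop, 0.0)
    return (groups * mask).reshape(out_f, in_f)

w = prune_n_m(torch.randn(8, 16))
print((w.reshape(8, -1, 4) != 0).sum(dim=-1))   # exactly 2 nonzeros per group of 4
```
Because the nonzero positions are constrained to fixed-size groups, an accelerator can index and skip zeros with simple, regular hardware, which is what makes such patterns attractive for co-design.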

An Energy-Efficient Neural Network Accelerator with Improved Protection Against Clock Glitch-Based Fault Attacks
Saurav Maji, PhD Candidate in Electrical Engineering and Computer Science
Deep neural networks (DNNs) exhibit critical vulnerabilities when deployed in security-critical applications. In our work, we provide robust and reliable DNN inference for resource-constrained embedded platforms with secure memory systems.
In collaboration with Anantha Chandrakasan, Dean of the School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science

11:30 – 11:45
Coffee Break
11:45 – 12:15 | Keynote
DSAIL: Building Data Systems using Learning
Sam Madden, Professor of Electrical Engineering and Computer Science
In this talk, I will describe recent progress in the Data Systems Group at MIT on using machine learning algorithms to improve the performance of large-scale data systems, such as database systems and file systems.
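One widely cited example from this line of work is the learned index, in which a model predicts a key's position in sorted storage and a short local search corrects the prediction. The toy sketch below is our own illustration under simplifying assumptions, not a system from the talk.
```python
import bisect
import numpy as np

class LearnedIndex:
    """A toy 'learned index': a linear model predicts where a key sits in a
    sorted array, and a short local search bounded by the model's maximum
    error corrects the guess."""
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys))
        pos = np.arange(len(self.keys))
        self.slope, self.intercept = np.polyfit(self.keys, pos, 1)
        err = pos - (self.slope * self.keys + self.intercept)
        self.max_err = int(np.ceil(np.abs(err).max())) + 1

    def lookup(self, key):
        guess = int(self.slope * key + self.intercept)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
        return i if i < len(self.keys) and self.keys[i] == key else None

keys = np.random.default_rng(0).integers(0, 10**6, size=10_000)
idx = LearnedIndex(keys)
print(idx.lookup(idx.keys[1234]))   # position of that key in the sorted array
```
The appeal is that the model plus error bound can replace several levels of a traditional index structure while keeping lookups exact.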

12:15 – 1:00 | Lightning Talks
Ferroelectric Synapses
Yanjie Shao, PhD Candidate in Electrical Engineering and Computer Science
The recent discovery of prominent ferroelectric properties of CMOS-compatible HfO2-based materials has brought new functionality to microelectronics in the form of a new memory technology and the potential of transistors that can operate at very low voltages. An intriguing new application of these remarkable findings is in low-energy analog synapses.
In collaboration with Jesús del Alamo, Donner Professor, Professor of Electrical Engineering and Computer Science, MacVicar Faculty Fellow

There’s Always a Bigger Fish: A Clarifying Analysis of a Machine-Learning-Assisted Side-Channel Attack
Mengjia Yan, Assistant Professor of Electrical Engineering and Computer Science
Machine learning has made it possible to mount powerful attacks through side channels that have traditionally been seen as challenging to exploit. However, due to the black-box nature of machine learning models, these attacks are often difficult to interpret correctly. Models that detect correlations cannot be used to prove causality or understand an attack’s various sources of information leakage. In this talk, we show that a state-of-the-art website-fingerprinting attack powered by machine learning was only partially analyzed, leading to an incorrect conclusion about its root causes. We further show how a careful analysis can reveal the mechanisms behind this powerful attack.

Nonvolatile Photonic Memory for Computing
Juejun Hu, Professor of Materials Science and Engineering
Photonic hardware accelerators are gaining interest as the prevailing von Neumann computing architecture struggles with AI tasks. Chalcogenide phase change materials (PCMs) are promising due to their large optical property contrast and unique behavior as a photonic memory material, enabling on-chip and free-space re-programmable optical computing architectures with ultrahigh speed, massive parallelism, and low energy consumption.

1:00 – 2:30
Lunch & Poster Session
The session will feature up to 30 posters on state-of-the-art MIT research in energy-efficient systems and devices, edge computing, generative AI, and new hardware and software technologies.
2:30 – 3:00
Break
3:00 – 5:00
Networking & customized meetings with MIT faculty and students
Poster Session

Electrochemical Ionic Synapses (EIS) with Mg2+ as the Working Ion
Miranda Schwacke, Jesús del Alamo, Ju Li, Bilge Yildiz
Dynamic doping by proton intercalation has shown great promise as a fast and energy-efficient mechanism for resistance modulation, but the use of protons limits retention of programmed states in air. In this work, we replace H+ with Mg2+ as the working ion in electrochemical ionic synapses (EIS) for significantly improved retention in air without the need for encapsulation.

Timing-dependent Programming of Electrochemical Synapses Enabled by Non-linear Voltage Kinetics
Mantao Huang, Ju Li, Jesús del Alamo, Bilge Yildiz
Electrochemical ionic synapses (EISs) are programmable resistors whose conductivity is controlled by electrochemical charge insertion, offering repeatable conductance modulation and low energy consumption; they are promising building blocks for neuromorphic computing hardware. In this work, we propose that their non-linear voltage kinetics can be utilized to implement neuroscience-guided learning rules.

Optimizing Electrolyte and Interface Proton Transport for Low Power Electrochemical Ionic Synapses
Heejung Chung, Jordan Meyer, Longlong Xu, Jesús del Alamo, Ju Li, Bilge Yildiz
Electrochemical ionic synapses (EIS) promise fast, energy-efficient conductance modulation using protons, but kinetic bottlenecks impede improvements in operating voltage and speed. This work investigates proton transport by studying CMOS-compatible oxide electrolytes and applying electrolyte-channel interface coatings, as well as identifying phonon-based descriptors to screen for fast ternary oxide electrolytes.

Architectural Evaluation of Processing-In-Memory Accelerators
Tanner Andrulis, Joel Emer, Vivienne Sze
Analog Processing-In-Memory (PIM) accelerators are a promising approach to efficiently run Deep Neural Networks (DNNs). In this work, we present a fast, flexible framework that models PIM at an architectural level, enabling researchers to see how device, circuit, and architectural innovations will affect the efficiency and performance of PIM accelerators.

Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel Emer
Optimization methods for deep neural networks (DNNs) often produce models with varied sparsity characteristics. In our work, we propose a software-hardware co-design methodology to efficiently support DNNs whose weights and activations range from fully dense to sparse across a wide range of sparsity levels.

Opportunities for High Throughput, In-Situ TEM for Ferroelectric Oxides
Paul Miller, Frances Ross
We propose optimizing high throughput experimentation in in-situ Ultra High Vacuum Transmission Electron Microscopy via an integrated approach of microfabricated specimen architecture, in-situ processing and intelligent experimental design. Such an approach will be utilized to rapidly optimize processing parameters of HfxZr1-xO2 ferroelectrics to enable more efficient and reliable ferroelectric memory, and may be applied to a wide range of materials deposition and processing studies.

In-Situ STEM Characterization of Structural Changes in HfO2 for Neuromorphic Devices
Alexandre Foucher, Frances Ross
Neuromorphic materials are a rising class of components for novel memory devices such as late-generation memristors. HfO2 has been shown to be a promising candidate for memristors and for improving resistive random-access memory. However, structural changes in HfO2 under applied voltage and current flow remain elusive. In this work, an in situ scanning transmission electron microscopy analysis was performed to observe changes in a cross section of an HfO2-based nanodevice under biasing conditions. Crystallization of the initially amorphous HfO2 under beam exposure and current flow was observed, and I-V curve measurements are consistent with the characteristic behavior of memristors. The results point to a crystallization mechanism driven by self-heating effects near the formation of an oxygen filament, a characteristic feature in neuromorphic devices. Understanding changes in crystalline structure and oxygen concentration in memristors is a crucial step toward guiding the design of novel nanodevices for next-generation electronics.

Pruning’s Effect on Generalization Through the Lens of Training and Regularization
Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite
Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction — while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning’s effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, improving the training loss over the dense model. We find that pruning also leads to additional regularization at other sparsities, reducing the accuracy degradation due to noisy examples over the dense model. Pruning extends model training time and reduces model size. These two factors improve training and add regularization respectively. We empirically demonstrate that both factors are essential to fully explaining pruning’s impact on generalization.

Electrical Modulation of Circular Polarization of Light by Spin-photon Conversion
Pambiang Abel Dainone, Nicholas Figueiredo Prestes, Martina Morassi, Aristide Lemaitre, Mathieu Stoffel, Xavier Devaux, Jean-Marie George, Henri Jaffrès, Pierre Renucci, Yuan Lu, Luqiao Liu
Direct high-speed modulation of the circular polarization (Pc) of coherent light would open the way for new communication technologies and offers a path to overcoming a main bottleneck of optical telecommunications. An innovative way to modulate the circular polarization of a light source is to harness the spin of electrons and photons. Remarkably, this spin-photon conversion enables ultrafast polarization modulation and highly efficient information coding, implemented even on a single photon.

Probabilistic Nanomagnets for Energy-Efficient Bayesian Inference Accelerators
Brooke McGoldrick, Marc Baldo, Luqiao Liu
Bayesian inference is an AI approach with applications spanning medicine, robotics, and speech recognition; however, the energy required for inference is high due to the number of high-precision multiplications and memory accesses needed on traditional hardware. We propose integrating our novel devices – nanomagnetic probabilistic bits – with traditional CMOS circuitry to realize a stochastic, scalable, and energy-efficient accelerator for Bayesian inference problems.
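As background on how probabilistic bits map onto Bayesian inference, the toy simulation below treats a p-bit as a device that outputs 1 with probability sigmoid(bias) and uses a pair of them to estimate a posterior in a two-node Bayesian network by sampling and counting. This is a behavioral model we assume for illustration, not the nanomagnetic device physics or the proposed accelerator.
```python
import numpy as np

rng = np.random.default_rng(0)

def p_bit(bias):
    """Behavioral model of a probabilistic bit: output 1 with probability
    sigmoid(bias). Hardware p-bits realize this with a thermally fluctuating
    nanomagnet; this software stand-in is only illustrative."""
    return rng.random() < 1.0 / (1.0 + np.exp(-bias))

def logit(p):
    return np.log(p / (1.0 - p))

# Toy Bayesian network Rain -> WetGrass: estimate P(Rain | WetGrass = 1)
# by sampling and counting, the kind of workload a p-bit fabric accelerates.
hits = total = 0
for _ in range(100_000):
    rain = p_bit(logit(0.2))                    # prior: P(Rain) = 0.2
    wet = p_bit(logit(0.9 if rain else 0.1))    # P(Wet | Rain), P(Wet | not Rain)
    if wet:
        total += 1
        hits += int(rain)
print(hits / total)   # approx. 0.69 = (0.9 * 0.2) / (0.9 * 0.2 + 0.1 * 0.8)
```
In hardware, the random sampling and the sigmoid response come essentially for free from device physics, which is where the energy savings over high-precision digital arithmetic arise.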

Hydrodynamic Transport and Inverse Design in Nanoscale Devices
Giuseppe Romano
We propose the development of a differentiable physics solver to compute and optimize electronic transport in the hydrodynamic regime. This tool will enable constrained large-scale shape optimization of nanoscale devices, opening up opportunities for novel computing architectures and end-to-end software and hardware optimization.

A Superconducting Platform for Energy-efficient Spiking Neural Networks
Matteo Castellani, Emily Toomey, Andres E. Lombo, Jesus Lares, Marco Colangelo, Chi-Ning Chou, Ken Segall, Nancy Lynch, and Karl K. Berggren
Superconducting spiking neural networks (SSNNs) are a promising technology for neuromorphic computing, with their potential for high energy efficiency and fast processing speeds. Two key components used in SSNNs are Josephson junctions and superconducting nanowires, both of which can emulate the firing behavior of neurons. Superconducting nanowires offer certain advantages over Josephson junctions, as they can generate higher voltage potentials and serve as low-loss transmission lines for interconnections between neurons. In this work, we present superconducting nanowire neurons and synapses for supervised learning in spiking neural networks, and we propose valuable applications of the technology, such as image recognition, matrix inversion, and winner-takes-all algorithms.

Lightning: A Reconfigurable Photonic-Electronic SmartNIC for Fast and Energy-Efficient Inference
Zhizhen Zhong, Mingran Yang, Christian Williams, Alexander Sludds, Homa Esfahanizadeh, Ryan Hamerly, Dirk Englund, Manya Ghobadi
We propose Lightning, a reconfigurable photonic-electronic SmartNIC that uses a novel datapath to feed traffic from the network into the photonic domain without creating digital packet processing and data movement bottlenecks. Compared with DPUs, GPUs, and SmartNICs, Lightning accelerates inference serving time by up to 166x while consuming up to 416x less energy.

There’s Always a Bigger Fish: A Clarifying Analysis of a Machine-Learning-Assisted Side-Channel Attack
Jack Cook, Jules Drean, Jonathan Behrens, Mengjia Yan
This work highlights the limitations of relying on machine learning for side-channel attacks without completing a comprehensive security analysis.

TinyFL: Privacy Preserving On-device Training for Edge Devices
Irene Tenison, Juan Duitama, Lalana Kagal
Internet of Things (IoT) devices and microcontrollers generate a steady stream of data that is useful in various domains, including patient monitoring, preventing breakdowns in factories, and even efficient traffic control. In many cases, this data is highly sensitive and may not be used or shared directly. Our project, TinyFL, enables privacy-preserving distributed machine learning on memory-constrained edge devices, allowing models to be learned without sharing raw data.
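For context, federated averaging is the canonical pattern for learning a shared model while clients keep their raw data local; the toy sketch below illustrates it under simplifying assumptions and is not TinyFL itself, whose memory-constrained training and privacy mechanisms go further.
```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One local least-squares gradient step on a client's private data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fed_avg(clients, rounds=50, dim=3):
    """Minimal federated averaging: clients train locally and share only
    model updates, never raw data."""
    w = np.zeros(dim)
    for _ in range(rounds):
        updates = [local_step(w.copy(), X, y) for X, y in clients]
        w = np.mean(updates, axis=0)      # the server aggregates weights only
    return w

# Example: three clients, each holding private linear-regression data
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=100)))
print(fed_avg(clients))   # approaches true_w without ever pooling the data
```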

HierBatching: Tera-Scale GNN Training in just a Single Machine
Tianhao Huang, Xuhao Chen, Muhua Xu, Arvind, Jie Chen
Graph neural networks (GNNs) have become increasingly popular for analyzing data organized as graphs but face unique scalability challenges on large-scale datasets. In this work, we propose HierBatching, an out-of-core GNN training framework that tames tera-scale datasets with just a single machine. Our framework has been shown to be highly cost-efficient while incurring almost no loss in accuracy or runtime performance.

Programmable Photonics: New Paradigm for Fast Optical Signal Processing
Zhengqi Gao, Xiangfeng Chen, Zhengxing Zhang, Uttara Chakraborty, Wim Bogaerts, Duane Boning
Programmable integrated photonics is an alternative paradigm to application-specific integrated photonics. We have developed an automatic technique to realize arbitrary light-processing functions (e.g., splitting, filtering, wavelength selection) on a recirculating square-mesh programmable photonic circuit.

ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining
Yifan Yang, Joel Emer, Daniel Sanchez
Sparse CNNs dramatically reduce computation and storage costs over dense ones. But sparsity also makes CNNs more data-intensive, as each value is reused fewer times. Thus, current sparse CNN accelerators, which process one layer at a time, are bottlenecked by memory traffic. We present ISOSceles, a new sparse CNN accelerator that dramatically reduces data movement through inter-layer pipelining: overlapping the execution of consecutive layers so that a layer’s output activations are quickly consumed by the next layer without spilling them off-chip.

CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation
Abdullah Alomar, Pouya Hamadanian, Arash Nasr-Esfahany, Anish Agarwal, Mohammad Alizadeh, Devavrat Shah
CausalSim is an evolution of traditional trace-driven simulation, powered by AI-driven causal reasoning. During a 5-month deployment on an open-source video streaming platform, CausalSim led to 2.6x less video stalling compared to expert-tuned algorithms.

Gemino: Practical and Robust Neural Compression for Video Conferencing
Pantea Karimi Babaahmadi, Mohammad Alizadeh
Video compression faces challenges when bandwidth is limited; to address this, compression methods must learn video regularities to predict frames from less data. This project seeks to develop a practical neural video compression approach for Internet video delivery that can be tailored to specific video types (e.g., sports, game shows, movies) and uses deep learning to better model human perception for improved quality criteria.

ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Shuang Li, Alaa Malouf, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba
ConceptFusion presents a new approach to building open-set and queryable multimodal 3D maps. We show how features from foundation models like CLIP and DINO can be fused into 3D maps to enable a broad range of robot perception tasks.

Giving Computers the Sense of Touch
Michael Foshey, Wojciech Matusik
We developed an integrated hardware and software solution for high-resolution tactile data. Our tactile sensor arrays can be manufactured in a scalable, automated process. We show a range of different applications in robotics, wearables, and sensors placed in an environment.

Building Spatio-temporal Maps of Human Brain Processes
Benjamin Lahner, Aude Oliva
The human brain is a time machine; we are constantly remembering our past and projecting ourselves into the future. Capturing the brain’s response as these moments unfold could yield valuable insights into both how the brain works and how to better design human-centered AI systems. Here, we present unique human brain spatiotemporal maps of when people see an image, hear a word or a sound, or encode a visual event into memory. Our method, based on the combination of fMRI and MEG, can lead to the development of human functional maps of perceptual and cognitive states, capture the dynamics and progression of brain disease, and inform brain-machine interfaces.

LEGO: an Optimizer and Hardware Generator for Efficient Linear Algebra & Deep Learning Accelerators
Yujun Lin, Zhekai Zhang, Song Han
We propose the LEGO framework that automatically optimizes and generates synthesizable RTL of spatial architecture design for tensor applications, especially neural network accelerators. LEGO introduces a hierarchical spatial architecture paradigm and a relation-centric hardware representation that offers high flexibility and expressiveness to describe the hardware design for tensor computation. By analyzing the relations specifying the dataflow and data assignment, LEGO can establish and optimize the interconnections among functional units and memory systems in the proposed architecture and finally generate the accelerator. Compared to the prior art, Gemmini, a template-based accelerator generator, LEGO achieved 1.1x-12x speedup on multiple networks varying from small models like MobileNetV2 to large models like Stable Diffusion.

On-Device Training Under 256KB Memory and Tiny Training Engine
Wei-Chen Wang, Song Han
On-device training enables the model to adapt to new data collected from the sensors by fine-tuning a pre-trained model; however, the training memory consumption is prohibitive for IoT devices with tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash without auxiliary memory, using less than 1/1000 of the memory of PyTorch and TensorFlow while matching the accuracy of tinyML applications. Our study enables IoT devices not only to perform inference but also to continuously adapt to new data for on-device lifelong learning.

EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation
Han Cai, Song Han
This work introduces EfficientViT, a new family of models for on-device semantic segmentation. The core of EfficientViT is a novel lightweight multi-scale attention module that enables a global receptive field and multi-scale learning with hardware-efficient operations. Without performance loss, EfficientViT provides up to 38x mobile latency reduction over SOTA semantic segmentation models (e.g., SegFormer, SegNeXt).
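The multi-scale attention module builds on linear attention, which replaces softmax with a simple feature map so the cost scales linearly in the number of tokens. The sketch below shows the general linear-attention computation assuming a ReLU feature map; it is an illustration of the underlying idea, not the exact EfficientViT module.
```python
import torch
import torch.nn.functional as F

def relu_linear_attention(q, k, v, eps=1e-6):
    """Linear attention with a ReLU feature map: cost grows linearly with the
    number of tokens N (O(N * d^2)) instead of quadratically as in softmax
    attention. q, k, v have shape (batch, heads, N, d)."""
    q, k = F.relu(q), F.relu(k)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)                  # sum_n k_n v_n^T
    denom = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps
    out = torch.einsum("bhnd,bhde->bhne", q, kv) / denom.unsqueeze(-1)
    return out

q = k = v = torch.randn(1, 4, 256, 32)
print(relu_linear_attention(q, k, v).shape)   # torch.Size([1, 4, 256, 32])
```
Because the global context is summarized in a small d-by-d matrix, the operation maps onto matrix multiplies and elementwise ops that run efficiently on mobile hardware.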

TorchSparse++: Efficient Primitives for Deep Learning on Point Clouds
Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han
We developed TorchSparse, a high-performance library for point cloud deep learning. It achieves measured end-to-end inference speedups of 2.9x, 3.3x, 2.2x, and 1.8x on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse, and SpConv v2, respectively, and is up to 1.3x faster than SpConv v2 in mixed-precision training on the same device.

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
SmoothQuant enables efficient and accurate W8A8 quantization for large language models by migrating part of the quantization difficulty from activations to weights, which addresses the issue of systematic activation outliers.
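The core of the method is a per-channel smoothing factor that rescales activations and weights in a mathematically equivalent way so that both become easier to quantize. The sketch below illustrates this transform for a single linear layer under our own simplifying assumptions (a calibrated per-channel activation max and a plain weight matrix); it is not the released SmoothQuant code.
```python
import torch

def smooth(weight, act_max, alpha=0.5, eps=1e-5):
    """Per-channel smoothing in the spirit of SmoothQuant: rescale each input
    channel so activation outliers shrink while weights absorb the scale.
    weight: (out_features, in_features); act_max: calibrated per-channel
    max |activation|, shape (in_features). Returns the scaled weight and the
    scales s, which must be folded into the producer of the activations."""
    w_max = weight.abs().amax(dim=0)                                 # per input channel
    s = act_max.clamp(min=eps) ** alpha / w_max.clamp(min=eps) ** (1 - alpha)
    return weight * s.unsqueeze(0), s

# The transform is mathematically equivalent: X W^T == (X / s) (W * s)^T
w = torch.randn(64, 128)
x = torch.randn(4, 128)
w_s, s = smooth(w, x.abs().amax(dim=0))
print(torch.allclose(x @ w.T, (x / s) @ w_s.T, atol=1e-4))   # True
```
In practice the division by s is folded into the preceding LayerNorm or linear layer, so at runtime both the smoothed activations and the rescaled weights quantize well to INT8.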

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Jun-Yan Zhu, Song Han
During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to make gradual changes to the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited regions. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. With 1.2%-area edited regions, our method reduces the computation of DDIM by 7.5× and GauGAN by 18× while preserving the visual fidelity. With SIGE, we accelerate the speed of DDIM by 3.0x on RTX 3090 and 6.6× on Apple M1 Pro CPU, and GauGAN by 4.2× on RTX 3090 and 14× on Apple M1 Pro CPU.
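To make the caching idea concrete, the sketch below shows a minimal tile-based update for a single stride-1, 'same'-padded convolution: only tiles whose inputs changed (dilated by the receptive field) are recomputed, and cached outputs are reused elsewhere. This is our own simplified illustration assuming tile-aligned image sizes, not the SIGE engine itself.
```python
import torch
import torch.nn.functional as F

def sparse_conv_update(x_edit, x_orig, y_cache, weight, bias=None, tile=16):
    """Recompute a stride-1, 'same'-padded convolution only on tiles whose
    inputs changed, reusing the cached output elsewhere. Assumes an odd
    kernel size and H, W divisible by `tile`. Illustrative sketch only."""
    k = weight.shape[-1]
    p = k // 2
    y = y_cache.clone()
    # Per-pixel change map, dilated by the receptive field so outputs near an
    # edit boundary are also refreshed, then reduced to a tile-level mask.
    diff = (x_edit - x_orig).abs().sum(dim=1, keepdim=True)           # (N,1,H,W)
    diff = F.max_pool2d(diff, kernel_size=k, stride=1, padding=p)
    changed = F.max_pool2d(diff, kernel_size=tile, stride=tile) > 0   # (N,1,H/t,W/t)
    x_pad = F.pad(x_edit, (p, p, p, p))
    for n, _, ti, tj in changed.nonzero().tolist():
        r0, c0 = ti * tile, tj * tile
        r1, c1 = r0 + tile, c0 + tile
        patch = x_pad[n:n+1, :, r0:r1 + 2 * p, c0:c1 + 2 * p]         # tile + halo
        y[n:n+1, :, r0:r1, c0:c1] = F.conv2d(patch, weight, bias)     # padding=0
    return y

# A small local edit only triggers recomputation of the affected tiles.
x0 = torch.randn(1, 3, 64, 64)
x1 = x0.clone()
x1[:, :, 8:24, 8:24] += 1.0
w, b = torch.randn(8, 3, 3, 3), torch.randn(8)
y0 = F.conv2d(x0, w, b, padding=1)                  # cached "original" output
y1 = sparse_conv_update(x1, x0, y0, w, b)
print(torch.allclose(y1, F.conv2d(x1, w, b, padding=1), atol=1e-4))   # True
```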

Improving Robustness of Quantum Computing with Machine Learning
Hanrui Wang, Fred Chong, Song Han
We will discuss how machine learning can help quantum computing; specifically, how circuit architecture search, gradient pruning, and quantization improve the robustness of variational quantum algorithms on real quantum hardware.