MIT AI Hardware Program
2023 Symposium
Thursday, April 6, 2023 | 9:30am - 2:30pm ET
MIT Building 34-401, Grier Room
50 Vassar St, Cambridge, MA
Hybrid Event

About
The MIT AI Hardware Program is an academia-industry initiative between the MIT School of Engineering and MIT Schwarzman College of Computing. We work with industry to define and bootstrap the development of translational technologies in hardware and software for the AI and quantum age.
The symposium included project reviews of the current MIT AI Hardware Program portfolio as well as introductions to new projects.

Register
This event is open to the MIT community and AI Hardware Program members.
Registration is now closed. Email mcshane@mit.edu with any questions.
Agenda
9:30 – 10:00
Registration and Breakfast
10:00 – 10:15
Year in Review & the Year Ahead
Program Co-Leads
Jesús del Alamo, Donner Professor, Professor of Electrical Engineering and Computer Science, MacVicar Faculty Fellow
Aude Oliva, Director of Strategic Industry Engagement, MIT Schwarzman College of Computing; CSAIL Senior Research Scientist

10:15 – 10:45 | Keynote
Efficient AI Computing: from TinyML to Large Language Model Acceleration
Song Han, Associate Professor of Electrical Engineering and Computer Science
Rapid developments in silicon technology mean that what constitutes a large model today may become a tiny model tomorrow. Our work spans both tinyML research and large language model acceleration, and our techniques have been adopted by industry.

10:45 – 11:30 | Lightning Talks
Accelerated Photonic Deep Learning: Decentralized and Single Chip Solutions
Ryan Hamerly, Research Scientist at MIT and Senior Scientist at NTT Research
The field of quantum mechanics presents significant opportunities to tackle unresolved issues in communication, computation, and precision measurement. We are working to advance these technologies in diverse physical systems, such as atoms, superconductors, and topological states of matter.
In collaboration with Dirk Englund, Associate Professor of Electrical Engineering and Computer Science

Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Nellie Wu, PhD Candidate in Electrical Engineering and Computer Science
Optimization methods for deep neural networks (DNNs) often produce models with varied sparsity characteristics. In our work, we propose a software-hardware co-design methodology to efficiently support DNNs whose weights and activations range from fully dense to sparse across a wide range of sparsity levels.
In collaboration with Vivienne Sze, Associate Professor of Electrical Engineering and Computer Science; Joel S. Emer, Professor of the Practice, Electrical Engineering and Computer Science
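For readers less familiar with structured sparsity, the sketch below illustrates the simplest such pattern, N:M sparsity, in which every group of M weights keeps only its N largest-magnitude entries. It is a generic illustration under our own simplifying assumptions, not the authors' hierarchical scheme or its hardware support.
```python
import torch

def prune_n_m(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m along the
    input dimension (e.g., 2:4 structured sparsity)."""
    out_f, in_f = weight.shape
    assert in_f % m == 0, "input dimension must be divisible by the group size"
    groups = weight.reshape(out_f, in_f // m, m)
    drop = groups.abs().argsort(dim=-1)[..., : m - n]   # smallest-magnitude entries
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop, 0.0)
    return (groups * mask).reshape(out_f, in_f)

w = prune_n_m(torch.randn(8, 16))
print((w.reshape(8, -1, 4) != 0).sum(dim=-1))   # exactly 2 nonzeros per group of 4
```
Because the nonzero positions are constrained to fixed-size groups, an accelerator can index and skip zeros with simple, regular hardware, which is what makes such patterns attractive for co-design.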

An Energy-Efficient Neural Network Accelerator with Improved Protection Against Clock Glitch-Based Fault Attacks
Saurav Maji, PhD Candidate in Electrical Engineering and Computer Science
Deep neural networks (DNNs) exhibit critical vulnerabilities when deployed in security-critical applications. In our work, we provide robust and reliable DNN inference for resource-constrained embedded platforms with secure memory systems.
In collaboration with Anantha Chandrakasan, Dean of the School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science

11:30 – 11:45
Coffee Break
11:45 – 12:15 | Keynote
DSAIL: Building Data Systems using Learning
Sam Madden, Professor of Electrical Engineering and Computer Science
In this talk, I will describe recent progress in the Data Systems Group at MIT on using machine learning algorithms to improve the performance of large-scale data systems, such as database systems and file systems.
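One widely cited example from this line of work is the learned index, in which a model predicts a key's position in sorted storage and a short local search corrects the prediction. The toy sketch below is our own illustration under simplifying assumptions, not a system from the talk.
```python
import bisect
import numpy as np

class LearnedIndex:
    """A toy 'learned index': a linear model predicts where a key sits in a
    sorted array, and a short local search bounded by the model's maximum
    error corrects the guess."""
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys))
        pos = np.arange(len(self.keys))
        self.slope, self.intercept = np.polyfit(self.keys, pos, 1)
        err = pos - (self.slope * self.keys + self.intercept)
        self.max_err = int(np.ceil(np.abs(err).max())) + 1

    def lookup(self, key):
        guess = int(self.slope * key + self.intercept)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
        return i if i < len(self.keys) and self.keys[i] == key else None

keys = np.random.default_rng(0).integers(0, 10**6, size=10_000)
idx = LearnedIndex(keys)
print(idx.lookup(idx.keys[1234]))   # position of that key in the sorted array
```
The appeal is that the model plus error bound can replace several levels of a traditional index structure while keeping lookups exact.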

12:15 – 1:00 | Lightning Talks
Ferroelectric Synapses
Yanjie Shao, PhD Candidate in Electrical Engineering and Computer Science
The recent discovery of prominent ferroelectric properties of CMOS-compatible HfO2-based materials has brought new functionality to microelectronics in the form of a new memory technology and the potential of transistors that can operate at very low voltages. An intriguing new application of these remarkable findings is in low-energy analog synapses.
In collaboration with Jesús del Alamo, Donner Professor, Professor of Electrical Engineering and Computer Science, MacVicar Faculty Fellow

There’s Always a Bigger Fish: A Clarifying Analysis of a Machine-Learning-Assisted Side-Channel Attack
Mengjia Yan, Assistant Professor of Electrical Engineering and Computer Science
Machine learning has made it possible to mount powerful attacks through side channels that have traditionally been seen as challenging to exploit. However, due to the black-box nature of machine learning models, these attacks are often difficult to interpret correctly. Models that detect correlations cannot be used to prove causality or understand an attack’s various sources of information leakage. In this talk, we show that a state-of-the-art website-fingerprinting attack powered by machine learning was only partially analyzed, leading to an incorrect conclusion about its root causes. We further show how a careful analysis can reveal the mechanisms behind this powerful attack.

Nonvolatile Photonic Memory for Computing
Juejun Hu, Professor of Materials Science and Engineering
Photonic hardware accelerators are gaining interest as the prevailing von Neumann computing architecture struggles with AI tasks. Chalcogenide phase change materials (PCMs) are promising due to their large optical property contrast and unique behavior as a photonic memory material, enabling on-chip and free-space re-programmable optical computing architectures with ultrahigh speed, massive parallelism, and low energy consumption.

1:00 – 2:30
Lunch & Poster Session
The session will feature up to 30 posters on state-of-the-art MIT research in energy-efficient systems and devices, edge computing, generative AI, and new hardware and software technologies.
2:30 – 3:00
Break
3:00 – 5:00
Networking & customized meetings with MIT faculty and students
Poster Session

Electrochemical Ionic Synapses (EIS) with Mg2+ as the Working Ion
Miranda Schwacke, Jesús del Alamo, Ju Li, Bilge Yildiz
Dynamic doping by proton intercalation has shown great promise as a fast and energy-efficient mechanism for resistance modulation, but the use of protons limits retention of programmed states in air. In this work, we replace H+ with Mg2+ as the working ion in electrochemical ionic synapses (EIS) for significantly improved retention in air without the need for encapsulation.

Timing-dependent Programming of Electrochemical Synapses Enabled by Non-linear Voltage Kinetics
Mantao Huang, Ju Li, Jesús del Alamo, Bilge Yildiz
Electrochemical ionic synapses (EISs) are programmable resistors whose conductivity is controlled by electrochemical charge insertion, offering repeatable conductance modulation and low energy consumption; they are promising building blocks for neuromorphic computing hardware. In this work, we propose that their non-linear voltage kinetics can be utilized to implement neuroscience-guided learning rules.

Optimizing Electrolyte and Interface Proton Transport for Low Power Electrochemical Ionic Synapses
Heejung Chung, Jordan Meyer, Longlong Xu, Jesús del Alamo, Ju Li, Bilge Yildiz
Electrochemical ionic synapses (EIS) promise fast, energy-efficient conductance modulation using protons, but kinetic bottlenecks impede improvements in operating voltage and speed. This work investigates proton transport by studying CMOS-compatible oxide electrolytes and applying electrolyte-channel interface coatings, as well as identifying phonon-based descriptors to screen for fast ternary oxide electrolytes.

Architectural Evaluation of Processing-In-Memory Accelerators
Tanner Andrulis, Joel Emer, Vivienne Sze
Analog Processing-In-Memory (PIM) accelerators are a promising approach to efficiently run Deep Neural Networks (DNNs). In this work, we present a fast, flexible framework that models PIM at an architectural level, enabling researchers to see how device, circuit, and architectural innovations will affect the efficiency and performance of PIM accelerators.

Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel Emer
Optimization methods for deep neural networks (DNNs) often produce models with varied sparsity characteristics. In our work, we propose a software-hardware co-design methodology to efficiently support DNNs whose weights and activations range from fully dense to sparse across a wide range of sparsity levels.

Opportunities for High Throughput, In-Situ TEM for Ferroelectric Oxides
Paul Miller, Frances Ross
We propose optimizing high throughput experimentation in in-situ Ultra High Vacuum Transmission Electron Microscopy via an integrated approach of microfabricated specimen architecture, in-situ processing and intelligent experimental design. Such an approach will be utilized to rapidly optimize processing parameters of HfxZr1-xO2 ferroelectrics to enable more efficient and reliable ferroelectric memory, and may be applied to a wide range of materials deposition and processing studies.

In-Situ STEM Characterization of Structural Changes in HfO2 for Neuromorphic Devices
Alexandre Foucher, Frances Ross
Neuromorphic materials are a rising class of components for novel memory devices such as late-generation memristors. HfO2 has been shown to be a promising candidate for memristors and for improving resistive random-access memory. However, structural changes in HfO2 under applied voltage and current flow remain elusive. In this work, an in situ scanning transmission electron microscopy analysis was performed to observe changes in a cross section of an HfO2-based nanodevice under biasing conditions. Crystallization of the initially amorphous HfO2 under beam exposure and current flow was observed, and I-V curve measurements are consistent with the characteristic behavior of memristors. The results point to a crystallization mechanism driven by self-heating effects near the formation of an oxygen filament, a characteristic feature in neuromorphic devices. Understanding changes in crystalline structure and oxygen concentration in memristors is a crucial step toward guiding the design of novel nanodevices for next-generation electronics.

Pruning’s Effect on Generalization Through the Lens of Training and Regularization
Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite
Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction — while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning’s effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, improving the training loss over the dense model. We find that pruning also leads to additional regularization at other sparsities, reducing the accuracy degradation due to noisy examples over the dense model. Pruning extends model training time and reduces model size. These two factors improve training and add regularization respectively. We empirically demonstrate that both factors are essential to fully explaining pruning’s impact on generalization.

Electrical Modulation of Circular Polarization of Light by Spin-photon Conversion
Pambiang Abel Dainone, Nicholas Figueiredo Prestes, Martina Morassi, Aristide Lemaitre, Mathieu Stoffel, Xavier Devaux, Jean-Marie George, Henri Jaffrès, Pierre Renucci, Yuan Lu, Luqiao Liu
Direct high-speed modulation of the circular polarization (Pc) of coherent light would open the way for new communication technologies and offers a path to overcoming a main bottleneck of optical telecommunications. An innovative way to modulate the circular polarization of a light source is to harness the spin of electrons and photons. Remarkably, this spin-photon conversion enables ultrafast polarization modulation and highly efficient information coding, implemented even on a single photon.

Probabilistic Nanomagnets for Energy-Efficient Bayesian Inference Accelerators
Brooke McGoldrick, Marc Baldo, Luqiao Liu
Bayesian inference is an AI approach with applications spanning medicine, robotics, and speech recognition; however, the energy required for inference is high due to the number of high-precision multiplications and memory accesses needed on traditional hardware. We propose integrating our novel devices – nanomagnetic probabilistic bits – with traditional CMOS circuitry to realize a stochastic, scalable, and energy-efficient accelerator for Bayesian inference problems.
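As background on how probabilistic bits map onto Bayesian inference, the toy simulation below treats a p-bit as a device that outputs 1 with probability sigmoid(bias) and uses a pair of them to estimate a posterior in a two-node Bayesian network by sampling and counting. This is a behavioral model we assume for illustration, not the nanomagnetic device physics or the proposed accelerator.
```python
import numpy as np

rng = np.random.default_rng(0)

def p_bit(bias):
    """Behavioral model of a probabilistic bit: output 1 with probability
    sigmoid(bias). Hardware p-bits realize this with a thermally fluctuating
    nanomagnet; this software stand-in is only illustrative."""
    return rng.random() < 1.0 / (1.0 + np.exp(-bias))

def logit(p):
    return np.log(p / (1.0 - p))

# Toy Bayesian network Rain -> WetGrass: estimate P(Rain | WetGrass = 1)
# by sampling and counting, the kind of workload a p-bit fabric accelerates.
hits = total = 0
for _ in range(100_000):
    rain = p_bit(logit(0.2))                    # prior: P(Rain) = 0.2
    wet = p_bit(logit(0.9 if rain else 0.1))    # P(Wet | Rain), P(Wet | not Rain)
    if wet:
        total += 1
        hits += int(rain)
print(hits / total)   # approx. 0.69 = (0.9 * 0.2) / (0.9 * 0.2 + 0.1 * 0.8)
```
In hardware, the random sampling and the sigmoid response come essentially for free from device physics, which is where the energy savings over high-precision digital arithmetic arise.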

Hydrodynamic Transport and Inverse Design in Nanoscale Devices
Giuseppe Romano
We propose the development of a differentiable physics solver to compute and optimize electronic transport in the hydrodynamic regime. This tool will enable constrained large-scale shape optimization of nanoscale devices, opening up opportunities for novel computing architectures and end-to-end software and hardware optimization.

A Superconducting Platform for Energy-efficient Spiking Neural Networks
Matteo Castellani, Emily Toomey, Andres E. Lombo, Jesus Lares, Marco Colangelo, Chi-Ning Chou, Ken Segall, Nancy Lynch, and Karl K. Berggren
Superconducting spiking neural networks (SSNNs) are a promising technology for neuromorphic computing, with their potential for high energy efficiency and fast processing speeds. Two key components used in SSNNs are Josephson junctions and superconducting nanowires, both of which can emulate the firing behavior of neurons. Superconducting nanowires offer certain advantages over Josephson junctions, as they can generate higher voltage potentials and serve as low-loss transmission lines for interconnections between neurons. In this work, we present superconducting nanowire neurons and synapses for supervised learning in spiking neural networks, and we propose valuable applications of the technology, such as image recognition, matrix inversion, and winner-takes-all algorithms.

Lightning: A Reconfigurable Photonic-Electronic SmartNIC for Fast and Energy-Efficient Inference
Zhizhen Zhong, Mingran Yang, Christian Williams, Alexander Sludds, Homa Esfahanizadeh, Ryan Hamerly, Dirk Englund, Manya Ghobadi
We propose Lightning, a reconfigurable photonic-electronic SmartNIC that uses a novel datapath to feed traffic from the network into the photonic domain without creating digital packet processing and data movement bottlenecks. Compared with DPUs, GPUs, and SmartNICs, Lightning accelerates inference serving time by up to 166x while consuming up to 416x less energy.

There’s Always a Bigger Fish: A Clarifying Analysis of a Machine-Learning-Assisted Side-Channel Attack
Jack Cook, Jules Drean, Jonathan Behrens, Mengjia Yan
This work highlights the limitations of relying on machine learning for side-channel attacks without completing a comprehensive security analysis.

TinyFL: Privacy Preserving On-device Training for Edge Devices
Irene Tenison, Juan Duitama, Lalana Kagal
Internet of Things (IoT) devices and microcontrollers generate a steady stream of data that is useful in various domains, including patient monitoring, preventing breakdowns in factories, and even efficient traffic control. In many cases, this data is highly sensitive and may not be used or shared directly. Our project, TinyFL, enables privacy-preserving distributed machine learning on memory-constrained edge devices, allowing models to be learned without sharing raw data.
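For context, federated averaging is the canonical pattern for learning a shared model while clients keep their raw data local; the toy sketch below illustrates it under simplifying assumptions and is not TinyFL itself, whose memory-constrained training and privacy mechanisms go further.
```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One local least-squares gradient step on a client's private data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fed_avg(clients, rounds=50, dim=3):
    """Minimal federated averaging: clients train locally and share only
    model updates, never raw data."""
    w = np.zeros(dim)
    for _ in range(rounds):
        updates = [local_step(w.copy(), X, y) for X, y in clients]
        w = np.mean(updates, axis=0)      # the server aggregates weights only
    return w

# Example: three clients, each holding private linear-regression data
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=100)))
print(fed_avg(clients))   # approaches true_w without ever pooling the data
```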

HierBatching: Tera-Scale GNN Training in just a Single Machine
Tianhao Huang, Xuhao Chen, Muhua Xu, Arvind, Jie Chen
Graph neural networks (GNNs) have become increasingly popular for analyzing data organized as graphs but face unique scalability challenges on large-scale datasets. In this work, we propose HierBatching, an out-of-core GNN training framework that tames tera-scale datasets with just a single machine. Our framework has been shown to be highly cost-efficient while incurring almost no loss in accuracy or runtime performance.

Programmable Photonics: New Paradigm for Fast Optical Signal Processing
Zhengqi Gao, Xiangfeng Chen, Zhengxing Zhang, Uttara Chakraborty, Wim Bogaerts, Duane Boning
Programmable integrated photonics is an alternative paradigm to application-specific integrated photonics. We have developed an automatic technique to realize arbitrary light-processing functions (e.g., splitting, filtering, wavelength selection) on a recirculating square-mesh programmable photonic circuit.

ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining
Yifan Yang, Joel Emer, Daniel Sanchez
Sparse CNNs dramatically reduce computation and storage costs over dense ones. But sparsity also makes CNNs more data-intensive, as each value is reused fewer times. Thus, current sparse CNN accelerators, which process one layer at a time, are bottlenecked by memory traffic. We present ISOSceles, a new sparse CNN accelerator that dramatically reduces data movement through inter-layer pipelining: overlapping the execution of consecutive layers so that a layer’s output activations are quickly consumed by the next layer without spilling them off-chip.

CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation
Abdullah Alomar, Pouya Hamadanian, Arash Nasr-Esfahany, Anish Agarwal, Mohammad Alizadeh, Devavrat Shah
CausalSim is an evolution of traditional trace-driven simulation, powered by AI-driven causal reasoning. During a 5-month deployment on an open-source video streaming platform, CausalSim led to 2.6x less video stalling compared to expert-tuned algorithms.

Gemino: Practical and Robust Neural Compression for Video Conferencing
Pantea Karimi Babaahmadi, Mohammad Alizadeh
Video compression faces challenges when bandwidth is limited; to address this, compression methods must learn video regularities to predict frames from less data. This project seeks to develop a practical neural video compression approach for Internet video delivery that can be tailored to specific video types (e.g., sports, game shows, movies) and uses deep learning to better model human perception for improved quality criteria.

ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Shuang Li, Alaa Malouf, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba
ConceptFusion presents a new approach to building open-set and queryable multimodal 3D maps. We show how features from foundation models like CLIP and DINO can be fused into 3D maps to enable a broad range of robot perception tasks.

Giving Computers the Sense of Touch
Michael Foshey, Wojciech Matusik
We developed an integrated hardware and software solution for high-resolution tactile data. Our tactile sensor arrays can be manufactured in a scalable, automated process. We show a range of different applications in robotics, wearables, and sensors placed in an environment.

Building Spatio-temporal Maps of Human Brain Processes
Benjamin Lahner, Aude Oliva
The human brain is a time machine; we are constantly remembering our past and projecting ourselves into the future. Capturing the brain’s response as these moments unfold could yield valuable insights into both how the brain works and how to better design human-centered AI systems. Here, we present unique human brain spatiotemporal maps of when people see an image, hear a word or a sound, or encode a visual event into memory. Our method, based on the combination of fMRI and MEG, can lead to the development of human functional maps of perceptual and cognitive states, capture the dynamics and progression of brain disease, and inform brain-machine interfaces.

LEGO: an Optimizer and Hardware Generator for Efficient Linear Algebra & Deep Learning Accelerators
Yujun Lin, Zhekai Zhang, Song Han
We propose the LEGO framework that automatically optimizes and generates synthesizable RTL of spatial architecture design for tensor applications, especially neural network accelerators. LEGO introduces a hierarchical spatial architecture paradigm and a relation-centric hardware representation that offers high flexibility and expressiveness to describe the hardware design for tensor computation. By analyzing the relations specifying the dataflow and data assignment, LEGO can establish and optimize the interconnections among functional units and memory systems in the proposed architecture and finally generate the accelerator. Compared to the prior art, Gemmini, a template-based accelerator generator, LEGO achieved 1.1x-12x speedup on multiple networks varying from small models like MobileNetV2 to large models like Stable Diffusion.

On-Device Training Under 256KB Memory and Tiny Training Engine
Wei-Chen Wang, Song Han
On-device training enables the model to adapt to new data collected from the sensors by fine-tuning a pre-trained model; however, the training memory consumption is prohibitive for IoT devices with tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash without auxiliary memory, using less than 1/1000 of the memory of PyTorch and TensorFlow while matching the accuracy of tinyML applications. Our study enables IoT devices not only to perform inference but also to continuously adapt to new data for on-device lifelong learning.

EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation
Han Cai, Song Han
This work introduces EfficientViT, a new family of models for on-device semantic segmentation. The core of EfficientViT is a novel lightweight multi-scale attention module that enables a global receptive field and multi-scale learning with hardware-efficient operations. Without performance loss, EfficientViT provides up to 38x mobile latency reduction over SOTA semantic segmentation models (e.g., SegFormer, SegNeXt).
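The multi-scale attention module builds on linear attention, which replaces softmax with a simple feature map so the cost scales linearly in the number of tokens. The sketch below shows the general linear-attention computation assuming a ReLU feature map; it is an illustration of the underlying idea, not the exact EfficientViT module.
```python
import torch
import torch.nn.functional as F

def relu_linear_attention(q, k, v, eps=1e-6):
    """Linear attention with a ReLU feature map: cost grows linearly with the
    number of tokens N (O(N * d^2)) instead of quadratically as in softmax
    attention. q, k, v have shape (batch, heads, N, d)."""
    q, k = F.relu(q), F.relu(k)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)                  # sum_n k_n v_n^T
    denom = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps
    out = torch.einsum("bhnd,bhde->bhne", q, kv) / denom.unsqueeze(-1)
    return out

q = k = v = torch.randn(1, 4, 256, 32)
print(relu_linear_attention(q, k, v).shape)   # torch.Size([1, 4, 256, 32])
```
Because the global context is summarized in a small d-by-d matrix, the operation maps onto matrix multiplies and elementwise ops that run efficiently on mobile hardware.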

TorchSparse++: Efficient Primitives for Deep Learning on Point Clouds
Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han
We developed TorchSparse, a high-performance library for point cloud deep learning. It achieves measured end-to-end inference speedups of 2.9x, 3.3x, 2.2x, and 1.8x on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse, and SpConv v2, respectively, and is up to 1.3x faster than SpConv v2 in mixed-precision training on the same device.

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
SmoothQuant enables efficient and accurate W8A8 quantization for large language models by migrating part of the quantization difficulty from activations to weights, which addresses the issue of systematic activation outliers.
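The core of the method is a per-channel smoothing factor that rescales activations and weights in a mathematically equivalent way so that both become easier to quantize. The sketch below illustrates this transform for a single linear layer under our own simplifying assumptions (a calibrated per-channel activation max and a plain weight matrix); it is not the released SmoothQuant code.
```python
import torch

def smooth(weight, act_max, alpha=0.5, eps=1e-5):
    """Per-channel smoothing in the spirit of SmoothQuant: rescale each input
    channel so activation outliers shrink while weights absorb the scale.
    weight: (out_features, in_features); act_max: calibrated per-channel
    max |activation|, shape (in_features). Returns the scaled weight and the
    scales s, which must be folded into the producer of the activations."""
    w_max = weight.abs().amax(dim=0)                                 # per input channel
    s = act_max.clamp(min=eps) ** alpha / w_max.clamp(min=eps) ** (1 - alpha)
    return weight * s.unsqueeze(0), s

# The transform is mathematically equivalent: X W^T == (X / s) (W * s)^T
w = torch.randn(64, 128)
x = torch.randn(4, 128)
w_s, s = smooth(w, x.abs().amax(dim=0))
print(torch.allclose(x @ w.T, (x / s) @ w_s.T, atol=1e-4))   # True
```
In practice the division by s is folded into the preceding LayerNorm or linear layer, so at runtime both the smoothed activations and the rescaled weights quantize well to INT8.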

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Jun-Yan Zhu, Song Han
During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to make gradual changes to the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited regions. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. With 1.2%-area edited regions, our method reduces the computation of DDIM by 7.5× and GauGAN by 18× while preserving the visual fidelity. With SIGE, we accelerate the speed of DDIM by 3.0x on RTX 3090 and 6.6× on Apple M1 Pro CPU, and GauGAN by 4.2× on RTX 3090 and 14× on Apple M1 Pro CPU.
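To make the caching idea concrete, the sketch below shows a minimal tile-based update for a single stride-1, 'same'-padded convolution: only tiles whose inputs changed (dilated by the receptive field) are recomputed, and cached outputs are reused elsewhere. This is our own simplified illustration assuming tile-aligned image sizes, not the SIGE engine itself.
```python
import torch
import torch.nn.functional as F

def sparse_conv_update(x_edit, x_orig, y_cache, weight, bias=None, tile=16):
    """Recompute a stride-1, 'same'-padded convolution only on tiles whose
    inputs changed, reusing the cached output elsewhere. Assumes an odd
    kernel size and H, W divisible by `tile`. Illustrative sketch only."""
    k = weight.shape[-1]
    p = k // 2
    y = y_cache.clone()
    # Per-pixel change map, dilated by the receptive field so outputs near an
    # edit boundary are also refreshed, then reduced to a tile-level mask.
    diff = (x_edit - x_orig).abs().sum(dim=1, keepdim=True)           # (N,1,H,W)
    diff = F.max_pool2d(diff, kernel_size=k, stride=1, padding=p)
    changed = F.max_pool2d(diff, kernel_size=tile, stride=tile) > 0   # (N,1,H/t,W/t)
    x_pad = F.pad(x_edit, (p, p, p, p))
    for n, _, ti, tj in changed.nonzero().tolist():
        r0, c0 = ti * tile, tj * tile
        r1, c1 = r0 + tile, c0 + tile
        patch = x_pad[n:n+1, :, r0:r1 + 2 * p, c0:c1 + 2 * p]         # tile + halo
        y[n:n+1, :, r0:r1, c0:c1] = F.conv2d(patch, weight, bias)     # padding=0
    return y

# A small local edit only triggers recomputation of the affected tiles.
x0 = torch.randn(1, 3, 64, 64)
x1 = x0.clone()
x1[:, :, 8:24, 8:24] += 1.0
w, b = torch.randn(8, 3, 3, 3), torch.randn(8)
y0 = F.conv2d(x0, w, b, padding=1)                  # cached "original" output
y1 = sparse_conv_update(x1, x0, y0, w, b)
print(torch.allclose(y1, F.conv2d(x1, w, b, padding=1), atol=1e-4))   # True
```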

Improving Robustness of Quantum Computing with Machine Learning
Hanrui Wang, Fred Chong, Song Han
We will discuss how machine learning can help quantum computing; specifically, how circuit architecture search, gradient pruning, and quantization improve the robustness of variational quantum algorithms on real quantum hardware.