Helping Computer Vision and Language Models Understand What They See

September 13, 2023

Powerful machine-learning algorithms known as vision and language models, which learn to match text with images, have shown remarkable results when asked to generate captions or summarize videos.

While these models excel at identifying objects, they often struggle to understand concepts, like object attributes or the arrangement of items in a scene. For instance, a vision and language model might recognize the cup and table in an image, but fail to grasp that the cup is sitting on the table.

Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have demonstrated a new technique that utilizes computer-generated data to help vision and language models overcome this shortcoming.

The researchers created a synthetic dataset of images that depict a wide range of scenarios, object arrangements, and human actions, coupled with detailed text descriptions. They used this annotated dataset to “fix” vision and language models so they can learn concepts more effectively. Their technique ensures these models can still make accurate predictions when they see real images.

Complete article from MIT News.

Explore

Photonic Processor Could Enable Ultrafast AI Computations with Extreme Energy Efficiency

Adam Zewe | MIT News

This new device uses light to perform the key operations of a deep neural network on a chip, opening the door to high-speed processors that can learn in real-time.

Neural NetworksEnergy Efficient AI

Graphic showing light emanating from a cubic crystal and passing through a material with an array of square holes. A lattice of atoms appears on the other side

AI Method Radically Speeds Predictions of Materials’ Thermal Properties

Adam Zewe | MIT News

The approach could help engineers design more efficient energy-conversion systems and faster microelectronic devices, reducing waste heat.

Neural NetworksMaterials

Cartoon with several online chat windows saying “Oops something went wrong,” and one in the center with text bubbles showing it is continuing to perform.

A New Way to Let AI Chatbots Converse All Day without Crashing

Adam Zewe | MIT News

Researchers developed a simple yet effective solution for a puzzling problem that can worsen the performance of large language models such as ChatGPT.

Neural NetworksHigh Performance Computation

MIT AI Hardware Program

Helping Computer Vision and Language Models Understand What They See

Complete article from MIT News.

Explore

Photonic Processor Could Enable Ultrafast AI Computations with Extreme Energy Efficiency

AI Method Radically Speeds Predictions of Materials’ Thermal Properties

A New Way to Let AI Chatbots Converse All Day without Crashing