Architecture

Flamingo: a Visual Language Model for Few-Shot Learning

DeepMind · April 29, 2022

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc

TL;DR

Bridges a frozen vision encoder and a frozen language model with newly trained connecting layers, enabling few-shot learning on image-and-text tasks.

An influential recipe for building multimodal models on top of pretrained components.
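
To make the bridging idea in the TL;DR concrete, here is a minimal sketch of a tanh-gated cross-attention step, the kind of connecting layer the Flamingo paper inserts between frozen language model blocks. It is a simplified illustration, not the paper's implementation: it uses a single attention head with no learned projections, and the names `cross_attend`, `gated_bridge`, and `alpha` are hypothetical. The point it shows is that with the gate parameter at zero the text representation passes through unchanged, so the frozen language model initially behaves exactly as before.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(text_tokens, visual_tokens):
    """Single-head scaled dot-product attention.

    Text tokens act as queries; visual tokens act as both keys and values.
    Assumption: no learned projections, to keep the sketch short.
    """
    dim = len(visual_tokens[0])
    attended = []
    for q in text_tokens:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in visual_tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, visual_tokens)) for i in range(dim)]
        )
    return attended


def gated_bridge(text_tokens, visual_tokens, alpha):
    """Residual update: text + tanh(alpha) * cross_attention(text, visual).

    alpha is the only trainable scalar in this sketch; alpha = 0 closes the gate.
    """
    gate = math.tanh(alpha)
    attended = cross_attend(text_tokens, visual_tokens)
    return [
        [t + gate * a for t, a in zip(tok, att)]
        for tok, att in zip(text_tokens, attended)
    ]


# Toy inputs: two text tokens and three visual tokens, each of dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5, 0.5], [2.0, -1.0], [0.0, 3.0]]

unchanged = gated_bridge(text, visual, alpha=0.0)  # gate closed: identity
mixed = gated_bridge(text, visual, alpha=1.0)      # gate open: visual info mixed in
```

Starting with the gate closed means training can introduce visual information gradually instead of disrupting the pretrained language model from the first step.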