Architecture
Flamingo: a Visual Language Model for Few-Shot Learning
DeepMind · April 29, 2022
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc
TL;DR
Bridges a frozen pretrained vision encoder and a frozen pretrained language model with newly trained layers, enabling few-shot learning on tasks that mix images and text.
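The bridging layers are inserted between the frozen language model's layers. The paper's cross-attention layers scale their output by a tanh gate whose parameter starts at zero, so the combined model initially behaves exactly like the original language model. The following sketch illustrates only that gating idea, in pure Python, and rests on heavy simplifying assumptions: a single attention head, identity query/key/value projections, no feed-forward sublayer, and no Perceiver Resampler in front of the vision features. The function names are hypothetical, and this is not DeepMind's implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention from text tokens (queries)
    to visual tokens (keys and values). The projections are identity here,
    a simplifying assumption; a real layer would learn Q/K/V matrices."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the visual vectors, one output dimension at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out


def gated_cross_attention(text: List[Vector], visual: List[Vector],
                          alpha: float) -> List[Vector]:
    """Residual update: text + tanh(alpha) * attention(text -> visual).
    With alpha = 0 the gate is closed, so the frozen language model's
    hidden states pass through unchanged."""
    gate = math.tanh(alpha)
    attended = cross_attend(text, visual)
    return [[t + gate * a for t, a in zip(t_vec, a_vec)]
            for t_vec, a_vec in zip(text, attended)]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]      # toy hidden states for two text tokens
    visual = [[0.5, 0.5], [1.0, -1.0]]   # toy features for two visual tokens
    print(gated_cross_attention(text, visual, alpha=0.0) == text)  # True
```

Starting the gate at zero means training begins from the unmodified language model and lets visual information flow in gradually as the gate parameter is learned.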
Why it matters
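An influential recipe for building multimodal models on top of pretrained components: the vision encoder and language model stay frozen, and only the newly added connecting layers are trained.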