This title was written by GPT-3, see how here...

Is attention really all you need?

Stumbling forward in the darkness of research, attention mechanisms are a flash of light so rarely encountered when dreaming up new ideas.

<aside> 💡 *Why are people excited about attention?

Attention mechanisms have received more creative energy than almost any other part of deep learning. Why are researchers so inspired?*

I'm an applied machine learning grad student at the University of Illinois at Urbana-Champaign, and I'm looking to learn and teach the best current research in machine learning. Follow along on my journey into the depths of attention mechanisms. Hold onto your hat and save all the links along the way :)

</aside>

Pictured: visualizing where a self-driving car is looking (in red). Remember: attention allows neural networks (NNs) to ignore parts of their input.

Pictured: green and red bounding boxes visualize neural network attention in classification and visualization. Attention in image generation allows NNs to focus resources on creating high-resolution improvements in a specific area.

Motivation

I had to write this to understand the many creative ways ML designers are ignoring their training data. Attention == ignoring part of your input.

From AI Summer

Before attention, convolutional neural nets (CNNs) were the most impactful idea in machine learning. Astonishingly, I view CNNs as a primitive form of attention 🤯 and that's why I'm so excited.

CNN kernel:

Attention:
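
To make the comparison concrete, here is a minimal toy sketch (my own illustration, with made-up variable names, not code from any paper): a convolution applies the same fixed kernel weights to every local window, while attention computes input-dependent weights over all positions, which is what lets the model emphasize some positions and effectively ignore the rest.

```python
import numpy as np

x = np.random.randn(6, 4)  # toy input: 6 positions, 4 features each

# CNN kernel: a FIXED, local weighting. The same learned weights slide over
# every window, regardless of what the input actually contains.
kernel = np.random.randn(3, 4)                     # window of 3 positions
conv_out = np.array([(x[i:i + 3] * kernel).sum()   # weighted sum over each window
                     for i in range(len(x) - 2)])

# Attention: an INPUT-DEPENDENT, global weighting. Each position scores every
# other position; a softmax turns the scores into weights that can be near zero,
# which is exactly how the model "ignores" parts of its input.
scores = x @ x.T / np.sqrt(x.shape[1])             # pairwise similarity scores
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
attn_out = weights @ x                             # each output is a weighted mix of all positions

print(conv_out.shape, attn_out.shape)              # (4,) and (6, 4)
```

The key difference: `kernel` is identical for every input, while `weights` is recomputed from the input itself.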

Why you should read this post (desired outcomes):

<aside> 👉 **TL;DR**

Attention enables models to form richer, denser internal representations. The model learns the salient parts and ignores the rest. Dense internal representations mean more efficient use of smaller models.

Invariant forms of attention allow models to zero-shot (i.e. instantly) adapt to changing inputs. Keep an eye out for better and better forms of this.

Attention makes models more robust to adversarial attacks (and substantially less likely to be fooled by imperceptible noise added to images).

Finally, attention aids explainability, though much work remains.

🧠👉 Read the highest-value next steps below to learn these concepts.

</aside>

Outline

https://media.giphy.com/media/1Le126ucKd4oQoy4Ni/giphy.gif

The short list of attention