How AI Understands Context

Every word looks at every other word at once. No information lost to distance.

The Analogy

Imagine a room where everyone can hear everyone else at the same time. Now imagine a phone chain where each person only hears the one before them. Attention is the room.

Stage 1 -- The Problem

Before Attention: The Phone Chain

Memory of earlier words

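The old way: the model reads left to right, and each step passes along only a compressed memory of what came before, so some information is lost at every hand-off. By the end of a long sentence, the model barely remembers that "cat" is the subject of "ran." Long-distance relationships fade.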
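To make the phone chain concrete, here is a deliberately crude sketch. It assumes each hand-off keeps a fixed 80% of the signal from earlier words. Both that number and the example sentence are invented for illustration; real sequential models do not decay this neatly, but the trend is the point.

```python
# Toy "phone chain": every hand-off keeps only part of the earlier signal.
# The keep rate (0.8) and the sentence are assumptions, not measured values.

def remaining_signal(steps: int, keep: float = 0.8) -> float:
    """Fraction of a word's signal left after `steps` hand-offs."""
    return keep ** steps

sentence = "The cat that sat on the mat ran".split()
cat_position = sentence.index("cat")

# Walk left to right from "cat" and show how much of it survives at each word.
for position in range(cat_position, len(sentence)):
    steps = position - cat_position
    print(f"{sentence[position]:>4}  {remaining_signal(steps):.2f}")
```

In this toy setup, "ran" arrives six hand-offs after "cat," so only about a quarter of the original signal is left by the time the model needs it.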

Stage 2 -- The Breakthrough

Attention: The Room

The goal of this stage is simple: the action word "ran" should reconnect directly to "cat." Watch the important words lift up, then follow the curved lines.

With attention, every word sees every other word. "Ran" looks directly at "cat" regardless of distance. No information lost.
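The room can be sketched in a few lines using scaled dot-product attention, the standard formulation. One word (the query) is compared against every word (the keys), and the comparison scores are turned into weights that sum to 1. The two-number vectors below are hand-picked assumptions so that "ran" lines up with "cat"; a real model learns its vectors during training and uses far more dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    size = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(size)]
    return weights, output

# Hypothetical toy vectors, chosen by hand for illustration only.
words = ["the", "cat", "sat", "ran"]
keys = [[0.1, 0.0], [2.0, 0.5], [0.0, 1.0], [0.3, 0.3]]
values = keys  # simplification: reuse the keys as the values
query_ran = [3.0, 0.0]  # what "ran" is looking for

weights, output = attention(query_ran, keys, values)
for word, weight in zip(words, weights):
    print(f"{word:>4}  {weight:.2f}")
```

Nothing in the calculation depends on how far apart the words are. "Cat" would receive the same weight if it sat twenty positions away, which is the whole advantage over the phone chain.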

Stage 3 -- Explore Attention

Click the Highlighted Word

This stage shows why the same word can mean different things. Click the purple word and follow the arrows to the strongest context clues.
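Here is a rough sketch of how context can pull the same word in different directions. The word "bank," the two example sentences, and the two-number vectors (one number for "finance," one for "nature") are all assumptions made up for this example. The sketch also skips the learned projections and scaling a real model would use.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextualize(target, context):
    """Blend the context vectors, weighted by their similarity to the target."""
    vectors = list(context.values())
    scores = [sum(t * c for t, c in zip(target, vec)) for vec in vectors]
    weights = softmax(scores)
    blended = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(target))
    ]
    return dict(zip(context, weights)), blended

# Hypothetical vectors: [finance, nature]. "bank" alone is ambiguous.
bank = [1.0, 1.0]
money_sentence = {"deposit": [2.0, 0.0], "the": [0.0, 0.0], "cash": [1.5, 0.0]}
river_sentence = {"muddy": [0.0, 1.5], "the": [0.0, 0.0], "river": [0.0, 2.0]}

money_weights, bank_in_money = contextualize(bank, money_sentence)
river_weights, bank_in_river = contextualize(bank, river_sentence)

print("bank near money words:", [round(x, 2) for x in bank_in_money])
print("bank near river words:", [round(x, 2) for x in bank_in_river])
```

The starting vector for "bank" is identical in both runs. Only the neighbors change, and the blended result leans toward finance in one sentence and toward nature in the other.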

Stage 4 -- Multi-Head Attention

Three Heads, Three Perspectives

Multi-head attention means the model does not use only one spotlight. Different heads look for grammar, references, and meaning at the same time.

The heads run in parallel, and each one tends to specialize in a different type of relationship.
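One way to picture several heads is to let each head look at its own slice of every word's vector. The sketch below assumes six-number vectors split into three slices of two numbers each. The head names, the example words, and every value are invented for illustration; in a trained model the heads are learned, and they rarely map onto labels this cleanly.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_weights(query, keys, start, size):
    """Attention weights computed from dimensions [start, start + size) only."""
    q = query[start:start + size]
    scale = math.sqrt(size)
    scores = [
        sum(a * b for a, b in zip(q, key[start:start + size])) / scale
        for key in keys
    ]
    return softmax(scores)

# Hypothetical layout: dims 0-1 grammar, dims 2-3 references, dims 4-5 meaning.
HEADS = {"grammar": 0, "references": 2, "meaning": 4}
HEAD_SIZE = 2

words = ["cat", "ran", "was", "scared"]
keys = [
    [0.0, 0.0, 2.0, 0.0, 0.5, 0.0],  # cat
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],  # ran
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # was
    [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],  # scared
]
query_it = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]  # the word "it" looking around

def strongest_link(head):
    """Word that the given head weights most heavily for the query."""
    weights = head_weights(query_it, keys, HEADS[head], HEAD_SIZE)
    return words[max(range(len(words)), key=lambda i: weights[i])]

for head in HEADS:
    print(f"{head:>10} head focuses on: {strongest_link(head)}")
```

Each head scores the same words but reads different numbers, so the three heads can settle on three different answers for a single query word.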

Takeaway

Word order and structure matter in your prompts. Clear sentences tend to produce cleaner attention patterns and better results.

Attention connects words to context. But attention is only one step in the assembly line. What does the full pipeline look like? →