The Full Assembly Line

The Analogy

Picture an assembly line: raw material enters at one end and a finished product comes out the other. At each station, one specific transformation happens. Your text goes through the same kind of process.

Stage 1 -- Text to Token IDs

Your Words Become Numbers

why did the chicken cross the

The sentence is now a sequence of token IDs. The model never sees words directly. It sees numbered pieces from its vocabulary.
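As a rough sketch of this stage, the snippet below maps each word to an ID using a tiny made-up vocabulary. The vocabulary, the ID values, and the `encode` helper are all assumptions for illustration. Real tokenizers usually split text into subword pieces and use far larger vocabularies.

```python
# Toy vocabulary: every entry and ID here is invented for illustration.
TOY_VOCAB = {"why": 0, "did": 1, "the": 2, "chicken": 3, "cross": 4, "road": 5, "<unk>": 6}

def encode(text):
    """Split on whitespace and look up each word's ID (unknown words map to <unk>)."""
    return [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in text.lower().split()]

token_ids = encode("why did the chicken cross the")
print(token_ids)  # [0, 1, 2, 3, 4, 2]
```

Notice that both occurrences of "the" receive the same ID. At this stage the model has no idea where a token sits or what surrounds it.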

Stage 2 -- Token IDs to Embeddings

IDs Expand Into Meaning

Each ID becomes a vector: a list of numbers that carries meaning. In real models, each token's vector may hold hundreds or thousands of numbers. We show 6 so the idea is visible.
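The lookup itself can be sketched as a table with one row per token ID. Everything below is an assumption for illustration: the table size, the random values (a trained model learns them), the `embed` helper, and the hypothetical token IDs passed in.

```python
import random

EMBED_DIM = 6    # matches the 6 numbers shown in this walkthrough
VOCAB_SIZE = 7   # hypothetical toy vocabulary size

random.seed(0)   # fixed seed so the table is reproducible
# One row per token ID. Random values stand in for what training would learn.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

def embed(token_ids):
    """Look up the vector for each token ID."""
    return [embedding_table[i] for i in token_ids]

vectors = embed([0, 1, 2, 3, 4, 2])   # hypothetical IDs for the example sentence
print(len(vectors), len(vectors[0]))  # 6 6
```

A repeated ID (here `2`, appearing twice) pulls out the same row both times. Telling the two apart is the job of later stages.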

Stage 3 -- After Attention

Attention Connects the Clues

Before Attention → After Attention

Attention lets each token look at the other tokens. In this example, the token "cross" connects strongly with "chicken" and with the "the" that follows it in the unfinished phrase "cross the". This helps the model understand the context before guessing the next word.
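One way to make "looking at the other tokens" concrete is scaled dot-product attention, sketched below in a stripped-down form. This is a simplification under stated assumptions: it skips the learned query, key, and value projections, the multiple heads, and the masking that real models use, and the 2-number vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Each output row is a weighted average of all input rows."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * vec[d] for w, vec in zip(weights, vectors))
                 for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up vectors standing in for "chicken", "cross", "the".
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(toy)
print([round(w, 3) for w in weights[1]])  # how much "cross" attends to each token
```

Because the toy vector for "cross" points in nearly the same direction as the one for "chicken", its weight on "chicken" comes out larger than its weight on "the". In a trained model, those directions are learned rather than hand-picked.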

Stage 4 -- Feed Forward Inside the Layers

Attention Finds the Clues. Feed Forward Makes Them Useful.

Plain-English idea: Attention gathers clues from the sentence. Feed forward is the refinement step inside each layer: it tests patterns, filters weak options, and strengthens the best signal before the next layer.

Feed forward is not another attention step. Attention mixes information across tokens. Feed forward then works on each token separately, without looking at the others, and sends a cleaner signal to the next layer.
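A minimal sketch of that per-token refinement is a small two-step network: expand the vector, zero out the negative signals, then project back to the original size. The weights below are hand-picked assumptions (a trained model learns them), and the sketch leaves out details such as residual connections and normalization.

```python
def relu(x):
    """Keep positive signals, zero out negative ones."""
    return max(0.0, x)

def linear(vec, weights, bias):
    """weights has one row per output number."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def feed_forward(vec, w1, b1, w2, b2):
    hidden = [relu(h) for h in linear(vec, w1, b1)]  # expand, then filter
    return linear(hidden, w2, b2)                    # project back down

# Hand-picked toy weights (2 -> 3 -> 2), purely for illustration.
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B2 = [0.0, 0.0]

tokens = [[0.5, -0.2], [-0.3, 0.8]]
# Each token is processed on its own; no token sees another here.
refined = [feed_forward(t, W1, B1, W2, B2) for t in tokens]
print(refined)  # [[0.5, 0.0], [0.0, 0.8]]
```

With these toy weights, the weaker negative component of each token is filtered to zero while the stronger positive one passes through unchanged.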

Stage 5 -- The Prediction

What Comes Next?

The model now predicts the missing next token in: why did the chicken cross the

After attention and feed forward refinement, the model assigns a score to every token in its vocabulary and turns those scores into probabilities. For this sentence, "road" wins because it completes the common phrase.
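The final step can be sketched as a softmax over candidate scores followed by picking the highest. The four candidates and their scores below are invented for illustration. A real model scores its entire vocabulary.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented candidates and scores, not output from any real model.
candidates = ["road", "street", "river", "banana"]
logits = [4.0, 2.5, 1.0, -1.0]

probs = softmax(logits)
best = max(range(len(candidates)), key=lambda i: probs[i])
print(candidates[best])  # road
```

Always taking the top-scoring token is only one option, often called greedy decoding. Models can also sample from the probabilities, which is why the same prompt does not always produce the same continuation.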