Your text enters one end, and a prediction comes out the other.
Picture an assembly line where raw material enters one end and a finished product comes out the other. At each station, one specific transformation happens. Your text goes through the same process.
The sentence is now token IDs. The model no longer sees words directly. It sees numbered pieces from its vocabulary.
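As a rough sketch of this step, the snippet below turns the example sentence into IDs. The vocabulary and its ID numbers are made up for illustration, and whole words are used as pieces; real tokenizers usually split text into smaller sub-word pieces.

```python
# Toy vocabulary: hypothetical IDs, not taken from any real tokenizer.
vocab = {"why": 0, "did": 1, "the": 2, "chicken": 3, "cross": 4, "road": 5, "<unk>": 6}

def tokenize(text):
    """Map each lowercase word to its ID, falling back to <unk> for unknown words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

token_ids = tokenize("why did the chicken cross the")
print(token_ids)  # [0, 1, 2, 3, 4, 2]
```

Note that both occurrences of "the" get the same ID, because the ID only says which vocabulary piece it is.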
Each ID becomes a vector: a list of numbers that carries meaning. In real models, each token may have hundreds or thousands of numbers. We show 6 so the idea is visible.
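The lookup can be pictured as a table with one row per token ID. This is a minimal sketch: the table here is filled with random numbers, whereas a real model learns these values during training. The names `embedding_table` and `embed` are illustrative only.

```python
import random

random.seed(0)   # fixed seed so the toy table is reproducible
VOCAB_SIZE = 7   # size of the toy vocabulary (an assumption for this sketch)
EMBED_DIM = 6    # 6 numbers per token, as in the illustration; real models use far more

# Hypothetical embedding table: one row of EMBED_DIM numbers per token ID.
embedding_table = [
    [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Replace each token ID with its row from the table."""
    return [embedding_table[token_id] for token_id in token_ids]

vectors = embed([0, 1, 2, 3, 4, 2])
print(len(vectors), len(vectors[0]))  # 6 6
```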
Attention lets each token look at the other tokens. The token "cross" connects strongly with "chicken" and the unfinished phrase "cross the". This helps the model understand the context before guessing the next word.
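To make "looking at the other tokens" concrete, here is a simplified sketch of scaled dot-product attention. It assumes the queries, keys, and values are all just the input vectors; real models apply learned projections first, use several attention heads, and usually stop a token from looking at later positions. The three 2-number vectors are invented for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(vectors):
    """Simplified self-attention: queries = keys = values = the input vectors."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # How strongly this token matches every token (including itself).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Blend all token vectors according to those weights.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # made-up values
outputs, weights = attention(toy_vectors)
print(weights[0])  # the first token attends most to itself, then to the similar second token
```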
Feed forward is not another attention step. Attention gathers the context. Feed forward then refines each token on its own, without looking at the others: it strengthens useful patterns, filters weaker options, and sends a cleaner signal to the next layer.
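A minimal sketch of a feed-forward step, assuming the common shape of two linear layers with a ReLU in between. The weights below are hand-picked for illustration, not learned, and the sizes are tiny. The point to notice is that the function takes one token's vector at a time, so no token sees another here.

```python
def relu(x):
    """Keep positive values, zero out negative ones."""
    return max(0.0, x)

def linear(vec, weights, bias):
    """One linear layer: a weighted sum per output, plus a bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]

def feed_forward(vec, w1, b1, w2, b2):
    """Expand to a hidden layer, apply ReLU, project back to the original size."""
    hidden = [relu(h) for h in linear(vec, w1, b1)]
    return linear(hidden, w2, b2)

# Hypothetical weights: 2 numbers in, 3 hidden, 2 numbers out.
w1, b1 = [[1, 0], [0, 1], [1, 1]], [0.0, 0.0, -1.0]
w2, b2 = [[1, 0, 0.5], [0, 1, 0.5]], [0.0, 0.0]

tokens = [[1.0, -2.0], [0.5, 1.5]]
refined = [feed_forward(t, w1, b1, w2, b2) for t in tokens]  # each token handled separately
print(refined)  # [[1.0, 0.0], [1.0, 2.0]]
```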
The model now predicts the missing next token in: why did the chicken cross the
After attention and feed forward refinement, the model predicts the next token. For this sentence, "road" wins because it completes the common phrase.
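One way to picture the final choice: the model gives every candidate token a score, the scores are turned into probabilities, and the most likely token is picked. The candidates and scores below are invented for this sketch, and always picking the top token is only one of several possible selection strategies.

```python
import math

# Hypothetical raw scores for a few candidate next tokens.
scores = {"road": 4.0, "street": 2.5, "river": 1.0, "bridge": 0.5}

def softmax(score_by_token):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(score_by_token.values())
    exps = {tok: math.exp(s - highest) for tok, s in score_by_token.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(scores)
best = max(probs, key=probs.get)  # pick the most probable token
print(best)  # road
```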
Every next word goes through this pipeline. Attention gathers context. Feed forward refines it. Then the model predicts one token at a time, which is why long answers require more computation.
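The loop below sketches why longer answers cost more: each new token needs another full pass. `toy_model` is a hypothetical stand-in for the whole pipeline (it just looks up the last token in a tiny made-up table), and `"<end>"` is an assumed stop marker.

```python
# Made-up lookup table standing in for the real model's prediction.
NEXT_TOKEN = {"the": "road", "road": "<end>"}

def toy_model(tokens):
    """Stand-in for tokenize -> embed -> attention -> feed forward -> predict."""
    return NEXT_TOKEN.get(tokens[-1], "<end>")

def generate(prompt, max_new_tokens=10):
    """Produce tokens one at a time, feeding each result back in as input."""
    tokens = list(prompt)
    passes = 0
    for _ in range(max_new_tokens):
        next_token = toy_model(tokens)  # one full pass per new token
        passes += 1
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens, passes

tokens, passes = generate(["why", "did", "the", "chicken", "cross", "the"])
print(tokens[-1], passes)  # road 2
```

Here one pass produces "road" and a second pass decides to stop, so even a one-word answer takes two passes.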
Your prompt has been tokenized, embedded, and refined through layers of attention. But how does AI actually produce an answer? Not all at once, but one piece at a time.