What is an LLM?

A Large Language Model predicts the likelihood of the next token based on the text that came before it, and it does this over and over: it predicts the next token, appends it to the sequence, then repeats, so each prediction builds on everything that came before. At its core, it is a prediction engine.
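The loop described above can be sketched in a few lines. This is illustrative only: a real model scores a huge vocabulary with a neural network, while the lookup table here is a made-up stand-in for learned probabilities.

```python
# Toy sketch of autoregressive generation. NEXT_TOKEN_PROBS is invented
# data standing in for a trained model's learned probabilities.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 1.0},
    "dog": {"barked": 1.0},
}

def generate(prompt_tokens, max_new_tokens=3):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1])
        if probs is None:  # nothing learned for this token: stop
            break
        # Greedy decoding: pick the most likely next token, append it,
        # and feed the extended sequence back in for the next prediction.
        next_token = max(probs, key=probs.get)
        tokens.append(next_token)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

Each pass through the loop is one prediction; the appended token becomes part of the context for the next one.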

How does it decide what comes next?

It combines what it learned during training with the text you give it. Your input provides the context. The model uses it to predict what should follow.

What is a token?

A token can be a word, part of a word, or a single character. How text gets split into tokens varies from model to model.
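A quick way to see the difference: the same word can be split at the character level, the word level, or into subword pieces. The subword split below is hand-made to show the idea, not any real model's output (real tokenizers use learned schemes such as BPE).

```python
# Three toy ways to tokenize the same text. The subword pieces are
# hypothetical, chosen by hand for illustration.
text = "unhappiness"

char_tokens = list(text)                  # one token per character
word_tokens = text.split()                # one token per word
subword_tokens = ["un", "happi", "ness"]  # made-up subword pieces

print(char_tokens)     # ['u', 'n', 'h', 'a', 'p', 'p', 'i', 'n', 'e', 's', 's']
print(word_tokens)     # ['unhappiness']
print(subword_tokens)  # ['un', 'happi', 'ness']
```

Two models given the same text can produce different token counts, which is why token limits are not the same as word limits.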

[Image: example of text being split into tokens]

Source: https://developers.google.com/machine-learning/crash-course/llm

What is a context window?

The amount of text a model can consider when generating a response, including the response itself. Think of it like short-term memory: there is a limit to how much it can hold at once.
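One common way chat apps deal with that limit is to drop the oldest messages until the rest fits. A minimal sketch, with two loud assumptions: token counting here is a crude whitespace split (real systems use the model's own tokenizer), and the limit of 8 tokens is invented for the demo.

```python
# Sketch: enforcing a context-window limit by dropping oldest messages.
CONTEXT_LIMIT = 8  # made-up budget; real windows are thousands of tokens

def count_tokens(message):
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(message.split())

def fit_to_window(messages):
    kept = list(messages)
    # Drop from the front (oldest first) until the remainder fits.
    while kept and sum(count_tokens(m) for m in kept) > CONTEXT_LIMIT:
        kept.pop(0)
    return kept

history = ["hello there", "tell me about llms",
           "what is a token", "and a context window"]
print(fit_to_window(history))  # ['what is a token', 'and a context window']
```

Anything dropped this way is simply gone from the model's "short-term memory" for later turns.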

How does the context window change as I chat?

<aside>

Standard Chat

</aside>

[Image: context window filling up during a standard chat]

<aside>

With extended thinking enabled

</aside>

[Image: context window with extended thinking enabled]

Source: https://platform.claude.com/docs/en/build-with-claude/context-windows
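The diagrams above can be summarized in code: every turn appends both your message and the model's reply to the context, so each new request carries the whole conversation so far. The turn data below is invented for the demo.

```python
# Sketch of how the context window fills as a chat progresses.
# Each turn adds the user message and the model's reply to the context.
turns = [
    ("Hi!", "Hello! How can I help?"),
    ("What is a token?", "A token is a chunk of text the model works with."),
]

context = []
for user_msg, model_reply in turns:
    context.append(("user", user_msg))
    # The model sees everything accumulated in `context` when replying.
    context.append(("assistant", model_reply))
    print(f"window now holds {len(context)} messages")
```

This is why long chats eventually hit the limit: the window grows with every exchange until older content has to be dropped or summarized.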

<aside>

Typical SillyTavern Context Window

</aside>

[Diagram: typical SillyTavern context window]