A Large Language Model predicts the likelihood of the next token based on the text that came before it. It does this over and over: predict the next token, append it to the sequence, repeat. Each prediction builds on everything that came before. It is essentially a prediction engine.
It combines what it learned during training with the text you give it: your input provides the context, and the model uses it to predict what should follow.
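The predict-append-repeat loop above can be sketched in a few lines of Python. The "model" here is a toy stand-in (an invented vocabulary and a deterministic selection rule); a real model would output a probability distribution over its whole vocabulary at each step:

```python
def predict_next_token(tokens):
    # Toy stand-in for a trained model: picks a token by a fixed rule.
    # A real LLM returns a probability distribution over the vocabulary.
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    return vocab[len(tokens) % len(vocab)]

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)  # predict
        tokens.append(next_token)                # append to the sequence
        # repeat: the new token becomes context for the next prediction
    return tokens

print(generate(["the"], 4))  # → ['the', 'cat', 'sat', 'on', 'mat']
```

The key point is that generation is iterative: each new token feeds back in as input, which is why earlier output influences later output.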
A token can be a word, part of a word, or a single character. How text gets split into tokens varies from model to model.
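A toy greedy longest-match tokenizer shows how one word can split into subword pieces. The vocabulary here is invented for illustration; real tokenizers (BPE, WordPiece, etc.) learn their vocabularies from data and differ between models:

```python
VOCAB = {"un", "believ", "able", "cat", "s"}  # invented toy subword vocabulary

def tokenize(word):
    # Greedy longest-match-first split: a simplified stand-in for how
    # subword tokenizers break unseen words into known pieces.
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("unbelievable"))  # → ['un', 'believ', 'able']
print(tokenize("cats"))          # → ['cat', 's']
```

With a different vocabulary the same word would split differently, which is exactly why token counts for the same text vary from model to model.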

Source: https://developers.google.com/machine-learning/crash-course/llm
The context window is the amount of information a model can look at and reference when generating a response, plus the response it generates. Think of it like short-term memory: there is a limit to how much it can hold at once.
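One practical consequence: once a conversation exceeds the window, the oldest tokens have to be dropped. A minimal sketch, assuming a simple keep-the-most-recent trimming strategy (the window size and strategy here are illustrative, not any particular provider's behavior):

```python
def fit_to_window(history_tokens, new_tokens, window_size):
    # Keep only the most recent tokens that fit in the window.
    # Anything older falls out, which is why a long chat "forgets"
    # its earliest messages.
    combined = history_tokens + new_tokens
    return combined[-window_size:]

history = ["a", "b", "c", "d"]
print(fit_to_window(history, ["e", "f"], 4))  # → ['c', 'd', 'e', 'f']
```

Real chat frontends use more careful strategies (keeping the system prompt, summarizing old turns), but the constraint is the same: context plus response must fit inside the window.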
<aside> <img src="/icons/circle-dashed_gray.svg" alt="/icons/circle-dashed_gray.svg" width="40px" />
Standard Chat
</aside>

<aside> <img src="/icons/circle-dashed_gray.svg" alt="/icons/circle-dashed_gray.svg" width="40px" />
With extended thinking enabled
</aside>

Source: https://platform.claude.com/docs/en/build-with-claude/context-windows
<aside> <img src="/icons/circle-dashed_gray.svg" alt="/icons/circle-dashed_gray.svg" width="40px" />
Typical SillyTavern Context Window
</aside>
