LLMs are not biological brains; they are prediction engines.
At their core, Large Language Models (LLMs) do exactly one thing: Next Token Prediction. They calculate a probability distribution over the next token based on the tokens that came before.
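A toy sketch of what "calculating the probability of the next token" means. The vocabulary and scores below are made up for illustration; a real model scores its entire vocabulary (tens of thousands of tokens) and the scores come from a neural network, but the final step is the same softmax-then-pick shown here:

```python
import math

# Hypothetical vocabulary and raw scores (logits) for the context
# "The cat sat on the" -- these numbers are invented for illustration.
vocab = ["cat", "dog", "mat", "ran"]
logits = [2.0, 1.0, 3.5, 0.5]

# Softmax: exponentiate each score and normalize so they sum to 1,
# turning raw scores into a probability distribution.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Greedy decoding: pick the single most probable next token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "mat" -- it has the highest logit
```

Real systems usually sample from this distribution rather than always taking the maximum, which is why the same prompt can produce different answers.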
So, when a model says it is "Thinking" or "Reasoning," it is simply generating a conversation with itself, one predicted token at a time, before showing you the final answer.
Since an LLM cannot "pause" to think silently, it must write to think.
The "Thinking" block is effectively a Scratchpad. By generating text in this scratchpad, the model manipulates its own Context Window to increase the accuracy of the final answer.
Note: Scratchpad — a conceptual term for the intermediate tokens generated by an LLM during a "Chain of Thought." It allows the model to externalize its reasoning, effectively creating a history of its own logic that it can "read" to ensure the final prediction is accurate.
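The scratchpad mechanic can be sketched as a simple loop. The `predict_next` function below is a hypothetical rule-based stand-in for a real model; the point is the loop itself: every token the model emits is appended back onto the context, so each later prediction can condition on the model's own earlier reasoning.

```python
def predict_next(context: str) -> str:
    # Hypothetical stand-in for an LLM: given the context so far,
    # return the next line. A real model would predict token by token.
    script = {
        "Q: 17 + 25 = ?": "17 + 20 = 37.",
        "17 + 20 = 37.": "37 + 5 = 42.",
        "37 + 5 = 42.": "Answer: 42",
    }
    last_line = context.strip().splitlines()[-1]
    return script[last_line]

context = "Q: 17 + 25 = ?"
while not context.endswith("Answer: 42"):
    # Key step: the model's own output becomes part of its input,
    # growing the Context Window with intermediate reasoning.
    context += "\n" + predict_next(context)

print(context)
```

Notice that the final "Answer: 42" is an easy prediction once "37 + 5 = 42." is already sitting in the context; the scratchpad did the work.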
Complex problems (like math or coding) cannot be solved in a single "guess." The model needs to perform computation, and each token costs a fixed amount of work (one forward pass through the network), so the only way to spend more compute on a hard problem is to generate more tokens: more tokens = more computation.
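A concrete way to see why more tokens buys more computation: long addition. Predicting "4786 + 2957 = 7743" in one shot requires the model to have effectively memorized the answer, but emitting one small carry step per digit (one "token" of scratchpad per step) reduces it to a chain of trivial predictions. The step format below is illustrative, and for simplicity the sketch assumes both numbers have the same number of digits:

```python
def add_with_steps(a: str, b: str) -> tuple[str, list[str]]:
    """Add two equal-length digit strings, recording one step per digit."""
    steps, carry, digits = [], 0, []
    # Work right-to-left, exactly like schoolbook addition.
    for da, db in zip(reversed(a), reversed(b)):
        s = int(da) + int(db) + carry
        carry, digit = divmod(s, 10)
        digits.append(str(digit))
        # Each step is tiny on its own -- this is the "scratchpad token".
        steps.append(f"{da}+{db}+carry = {s}: write {digit}, carry {carry}")
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), steps

answer, steps = add_with_steps("4786", "2957")
print(answer)      # 7743
print(len(steps))  # 4 -- one generated step per digit
```

Longer numbers simply mean more steps, i.e. more tokens: the per-token work stays constant while the total computation scales with the problem size.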