Tokens - Tokens are the basic units of text that a large language model (LLM) processes, representing words, parts of words, or punctuation marks.
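For example, a short sentence can be split into token IDs and inspected piece by piece. This is a minimal sketch assuming the open-source tiktoken library and its cl100k_base encoding; any BPE tokenizer would illustrate the same idea.

```python
# Minimal sketch: encode a sentence and inspect its tokens.
# Assumes the open-source `tiktoken` library (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one example BPE vocabulary

text = "Tokens are the basic units of text an LLM processes."
token_ids = enc.encode(text)

print(token_ids)                              # integer IDs the model actually sees
print([enc.decode([t]) for t in token_ids])   # the text piece behind each ID
print(f"{len(text)} characters -> {len(token_ids)} tokens")
```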




Q.) Why do token counts differ between model providers?
Each provider trains its own tokenizer on its own data, so the same input text is split differently and produces a different number of tokens, as the sketch below shows.
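A rough sketch of this, again assuming tiktoken: here the "gpt2" and "cl100k_base" encodings stand in for two different providers' tokenizers, and the same sentence yields a different token count under each.

```python
# Sketch: the same text tokenized with two different vocabularies.
# The two encoding names below are assumptions standing in for
# different providers' tokenizers.
import tiktoken

text = "Internationalization is hard to tokenize consistently."

for name in ("gpt2", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{name}: {len(ids)} tokens -> {pieces}")
```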

Q.) How does a tokenizer behave when it encounters an unusual word?
LLM Tokenizer Unknown Word Rule:
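A hedged sketch of this behavior, assuming a BPE tokenizer via tiktoken: a made-up word is not mapped to a single "unknown" token; it is broken into smaller known sub-word pieces (down to individual bytes if necessary).

```python
# Sketch: an unusual word falls back to smaller known sub-word pieces.
# Assumes tiktoken's cl100k_base encoding; the made-up word is illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ("hello", "flibbertigibbetization"):  # common vs. made-up word
    ids = enc.encode(word)
    print(word, "->", [enc.decode([i]) for i in ids])
```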