Tokens - The basic units of text that a large language model (LLM) processes; a token may be a whole word, part of a word, or a punctuation mark.


Q.) Why do token counts differ between model providers?

Each provider trains its own tokenizer on its own corpus, so the vocabularies differ and the same input text can be split into a different number of tokens from one model to another.
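A minimal sketch of why counts diverge: the same greedy longest-match splitter applied to two hypothetical vocabularies (both illustrative, not any real provider's) produces different token counts for the same word.

```python
def greedy_split(word, vocab):
    """Greedily take the longest vocabulary piece at each position,
    falling back to single characters as a last resort."""
    tokens, i = [], 0
    while i < len(word):
        # Scan from the longest possible piece down to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

vocab_a = {"transformers"}                 # hypothetical provider A: whole word known
vocab_b = {"transform", "er", "s"}         # hypothetical provider B: only subwords known

print(greedy_split("transformers", vocab_a))  # → ['transformers'] (1 token)
print(greedy_split("transformers", vocab_b))  # → ['transform', 'er', 's'] (3 tokens)
```

Real tokenizers build their vocabularies by training (e.g. BPE merges) on large corpora, but the effect is the same: a different vocabulary yields a different split of identical input.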

Q.) How does a tokenizer behave when it encounters an unusual word?

LLM Tokenizer Unknown Word Rule:

  1. Try matching the whole word against the vocabulary.
  2. If unknown, split it into the largest known subwords (BPE behavior).
  3. As a last resort, fall back to single characters (or bytes), so no input is ever truly out of vocabulary.
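The rule above can be sketched as a whole-word check followed by greedy longest-match splitting. The vocabulary and words below are illustrative, not any real model's:

```python
def tokenize(word, vocab):
    """Tokenize `word` using the unknown-word rule:
    1) return the whole word if it is in the vocabulary;
    2) otherwise split into the largest known subwords;
    3) fall back to single characters if nothing matches."""
    if word in vocab:          # step 1: whole-word match
        return [word]
    tokens, i = [], 0
    while i < len(word):       # step 2: largest known subword at each position
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:  # step 3: single char fallback
                tokens.append(word[i:j])
                i = j
                break
    return tokens

vocab = {"token", "ization", "un", "usual"}
print(tokenize("tokenization", vocab))  # → ['token', 'ization']
print(tokenize("unusual", vocab))       # → ['un', 'usual']
```

Because of the character-level fallback, even a word the tokenizer has never seen still maps to some sequence of tokens, just a longer one.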