We want different query terms that express the same concept to bring up the same matching results (e.g. swapping a word for a synonym should not change what is retrieved).

At a high level, we want to convert a query to a query representation with a representation function, and then get results from it.

We need some kind of representation that captures the meaning of a document.

Most simply, we could use a “bag of words” approach:

Words can be in many languages, but we generally use English. We process the words of a document as follows:

One possible approach is creating an embedding (representation) of a document.

Alternatively, we can use a bag of words, i.e. a count of how often each word occurs in the document, ignoring word order.
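
As a rough illustration of the bag-of-words idea, here is a minimal Python sketch. The function name `bag_of_words`, the lowercasing, and the regex tokenizer are assumptions made for this example, not steps prescribed by these notes.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each token occurs in the text; word order is discarded."""
    # Assumed preprocessing: lowercase, then keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


doc = "The cat sat on the mat. The mat was flat."
bow = bag_of_words(doc)
print(bow.most_common(2))
```

Two documents with the same words in a different order get the same representation, which is what makes this approach simple but lossy.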

Inverted Index - maps each term to the documents that contain it
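
A small sketch of how such an index might be built and queried, assuming documents are identified by integer ids and a query should match only documents that contain all of its terms. The names `tokenize`, `build_inverted_index`, and `search` are hypothetical, chosen for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase, then split into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "cats chase mice",
    2: "dogs chase cats",
    3: "mice eat cheese",
}
index = build_inverted_index(docs)
print(search(index, "chase cats"))
```

Looking up a term costs one dictionary access instead of a scan over every document, which is the main reason to invert the document-to-terms mapping.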