Vector

Imagine a vector as an arrow: the arrow's length represents how strong the quantity is, and its direction shows where it points.

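In machine learning, a vector is stored as a list of numbers, and that list can represent meaning. For example, a piece of text can be turned into an array of floating-point numbers: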

"hello world" → [0.12, -0.88, ..., 0.03] (float32 array)

Two texts that mean similar things will have vectors pointing in similar directions.
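One common way to measure whether two vectors point in similar directions is cosine similarity, which is close to 1 when the directions match and drops toward 0 or below when they do not. The sketch below is only an illustration: the three-dimensional vectors and the names cat, kitten and invoice are made up here, and real text vectors have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, invented for illustration only.
cat = [0.9, 0.1, 0.3]
kitten = [0.8, 0.2, 0.35]
invoice = [-0.2, 0.9, -0.4]

sim_close = cosine_similarity(cat, kitten)   # similar meaning, similar direction
sim_far = cosine_similarity(cat, invoice)    # unrelated meaning, different direction

print(sim_close, sim_far)
```

With these made-up numbers, sim_close comes out above 0.9 while sim_far is negative, which matches the intuition that related texts point the same way.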

Vector Space

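A vector space is a set of vectors that can be combined with each other. We can add two vectors together or multiply a vector by a number (a scalar), as long as these operations follow certain rules. For example, the order of addition does not matter, and scaling a sum gives the same result as summing the scaled vectors.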
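The two operations can be sketched in a few lines. The helper names add and scale are chosen here for illustration, and the vectors are arbitrary; the checks at the end show two of the rules that a vector space must satisfy.

```python
def add(u, v):
    """Add two vectors component by component."""
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Multiply every component of a vector by the scalar c."""
    return [c * a for a in v]

u = [1.0, 2.0]
v = [3.0, -1.0]

total = add(u, v)
doubled = scale(2.0, u)

# Two of the rules: addition is commutative, and scaling distributes over addition.
commutative = add(u, v) == add(v, u)
distributive = scale(2.0, add(u, v)) == add(scale(2.0, u), scale(2.0, v))
print(total, doubled, commutative, distributive)
```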

High-Dimensional Vector

A high-dimensional vector is an arrow in a space with many dimensions, so it has many components instead of just two or three. Each dimension represents a different feature or aspect of something.

Take a picture of a dog as an example.

In a low-dimensional vector, we might only have a few dimensions representing basic features like color and type. In a high-dimensional vector, we could have hundreds or even thousands of dimensions, each representing a finer feature, such as the shape of the dog's ears, the color of its eyes, the texture of its fur, and so on.

A high-dimensional vector is like a detailed description of something. High-dimensional vectors are used in areas like machine learning and data analysis to capture complex information about an item and the relationships between items.

Embedding

The process of generating vectors from words is called embedding, and the resulting vectors are called embeddings. Embeddings capture semantic relationships between words or documents by mapping them to continuous vector representations.

As a result, similar words or documents end up closer together in the vector space. For instance, suppose the point (2, 3) represents the word "joy" and the point (3, 4) represents "happy". Because these words have similar meanings, their vectors are located close to each other in the vector space.
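The closeness of "joy" and "happy" can be checked by computing the straight-line (Euclidean) distance between their points. In the sketch below, the third word "tax" and its coordinates are an assumption added purely for contrast; they are not part of the example above.

```python
import math

# Points from the example above.
joy = (2, 3)
happy = (3, 4)

# Hypothetical unrelated word, invented here for contrast.
tax = (9, -5)

dist_similar = math.dist(joy, happy)   # sqrt(1^2 + 1^2)
dist_unrelated = math.dist(joy, tax)   # sqrt(7^2 + 8^2)

print(round(dist_similar, 3), round(dist_unrelated, 3))
```

The distance between "joy" and "happy" is the square root of 2, about 1.414, which is much smaller than the distance to the made-up unrelated point.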

The proximity of similar words in the vector space is crucial for various natural language processing tasks. Machine learning models use this information to capture and understand the semantic relationships between words. It can be used in sentiment analysis, understanding words in context, language translation, and more.

Vector Search

In vector search, a query is translated into a high-dimensional vector, and the search engine finds stored vectors that are similar to it and returns the matching items as search results. Once candidates are found, they are scored with a similarity metric and ranked by the strength of the match.
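A minimal sketch of this flow, assuming cosine similarity as the metric and a brute-force scan over every stored vector. The document ids, the tiny three-dimensional vectors, and the search function are hypothetical; in practice the query vector would come from an embedding model, and large collections usually rely on an approximate index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Score every stored vector against the query and return the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical stored vectors, invented for illustration.
index = {
    "doc_dogs": [0.9, 0.1, 0.0],
    "doc_puppies": [0.8, 0.3, 0.1],
    "doc_taxes": [0.0, 0.2, 0.9],
}

# Stand-in for the vector an embedding model would produce for the query.
query_vec = [0.85, 0.2, 0.05]

results = search(query_vec, index)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With these made-up vectors, the two dog-related documents are returned and the unrelated one is left out.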

KNN (K-Nearest Neighbors)

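KNN is a commonly used classification and regression algorithm. It makes a prediction for a new point by looking at the k closest known points. In vector search, the same idea is used to find the k stored vectors nearest to the query vector.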
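The classification case can be sketched as follows: find the k labeled points closest to the query and take a majority vote. This is a simplified illustration that assumes Euclidean distance; the function name knn_classify, the sample points, and the labels are all hypothetical. For regression, the vote would be replaced by an average of the neighbors' values.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Return the most common label among the k examples nearest to query."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points, invented for illustration.
examples = [
    ([1.0, 1.0], "positive"),
    ([1.2, 0.8], "positive"),
    ([0.9, 1.3], "positive"),
    ([5.0, 5.0], "negative"),
    ([5.5, 4.5], "negative"),
]

label_a = knn_classify([1.1, 1.0], examples)
label_b = knn_classify([5.2, 4.9], examples)
print(label_a, label_b)
```

Here the first query sits among the "positive" points and the second among the "negative" ones, so each receives the label of its surrounding cluster.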