Developers & Practitioners

Image similarity search with MatchIt Fast

Give it a try — and either select a preset image or upload one of your own. Once you make your choice, you will get the top 25 similar images from two million images on Wikimedia images in an instant, as you can see in the video above. No caching involved.

The demo also lets you perform the similarity search with news articles. Just copy and paste some paragraphs from any news article, and get similar articles from 2.7 million articles on the GDELT project within a second.

Text similarity search with MatchIt Fast

Vector Search: the technology behind Google Search, YouTube, Play, and more

How can it find matches that fast? The trick is that the MatchIt Fast demo uses the vector similarity search (or nearest neighbor search or simply vector search) capabilities of the Vertex AI Matching Engine, which shares the same backend as Google Image Search, YouTube, Google Play, and more, for billions of recommendations and information retrievals for Google users worldwide. The technology is one of the most important components of Google's core services, and not just for Google: it is becoming a vital component of many popular web services that rely on content search and information retrieval accelerated by the power of deep neural networks.

So what's the difference between traditional keyword-based search and vector similarity search? For many years, relational databases and full-text search engines have been the foundation of information retrieval in modern IT systems. For example, you would add tags or category keywords such as "movie", "music", or "actor" to each piece of content (image or text) or each entity (a product, user, IoT device, or anything really). You’d then add those records to a database, so you could perform searches with those tags or keywords.

In contrast, vector search uses vectors (where each vector is a list of numbers) for representing and searching content. The combination of the numbers defines similarity to specific topics. For example, if an image (or any content) includes 10% of “movie”, 2% of “music”, and 30% of “actor”-related content, then you could define a vector [0.1, 0.02, 0.3] to represent it. (Note: this is an overly simplified explanation of the concept; the actual vectors have much more complex vector spaces). You can find similar content by comparing the distances and similarities between vectors. This is how Google services find valuable content for a wide variety of users worldwide in milliseconds.

With keyword search, you can only specify a binary choice as an attribute of each piece of content; it's either about a movie or not, either music or not, and so on. Also, you cannot express the actual "meaning" of the content to search. If you specify a keyword "films", for example, you would not see any content related to "movies" unless there was a synonyms dictionary that explicitly linked these two terms in the database or search engine.

Vector search provides a much more refined way to find content, with subtle nuances and meanings. Vectors can represent a subset of content that contains "much about actors, some about movies, and a little about music". Vectors can represent the meaning of content where “films”, “movies”, and “cinema” are all collected together. Also, vectors have the flexibility to represent categories previously unknown to or undefined by service providers. For example, emerging categories of content primarily attractive to kids, such as ASMR or slime, are really hard for adults or marketing professionals to predict beforehand, and going back through vast databases to manually update content with these new labels would be all but impossible to do quickly. But vectors can capture and represent never-before-seen categories instantly.

Vector search changes business

Vector search is not only applicable to image and text content. It can also be used for information retrieval for anything you have in your business when you can define a vector to represent each thing. Here are a few examples: