Task

Understand the best workflow for training these embeddings at scale

Create Embedding Vector Solutions

Qdrant + Classification Network (both)

Qdrant is a vector search engine that lets you fine-tune similarity search models for your specific use case. It scales well in production thanks to features such as indexing. The workflow is: train a classification network, extract embeddings from it, and then fine-tune. We used fine-tuning to take our SKU matching tool from around 40% accuracy to the mid-90s.
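A minimal sketch of what the vector search step does, in plain Python rather than Qdrant's API, with made-up 3-dimensional embeddings and hypothetical SKU IDs: rank catalogue items by cosine similarity to a query, then score SKU matching as top-1 accuracy. We are assuming top-1 nearest-neighbour accuracy is the metric behind the 40% and mid-90s figures; the notes don't say. At scale, Qdrant replaces this linear scan with an approximate (HNSW) index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Exact (brute-force) nearest neighbours by cosine similarity.

    A vector database swaps this linear scan for an approximate index
    so search stays fast as the collection grows.
    """
    scored = [(item_id, cosine(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def top1_accuracy(queries: List[Tuple[List[float], str]], index: Dict[str, List[float]]) -> float:
    """Fraction of queries whose nearest catalogue item is the correct SKU."""
    hits = sum(1 for vec, true_sku in queries if top_k(vec, index, k=1)[0][0] == true_sku)
    return hits / len(queries)


# Hypothetical catalogue embeddings keyed by made-up SKU IDs.
catalogue = {
    "SKU-001": [0.9, 0.1, 0.0],
    "SKU-002": [0.1, 0.9, 0.0],
    "SKU-003": [0.0, 0.1, 0.9],
}
# (query embedding, true SKU) pairs; the third one is deliberately a hard case.
queries = [
    ([0.8, 0.2, 0.1], "SKU-001"),
    ([0.2, 0.7, 0.1], "SKU-002"),
    ([0.1, 0.8, 0.3], "SKU-003"),
]

print(top_k([0.8, 0.2, 0.1], catalogue, k=2))
print(top1_accuracy(queries, catalogue))
```

The third query lands closer to SKU-002 than to its true SKU, so accuracy here is 2/3; fine-tuning the embeddings is what should pull pairs like that together.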

https://qdrant.tech/documentation/

https://www.mygreatlearning.com/blog/alexnet-the-first-cnn-to-win-image-net/

https://sbert.net/ - Sentence Transformers will be used for text-based embedding vectors
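Sentence Transformers turns a sentence into one vector by pooling the model's token embeddings; many of its models (for example `all-MiniLM-L6-v2`) use mean pooling followed by normalization, all inside `SentenceTransformer(...).encode(sentences)`. The sketch below only illustrates that pooling step with toy token vectors (an assumption for illustration; no model is loaded):

```python
import math
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask is 1; padding tokens are ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [t / count for t in total]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


# Toy token vectors: three real tokens plus one padding token (mask 0).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_vec = l2_normalize(mean_pool(tokens, mask))
print(sentence_vec)
```

Masking matters: without it, the padding token would dominate the average.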

We can also use a pretrained classification model to extract embeddings as a starting point for metric learning: https://github.com/christiansafka/img2vec
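The idea behind extracting embeddings from a classification network (and, as we understand it, behind img2vec) is to drop the final classification layer and use the penultimate activations as the embedding. A minimal sketch with a hypothetical two-layer network (`TinyClassifier`, made-up weights) standing in for a pretrained CNN:

```python
from typing import List

Matrix = List[List[float]]


def dense(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


class TinyClassifier:
    """Hypothetical stand-in for a pretrained classifier: feature layer + head."""

    def __init__(self, w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]):
        self.w1, self.b1 = w1, b1  # feature extractor ("backbone")
        self.w2, self.b2 = w2, b2  # classification head

    def embed(self, x: List[float]) -> List[float]:
        # Penultimate activations: the full network minus the classification head.
        return relu(dense(self.w1, self.b1, x))

    def logits(self, x: List[float]) -> List[float]:
        # Class scores are computed on top of the embedding.
        return dense(self.w2, self.b2, self.embed(x))


model = TinyClassifier(
    w1=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 1.0, 1.0]],
    b1=[0.0, 0.0, 0.0, 0.0],
    w2=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    b2=[0.0, 0.0],
)
x = [0.5, -1.0, 2.0]
print(model.embed(x))   # [0.5, 0.0, 0.0, 1.5]: a 4-dim embedding
print(model.logits(x))  # [0.5, 0.0]: 2 class scores
```

With a real pretrained model the same split applies: `embed` is the backbone, `logits` adds the head, and fine-tuning updates the backbone so that embeddings of matching items land close together.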

Marqo + Classification Network (both)

https://www.marqo.ai/

Marqo is a vector search engine that lets you fine-tune similarity search models for your specific use case. Like Qdrant, it scales well in production with features such as indexing, and the workflow is the same: train a classification network, extract embeddings from it, and then fine-tune. Marqo also offers options to handle deployment and cloud hosting for you, similar to Pinecone.
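The notes don't say which objective the fine-tuning step uses. One common metric-learning choice is triplet loss, which pushes an anchor closer to a positive (same item) than to a negative by at least a margin; Sentence Transformers ships a `TripletLoss` in `sentence_transformers.losses`. A minimal sketch of the loss itself with made-up 2-D embeddings (the training loop is omitted):

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: List[float], positive: List[float], negative: List[float], margin: float = 1.0) -> float:
    """max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive, so training only pushes on triplets that violate it.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # hypothetical: another listing of the same SKU
hard_negative = [1.0, 0.0]  # different SKU, as close as the positive
easy_negative = [3.0, 4.0]  # different SKU, already far away

print(triplet_loss(anchor, positive, hard_negative))  # 1.0
print(triplet_loss(anchor, positive, easy_negative))  # 0.0
```

Only the hard negative produces a loss, which is why mining hard negatives (look-alike SKUs) tends to matter for matching tasks.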

https://www.mygreatlearning.com/blog/alexnet-the-first-cnn-to-win-image-net/

https://sbert.net/ - Sentence Transformers will be used for text-based embedding vectors