Traditional machine learning has mostly focused on prediction and regression. In the last several years, however, machine learning, and deep learning in particular, has shown great promise for ranking, information retrieval, recommendation, and deduplication. These new techniques allow companies to improve real-time retrieval and ranking of images, videos, customers, jobs, shopping catalogue items, friends, places, and more.
However, this poses challenges, including managing large collections of items, real-time retrieval, fast batch scoring, metering and testing, and distributed systems management. Edo Liberty is the founder and CEO of HyperCube and the former Director of Research at Amazon AI Labs. He offers a deep dive into the problems faced in retrieval and ranking and suggests a way forward. We’ll showcase some of the session here, but you’ll get the most out of watching the video below for the full experience, including the participant Q&A session.
The need for a new way
When we think about machine learning, we usually picture a classifier or regressor that takes an object and decides between the binary options yes or no, or outputs a value between zero and one. While we have great tools and a deep understanding of how such models work, in reality models are often required to do something far more complex: given a very large collection of objects, select the best one for a given context or query.
Edo shares the example of Pinterest, where you can issue a fashion query in the form of an image and get back a collection of images of similar garments. Or on Facebook, you can post an image and look for friends and other people in other pictures; a machine learning model matches faces and retrieves the most relevant images.
At Google and Microsoft, when you search with a query you get web pages. Both companies recently announced that a big part of their stacks now runs on deep neural net models and NLP models, not the traditional search stack they had relied on for a very long time. On Airbnb, for example, you get recommendations for vacation rentals. These are machine learning problems whose task is not to classify, regress to a value, or predict a label, but rather to retrieve and rank objects from a large collection.
Edo asserts, “I want to argue that in fact, a huge set of problems fall into this pattern: all recommendation engines, all visual search, all semantic search, Q&A, fraud detection, online advertising.”
According to Edo, “In the last several years, a pattern has emerged where we can solve most of these problems in a unified way and see significant gains. The best practice seems to be to take your query or an image and process it through a transformer to get a semantically deep vector or tensor. You do the same for each candidate item, then score the pair with another neural component, and retrain the whole thing end to end so that the score at the end is the best surrogate for the quality of the match between those two items.”
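As a rough sketch of this pattern, the toy model below (pure NumPy, with randomly initialised weights standing in for trained ones; all names are hypothetical) encodes a query and an item with separate towers and scores the pair with a small head. In a real system all three components would be trained end to end:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights; in practice these are learned end to end so the
# final score becomes a good surrogate for match quality.
W_query = rng.normal(size=(16, 8))   # query tower
W_item = rng.normal(size=(16, 8))    # item tower
w_score = rng.normal(size=(16,))     # pairwise scoring head

def encode_query(q):
    # Deep encoder stand-in: one dense layer with a nonlinearity.
    return np.tanh(q @ W_query)

def encode_item(x):
    return np.tanh(x @ W_item)

def pairwise_score(q_vec, i_vec):
    # Score the (query, item) pair with a lightweight head.
    return float(np.concatenate([q_vec, i_vec]) @ w_score)
```

At serving time you would encode the query once, then apply `pairwise_score` to each candidate item's precomputed vector and keep the top scorers.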
In real time, you find the top-scoring items in your catalogue and bring those forward as the result. A simple example is an image search or image deduplication service: you embed the images, for instance by running them through a CNN, and compare the resulting vectors using cosine similarity. That already gives you pretty good duplicate detection for images. It’s not necessarily the best score for semantic similarity, but you can always pairwise-score those deep vectors from the CNN and get significantly improved results.
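A minimal sketch of the deduplication step, assuming the image embeddings have already been produced by a pretrained CNN (the function names here are illustrative, not from the talk):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_duplicates(embeddings, threshold=0.95):
    # Flag pairs of images whose embeddings are nearly identical.
    # O(n^2) brute force; fine for a toy, not for a large catalogue.
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The threshold would be tuned on labelled duplicate pairs in practice.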
Semantic text search: While Google BERT combined with cosine similarity is helpful, Edo notes that results can be significantly improved with pairwise post-scoring, which supports question answering, similarity, semantic retrieval, and more.
Simple recommender: If you want to build a recommendation engine, you can embed the items in your catalogue into a vector space with a vector embedding algorithm. Your query is then your cart: you take the selected items and feed them into a deep neural net that produces a deep encoding of the query (the shopping cart) and the user (the shopper). You then match that encoding against everything in your catalogue. Again, we see significant improvements in both shopping and media recommendation with these kinds of models.
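A toy version of this recommender, with a crude averaging encoder standing in for the deep neural net the talk describes (all names are hypothetical):

```python
import numpy as np

def embed_cart(item_vectors):
    # Crude cart encoder: average the embeddings of the items in the
    # cart. A real system would use a trained deep encoder instead.
    return np.mean(item_vectors, axis=0)

def recommend(cart_vec, catalogue, k=3):
    # Cosine-score every catalogue embedding against the cart encoding
    # and return the indices of the top-k matches.
    sims = catalogue @ cart_vec / (
        np.linalg.norm(catalogue, axis=1) * np.linalg.norm(cart_vec) + 1e-9)
    return np.argsort(-sims)[:k]
```

The same shape works for media recommendation: replace cart items with recently watched titles.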
Is it time for a unified model for machine learning for scale?
Edo asks, “Are we done? We’ve taken a lot of different problems that seemed, on the face of it, to be very different from each other, and in the past required very different mechanisms, different scientific disciplines, and different systems for serving and deploying them in the field. And suddenly, I’m arguing that there is some unified machine learning model that should be good for pretty much all of them.”
If you are a company transitioning from legacy systems to AI and you want to own your own recommendation or semantic search solution, you have to build quite a lot these days. You have to figure out how to train great models, experiment with them, and launch them into production. You have to collect data and feedback, feed it into a database, retrain on the data you get back, and so on. Each of those steps is significantly harder in the context of ranking and retrieval than in classification.
An example is real-time scoring: what does it take to actually serve the highest-scoring items in your catalogue for every query?
You keep your items in some collection. When a query arrives, you use whatever framework you used for training, whether it’s TensorFlow or MXNet, to score each one of those items against the query and choose the top results.
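The brute-force serving loop described above can be sketched as follows, with a dot product standing in for the full scoring model (names are illustrative):

```python
import numpy as np

def top_k(query_vec, catalogue, k=10):
    # Score every catalogue embedding against the query and return the
    # indices of the k best items, best first. This is exhaustive: cost
    # grows linearly with catalogue size for every single query.
    scores = catalogue @ query_vec
    k = min(k, len(scores))
    idx = np.argpartition(-scores, k - 1)[:k]
    return idx[np.argsort(-scores[idx])]
```

`argpartition` avoids fully sorting the catalogue, but the scoring pass itself is still O(catalogue size), which is what stops scaling.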
That works fine if you have roughly one query per second and about 1,000 items in your catalogue. That’s a tiny application. When your load grows to about 10 QPS and about 100,000 items in your catalogue, you are already beyond what you can do on one machine with the original model, and now you have to do something smarter. The standard solution is a distilled/quantized approximation or surrogate model.
Instead of running the full-fledged scoring on everything, you first run a lightweight version of it to prune the candidates: a cruder tool, but a much more efficient computation. The cost is that instead of training one model, you now have to train two different models, with different features and different outputs.
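A toy two-stage retrieval sketch, where scoring on a truncated slice of each embedding stands in for the distilled/quantized surrogate model (in reality the cheap model is trained separately, which is exactly the maintenance burden the talk describes):

```python
import numpy as np

def two_stage_retrieve(query_vec, catalogue, shortlist=50, k=10):
    # Stage 1: cheap surrogate — score on a truncated projection of the
    # embeddings (stand-in for a distilled/quantized model) and keep a
    # shortlist of candidates.
    cheap_dim = max(1, catalogue.shape[1] // 4)
    cheap_scores = catalogue[:, :cheap_dim] @ query_vec[:cheap_dim]
    candidates = np.argsort(-cheap_scores)[:shortlist]
    # Stage 2: full scoring only on the surviving shortlist.
    full_scores = catalogue[candidates] @ query_vec
    return candidates[np.argsort(-full_scores)[:k]]
```

Note the failure mode this creates: any item the cheap model mis-ranks out of the shortlist can never be recovered by the full model, so the two stages must be kept in lockstep.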
As a catalogue grows, more challenges arise for machine learning for scale. A billion model evaluations per second is already beyond the scope of what you can do even with a simple model. Previously, you had to maintain two solutions. Now you have to maintain three solutions. And all of those have to be in lockstep.
Finally, when your service grows even bigger, so do the problems. As Edo notes, “Say you have about 1,000 QPS and maybe a billion items in your catalogue. Even embedding-type, nearest-neighbour solutions already don’t quite cut it, and you have to do very hard pruning upfront. For example, you’d look for good matches only in a certain category in your shop, or only in a certain geo, something that immediately prunes the list of candidates significantly. What you end up with is a cascaded process that is very clunky to build and maintain, and takes a very long time to get working correctly.”
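The hard upfront prune Edo describes can be sketched like this, with a category filter standing in for the category/geo pre-filter (a toy, not a billion-scale system; all names are hypothetical):

```python
import numpy as np

def cascade_retrieve(query_vec, catalogue, categories, query_category, k=5):
    # Stage 0: hard prune — keep only items in the query's category
    # (or geo), shrinking the candidate set before any model runs.
    candidate_idx = np.flatnonzero(categories == query_category)
    if candidate_idx.size == 0:
        return candidate_idx
    # Later stages score only the pruned candidates; at real scale
    # there would be further surrogate-model stages before this one.
    scores = catalogue[candidate_idx] @ query_vec
    return candidate_idx[np.argsort(-scores)[:k]]
```

Each extra stage narrows the candidate set further, and each stage is another component whose filters and models must be kept consistent with the rest of the cascade.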
Watch the video above to enjoy the rest of the presentation and following Q&A session.