Model vs. Serving complexity
In the early stage of our product, we started with simple logistic regression models to estimate the likelihood of users sending/accepting offers. The models were trained offline using scikit learn. The training set was obtained using a “log and learn” approach (logging signals exactly as they were during serving time) over ~90 different signals, and the learned weights were injected into our serving layer.
Although those models were doing a pretty good job, we observed via offline experiments the great potential of more advanced non linear models such as gradient boosted regression classifiers for our ranking task.
Implementing an in-memory fast serving layer supporting such advanced models would require non-trivial effort, as well as an on-going maintenance cost. A much simpler option was to delegate the serving layer to an external managed service that can be called via a REST API. However, we needed to be sure that it wouldn’t add too much latency to the overall flow.
In order to make our decision, we decided to do a quick POC using the AI Platform Online Prediction service, which sounded like a potential great fit for our needs at the serving layer.
A quick (and successful) POC
We trained our gradient boosted models over our ~90 signals using scikit learn, serialized it as a pickle file, and simply deployed it as-is to the Google Cloud AI Platform. Done. We get a fully managed serving layer for our advanced model through a REST API. From there, we just had to connect it to our java serving layer (a lot of important details to make it work, but unrelated to the pure model serving layer).
Below is a very high level schema of what our offline/online training/serving architecture looks like. The carpool serving layer is responsible for a lot of logic around computing/fetching the relevant candidates to score, but we focus here on the pure ranking ML part. Google Cloud AI Platform plays a key role in that architecture. It greatly increases our velocity by providing us with an immediate, managed and robust serving layer for our models and allows us to focus on improving our features and modelling.