This notebook is the third in the series of notebooks that show how Aerospike can be used as a feature store.
This notebook requires the Aerospike Database and Spark running locally with Aerospike Spark Connector. To create a Docker container that satisfies the requirements and holds a copy of Aerospike notebooks, visit the Aerospike Notebooks Repo.
Introduction
This notebook shows how Aerospike can be used as a Feature Store for Machine Learning applications on Spark using Aerospike Spark Connector. It is Part 3 of the Feature Store series of notebooks, and focuses on Model Serving aspects concerning a Feature Store. The first two notebooks in the series discuss Feature Engineering and Model Training.
This notebook is organized as follows:
Summary of the prior (Feature Engineering and Model Training) notebooks.
Load the trained and saved model for making a prediction.
Use the Aerospike API to retrieve precomputed features.
Implement and test a web service that combines the above elements, that is, accesses features, runs the model, and returns the prediction.
Prerequisites
This tutorial assumes familiarity with the following topics:
You may execute shell commands including Aerospike tools like aql and asadm in the terminal tab throughout this tutorial. Open a terminal tab by selecting File->Open from the notebook menu, and then New->Terminal.
Prior Context
In the prior two notebooks on Feature Engineering and Model Training, we saved feature data to the feature store and trained an ML model on it, respectively. If the saved feature data or the model is not available in this environment, you can run the following cells to recreate them.
Feature Data
Run the following cells ONLY IF the database does not have the feature data for credit card transactions from the prior notebooks (Part 1 or Part 2). You will need to convert them to Code cells before you can run them.
# read and transform the sample credit card transactions data from a csv file
Trained Model
Load the Random Forest classification model that was trained and saved in the Model Training notebook (Part 2).
print("Loaded Random Forest Classification model.")
Output
Loaded Random Forest Classification model.
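If the model loading cell is not available in your environment, the following minimal sketch shows the loading step. It assumes an active Spark session and that the Part 2 notebook saved the model under a local path such as resources/fs_model_rf (a hypothetical path; adjust it to wherever your copy of the model was saved).
from pyspark.ml.classification import RandomForestClassificationModel

# Load the saved Random Forest model; the path is a hypothetical placeholder.
model = RandomForestClassificationModel.load("resources/fs_model_rf")
print("Loaded Random Forest Classification model.")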
Retrieving Features
The Aerospike Python Client provides a convenient API to access specific features from the entity set, as shown below. Recall that the model uses features CC1_V1 through CC1_V28. We also need to construct a schema for the dataframe that is needed to run the model.
from pyspark.sql.types import StructType, DoubleType

namespace = 'test'
entity_set = 'cctxn-features'
txnid = '5'  # dummy value; the web service will get the id from the request params
record_key = (namespace, entity_set, txnid)
features = ["CC1_V" + str(i) for i in range(1, 29)]  # need features CC1_V1-CC1_V28
schema = StructType()
for i in range(1, 29):  # all features are of type float or Double
    schema.add("CC1_V" + str(i), DoubleType())
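For reference, a minimal sketch of the feature lookup with the Aerospike Python Client follows; it assumes the aerospike package is installed and the server is reachable at localhost:3000.
import aerospike

# Connect to the local Aerospike server (host and port are assumptions).
client = aerospike.client({'hosts': [('localhost', 3000)]}).connect()

# Fetch only the feature bins for the record identified by record_key.
(key, meta, bins) = client.select(record_key, features)
feature_values = [bins[f] for f in features]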
We first construct a feature vector from the input features, as required by the model interface. The model uses only the fvector column created by VectorAssembler.
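A sketch of this step, assuming an active Spark session plus the schema, features, and feature_values defined above:
from pyspark.ml.feature import VectorAssembler

# Build a single-row dataframe of features and assemble them into fvector.
features_df = spark.createDataFrame([feature_values], schema)
assembler = VectorAssembler(inputCols=features, outputCol="fvector")
fvector_df = assembler.transform(features_df).select("fvector")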
Call the model's transform function to predict. We input only a dataframe with the fvector column, and use only two columns from the prediction dataframe: probability and prediction. The threshold for the fraud/no-fraud decision is 50%.
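Continuing the sketch, with model loaded as above:
# Run the model and read out the two columns used for the decision.
result = model.transform(fvector_df).select("probability", "prediction").collect()[0]
fraud_prob = float(result["probability"][1])  # class-1 (fraud) probability
prediction = "fraud" if fraud_prob > 0.5 else "no fraud"  # 50% threshold
print({"fraud_prob": fraud_prob, "normal_prob": 1.0 - fraud_prob, "prediction": prediction})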
Let’s create a simple web service that serves the model. We will use the Flask framework to create the web service. The web service takes txnid as the query parameter, retrieves the features from the feature store, runs the model, and returns the prediction.
Note that this model serving example is not realistic: we are using only precomputed features for inference, and we have trained and tested the model on the same data. Nonetheless, the example serves its purpose, which is to illustrate the use of a feature store for model serving. It should not be difficult to adapt the patterns shown here into a realistic example.
# stop the existing spark session before starting the web service
spark.stop()
Install Web Service Framework
Open a terminal tab, and install the Flask framework with the following command.
Terminal window
pip install flask
pip install flask_restful
Examine Web Service File
First, open the file resources/fs-model-ws.py that implements the web service using the Flask framework, and examine its contents.
Note that it is mostly the code from the cells above, organized to run as a Flask web service. You can learn more in the Flask documentation.
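For orientation, here is a minimal sketch of the service's structure. The names are illustrative, and the placeholder response stands in for the feature lookup and model steps shown earlier; the actual implementation (which may use flask_restful's Resource classes) is in resources/fs-model-ws.py.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/')
def predict():
    txnid = request.args.get('txnid')
    # ... look up features for txnid, assemble fvector, run model.transform ...
    fraud_prob = 0.0  # placeholder for the model's class-1 probability
    return jsonify({
        "fraud_prob": fraud_prob,
        "normal_prob": 1.0 - fraud_prob,
        "prediction": "fraud" if fraud_prob > 0.5 else "no fraud"
    })

if __name__ == '__main__':
    app.run(debug=True)  # serves on 127.0.0.1:5000 by default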
Run Web Service
Run the web service by opening a terminal tab and running the following commands in it:
cd /home/jovyan/notebooks/spark/resources
python fs-model-ws.py
You can ignore the warning messages. After the “Debugger is active” message, the service is ready to receive requests.
Send Requests to Web Service
Let’s call the web service to predict the outcome for a transaction id.
We can submit requests through the curl command as shown below. We can test with a few normal transactions (ids: 1, 2, 3, 10) and a few fraud transactions (ids: 6337, 120506, 150669).
You can query the database to view other fraud and normal transaction ids. As you may recall, TxnId and CC1_Class are the bins for the transaction id and label respectively.
# Send a request to the model web service running at 127.0.0.1:5000
!curl http://127.0.0.1:5000/?txnid=1
Output
{
"fraud_prob": 0.052470997597310345,
"normal_prob": 0.9475290024026897,
"prediction": "no fraud"
}
# You can query a transaction in the database.
# remember CC1_Class is the label with 1 indicating a fraudulent transaction
!aql -c "select TxnId, CC1_Class from test.cctxn-features where PK = '6337'"
Output
select TxnId, CC1_Class from test.cctxn-features where PK = '6337'
+--------+------------+
| TxnId | CC1_Class |
+--------+------------+
| "6337" | 1 |
+--------+------------+
1 row in set (0.000 secs)
OK
Takeaways and Conclusion
In this notebook, we explored how Aerospike can be used as a Feature Store for ML applications. Specifically, we showed how the precomputed features stored in the Aerospike feature store can be used at model serving time. We implemented a simple web service that loads the trained model, and then for each request, retrieves features, runs the model, and returns the model prediction.
This is the third notebook in the series of notebooks on how Aerospike can be used as a feature store. The first and second notebooks discussed Feature Engineering and Model Training aspects respectively.
Cleaning Up
Shut down the web service by hitting Ctrl-C in the tab in which it is running.
Close the spark session, and remove the tutorial data by executing the cell below.
try:
    spark.stop()
except:
    pass  # ignore errors if the session is already stopped
# To remove all data in the namespace test, uncomment the following line and run:
# !aql -c "truncate test"
Visit the Aerospike Notebooks Repo to run additional Aerospike notebooks. To run a different notebook, download the notebook from the repo to your local machine, then click on File->Open in the notebook menu, and select Upload.