
Feature Store Tutorial

From engineering to real-time serving: one feature store for the entire ML lifecycle.

Machine learning relies on training models with historical data to predict future outcomes. The individual input attributes that a model learns from are known as features.

To ensure accuracy, the features used during live inference must match those used during training as closely as possible. A feature store acts as a centralized database and management layer, allowing teams to define, discover, and reuse features across the entire ML lifecycle while keeping training and serving data in sync.
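A common way to avoid mismatches between training and serving is to define each feature transformation once and reuse that single definition in both paths, instead of implementing it separately for batch training and for online requests. The following plain-Python sketch illustrates that general pattern. It is not the feature store's own API, and `DriverStats` and its fields are hypothetical. In a feature store, the values such a definition produces would typically be written once and then read back by both the training and the serving path.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DriverStats:
    """Hypothetical raw inputs for one driver; field names are illustrative only."""
    driver_id: str
    offers_received: int
    offers_accepted: int
    hours_online: float


def compute_features(stats: DriverStats) -> Dict[str, float]:
    """Single feature definition shared by the training and serving paths."""
    # Guard against division by zero for drivers with no activity.
    acceptance_rate = stats.offers_accepted / stats.offers_received if stats.offers_received else 0.0
    offers_per_hour = stats.offers_received / stats.hours_online if stats.hours_online else 0.0
    return {"acceptance_rate": acceptance_rate, "offers_per_hour": offers_per_hour}


def build_training_rows(history: List[DriverStats]) -> List[Dict[str, float]]:
    """Offline path: apply the shared definition to historical records in bulk."""
    return [compute_features(s) for s in history]


def build_serving_vector(live: DriverStats) -> Dict[str, float]:
    """Online path: apply the same definition to one entity at request time."""
    return compute_features(live)


if __name__ == "__main__":
    history = [DriverStats("d1", 20, 15, 10.0), DriverStats("d2", 0, 0, 0.0)]
    rows = build_training_rows(history)
    print(rows[0])  # {'acceptance_rate': 0.75, 'offers_per_hour': 2.0}
    # The same inputs yield identical features offline and online.
    print(build_serving_vector(history[0]) == rows[0])  # True
```

Because both paths call `compute_features`, a change to the feature logic automatically applies to training and serving together.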

In this tutorial series, you will build an end-to-end feature store workflow. Your goal is to use this workflow to increase the number of ride requests that drivers accept, which reduces customer wait times and improves customer satisfaction.

To build this workflow, you will use Spark to process large volumes of data in batch and Aerospike to serve that data with low latency when the app requests it. You will use the same feature data model in every stage of the process to keep your data consistent. Finally, you will verify that feature retrieval stays under one millisecond as the number of users and data points grows.
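To check a sub-millisecond target, one approach is to time many individual lookups and compare a tail percentile, not the average, against the budget. The sketch below assumes that p99 is the pass/fail metric and uses a hypothetical in-memory dictionary in place of the online store so that it runs anywhere. With the Aerospike Python client, `lookup` could instead wrap a call such as `client.get(("test", "driver_features", key))`, where the namespace and set names are hypothetical.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(lookup: Callable[[str], object], keys: List[str]) -> Dict[str, float]:
    """Time each lookup call individually and summarize latencies in milliseconds."""
    samples = []
    for key in keys:
        start = time.perf_counter_ns()
        lookup(key)
        samples.append((time.perf_counter_ns() - start) / 1_000_000)
    return {"p50": percentile(samples, 50), "p99": percentile(samples, 99), "max": max(samples)}


def meets_budget(summary: Dict[str, float], budget_ms: float = 1.0) -> bool:
    """True when p99 latency is below the budget (1 ms by default)."""
    return summary["p99"] < budget_ms


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the online store; swap in a real client call.
    store = {f"driver-{i}": {"acceptance_rate": 0.5} for i in range(10_000)}
    summary = measure_latency_ms(store.get, list(store))
    print(summary, meets_budget(summary))
```

Timing each call separately exposes tail latency that an average would hide. Against a real cluster, network round trips make the numbers noticeably higher than those of the in-memory stand-in.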

This tutorial series takes about 25–40 minutes in total: Part 1 takes 5–10 minutes, Part 2 takes 10–15 minutes, and Part 3 takes 10–15 minutes.

Tutorial path

Set up your environment, explore the feature store, then train and serve a model.
Part 1: Feature Engineering
Set up Aerospike and Spark, then explore the four feature store objects and see how they store data.
Part 2: Model Training
Materialize dataset definitions into training data, train a model, and save the model artifact.
Part 3: Model Serving
Use the Aerospike Python client to fetch feature vectors for online inference and validate retrieval latency at larger scale.

Tools

The Spark Connector and Python client support different stages of the same ML pipeline.
Aerospike Spark Connector
For feature engineering and training, Spark parallelizes batch reads and writes while the connector handles Aerospike integration. Used in Parts 1 and 2.
Aerospike Python Client
For serving, the client fetches feature bins for individual entities with direct key-based reads. Used in Part 3.
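A direct key-based read addresses one record by its primary key instead of running a query. The sketch below assumes a hypothetical `driver_features` set in the `test` namespace, with one record per driver and one bin per feature. `InMemoryFeatureStore` is a hypothetical stand-in so the example runs without a server. `fetch_feature_vector` only calls `client.get(key)` and unpacks a `(key, meta, bins)` tuple, which is the shape returned by the Aerospike Python client's `get()`, so a connected client object could be passed in its place.

```python
from typing import Dict, List, Sequence, Tuple

# (namespace, set, user key): the tuple shape the Aerospike Python client uses for keys.
Key = Tuple[str, str, str]


class InMemoryFeatureStore:
    """Hypothetical stand-in for an Aerospike client, for illustration only.

    get() returns a (key, meta, bins) tuple, mirroring the shape returned by the
    Aerospike Python client's get(). Metadata is left empty here, and a missing
    record raises KeyError (the real client raises its own not-found exception).
    """

    def __init__(self) -> None:
        self._records: Dict[Key, Dict[str, float]] = {}

    def put(self, key: Key, bins: Dict[str, float]) -> None:
        self._records[key] = dict(bins)

    def get(self, key: Key) -> Tuple[Key, dict, Dict[str, float]]:
        return key, {}, dict(self._records[key])


def fetch_feature_vector(
    client,
    namespace: str,
    set_name: str,
    entity_id: str,
    feature_names: Sequence[str],
    default: float = 0.0,
) -> List[float]:
    """Read one entity's record by primary key and order its bins for the model.

    Bins missing from the record are filled with `default` (a design choice of
    this sketch, not Aerospike behavior).
    """
    _, _, bins = client.get((namespace, set_name, entity_id))
    return [float(bins.get(name, default)) for name in feature_names]


if __name__ == "__main__":
    store = InMemoryFeatureStore()
    store.put(("test", "driver_features", "d1"), {"acceptance_rate": 0.75, "offers_per_hour": 2.0})
    print(fetch_feature_vector(store, "test", "driver_features", "d1", ["offers_per_hour", "acceptance_rate"]))
    # [2.0, 0.75]
```

Ordering the bins by an explicit `feature_names` list keeps the vector's column order identical to the order the model was trained on, regardless of how the record stores its bins.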