Feature Store Tutorial

From feature engineering to real-time serving: one feature store for the entire ML lifecycle.

Machine learning relies on training models with historical data to predict future outcomes. The specific data points used in this process are known as features.

To ensure accuracy, the features used during live inference must match those used during training as closely as possible. A feature store acts as a centralized database and management layer, allowing teams to define, discover, and reuse features across the entire ML lifecycle while keeping training and serving data in sync.

In this tutorial series, you will build an end-to-end feature store workflow. The goal is to increase the number of ride requests that drivers accept, which reduces wait times and improves customer satisfaction.

To build this workflow, you will use Spark to process large volumes of data in batch and Aerospike to serve that data with low latency when the application requests it. Every stage uses the same feature model, which keeps training and serving data consistent. Finally, you will verify that the system responds in under one millisecond even as the number of users and data points grows.
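One way to picture "the same feature model for every part of the process" is a single list of feature definitions that both the batch path and the online path call. The sketch below is illustrative only: the `FeatureDef` class, the feature names (`accept_rate`, `avg_pickup_km`), and the raw field names are hypothetical and are not the tutorial's actual objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence


@dataclass(frozen=True)
class FeatureDef:
    """A named feature and the function that derives it from raw fields."""
    name: str
    compute: Callable[[Mapping[str, float]], float]


# Hypothetical driver features; the real tutorial defines its own.
FEATURES: List[FeatureDef] = [
    FeatureDef(
        "accept_rate",
        lambda r: r["accepted"] / r["offered"] if r["offered"] else 0.0,
    ),
    FeatureDef(
        "avg_pickup_km",
        lambda r: r["pickup_km_total"] / r["accepted"] if r["accepted"] else 0.0,
    ),
]


def build_vector(raw: Mapping[str, float]) -> List[float]:
    """Apply every feature definition, in a fixed order, to one raw record."""
    return [feature.compute(raw) for feature in FEATURES]


def build_training_rows(history: Sequence[Mapping[str, float]]) -> List[List[float]]:
    """Batch path: one feature vector per historical record."""
    return [build_vector(raw) for raw in history]


def build_serving_vector(raw: Mapping[str, float]) -> List[float]:
    """Online path: the same definitions applied to a single live record."""
    return build_vector(raw)
```

Because both paths go through `build_vector`, a change to a feature definition reaches training and serving at the same time, which is the consistency property a feature store is meant to provide.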

This tutorial series takes about 25–40 minutes across three parts (5–10 min, 10–15 min, and 10–15 min).

Tutorial path

Set up your environment, explore the feature store, then train and serve a model.

Part 1: Feature Engineering

Set up Aerospike and Spark, then explore the four feature store objects and see how they store data.

Start with feature engineering

Part 2: Model Training

Materialize dataset definitions into training data, train a model, and save the model artifact.

Start with model training

Part 3: Model Serving

Use the Aerospike Python client to fetch feature vectors for online inference and validate retrieval latency at larger scale.

Continue with model serving
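Validating retrieval latency usually means timing many individual reads and looking at percentiles instead of the average. The following is a minimal sketch of such a measurement, not the tutorial's own benchmark: `measure_latency_ms` is a hypothetical helper, and `read_fn` stands for whatever single-key read you want to time (for example, a function wrapping an Aerospike client read).

```python
import math
import time
from typing import Callable, Dict, Iterable, List


def measure_latency_ms(
    read_fn: Callable[[object], object],
    keys: Iterable[object],
) -> Dict[str, float]:
    """Time one call to read_fn per key and report p50, p99, and max in milliseconds."""
    samples: List[float] = []
    for key in keys:
        start = time.perf_counter()
        read_fn(key)
        samples.append((time.perf_counter() - start) * 1000.0)

    if not samples:
        raise ValueError("keys must contain at least one element")

    samples.sort()

    def percentile(p: float) -> float:
        # Nearest-rank percentile over the sorted samples.
        rank = max(1, math.ceil(p / 100.0 * len(samples)))
        return samples[rank - 1]

    return {"p50": percentile(50), "p99": percentile(99), "max": samples[-1]}


# Usage with an in-memory dict standing in for the online store.
store = {i: {"accept_rate": 0.5} for i in range(1000)}
stats = measure_latency_ms(store.get, range(1000))
print(stats)
```

Reporting p99 alongside p50 matters for a sub-millisecond target, because a small fraction of slow reads can miss the target even when the median looks healthy.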

Tools

The Spark Connector and Python client support different stages of the same ML pipeline.

Aerospike Spark Connector

For feature engineering and training, Spark parallelizes batch reads and writes while the connector handles Aerospike integration. Used in Parts 1 and 2.

Spark Connector docs
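A batch read and write through the connector might look roughly like the sketch below. It assumes PySpark with the Aerospike Spark Connector JAR on the classpath. The option names (`aerospike.seedhost`, `aerospike.namespace`, `aerospike.set`, `aerospike.updateByKey`) are the ones I understand the connector to accept, and the host, namespace, set, and key column values are placeholders, so check all of them against the Spark Connector docs before relying on them.

```python
from typing import Dict


def aerospike_options(seed_host: str, namespace: str, set_name: str) -> Dict[str, str]:
    """Collect the connection options shared by reads and writes."""
    return {
        "aerospike.seedhost": seed_host,    # assumed option name, e.g. "localhost:3000"
        "aerospike.namespace": namespace,   # assumed option name
        "aerospike.set": set_name,          # assumed option name
    }


def load_features(spark, options: Dict[str, str]):
    """Batch-read a set into a Spark DataFrame; Spark parallelizes the scan."""
    return spark.read.format("aerospike").options(**options).load()


def save_features(df, options: Dict[str, str], key_column: str) -> None:
    """Batch-write a DataFrame, using key_column as the record's primary key."""
    (
        df.write.format("aerospike")
        .options(**options)
        .option("aerospike.updateByKey", key_column)  # assumed option name
        .mode("append")  # how save modes map to write policies is connector-specific
        .save()
    )


# Placeholder values for a local single-node setup.
opts = aerospike_options("localhost:3000", "test", "driver_features")
```

With a live `SparkSession` named `spark`, the calls would be `df = load_features(spark, opts)` and `save_features(df, opts, "driver_id")`, where `driver_id` is a hypothetical key column.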

Aerospike Python Client

For serving, fetch feature bins for individual entities using direct key-based reads. Used in Part 3.

Python client docs
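A key-based read of selected feature bins can be sketched as follows. The helper `fetch_feature_vector`, the namespace `test`, the set `driver_features`, and the bin names are hypothetical. The only client call used is `select(key, bins)`, which in the Aerospike Python client returns a `(key, meta, bins)` tuple. So that the example runs without a cluster, it uses a small in-memory stand-in that mimics that call shape; with a real cluster you would pass a connected client instead.

```python
from typing import Any, Dict, List, Sequence, Tuple

Key = Tuple[str, str, Any]  # (namespace, set, primary key)


def fetch_feature_vector(
    client,
    namespace: str,
    set_name: str,
    entity_id: Any,
    feature_names: Sequence[str],
    default: float = 0.0,
) -> List[float]:
    """Read only the requested bins for one entity and return them in a fixed order."""
    key: Key = (namespace, set_name, entity_id)
    _key, _meta, bins = client.select(key, list(feature_names))
    bins = bins or {}
    vector = []
    for name in feature_names:
        value = bins.get(name)
        # A bin that is absent (or None) falls back to the default value.
        vector.append(default if value is None else value)
    return vector


class InMemoryClient:
    """Stand-in for a connected client; mimics only the select() call shape."""

    def __init__(self, records: Dict[Key, Dict[str, Any]]):
        self._records = records

    def select(self, key: Key, bin_names: List[str]):
        record = self._records[key]
        selected = {b: record[b] for b in bin_names if b in record}
        return key, {"gen": 1, "ttl": 0}, selected


# With a real cluster, the client would instead be created along these lines:
#   import aerospike
#   client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
client = InMemoryClient(
    {("test", "driver_features", "driver-42"): {"accept_rate": 0.8, "avg_pickup_km": 2.0}}
)
vector = fetch_feature_vector(
    client, "test", "driver_features", "driver-42",
    ["accept_rate", "avg_pickup_km", "trips_7d"],
)
```

Returning the values in the order of `feature_names` keeps the vector aligned with the column order the model was trained on, regardless of how the bins come back from the store.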