Part 1: Feature Engineering

Objectives

By the end of this tutorial, you will be able to:

Understand what a feature store is and why it solves data consistency problems in ML workflows.
Explore the four main feature store objects: Feature Groups, Features, Entities, and Datasets.
See how the feature store maps to Aerospike sets and records in a ride-hailing application.

A ride-hailing platform’s smart dispatch system needs to predict which driver is most likely to accept a given ride request. That prediction depends on features like each driver’s acceptance rate, average trip distance, and how recently they completed a ride. Those features come from different pipelines and refresh on different schedules, but the model needs to see them as a single, consistent record at inference time.

A feature store manages this with four objects. Feature Groups collect related features that share a data source and computation pipeline. Each Feature is a single named value a pipeline produces, like an acceptance rate or a trip count. Entities tie those values to a real-world instance: one Aerospike record per driver, with every feature from every group co-located in the same record. Datasets define reproducible training slices by selecting which entities and features to include in the ML training step.
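As a rough illustration of how the four objects relate, here is a minimal in-memory sketch in plain Python. It does not use the Aerospike client or the tutorial notebook's code. The class names, the feature names (acc_rate, avg_dist_km, trip_count), the source names, and the dict standing in for an Aerospike set are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """A single named value produced by a pipeline."""
    name: str
    dtype: type


@dataclass
class FeatureGroup:
    """Related features that share a data source and pipeline."""
    name: str
    source: str
    features: list


@dataclass
class Entity:
    """One record per real-world instance (here, one per driver)."""
    key: str
    bins: dict = field(default_factory=dict)

    def write(self, group, values):
        # Each pipeline writes only its own group's features, but every
        # group's values land in the same record for this entity.
        for feature in group.features:
            if feature.name in values:
                self.bins[feature.name] = feature.dtype(values[feature.name])


@dataclass
class Dataset:
    """A reproducible training slice: chosen entities and features."""
    name: str
    entity_keys: list
    feature_names: list

    def materialize(self, store):
        # Build one row per selected entity with only the selected features.
        return [
            {name: store[key].bins[name] for name in self.feature_names}
            for key in self.entity_keys
        ]


# Hypothetical feature groups fed by two different pipelines.
acceptance = FeatureGroup("acceptance", "dispatch_events",
                          [Feature("acc_rate", float)])
trips = FeatureGroup("trips", "trip_logs",
                     [Feature("avg_dist_km", float), Feature("trip_count", int)])

# A dict keyed by driver ID stands in for an Aerospike set of records.
store = {"driver_1": Entity("driver_1"), "driver_2": Entity("driver_2")}
store["driver_1"].write(acceptance, {"acc_rate": 0.92})
store["driver_1"].write(trips, {"avg_dist_km": 7.5, "trip_count": 120})
store["driver_2"].write(acceptance, {"acc_rate": 0.81})
store["driver_2"].write(trips, {"avg_dist_km": 4.2, "trip_count": 45})

training = Dataset("dispatch_v1", ["driver_1", "driver_2"],
                   ["acc_rate", "trip_count"])
rows = training.materialize(store)
```

In this sketch, the two groups write at different times but into the same per-driver record, so a single lookup by driver key returns every feature. The Dataset only names which keys and features to read, which is what makes the training slice repeatable.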

Later in this tutorial, you’ll explore a minimal, prebuilt feature store in the provided feature_store_tutorial.ipynb notebook to see how these objects map to Aerospike sets and records.

First, you’ll need to set up your system.