Part 1: Feature Engineering
Objectives
By the end of this tutorial, you will be able to:
- Understand what a feature store is and why it solves data consistency problems in ML workflows.
- Explore the four main feature store objects: Feature Groups, Features, Entities, and Datasets.
- See how the feature store maps to Aerospike sets and records in a ride-hailing application.
A ride-hailing platform’s smart dispatch system needs to predict which driver is most likely to accept a given ride request. That prediction depends on features like each driver’s acceptance rate, average trip distance, and how recently they completed a ride. Those features come from different pipelines and refresh on different schedules, but the model needs to see them as a single, consistent record at inference time.
A feature store manages this with four objects. Feature Groups collect related features that share a data source and computation pipeline. Each Feature is a single named value a pipeline produces, like an acceptance rate or a trip count. Entities tie those values to a real-world instance: one Aerospike record per driver, with every feature from every group co-located in the same record. Datasets define reproducible training slices by selecting which entities and features to include in the ML training step.
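To make the four objects concrete, here is a minimal in-memory sketch in Python. It is not the notebook's code and it does not connect to Aerospike. A plain dict stands in for a set, with one record per driver, and the class names `Entity`, `Feature`, `FeatureGroup`, `Dataset` and the helpers `write_group` and `materialize` are hypothetical. The sketch shows two feature groups writing into the same per-driver record and a dataset selecting entities and features from those records.

```python
from dataclasses import dataclass

# Hypothetical, in-memory stand-ins for the four feature store objects.
# These names are illustrative only and are not the notebook's API.
# A plain dict plays the role of one Aerospike set: each key is an
# entity ID and each value is that entity's record (feature name -> value).


@dataclass(frozen=True)
class Entity:
    name: str  # real-world thing being described, e.g. "driver"
    key: str   # column name used for its ID in training rows, e.g. "driver_id"


@dataclass(frozen=True)
class Feature:
    name: str    # e.g. "acceptance_rate"
    dtype: type  # expected Python type of the value


@dataclass
class FeatureGroup:
    name: str                # e.g. "driver_stats"
    features: list[Feature]  # features produced by one pipeline


@dataclass
class Dataset:
    name: str
    entity: Entity
    entity_ids: list[str]     # which entities (drivers) to include
    feature_names: list[str]  # which features to include


def write_group(records: dict, group: FeatureGroup, entity_id: str, values: dict) -> None:
    """Validate one pipeline's output and merge it into the entity's single record."""
    expected = {f.name: f.dtype for f in group.features}
    for name, value in values.items():
        if name not in expected:
            raise KeyError(f"{name!r} is not a feature of group {group.name!r}")
        if not isinstance(value, expected[name]):
            raise TypeError(f"{name!r} expects {expected[name].__name__}")
    # Co-location: update the existing record instead of creating a second one,
    # so features from every group end up side by side.
    records.setdefault(entity_id, {}).update(values)


def materialize(records: dict, dataset: Dataset) -> list[dict]:
    """Build training rows: one row per selected entity, selected features only."""
    return [
        {dataset.entity.key: eid, **{f: records[eid][f] for f in dataset.feature_names}}
        for eid in dataset.entity_ids
    ]


# Usage: two pipelines write to the same driver records.
driver = Entity("driver", "driver_id")
stats = FeatureGroup("driver_stats", [Feature("acceptance_rate", float), Feature("avg_trip_km", float)])
activity = FeatureGroup("driver_activity", [Feature("minutes_since_last_ride", int)])

drivers: dict = {}  # stands in for a hypothetical "drivers" set
write_group(drivers, stats, "driver-42", {"acceptance_rate": 0.87, "avg_trip_km": 6.4})
write_group(drivers, activity, "driver-42", {"minutes_since_last_ride": 12})
write_group(drivers, stats, "driver-7", {"acceptance_rate": 0.65, "avg_trip_km": 9.1})
write_group(drivers, activity, "driver-7", {"minutes_since_last_ride": 95})

ds = Dataset("dispatch_v1", driver, ["driver-42", "driver-7"], ["acceptance_rate", "minutes_since_last_ride"])
rows = materialize(drivers, ds)
print(rows[0])  # → {'driver_id': 'driver-42', 'acceptance_rate': 0.87, 'minutes_since_last_ride': 12}
```

Merging every group's output into one record per entity, rather than one record per group, is what lets an inference call read all of a driver's features in a single read. A real feature store would also track versions or timestamps so that a dataset can be rebuilt exactly; this sketch leaves that out.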
In the provided feature_store_tutorial.ipynb notebook, you’ll explore a minimal, pre-built feature store and see how these objects map to Aerospike sets and records.
First, you’ll need to set up your system.