Part 1: Feature Engineering

Objectives

By the end of this tutorial, you will be able to:

  • Understand what a feature store is and how it addresses data consistency problems in ML workflows.
  • Explore the four main feature store objects: Feature Groups, Features, Entities, and Datasets.
  • See how the feature store maps to Aerospike sets and records in a ride-hailing application.

A ride-hailing platform’s smart dispatch system needs to predict which driver is most likely to accept a given ride request. That prediction depends on features like each driver’s acceptance rate, average trip distance, and how recently they completed a ride. Those features come from different pipelines and refresh on different schedules, but the model needs to see them as a single, consistent record at inference time.
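The sketch below shows that requirement in miniature. It assumes two hypothetical pipelines: a daily batch job and a streaming job. Each writes only its own features into a shared per-driver record, and inference then reads one consistent feature vector from that record. A plain Python dictionary stands in for the Aerospike set. The feature names and values are illustrative, and no Aerospike client calls are made.

```python
from typing import Any

# Hypothetical in-memory stand-in for one Aerospike set:
# driver_id -> record, where each record maps feature (bin) names to values.
drivers: dict[str, dict[str, Any]] = {}


def upsert_features(store: dict[str, dict[str, Any]], driver_id: str,
                    features: dict[str, Any]) -> dict[str, Any]:
    """Merge one pipeline's output into a driver's record, leaving other features intact."""
    record = store.setdefault(driver_id, {})
    record.update(features)
    return record


def feature_vector(store: dict[str, dict[str, Any]], driver_id: str,
                   names: list[str]) -> list[Any]:
    """Read the features the model expects, in a fixed order, from a single record."""
    record = store.get(driver_id, {})
    missing = [n for n in names if n not in record]
    if missing:
        raise KeyError(f"{driver_id} is missing features: {missing}")
    return [record[n] for n in names]


# Hypothetical daily batch pipeline: long-window driver statistics.
upsert_features(drivers, "driver_42", {"acceptance_rate": 0.87, "avg_trip_km": 6.4})
# Hypothetical streaming pipeline: recency, refreshed as rides complete.
upsert_features(drivers, "driver_42", {"minutes_since_last_trip": 12})

vector = feature_vector(
    drivers, "driver_42", ["acceptance_rate", "avg_trip_km", "minutes_since_last_trip"]
)
print(vector)  # [0.87, 6.4, 12]
```

Because each pipeline merges into the same record instead of replacing it, the streaming update does not erase the batch features. The model sees all three values together, even though they were computed at different times.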

A feature store manages this with four objects. Feature Groups collect related features that share a data source and computation pipeline. Each Feature is a single named value a pipeline produces, like an acceptance rate or a trip count. Entities tie those values to a real-world instance: one Aerospike record per driver, with every feature from every group co-located in the same record. Datasets define reproducible training slices by selecting which entities and features to include in the ML training step.
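The following minimal in-memory sketch shows how the four objects relate. The class names (`Feature`, `FeatureGroup`, `Entity`, `Dataset`), the feature names, and the values are hypothetical illustrations, not the API used in feature_store_tutorial.ipynb. Each `Entity` holds a dictionary that stands in for its single Aerospike record. The sketch does not model how groups or datasets are stored in Aerospike; the notebook shows the actual mapping.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Feature:
    name: str    # a single named value, e.g. "acceptance_rate"
    dtype: type  # expected Python type of the value


@dataclass
class FeatureGroup:
    name: str                # one data source / computation pipeline
    features: list[Feature]  # related features that pipeline produces


@dataclass
class Entity:
    entity_id: str  # e.g. a driver ID; stands in for the record key
    # Stand-in for the entity's one record: features from every group, co-located.
    values: dict[str, Any] = field(default_factory=dict)

    def write_group(self, group: FeatureGroup, row: dict[str, Any]) -> None:
        """Store one group's output on this entity, checking each value's type."""
        for feature in group.features:
            value = row[feature.name]
            if not isinstance(value, feature.dtype):
                raise TypeError(
                    f"{feature.name} expected {feature.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
            self.values[feature.name] = value


@dataclass
class Dataset:
    name: str
    entity_ids: list[str]     # which entities to include
    feature_names: list[str]  # which features to include, in column order

    def materialize(self, entities: dict[str, Entity]) -> list[list[Any]]:
        """Build training rows from the fixed entity and feature selection."""
        return [
            [entities[eid].values[f] for f in self.feature_names]
            for eid in self.entity_ids
        ]


# Two hypothetical feature groups fed by different pipelines.
driver_stats = FeatureGroup(
    "driver_stats", [Feature("acceptance_rate", float), Feature("avg_trip_km", float)]
)
driver_activity = FeatureGroup("driver_activity", [Feature("trips_last_7d", int)])

entities = {eid: Entity(eid) for eid in ("driver_1", "driver_2")}
entities["driver_1"].write_group(driver_stats, {"acceptance_rate": 0.9, "avg_trip_km": 5.2})
entities["driver_1"].write_group(driver_activity, {"trips_last_7d": 41})
entities["driver_2"].write_group(driver_stats, {"acceptance_rate": 0.6, "avg_trip_km": 8.0})
entities["driver_2"].write_group(driver_activity, {"trips_last_7d": 17})

training = Dataset("acceptance_v1", ["driver_1", "driver_2"], ["acceptance_rate", "trips_last_7d"])
rows = training.materialize(entities)
print(rows)  # [[0.9, 41], [0.6, 17]]
```

In this sketch, a `Dataset` is only a fixed selection of entity IDs and feature names. Re-materializing it against the same stored values yields the same rows, which is one simple way to get the reproducibility described above.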

Later in this tutorial, you’ll explore a minimal, pre-made feature store in the provided feature_store_tutorial.ipynb notebook to see how these objects map to Aerospike sets and records.

First, you’ll need to set up your system.
