Putting it all together
Full pipeline architecture
Across three tutorials, you set up a complete ML pipeline:
| Stage | Part | Client | Purpose |
|---|---|---|---|
| Ingest | Part 1 | Spark connector | Bulk-write feature metadata and entity values |
| Train | Part 2 | Spark connector + MLlib | Materialize datasets, train and save models |
| Serve | Part 3 | Python client | Retrieve features in real time and run inference |
The Spark connector and Python client are complementary. Batch pipelines need distributed throughput, and serving needs direct low-latency key lookups.
What you completed in Part 3
- Used the Aerospike Python client for sub-millisecond feature retrieval
- Used Entity.get_feature_vector() to drive low-latency feature retrieval in the serving path
- Built predict_decline_risk() for end-to-end online inference
- Verified that retrieval latency stays sub-millisecond at larger scale
If your project has feature_store_tutorial.ipynb, models/trip_decline_risk_lr, and datasets/trip-decline-risk-v1/, you have the core artifacts from the full tutorial series.
Production considerations
The tutorial pipeline demonstrates the core pattern. In production, you’d also consider:
| Concern | Approach |
|---|---|
| Connection pooling | Reuse the Aerospike client connection across requests. The client is thread-safe. |
| Model versioning | Save models with version suffixes (trip_decline_risk_lr_v2). Load the active version from a config or registry. |
| Feature freshness | Batch pipelines update features periodically. Monitor staleness, because features that have not been updated within the expected interval may indicate a pipeline failure. |
| Graceful degradation | If get_feature_vector() returns None, fall back to a default prediction or distance-only ranking. |
| Monitoring | Track prediction latency, feature retrieval latency, and prediction distribution over time. A shift in distribution may indicate data drift. |
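The graceful degradation and monitoring rows above can be sketched together as a thin wrapper around the serving call. This is a minimal illustration, not the tutorial's implementation: `predict_with_fallback`, `DEFAULT_DECLINE_RISK`, and the `fetch_features` and `score` callables are hypothetical names introduced here. In practice, `fetch_features` would wrap a call such as get_feature_vector() and `score` would wrap the loaded model.

```python
import time
from typing import Callable, Optional, Sequence

# Hypothetical fallback score; pick a value that suits your product.
DEFAULT_DECLINE_RISK = 0.5


def predict_with_fallback(
    entity_id: str,
    fetch_features: Callable[[str], Optional[Sequence[float]]],
    score: Callable[[Sequence[float]], float],
    default: float = DEFAULT_DECLINE_RISK,
) -> dict:
    """Score an entity, falling back to a default when features are missing.

    Also records feature retrieval latency so it can be exported to
    whatever metrics system you use.
    """
    start = time.perf_counter()
    features = fetch_features(entity_id)
    retrieval_ms = (time.perf_counter() - start) * 1000.0

    if features is None:
        # Assumption: a missing feature vector is signaled by None.
        return {"risk": default, "fallback": True, "retrieval_ms": retrieval_ms}

    return {
        "risk": score(features),
        "fallback": False,
        "retrieval_ms": retrieval_ms,
    }
```

Returning the `fallback` flag alongside the score lets you track how often the default path is taken. A rising fallback rate is a useful early signal that a batch pipeline has stopped updating features.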
Where to go next
- Aerospike Python client for connection management and serving best practices
- Aerospike Spark connector for batch ingestion and training workflows
- Spark MLlib Guide for model selection and advanced evaluation