Next steps
What’s next: Part 3
You now have the two artifacts that matter for deployment: a saved model and a reproducible training dataset definition. Part 3 shows how to use them in a request-time prediction path.
You will switch from Spark batch reads to low-latency Aerospike key lookups, use get_feature_vector(), and connect feature retrieval directly to model inference.
This is where production constraints become real: even when model inference is fast, overall prediction latency depends on fetching the right features quickly for each request.
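The request-time path can be sketched in plain Python to show where the time goes. This is not the tutorial's get_feature_vector(). It is a hypothetical helper, predict_for_request, that works like this:

- It takes any single-key lookup function, which stands in for an Aerospike key lookup.
- It orders the returned features the way the model expects.
- It scores them with logistic-regression weights.
- It times the fetch and inference steps separately, so you can see which one dominates.

The feature names, weights, and intercept are placeholders, not values from the saved model.

```python
import math
import time
from typing import Callable, Dict, Mapping, Sequence


def predict_for_request(
    entity_key: str,
    fetch_features: Callable[[str], Mapping[str, float]],
    feature_order: Sequence[str],
    weights: Sequence[float],
    intercept: float,
) -> Dict[str, float]:
    """Hypothetical helper: fetch one entity's features by key, then score them.

    fetch_features stands in for a single-key feature store lookup (an Aerospike
    get in the tutorial's setup); weights/intercept stand in for the model's
    learned parameters.
    """
    t0 = time.perf_counter()
    record = fetch_features(entity_key)  # one key lookup per request
    t1 = time.perf_counter()

    # Order features exactly as during training; a missing feature raises KeyError.
    x = [float(record[name]) for name in feature_order]
    z = intercept + sum(w * v for w, v in zip(weights, x))

    # Numerically stable logistic (sigmoid) function.
    if z >= 0:
        probability = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        probability = e / (1.0 + e)
    t2 = time.perf_counter()

    return {
        "probability": probability,
        "fetch_ms": (t1 - t0) * 1000.0,
        "inference_ms": (t2 - t1) * 1000.0,
    }


# Usage with an in-memory dict standing in for the feature store
# (hypothetical key and feature names).
store = {"trip-123": {"distance_km": 2.0, "surge": 1.0}}
result = predict_for_request(
    "trip-123", store.__getitem__, ["distance_km", "surge"], [0.5, -1.0], 0.0
)
print(result["probability"])  # → 0.5
```

With a real store, fetch_ms includes a network round trip. It is usually the number to watch, because scoring a linear model takes only microseconds.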
Continue to Part 3: Model Serving, and start at Cell 14 in the same notebook.
Project directory snapshot
If you followed the full tutorial, your project directory may look like this:
feature-store-tutorial/
- aerospike-spark-5.0.1-spark3.5-scala2.13-clientunshaded.jar (Spark connector)
- feature_store_tutorial.ipynb (your Jupyter notebook, Parts 1 and 2)
- models/
  - trip_decline_risk_lr/ (saved Logistic Regression model)
    - …
- datasets/
  - trip-decline-risk-v1/ (materialized dataset Parquet files)
    - …
Advanced topics
Once you’ve completed the tutorial, consider coming back to explore classifiers other than logistic regression.
Try RandomForestClassifier, GBTClassifier, or MultilayerPerceptronClassifier from pyspark.ml.classification and compare how each performs on the same data.
Related documentation
- Aerospike Spark connector - Configuration options and advanced features
- Spark MLlib Guide - Classification, regression, and clustering algorithms