Next steps

What’s next: Part 3

You now have the two artifacts that matter for deployment: a saved model and a reproducible training dataset definition. Part 3 shows how to use them in a request-time prediction path.

You will switch from Spark batch reads to low-latency Aerospike key lookups, use get_feature_vector(), and connect feature retrieval directly to model inference. This is where production constraints become real: even when model inference is fast, overall prediction latency depends on fetching the right features quickly for each request.
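
The request-time path can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Part 3 code: a dict stands in for the Aerospike set so it runs without a server, and `fetch_features`, the feature names, the coefficients, and the namespace and set names are all hypothetical. It shows the two steps that shape latency: one key lookup per request, then logistic scoring on a vector assembled in training order.

```python
import math
import time

# Hypothetical stand-in for an Aerospike set. Real records are addressed by
# (namespace, set, user key); a dict plays that role so the sketch runs anywhere.
FEATURE_STORE = {
    ("test", "trip_features", "trip-001"): {
        "distance_km": 12.5,
        "surge_multiplier": 1.8,
        "driver_accept_rate": 0.62,
    },
}

# Hypothetical coefficients standing in for the saved logistic regression model.
FEATURE_ORDER = ["distance_km", "surge_multiplier", "driver_accept_rate"]
WEIGHTS = [0.05, 0.9, -2.0]
INTERCEPT = -0.5


def fetch_features(namespace, set_name, key):
    """One key lookup per request; returns the record's bins, or None if absent."""
    return FEATURE_STORE.get((namespace, set_name, key))


def predict_decline_risk(trip_id):
    """Return (probability or None, lookup time in ms) for one request."""
    start = time.perf_counter()
    bins = fetch_features("test", "trip_features", trip_id)
    lookup_ms = (time.perf_counter() - start) * 1000.0
    if bins is None:
        # No stored features: the caller must choose a fallback explicitly.
        return None, lookup_ms
    # Assemble the vector in the same order the model saw during training.
    vector = [float(bins[name]) for name in FEATURE_ORDER]
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, vector))
    return 1.0 / (1.0 + math.exp(-z)), lookup_ms


prob, ms = predict_decline_risk("trip-001")
print(f"decline risk={prob:.3f}, lookup took {ms:.3f} ms")
```

The sketch times only the lookup, because at request time the fetch, rather than the arithmetic of a logistic regression, is usually the part worth watching. Against a real cluster, the aerospike Python client's `get()` takes a `(namespace, set, key)` tuple and returns `(key, meta, bins)`; in Part 3 that retrieval is wrapped by `get_feature_vector()` and scoring is done by the saved model.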

Continue to Part 3: Model Serving and start at Cell 14 in the same notebook.

Project directory snapshot

If you followed the full tutorial, your project directory may look like this:

feature-store-tutorial/
- aerospike-spark-5.0.1-spark3.5-scala2.13-clientunshaded.jar (Spark connector)
- feature_store_tutorial.ipynb (your Jupyter notebook, Parts 1 and 2)
- models/
  - trip_decline_risk_lr/ (saved Logistic Regression model)
    …
- datasets/
  - trip-decline-risk-v1/ (materialized dataset Parquet files)
    …
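
Before moving on, a quick check that the artifacts from the snapshot exist can save a confusing failure in Part 3. This is a minimal sketch: the paths are copied from the snapshot, `missing_artifacts` is a hypothetical helper, and it assumes Spark wrote the dataset as `*.parquet` part files (possibly in partition subdirectories, hence the recursive search).

```python
from pathlib import Path

# Relative paths from the snapshot, with the kind of check each one needs.
EXPECTED = {
    "aerospike-spark-5.0.1-spark3.5-scala2.13-clientunshaded.jar": "file",
    "feature_store_tutorial.ipynb": "file",
    "models/trip_decline_risk_lr": "dir",
    "datasets/trip-decline-risk-v1": "parquet_dir",
}


def missing_artifacts(project_root):
    """Return the expected paths that are absent (or empty, for the dataset)."""
    root = Path(project_root)
    missing = []
    for rel, kind in EXPECTED.items():
        path = root / rel
        if kind == "file":
            ok = path.is_file()
        elif kind == "dir":
            ok = path.is_dir()
        else:
            # A materialized dataset directory should contain at least one Parquet file.
            ok = path.is_dir() and any(path.rglob("*.parquet"))
        if not ok:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    print(missing_artifacts("feature-store-tutorial") or "All artifacts present")
```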

Advanced topics

Once you’ve completed the tutorial, consider coming back to explore classifiers other than logistic regression. Try RandomForestClassifier, GBTClassifier, or MultilayerPerceptronClassifier from pyspark.ml.classification, train each on the same data, and compare its performance with the logistic regression baseline.

Aerospike Spark connector - Configuration options and advanced features
Spark MLlib Guide - Classification, regression, and clustering algorithms