Saving and loading models

A trained model exists only in memory until you persist it. In this step, you save the model artifact that Part 3 will later serve. The artifact, which Spark writes as a directory rather than a single file, holds the parameters learned during training (the model weights) along with related metadata. Reloading it gives you the same trained model without retraining.

Save, reload, and score the model

Run Cell 13 to save, reload, and score with the persisted model.

Cell 13: Save, reload, and score with the persisted model

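import os
from pyspark.ml.classification import LogisticRegressionModel

# Create the parent directory (assumes Spark is writing to the local filesystem)
os.makedirs("./models", exist_ok=True)
model_path = "./models/trip_decline_risk_lr"

# Persist the trained model, replacing any earlier save at the same path
lr_model.write().overwrite().save(model_path)
# Restore the model from disk; no retraining happens here
lr_model_loaded = LogisticRegressionModel.load(model_path)

# Score the held-out rows with the restored model
loaded_predictions = lr_model_loaded.transform(test_df)
loaded_predictions.select("driver_id", "label", "prediction").show(5)

print(f"Saved model to {model_path}")
print("Reloaded model and scored test rows")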

Example output: save, reload, and score

+----------+-----+----------+
| driver_id|label|prediction|
+----------+-----+----------+
|driver_007|    0|       0.0|
|driver_012|    1|       1.0|
|driver_019|    0|       0.0|
|driver_023|    0|       0.0|
|driver_031|    1|       1.0|
+----------+-----+----------+
Saved model to ./models/trip_decline_risk_lr
Reloaded model and scored test rows
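
To see the metadata that was saved with the weights, you can read it back from disk. The sketch below assumes the model was written to the local filesystem and that Spark ML stored a one-line JSON record in a `metadata` subdirectory of the model path. That layout is an implementation detail and may differ between Spark versions. `read_model_metadata` is a hypothetical helper, not part of PySpark.

```python
import glob
import json
import os


def read_model_metadata(model_path):
    """Return the metadata dict stored under <model_path>/metadata.

    Assumes a local path and a JSON record in a file named part-*;
    this layout is not a documented contract.
    """
    pattern = os.path.join(model_path, "metadata", "part-*")
    for part_file in sorted(glob.glob(pattern)):
        with open(part_file, encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    # The first non-empty line holds the JSON record
                    return json.loads(line)
    raise FileNotFoundError(f"No metadata found under {model_path}")
```

In the notebook, calling `read_model_metadata(model_path)` after Cell 13 should return a dictionary. Keys such as `class`, `uid`, and `sparkVersion` are typical, so read them with `.get()` in case a key is absent.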

A successful save and reload shows that your training output is portable and can be restored later for inference. No new learning happens during load: Spark deserializes the stored parameters, and scoring then proceeds as usual with the restored model.
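
You can check that loading does no new learning by comparing the learned parameters of the in-memory model with those of the restored one. This minimal sketch assumes a binary logistic regression model, which exposes `coefficients` and `intercept`. `models_match` is a hypothetical helper name introduced for illustration.

```python
import math


def models_match(original, restored, tol=1e-12):
    """Return True if both models have the same coefficients and intercept.

    Assumes binary LogisticRegressionModel-like objects whose
    `coefficients` vector supports toArray().
    """
    a = [float(x) for x in original.coefficients.toArray()]
    b = [float(x) for x in restored.coefficients.toArray()]
    if len(a) != len(b):
        return False
    # Compare with an absolute tolerance only, so tiny values are not over-forgiven
    same_weights = all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
    )
    same_intercept = math.isclose(
        float(original.intercept), float(restored.intercept),
        rel_tol=0.0, abs_tol=tol,
    )
    return same_weights and same_intercept
```

In the notebook, `models_match(lr_model, lr_model_loaded)` should return `True` after Cell 13, because the restored model carries the same parameters that were saved.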

Why this matters for Part 3

In Part 3, you’ll combine this saved model with live feature retrieval from Aerospike to build an end-to-end serving flow.

You now have the training artifact needed for model serving.