Train a Model with the CLI

Objectives

  • Learn the workflow involved in training a model

  • Use custom containers with experiments

  • Create an experiment that trains a Scikit-learn model

  • Share files and models across experiments

The experiment generates a Python pickle file that is stored in Gradient's shared storage service.

Creating a Gradient Experiment to Train the Model

In this step, we will generate a fully-trained model by submitting the dataset and Python code to Gradient. The output from this experiment is stored in a centrally accessible storage location that will be used in the next step of this tutorial.

Let’s take a minute to get familiar with the dataset. To keep this really simple, we are using one feature (x) that represents a candidate's years of experience and a label (y) that represents the salary.

The dataset, sal.csv, is available in the data folder of the GitHub repo. Clone the repo to your local machine and open it in your favorite text editor to explore the data.

In the train directory, you’ll find train.py, which is responsible for generating the model by applying linear regression to the dataset.

import os
from argparse import ArgumentParser

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# sklearn.externals.joblib was removed in scikit-learn 0.23;
# prefer the standalone joblib package and fall back for older images.
try:
    import joblib
except ImportError:
    from sklearn.externals import joblib

# Read the dataset location and the model output directory from the command line
parser = ArgumentParser()
parser.add_argument("-i", "--in", dest="input", help="location of input dataset")
parser.add_argument("-o", "--out", dest="output", help="location of model")
args = parser.parse_args()
dataset = args.input
model_dir = args.output

# Load the data: x is years of experience, y is salary
sal = pd.read_csv(dataset, header=0, index_col=None)
X = sal[['x']]
y = sal['y']

# Hold out 25% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

# Fit the linear regression model and report its parameters
lm = LinearRegression()
lm.fit(X_train, y_train)
print('Intercept :', round(lm.intercept_, 2))
print('Slope :', round(lm.coef_[0], 2))

# Evaluate on the held-out rows
y_predict = lm.predict(X_test)
mse = mean_squared_error(y_test, y_predict)
print('MSE :', round(mse, 2))

# Save the trained model as model.pkl in the output directory
os.makedirs(model_dir, exist_ok=True)
filename = os.path.join(model_dir, 'model.pkl')
joblib.dump(lm, filename)

Apart from applying linear regression, the code prints the intercept, the slope, and the mean squared error (MSE) on the held-out test rows to stdout. The trained model is stored as a .pkl file in the directory named salary under /storage.
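To make the three printed values concrete, here is a minimal sketch that computes them by hand with the closed-form least-squares solution for a single feature. It uses only the standard library, the numbers are invented for illustration (they are not taken from sal.csv), and the helper names fit_line and mean_squared_error are ours, not part of train.py. Unlike train.py, it also evaluates the error on the same rows it was fitted on, purely to keep the example short.

```python
# Closed-form ordinary least squares for one feature.
# The numbers below are invented for illustration; they are NOT from sal.csv.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [30.0, 35.0, 40.0, 45.0, 50.0]


def fit_line(x, y):
    """Return (intercept, slope) minimizing squared error for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


intercept, slope = fit_line(years, salary)
predictions = [intercept + slope * x for x in years]
mse = mean_squared_error(salary, predictions)

print('Intercept :', round(intercept, 2))
print('Slope :', round(slope, 2))
print('MSE :', round(mse, 2))
```

The slope is the predicted change in salary per additional year of experience, the intercept is the predicted salary at zero years, and the MSE summarizes how far the predictions are from the actual values.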

The location /storage maps to Gradient's persistent storage. Anything stored there remains available even after the experiment is terminated. By storing the .pkl file at /storage, we will be able to access it from the model serving experiment that exposes the REST endpoint.

We are ready to kick off the training experiment on Gradient 🚀 Run the command below to submit the experiment.

gradient experiments run singlenode \
--name train \
--projectId prj0ztwij \
--container janakiramm/python:3 \
--machineType C2 \
--command 'python train/train.py -i ./data/sal.csv -o /storage/salary' \
--workspace https://github.com/janakiramm/Salary

This command does quite a bit of heavy lifting for us. It schedules the training experiment on a machine of the chosen type and kicks off the training process.

Let’s analyze the steps taken by Gradient to finish the experiment.

  1. The CLI downloads all the files (including the dataset) and uploads them to Gradient

    1. You can also run the experiment from your local system by not specifying the --workspace parameter! This compresses all the files into a .zip file and uploads it to Gradient.

  2. Gradient pulls the container image specified in the CLI (--container janakiramm/python:3) from the registry, which is Docker Hub in this scenario

  3. Before running the container, Gradient maps the directory with the uploaded files to the container’s working directory

  4. Gradient passes the command sent via the CLI parameter (--command 'python train/train.py -i ./data/sal.csv -o /storage/salary') to the docker run command

  5. Gradient schedules the container on one of the machines that match the type specified in the CLI (--machineType C2)
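To see how the --command string from step 4 ends up as the dataset and model_dir values inside train.py, the following sketch splits the command the way a shell would and feeds the remaining arguments to the same parser definition. It runs locally and does not touch Gradient; slicing off the first two tokens is our simplification of what happens when Python launches the script.

```python
import shlex
from argparse import ArgumentParser

# The --command value from the CLI invocation
command = 'python train/train.py -i ./data/sal.csv -o /storage/salary'

# Same parser definition as in train.py
parser = ArgumentParser()
parser.add_argument("-i", "--in", dest="input", help="location of input dataset")
parser.add_argument("-o", "--out", dest="output", help="location of model")

# Drop the interpreter and the script path; the rest is what train.py
# receives as its command-line arguments.
argv = shlex.split(command)[2:]
args = parser.parse_args(argv)

print('dataset   :', args.input)
print('model_dir :', args.output)
```

Because the dataset path is relative (./data/sal.csv), it only resolves if the uploaded files are mapped to the container's working directory, which is what step 3 takes care of.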

Once the experiment enters the running state, it simply executes the code in the Python file. The train.py that we uploaded does two things: it prints the intercept, slope, and MSE, and it saves the model to the /storage location.

The output from the CLI confirms that the experiment has been successfully executed. If you navigate to the experiment in the UI, you will see that the logs contain the values printed by the script, along with the message PSEOF, which is a healthy sign.
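The value of writing the model to /storage is that a later experiment can read the same file back. The sketch below illustrates that round trip under a few assumptions: a temporary directory stands in for /storage so the code runs anywhere, a plain dictionary with invented values stands in for the fitted model, and the standard-library pickle module stands in for joblib. For a file written by train.py, the counterpart of joblib.dump would be joblib.load.

```python
import os
import pickle
import tempfile

# A temporary directory stands in for /storage so the sketch runs anywhere.
storage_root = tempfile.mkdtemp()
model_dir = os.path.join(storage_root, 'salary')

# Hypothetical stand-in for the fitted model (values invented for illustration).
model = {'intercept': 25.0, 'slope': 5.0}

# "Training experiment": create the directory and write the model file.
os.makedirs(model_dir, exist_ok=True)
filename = os.path.join(model_dir, 'model.pkl')
with open(filename, 'wb') as f:
    pickle.dump(model, f)

# "Later experiment": read the same file back and use it for a prediction.
with open(filename, 'rb') as f:
    restored = pickle.load(f)

years_of_experience = 4
predicted = restored['intercept'] + restored['slope'] * years_of_experience
print('Predicted salary :', predicted)
```

Only load pickle files that you created yourself or otherwise trust, since unpickling can execute arbitrary code.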

We are using a custom Docker container image with prerequisites such as NumPy, SciPy, Pandas, and Scikit-learn. This image was built from the official Python 3 Docker image.