Train & Deploy an ML Model with the Gradient CLI




  • Learn the workflow involved in training a model

  • Use custom containers with experiments

  • Create an experiment that trains a Scikit-learn model

  • Share files and models across experiments

  • Create a long-running job (web service) that serves the model

  • Access the REST endpoint exposed by a job

There are two Gradient experiments involved in this workflow: training and deployment. The first experiment generates a Python pickle file that is stored in Gradient's shared storage service. The second experiment loads the same pickle file and runs a Flask web server that serves the model through a REST inference endpoint.

Creating a Gradient Experiment to Train the Model

In this step, we will generate a fully-trained model by submitting the dataset and Python code to Gradient. The output from this experiment is stored in a centrally accessible storage location that will be used in the next step of this tutorial.

Let’s take a minute to get familiar with the dataset. To keep this simple, we use a single feature (x) that represents a candidate's years of experience and a label (y) that represents the salary.

The dataset, sal.csv, is available in the data folder of the GitHub repo. Clone the repo to your local machine and open it in your favorite text editor to explore the data.
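To make the shape of the data concrete, here is a minimal sketch that parses a CSV with the same two columns (x and y) using only the Python standard library. The sample rows are made-up placeholders for illustration, not values from sal.csv, and the helper name load_rows is hypothetical.

```python
import csv
import io

# Hypothetical rows for illustration only; the real sal.csv has its own values.
SAMPLE_CSV = """x,y
1,40000
3,52000
5,61000
"""


def load_rows(text):
    """Parse CSV text with columns x (years of experience) and y (salary)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


rows = load_rows(SAMPLE_CSV)
for years, salary in rows:
    print(years, salary)
```

Each row pairs one feature value with one label, which is all a single-variable linear regression needs.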

In the train directory, you’ll find the training script, which generates the model by applying linear regression to the dataset.

import os
from argparse import ArgumentParser

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
try:
    import joblib  # standalone package; required with scikit-learn 0.23+
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn releases

# Parse the input dataset path and the output model directory.
parser = ArgumentParser()
parser.add_argument("-i", "--in", dest="input", help="location of input dataset")
parser.add_argument("-o", "--out", dest="output", help="location of model")
args = parser.parse_args()
dataset = args.input
model_dir = args.output

# Load the data: x is years of experience, y is salary.
sal = pd.read_csv(dataset, header=0, index_col=None)
X = sal[['x']]
y = sal['y']

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

# Fit the linear regression model and report its parameters.
lm = LinearRegression()
lm.fit(X_train, y_train)
print('Intercept :', round(lm.intercept_, 2))
print('Slope :', round(lm.coef_[0], 2))

# Evaluate on the held-out rows.
y_predict = lm.predict(X_test)
mse = mean_squared_error(y_test, y_predict)
print('MSE :', round(mse, 2))

# Create the output directory if needed, then save the model.
if not os.path.exists(model_dir):
    os.makedirs(model_dir)
filename = model_dir + '/model.pkl'
joblib.dump(lm, filename)

Apart from applying linear regression, the code prints parameters such as intercept, slope, and MSE to stdout. The trained model is stored in the directory named salary under /storage as a .pkl file.
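To show what the printed intercept, slope, and MSE mean, the following sketch fits a line by ordinary least squares in plain Python. It is an illustration of the underlying arithmetic, not the Scikit-learn implementation; the function names and the toy data (chosen to lie exactly on y = 2x + 1) are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys_true, ys_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(ys_true, ys_pred)) / len(ys_true)


# Toy data for illustration only: every point lies on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
mse = mean_squared_error(ys, predictions)

print('Intercept :', round(intercept, 2))
print('Slope :', round(slope, 2))
print('MSE :', round(mse, 2))
```

Because the toy points are perfectly linear, the fitted line reproduces them and the error is zero. Real data will give a non-zero MSE.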

We are ready to kick off the training experiment on Gradient 🚀 Run the command below to submit the experiment.

The location /storage maps to Gradient persistent storage location. Anything stored at this location will be available even after the experiment is terminated. By storing the .pkl file at /storage, we will be able to access it from the model serving experiment that exposes the REST endpoint.

gradient experiments run singlenode \
--name train \
--projectId prj0ztwij \
--container janakiramm/python:3 \
--machineType C2 \
--command 'python train/ -i ./data/sal.csv -o /storage/salary'

This command does quite a bit of heavy lifting for us. It schedules the training experiment on a machine of the chosen type and kicks off the training process.

Let’s analyze the steps taken by Gradient to finish the experiment.

  1. The CLI downloads all the files (including the dataset) and uploads them to Gradient

    1. You can also run the experiment from your local system by not specifying the --workspaceUrl parameter. Doing so compresses all the files into a .zip file and uploads it to Gradient.

  2. Gradient pulls the container image specified in the CLI (--container janakiramm/python:3) from the registry, which is Docker Hub in this scenario

  3. Before running the container, Gradient maps the directory with the uploaded files to the container’s working directory

  4. Gradient maps the command sent via the CLI parameter (--command 'python train/ -i ./data/sal.csv -o /storage/salary') to the docker run command

  5. Gradient schedules the container in one of the machines that match the type mentioned in the CLI (--machineType C2)
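The packaging step mentioned above (compressing the local files into a .zip before upload) can be pictured with a short standard-library sketch. This is not the CLI's actual implementation; the function name zip_workspace, the archive name, and the temporary directories are assumptions used only to illustrate the idea.

```python
import os
import tempfile
import zipfile


def zip_workspace(workspace_dir, archive_path):
    """Compress every file under workspace_dir and return the archive's member names."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(workspace_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the workspace root.
                archive.write(full_path, os.path.relpath(full_path, workspace_dir))
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(archive.namelist())


# Build a throwaway workspace that mimics the repo layout (data/sal.csv).
workspace = tempfile.mkdtemp()
os.makedirs(os.path.join(workspace, "data"))
with open(os.path.join(workspace, "data", "sal.csv"), "w") as f:
    f.write("x,y\n")

# Write the archive outside the workspace so it does not include itself.
archive_file = os.path.join(tempfile.mkdtemp(), "workspace.zip")
members = zip_workspace(workspace, archive_file)
print(members)
```

Keeping paths relative to the workspace root is what lets a command such as ./data/sal.csv resolve once the files are mapped to the container's working directory.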

Once the experiment's status moves into run mode, it simply executes the code in the Python file. The script that we uploaded does two things: it prints values such as the intercept, slope, and MSE, and it saves the model to the /storage location.

The output from the Paperspace CLI confirms that the experiment executed successfully. If you navigate to the experiment in the UI, the logs show the values printed by the script along with the message PSEOF, which is a healthy sign.

We are using a custom Docker container image with prerequisites such as NumPy, Scipy, Pandas, and Scikit-learn. This image was built from the official Python 3 Docker image.

Creating a Gradient Experiment to Deploy and Host the Model

Note: check out the Create a Deployment docs for a more up-to-date way to deploy your models using the newer first-class Deployments feature in Gradient. The following section describes how to deploy models with Gradient using Jobs.

We are now ready to host the trained model in a Gradient experiment that runs a Flask web server. The experiment loads the pickle file created and stored by the last experiment at the /storage location.
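The hand-off between the two experiments boils down to one process writing a pickle file to a shared directory and another reading it back. The sketch below imitates that with the standard-library pickle module and a temporary directory standing in for /storage. The tutorial's scripts use joblib instead, and the dictionary "model" and helper names here are placeholders for illustration.

```python
import os
import pickle
import tempfile


def save_model(model, model_dir):
    """Write the model to model_dir/model.pkl, creating the directory if needed."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path):
    """Read a previously saved model back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Temporary directory used as a stand-in for the /storage mount.
storage_root = tempfile.mkdtemp()

# "Training" side: persist a placeholder model under <storage>/salary.
model = {"slope": 2.0, "intercept": 1.0}
saved_path = save_model(model, os.path.join(storage_root, "salary"))

# "Serving" side: load the same file from the shared location.
restored = load_model(saved_path)
print(restored)
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.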

Navigate to the deploy directory of the cloned repo to find the serving script:

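from argparse import ArgumentParser

from flask import Flask, jsonify
try:
    import joblib  # standalone package; required with scikit-learn 0.23+
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn releases

parser = ArgumentParser()
parser.add_argument("-m", "--model", dest="model", help="location of the pickle file")
filename = parser.parse_args().model

app = Flask(__name__)

@app.route('/')
def index():
    return "Stackoverflow Salary Predictor"

@app.route('/sal/<int:x>', methods=['GET'])
def predict(x):
    # Reload the model on every request so the latest pickle file is used.
    lm = joblib.load(filename)
    y = float(lm.predict([[x]])[0])
    sal = jsonify({'salary': round(y, 2)})
    return sal

if __name__ == '__main__':
    # Assumed host: 0.0.0.0 (all interfaces) so the mapped port is reachable.
    app.run(host='0.0.0.0', port=8080)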

This is a standard Flask web server listening on port 8080. It exposes the /sal endpoint, which takes the number of years of experience and returns the predicted salary.

Each time a request is made, it loads the latest version of the pickle file, calls the predict method, serializes the output in JSON and returns the response.
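The per-request flow described above (load the pickle, predict, serialize to JSON) can be sketched without Flask as a plain function. This is an illustrative stand-in rather than the tutorial's server: the handle_request name, the dictionary model with slope and intercept keys, and the demo file are all assumptions.

```python
import json
import os
import pickle
import tempfile


def handle_request(path, model_file):
    """Return (status, JSON body) for a request path such as /sal/25."""
    prefix = "/sal/"
    if not path.startswith(prefix) or not path[len(prefix):].isdigit():
        return 404, json.dumps({"error": "not found"})
    years = int(path[len(prefix):])

    # Load the model on every call, so a newer pickle file is picked up.
    with open(model_file, "rb") as f:
        model = pickle.load(f)

    salary = model["slope"] * years + model["intercept"]
    return 200, json.dumps({"salary": round(salary, 2)})


# Placeholder model written to a temporary file for the demo.
demo_file = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(demo_file, "wb") as f:
    pickle.dump({"slope": 2.0, "intercept": 1.0}, f)

status, body = handle_request("/sal/25", demo_file)
print(status, body)
```

Reloading the file on each request is simple and always serves the newest model, at the cost of extra disk reads. A busier service would typically cache the model and reload it only when the file changes.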

Since Flask is not a part of the container image used in the tutorial, we will need to install it with pip before running the script, which can be included in the run command.

To access the REST endpoint, we also need to instruct Gradient to map the container port to the host port. This is done through the CLI parameter --ports.

Unlike the previous experiment, this one does not terminate unless it is manually stopped or destroyed. The Flask web server turns it into a long-running experiment that doesn’t exit automatically.

Let’s go ahead and submit the experiment to Gradient.

gradient jobs create \
--container janakiramm/python:3 \
--machineType C2 \
--ports 8080:8080 \
--command 'pip install flask && python deploy/ -m /storage/salary/model.pkl'

The logs shown by the CLI confirm that the web server is up and running. Before we can access the endpoint, we need to get the DNS name of the experiment.

Hit Ctrl+C to get back to the command prompt. Don’t worry, this only exits the CLI and doesn’t terminate the experiment.

Let’s explore the experiment details with the command below:

gradient jobs list

Make a note of the fqdn parameter in the output. We need it to access the REST endpoint. If you have the jq utility installed, you can also grab the fqdn with a single command.

export GRAD_HOST=`paperspace jobs list | jq -r '.[].fqdn'`
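If jq is not available, the same extraction can be done in Python. The sketch below assumes, as the jq filter implies, that the job listing is a JSON array of objects with an fqdn field; the sample JSON, host name, and function names are invented for illustration. Unlike the jq filter, which prints the fqdn of every job, this helper returns only the first one it finds.

```python
import json


def first_fqdn(jobs_json):
    """Return the fqdn of the first job that has one, or None."""
    for job in json.loads(jobs_json):
        if job.get("fqdn"):
            return job["fqdn"]
    return None


def salary_url(host, years, port=8080):
    """Build the URL of the /sal endpoint for a given host."""
    return "http://{}:{}/sal/{}".format(host, port, years)


# Hypothetical listing for illustration; real output contains more fields.
SAMPLE_JOBS = '[{"id": "job-1", "fqdn": "example-host.invalid"}]'

host = first_fqdn(SAMPLE_JOBS)
print(salary_url(host, 25))
```

In practice you would feed the function the real JSON output of the job listing rather than the sample string.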

It’s time for us to hit the REST endpoint to get the predictions. Let’s check the expected salary of a candidate with 25 years of experience.

curl $GRAD_HOST:8080/sal/25

This should return:


Congratulations! You have successfully completed the end-to-end workflow involved in training and deploying machine learning models with Gradient.

Let’s clean up by stopping and destroying the experiment.

export JOB_ID=`gradient jobs list | jq -r '.[].id'`
gradient jobs stop $JOB_ID
gradient jobs destroy $JOB_ID