Workflows Sample Project: Registering a Model

Objectives

    Understand the process involved in registering models
    Pass environment variables to Gradient Workflows
    Register TensorFlow models in Gradient

Introduction

Training workloads in Gradient can generate machine learning models, which can be interpreted and stored in your Project's Models list. This list holds references to the model and checkpoint files generated during the training period as well as summary metrics associated with the model's performance, such as accuracy and loss.
In this tutorial, we will create a Workflow that generates a Keras model based on the Fashion MNIST dataset. Along the way, we will use the Git checkout action, pass environment variables to Workflows, and specify the right container image.
The model is trained in Keras, but it is exported as a TensorFlow model through the tf.saved_model.simple_save method. This approach serializes the Keras session into a TensorFlow .pb file.
The repo at https://github.com/gradient-ai/fashionmnist contains the code for training the model and running inference.
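The export step described above can be sketched as follows. This is a minimal illustration written against the TensorFlow 1.x API shipped in the tensorflow/tensorflow:1.9.0 image, and it is not necessarily the code used in the repo. The function name export_keras_model and the signature key input_image are hypothetical.

```python
def export_keras_model(model, export_dir):
    """Export a trained Keras model as a TensorFlow SavedModel (TF 1.x).

    Assumes `model` is a trained tf.keras model and `export_dir` is a
    path that does not exist yet.
    """
    # Imported lazily so this module still loads where TensorFlow is absent.
    import tensorflow as tf

    # Keras keeps its variables in a backend session; simple_save writes
    # that session's graph and weights into export_dir.
    session = tf.keras.backend.get_session()
    tf.saved_model.simple_save(
        session,
        export_dir,
        inputs={"input_image": model.input},  # hypothetical signature key
        outputs={t.name: t for t in model.outputs},
    )
    return export_dir
```

Calling export_keras_model(model, "/my-trained-model") after training should leave a saved_model.pb file and a variables directory in that location.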

Create a Project for Fashion MNIST

We will start by creating a project that can contain the multiple Workflows we may run during training. We'll use the CLI here, but you can also perform the action in the user interface.

    gradient projects create --name Fashion

Now let's create our Workflow:

    gradient workflows create --name fashion-mnist --projectId <id of project>

Create a Workflow run to Train the Model

We will now start a Workflow run within the Workflow created above. Make a note of the Workflow id before proceeding further.

Get the training code

Download or copy the following Workflow YAML to your computer and save it as workflow.yaml.

    defaults:
      env:
        apiKey: secret:api_key # Replace this secret with your own secret
      resources:
        instance-type: C5
    jobs:
      CloneRepo:
        outputs:
          repo:
            type: volume
        uses: git-checkout@v1
        with:
          url: https://github.com/gradient-ai/fashionmnist.git
      TrainModel:
        env:
          MODEL_DIR: /my-trained-model
        needs:
          - CloneRepo
        inputs:
          repo: CloneRepo.outputs.repo
        outputs:
          trained-model:
            type: dataset
            with:
              ref: dsrvw1m30ymhiyt # Replace this id with your own dataset id
        uses: container@v1
        with:
          args:
            - bash
            - "-c"
            - >-
              cd /inputs/repo/train && python train.py && cp -R /my-trained-model /outputs/trained-model
          image: 'tensorflow/tensorflow:1.9.0'
      UploadModel:
        inputs:
          model: TrainModel.outputs.trained-model
        outputs:
          model-id:
            type: string
        needs:
          - TrainModel
        uses: create-model@v1
        with:
          name: trained-model
          type: Tensorflow

This YAML file incorporates several concepts that are important to understand:

    The secret:api_key parameter masks your API key so it is not visible to others. The Gradient documentation on Secrets explains how to store an API key as a Secret.
    instance-type: C5 sets the default instance type for any step that does not specify its own.
    git-checkout@v1 is a Gradient Action that clones a Git repo.
    env and MODEL_DIR pass an environment variable to the script. In our code, the location where the model is stored is taken from the value of the MODEL_DIR environment variable.
    image points the step to the Docker image used to execute it. Note: the same training code can run on a GPU instance, which requires the following image instead: tensorflow/tensorflow:1.9.0-gpu
    TrainModel takes an outputs parameter, which stores the model artifacts in a Gradient dataset. You must create the dataset before running the Workflow and put its id in the ref field.
    UploadModel takes a type parameter that specifies the format of the model. In this case, we pass Tensorflow as the type. Other supported types include ONNX, TensorRT, and Custom.
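
To make the MODEL_DIR mechanism concrete, here is a minimal sketch of how a training script could pick up the variable. It is an illustration only, not the repo's actual train.py; the function name resolve_model_dir and the fallback path are hypothetical.

```python
import os

# Hypothetical fallback so the script also works outside a Workflow run.
DEFAULT_MODEL_DIR = "/tmp/my-trained-model"


def resolve_model_dir(environ=None):
    """Return the directory the trained model should be written to."""
    env = os.environ if environ is None else environ
    # An unset or empty MODEL_DIR falls back to the local default.
    return env.get("MODEL_DIR") or DEFAULT_MODEL_DIR


if __name__ == "__main__":
    print(resolve_model_dir())
```

With the Workflow above, the TrainModel step sets MODEL_DIR to /my-trained-model, so the script would write the model there before the cp command copies it into /outputs/trained-model.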

Create a Workflow run

    gradient workflows run \
      --id <workflow id> \
      --clusterId <if using a private cluster> \
      --path ./workflow.yaml
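
If you launch runs from a script, the optional --clusterId flag is easy to get wrong. The sketch below assembles the same command as an argument list and adds the flag only when a cluster id is given. The helper name build_run_command is hypothetical, and only the flags shown above are used.

```python
import shlex


def build_run_command(workflow_id, spec_path, cluster_id=None):
    """Assemble the argument list for `gradient workflows run`."""
    command = ["gradient", "workflows", "run", "--id", workflow_id]
    if cluster_id:
        # Only needed when targeting a private cluster.
        command += ["--clusterId", cluster_id]
    command += ["--path", spec_path]
    return command


if __name__ == "__main__":
    # Placeholder id; substitute the Workflow id noted earlier.
    args = build_run_command("<workflow id>", "./workflow.yaml")
    print(" ".join(shlex.quote(part) for part in args))
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting issues.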

Verifying the Creation of the Model

We can check if the output of the job is registered as a valid TensorFlow model with the following command.

    gradient models list

    +------+-----------------+------------+------------+
    | Name | ID              | Model Type | Project ID |
    +------+-----------------+------------+------------+
    | None | mosdnkkv1o1xuem | Tensorflow |            |
    +------+-----------------+------------+------------+

You can also visit the Models section of the Gradient UI to see a list of registered models.
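
To automate this check, the table printed by gradient models list can be parsed in a few lines. The sketch below assumes the CLI output keeps the bordered layout shown in this tutorial; the function names and the SAMPLE text are illustrative only.

```python
def parse_cli_table(text):
    """Parse a bordered ASCII table into a list of row dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders (+----+) and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def tensorflow_models(text):
    """Return the IDs of all rows whose Model Type is Tensorflow."""
    return [
        row["ID"]
        for row in parse_cli_table(text)
        if row.get("Model Type") == "Tensorflow"
    ]


# Sample text copied from the listing in this tutorial.
SAMPLE = """
+------+-----------------+------------+------------+
| Name | ID              | Model Type | Project ID |
+------+-----------------+------------+------------+
| None | mosdnkkv1o1xuem | Tensorflow |            |
+------+-----------------+------------+------------+
"""

if __name__ == "__main__":
    print(tensorflow_models(SAMPLE))
```

An empty result would suggest that the UploadModel step did not register the model.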

Summary

After registering the model, we can turn it into a Deployment to perform inference.