Getting Started with Notebooks

Objectives

  • Launch a Jupyter Notebook with a single click

  • Access existing datasets available within Gradient

  • Train a machine learning model

  • Save the model for inferencing

This example is available in our ML Showcase! You can clone the notebook here.

Introduction

Gradient provides one-click access to Jupyter Notebooks. You can choose pre-configured environments to launch Notebook instances or create a container with custom environments.

In this walkthrough, we will launch a Jupyter Notebook to train a logistic regression model on the MNIST dataset. Gradient comes with a set of datasets that are readily available at the /datasets location. Instead of downloading the MNIST dataset to local storage, we will access the existing copy.

We will also learn how to save the model for inferencing by persisting the final joblib file to the /storage location.

You can download the completed Jupyter Notebook and upload it to the VM.

Launching the Notebook Instance

Click on the Notebooks section in the left navigation bar to launch a Jupyter Notebook.

The first step is to choose the pre-configured environment. Choose the Jupyter Notebook Data Science Stack, which comes with the core modules needed for our model.

In the next step, choose the machine type. Since we don’t need high-end machines with GPUs, we can choose a low-cost instance. Turn on the Enable low-cost instances setting and choose the G1 machine type, which comes with 1 CPU, 1.7 GB RAM, and a 250 GB SSD.

In the final step, give the Notebook instance a name and click the Create Notebook button.

In a few minutes, the Notebook instance will be ready. Launch it by clicking on the URL shown below the status.

We are now ready to launch the Notebook. Choose the Python 3 option under the Notebooks section.

Rename the Notebook to give it a meaningful name. We are now ready to train the model.

Training the Model

Start by importing the modules. We are using Scikit-learn and related modules for this model. Since the environment doesn’t include the joblib module, we will install it before using it. This is a one-time task that needs to run at the beginning of the training job.
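As a minimal sketch of that one-time install step, the helper below installs a package only when it cannot already be imported. The name ensure_installed is hypothetical and not part of the original notebook; in a notebook cell, the %pip install joblib magic is a common alternative.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package):
    """Install `package` with pip if it is not importable.

    Returns True when an install was performed, False when the
    package was already available. (Hypothetical helper.)
    """
    if importlib.util.find_spec(package) is not None:
        return False  # already importable, nothing to do
    # Use the interpreter that runs the notebook kernel so the package
    # lands in the same environment the notebook imports from.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return True
```

Calling ensure_installed("joblib") in the first cell, followed by the usual imports (NumPy, Scikit-learn, joblib), would cover the setup described above.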

Next, we will create a couple of helper functions that load the dataset and change its shape to what Scikit-learn expects.

We will now load the MNIST dataset from the /datasets location. You can browse the files within the Jupyter environment.
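The files can also be listed from a notebook cell. The sketch below walks a directory and returns the relative path of every file in it; list_dataset_files is a hypothetical helper, and the /datasets/mnist path mentioned afterwards is an assumption about where the MNIST files sit.

```python
import os


def list_dataset_files(root):
    """Return the sorted paths, relative to `root`, of every file under it."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

For example, list_dataset_files("/datasets/mnist") would show which image and label files are available before loading them.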

The loadMNIST helper function loads the dataset and converts it into NumPy arrays.
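The sketch below shows how such a loader could look, assuming the files are stored in the standard MNIST IDX binary format under names like train-images-idx3-ubyte and train-labels-idx1-ubyte. The function names and file names are assumptions for illustration; this is not the notebook's actual loadMNIST code.

```python
import os
import struct

import numpy as np


def load_idx_images(path):
    """Read an IDX3 image file into a float32 array of shape (n, rows, cols)."""
    with open(path, "rb") as f:
        # Header: four big-endian 32-bit integers.
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"{path}: not an IDX3 image file (magic {magic})")
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(count, rows, cols).astype(np.float32)


def load_idx_labels(path):
    """Read an IDX1 label file into a uint8 array of shape (n,)."""
    with open(path, "rb") as f:
        # Header: two big-endian 32-bit integers.
        magic, count = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError(f"{path}: not an IDX1 label file (magic {magic})")
        labels = np.frombuffer(f.read(), dtype=np.uint8).copy()
    if labels.shape[0] != count:
        raise ValueError(f"{path}: expected {count} labels, found {labels.shape[0]}")
    return labels


def load_mnist(prefix, folder):
    """Load images and labels for a split such as 'train' or 't10k' (assumed names)."""
    images = load_idx_images(os.path.join(folder, prefix + "-images-idx3-ubyte"))
    labels = load_idx_labels(os.path.join(folder, prefix + "-labels-idx1-ubyte"))
    return images, labels
```

With this layout, a call such as load_mnist("train", "/datasets/mnist") would return the training images and labels as two arrays.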

Let us verify that the dataset is loaded correctly by visualizing a few randomly chosen data points.
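One way to do this check is sketched below with Matplotlib, assuming the images are held in an array of shape (n, rows, cols) with a matching array of labels. The helper name show_random_digits and its parameters are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_random_digits(images, labels, count=8, seed=None):
    """Plot `count` randomly chosen images in a row, each titled with its label."""
    rng = np.random.default_rng(seed)
    # Sample without replacement so no image is shown twice.
    indices = rng.choice(len(images), size=count, replace=False)
    fig, axes = plt.subplots(1, count, figsize=(1.5 * count, 2))
    for ax, idx in zip(np.atleast_1d(axes), indices):
        ax.imshow(images[idx], cmap="gray")
        ax.set_title(str(labels[idx]))
        ax.axis("off")
    return fig
```

If the titles match the digits drawn in the images, the images and labels were most likely read in the right order.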

Before we pass the training and test data to Scikit-learn’s logistic regression object, we need to reshape it.
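Scikit-learn estimators expect a two-dimensional array with one row per sample, so each image has to be flattened into a single vector. A minimal sketch, assuming the images arrive as an array of shape (n, rows, cols); the helper name flatten_images is hypothetical.

```python
import numpy as np


def flatten_images(images):
    """Reshape (n, rows, cols) image data into (n, rows * cols) feature vectors."""
    images = np.asarray(images)
    # -1 lets NumPy infer the flattened length (784 for 28x28 MNIST images).
    return images.reshape(len(images), -1)
```

For 28x28 MNIST images, this turns an array of shape (n, 28, 28) into one of shape (n, 784).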

We are now ready to fit a logistic regression model to the data.
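The fitting step can be sketched as follows. Because the MNIST files under /datasets exist only inside Gradient, this example substitutes Scikit-learn's small bundled digits dataset (8x8 images) so that it runs anywhere. The stand-in data, the scaling to [0, 1], and max_iter=1000 are illustrative assumptions, not settings taken from the original notebook.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: 8x8 digit images bundled with scikit-learn (not MNIST itself).
digits = load_digits()
X = digits.images.reshape(len(digits.images), -1) / 16.0  # flatten, scale to [0, 1]
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A larger max_iter gives the solver room to converge (assumed value).
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

With the real MNIST arrays, the same fit call would apply once the images have been reshaped to one row per sample.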

Let’s call the predict method to see how accurate our model is. We will use the output of this to generate a confusion matrix.
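A sketch of this evaluation step is shown below, again using Scikit-learn's bundled digits dataset as a stand-in for MNIST so the example is self-contained. The dataset choice and hyperparameters are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data and model (not the MNIST model from the walkthrough).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict labels for the held-out data and compare them with the truth.
predictions = clf.predict(X_test)
cm = confusion_matrix(y_test, predictions)  # rows: true label, columns: predicted

print(cm)
print("Accuracy:", accuracy_score(y_test, predictions))
```

The diagonal of the matrix counts correct predictions per digit, and off-diagonal entries show which digits get confused with each other.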

This prints the confusion matrix for the test data.

Finally, we will persist the trained model to /storage/mnist so that it can be accessed later. The model saved as model.pkl is available to other Notebooks and Jobs launched within your account.
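Persisting the model with joblib could look like the sketch below. The helper name save_model is hypothetical; the directory is passed as a parameter so the same code can target /storage/mnist on Gradient or any other writable location.

```python
import os

import joblib


def save_model(model, directory, filename="model.pkl"):
    """Serialize `model` with joblib into `directory` and return the file path."""
    # Create the target directory on first use.
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    joblib.dump(model, path)
    return path
```

On Gradient, save_model(clf, "/storage/mnist") would write the trained classifier to /storage/mnist/model.pkl.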

You can use the Jupyter environment to navigate to the /storage/mnist directory to find the saved model.
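From another Notebook or Job, the saved file could then be loaded back for inferencing. The sketch below assumes the model was trained on flattened images, so new images must be flattened the same way (and receive any other preprocessing used during training) before prediction. The helper name predict_digits is hypothetical.

```python
import joblib
import numpy as np


def predict_digits(model_path, images):
    """Load a joblib-saved model and predict labels for (n, rows, cols) images."""
    model = joblib.load(model_path)
    # Flatten each image to one row, matching the shape used for training.
    flat = np.asarray(images).reshape(len(images), -1)
    return model.predict(flat)
```

For example, predict_digits("/storage/mnist/model.pkl", new_images) would return one predicted digit per image.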