
GradientCI

Set up continuous integration between your GitHub repository and Gradient

Set Up GradientCI

Creating a GradientCI Project via the Paperspace Console

To create a Gradient Project with continuous integration powered by GradientCI and GitHub:

  1. Install the GradientCI GitHub app on your repository. (GitHub Admin privilege is required.)

  2. Navigate to the Projects page.

  3. Click Create Project.

  4. Select GitHub Project.

  5. Grant Paperspace access to your GitHub repos via OAuth.

  6. Confirm a repo with GradientCI installed for your new Gradient Project.

Configure GradientCI Settings

To set up GradientCI, our continuous integration service, add a directory named .ps_project to your GitHub repository containing a configuration file called config.yaml. An example appears below.

Building Branches and Tags

GradientCI supports building and sourcing project configuration from arbitrary branches or tags. By default, configuration is built only from your default branch (typically master); if you only need the configuration from one branch, you can change your repository's default branch from within GitHub. You can relax or tighten this rule by selecting "All" or "None" from the "Build Branches" dropdown in the project settings pane of the Gradient console. To build tags, or a subset of branches other than the default branch, select "All" from this menu and provide filters in your config.yaml. To list the specific patterns of tags and branches to build, see branch and tag filters below.

You may also disable builds of pull requests (enabled by default), or enable builds of pull requests that originate from forked repositories (disabled by default, to prevent unauthorized use of Gradient resources). Each of these options allows configuration to be sourced from the relevant Git branch.

Gradient console project settings pane

Template

# .ps_project/config.yaml:
version: 1
project: "project-handle"
experiment: "experiment-name" #[optional, default: <repo name>]
type: "experiment-type" #[single|multi-grpc|multi-mpi]
ports: "5000" #[optional, default: 5000]
paths: #[optional]
  workdir: "/path/to/workdir"
  artifacts: "/path/to/artifacts"
model: #[optional, but required for model parsing and model deployments]
  type: "model-type" #[required for model, one of: Tensorflow|ONNX|Custom]
  path: "/path/to/model"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1 #[required for multi-node]
parameter-server: #[required for multi-node]
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1
filters:
  branches:
    ignore: irrelevant-branch
  tags:
    only:
      - v.*
      - latest
checks: #[optional]
  tensorflow:loss:
    target: "0.0..0.5"
    aggregate: "mean"
  defaults: #[optional]
    precision: 3

Filtering Branches and Tags

By default, GradientCI builds only the default branch and no tags. To build additional non-pull-request branches or tags, first select "All" from the "Build Branches" project configuration dropdown; this builds all branches and no tags. You can then filter branches and tags in your config.yaml by providing a filters section. Place the keys branches or tags under filters to apply filtering to these defaults. Under each key you can provide an only or an ignore field (but not both) containing a POSIX-compatible regex, or an array of regexes, to match against. Branches or tags filtered by an only key are built only if they match at least one of the regexes provided. Branches or tags filtered by an ignore key are skipped if they match at least one of the regexes provided.
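For example, a filters section along these lines (the branch name and tag pattern here are illustrative, not required values) would skip one branch while building only version-style tags:

```yaml
# .ps_project/config.yaml (fragment)
filters:
  branches:
    ignore: experimental   # skip this branch even when "Build Branches" is "All"
  tags:
    only:
      - v.*                # build only tags such as v1.0 or v2.3.1
```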

Metrics Checks

The GradientCI service can help keep your model from degrading. When you run an experiment based on a code change, GradientCI can automatically check properties of the experiment and forward those results to GitHub. GitHub can use these statuses to prevent pull request merges that degrade your model. GradientCI automatically reports whether the experiment ran without error, or which checks failed if there was an error. You may configure additional checks for metrics generated by your experiment under the checks key in your config.yaml. Currently, we only support scalar metrics coming from TensorFlow-generated summaries.

In addition to the status checks, GradientCI writes a detailed summary of the experiment to a comment on the pull request so you have your critical data at a glance while reviewing code.

GitHub pull request blocked by failing GradientCI metric checks.

Configuration

For best results, especially on repositories with many contributors, we recommend configuring GitHub branch protections to prevent accidental merges of unintended pull requests. After your first build, statuses for the metrics will be reported back and you can make passing statuses required for the merge of your pull request. To do this follow GitHub's documentation.

# ...
checks:
  <identifier>:
    target: <range> #[required]
    aggregate: mean #[required]
    round: down #[optional, default: down, up|down]
    precision: 2 #[optional, default: 2]
    only-pulls: false #[optional, default: false]
    if-not-found: failure #[optional, default: "failure", success|failure]
    comment-on-pr: true #[optional, default: true]
  defaults:
    round: down #[optional, default: down, up|down]
    precision: 2 #[optional, default: 2]
    only-pulls: false #[optional, default: false]
    if-not-found: failure #[optional, default: "failure", success|failure]
    comment-on-pr: true #[optional, default: true]

<identifier>s:

  • <identifier>s are split into two parts: the source of the metric and the name of the metric, e.g. tensorflow:loss.

    • Currently the only supported source is tensorflow.

  • These <identifier>s are case-sensitive.

  • “defaults”: reserved for defaults to set for the <identifier>s. Any of the above keys is valid within this block and sets the default behavior for all other <identifier> blocks. For example, if round is set to down, all other metrics are evaluated using the round-down behavior. If an <identifier> has a key specified underneath it, that key overrides the value in the defaults block. For instance, round: up under tensorflow:loss will override the round: down behavior set in the defaults block.
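A sketch of this override behavior (the metric names and targets are illustrative): the defaults block below sets round: down for all checks, while tensorflow:loss opts into rounding up.

```yaml
checks:
  defaults:
    round: down          # default for every <identifier> block below
  tensorflow:accuracy:
    target: 0.9..
    aggregate: mean      # inherits round: down from defaults
  tensorflow:loss:
    target: ..0.1
    aggregate: mean
    round: up            # overrides the round: down set in defaults
```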

<range>

Ranges can appear in the following forms:

  1. <number> or <number>..: the metric must be greater than <number>.

  2. ..<number>: the metric must be less than <number>.

  3. <left>..<right>: the metric must be greater than <left> and less than <right>.

Note that these numbers are parsed as floats, so relying on precise equality with the ends of the range is not recommended.
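The three range forms can be sketched in a checks block as follows (the metric names here are illustrative):

```yaml
checks:
  tensorflow:accuracy:
    target: 0.7..        # form 1: accuracy must be greater than 0.7
    aggregate: mean
  tensorflow:loss:
    target: ..0.025      # form 2: loss must be less than 0.025
    aggregate: max
  tensorflow:error:
    target: "0.0..0.5"   # form 3: must be greater than 0.0 and less than 0.5
    aggregate: mean
```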

Required Properties

  • target: the <range> the metric must fall within

  • aggregate: the aggregating function by which to evaluate the metric.

    • Currently only the aggregates automatically generated by TensorFlow are supported; these values are:

      • mean

      • stddev

      • max

      • min

      • var

      • median

Optional Properties

  • precision: how many decimal places to keep (default 2)

  • round: specifies rounding behavior (default “down”)

  • only-pulls: only perform this check on pull requests

  • if-not-found: the status to report if the job has no data for <identifier> (default “failure”)

  • comment-on-pr: include this metric in the summary comment if the metrics were generated from a pull request

Models

To store model information in the Model Repository, add model properties to your configuration file.

  • model.type: defines the type of model generated by the experiment. It must be one of Tensorflow, ONNX, or Custom.

  • model.path: defines where in the context of the experiment the model checkpoint files are being stored.

model:
  type: "Tensorflow"
  path: "/path/to/model"

Dockerfiles

To use dockerfiles for experiment containers, the following configurations are available:

  • worker.dockerfile.use: true/false indicates whether a supplied dockerfile should be used for worker nodes.

  • worker.dockerfile.path: defines the relative path of the dockerfile for worker nodes. If no value is provided (and dockerfile.use is set to true) the assumed location is "./Dockerfile"

  • parameter-server.dockerfile.use: true/false indicates whether a supplied dockerfile should be used for multinode configuration parameter servers.

  • parameter-server.dockerfile.path: defines the relative path of the dockerfile for multinode configuration parameter servers. If no value is provided and dockerfile.use is set to true, the assumed location is "./Dockerfile"

worker:
  [...]
  dockerfile:
    use: true
parameter-server:
  [...]
  dockerfile:
    use: true

worker:
  [...]
  dockerfile:
    use: true
    path: "./docker/worker-Dockerfile"
parameter-server:
  [...]
  dockerfile:
    use: true
    path: "./docker/parameter-server-Dockerfile"

Examples

Repositories

Examples with only required fields

Single-node

version: 1
type: "single"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"

Multinode

version: 1
type: "multi-grpc"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 2
parameter-server:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1

Examples with optional fields included

Single-node

version: 1
project: "fko0j2xs3mqqi"
experiment: "momo/perfect-run"
type: "single"
ports: "5000"
paths:
  workdir: "/home/playground"
  artifacts: "/artifacts"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  dockerfile:
    use: true

Multinode

version: 1
project: "fko0j2xs3mqqi"
experiment: "momo/perfect-runner"
type: "multi-grpc"
ports: "5000"
paths:
  workdir: "/home/playground"
  artifacts: "/storage/models"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 2
  dockerfile:
    use: true
    path: "./docker/worker-Dockerfile"
parameter-server:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1
  dockerfile:
    use: true
    path: "./docker/parameter-server-Dockerfile"

TensorFlow Model Summary Checks

version: 1
project: "fko0j2xs3mqqi"
experiment: "momo/perfect-runner"
type: "multi-grpc"
ports: "5000"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 2
model:
  type: "Tensorflow"
  path: "/storage/models"
checks:
  defaults:
    precision: 3
    round: up
    only-pulls: true
  tensorflow:accuracy:
    target: 0.7..
    aggregate: mean
  tensorflow:loss:
    target: ..0.025
    aggregate: max

Uninstalling GradientCI

Note: this can only be done by an organization-level administrator, or on your personal repositories.

  1. Navigate to the repository or organization that you wish to remove GradientCI from.

  2. Click the "Settings" tab in the top row.

  3. Select "Integrations & services" from the left menu. You should be presented with a list that includes "GradientCI":

    Integrations & services pane

  4. Select "Configure" next to "GradientCI". You will be prompted to enter your password.

  5. From the "GradientCI" settings menu, you can then either:

    a. Uninstall the application from all repositories on the organization or personal account by clicking the red "Uninstall" button, or

    b. Select "Only select repositories" and choose which repositories should have GradientCI from the dropdown, or deselect them by clicking the "x" next to their name.