GradientCI

Set up continuous integration between your GitHub repository and Gradient

Set Up GradientCI

Creating a GradientCI Project via the Paperspace Console

To create a Gradient Project with continuous integration powered by GradientCI and GitHub:

  1. Install the GradientCI GitHub app on your repository. (GitHub admin privileges are required.)

  2. Navigate to the Projects page.

  3. Click Create Project.

  4. Select GitHub Project.

  5. Grant Paperspace access to your GitHub repos via OAuth.

  6. Confirm a repo with GradientCI installed for your new Gradient Project.

Configure GradientCI Settings

To set up GradientCI, our continuous integration service, add a directory called .ps_project containing a configuration file named config.yaml to your GitHub repository. This file must exist on your default branch (typically master) in GitHub; we do not yet read the configuration from pull requests or other branches.
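A sketch of the expected repository layout, assuming .ps_project sits at the repository root:

<repo root>
├── .ps_project/
│   └── config.yaml    # GradientCI configuration, read from the default branch only
└── ...                # the rest of your repository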

Template

# .ps_project/config.yaml
version: 1
project: "project-handle"
experiment: "experiment-name" # [optional, default: <repo name>]
type: "experiment-type" # [single|multi-grpc|multi-mpi]
ports: "5000" # [optional, default: 5000]
paths: # [optional]
  workdir: "/path/to/workdir"
  artifacts: "/path/to/artifacts"
model: # [optional]
  type: "model-type" # [ex: Tensorflow|ONNX]
  path: "/path/to/model"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1 # [required for multi-node]
parameter-server: # [required for multi-node]
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1
checks: # [optional]
  tensorflow:loss:
    target: 0.0..0.5
    aggregate: "mean"
  defaults: # [optional]
    precision: 3

Metrics Checks

Our GradientCI service can help keep your model from degrading. When we run an experiment based on a code change, we can check properties of the experiment and forward those results to GitHub. GitHub uses these statuses to prevent pull requests that degrade your model from merging. We will automatically report whether the experiment ran without error as the status gradientci. You may configure additional checks for metrics generated by your experiment under the checks key in your config.yaml. Currently, we only support scalar metrics coming from TensorFlow-generated summaries.
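For example, a minimal sketch of a check that keeps the mean of the TensorFlow-reported loss within a range (values borrowed from the template above):

# .ps_project/config.yaml (excerpt)
checks:
  tensorflow:loss:
    target: 0.0..0.5
    aggregate: "mean"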

In addition to the status checks, GradientCI writes a detailed summary of the experiment to a comment on the pull request so you have your critical data at a glance while reviewing code!

[Image: GitHub pull request blocked by failing GradientCI metric checks.]

Configuration

For best results, especially on repositories with many contributors, we recommend configuring GitHub branch protections to prevent accidental merges of unintended pull requests. After your first build, statuses for the metrics will be reported back, and you can make passing statuses required for merging your pull request. To do this, follow GitHub's documentation.

# ...
checks:
  <identifier>:
    target: <range> # [required]
    aggregate: mean # [required]
    round: down # [optional, default: down, up|down]
    precision: 2 # [optional, default: 2]
    only-pulls: false # [optional, default: false]
    if-not-found: failure # [optional, default: "failure", success|failure]
    comment-on-pr: true # [optional, default: true]
  defaults:
    round: down # [optional, default: down, up|down]
    precision: 2 # [optional, default: 2]
    only-pulls: false # [optional, default: false]
    if-not-found: failure # [optional, default: "failure", success|failure]
    comment-on-pr: true # [optional, default: true]

<identifier>s:

  • <identifier>s are split into two parts, the source of the metric and the name of the metric, separated by a colon, e.g. tensorflow:loss.

    • Currently the only supported source is tensorflow.

  • <identifier>s are case-sensitive.

  • defaults: reserved for setting defaults for the <identifier> blocks. Any of the keys above are valid within this block and will set the default behavior for all other <identifier> blocks; if round is set to down, all other metrics will be evaluated using the round-down behavior. A key specified under an individual <identifier> overrides the value in the defaults block. For instance, round: up under tensorflow:loss will override the round: down behavior in the defaults block, as in the sketch below.
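A minimal sketch of this override behavior, reusing the tensorflow:loss identifier from the template:

checks:
  defaults:
    round: down       # default for every identifier below
    precision: 3
  tensorflow:loss:
    target: 0.0..0.5
    aggregate: "mean"
    round: up         # overrides the round: down set in defaults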

<range>

Ranges can appear in the following forms:

  1. <number> or <number>..: the metric must be greater than <number>.

  2. ..<number>: the metric must be less than <number>.

  3. <left>..<right>: the metric must be greater than <left> and less than <right>.

Note that these numbers are parsed as floats, so relying on exact equality with the endpoints of a range is not recommended.
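A sketch showing each range form, reusing the tensorflow:accuracy and tensorflow:loss identifiers from the examples below (tensorflow:mse is a hypothetical identifier for illustration):

checks:
  tensorflow:accuracy:
    target: 0.7..      # form 1: mean accuracy must be greater than 0.7
    aggregate: "mean"
  tensorflow:loss:
    target: ..0.025    # form 2: max loss must be less than 0.025
    aggregate: "max"
  tensorflow:mse:
    target: 0.0..0.5   # form 3: mean must be greater than 0.0 and less than 0.5
    aggregate: "mean"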

Required Properties

  • target: a <range> the metric must fall within

  • aggregate: the aggregating function used to evaluate the metric.

    • Currently, only the aggregates automatically generated by TensorFlow are supported. The available values, demonstrated in the sketch after this list, are:

      • mean

      • stddev

      • max

      • min

      • var

      • median
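For instance, a sketch of a check that bounds the spread of the loss rather than its central value (the 0.1 threshold is illustrative):

checks:
  tensorflow:loss:
    target: ..0.1        # standard deviation of the loss must stay below 0.1
    aggregate: "stddev"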

Optional Properties

  • precision: how many decimal places to keep (default 2)

  • round: specifies rounding behavior (default “down”)

  • only-pulls: only perform this check on pull requests

  • if-not-found: return a default status if the job has no data for <identifier>; defaults to "failure"

  • comment-on-pr: include this metric in the summary comment when the metrics were generated from a pull request
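Putting the optional properties together, a sketch of a check that is only evaluated on pull requests and passes when no data is found (the identifier and values are illustrative):

checks:
  tensorflow:accuracy:
    target: 0.9..
    aggregate: "mean"
    precision: 3             # keep three decimal places
    round: "up"              # round up instead of the default down
    only-pulls: true         # only evaluate this check on pull requests
    if-not-found: "success"  # pass instead of fail when no data is reported
    comment-on-pr: true      # include this metric in the PR summary comment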

Models

To store model information in the Model Repository, add model properties to your configuration file.

  • model.type: defines the type of model that is being generated by the experiment.

  • model.path: defines where in the context of the experiment the model checkpoint files are being stored.

model:
  type: "Tensorflow"
  path: "/var/ml/models"

Dockerfiles

To use Dockerfiles for experiment containers, the following configurations are available:

  • worker.dockerfile.use: indicates whether a supplied Dockerfile should be used for worker nodes.

  • worker.dockerfile.path: defines the relative path of the Dockerfile for worker nodes. If no value is provided (and dockerfile.use is set to true), the assumed location is "./Dockerfile".

  • parameter-server.dockerfile.use: indicates whether a supplied Dockerfile should be used for multinode configuration parameter servers.

  • parameter-server.dockerfile.path: defines the relative path of the Dockerfile for multinode configuration parameter servers. If no value is provided (and dockerfile.use is set to true), the assumed location is "./Dockerfile".

# Using the default Dockerfile location:
worker:
  [...]
  dockerfile:
    use: true
parameter-server:
  [...]
  dockerfile:
    use: true

# Using explicit Dockerfile paths:
worker:
  [...]
  dockerfile:
    use: true
    path: "./docker/worker-Dockerfile"
parameter-server:
  [...]
  dockerfile:
    use: true
    path: "./docker/parameter-server-Dockerfile"

Examples

Examples with only required fields

Single-node

version: 1
type: "single"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"

Multinode

version: 1
type: "multi-grpc"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 2
parameter-server:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1

Examples with optional fields included

Single-node

version: 1
project: "fko0j2xs3mqqi"
experiment: "momo/perfect-run"
type: "single"
ports: "5000"
paths:
  workdir: "/home/playground"
  artifacts: "/var/ml/artifacts"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  dockerfile:
    use: true

Multinode

version: 1
project: "fko0j2xs3mqqi"
experiment: "momo/perfect-runner"
type: "multi-grpc"
ports: "5000"
paths:
  workdir: "/home/playground"
  artifacts: "/var/ml/artifacts"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 2
  dockerfile:
    use: true
    path: "./docker/worker-Dockerfile"
parameter-server:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1
  dockerfile:
    use: true
    path: "./docker/parameter-server-Dockerfile"

TensorFlow Model Summary Checks

version: 1
project: "fko0j2xs3mqqi"
experiment: "momo/perfect-runner"
type: "multi-grpc"
ports: "5000"
worker:
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 2
parameter-server: # required for multi-node experiment types
  container: "tensorflow/tensorflow:1.8.0-gpu"
  command: "nvidia-smi"
  machine-type: "K80"
  count: 1
model:
  type: "TensorFlow"
  path: "/artifacts"
checks:
  defaults:
    precision: 3
    round: up
    only-pulls: true
  tensorflow:accuracy:
    target: 0.7..
    aggregate: mean
  tensorflow:loss:
    target: ..0.025
    aggregate: max