
Experiments Client

Importing

from gradient import ExperimentsClient

api_key = 'YOUR_API_KEY'
experiment_client = ExperimentsClient(api_key)

Listing Experiments

# List experiments in the project, filtering by project_id
experiments = experiment_client.list([project_id])
# Print the first experiment
print(experiments[0])

Get Experiment by ID

print(experiment_client.get(experiment_id))
SingleNodeExperiment(name='single_node_experiment-sdk', ports=None,
workspace_url='https://github.com/Paperspace/mnist-sample.git',
working_directory=None, artifact_directory=None, cluster_id=None,
experiment_env={'EVAL_SECS': 10, 'MAX_STEPS': 100, 'TRAIN_EPOCHS': 3,
'EPOCHS_EVAL': 2}, project_id='prlvf0u05', model_type='Tensorflow',
model_path='/storage/models/sdk_test', id='esxfbgzu7xanst', state=1,
container='tensorflow/tensorflow:1.13.1-gpu-py3', machine_type='K80',
command='python mnist.py', container_user=None, registry_username=None,
registry_password=None, registry_url=None, experiment_type_id=1)

View State of the Experiment

Import the ExperimentState constants

from gradient import constants

# Experiment state: CREATED but not yet started
state = experiment_client.get(experiment_id).state
print("state: " + constants.ExperimentState.get_state_str(state))
help(constants.ExperimentState)
Help on class ExperimentState in module gradient.constants:
class ExperimentState(builtins.object)
| Class methods defined here:
|
| get_state_str(state_int) from builtins.type
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| CANCELLED = 8
|
| CREATED = 1
|
| ERROR = 6
|
| FAILED = 7
|
| NETWORK_SETTING_UP = 12
|
| NETWORK_SETUP = 3
|
| NETWORK_TEARDOWN = 9
|
| NETWORK_TEARING_DOWN = 13
|
| PENDING = 10
|
| PROVISIONED = 2
|
| PROVISIONING = 11
|
| RUNNING = 4
|
| STOPPED = 5

Singlenode Experiment

The parameters for a single node experiment are as follows, taken from help(ExperimentsClient.create_single_node):

:param str name: Name of new experiment [required]
:param str project_id: Project ID [required]
:param str machine_type: Machine type [required]
:param str command: Container entrypoint command [required]
:param str ports: Port to use in new experiment
:param str workspace_url: Project git repository url
:param str working_directory: Working directory for the experiment
:param str artifact_directory: Artifacts directory
:param str cluster_id: Cluster ID
:param dict experiment_env: Environment variables in a JSON
:param str model_type: defines the type of model that is being generated by the experiment. Model type must be one of Tensorflow, ONNX, or Custom
:param str model_path: Model path
:param str container: Container (dockerfile) [required]
:param str container_user: Container user for running the specified command in the container. If no containerUser is specified, the user will default to 'root' in the container.
:param str registry_username: Registry username for accessing a private docker registry container if necessary
:param str registry_password: Registry password for accessing a private docker registry container if necessary
:param str registry_url: Registry server URL for accessing a private docker registry container if necessary
:returns: experiment handle
:rtype: str

Input Dictionary

One of the benefits of using the SDK is that each function accepts a dictionary as input. This makes it easy to generate and manipulate the parameters needed to run an experiment. For example, the parameters for a single node experiment that trains MNIST on a GPU look like this:

single_node_model_path = "/storage/models/single_node_experiment_sdk"
env_variable = {
    "EPOCHS_EVAL": 2,
    "TRAIN_EPOCHS": 3,
    "MAX_STEPS": 100,
    "EVAL_SECS": 10
}
single_node_parameters = {
    "name": "single_node_experiment_sdk",
    "project_id": project_id,
    "command": "python mnist.py",
    "machine_type": "K80",
    "experiment_env": env_variable,
    "container": "tensorflow/tensorflow:1.13.1-gpu-py3",
    "workspace_url": "https://github.com/Paperspace/mnist-sample.git",
    "model_type": "Tensorflow",
    "model_path": single_node_model_path
}
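Because the input is a plain dictionary, it can be manipulated programmatically, for example to launch several runs that differ only in one environment variable. The `make_variants` helper below is a hypothetical sketch, not part of the SDK:

```python
# Sketch: derive several parameter dictionaries from one base dictionary.
# make_variants is a hypothetical helper, not an SDK function.
def make_variants(base, train_epochs_values):
    """Return one parameter dict per TRAIN_EPOCHS value, copied from base."""
    runs = []
    for epochs in train_epochs_values:
        params = dict(base)  # shallow copy of the base parameters
        params["name"] = f"{base['name']}_epochs{epochs}"
        # replace only TRAIN_EPOCHS, keeping the other env variables
        params["experiment_env"] = {**base["experiment_env"], "TRAIN_EPOCHS": epochs}
        runs.append(params)
    return runs

base = {
    "name": "single_node_experiment_sdk",
    "machine_type": "K80",
    "experiment_env": {"EPOCHS_EVAL": 2, "TRAIN_EPOCHS": 3},
}
sweep = make_variants(base, [3, 5, 10])
```

Each dictionary in `sweep` could then be passed to `create_single_node(**params)` in turn.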

Create & Start Singlenode Experiment

Here we create (but do not start) a single node experiment by unpacking the dictionary of parameters as function input.

experiment_id = experiment_client.create_single_node(**single_node_parameters)

Now we can start the experiment

experiment_client.start(experiment_id)

We could also immediately run the experiment

experiment_id = experiment_client.run_single_node(**single_node_parameters)

We can then stop the experiment or wait for it to complete

experiment_client.stop(experiment_id)
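The SDK does not document a built-in blocking wait, so one way to wait for completion is to poll `get()` until the state reaches a terminal value (state codes per `constants.ExperimentState` above). `wait_for_completion` below is a hypothetical helper, not an SDK method:

```python
import time

# Terminal state codes, per gradient.constants.ExperimentState:
# STOPPED=5, ERROR=6, FAILED=7, CANCELLED=8
TERMINAL_STATES = {5, 6, 7, 8}

def wait_for_completion(client, experiment_id, poll_seconds=10):
    """Poll the experiment state until it reaches a terminal value, then return it."""
    while True:
        state = client.get(experiment_id).state
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

For example, `wait_for_completion(experiment_client, experiment_id)` would block until the experiment stops, fails, errors, or is cancelled.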

Multinode Experiment

The parameters for a multinode experiment are as follows, taken from help(ExperimentsClient.create_multi_node):

:param str name: Name of new experiment [required]
:param str project_id: Project ID [required]
:param str experiment_type_id: Experiment Type ID [GRPC|MPI] [required]
:param str worker_container: Worker container (dockerfile) [required]
:param str worker_machine_type: Worker machine type [required]
:param str worker_command: Worker command [required]
:param int worker_count: Worker count [required]
:param str parameter_server_container: Parameter server container [required]
:param str parameter_server_machine_type: Parameter server machine type [required]
:param str parameter_server_command: Parameter server command [required]
:param int parameter_server_count: Parameter server count [required]
:param str ports: Port to use in new experiment
:param str workspace_url: Project git repository url
:param str working_directory: Working directory for the experiment
:param str artifact_directory: Artifacts directory
:param str cluster_id: Cluster ID
:param dict experiment_env: Environment variables in a JSON
:param str model_type: defines the type of model that is being generated by the experiment. Model type must be one of Tensorflow, ONNX, or Custom
:param str model_path: Model path
:param str worker_container_user: Worker container user
:param str worker_registry_username: Registry username for accessing a private docker registry container if necessary
:param str worker_registry_password: Registry password for accessing a private docker registry container if necessary
:param str worker_registry_url: Registry server URL for accessing a private docker registry container if necessary
:param str parameter_server_container_user: Parameter server container user
:param str parameter_server_registry_username: Registry username for accessing a private docker registry container if necessary
:param str parameter_server_registry_password: Registry password for accessing a private docker registry container if necessary
:param str parameter_server_registry_url: Registry server URL for accessing a private docker registry container if necessary
:returns: experiment handle
:rtype: str

Sample Multinode input dictionary

# Create a dictionary of parameters for running a distributed/multinode experiment
env = {
    "EPOCHS_EVAL": 5,
    "TRAIN_EPOCHS": 10,
    "MAX_STEPS": 1000,
    "EVAL_SECS": 10
}
multi_node_parameters = {
    "name": "multi_node_example",
    "project_id": project_id,
    "experiment_type_id": 2,
    "worker_container": "tensorflow/tensorflow:1.13.1-gpu-py3",
    "worker_machine_type": "K80",
    "worker_command": "pip install -r requirements.txt && python mnist.py",
    "experiment_env": env,
    "worker_count": 2,
    "parameter_server_container": "tensorflow/tensorflow:1.13.1-gpu-py3",
    "parameter_server_machine_type": "K80",
    "parameter_server_command": "python mnist.py",
    "parameter_server_count": 1,
    "workspace_url": "https://github.com/Paperspace/mnist-sample.git",
    "model_type": "Tensorflow"
}

Create & Start Multinode

Here we create (but do not start) a multinode experiment by unpacking the dictionary of parameters as function input.

experiment_id = experiment_client.create_multi_node(**multi_node_parameters)

Now we can start the experiment

experiment_client.start(experiment_id)

We could also immediately run the experiment

experiment_id = experiment_client.run_multi_node(**multi_node_parameters)

We can then stop the experiment or wait for it to complete

experiment_client.stop(experiment_id)

Experiment Logs

The SDK can retrieve logs from running or completed experiments.

Help on method logs in module gradient.api_sdk.clients.experiment_client:

logs(experiment_id, line=0, limit=10000) method of gradient.api_sdk.clients.experiment_client.ExperimentsClient instance
    Show list of latest logs from the specified experiment.
    *EXAMPLE*::
        gradient experiments logs --experimentId
    :param str experiment_id: Experiment ID
    :param int line: line number at which logs start to display
    :param int limit: maximum number of lines displayed, default 10,000

logs = experiment_client.get_logs(experiment_id)
print(logs)

The output is an array of LogRow objects, each containing the line number, the message, and the timestamp:

LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in
/usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1))
(1.16.3)', timestamp='2019-08-06T17:21:18.528Z')
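Because each LogRow carries a line number, message, and timestamp, the returned rows can be post-processed with ordinary Python. A small sketch, where the `LogRow` namedtuple mirrors the fields shown in the sample output above:

```python
from collections import namedtuple

# Mirrors the fields of the SDK's LogRow as shown in the sample output
LogRow = namedtuple("LogRow", ["line", "message", "timestamp"])

def messages_in_time_order(rows):
    """Return just the log messages, sorted by their ISO-8601 timestamps."""
    return [r.message for r in sorted(rows, key=lambda r: r.timestamp)]
```

With `rows = experiment_client.get_logs(experiment_id)`, `messages_in_time_order(rows)` would yield the messages in chronological order, which is useful because streamed rows are not guaranteed to arrive sorted.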

Streaming logs during runtime

log_stream = experiment_client.yield_logs(experiment_id)
print("Streaming logs of experiment")
try:
    while True:
        print(log_stream.send(None))
except StopIteration:
    print("done streaming logs")
Streaming logs of experiment
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:21:18.528Z')
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:20:30.842Z')
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:21:14.121Z')
done streaming logs

Getting Help

If you are ever stuck, you can call help() on any ExperimentsClient object or method in Python:

help(ExperimentsClient.list)

or consult the full API reference at https://paperspace.github.io/gradient-cli/gradient.api_sdk.clients.html#module-gradient.api_sdk.clients.experiment_client