```python
from gradient import ExperimentsClient

api_key = 'YOUR_API_KEY'
experiment_client = ExperimentsClient(api_key)
```
```python
# List experiments in the project, filtering by project_id
experiments = experiment_client.list([project_id])

# Print the first experiment
print(experiments[0])
```
```python
print(experiment_client.get(experiment_id))
```

```
SingleNodeExperiment(name='single_node_experiment-sdk', ports=None,
    workspace_url='https://github.com/Paperspace/mnist-sample.git',
    working_directory=None, artifact_directory=None, cluster_id=None,
    experiment_env={'EVAL_SECS': 10, 'MAX_STEPS': 100, 'TRAIN_EPOCHS': 3, 'EPOCHS_EVAL': 2},
    project_id='prlvf0u05', model_type='Tensorflow',
    model_path='/storage/models/sdk_test', id='esxfbgzu7xanst', state=1,
    container='tensorflow/tensorflow:1.13.1-gpu-py3', machine_type='K80',
    command='python mnist.py', container_user=None, registry_username=None,
    registry_password=None, registry_url=None, experiment_type_id=1)
```
Import the ExperimentState constants
```python
from gradient import constants

# Experiment state: created but not started
state = experiment_client.get(experiment_id).state
print("state: " + constants.ExperimentState.get_state_str(state))
```
```python
help(constants.ExperimentState)
```

```
Help on class ExperimentState in module gradient.constants:

class ExperimentState(builtins.object)
 |  Class methods defined here:
 |
 |  get_state_str(state_int) from builtins.type
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  CANCELLED = 8
 |  CREATED = 1
 |  ERROR = 6
 |  FAILED = 7
 |  NETWORK_SETTING_UP = 12
 |  NETWORK_SETUP = 3
 |  NETWORK_TEARDOWN = 9
 |  NETWORK_TEARING_DOWN = 13
 |  PENDING = 10
 |  PROVISIONED = 2
 |  PROVISIONING = 11
 |  RUNNING = 4
 |  STOPPED = 5
```
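If you ever need to map an integer code back to a readable name outside the SDK, the class attributes above can be inverted. This is only an illustrative sketch that reproduces the constants from the help output; in practice, `constants.ExperimentState.get_state_str` is the supported way:

```python
# Constants copied from the help(constants.ExperimentState) output above
class ExperimentState:
    CREATED = 1
    PROVISIONED = 2
    NETWORK_SETUP = 3
    RUNNING = 4
    STOPPED = 5
    ERROR = 6
    FAILED = 7
    CANCELLED = 8
    NETWORK_TEARDOWN = 9
    PENDING = 10
    PROVISIONING = 11
    NETWORK_SETTING_UP = 12
    NETWORK_TEARING_DOWN = 13

# Build an int -> name mapping by inverting the class attributes
STATE_NAMES = {
    value: name
    for name, value in vars(ExperimentState).items()
    if not name.startswith("_")
}

print(STATE_NAMES[1])  # CREATED
print(STATE_NAMES[4])  # RUNNING
```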
The parameters for a single-node experiment are as follows (taken from `help(ExperimentsClient.create_single_node)`):
```
:param str name: Name of new experiment [required]
:param str project_id: Project ID [required]
:param str machine_type: Machine type [required]
:param str command: Container entrypoint command [required]
:param str ports: Port to use in new experiment
:param str workspace_url: Project git repository url
:param str working_directory: Working directory for the experiment
:param str artifact_directory: Artifacts directory
:param str cluster_id: Cluster ID
:param dict experiment_env: Environment variables in a JSON
:param str model_type: defines the type of model that is being generated by the experiment. Model type must be one of Tensorflow, ONNX, or Custom
:param str model_path: Model path
:param str container: Container (dockerfile) [required]
:param str container_user: Container user for running the specified command in the container. If no containerUser is specified, the user will default to 'root' in the container.
:param str registry_username: Registry username for accessing private docker registry container if necessary
:param str registry_password: Registry password for accessing private docker registry container if necessary
:param str registry_url: Registry server URL for accessing private docker registry container if necessary
:returns: experiment handle
:rtype: str
```
One of the benefits of using the SDK is that each function accepts its parameters as keyword arguments, so you can build them up in a dictionary and unpack it with `**`. This makes it easy to generate and manipulate the parameters needed to run an experiment. For example, the parameters to train MNIST on a single GPU node look like this:
```python
single_node_model_path = "/storage/models/single_node_experiment_sdk"

env_variable = {
    "EPOCHS_EVAL": 2,
    "TRAIN_EPOCHS": 3,
    "MAX_STEPS": 100,
    "EVAL_SECS": 10,
}

single_node_parameters = {
    "name": "single_node_experiment_sdk",
    "project_id": project_id,
    "command": "python mnist.py",
    "machine_type": "K80",
    "experiment_env": env_variable,
    "container": "tensorflow/tensorflow:1.13.1-gpu-py3",
    "workspace_url": "https://github.com/Paperspace/mnist-sample.git",
    "model_type": "Tensorflow",
    "model_path": single_node_model_path,
}
```
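Because the parameters live in a plain dictionary, deriving variants is straightforward. The helper below is not part of the SDK, just an illustrative sketch for, say, sweeping over `TRAIN_EPOCHS`:

```python
import copy

def with_env(base_params, **env_overrides):
    """Return a deep copy of an experiment parameter dict with updated env variables."""
    params = copy.deepcopy(base_params)
    params["experiment_env"] = {**params.get("experiment_env", {}), **env_overrides}
    return params

base = {
    "name": "single_node_experiment_sdk",
    "machine_type": "K80",
    "experiment_env": {"TRAIN_EPOCHS": 3, "MAX_STEPS": 100},
}

# Three parameter dicts differing only in TRAIN_EPOCHS
sweep = [with_env(base, TRAIN_EPOCHS=n) for n in (1, 3, 5)]
print([p["experiment_env"]["TRAIN_EPOCHS"] for p in sweep])  # [1, 3, 5]
```

Each dict in `sweep` could then be passed to `create_single_node(**params)`; the deep copy keeps the base dictionary untouched.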
Here we create (but do not start) a single-node experiment by unpacking the dictionary of parameters as function input.
```python
experiment_id = experiment_client.create_single_node(**single_node_parameters)
```
Now we can start the experiment
```python
experiment_client.start(experiment_id)
```
We could also immediately run the experiment
```python
experiment_id = experiment_client.run_single_node(**single_node_parameters)
```
We can then stop the experiment or wait for it to complete
```python
experiment_client.stop(experiment_id)
```
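The SDK does not block while an experiment runs, so "waiting for it to complete" means polling the state yourself. Below is a minimal sketch; the helper function and the chosen set of terminal states are assumptions, using the integer codes from `ExperimentState` above:

```python
import time

# Terminal state codes from gradient.constants.ExperimentState
STOPPED, ERROR, FAILED, CANCELLED = 5, 6, 7, 8
TERMINAL_STATES = {STOPPED, ERROR, FAILED, CANCELLED}

def wait_for_completion(client, experiment_id, poll_secs=10, max_polls=360):
    """Poll the experiment until it reaches a terminal state; return the final state code."""
    for _ in range(max_polls):
        state = client.get(experiment_id).state
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_secs)
    raise TimeoutError("experiment did not reach a terminal state in time")
```

Calling `wait_for_completion(experiment_client, experiment_id)` would then return 5 (STOPPED) after a clean finish.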
The parameters for a multi-node experiment are as follows (taken from `help(ExperimentsClient.create_multi_node)`):
```
:param str name: Name of new experiment [required]
:param str project_id: Project ID [required]
:param str experiment_type_id: Experiment Type ID [GRPC|MPI] [required]
:param str worker_container: Worker container (dockerfile) [required]
:param str worker_machine_type: Worker machine type [required]
:param str worker_command: Worker command [required]
:param int worker_count: Worker count [required]
:param str parameter_server_container: Parameter server container [required]
:param str parameter_server_machine_type: Parameter server machine type [required]
:param str parameter_server_command: Parameter server command [required]
:param int parameter_server_count: Parameter server count [required]
:param str ports: Port to use in new experiment
:param str workspace_url: Project git repository url
:param str working_directory: Working directory for the experiment
:param str artifact_directory: Artifacts directory
:param str cluster_id: Cluster ID
:param dict experiment_env: Environment variables in a JSON
:param str model_type: defines the type of model that is being generated by the experiment. Model type must be one of Tensorflow, ONNX, or Custom
:param str model_path: Model path
:param str worker_container_user: Worker container user
:param str worker_registry_username: Registry username for accessing private docker registry container if necessary
:param str worker_registry_password: Registry password for accessing private docker registry container if necessary
:param str worker_registry_url: Registry server URL for accessing private docker registry container if necessary
:param str parameter_server_container_user: Parameter server container user
:param str parameter_server_registry_username: Registry username for accessing private docker registry container if necessary
:param str parameter_server_registry_password: Registry password for accessing private docker registry container if necessary
:param str parameter_server_registry_url: Registry server URL for accessing private docker registry container if necessary
:returns: experiment handle
:rtype: str
```
```python
# Create a dictionary of parameters for running a distributed/multi-node experiment
env = {
    "EPOCHS_EVAL": 5,
    "TRAIN_EPOCHS": 10,
    "MAX_STEPS": 1000,
    "EVAL_SECS": 10,
}

multi_node_parameters = {
    "name": "multi_node_example",
    "project_id": project_id,
    "experiment_type_id": 2,
    "worker_container": "tensorflow/tensorflow:1.13.1-gpu-py3",
    "worker_machine_type": "K80",
    "worker_command": "pip install -r requirements.txt && python mnist.py",
    "experiment_env": env,
    "worker_count": 2,
    "parameter_server_container": "tensorflow/tensorflow:1.13.1-gpu-py3",
    "parameter_server_machine_type": "K80",
    "parameter_server_command": "python mnist.py",
    "parameter_server_count": 1,
    "workspace_url": "https://github.com/Paperspace/mnist-sample.git",
    "model_type": "Tensorflow",
}
```
Here we create (but do not start) a multi-node experiment by unpacking the dictionary of parameters as function input.
```python
experiment_id = experiment_client.create_multi_node(**multi_node_parameters)
```
Now we can start the experiment
```python
experiment_client.start(experiment_id)
```
We could also immediately run the experiment
```python
experiment_id = experiment_client.run_multi_node(**multi_node_parameters)
```
We can then stop the experiment or wait for it to complete
```python
experiment_client.stop(experiment_id)
```
The SDK can also retrieve logs from running or completed experiments.
```
Help on method logs in module gradient.api_sdk.clients.experiment_client:

logs(experiment_id, line=0, limit=10000) method of gradient.api_sdk.clients.experiment_client.ExperimentsClient instance
    Show list of latest logs from the specified experiment.

    *EXAMPLE*::

        gradient experiments logs --experimentId

    :param str experiment_id: Experiment ID
    :param int line: line number at which logs starts to display on screen
    :param int limit: maximum lines displayed on screen, default set to 10 000
```
```python
logs = experiment_client.logs(experiment_id)
print(logs)
```
The output is an array of LogRow objects, each containing the line number, the message, and the timestamp:
```
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:21:18.528Z')
```
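Since each LogRow carries a line number, message, and timestamp, turning the list into plain text (e.g., to save to a file) takes only a few lines. The namedtuple below merely stands in for the SDK's LogRow class so the sketch is self-contained:

```python
from collections import namedtuple

# Stand-in for gradient's LogRow, for illustration only
LogRow = namedtuple("LogRow", ["line", "message", "timestamp"])

def format_logs(rows):
    """Render log rows as '<timestamp> <line>: <message>' lines."""
    return "\n".join(f"{r.timestamp} {r.line}: {r.message}" for r in rows)

rows = [
    LogRow(1, "Starting training", "2019-08-06T17:21:18.528Z"),
    LogRow(2, "Epoch 1/3", "2019-08-06T17:21:20.001Z"),
]
print(format_logs(rows))
```

With the real SDK you would pass the list returned by the logs call straight into `format_logs`.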
```python
log_stream = experiment_client.yield_logs(experiment_id)

print("Streaming logs of experiment")
try:
    while True:
        print(log_stream.send(None))
except StopIteration:
    print("done streaming logs")
```
```
Streaming logs of experiment
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:21:18.528Z')
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:20:30.842Z')
LogRow(line=1, message='Requirement already satisfied: numpy>=1.15.4 in /usr/local/lib/python3.5/dist-packages (from -r requirements.txt (line 1)) (1.16.3)', timestamp='2019-08-06T17:21:14.121Z')
done streaming logs
```
If you are ever stuck, you can call `help()` on any Gradient client object or method in Python, for example:
```python
help(ProjectsClient.list)
```