Workflow Spec
This describes in more detail the main components of a Gradient Workflow, as seen in the YAML file.

Key Concepts

defaults

At the top of the YAML Workflow file, you can specify default parameters to be used throughout the entire Workflow. This includes environment variables and default machine instance configuration. Instances can also be specified per-job.
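As an illustration, a minimal defaults block might look like the following (the cluster ID, variable, and instance type are hypothetical placeholders):

```yaml
defaults:
  # Run all jobs on this cluster unless a job overrides it
  clusterId: cl1234567       # placeholder cluster ID
  env:
    MY_VAR: some-value       # environment variable shared by all jobs
  resources:
    instance-type: C5        # default machine type; jobs may override
```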

inputs

The inputs block allows you to specify named inputs (e.g., a versioned dataset) to be referenced and consumed by your jobs.
Note: you can also collect inputs in a separate YAML and reference this file as an inputPath when creating a Workflow run.
Workflow and job-level inputs can be of type: dataset (a persistent, versioned collection of data), string (e.g., a generated value or ID that may be output from another job), or volume (a temporary workspace mounted onto a job's container).
Note: datasets must be defined in advance of being referenced in a workflow. See Create Datasets for the Workflow for more information.
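To illustrate the three input types, a Workflow-level inputs block could be sketched as below (the names are hypothetical, and the dataset ref must match a dataset you created in advance):

```yaml
inputs:
  training-data:
    type: dataset        # persistent, versioned data; must exist beforehand
    with:
      ref: my-dataset    # hypothetical dataset name
  run-id:
    type: string         # e.g. a value produced by another job
    with:
      value: "run-42"
  scratch:
    type: volume         # temporary workspace mounted into the job container
```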

jobs

Jobs are also sometimes referred to as "steps" within a Gradient Workflow. A job is an individual task that executes code (such as training a machine learning model) and can consume inputs and produce outputs.
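A minimal job, sketched here with a hypothetical name and image, consumes a Workflow input and runs a command in a container via the container@v1 action:

```yaml
jobs:
  train:                          # hypothetical job name
    inputs:
      data: workflow.inputs.data  # mounted at /inputs/data in the container
    uses: container@v1            # run the job in a container
    with:
      args:
        - python
        - train.py
      image: python:3.9           # illustrative image
```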

Sample Workflow Spec

To run this Workflow, define datasets named test-one, test-two, and test-three as described in the Create Datasets for the Workflow documentation. Also, to make use of the secret named hello in the defaults env section, define a secret as described here.
```yaml
defaults:
  # clusterId defaults to the NY2 public cluster; setting this parameter is
  # equivalent to using the `--clusterId` flag on the command line.
  # This parameter is often used for GitHub-triggered Workflows running on
  # private clusters.
  clusterId: clusterId
  # Default environment variables for all jobs. Can use any supported
  # substitution syntax (named secrets, ephemeral secrets, etc.).
  env:
    # This environment variable uses a Gradient secret called "hello".
    HELLO: secret:hello
  # Default instance type for all jobs
  resources:
    instance-type: P4000
  container-registries: # optional
    - my-registry

# The Workflow takes two inputs, neither of which has a default. This means
# that when the Workflow is run the corresponding values for these inputs are
# required, for example:
#
# {"inputs": {"data": {"id": "test-one"}, "echo": {"value": "hello world"}}}
#
inputs:
  data:
    type: dataset
    with:
      ref: test-one
  echo:
    type: string
    with:
      value: "hello world"

jobs:
  job-1:
    # These are inputs for the "job-1" job; they are "aliases" to the
    # Workflow inputs.
    #
    # All inputs are placed in the "/inputs/<name>" path of the run
    # containers. So for this job we would have the paths "/inputs/data"
    # and "/inputs/echo".
    inputs:
      # The "/inputs/data" directory would contain the contents of the dataset
      # version. The ID here refers to the name of the dataset, not its
      # dataset ID.
      data: workflow.inputs.data
      # The "/inputs/echo" file would contain the string of the Workflow input
      # "echo".
      echo: workflow.inputs.echo
    # These are outputs for the "job-1" job.
    #
    # All outputs are read from the "/outputs/<name>" path.
    outputs:
      # A directory will automatically be created for output datasets, and
      # any content written to that directory will be committed to a newly
      # created dataset version when the job completes.
      data2:
        type: dataset
        with:
          id: test-two
      # The container is responsible for creating the file "/outputs/<name>"
      # with the content being a small-ish UTF-8 encoded string.
      echo2:
        type: string
    # Set job-specific environment variables
    env:
      TSTVAR: test
    # Set action
    uses: container@v1
    # Set action arguments
    with:
      args:
        - bash
        - -c
        - find /inputs/data > /outputs/data2/list.txt; echo ENV $HELLO $TSTVAR > /outputs/echo2; cat /inputs/echo; echo; cat /outputs/data2/list.txt /outputs/echo2
      image: bash:5
  job-2:
    inputs:
      # These inputs use job-1 outputs instead of Workflow inputs. You must
      # specify job-1 in the needs section to reference them here.
      data2: job-1.outputs.data2
      echo2: job-1.outputs.echo2
    outputs:
      data3:
        type: dataset
        with:
          ref: test-three
    # List of job IDs that must complete before this job runs
    needs:
      - job-1
    uses: container@v1
    with:
      args:
        - bash
        - -c
        - wc -l /inputs/data2/list.txt > /outputs/data3/summary.txt; cat /outputs/data3/summary.txt /inputs/echo2
      image: bash:5
```