Environment Variables
Information can be passed to and from Workflows, and between the jobs within them.
One form of this is our Workflow input/output, which can be datasets, volumes or strings.
Another is environment variables. These are used when we want to pass information in the Workflow YAML to some other computation being called, such as authenticating to a private GitHub repository, or running a Python .py script.

Why use environment variables and not just arguments to a script?

Some reasons are:
  • Environment variables can hold information such as secrets, which is harder to pass securely otherwise.
  • They can be defined as applying to all Workflow jobs (under defaults:), or per-job, under a job's name.
  • Sets of arguments may be used several times. For example, the potentially large number of hyperparameters for training an ML model can be made explicit in one place, then reused to train more than one model, varying some parameters but leaving the rest unchanged.
  • Using environment variables makes larger setups like this easier to handle than passing long argument lists more than once, and helps ensure that different model invocations get the same settings when they should.

Specifying environment variables in Workflows

An environment variable that applies to every job in a Workflow goes under env: in the defaults: field. A commonly used example is a secret that holds the user's API key. A block defining the environment variable HELLO from a secret might look like:

    defaults:
      env:
        HELLO: secret:hello
      resources:
        instance-type: P4000

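Inside a job, a variable defined this way is read like any other environment variable. The following is a minimal Python sketch, assuming the secret is exposed to the script under the name HELLO as in the block above. The helper name require_env is hypothetical and is not part of any Workflows API.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if value is None or value == "":
        # Report only the variable's name so a secret value never reaches the logs.
        raise RuntimeError(f"Required environment variable {name!r} is not set")
    return value


if __name__ == "__main__":
    # HELLO matches the variable defined under defaults: env: above.
    # A placeholder is set here only so the sketch runs outside a Workflow.
    os.environ.setdefault("HELLO", "placeholder-value")
    api_key = require_env("HELLO")
    print(f"HELLO is set ({len(api_key)} characters)")
```

Failing early with the variable's name, rather than its value, keeps the error useful without leaking the secret into job logs.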
Job-specific environment variables commonly carry information to be passed to a script. For example, the model hyperparameters from our Deep Learning Recommender tutorial appear in this code:

    RecommenderTrain:
      needs:
        - CloneRecRepo
      inputs:
        repoRec: CloneRecRepo.outputs.repoRec
      env:
        HP_FINAL_EPOCHS: '50'
        HP_FINAL_LR: '0.1'
      outputs:
        trainedRecommender:
          type: dataset
          with:
            ref: recommender
      with:
        script: |-
          cp -R /inputs/repoRec /Deep-Learning-Recommender-TF
          cd /Deep-Learning-Recommender-TF
          python workflow_train_model.py
        image: tensorflow/tensorflow:2.4.1-jupyter

Here, the number of epochs to train the final model, HP_FINAL_EPOCHS: '50', and the model's learning rate, HP_FINAL_LR: '0.1', are used by the workflow_train_model.py Python script that the job calls.

Using values of Workflow environment variables in scripts

To use the values of Workflow environment variables in a script, read and parse them in your code. In the recommender case here, we assign them to variables in the code:

    import os

    hp_final_epochs = int(os.environ.get('HP_FINAL_EPOCHS'))
    hp_final_lr = float(os.environ.get('HP_FINAL_LR'))

Python's os.environ reads the values. Environment variables are always strings, so we need to cast them to the correct data type (integer, float, etc.) for the model to understand them.
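Because os.environ.get returns None for an unset variable, a call such as int(None) fails with a TypeError that does not name the missing variable. A minimal sketch of a more defensive reader follows. The helper env_as and its default argument are assumptions for illustration and are not part of the tutorial code.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def env_as(name: str, cast: Callable[[str], T], default: Optional[T] = None) -> T:
    """Read an environment variable and convert it with `cast`."""
    raw = os.environ.get(name)
    if raw is None:
        if default is None:
            # Name the missing variable so the job log points at the cause.
            raise KeyError(f"Environment variable {name} is not set")
        return default
    try:
        return cast(raw)
    except ValueError as exc:
        raise ValueError(f"Cannot parse {name}={raw!r} with {cast.__name__}") from exc


if __name__ == "__main__":
    # Set here only so the sketch runs outside a Workflow job.
    os.environ.setdefault("HP_FINAL_EPOCHS", "50")
    os.environ.setdefault("HP_FINAL_LR", "0.1")
    hp_final_epochs = env_as("HP_FINAL_EPOCHS", int)
    hp_final_lr = env_as("HP_FINAL_LR", float)
    print(hp_final_epochs, hp_final_lr)
```

Keeping the cast next to the variable name makes it harder for one hyperparameter to be parsed with the wrong type when the list grows.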

Advanced Usage

For more advanced situations, Python has libraries such as env_config. This overlaps somewhat with what Workflows can do, because it allows you to declare environment variables as well. But it has clearer handling of issues like data types and errors, and can handle variables that are lists (useful for hyperparameter tuning), or more complex structures.
It can read declared variables from a file, e.g., test.sh, from env_config's GitHub page:

    #!/usr/bin/env bash

    # comment is ignored

    HIDDEN_VARIABLE="value not parsed"
    export VISIBLE_VARIABLE_1="this value will be available"

    function {
        # if the line does not start with export it's ignored
    }

    # variables inside strings are not expanded. The value will contain the literal $OTHER_VARIABLE.
    export VARIABLE_CONTAINING_REFERENCE="$OTHER_VARIABLE"

This file is read in by:

    from env_config import Config, parse_str

    # uses the value of CONFIG_FILE as the file name to load variables from
    config = Config(filename_variable='CONFIG_FILE', defer_raise=False)
    # visible_variable_1 is declared in test and the current tag is test. visible_variable_1 will be loaded from test.sh
    config.declare('visible_variable_1', parse_str(), ('test',), 'test')

    # visible_variable_2 is declared in the 'default' tag and not available in the config file.
    # visible_variable_2 will be ignored because the current tag is 'test'
    config.declare('visible_variable_2', parse_str(), ('default',), 'test')

This could potentially be integrated with Workflows to similarly handle a set of Workflow environment variables.
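As a rough illustration of the list-valued case without any extra library, the sketch below parses a comma-separated environment variable into a typed list using only the Python standard library. The variable name HP_LR_GRID and the helper env_list are hypothetical, and this is not env_config's API.

```python
import os
from typing import Callable, List, TypeVar

T = TypeVar("T")


def env_list(name: str, cast: Callable[[str], T], sep: str = ",") -> List[T]:
    """Split an environment variable on `sep` and convert each item with `cast`."""
    raw = os.environ.get(name, "")
    # Skip empty items so a trailing separator or an unset variable yields no entries.
    return [cast(item.strip()) for item in raw.split(sep) if item.strip()]


if __name__ == "__main__":
    # Set here only so the sketch runs outside a Workflow job.
    os.environ.setdefault("HP_LR_GRID", "0.1, 0.01, 0.001")
    for lr in env_list("HP_LR_GRID", float):
        print(f"would train with learning rate {lr}")
```

In a Workflow, the same value could be supplied as a quoted string under a job's env: field, in the same way as HP_FINAL_LR above.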