Recipe specification
Introduction
Overview
Let's start with a brief description of a recipe and it's components. Additionally, we'll look at an example recipe definition that will be used throughout this document.
Overview of recipe components
A refgenie recipe is a yaml
file that may consist of the following keys:
name
: (REQUIRED) - A string identifying the recipe.version
: (REQUIRED) - A string identifying the version of the recipe.description
: (REQUIRED) - A string describing the recipe, which may include a brief description of the outputs of the recipe, software used to generate the outputs, and other relevant information.inputs
: (REQUIRED) - A nested dictionary of input parameters for the recipe, which includes 3 keys:files
,assets
andparams
.command_template_list
: (REQUIRED) - A list of strings, each of which is a Jinja2 command template for a particular command. This basically means each string is a shell command with allowed variables encoded like{variable}
.default_tag
: (REQUIRED) - A string, which is the default tag to use for the asset produced by the recipe. The string can be a Jinja2 template, which will be evaluated to determine the tag.container
: (RECOMMENDED) - a container registry path, which can be used to run the recipe.custom_properties
: (OPTIONAL) - A dictionary of custom properties of the asset produced by the recipe.test
: (RECOMMENDED) - A dictionary of test settings.
Example recipe
name: my_asset1
version: 0.0.1
output_asset_class: my_asset
description: This asset consists of FASTA, JSON and HTML files.
inputs:
files:
html:
description: A HTML file to include
params:
cores:
description: Number of cores to use
default: 1
assets:
fasta:
asset_class: fasta
description: gzipped fasta file
default: fasta
container: docker.io/databio/refgenie
command_template_list:
- cp {{files.html}} {{asset_outfolder}}/{{genome}}.html
- cp {{assets.fasta}} {{asset_outfolder}}/{{genome}}.fa.gz
- jsonMaker --cores {{params.cores}} --input-file {{assets.fasta}}
custom_properties:
version: jsonMaker --version
settings: jsonMaker --settings
default_tag: "{{custom_properties.version}}"
This GitHub repository contains numerous recipe examples: refgenie/recipes
Details of the recipe components
Let's describe the components of the recipe specification in more detail, i.e. their internal structure and the impact they have on the asset build process.
name
The recipe name is an arbitrary string that identifies the recipe, therefore it should be unique among your recipes. Refgenie tries to match the name
attribute to the requested asset build, when no recipe is explicitly specified.
version
The recipe version is an arbitrary string that identifies the version of the recipe. It is used to distinguish between different versions of an evolving recipe.
description
The recipe description is a freeform text that describes the recipe, which may include a brief description of the outputs of the recipe, software used to generate the outputs, and other relevant information. Refgenie attaches a description
attribute to the asset that is created by the recipe.
container
The container registry path, which can be used to run the recipe.
inputs
Inputs are the parameters that are used to generate the outputs of the recipe. The inputs are specified as a nested dictionary of input parameters, which include: files
, assets
and params
. Let's take a look at each of them.
assets
This subsection is used to specify the assets that are used by the recipe. Refgnie will make sure the assets specified in this section are already managed by refgenie before running the recipe. Assets that require other assets as inputs are referred to as derived assets.
There are 3 required keys in the assets
subsection:
description
: A string briefly describing the required assetasset_class
: A string that identifies the asset class of the asset. Asset class mismatches will result in errors before the recipes are run.default
: A string that identifies the default asset to use for a build if none are specified.
files
This subsection is used to specify the files that are used by the recipe.
There is only one required key in the files
subsection:
description
: A string briefly describing the required file.
params
This subsection is used to specify the parameters that are used by the recipe. Technically, this a way to pass any text data to the recipe from the command line, like max number of cores to use, simple settings, etc.
There are 2 required keys in the params
subsection:
description
: A string briefly describing the required parameterdefault
: A string that sepecifies the default parameter to use for a build if none are specified.
command_template_list
The command template list is the most critical part of the recipe. It is a list of strings, each of which is a Jinja2 command template for a particular command. The command templates are evaluated in the order they are specified in the list.
Within the command_template_list
, you have access to variables from several sources. These variables are divided into namespaces depending on the variable source. You can access the values of these variables in the command template using the double-brace Jinja2 template language syntax: {{namespace.variable}}
. Below is a list of the namespaces available to the command templates:
- genome (
str
): The genome digest. - asset (
str
): The asset name. - params (
Dict[str, Dict[str, str]]
): The resolved input parameters. - files (
Dict[str, Dict[str, str]]
): The resolved input files. - assets (
Dict[str, Dict[str, str]]
): The resolved input assets. - customproperties (
Dict[str, Dict[str, Any]]
): The _resolved custom properties. - test (
Dict[str, Dict[str, str]]
): The recipe testing setup.
This is a dict-like representation of an example namespaces object available to the command templates:
namespaces = {
"genome": "6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905",
"asset": "my_asset1",
"assets": {
"fasta": "/path/to/genomes/fasta.fa.gz"
},
"params": {
"cores": "2"
},
"files": {
"html": "/path/to/my_asset1.html"
},
"custom_properties": {
"version": "0.0.1",
"settings": "a=1, b=2"
}
}
default_tag
This section is used to determine the string that the asset will be tagged with when it is built. The string can be a Jinja2 template, which will be evaluated to determine the tag. Users can override this value by specifying a tag when building the asset.
The default_tag
value can be a string or a Jinja2 template, that combines the values of other namespaces into a tag. It is reasonable to use {{custom_properties.version}}
as the default tag, as presented in the example recipe at the top of this page.
custom_properties
This section is an object where keys are the names of custom properties and values are shell commands that will be executed in the asset building requirement. This is a way to record any computing enviroment specific information, like the version of a software package used to produce the asset. Please note that the build namespaces are not available to the commands in this section.
test
This section is used to specify the test setup for the recipe. The test setup is a dictionary of test parameters, which include:
genome
: the genome digest to use for derived asset tests. Any input assets will be sourced from this genome namespace.inputs
: the input files to the recipes. The values must be URLs to remote files, so that the recipes are portable. The keys in this section must match the keys in theinputs.files
section or the recipe. Currentlyinputs.params
andinputs.assets
cannot be changed, default values from the top-levelinputs
section are used.outputs
: the outputs to test the results of running the recipe commands against. There are two ways the outputs can be tested: 1) by comparing the file checksum (checksum
), or 2) by comparing the value (value
). The keys in this section must match theseek_keys
in the asset class definition.
test:
genome: 6c5f19c9c2850e62cc3f89b04047fa05eee911662bd77905
inputs:
fasta: /path/to/genome/fasta.fa
outputs:
html:
md5sum: d41d8cd98f00b204e9800998ecf8427e
# OR
value: 100