Tagging assets
Why tag assets?
In a research environment, it is natural to use several flavors of reference genome resources, which may result from different versions of the software used to create them. This is what inspired the concept of asset tagging in refgenie.
Tag character whitelist
A tag can be any text or number composed of characters that are safe for Uniform Resource Identifiers (URIs) as per RFC 3986, so it is well suited to hold software version information or even a concise description, like 0.4.1 or new_build_strategy.
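The character rule can be checked before a tag is ever passed to refgenie. The following is a minimal sketch, assuming a POSIX-compatible shell; is_valid_tag is a hypothetical helper written for illustration and is not part of refgenie. It accepts only letters, digits, hyphen, period, underscore, and tilde (the RFC 3986 unreserved characters) and rejects empty strings.

```shell
# Hypothetical helper (not part of refgenie): succeeds only if the candidate
# tag is non-empty and consists solely of RFC 3986 unreserved characters.
# The body runs in a subshell so the LC_ALL change does not leak to the caller.
is_valid_tag() (
  LC_ALL=C  # byte-wise ranges, so A-Z and a-z match ASCII letters only
  case "$1" in
    '' | *[!A-Za-z0-9._~-]*) exit 1 ;;  # empty, or contains a disallowed character
    *) exit 0 ;;
  esac
)

# Example usage with tags from this page and one invalid candidate.
for candidate in "0.4.1" "new_build_strategy" "new build"; do
  if is_valid_tag "$candidate"; then
    echo "valid:   $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

A check like this is only a convenience for scripts that generate tag names; refgenie itself remains the authority on what it accepts.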
RFC3986 section 2.3. Unreserved Characters: characters that are allowed in a URI include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~"
How to tag assets?
Asset tagging is very flexible. You can tag assets when you build them, add or change tags on assets that are already built, or skip tags entirely if you don't need them.
Tagging when assets are built
Here we'll demonstrate how you can specify a tag when building an asset:
export REFGENIE="genome_config.yaml"
refgenie init -c $REFGENIE
refgenie pull hg38/fasta
refgenie build hg38/bowtie2_index:2.3.5.1
or
refgenie build hg38/bowtie2_index:2.3.3.1
Tagging already built/pulled assets (re-tagging)
If you have already built an asset, you can add a tag to it. Here, we'll add the tag most_recent to our bowtie2 index asset:
refgenie tag hg38/bowtie2_index:2.3.5.1 --tag most_recent
Now you can retrieve this asset using either of those tags. In other words, an asset can have more than one tag.
No tagging at all
Importantly, you don't have to care about tags at all if you don't need them, because there is a default tag for every asset in your asset inventory. Building without specifying a tag will tag the asset as default. If you don't specify a tag when retrieving an asset path, refgenie will assume you're looking for the default-tagged asset.
refgenie build hg38/bwa_index
Default tags
If you pull or build the first asset of a given kind, it will become the default one, which refgenie will use for any actions when no tag is explicitly specified. For example, the refgenie seek hg38/bowtie2_index call would return the path to the asset tagged with most_recent, since it was the first bowtie2_index asset built/pulled for the hg38 genome.
To retrieve the path to any other asset, you need to specify the tag:
refgenie seek hg38/bowtie2_index:2.3.3.1
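When scripting around these commands, it can help to see the genome/asset:tag form taken apart. The sketch below uses plain POSIX parameter expansion; split_registry_path is a hypothetical helper for illustration only and does not reflect how refgenie parses its arguments internally. When no tag is present it prints the placeholder (default), meaning refgenie would fall back to whichever tag is currently the default.

```shell
# Hypothetical helper (not part of refgenie): splits "genome/asset[:tag]"
# into its three parts and prints them separated by spaces.
# The body runs in a subshell so its variables do not leak to the caller.
split_registry_path() (
  path="$1"
  genome="${path%%/*}"   # text before the first '/'
  rest="${path#*/}"      # text after the first '/'
  asset="${rest%%:*}"    # text before the first ':'
  case "$rest" in
    *:*) tag="${rest#*:}" ;;   # explicit tag after the first ':'
    *)   tag="(default)" ;;    # placeholder: no tag was given
  esac
  printf '%s %s %s\n' "$genome" "$asset" "$tag"
)

split_registry_path "hg38/bowtie2_index:2.3.3.1"   # prints: hg38 bowtie2_index 2.3.3.1
split_registry_path "hg38/bowtie2_index"           # prints: hg38 bowtie2_index (default)
```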
Changing the default tag
If you want to make a tag the default one, use the -d/--default option of the refgenie tag command:
refgenie tag hg38/bowtie2_index:2.3.3.1 -d