Package refgenconf
Documentation
Class RefGenConf
A sort of oracle of available reference genome assembly assets
def __init__(self, filepath=None, entries=None, writable=False, wait_max=60, skip_read_lock=False, genome_exact=False, schema_source=None)
Create the config instance by with a filepath or key-value pairs.
Parameters:
filepath
(str
): a path to the YAML file to readentries
(Iterable[(str, object)] | Mapping[str, object]
): config filepath or collection of key-value pairswritable
(bool
): whether to create the object with write capabilitieswait_max
(int
): how long to wait for creating an object when thefile that data will be read from is lockedskip_read_lock
(bool
): whether the file should not be locked forreading when object is created in read only mode
Raises:
refgenconf.MissingConfigDataError
: if a required configurationitem is missingValueError
: if entries is given as a string and is not a file
def add(self, path, genome, asset, tag=None, seek_keys=None, force=False)
Add an external asset to the config
Parameters:
path
(str
): a path to the asset to add; must exist and be relativeto the genome_foldergenome
(str
): genome nameasset
(str
): asset nametag
(str
): tag nameseek_keys
(dict
): seek keys to addforce
(bool
): whether to force existing asset overwrite
def alias_dir(self)
Path to the genome alias directory
Returns:
str
: path to the directory where the assets are stored
def assets_str(self, offset_text=' ', asset_sep=', ', genome_assets_delim='/ ', genome=None, order=None)
Create a block of text representing genome-to-asset mapping.
Parameters:
offset_text
(str
): text that begins each line of the textrepresentation that's producedasset_sep
(str
): the delimiter between names of types of assets,within each genome linegenome_assets_delim
(str
): the delimiter to place betweenreference genome assembly name and its list of asset namesgenome
(list[str] | str
): genomes that the assets should be found fororder
(function(str) -> object
): how to key genome IDs and assetnames for sort
Returns:
str
: text representing genome-to-asset mapping
def cfg_remove_assets(self, genome, asset, tag=None, relationships=True)
Remove data associated with a specified genome:asset:tag combination. If no tags are specified, the entire asset is removed from the genome.
If no more tags are defined for the selected genome:asset after tag removal, the parent asset will be removed as well If no more assets are defined for the selected genome after asset removal, the parent genome will be removed as well
Parameters:
genome
(str
): genome to be removedasset
(str
): asset package to be removedtag
(str
): tag to be removedrelationships
(bool
): whether the asset being removed shouldbe removed from its relatives as well
Returns:
RefGenConf
: updated object
Raises:
TypeError
: if genome argument type is not a list or str
def cfg_tag_asset(self, genome, asset, tag, new_tag, force=False)
Retags the asset selected by the tag with the new_tag. Prompts if default already exists and overrides upon confirmation.
This method does not override the original asset entry in the RefGenConf object. It creates its copy and tags it with the new_tag. Additionally, if the retagged asset has any children their parent will be retagged as new_tag that was introduced upon this method execution.
Parameters:
genome
(str
): name of a reference genome assembly of interestasset
(str
): name of particular asset of interesttag
(str
): name of the tag that identifies the asset of interestnew_tag
(str
): name of particular the new tagforce
(bool
): force any actions that require approval
Returns:
bool
: a logical indicating whether the tagging was successful
Raises:
ValueError
: when the original tag is not specified
def chk_digest_update_child(self, genome, remote_asset_name, child_name, server_url)
Check local asset digest against the remote one and populate children of the asset with the provided asset:tag.
In case the local asset does not exist, the config is populated with the remote asset digest and children data
Parameters:
genome
(str
): name of the genome to check the asset digests forremote_asset_name
(str
): tagchild_name
(str
): name to be appended to the children of the parentserver_url
(str
): address of the server to query for the digests
Raises:
RefgenconfError
: if the local digest does not match its remote counterpart
def compare(self, genome1, genome2, explain=False)
Check genomes compatibility level. Compares Annotated Sequence Digests (ASDs) -- digested sequences and metadata
Parameters:
genome1
(str
): name of the first genome to comparegenome2
(str
): name of the first genome to compareexplain
(bool
): whether the returned code explanation shouldbe displayed
Returns:
int
: compatibility code
def data_dir(self)
Path to the genome data directory
Returns:
str
: path to the directory where the assets are stored
def file_path(self)
Path to the genome configuration file
Returns:
str
: path to the genome configuration file
def filepath(self, genome, asset, tag, ext='.tgz', dir=False)
Determine path to a particular asset for a particular genome.
Parameters:
genome
(str
): reference genome IDasset
(str
): asset nametag
(str
): tag nameext
(str
): file extensiondir
(bool
): whether to return the enclosing directory instead of the file
Returns:
str
: path to asset for given genome and asset kind/name
def genome_aliases(self)
Mapping of human-readable genome identifiers to genome identifiers
Returns:
dict
: mapping of human-readable genome identifiers to genomeidentifiers
def genome_aliases_table(self)
Mapping of human-readable genome identifiers to genome identifiers
Returns:
dict
: mapping of human-readable genome identifiers to genomeidentifiers
def genomes_list(self, order=None)
Get a list of this configuration's reference genome assembly IDs.
Returns:
Iterable[str]
: list of this configuration's reference genomeassembly IDs
def genomes_str(self, order=None)
Get as single string this configuration's reference genome assembly IDs.
Parameters:
order
(function(str) -> object
): how to key genome IDs for sort
Returns:
str
: single string that lists this configuration's knownreference genome assembly IDs
def get_asds_path(self, genome)
Get path to the Annotated Sequence Digests JSON file for a given genome. Note that the path and/or genome may not exist.
Parameters:
genome
(str
): genome name
Returns:
str
: ASDs path
def get_asset_table(self, genomes=None, server_url=None, get_json_url=<function RefGenConf.<lambda> at 0x7faab748b060>)
Get a rich.Table object representing assets available locally
Parameters:
genomes
(list[str]
): genomes to restrict the results withserver_url
(str
): server URL to query for the remote genome dataget_json_url
(function(str, str) -> str
): how to build URL fromgenome server URL base, genome, and asset
Returns:
rich.table.Table
: table of assets available locally
def get_default_tag(self, genome, asset, use_existing=True)
Determine the asset tag to use as default. The one indicated by the 'default_tag' key in the asset section is returned. If no 'default_tag' key is found, by default the first listed tag is returned with a RuntimeWarning. This behavior can be turned off with use_existing=False
Parameters:
genome
(str
): name of a reference genome assembly of interestasset
(str
): name of the particular asset of interestuse_existing
(bool
): whether the first tag in the config should bereturned in case there is no default tag defined for an asset
Returns:
str
: name of the tag to use as the default one
def get_genome_alias(self, digest, fallback=False, all_aliases=False)
Get the human readable alias for a genome digest
Parameters:
digest
(str
): digest to find human-readable alias forfallback
(bool
): whether to return the query digest in caseof failureall_aliases
(bool
): whether to return all aliases instead of justthe first one
Returns:
str | list[str]
: human-readable aliases
Raises:
GenomeConfigFormatError
: if "genome_digests" section doesnot exist in the configUndefinedAliasError
: if a no alias has been defined for therequested digest
def get_genome_alias_digest(self, alias, fallback=False)
Get the human readable alias for a genome digest
Parameters:
alias
(str
): alias to find digest forfallback
(bool
): whether to return the query alias in caseof failure and in case it is one of the digests
Returns:
str
: genome digest
Raises:
UndefinedAliasError
: if the specified alias has been assigned toany digests
def get_genome_attributes(self, genome)
Get the dictionary attributes, like checksum, contents, description. Does not return the assets.
Parameters:
genome
(str
): genome to get the attributes dict for
Returns:
Mapping[str, str]
: available genome attributes
def get_local_data_str(self, genome=None, order=None)
List locally available reference genome IDs and assets by ID.
Parameters:
genome
(list[str] | str
): genomes that the assets should be found fororder
(function(str) -> object
): how to key genome IDs and assetnames for sort
Returns:
str, str
: text reps of locally available genomes and assets
def get_remote_data_str(self, genome=None, order=None, get_url=<function RefGenConf.<lambda> at 0x7faab748bec0>)
List genomes and assets available remotely.
Parameters:
get_url
(function(serverUrl, operationId) -> str
): how to determineURL request, given server URL and endpoint operationIDgenome
(list[str] | str
): genomes that the assets should be found fororder
(function(str) -> object
): how to key genome IDs and assetnames for sort
Returns:
str, str
: text reps of remotely available genomes and assets
def get_symlink_paths(self, genome, asset=None, tag=None, all_aliases=False)
Get path to the alias directory for the selected genome-asset-tag
Parameters:
genome
(str
): reference genome IDasset
(str
): asset nametag
(str
): tag nameall_aliases
(bool
): whether to return a collection of symboliclinks or just the first one from the alias list
Returns:
dict
:
def getseq(self, genome, locus, as_str=False)
Return the sequence found in a selected range and chromosome. Something like the refget protocol.
Parameters:
genome
(str
): name of the sequence identifierlocus
(str
): 1-10'as_str
(bool
): whether to convert the resurned object to stringand return just the sequence
Returns:
str | pyfaidx.FastaRecord | pyfaidx.Sequence
: selected sequence
def id(self, genome, asset, tag=None)
Returns the digest for the specified asset. The defined default tag will be used if not provided as an argument
Parameters:
genome
(str
): genome identifierasset
(str
): asset identifiertag
(str
): tag identifier
Returns:
str
: asset digest for the tag
def initialize_config_file(self, filepath=None)
Initialize genome configuration file on disk
Parameters:
filepath
(str
): a valid path where the configuration file should be initialized
Returns:
str
: the filepath the file was initialized at
Raises:
OSError
: in case the file could not be initialized due to insufficient permissions or pre-existenceTypeError
: if no valid filepath cat be determined
def initialize_genome(self, fasta_path, alias, fasta_unzipped=False, skip_alias_write=False)
Initialize a genome
Create a JSON file with Annotated Sequence Digests (ASDs) for the FASTA file in the genome directory.
Parameters:
fasta_path
(str
): path to a FASTA file to initialize genome withalias
(str
): alias to set for the genomeskip_alias_write
(bool
): whether to skip writing the alias to the file
Returns:
str, list[dict[]]
: human-readable name for the genome
def is_asset_complete(self, genome, asset, tag)
Check whether all required tag attributes are defined in the RefGenConf object. This is the way we determine tag completeness.
Parameters:
genome
(str
): genome to be checkedasset
(str
): asset package to be checkedtag
(str
): tag to be checked
Returns:
bool
: the decision
def list(self, genome=None, order=None, include_tags=False)
List local assets; map each namespace to a list of available asset names
Parameters:
order
(callable(str) -> object
): how to key genome IDs for sortgenome
(list[str] | str
): genomes that the assets should be found forinclude_tags
(bool
): whether asset tags should be included in the returned dict
Returns:
Mapping[str, Iterable[str]]
: mapping from assembly name tocollection of available asset names.
def list_assets_by_genome(self, genome=None, order=None, include_tags=False)
List types/names of assets that are available for one--or all--genomes.
Parameters:
genome
(str | NoneType
): reference genome assembly ID, optional;if omitted, the full mapping from genome to asset namesorder
(function(str) -> object
): how to key genome IDs and assetnames for sortinclude_tags
(bool
): whether asset tags should be included in thereturned dict
Returns:
Iterable[str] | Mapping[str, Iterable[str]]
: collection ofasset type names available for particular reference assembly if one is provided, else the full mapping between assembly ID and collection available asset type names
def list_genomes_by_asset(self, asset=None, order=None)
List assemblies for which a particular asset is available.
Parameters:
asset
(str | NoneType
): name of type of asset of interest, optionalorder
(function(str) -> object
): how to key genome IDs and assetnames for sort
Returns:
Iterable[str] | Mapping[str, Iterable[str]]
: collection ofassemblies for which the given asset is available; if asset argument is omitted, the full mapping from name of asset type to collection of assembly names for which the asset key is available will be returned.
def list_seek_keys_values(self, genomes=None, assets=None)
List values for all seek keys for the specified genome and asset. Leave the arguments out to get all seek keys values managed by refgenie.
Parameters:
genome_names
(str | List[str]
): optional list of genomes to includeasset_names
(str | List[str]
): optional list of assets to include
Returns:
dict
: a nested dictionary with the seek key values
def listr(self, genome=None, get_url=<function RefGenConf.<lambda> at 0x7faab7490040>, as_digests=False)
List genomes and assets available remotely on all servers the object subscribes to
Parameters:
get_url
(function(serverUrl, operationId) -> str
): how to determineURL request, given server URL and endpoint operationIDgenome
(list[str] | str
): genomes that the assets should be found fororder
(function(str) -> object
): how to key genome IDs and assetnames for sort
Returns:
dict[OrderedDict[list]]
: remotely available genomes and assetskeyed by genome keyed by source server endpoint
def plugins(self)
Plugins registered by entry points in the current Python env
Returns:
dict[dict[function(refgenconf.RefGenConf)]]
: dict which keysare names of all possible hooks and values are dicts mapping registered functions names to their values
def populate(self, glob)
Populates local refgenie references from refgenie://genome/asset.seek_key:tag registry paths
Parameters:
glob
(dict | str | list
): String which may contain refgenie registry paths asvalues; or a dict, for which values may contain refgenie registry paths. Dict include nested dicts.
Returns:
dict | str | list
: modified input dict with refgenie paths populated
def populater(self, glob, remote_class=None)
Populates remote refgenie references from refgenie://genome/asset:tag registry paths
Parameters:
glob
(dict | str | list
): String which may contain refgenie registry paths asvalues; or a dict, for which values may contain refgenie registry paths. Dict include nested dicts.remote_class
(str
): remote data provider class, e.g. 'http' or 's3'
Returns:
dict | str | list
: modified input dict with refgenie paths populated
def pull(self, genome, asset, tag, unpack=True, force=None, force_large=None, size_cutoff=10, get_json_url=<function RefGenConf.<lambda> at 0x7faab7490360>, build_signal_handler=<function _handle_sigint at 0x7faab7661f80>)
Download and possibly unpack one or more assets for a given ref gen.
Parameters:
genome
(str
): name of a reference genome assembly of interestasset
(str
): name of particular asset to fetchtag
(str
): name of particular tag to fetchunpack
(bool
): whether to unpack a tarballforce
(bool | NoneType
): how to handle case in which asset pathalready exists; null for prompt (on a per-asset basis), False to effectively auto-reply No to the prompt to replace existing file, and True to auto-replay Yes for existing asset replacement.force_large
(bool | NoneType
): how to handle case in large (> 5GB)asset is to be pulled; null for prompt (on a per-asset basis), False to effectively auto-reply No to the prompt, and True to auto-replay Yessize_cutoff
(float
): maximum archive file size to download withno promptget_json_url
(function(str, str) -> str
): how to build URL fromgenome server URL base, genome, and assetbuild_signal_handler
(function(str) -> function
): how to createa signal handler to use during the download; the single argument to this function factory is the download filepath
Returns:
(list[str], dict, str)
: a list of genome, asset, tag namesand a key-value pair with which genome config file should be updated if pull succeeds, else asset key and a null value
Raises:
refgenconf.UnboundEnvironmentVariablesError
: if genome folderpath contains any env. var. that's unboundrefgenconf.RefGenConfError
: if the object update is requested ina non-writable state
def remove(self, genome, asset, tag=None, relationships=True, files=True, force=False)
Remove data associated with a specified genome:asset:tag combination. If no tags are specified, the entire asset is removed from the genome.
If no more tags are defined for the selected genome:asset after tag removal, the parent asset will be removed as well If no more assets are defined for the selected genome after asset removal, the parent genome will be removed as well
Parameters:
genome
(str
): genome to be removedasset
(str
): asset package to be removedtag
(str
): tag to be removedrelationships
(bool
): whether the asset being removed shouldbe removed from its relatives as wellfiles
(bool
): whether the asset files from disk should be removedforce
(bool
): whether the removal prompts should be skipped
Returns:
RefGenConf
: updated object
Raises:
TypeError
: if genome argument type is not a list or str
def remove_asset_from_relatives(self, genome, asset, tag)
Remove any relationship links associated with the selected asset
Parameters:
genome
(str
): genome to be removed from its relatives' relatives listasset
(str
): asset to be removed from its relatives' relatives listtag
(str
): tag to be removed from its relatives' relatives list
def remove_genome_aliases(self, digest, aliases=None)
Remove alias for a specified genome digest. This method will remove the digest both from the genomes object and from the aliases mapping in tbe config
Parameters:
digest
(str
): genome digest to remove an alias foraliases
(list[str]
): a collection to aliases to remove for thegenome. If not provided, all aliases for the digest will be remove
Returns:
bool
: whether the removal has been performed
def run_plugins(self, hook)
Runs all installed plugins for the specified hook.
Parameters:
hook
(str
): hook identifier
def seek(self, genome_name, asset_name, tag_name=None, seek_key=None, strict_exists=None, enclosing_dir=False, all_aliases=False, check_exist=<function RefGenConf.<lambda> at 0x7faab748b740>)
Seek path to a specified genome-asset-tag alias
Parameters:
genome_name
(str
): name of a reference genome assembly of interestasset_name
(str
): name of the particular asset to fetchtag_name
(str
): name of the particular asset tag to fetchseek_key
(str
): name of the particular subasset to fetchstrict_exists
(bool | NoneType
): how to handle case in whichpath doesn't exist; True to raise IOError, False to raise RuntimeWarning, and None to do nothing at all. Default: None (do not check).check_exist
(function(callable) -> bool
): how to check forasset/path existenceenclosing_dir
(bool
): whether a path to the entire enclosingdirectory should be returned, e.g. for a fasta asset that has 3 seek_keys pointing to 3 files in an asset dir, that asset dir is returnedall_aliases
(bool
): whether to return paths to all asset aliases orjust the one for the specified 'genome_name` argument
Returns:
str
: path to the asset
Raises:
TypeError
: if the existence check is not a one-arg functionrefgenconf.MissingGenomeError
: if the named assembly isn't knownto this configuration instancerefgenconf.MissingAssetError
: if the names assembly is known tothis configuration instance, but the requested asset is unknown
def seek_src(self, genome_name, asset_name, tag_name=None, seek_key=None, strict_exists=None, enclosing_dir=False, check_exist=<function RefGenConf.<lambda> at 0x7faab748b9c0>)
Seek path to a specified genome-asset-tag
Parameters:
genome_name
(str
): name of a reference genome assembly of interestasset_name
(str
): name of the particular asset to fetchtag_name
(str
): name of the particular asset tag to fetchseek_key
(str
): name of the particular subasset to fetchstrict_exists
(bool | NoneType
): how to handle case in whichpath doesn't exist; True to raise IOError, False to raise RuntimeWarning, and None to do nothing at all. Default: None (do not check).check_exist
(function(callable) -> bool
): how to check forasset/path existenceenclosing_dir
(bool
): whether a path to the entire enclosingdirectory should be returned, e.g. for a fasta asset that has 3 seek_keys pointing to 3 files in an asset dir, that asset dir is returned
Returns:
str
: path to the asset
Raises:
TypeError
: if the existence check is not a one-arg functionrefgenconf.MissingGenomeError
: if the named assembly isn't knownto this configuration instancerefgenconf.MissingAssetError
: if the names assembly is known tothis configuration instance, but the requested asset is unknown
def seekr(self, genome_name, asset_name, tag_name=None, seek_key=None, remote_class='http', get_url=<function RefGenConf.<lambda> at 0x7faab748b880>)
Seek a remote path to a specified genome/asset.seek_key:tag
Parameters:
genome_name
(str
): name of a reference genome assembly of interestasset_name
(str
): name of the particular asset to fetchtag_name
(str
): name of the particular asset tag to fetchseek_key
(str
): name of the particular subasset to fetchremote_class
(str
): remote data provider class, e.g. 'http' or 's3'get_url
(function(serverUrl, operationId) -> str
): how to determineURL request, given server URL and endpoint operationID
Returns:
str
: path to the asset
def set_default_pointer(self, genome, asset, tag, force_exists=False, force_digest=None, force_fasta=False)
Point to the selected tag by default
Parameters:
genome
(str
): name of a reference genome assembly of interestasset
(str
): name of the particular asset of interesttag
(str
): name of the particular asset tag to point to by defaultforce_digest
(str
): digest to force update of. The alias willnot be converted to the digest, even if provided.force_fasta
(bool
): whether setting a default tag for a fasta assetshould be forced. Beware: This could lead to genome identity issuesforce_exists
(bool
): whether the default tag change should beforced (even if it exists)
def set_genome_alias(self, genome, digest=None, servers=None, overwrite=False, reset_digest=False, create_genome=False, no_write=False, get_json_url=<function RefGenConf.<lambda> at 0x7faab7490680>)
Assign a human-readable alias to a genome identifier.
Genomes are identified by a unique identifier which is derived from the FASTA file (part of fasta asset). This way we can ensure genome provenance and compatibility with the server. This function maps a human-readable identifier to make referring to the genomes easier.
Parameters:
genome
(str
): name of the genome to assign to an identifierdigest
(str
): identifier to useoverwrite
(bool
): whether all the previously set aliases should beremoved and just the current one storedno_write
(bool
): whether to skip writing the alias to the file
Returns:
bool
: whether the alias has been established
def subscribe(self, urls, reset=False, no_write=False)
Add URLs the list of genome_servers.
Use reset argument to overwrite the current list. Otherwise the current one will be appended to.
Parameters:
urls
(list[str] | str
): urls to update the genome_servers list withreset
(bool
): whether the current list should be overwritten
def tag(self, genome, asset, tag, new_tag, files=True, force=False)
Retags the asset selected by the tag with the new_tag. Prompts if default already exists and overrides upon confirmation.
This method does not override the original asset entry in the RefGenConf object. It creates its copy and tags it with the new_tag. Additionally, if the retagged asset has any children their parent will be retagged as new_tag that was introduced upon this method execution. By default, the files on disk will be also renamed to reflect the genome configuration file changes
Parameters:
genome
(str
): name of a reference genome assembly of interestasset
(str
): name of particular asset of interesttag
(str
): name of the tag that identifies the asset of interestnew_tag
(str
): name of particular the new tagfiles
(bool
): whether the asset files on disk should be renamed
Returns:
bool
: a logical indicating whether the tagging was successful
Raises:
ValueError
: when the original tag is not specified
def unsubscribe(self, urls, no_write=False)
Remove URLs the list of genome_servers.
Parameters:
urls
(list[str] | str
): urls to update the genome_servers list with
def update_assets(self, genome, asset=None, data=None, force_digest=None)
Updates the genomes in RefGenConf object at any level. If a requested genome-asset mapping is missing, it will be created
Parameters:
genome
(str
): genome to be added/updatedasset
(str
): asset to be added/updatedforce_digest
(str
): digest to force update of. The alias willnot be converted to the digest, even if provided.data
(Mapping
): data to be added/updated
Returns:
RefGenConf
: updated object
def update_genomes(self, genome, data=None, force_digest=None)
Updates the genomes in RefGenConf object at any level. If a requested genome is missing, it will be added
Parameters:
genome
(str
): genome to be added/updatedforce_digest
(str
): digest to force update of. The alias willnot be converted to the digest, even if provided.data
(Mapping
): data to be added/updated
Returns:
RefGenConf
: updated object
def update_relatives_assets(self, genome, asset, tag=None, data=None, children=False)
A convenience method which wraps the update assets and uses it to update the asset relatives of an asset.
Parameters:
genome
(str
): genome to be added/updatedasset
(str
): asset to be added/updatedtag
(str
): tag to be added/updateddata
(list
): asset parents or children to be added/updatedchildren
(bool
): a logical indicating whether the relationship to beadded is 'children'
Returns:
RefGenConf
: updated object
def update_seek_keys(self, genome, asset, tag=None, keys=None, force_digest=None)
A convenience method which wraps the updated assets and uses it to update the seek keys for a tagged asset.
Parameters:
genome
(str
): genome to be added/updatedasset
(str
): asset to be added/updatedtag
(str
): tag to be added/updatedforce_digest
(str
): digest to force update of. The alias willnot be converted to the digest, even if provided.keys
(Mapping
): seek_keys to be added/updated
Returns:
RefGenConf
: updated object
def update_tags(self, genome, asset=None, tag=None, data=None, force_digest=None)
Updates the genomes in RefGenConf object at any level. If a requested genome-asset-tag mapping is missing, it will be created
Parameters:
genome
(str
): genome to be added/updatedasset
(str
): asset to be added/updatedtag
(str
): tag to be added/updatedforce_digest
(str
): digest to force update of. The alias willnot be converted to the digest, even if provided.data
(Mapping
): data to be added/updated
Returns:
RefGenConf
: updated object
def writable(self)
Return writability flag or None if not set
Returns:
bool | None
: whether the object is writable now
def write(self, filepath=None)
Write the contents to a file. If pre- and post-update plugins are defined, they will be executed automatically
Parameters:
filepath
(str
): a file path to write to
Returns:
str
: the path to the created files
Raises:
OSError
: when the object has been created in a read only mode or otherprocess has locked the fileTypeError
: when the filepath cannot be determined.This takes place only if YacAttMap initialized with a Mapping as an input, not read from file.OSError
: when the write is called on an object with no write capabilitiesor when writing to a file that is locked by a different object
Class GenomeConfigFormatError
Exception for invalid genome config file format.
def __init__(self, msg)
Initialize self. See help(type(self)) for accurate signature.
Class MissingAssetError
Error type for request of an unavailable genome asset.
Class MissingConfigDataError
Missing required configuration instance items
Class MissingGenomeError
Error type for request of unknown genome/assembly.
Class RefgenconfError
Base exception type for this package
Class UnboundEnvironmentVariablesError
Use of environment variable that isn't bound to a value.
def select_genome_config(filename=None, conf_env_vars=['REFGENIE'], **kwargs)
Get path to genome configuration file.
Parameters:
filename
(str
): name/path of genome configuration fileconf_env_vars
(Iterable[str]
): names of environment variables toconsider; basically, a prioritized search list
Returns:
str
: path to genome configuration file
def get_dir_digest(path, pm=None)
Generate a MD5 digest that reflects just the contents of the files in the selected directory.
Parameters:
path
(str
): path to the directory to digestpm
(pypiper.PipelineManager
): a pipeline object, optional.The subprocess module will be used if not provided
Returns:
str
: a digest, e.g. a3c46f201a3ce7831d85cf4a125aa334
def looper_refgenie_populate(namespaces)
A looper plugin that populates refgenie references in a PEP from refgenie://genome/asset:tag registry paths. This can be used to convert all refgenie references into their local paths at the looper stage, so the final paths are passed to the workflow. This way the workflow does not need to depend on refgenie to resolve the paths. This is useful for example for CWL pipelines, which are built to have paths resolved outside the workflow.
The namespaces structure required to run the plugin is:
namespaces["pipeline"]["var_templates"]["refgenie_config"]
Parameters:
namespaces
(Mapping
): a nested variable namespaces dict
Returns:
dict
: sample namespace dict
Raises:
TypeError
: if the input namespaces is not a mappingKeyError
: if the namespaces mapping does not include 'pipeline'NotImplementedError
: if 'var_templates' key is missing in the 'pipeline' namespace or'refgenie_config' is missing in 'var_templates' section.
Version Information: refgenconf
v0.12.2, generated by lucidoc
v0.4.4