RefGenConf Python API

Package Overview

The refgenconf package provides a Python interface for managing reference genome resources. It offers a centralized configuration object (RefGenConf) that handles local and remote genome assets, enabling seamless integration with bioinformatics pipelines.

Key Features

  • Asset Management: Download, build, and organize reference genome assets
  • Path Resolution: Retrieve paths to genome resources without hardcoding
  • Remote Servers: Connect to refgenie servers to pull pre-built assets
  • Aliases: Use human-readable genome names instead of digests
  • Seek Keys: Access specific sub-assets within larger asset packages

Installation

pip install refgenconf

Quick Example

import refgenconf

# Initialize with a genome configuration file
rgc = refgenconf.RefGenConf("genome_config.yaml")

# Get path to a genome asset
bowtie2_index = rgc.seek("hg38", "bowtie2_index")

API Reference

RefGenConf Class

The main class for interacting with refgenie-managed assets:

RefGenConf

RefGenConf(filepath=None, entries=None, writable=False, wait_max=60, skip_read_lock=False, genome_exact=False, schema_source=None)

Bases: YacAttMap

A sort of oracle of available reference genome assembly assets

Create the config instance with a filepath or key-value pairs.

Parameters:

  • filepath (str, default None): a path to the YAML file to read
  • entries (Iterable[(str, object)] | Mapping[str, object], default None): config filepath or collection of key-value pairs
  • writable (bool, default False): whether to create the object with write capabilities
  • wait_max (int, default 60): how long to wait for creating an object when the file that data will be read from is locked
  • skip_read_lock (bool, default False): whether the file should not be locked for reading when object is created in read only mode

Raises:

  • refgenconf.MissingConfigDataError: if a required configuration item is missing
  • ValueError: if entries is given as a string and is not a file

Source code in refgenconf/refgenconf.py
def __init__(
    self,
    filepath=None,
    entries=None,
    writable=False,
    wait_max=60,
    skip_read_lock=False,
    genome_exact=False,
    schema_source=None,
):
    """
    Create the config instance with a filepath or key-value pairs.

    :param str filepath: a path to the YAML file to read
    :param Iterable[(str, object)] | Mapping[str, object] entries:
        config filepath or collection of key-value pairs
    :param bool writable: whether to create the object with write capabilities
    :param int wait_max: how long to wait for creating an object when the
        file that data will be read from is locked
    :param bool skip_read_lock: whether the file should not be locked for
        reading when object is created in read only mode
    :raise refgenconf.MissingConfigDataError: if a required configuration
        item is missing
    :raise ValueError: if entries is given as a string and is not a file
    """

    def _missing_key_msg(key, value):
        _LOGGER.debug("Config lacks '{}' key. Setting to: {}".format(key, value))

    super(RefGenConf, self).__init__(
        filepath=filepath,
        entries=entries,
        writable=writable,
        wait_max=wait_max,
        skip_read_lock=skip_read_lock,
        schema_source=schema_source or DEFAULT_CONFIG_SCHEMA,
        write_validate=True,
    )
    # assert correct config version
    try:
        version = self[CFG_VERSION_KEY]
    except KeyError:
        _missing_key_msg(CFG_VERSION_KEY, REQ_CFG_VERSION)
        self[CFG_VERSION_KEY] = REQ_CFG_VERSION
    else:
        try:
            version = float(version)
        except ValueError:
            _LOGGER.warning(
                "Cannot parse config version as numeric: {}".format(version)
            )
        else:
            if version < REQ_CFG_VERSION:
                msg = (
                    "This genome config (v{}) is not compliant with v{} standards. \n"
                    "To use current refgenconf, please use upgrade_config function to upgrade, or "
                    "downgrade refgenconf: 'pip install \"refgenconf>={},<{}\"'. \n"
                    "If refgenie is installed, you can use 'refgenie upgrade --target-version {}'".format(
                        self[CFG_VERSION_KEY],
                        str(REQ_CFG_VERSION),
                        REFGENIE_BY_CFG[str(version)],
                        REFGENIE_BY_CFG[str(REQ_CFG_VERSION)],
                        str(REQ_CFG_VERSION),
                    )
                )
                raise ConfigNotCompliantError(msg)

            else:
                _LOGGER.debug("Config version is compliant: {}".format(version))

    # initialize "genomes_folder"
    if CFG_FOLDER_KEY not in self:
        self[CFG_FOLDER_KEY] = (
            os.path.dirname(filepath) if filepath else os.getcwd()
        )
        _missing_key_msg(CFG_FOLDER_KEY, self[CFG_FOLDER_KEY])
    # initialize "genome_servers"
    if CFG_SERVERS_KEY not in self and CFG_SERVER_KEY in self:
        # backwards compatibility after server config key change
        self[CFG_SERVERS_KEY] = self[CFG_SERVER_KEY]
        del self[CFG_SERVER_KEY]
        _LOGGER.debug(
            f"Moved servers list from '{CFG_SERVER_KEY}' to '{CFG_SERVERS_KEY}'"
        )
    try:
        if isinstance(self[CFG_SERVERS_KEY], list):
            tmp_list = [
                server_url.rstrip("/") for server_url in self[CFG_SERVERS_KEY]
            ]
            self[CFG_SERVERS_KEY] = tmp_list
        else:  # Logic in pull_asset expects a list, even for a single server
            self[CFG_SERVERS_KEY] = self[CFG_SERVERS_KEY].rstrip("/")
            self[CFG_SERVERS_KEY] = [self[CFG_SERVERS_KEY]]
    except KeyError:
        _missing_key_msg(CFG_SERVERS_KEY, str([DEFAULT_SERVER]))
        self[CFG_SERVERS_KEY] = [DEFAULT_SERVER]

    # initialize "genomes" mapping
    if CFG_GENOMES_KEY in self:
        if not isinstance(self[CFG_GENOMES_KEY], PXAM):
            if self[CFG_GENOMES_KEY]:
                _LOGGER.warning(
                    "'{k}' value is a {t_old}, not a {t_new}; setting to empty {t_new}".format(
                        k=CFG_GENOMES_KEY,
                        t_old=type(self[CFG_GENOMES_KEY]).__name__,
                        t_new=PXAM.__name__,
                    )
                )
            self[CFG_GENOMES_KEY] = PXAM()
    else:
        self[CFG_GENOMES_KEY] = PXAM()

    self[CFG_GENOMES_KEY] = yacman.AliasedYacAttMap(
        entries=self[CFG_GENOMES_KEY],
        aliases=lambda x: {k: v.__getitem__(CFG_ALIASES_KEY) for k, v in x.items()},
        aliases_strict=True,
        exact=genome_exact,
    )
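
The server handling near the end of the constructor can be illustrated in isolation. The following is a minimal sketch, not the library's code: the function name `normalize_servers`, the literal `genome_servers` key, and the `https://example.org` default are assumptions made for illustration, and a plain dict stands in for the config object.

```python
def normalize_servers(config, servers_key="genome_servers",
                      default_server="https://example.org"):
    """Return the configured servers as a list of URLs without trailing slashes.

    `config` is a plain dict here (an assumption); the default URL is a placeholder.
    """
    try:
        value = config[servers_key]
    except KeyError:
        # No servers configured: fall back to a single default server.
        return [default_server]
    if isinstance(value, list):
        return [url.rstrip("/") for url in value]
    # A single URL string is wrapped in a list so callers can always iterate.
    return [value.rstrip("/")]


single = normalize_servers({"genome_servers": "http://a.example/"})
several = normalize_servers({"genome_servers": ["http://a.example/", "http://b.example"]})
fallback = normalize_servers({})
```

Always returning a list keeps downstream code free of special cases for the single-server configuration.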

alias_dir property

alias_dir

Path to the genome alias directory

Returns:

  • path to the directory where the assets are stored

data_dir property

data_dir

Path to the genome data directory

Returns:

  • path to the directory where the assets are stored

file_path property

file_path

Path to the genome configuration file

Returns:

  • path to the genome configuration file

genome_aliases property

genome_aliases

Mapping of human-readable genome identifiers to genome identifiers

Returns:

  • mapping of human-readable genome identifiers to genome identifiers

genome_aliases_table property

genome_aliases_table

Mapping of human-readable genome identifiers to genome identifiers

Returns:

  • mapping of human-readable genome identifiers to genome identifiers

plugins property

plugins

Plugins registered by entry points in the current Python env

Returns:

  • dict whose keys are the names of all possible hooks and whose values are dicts mapping registered function names to their values

add

add(path, genome, asset, tag=None, seek_keys=None, force=False)

Add an external asset to the config

Parameters:

  • path (str, required): a path to the asset to add; must exist and be relative to the genome_folder
  • genome (str, required): genome name
  • asset (str, required): asset name
  • tag (str, default None): tag name
  • seek_keys (dict, default None): seek keys to add
  • force (bool, default False): whether to force existing asset overwrite

Source code in refgenconf/refgenconf.py
def add(self, path, genome, asset, tag=None, seek_keys=None, force=False):
    """
    Add an external asset to the config

    :param str path: a path to the asset to add; must exist and be relative
        to the genome_folder
    :param str genome: genome name
    :param str asset: asset name
    :param str tag: tag name
    :param dict seek_keys: seek keys to add
    :param bool force: whether to force existing asset overwrite
    """
    try:
        genome = self.get_genome_alias_digest(alias=genome, fallback=True)
    except yacman.UndefinedAliasError:
        _LOGGER.error(
            "No digest defined for '{}'. Set an alias or pull an"
            " asset to initialize.".format(genome)
        )
        return False
    tag = tag or self.get_default_tag(genome, asset)
    abspath = os.path.join(self[CFG_FOLDER_KEY], path)
    remove = False
    if not os.path.exists(abspath) or not os.path.isabs(abspath):
        raise OSError(
            "Provided path must exist and be relative to the"
            " genome_folder: {}".format(self[CFG_FOLDER_KEY])
        )
    try:
        _assert_gat_exists(self[CFG_GENOMES_KEY], genome, asset, tag)
    except Exception:
        pass
    else:
        if not force and not query_yes_no(
            "'{}/{}:{}' exists. Do you want to overwrite?".format(
                genome, asset, tag
            )
        ):
            _LOGGER.info("Aborted by a user, asset not added")
            return False
        remove = True
        _LOGGER.info("Will remove existing to overwrite")
    tag_data = {
        CFG_ASSET_PATH_KEY: path,
        CFG_ASSET_CHECKSUM_KEY: get_dir_digest(abspath) or "",
    }
    msg = "Added asset: {}/{}:{} {}".format(
        genome,
        asset,
        tag,
        "" if not seek_keys else "with seek keys: {}".format(seek_keys),
    )
    if not self.file_path:
        if remove:
            self.cfg_remove_assets(genome, asset, tag)
        self.update_tags(genome, asset, tag, tag_data)
        self.update_seek_keys(genome, asset, tag, seek_keys or {asset: "."})
        self.set_default_pointer(genome, asset, tag)
        _LOGGER.info(msg)
    else:
        with self as rgc:
            if remove:
                rgc.cfg_remove_assets(genome, asset, tag)
            rgc.update_tags(genome, asset, tag, tag_data)
            rgc.update_seek_keys(genome, asset, tag, seek_keys or {asset: "."})
            rgc.set_default_pointer(genome, asset, tag)
            _LOGGER.info(msg)
    self._symlink_alias(genome, asset, tag)
    return True
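
The path check performed before an asset is registered can be sketched on its own. This is an illustrative stand-in under stated assumptions, not the library's implementation: `resolve_asset_path` is a hypothetical helper, and the directory layout is created in a temporary folder purely for the demonstration.

```python
import os
import tempfile


def resolve_asset_path(genome_folder, rel_path):
    """Join rel_path onto genome_folder and verify the result.

    Raises OSError when the joined path does not exist or is not absolute,
    mirroring the condition shown in the listing above.
    """
    abspath = os.path.join(genome_folder, rel_path)
    if not os.path.exists(abspath) or not os.path.isabs(abspath):
        raise OSError(
            "Provided path must exist and be relative to the "
            "genome_folder: {}".format(genome_folder)
        )
    return abspath


# Demonstration in a throwaway directory (hypothetical layout).
with tempfile.TemporaryDirectory() as genome_folder:
    rel = os.path.join("hg38", "my_asset")
    os.makedirs(os.path.join(genome_folder, rel))
    resolved_ok = resolve_asset_path(genome_folder, rel) == os.path.join(genome_folder, rel)
```

Because the stored value is the relative path, the config stays valid if the whole genome folder is moved.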

assets_str

assets_str(offset_text='  ', asset_sep=', ', genome_assets_delim='/ ', genome=None, order=None)

Create a block of text representing genome-to-asset mapping.

Parameters:

  • offset_text (str, default '  '): text that begins each line of the text representation that's produced
  • asset_sep (str, default ', '): the delimiter between names of types of assets, within each genome line
  • genome_assets_delim (str, default '/ '): the delimiter to place between reference genome assembly name and its list of asset names
  • genome (list[str] | str, default None): genomes that the assets should be found for
  • order (function(str) -> object, default None): how to key genome IDs and asset names for sort

Returns:

  • str: text representing genome-to-asset mapping

Source code in refgenconf/refgenconf.py
def assets_str(
    self,
    offset_text="  ",
    asset_sep=", ",
    genome_assets_delim="/ ",
    genome=None,
    order=None,
):
    """
    Create a block of text representing genome-to-asset mapping.

    :param str offset_text: text that begins each line of the text
        representation that's produced
    :param str asset_sep: the delimiter between names of types of assets,
        within each genome line
    :param str genome_assets_delim: the delimiter to place between
        reference genome assembly name and its list of asset names
    :param list[str] | str genome: genomes that the assets should be found for
    :param function(str) -> object order: how to key genome IDs and asset
        names for sort
    :return str: text representing genome-to-asset mapping
    """
    refgens = self._select_genomes(genome=genome, order=order)
    make_line = partial(
        _make_genome_assets_line,
        offset_text=offset_text,
        genome_assets_delim=genome_assets_delim,
        asset_sep=asset_sep,
        order=order,
        rjust=max(map(len, refgens), default=0) + 2,
    )
    return "\n".join(
        [make_line(g, self[CFG_GENOMES_KEY][g][CFG_ASSETS_KEY]) for g in refgens]
    )
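
To make the role of each delimiter concrete, here is a small standalone sketch of how such a text block could be assembled. The helper `_make_genome_assets_line` is not shown on this page, so `format_assets` below is a hypothetical approximation, and a plain dict of genome names to asset lists stands in for the config.

```python
def format_assets(genome_assets, offset_text="  ", asset_sep=", ",
                  genome_assets_delim="/ ", order=None):
    """Render one line per genome: offset, right-justified name, delimiter, assets."""
    genomes = sorted(genome_assets, key=order)
    # Pad to the longest genome name plus two; default=0 covers an empty mapping.
    width = max(map(len, genomes), default=0) + 2
    lines = []
    for genome in genomes:
        assets = asset_sep.join(sorted(genome_assets[genome], key=order))
        lines.append(f"{offset_text}{genome.rjust(width)}{genome_assets_delim}{assets}")
    return "\n".join(lines)


text = format_assets({"hg38": ["fasta", "bowtie2_index"], "mm10": ["fasta"]})
```

The same `order` callable is applied to both genome names and asset names, matching the parameter description.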

cfg_remove_assets

cfg_remove_assets(genome, asset, tag=None, relationships=True)

Remove data associated with a specified genome:asset:tag combination. If no tags are specified, the entire asset is removed from the genome.

If no more tags are defined for the selected genome:asset after tag removal, the parent asset will be removed as well. If no more assets are defined for the selected genome after asset removal, the parent genome will be removed as well.

Parameters:

  • genome (str, required): genome to be removed
  • asset (str, required): asset package to be removed
  • tag (str, default None): tag to be removed
  • relationships (bool, default True): whether the asset being removed should be removed from its relatives as well

Returns:

  • RefGenConf: updated object

Raises:

  • TypeError: if genome argument type is not a list or str

Source code in refgenconf/refgenconf.py
def cfg_remove_assets(self, genome, asset, tag=None, relationships=True):
    """
    Remove data associated with a specified genome:asset:tag combination.
    If no tags are specified, the entire asset is removed from the genome.

    If no more tags are defined for the selected genome:asset after tag removal,
    the parent asset will be removed as well.
    If no more assets are defined for the selected genome after asset removal,
    the parent genome will be removed as well.

    :param str genome: genome to be removed
    :param str asset: asset package to be removed
    :param str tag: tag to be removed
    :param bool relationships: whether the asset being removed should
        be removed from its relatives as well
    :raise TypeError: if genome argument type is not a list or str
    :return RefGenConf: updated object
    """

    def _del_if_empty(obj, attr, alt=None):
        """
        Internal function for Mapping attribute deleting.
        Check if attribute exists and delete it if its length is zero.

        :param Mapping obj: an object to check
        :param str attr: Mapping attribute of interest
        :param list[Mapping, str] alt: a list of length 2 that indicates alternative
        Mapping-attribute combination to remove
        """
        if attr in obj and len(obj[attr]) == 0:
            if alt is None:
                del obj[attr]
            else:
                if alt[1] in alt[0]:
                    del alt[0][alt[1]]

    tag = tag or self.get_default_tag(genome, asset)
    if _check_insert_data(genome, str, "genome"):
        if _check_insert_data(asset, str, "asset"):
            if _check_insert_data(tag, str, "tag"):
                if relationships:
                    self.remove_asset_from_relatives(genome, asset, tag)
                del self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
                    CFG_ASSET_TAGS_KEY
                ][tag]
                _del_if_empty(
                    self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset],
                    CFG_ASSET_TAGS_KEY,
                    [self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY], asset],
                )
                _del_if_empty(self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY], asset)
                _del_if_empty(
                    self[CFG_GENOMES_KEY][genome],
                    CFG_ASSETS_KEY,
                    [self[CFG_GENOMES_KEY], genome],
                )
                _del_if_empty(self[CFG_GENOMES_KEY], genome)
                try:
                    default_tag = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][
                        asset
                    ][CFG_ASSET_DEFAULT_TAG_KEY]
                except KeyError:
                    pass
                else:
                    if default_tag == tag:
                        del self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
                            CFG_ASSET_DEFAULT_TAG_KEY
                        ]
                if len(self[CFG_GENOMES_KEY]) == 0:
                    self[CFG_GENOMES_KEY] = None
    return self
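
The cascading removal described above (tag, then empty asset, then empty genome) can be shown on a plain nested dict. This is a simplified sketch, not the library's method: `remove_tag` is a hypothetical function, and the key names `assets`, `tags`, and `default_tag` are assumed stand-ins for the config constants.

```python
def remove_tag(genomes, genome, asset, tag):
    """Delete one tag and prune the asset and genome entries that become empty."""
    assets = genomes[genome]["assets"]
    tags = assets[asset]["tags"]
    del tags[tag]
    # Drop the default pointer if it referenced the removed tag.
    if assets[asset].get("default_tag") == tag:
        del assets[asset]["default_tag"]
    if not tags:
        del assets[asset]      # no tags left: remove the asset
    if not assets:
        del genomes[genome]    # no assets left: remove the genome
    return genomes


genomes = {
    "hg38": {"assets": {"fasta": {"default_tag": "default",
                                  "tags": {"default": {}, "v2": {}}}}}
}
remove_tag(genomes, "hg38", "fasta", "default")
after_first = {g: dict(v) for g, v in genomes.items()}
remove_tag(genomes, "hg38", "fasta", "v2")
```

After the first call only the `v2` tag remains; after the second call the mapping is empty because both the asset and the genome were pruned.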

cfg_tag_asset

cfg_tag_asset(genome, asset, tag, new_tag, force=False)

Retags the asset selected by the tag with the new_tag. Prompts if default already exists and overrides upon confirmation.

This method does not override the original asset entry in the RefGenConf object. It creates a copy and tags it with new_tag. Additionally, if the retagged asset has any children, their parent reference is updated to the new_tag introduced by this call.

Parameters:

  • genome (str, required): name of a reference genome assembly of interest
  • asset (str, required): name of particular asset of interest
  • tag (str, required): name of the tag that identifies the asset of interest
  • new_tag (str, required): name of the new tag
  • force (bool, default False): force any actions that require approval

Returns:

  • bool: a logical indicating whether the tagging was successful

Raises:

  • ValueError: when the original tag is not specified

Source code in refgenconf/refgenconf.py
def cfg_tag_asset(self, genome, asset, tag, new_tag, force=False):
    """
    Retags the asset selected by the tag with the new_tag.
    Prompts if default already exists and overrides upon confirmation.

    This method does not override the original asset entry in the
    RefGenConf object. It creates its copy and tags it with the new_tag.
    Additionally, if the retagged asset has any children their parent will
     be retagged as new_tag that was introduced upon this method execution.

    :param str genome: name of a reference genome assembly of interest
    :param str asset: name of particular asset of interest
    :param str tag: name of the tag that identifies the asset of interest
    :param str new_tag: name of the new tag
    :param bool force: force any actions that require approval
    :raise ValueError: when the original tag is not specified
    :return bool: a logical indicating whether the tagging was successful
    """
    self._assert_gat_exists(genome, asset, tag)
    asset_mapping = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset]
    if tag is None:
        ts = ", ".join(get_asset_tags(asset_mapping))
        raise ValueError(
            f"You must explicitly specify the tag of the asset"
            f" you want to reassign. Currently defined tags "
            f"for '{genome}/{asset}' are: {ts}"
        )
    if new_tag in asset_mapping[CFG_ASSET_TAGS_KEY]:
        if not force and not query_yes_no(
            f"You already have a '{asset}' asset tagged as "
            f"'{new_tag}', do you wish to override?"
        ):
            _LOGGER.info("Tag action aborted by the user")
            return False
    children = []
    parents = []
    if CFG_ASSET_CHILDREN_KEY in asset_mapping[CFG_ASSET_TAGS_KEY][tag]:
        children = asset_mapping[CFG_ASSET_TAGS_KEY][tag][CFG_ASSET_CHILDREN_KEY]
    if CFG_ASSET_PARENTS_KEY in asset_mapping[CFG_ASSET_TAGS_KEY][tag]:
        parents = asset_mapping[CFG_ASSET_TAGS_KEY][tag][CFG_ASSET_PARENTS_KEY]
    if len(children) > 0 or len(parents) > 0:
        if not force and not query_yes_no(
            f"The asset '{genome}/{asset}:{tag}' has {len(children)} "
            f"children and {len(parents)} parents. Refgenie will update"
            f" the relationship data. Do you want to proceed?"
        ):
            _LOGGER.info("Tag action aborted by the user")
            return False
        # updates children's parents
        self._update_relatives_tags(
            genome, asset, tag, new_tag, children, update_children=False
        )
        # updates parents' children
        self._update_relatives_tags(
            genome, asset, tag, new_tag, parents, update_children=True
        )
    self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][CFG_ASSET_TAGS_KEY][
        new_tag
    ] = asset_mapping[CFG_ASSET_TAGS_KEY][tag]
    if (
        CFG_ASSET_DEFAULT_TAG_KEY in asset_mapping
        and asset_mapping[CFG_ASSET_DEFAULT_TAG_KEY] == tag
    ):
        self.set_default_pointer(
            genome, asset, new_tag, force_exists=True, force_fasta=True
        )
    self.cfg_remove_assets(genome, asset, tag)
    return True
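
Stripped of prompts and relationship updates, the retagging flow reduces to three steps: copy the entry under the new tag, move the default pointer if needed, and drop the old tag. The sketch below shows only that core on a plain dict; `retag` and the key names `tags` and `default_tag` are assumptions for illustration, not the library's API.

```python
def retag(asset_mapping, tag, new_tag):
    """Move the data stored under `tag` to `new_tag` within one asset mapping."""
    if tag is None:
        raise ValueError("You must explicitly specify the tag to reassign")
    tags = asset_mapping["tags"]
    if new_tag == tag:
        return True  # nothing to do (behavior chosen for this sketch)
    # Register the same data under the new tag.
    tags[new_tag] = tags[tag]
    # Keep the default pointer valid if it referenced the old tag.
    if asset_mapping.get("default_tag") == tag:
        asset_mapping["default_tag"] = new_tag
    # Finally remove the old tag entry.
    del tags[tag]
    return True


mapping = {"default_tag": "default", "tags": {"default": {"asset_path": "fasta"}}}
ok = retag(mapping, "default", "v1")
```

Writing the new entry before deleting the old one means a failure midway never leaves the asset without any tag.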

chk_digest_update_child

chk_digest_update_child(genome, remote_asset_name, child_name, server_url)

Check local asset digest against the remote one and populate children of the asset with the provided asset:tag.

In case the local asset does not exist, the config is populated with the remote asset digest and children data

Parameters:

  • genome (str, required): name of the genome to check the asset digests for
  • remote_asset_name (str, required): asset and tag names, formatted like: asset:tag
  • child_name (str, required): name to be appended to the children of the parent
  • server_url (str, required): address of the server to query for the digests

Raises:

  • RefgenconfError: if the local digest does not match its remote counterpart

Source code in refgenconf/refgenconf.py
def chk_digest_update_child(
    self, genome, remote_asset_name, child_name, server_url
):
    """
    Check local asset digest against the remote one and populate children of the
    asset with the provided asset:tag.

    In case the local asset does not exist, the config is populated with the remote
     asset digest and children data

    :param str genome: name of the genome to check the asset digests for
    :param str remote_asset_name: asset and tag names, formatted like: asset:tag
    :param str child_name: name to be appended to the children of the parent
    :param str server_url: address of the server to query for the digests
    :raise RefgenconfError: if the local digest does not match its remote counterpart
    """
    remote_asset_data = prp(remote_asset_name)
    asset = remote_asset_data["item"]
    tag = remote_asset_data["tag"]
    asset_digest_url = construct_request_url(server_url, API_ID_DIGEST).format(
        genome=genome, asset=asset, tag=tag
    )
    try:
        remote_digest = send_data_request(asset_digest_url)
    except DownloadJsonError:
        return
    try:
        # we need to allow for missing seek_keys section so that the digest is
        # respected even from the previously populated 'incomplete asset' from
        # the server
        self._assert_gat_exists(
            genome,
            asset,
            tag,
            allow_incomplete=not self.is_asset_complete(genome, asset, tag),
        )
    except (KeyError, MissingAssetError, MissingGenomeError, MissingSeekKeyError):
        self.update_tags(
            genome, asset, tag, {CFG_ASSET_CHECKSUM_KEY: remote_digest}
        )
        _LOGGER.info(
            f"Could not find '{genome}/{asset}:{tag}' digest. "
            f"Populating with server data"
        )
    else:
        local_digest = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
            CFG_ASSET_TAGS_KEY
        ][tag][CFG_ASSET_CHECKSUM_KEY]
        if remote_digest != local_digest:
            raise RemoteDigestMismatchError(asset, local_digest, remote_digest)
    finally:
        self.update_relatives_assets(
            genome, asset, tag, [child_name], children=True
        )

compare

compare(genome1, genome2, explain=False)

Check the compatibility level of two genomes by comparing their Annotated Sequence Digests (ASDs), that is, digested sequences and metadata.

Parameters:

  • genome1 (str, required): name of the first genome to compare
  • genome2 (str, required): name of the second genome to compare
  • explain (bool, default False): whether the returned code explanation should be displayed

Returns:

  • int: compatibility code

Source code in refgenconf/refgenconf.py
def compare(self, genome1, genome2, explain=False):
    """
    Check genomes compatibility level.
    Compares Annotated Sequence Digests (ASDs) -- digested sequences and
    metadata
    :param str genome1: name of the first genome to compare
    :param str genome2: name of the second genome to compare
    :param bool explain: whether the returned code explanation should
        be displayed
    :return int: compatibility code
    """

    def _get_asds_for_genome(rgc, genome):
        """
        Read JSON file containing ASDs for a specified genome
        :param refgenconf.RefGenConf rgc: object to find the genome for
        :param str genome: genome to find the file for
        :return list[dict]: list of ASDs, ready to compare
        """
        g = rgc.get_genome_alias(genome, fallback=True)
        error_msg = (
            f"File containing Annotated Sequence Digests (ASDs) not "
            f"found for genome: {g}. Must pull or build '{g}/fasta' again to "
            f"check the compatibility."
        )
        try:
            rgc.seek_src(genome, "fasta", strict_exists=True)
        except MissingSeekKeyError:
            raise MissingSeekKeyError(error_msg)
        json_file = rgc.get_asds_path(genome)
        if not os.path.exists(json_file):
            raise OSError(error_msg)
        with open(json_file, "r") as jfp:
            return json.load(jfp)

    return SeqColClient({}).compare_asds(
        _get_asds_for_genome(self, self.get_genome_alias_digest(genome1, True)),
        _get_asds_for_genome(self, self.get_genome_alias_digest(genome2, True)),
        explain=explain,
    )
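
Before any comparison can happen, the ASDs JSON file for each genome has to be located and read. The sketch below isolates that step using only the standard library; `load_asds` is a hypothetical helper, the file naming follows the `<genome>__ASDs.json` pattern used by `get_asds_path`, and the record written to disk is made-up sample data.

```python
import json
import os
import tempfile


def load_asds(data_dir, genome):
    """Load the ASDs JSON for a genome, raising OSError if the file is missing."""
    json_file = os.path.join(data_dir, genome, f"{genome}__ASDs.json")
    if not os.path.exists(json_file):
        raise OSError(f"ASDs file not found for genome: {genome}")
    with open(json_file, "r") as jfp:
        return json.load(jfp)


# Demonstration in a throwaway directory with invented content.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "hg38"))
    with open(os.path.join(data_dir, "hg38", "hg38__ASDs.json"), "w") as fp:
        json.dump([{"name": "chr1", "length": 10}], fp)
    asds = load_asds(data_dir, "hg38")
```

Failing early with a clear error is useful here because a missing file usually means the fasta asset needs to be pulled or built again.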

filepath

filepath(genome, asset, tag, ext='.tgz', dir=False)

Determine path to a particular asset for a particular genome.

Parameters:

  • genome (str, required): reference genome ID
  • asset (str, required): asset name
  • tag (str, required): tag name
  • ext (str, default '.tgz'): file extension
  • dir (bool, default False): whether to return the enclosing directory instead of the file

Returns:

  • str: path to asset for given genome and asset kind/name

Source code in refgenconf/refgenconf.py
def filepath(self, genome, asset, tag, ext=".tgz", dir=False):
    """
    Determine path to a particular asset for a particular genome.

    :param str genome: reference genome ID
    :param str asset: asset name
    :param str tag: tag name
    :param str ext: file extension
    :param bool dir: whether to return the enclosing directory instead of the file
    :return str: path to asset for given genome and asset kind/name
    """
    tag_dir = os.path.join(self.data_dir, genome, asset, tag)
    return os.path.join(tag_dir, asset + "__" + tag + ext) if not dir else tag_dir
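
The resulting layout is easiest to see with concrete values. This standalone sketch repeats the same path arithmetic outside the class; `asset_archive_path` is a hypothetical name, `as_dir` replaces the `dir` parameter to avoid shadowing the builtin, and `/data` is an example directory rather than a real default.

```python
import os


def asset_archive_path(data_dir, genome, asset, tag, ext=".tgz", as_dir=False):
    """Build <data_dir>/<genome>/<asset>/<tag>/<asset>__<tag><ext>, or its directory."""
    tag_dir = os.path.join(data_dir, genome, asset, tag)
    if as_dir:
        return tag_dir
    return os.path.join(tag_dir, asset + "__" + tag + ext)


archive = asset_archive_path("/data", "hg38", "fasta", "default")
folder = asset_archive_path("/data", "hg38", "fasta", "default", as_dir=True)
```

Neither call touches the filesystem, so the returned path may point to a file that does not exist yet.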

genomes_list

genomes_list(order=None)

Get a list of this configuration's reference genome assembly IDs.

Returns:

  • Iterable[str]: list of this configuration's reference genome assembly IDs

Source code in refgenconf/refgenconf.py
def genomes_list(self, order=None):
    """
    Get a list of this configuration's reference genome assembly IDs.

    :return Iterable[str]: list of this configuration's reference genome
        assembly IDs
    """
    return sorted(
        [
            self.get_genome_alias(x, fallback=True)
            for x in self[CFG_GENOMES_KEY].keys()
        ],
        key=order,
    )

genomes_str

genomes_str(order=None)

Get this configuration's reference genome assembly IDs as a single string.

Parameters:

  • order (function(str) -> object, default None): how to key genome IDs for sort

Returns:

  • str: single string that lists this configuration's known reference genome assembly IDs

Source code in refgenconf/refgenconf.py
def genomes_str(self, order=None):
    """
    Get as single string this configuration's reference genome assembly IDs.

    :param function(str) -> object order: how to key genome IDs for sort
    :return str: single string that lists this configuration's known
        reference genome assembly IDs
    """
    return ", ".join(self.genomes_list(order))
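
The `order` argument is passed through as the `key` of Python's `sorted`, so any one-argument callable works. A minimal sketch with a plain list in place of the config (the function name `join_genomes` and the genome names are illustrative assumptions):

```python
def join_genomes(genomes, order=None):
    """Sort genome names with an optional key function and join them with ', '."""
    return ", ".join(sorted(genomes, key=order))


names = ["mm10", "hg38", "hg19"]
alphabetical = join_genomes(names)
# Sort by the two-character numeric suffix instead of the full name.
by_suffix = join_genomes(names, order=lambda g: g[-2:])
```

With `order=None` the names are sorted lexicographically, which is the default behavior of `sorted`.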

get_asds_path

get_asds_path(genome)

Get path to the Annotated Sequence Digests JSON file for a given genome. Note that the path and/or genome may not exist.

Parameters:

  • genome (str, required): genome name

Returns:

  • str: ASDs path

Source code in refgenconf/refgenconf.py
def get_asds_path(self, genome):
    """
    Get path to the Annotated Sequence Digests JSON file for a given genome.
    Note that the path and/or genome may not exist.

    :param str genome: genome name
    :return str: ASDs path
    """
    return os.path.join(self.data_dir, genome, f"{genome}__ASDs.json")

get_asset_table

get_asset_table(genomes=None, server_url=None, get_json_url=lambda s, i: construct_request_url(s, i, PRIVATE_API))

Get a rich.Table object representing assets available locally

Parameters:

  • genomes (list[str], default None): genomes to restrict the results with
  • server_url (str, default None): server URL to query for the remote genome data
  • get_json_url (function(str, str) -> str, default lambda s, i: construct_request_url(s, i, PRIVATE_API)): how to build URL from genome server URL base, genome, and asset

Returns:

  • rich.table.Table: table of assets available locally

Source code in refgenconf/refgenconf.py
def get_asset_table(
    self,
    genomes=None,
    server_url=None,
    get_json_url=lambda s, i: construct_request_url(s, i, PRIVATE_API),
):
    """
    Get a rich.Table object representing assets available locally

    :param list[str] genomes: genomes to restrict the results with
    :param str server_url: server URL to query for the remote genome data
    :param function(str, str) -> str get_json_url: how to build URL from
        genome server URL base, genome, and asset
    :return rich.table.Table: table of assets available locally
    """

    def _fill_table_with_genomes_data(rgc, genomes_data, table, genomes=None):
        it = "([italic]{}[/italic])"
        table.add_column("genome")
        if genomes:
            table.add_column("asset " + it.format("seek_keys"))
            table.add_column("tags")
            for g in genomes:
                try:
                    genome = rgc.get_genome_alias_digest(alias=g, fallback=True)
                except yacman.UndefinedAliasError:
                    rgc.set_genome_alias(
                        genome=g, create_genome=True, no_write=True
                    )
                    genome = rgc.get_genome_alias_digest(alias=g, fallback=True)
                if genome not in genomes_data:
                    _LOGGER.error(f"Genome {g} ({genome}) not found")
                    continue
                genome_dict = genomes_data[genome]
                if CFG_ASSETS_KEY not in genome_dict:
                    continue
                for asset, asset_dict in genome_dict[CFG_ASSETS_KEY].items():
                    tags = list(asset_dict[CFG_ASSET_TAGS_KEY].keys())
                    if (
                        CFG_SEEK_KEYS_KEY
                        not in asset_dict[CFG_ASSET_TAGS_KEY][tags[0]]
                    ):
                        continue
                    seek_keys = list(
                        asset_dict[CFG_ASSET_TAGS_KEY][tags[0]][
                            CFG_SEEK_KEYS_KEY
                        ].keys()
                    )
                    table.add_row(
                        ", ".join(genome_dict[CFG_ALIASES_KEY]),
                        "{} ".format(asset) + it.format(", ".join(seek_keys)),
                        ", ".join(tags),
                    )
        else:
            table.add_column("assets")
            for genome in list(genomes_data.keys()):
                genome_dict = genomes_data[genome]
                if CFG_ASSETS_KEY not in genome_dict:
                    continue
                table.add_row(
                    ", ".join(genome_dict[CFG_ALIASES_KEY]),
                    ", ".join(list(genome_dict[CFG_ASSETS_KEY].keys())),
                )
        return table

    if server_url is None:
        genomes_data = self[CFG_GENOMES_KEY]
        title = (
            f"Local refgenie assets\nServer subscriptions: "
            f"{', '.join(self[CFG_SERVERS_KEY])}"
        )
    else:
        genomes_data = send_data_request(
            get_json_url(server_url, API_ID_GENOMES_DICT)
        )
        title = f"Remote refgenie assets\nServer URL: {server_url}"
    c = (
        f"use refgenie list{'r' if server_url is not None else ''} "
        f"-g <genome> for more detailed view"
        if genomes is None
        else ""
    )
    return _fill_table_with_genomes_data(
        self, genomes_data, Table(title=title, min_width=70, caption=c), genomes
    )

get_default_tag

get_default_tag(genome, asset, use_existing=True)

Determine the asset tag to use as default. The tag indicated by the 'default_tag' key in the asset section is returned. If no 'default_tag' key is found, the first listed tag is returned with a RuntimeWarning. This fallback can be turned off with use_existing=False.

Parameters:

Name Type Description Default
genome str

name of a reference genome assembly of interest

required
asset str

name of the particular asset of interest

required
use_existing bool

whether the first tag in the config should be returned in case there is no default tag defined for an asset

True

Returns:

Type Description

name of the tag to use as the default one

Source code in refgenconf/refgenconf.py
def get_default_tag(self, genome, asset, use_existing=True):
    """
    Determine the asset tag to use as default. The one indicated by
    the 'default_tag' key in the asset section is returned.
    If no 'default_tag' key is found, by default the first listed tag is returned
    with a RuntimeWarning. This behavior can be turned off with use_existing=False

    :param str genome: name of a reference genome assembly of interest
    :param str asset: name of the particular asset of interest
    :param bool use_existing: whether the first tag in the config should be
        returned in case there is no default tag defined for an asset
    :return str: name of the tag to use as the default one
    """
    try:
        self._assert_gat_exists(genome, asset)
    except RefgenconfError:
        _LOGGER.info(
            "Using '{}' as the default tag for '{}/{}'".format(
                DEFAULT_TAG, genome, asset
            )
        )
        return DEFAULT_TAG
    try:
        return self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
            CFG_ASSET_DEFAULT_TAG_KEY
        ]
    except KeyError:
        alt = (
            self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
                CFG_ASSET_TAGS_KEY
            ].keys()[0]
            if use_existing
            else DEFAULT_TAG
        )
        if isinstance(alt, str):
            if alt != DEFAULT_TAG:
                warnings.warn(
                    "Could not find the '{}' key for asset '{}/{}'. "
                    "Used the first one in the config instead: '{}'. "
                    "Make sure it does not corrupt your workflow.".format(
                        CFG_ASSET_DEFAULT_TAG_KEY, genome, asset, alt
                    ),
                    RuntimeWarning,
                )
            else:
                warnings.warn(
                    "Could not find the '{}' key for asset '{}/{}'. Returning '{}' "
                    "instead. Make sure it does not corrupt your workflow.".format(
                        CFG_ASSET_DEFAULT_TAG_KEY, genome, asset, alt
                    ),
                    RuntimeWarning,
                )
            return alt
    except TypeError:
        _raise_not_mapping(
            self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset], "Asset section "
        )
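
The fallback order described above can be sketched with a plain dictionary. This is a simplification, not the library's code: `default_tag_sketch` is hypothetical, the `"tags"` key name and the `"default"` value of `DEFAULT_TAG` are assumptions, and the genome/asset existence check is left out.

```python
import warnings

DEFAULT_TAG = "default"  # assumed value of the library constant


def default_tag_sketch(asset_section, use_existing=True):
    """Pick a default tag from a plain-dict asset section."""
    if "default_tag" in asset_section:
        return asset_section["default_tag"]
    tags = list(asset_section.get("tags", {}))  # "tags" key name is an assumption
    alt = tags[0] if (use_existing and tags) else DEFAULT_TAG
    warnings.warn(f"No 'default_tag' key found; using '{alt}' instead.", RuntimeWarning)
    return alt


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    explicit = default_tag_sketch({"default_tag": "v2", "tags": {"v1": {}, "v2": {}}})
    first = default_tag_sketch({"tags": {"v1": {}, "v2": {}}})
    fallback = default_tag_sketch({"tags": {"v1": {}}}, use_existing=False)
```

Only the two calls without a `default_tag` key emit a warning, which is why pipelines that treat warnings as errors may prefer to set the default tag explicitly.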

get_genome_alias

get_genome_alias(digest, fallback=False, all_aliases=False)

Get the human-readable alias for a genome digest

Parameters:

Name Type Description Default
digest str

digest to find human-readable alias for

required
fallback bool

whether to return the query digest in case of failure

False
all_aliases bool

whether to return all aliases instead of just the first one

False

Returns:

Type Description

human-readable aliases

Raises:

Type Description
GenomeConfigFormatError

if "genome_digests" section does not exist in the config

UndefinedAliasError

if no alias has been defined for the requested digest

Source code in refgenconf/refgenconf.py
def get_genome_alias(self, digest, fallback=False, all_aliases=False):
    """
    Get the human readable alias for a genome digest

    :param str digest: digest to find human-readable alias for
    :param bool fallback: whether to return the query digest in case
        of failure
    :param bool all_aliases: whether to return all aliases instead of just
        the first one
    :return str | list[str]: human-readable aliases
    :raise GenomeConfigFormatError: if "genome_digests" section does
        not exist in the config
    :raise UndefinedAliasError: if no alias has been defined for the
        requested digest
    """
    try:
        res = self[CFG_GENOMES_KEY].get_aliases(key=digest)
        return res if all_aliases else res[0]
    except (yacman.UndefinedAliasError, AttributeError):
        if not fallback:
            raise
        if digest in self.genome_aliases.keys():
            return digest
        raise

get_genome_alias_digest

get_genome_alias_digest(alias, fallback=False)

Get the genome digest for a human-readable alias

Parameters:

Name Type Description Default
alias str

alias to find digest for

required
fallback bool

whether to return the query alias in case of failure and in case it is one of the digests

False

Returns:

Type Description

genome digest

Raises:

Type Description
UndefinedAliasError

if the specified alias has not been assigned to any digests

Source code in refgenconf/refgenconf.py
def get_genome_alias_digest(self, alias, fallback=False):
    """
    Get the genome digest for a human-readable alias

    :param str alias: alias to find digest for
    :param bool fallback: whether to return the query alias in case
        of failure and in case it is one of the digests
    :return str: genome digest
    :raise UndefinedAliasError: if the specified alias has not been assigned
        to any digests
    """
    try:
        return self[CFG_GENOMES_KEY].get_key(alias=alias)
    except (yacman.UndefinedAliasError, AttributeError):
        if not fallback:
            raise
        if alias in self.genome_aliases.values():
            return alias
        raise
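
The role of `fallback` in the alias and digest lookups may be easier to see on a toy alias table. The sketch below is an assumption-laden stand-in: `alias_to_digest_sketch`, `UndefinedAliasSketchError`, and the digest string are hypothetical and do not come from refgenconf or yacman.

```python
class UndefinedAliasSketchError(KeyError):
    """Hypothetical stand-in for yacman.UndefinedAliasError."""


def alias_to_digest_sketch(aliases_by_digest, alias, fallback=False):
    """Return the digest that `alias` points to."""
    for digest, aliases in aliases_by_digest.items():
        if alias in aliases:
            return digest
    # With fallback=True, a query that is itself a known digest is returned as-is.
    if fallback and alias in aliases_by_digest:
        return alias
    raise UndefinedAliasSketchError(alias)


# Hypothetical alias table: digest -> list of aliases
table = {"digest_abc123": ["hg38", "human"]}

from_alias = alias_to_digest_sketch(table, "hg38")
from_digest = alias_to_digest_sketch(table, "digest_abc123", fallback=True)

try:
    alias_to_digest_sketch(table, "digest_abc123")
    strict_lookup_failed = False
except UndefinedAliasSketchError:
    strict_lookup_failed = True
```

With `fallback=True`, code that may receive either an alias or a digest can normalize its input with a single call.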

get_genome_attributes

get_genome_attributes(genome)

Get the genome's attributes as a dictionary, e.g. checksum, contents, description. Assets are not returned.

Parameters:

Name Type Description Default
genome str

genome to get the attributes dict for

required

Returns:

Type Description

available genome attributes

Source code in refgenconf/refgenconf.py
def get_genome_attributes(self, genome):
    """
    Get the dictionary attributes, like checksum, contents, description.
    Does not return the assets.

    :param str genome: genome to get the attributes dict for
    :return Mapping[str, str]: available genome attributes
    """
    return {
        k: self[CFG_GENOMES_KEY][genome][k]
        for k in CFG_GENOME_ATTRS_KEYS
        if k in self[CFG_GENOMES_KEY][genome]
    }
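
The filtering step can be sketched as a dictionary comprehension over a whitelist of keys. The key names in `GENOME_ATTR_KEYS` below are assumptions chosen to match the description above; the real list lives in `CFG_GENOME_ATTRS_KEYS`, and `genome_attributes_sketch` is a hypothetical helper.

```python
# Assumed attribute keys; the library defines its own list in CFG_GENOME_ATTRS_KEYS.
GENOME_ATTR_KEYS = ["description", "checksum", "contents"]


def genome_attributes_sketch(genome_section, attr_keys=GENOME_ATTR_KEYS):
    """Keep only whitelisted attribute keys that are present in the genome section."""
    return {k: genome_section[k] for k in attr_keys if k in genome_section}


# Hypothetical genome section with one non-attribute key ("assets")
section = {
    "description": "Human reference",
    "checksum": "abc123",
    "assets": {"fasta": {}},
}
attrs = genome_attributes_sketch(section)
```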

get_local_data_str

get_local_data_str(genome=None, order=None)

List locally available reference genome IDs and assets by ID.

Parameters:

Name Type Description Default
genome

genomes that the assets should be found for

None
order

how to key genome IDs and asset names for sort

None

Returns:

Type Description

text representations of locally available genomes and assets

Source code in refgenconf/refgenconf.py
def get_local_data_str(self, genome=None, order=None):
    """
    List locally available reference genome IDs and assets by ID.

    :param list[str] | str genome: genomes that the assets should be found for
    :param function(str) -> object order: how to key genome IDs and asset
        names for sort
    :return str, str: text reps of locally available genomes and assets
    """
    exceptions = []
    if genome is not None:
        genome = _make_list_of_str(genome)
        for g in genome:
            try:
                self._assert_gat_exists(gname=g)
            except MissingGenomeError as e:
                exceptions.append(e)
        if exceptions:
            raise MissingGenomeError(", ".join(map(str, exceptions)))
    return (
        ", ".join(self._select_genomes(genome=genome, order=order)),
        self.assets_str(genome=genome, order=order),
    )

get_remote_data_str

get_remote_data_str(genome=None, order=None, get_url=lambda server, id: construct_request_url(server, id))

List genomes and assets available remotely. Deprecated: use the listr method instead; get_remote_data_str will be removed in the next release.

Parameters:

Name Type Description Default
get_url

how to determine URL request, given server URL and endpoint operationID

lambda server, id: construct_request_url(server, id)
genome

genomes that the assets should be found for

None
order

how to key genome IDs and asset names for sort

None

Returns:

Type Description

text representations of remotely available genomes and assets

Source code in refgenconf/refgenconf.py
def get_remote_data_str(
    self,
    genome=None,
    order=None,
    get_url=lambda server, id: construct_request_url(server, id),
):
    """
    List genomes and assets available remotely.

    :param function(serverUrl, operationId) -> str get_url: how to determine
        URL request, given server URL and endpoint operationID
    :param list[str] | str genome: genomes that the assets should be found for
    :param function(str) -> object order: how to key genome IDs and asset
        names for sort
    :return str, str: text reps of remotely available genomes and assets
    """
    warnings.warn(
        "Please use listr method instead; get_remote_data_str will be "
        "removed in the next release.",
        category=DeprecationWarning,
    )
    return self.listr(genome, order, get_url)

get_symlink_paths

get_symlink_paths(genome, asset=None, tag=None, all_aliases=False)

Get path to the alias directory for the selected genome-asset-tag

Parameters:

Name Type Description Default
genome str

reference genome ID

required
asset str

asset name

None
tag str

tag name

None
all_aliases bool

whether to return a collection of symbolic links or just the first one from the alias list

False

Returns:

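Type Description

mapping from each alias to the path of its alias directory, or of the asset/tag directory within it when an asset is given
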
Source code in refgenconf/refgenconf.py
def get_symlink_paths(self, genome, asset=None, tag=None, all_aliases=False):
    """
    Get path to the alias directory for the selected genome-asset-tag

    :param str genome: reference genome ID
    :param str asset: asset name
    :param str tag: tag name
    :param bool all_aliases: whether to return a collection of symbolic
        links or just the first one from the alias list
    :return dict:
    """
    try:
        defined_aliases = self.get_genome_alias(
            genome, fallback=True, all_aliases=all_aliases
        )
    except yacman.UndefinedAliasError:
        return {}
    alias = _make_list_of_str(defined_aliases)
    if asset:
        tag = tag or self.get_default_tag(genome, asset)
    return {
        a: os.path.join(self.alias_dir, a, asset, tag)
        if asset
        else os.path.join(self.alias_dir, a)
        for a in alias
    }
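
The shape of the returned mapping can be sketched without a configuration object. In this hypothetical helper, the alias list is passed in directly and a missing tag falls back to the literal `"default"`, which is an assumption standing in for the default-tag lookup.

```python
import os


def symlink_paths_sketch(alias_dir, aliases, asset=None, tag=None):
    """Map each alias to its alias directory, or to the asset/tag directory inside it."""
    if asset and tag is None:
        tag = "default"  # assumption: stands in for get_default_tag(genome, asset)
    return {
        a: os.path.join(alias_dir, a, asset, tag) if asset else os.path.join(alias_dir, a)
        for a in aliases
    }


# Hypothetical alias directory and aliases
genome_dirs = symlink_paths_sketch("/refgenie/alias", ["hg38", "human"])
asset_dirs = symlink_paths_sketch("/refgenie/alias", ["hg38"], asset="fasta")
```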

getseq

getseq(genome, locus, as_str=False)

Return the sequence found in a selected range and chromosome. Something like the refget protocol.

Parameters:

Name Type Description Default
genome str

name of the sequence identifier

required
locus str

coordinates of desired sequence, e.g. 'chr1:1-10'

required
as_str bool

whether to convert the returned object to string and return just the sequence

False

Returns:

Type Description

selected sequence

Source code in refgenconf/refgenconf.py
def getseq(self, genome, locus, as_str=False):
    """
    Return the sequence found in a selected range and chromosome.
    Something like the refget protocol.

    :param str genome: name of the sequence identifier
    :param str locus: coordinates of desired sequence, e.g. 'chr1:1-10'
    :param bool as_str: whether to convert the returned object to string
        and return just the sequence
    :return str | pyfaidx.FastaRecord | pyfaidx.Sequence: selected sequence
    """
    import pyfaidx

    fa = pyfaidx.Fasta(self.seek_src(genome, "fasta", strict_exists=True))
    locus_split = locus.split(":")
    chr = fa[locus_split[0]]
    if len(locus_split) == 1:
        return str(chr) if as_str else chr
    start, end = locus_split[1].split("-")
    _LOGGER.debug(
        "chr: '{}', start: '{}', end: '{}'".format(locus_split[0], start, end)
    )
    return str(chr[int(start) : int(end)]) if as_str else chr[int(start) : int(end)]
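
The locus parsing can be tried out on an in-memory sequence. This sketch replaces the pyfaidx record with an ordinary string, so it uses plain Python slicing (zero-based start, exclusive end); `getseq_sketch` and the sample sequence are hypothetical, and how coordinates map onto a real FASTA depends on pyfaidx's own slicing behavior.

```python
def getseq_sketch(sequences, locus):
    """Return a whole sequence, or a slice of it, for a locus like 'chr1:1-10'."""
    name, _, span = locus.partition(":")
    chrom = sequences[name]
    if not span:
        # No range given: return the entire sequence.
        return chrom
    start, end = span.split("-")
    # Plain Python slicing: zero-based start, exclusive end.
    return chrom[int(start):int(end)]


seqs = {"chr1": "ACGTACGTAC"}  # hypothetical toy sequence
whole = getseq_sketch(seqs, "chr1")
part = getseq_sketch(seqs, "chr1:1-4")
```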

id

id(genome, asset, tag=None)

Returns the digest for the specified asset. If no tag is provided as an argument, the defined default tag is used.

Parameters:

Name Type Description Default
genome str

genome identifier

required
asset str

asset identifier

required
tag str

tag identifier

None

Returns:

Type Description

asset digest for the tag

Source code in refgenconf/refgenconf.py
def id(self, genome, asset, tag=None):
    """
    Returns the digest for the specified asset.
    The defined default tag will be used if not provided as an argument

    :param str genome: genome identifier
    :param str asset: asset identifier
    :param str tag: tag identifier
    :return str: asset digest for the tag
    """
    self._assert_gat_exists(genome, asset, tag)
    tag = tag or self.get_default_tag(genome, asset)
    a = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset]
    if CFG_ASSET_CHECKSUM_KEY in a[CFG_ASSET_TAGS_KEY][tag]:
        return a[CFG_ASSET_TAGS_KEY][tag][CFG_ASSET_CHECKSUM_KEY]
    raise MissingConfigDataError(
        "Digest does not exist for: {}/{}:{}".format(genome, asset, tag)
    )

initialize_config_file

initialize_config_file(filepath=None)

Initialize genome configuration file on disk

Parameters:

Name Type Description Default
filepath str

a valid path where the configuration file should be initialized

None

Returns:

Type Description

the filepath the file was initialized at

Raises:

Type Description
OSError

in case the file could not be initialized due to insufficient permissions or pre-existence

TypeError

if no valid filepath can be determined

Source code in refgenconf/refgenconf.py
def initialize_config_file(self, filepath=None):
    """
    Initialize genome configuration file on disk

    :param str filepath: a valid path where the configuration file should be initialized
    :return str: the filepath the file was initialized at
    :raise OSError: in case the file could not be initialized due to insufficient permissions or pre-existence
    :raise TypeError: if no valid filepath can be determined
    """

    def _write_fail_err(reason):
        raise OSError("Can't initialize, {}: {} ".format(reason, filepath))

    filepath = select_genome_config(filepath, check_exist=False)
    if not isinstance(filepath, str):
        raise TypeError(
            f"Could not determine a valid path to initialize a "
            f"configuration file: {filepath}"
        )
    if os.path.exists(filepath):
        _write_fail_err("file exists")
    if not is_writable(filepath, check_exist=False):
        _write_fail_err("insufficient permissions")
    self.make_writable(filepath)
    self.write()
    self.make_readonly()
    _LOGGER.info(f"Initialized genome configuration file: {filepath}")
    os.makedirs(self.data_dir, exist_ok=True)
    os.makedirs(self.alias_dir, exist_ok=True)
    _LOGGER.info(
        f"Created directories:{block_iter_repr([self.data_dir, self.alias_dir])}"
    )

    return filepath

initialize_genome

initialize_genome(fasta_path, alias, fasta_unzipped=False, skip_alias_write=False)

Initialize a genome

Create a JSON file with Annotated Sequence Digests (ASDs) for the FASTA file in the genome directory.

Parameters:

Name Type Description Default
fasta_path str

path to a FASTA file to initialize genome with

required
alias str

alias to set for the genome

required
fasta_unzipped bool

whether the FASTA file is uncompressed (not gzipped)

False
skip_alias_write bool

whether to skip writing the alias to the file

False

Returns:

Type Description

genome digest and the list of Annotated Sequence Digests (ASDs) computed for it

Source code in refgenconf/refgenconf.py
def initialize_genome(
    self, fasta_path, alias, fasta_unzipped=False, skip_alias_write=False
):
    """
    Initialize a genome

    Create a JSON file with Annotated Sequence Digests (ASDs)
    for the FASTA file in the genome directory.

    :param str fasta_path: path to a FASTA file to initialize genome with
    :param str alias: alias to set for the genome
    :param bool skip_alias_write: whether to skip writing the alias to the file
    :return str, list[dict[]]: genome digest and the list of annotated sequence digests
    """
    _LOGGER.info("Initializing genome: {}".format(alias))
    if not os.path.isfile(fasta_path):
        raise FileNotFoundError(
            "Can't initialize genome; FASTA file does "
            "not exist: {}".format(fasta_path)
        )
    ssc = SeqColClient({})
    d, _ = ssc.load_fasta(fasta_path, gzipped=not fasta_unzipped)
    # retrieve annotated sequence digests list to save in a JSON file
    asdl = ssc.retrieve(druid=d)
    pth = self.get_asds_path(d)
    os.makedirs(os.path.dirname(pth), exist_ok=True)
    with open(pth, "w") as jfp:
        json.dump(asdl, jfp)
    _LOGGER.debug("Saved ASDs to JSON: {}".format(pth))
    self.set_genome_alias(
        genome=alias,
        digest=d,
        overwrite=True,
        create_genome=True,
        no_write=skip_alias_write,
    )
    return d, asdl

is_asset_complete

is_asset_complete(genome, asset, tag)

Check whether all required tag attributes are defined in the RefGenConf object. This is the way we determine tag completeness.

Parameters:

Name Type Description Default
genome str

genome to be checked

required
asset str

asset package to be checked

required
tag str

tag to be checked

required

Returns:

Type Description

True if all required tag attributes are defined, False otherwise

Source code in refgenconf/refgenconf.py
def is_asset_complete(self, genome, asset, tag):
    """
    Check whether all required tag attributes are defined in the RefGenConf object.
    This is the way we determine tag completeness.

    :param str genome: genome to be checked
    :param str asset: asset package to be checked
    :param str tag: tag to be checked
    :return bool: the decision
    """
    tag_data = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
        CFG_ASSET_TAGS_KEY
    ][tag]
    return all([r in tag_data for r in REQ_TAG_ATTRS])
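
A minimal sketch of the completeness check on a plain dictionary follows. The attribute names in `REQUIRED_TAG_ATTRS` are assumptions for illustration; the library's own list is `REQ_TAG_ATTRS`, and `is_tag_complete_sketch` is a hypothetical helper.

```python
# Assumed required attributes; the library defines the real list as REQ_TAG_ATTRS.
REQUIRED_TAG_ATTRS = ["asset_path", "seek_keys"]


def is_tag_complete_sketch(tag_data, required=REQUIRED_TAG_ATTRS):
    """A tag counts as complete only if every required attribute is present."""
    return all(attr in tag_data for attr in required)


complete = is_tag_complete_sketch({"asset_path": "fasta", "seek_keys": {"fasta": "hg38.fa"}})
incomplete = is_tag_complete_sketch({"asset_path": "fasta"})
```

The check looks only at key presence, so a tag whose attributes exist but hold empty values would still count as complete in this sketch.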

list

list(genome=None, order=None, include_tags=False)

List local assets; map each namespace to a list of available asset names

Parameters:

Name Type Description Default
order

how to key genome IDs for sort

None
genome

genomes that the assets should be found for

None
include_tags bool

whether asset tags should be included in the returned dict

False

Returns:

Type Description

mapping from assembly name to collection of available asset names.

Source code in refgenconf/refgenconf.py
def list(self, genome=None, order=None, include_tags=False):
    """
    List local assets; map each namespace to a list of available asset names

    :param callable(str) -> object order: how to key genome IDs for sort
    :param list[str] | str genome: genomes that the assets should be found for
    :param bool include_tags: whether asset tags should be included in the returned dict
    :return Mapping[str, Iterable[str]]: mapping from assembly name to
        collection of available asset names.
    """
    self.run_plugins(PRE_LIST_HOOK)
    refgens = self._select_genomes(genome=genome, order=order)
    if include_tags:
        self.run_plugins(POST_LIST_HOOK)
        return OrderedDict(
            [
                (
                    g,
                    sorted(
                        _make_asset_tags_product(
                            self[CFG_GENOMES_KEY][g][CFG_ASSETS_KEY], ":"
                        ),
                        key=order,
                    ),
                )
                for g in refgens
                if CFG_ASSETS_KEY in self[CFG_GENOMES_KEY][g]
            ]
        )
    self.run_plugins(POST_LIST_HOOK)
    return OrderedDict(
        [
            (
                g,
                sorted(
                    list(self[CFG_GENOMES_KEY][g][CFG_ASSETS_KEY].keys()), key=order
                ),
            )
            for g in refgens
            if CFG_ASSETS_KEY in self[CFG_GENOMES_KEY][g]
        ]
    )
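
The effect of `include_tags` on the returned mapping can be sketched with nested dictionaries. This is a simplified stand-in under stated assumptions: the `"assets"` and `"tags"` key names are guesses at the config layout, genomes are assumed to be sorted with `order`, plugin hooks are omitted, and `list_assets_sketch` is hypothetical.

```python
from collections import OrderedDict


def list_assets_sketch(genomes, order=None, include_tags=False):
    """Map each genome to its sorted asset names, optionally as 'asset:tag' pairs."""
    result = OrderedDict()
    for genome in sorted(genomes, key=order):
        assets = genomes[genome].get("assets")
        if assets is None:
            continue  # genomes without an assets section are skipped
        if include_tags:
            names = [f"{a}:{t}" for a, data in assets.items() for t in data["tags"]]
        else:
            names = list(assets)
        result[genome] = sorted(names, key=order)
    return result


# Hypothetical configuration contents
config = {
    "mm10": {"assets": {"fasta": {"tags": {"default": {}}}}},
    "hg38": {
        "assets": {
            "fasta": {"tags": {"default": {}}},
            "bowtie2_index": {"tags": {"default": {}, "v2": {}}},
        }
    },
    "empty": {},
}
plain = list_assets_sketch(config)
tagged = list_assets_sketch(config, include_tags=True)
```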

list_assets_by_genome

list_assets_by_genome(genome=None, order=None, include_tags=False)

List types/names of assets that are available for one genome or for all genomes.

Parameters:

Name Type Description Default
genome

reference genome assembly ID, optional; if omitted, the full mapping from genome to asset names is returned

None
order

how to key genome IDs and asset names for sort

None
include_tags bool

whether asset tags should be included in the returned dict

False

Returns:

Type Description

collection of asset type names available for a particular reference assembly if one is provided, else the full mapping between assembly ID and collection of available asset type names

Source code in refgenconf/refgenconf.py
def list_assets_by_genome(self, genome=None, order=None, include_tags=False):
    """
    List types/names of assets that are available for one--or all--genomes.

    :param str | NoneType genome: reference genome assembly ID, optional;
        if omitted, the full mapping from genome to asset names
    :param function(str) -> object order: how to key genome IDs and asset
        names for sort
    :param bool include_tags: whether asset tags should be included in the
        returned dict
    :return Iterable[str] | Mapping[str, Iterable[str]]: collection of
        asset type names available for particular reference assembly if
        one is provided, else the full mapping between assembly ID and
        collection available asset type names
    """
    if genome:
        genome = self.get_genome_alias(digest=genome, fallback=True)
    return (
        self.list(genome, order, include_tags=include_tags)[genome]
        if genome is not None
        else self.list(order, include_tags=include_tags)
    )

list_genomes_by_asset

list_genomes_by_asset(asset=None, order=None)

List assemblies for which a particular asset is available.

Parameters:

Name Type Description Default
asset

name of type of asset of interest, optional

None
order

how to key genome IDs and asset names for sort

None

Returns:

Type Description

collection of assemblies for which the given asset is available; if asset argument is omitted, the full mapping from name of asset type to collection of assembly names for which the asset key is available will be returned.

Source code in refgenconf/refgenconf.py
def list_genomes_by_asset(self, asset=None, order=None):
    """
    List assemblies for which a particular asset is available.

    :param str | NoneType asset: name of type of asset of interest, optional
    :param function(str) -> object order: how to key genome IDs and asset
        names for sort
    :return Iterable[str] | Mapping[str, Iterable[str]]: collection of
        assemblies for which the given asset is available; if asset
        argument is omitted, the full mapping from name of asset type to
        collection of assembly names for which the asset key is available
        will be returned.
    """
    return (
        self._invert_genomes(order)
        if not asset
        else sorted(
            [
                self.get_genome_alias(g, fallback=True)
                for g, data in self[CFG_GENOMES_KEY].items()
                if asset in data.get(CFG_ASSETS_KEY)
            ],
            key=order,
        )
    )
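
The two return shapes (a list for one asset, an inverted mapping otherwise) can be sketched on plain dictionaries. The `"assets"` key name, the inversion logic, and the helper `genomes_by_asset_sketch` are assumptions for illustration; alias resolution is omitted.

```python
def genomes_by_asset_sketch(genomes, asset=None, order=None):
    """List genomes that have `asset`, or invert the whole genome -> assets mapping."""
    if asset:
        return sorted(
            (g for g, data in genomes.items() if asset in data.get("assets", {})),
            key=order,
        )
    inverted = {}
    for g, data in genomes.items():
        for a in data.get("assets", {}):
            inverted.setdefault(a, []).append(g)
    return {a: sorted(inverted[a], key=order) for a in sorted(inverted, key=order)}


# Hypothetical configuration contents
config = {
    "hg38": {"assets": {"fasta": {}, "bowtie2_index": {}}},
    "mm10": {"assets": {"fasta": {}}},
}
with_fasta = genomes_by_asset_sketch(config, asset="fasta")
full_mapping = genomes_by_asset_sketch(config)
```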

list_seek_keys_values

list_seek_keys_values(genomes=None, assets=None)

List values for all seek keys for the specified genome and asset. Leave the arguments out to get all seek keys values managed by refgenie.

Parameters:

Name Type Description Default
genomes

optional list of genomes to include

None
assets

optional list of assets to include

None

Returns:

Type Description

a nested dictionary with the seek key values

Source code in refgenconf/refgenconf.py
def list_seek_keys_values(self, genomes=None, assets=None):
    """
    List values for all seek keys for the specified genome and asset.
    Leave the arguments out to get all seek keys values managed by refgenie.

    :param str | List[str] genomes: optional list of genomes to include
    :param str | List[str] assets: optional list of assets to include
    :return dict: a nested dictionary with the seek key values
    """
    ret = {}

    if genomes is None:
        genome_names = self.genomes_list()
    else:
        genome_names = _make_list_of_str(genomes)

    for genome_name in genome_names:
        self._assert_gat_exists(genome_name)
        ret[genome_name] = {}
        if assets is None:
            asset_names = self.list_assets_by_genome(genome_name)
        else:
            asset_names = _make_list_of_str(assets)
        for asset_name in asset_names:
            try:
                self._assert_gat_exists(genome_name, asset_name)
            except MissingAssetError as e:
                _LOGGER.warning(f"Skipping {asset_name} asset: {str(e)}")
                continue
            asset_mapping = self[CFG_GENOMES_KEY][genome_name][CFG_ASSETS_KEY][
                asset_name
            ]
            ret[genome_name][asset_name] = {}
            for tag_name in get_asset_tags(asset_mapping):
                tag_mapping = asset_mapping[CFG_ASSET_TAGS_KEY][tag_name]
                ret[genome_name][asset_name][tag_name] = {}
                for seek_key_name in get_tag_seek_keys(tag_mapping):
                    ret[genome_name][asset_name][tag_name][
                        seek_key_name
                    ] = self.seek(genome_name, asset_name, tag_name, seek_key_name)
    return ret
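
The result is nested four levels deep (genome, asset, tag, seek key), so consumers often need to walk it. The helper below is a hypothetical sketch of one way to flatten such a dictionary; the example paths are placeholders, not real refgenie output.

```python
def flatten_seek_keys(nested):
    """Yield (genome, asset, tag, seek_key, path) tuples from the nested dict."""
    for genome, assets in nested.items():
        for asset, tags in assets.items():
            for tag, seek_keys in tags.items():
                for seek_key, path in seek_keys.items():
                    yield genome, asset, tag, seek_key, path


# Hypothetical return value with placeholder paths
example = {
    "hg38": {
        "fasta": {
            "default": {
                "fasta": "/data/hg38/fasta/default/hg38.fa",
                "fai": "/data/hg38/fasta/default/hg38.fa.fai",
            }
        }
    }
}
rows = list(flatten_seek_keys(example))
```

A flat list of tuples like this is convenient for writing the seek key values to a table or CSV file.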

listr

listr(genome=None, get_url=lambda server, id: construct_request_url(server, id), as_digests=False)

List genomes and assets available remotely on all servers the object subscribes to

Parameters:

Name Type Description Default
get_url

how to determine URL request, given server URL and endpoint operationID

lambda server, id: construct_request_url(server, id)
genome

genomes that the assets should be found for

None
order

how to key genome IDs and asset names for sort

required

Returns:

Type Description

remotely available genomes and assets, keyed by source server endpoint and then by genome

Source code in refgenconf/refgenconf.py
def listr(
    self,
    genome=None,
    get_url=lambda server, id: construct_request_url(server, id),
    as_digests=False,
):
    """
    List genomes and assets available remotely on all servers the object
    subscribes to

    :param function(serverUrl, operationId) -> str get_url: how to determine
        URL request, given server URL and endpoint operationID
    :param list[str] | str genome: genomes that the assets should be found for
    :param bool as_digests: whether to key the returned genomes by digest
        instead of by alias
    :return dict[OrderedDict[list]]: remotely available genomes and assets
        keyed by genome keyed by source server endpoint
    """
    data_by_server = {}

    for url in self[CFG_SERVERS_KEY]:
        aliases_url = get_url(url, API_ID_ALIASES_DICT)
        assets_url = get_url(url, API_ID_ASSETS)
        if assets_url is None or aliases_url is None:
            continue

        aliases_by_digest = send_data_request(aliases_url)
        # convert the original, condensed mapping to a data structure with optimal time complexity
        digests_by_alias = {}
        for k, v in aliases_by_digest.items():
            for alias in v:
                digests_by_alias[alias] = k

        genome_digests = None
        genomes = genome if isinstance(genome, list) else [genome]
        if genome is not None:
            genome_digests = [
                g
                if g in aliases_by_digest.keys()
                else digests_by_alias.get(g, None)
                for g in genomes
            ]
            if genome_digests is None:
                _LOGGER.info(f"{genome} not found on server: {url}")
                continue

        server_data = self._list_remote(
            url=assets_url,
            genome=genome_digests,
        )
        data_by_server[assets_url] = (
            server_data
            if as_digests
            else {aliases_by_digest[k][0]: v for k, v in server_data.items()}
        )

    return data_by_server
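
The alias handling in `listr` inverts the server's digest-to-aliases mapping so that each requested genome, given either as an alias or as a digest, can be resolved with a dictionary lookup. A minimal standalone sketch of that step, using a made-up server response (`invert_aliases` and `resolve_genomes` are hypothetical helper names):

```python
def invert_aliases(aliases_by_digest):
    """Turn {digest: [alias, ...]} into {alias: digest} for direct lookup."""
    digests_by_alias = {}
    for digest, aliases in aliases_by_digest.items():
        for alias in aliases:
            digests_by_alias[alias] = digest
    return digests_by_alias


def resolve_genomes(genome, aliases_by_digest):
    """Map each requested genome (alias or digest) to a digest, or None."""
    digests_by_alias = invert_aliases(aliases_by_digest)
    genomes = genome if isinstance(genome, list) else [genome]
    return [
        g if g in aliases_by_digest else digests_by_alias.get(g)
        for g in genomes
    ]


# Hypothetical server response: digest -> list of aliases
aliases_by_digest = {"digest_a": ["hg38", "GRCh38"], "digest_b": ["mm10"]}

resolved = resolve_genomes(["hg38", "digest_b", "unknown"], aliases_by_digest)
print(resolved)
```

In this sketch an unrecognized genome produces a `None` entry in the list rather than an empty result, so a caller that wants to skip a server has to inspect the individual entries.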

populate

populate(glob)

Populates local refgenie references from refgenie://genome/asset.seek_key:tag registry paths

Parameters:

Name Type Description Default
glob

A string that may contain refgenie registry paths, or a dict whose values may contain refgenie registry paths. The dict may include nested dicts.

required

Returns:

Type Description

modified input with refgenie paths populated

Source code in refgenconf/refgenconf.py
def populate(self, glob):
    """
    Populates *local* refgenie references from
    refgenie://genome/asset.seek_key:tag registry paths

    :param dict | str | list glob: String which may contain refgenie registry paths as
        values; or a dict, for which values may contain refgenie registry
        paths. Dict include nested dicts.
    :return dict | str | list: modified input dict with refgenie paths populated
    """
    return _populate_refgenie_registry_path(
        self, glob=glob, seek_method_name="seek"
    )
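
To illustrate what populating registry paths involves, here is a minimal sketch that recursively substitutes `refgenie://genome/asset.seek_key:tag` strings inside a str, list, or dict. The regular expression, `populate_sketch`, and `fake_lookup` are all assumptions made for illustration; the library's own `_populate_refgenie_registry_path` may parse paths differently.

```python
import re

# Assumed pattern for refgenie://genome/asset.seek_key:tag; the library's own
# parser may accept a different grammar.
REGISTRY_PATH = re.compile(
    r"refgenie://(?P<genome>[\w-]+)/(?P<asset>[\w-]+)"
    r"(?:\.(?P<seek_key>[\w-]+))?(?::(?P<tag>[\w-]+))?"
)


def populate_sketch(glob, lookup):
    """Recursively replace registry paths in a str, list, or dict."""
    if isinstance(glob, str):
        return REGISTRY_PATH.sub(lambda m: lookup(**m.groupdict()), glob)
    if isinstance(glob, list):
        return [populate_sketch(item, lookup) for item in glob]
    if isinstance(glob, dict):
        return {key: populate_sketch(val, lookup) for key, val in glob.items()}
    return glob  # leave other types (numbers, None, ...) untouched


def fake_lookup(genome, asset, seek_key=None, tag=None):
    """Hypothetical stand-in for a seek call; builds a made-up local path."""
    return f"/genomes/{genome}/{asset}/{tag or 'default'}/{seek_key or asset}"


config = {
    "cmd": "bowtie2 -x refgenie://hg38/bowtie2_index",
    "inputs": ["refgenie://hg38/fasta.fai:v1", 42],
}
populated = populate_sketch(config, fake_lookup)
print(populated)
```

Passing the lookup as a callable mirrors the way the source selects a seek method by name, and lets the same traversal serve both local and remote resolution.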

populater

populater(glob, remote_class=None)

Populates remote refgenie references from refgenie://genome/asset:tag registry paths

Parameters:

Name Type Description Default
glob

A string that may contain refgenie registry paths, or a dict whose values may contain refgenie registry paths. The dict may include nested dicts.

required
remote_class str

remote data provider class, e.g. 'http' or 's3'

None

Returns:

Type Description

modified input with refgenie paths populated

Source code in refgenconf/refgenconf.py
def populater(self, glob, remote_class=None):
    """
    Populates *remote* refgenie references from
    refgenie://genome/asset:tag registry paths

    :param dict | str | list glob: String which may contain refgenie registry paths as
        values; or a dict, for which values may contain refgenie registry
        paths. Dict include nested dicts.
    :param str remote_class: remote data provider class, e.g. 'http' or 's3'
    :return dict | str | list: modified input dict with refgenie paths populated
    """
    return _populate_refgenie_registry_path(
        self,
        glob=glob,
        seek_method_name="seekr",
        remote_class=remote_class or "http",
    )

pull

pull(genome, asset, tag, unpack=True, force=None, force_large=None, size_cutoff=10, get_json_url=lambda server, operation_id: construct_request_url(server, operation_id), build_signal_handler=_handle_sigint)

Download and possibly unpack one or more assets for a given reference genome.

Parameters:

Name Type Description Default
genome str

name of a reference genome assembly of interest

required
asset str

name of particular asset to fetch

required
tag str

name of particular tag to fetch

required
unpack bool

whether to unpack a tarball

True
force

how to handle the case in which the asset path already exists: None to prompt (on a per-asset basis), False to auto-reply No to the prompt to replace the existing file, and True to auto-reply Yes and replace the existing asset.

None
force_large

how to handle the case in which a large asset (one exceeding size_cutoff) is to be pulled: None to prompt (on a per-asset basis), False to auto-reply No to the prompt, and True to auto-reply Yes

None
size_cutoff float

maximum archive file size, in GB, to download with no prompt

10
get_json_url

how to build the request URL from the server URL base and the API operation ID

lambda server, operation_id: construct_request_url(server, operation_id)
build_signal_handler

how to create a signal handler to use during the download; the single argument to this function factory is the download filepath

_handle_sigint

Returns:

Type Description

a list of genome, asset, and tag names, followed by the asset attributes with which the genome config file should be updated and the URL of the server used if the pull succeeds; otherwise the same list followed by two null values

Raises:

Type Description
refgenconf.UnboundEnvironmentVariablesError

if the genome folder path contains any unbound environment variable

refgenconf.RefGenConfError

if the object update is requested in a non-writable state
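
The `force` and `force_large` parameters are tri-state flags: `True` and `False` answer the prompt automatically, while `None` defers to the user. A minimal sketch of that decision for the size check, assuming the archive size is already available as a number of gigabytes and that sizes up to the cutoff pass without a prompt (`decide`, `should_pull`, and the `ask` callback are hypothetical names, not library functions):

```python
def decide(flag, ask):
    """Resolve a tri-state flag: True -> yes, False -> no, None -> ask."""
    if flag is None:
        return bool(ask())
    return bool(flag)


def should_pull(archive_gb, size_cutoff, force_large, ask):
    """Return True if a download of `archive_gb` gigabytes should proceed."""
    if archive_gb <= size_cutoff:
        return True  # archives within the cutoff never trigger the prompt
    return decide(force_large, ask)


# `ask` stands in for an interactive yes/no prompt
answers = {
    "small": should_pull(2.0, 10, None, ask=lambda: False),
    "large_refused": should_pull(20.0, 10, False, ask=lambda: True),
    "large_forced": should_pull(20.0, 10, True, ask=lambda: False),
    "large_prompted": should_pull(20.0, 10, None, ask=lambda: True),
}
print(answers)
```

Keeping the prompt behind a callback makes the non-interactive paths (`True` and `False`) easy to exercise in scripts and tests.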

Source code in refgenconf/refgenconf.py
def pull(
    self,
    genome,
    asset,
    tag,
    unpack=True,
    force=None,
    force_large=None,
    size_cutoff=10,
    get_json_url=lambda server, operation_id: construct_request_url(
        server, operation_id
    ),
    build_signal_handler=_handle_sigint,
):
    """
    Download and possibly unpack one or more assets for a given reference genome.

    :param str genome: name of a reference genome assembly of interest
    :param str asset: name of particular asset to fetch
    :param str tag: name of particular tag to fetch
    :param bool unpack: whether to unpack a tarball
    :param bool | NoneType force: how to handle case in which asset path
        already exists; None to prompt (on a per-asset basis), False to
        effectively auto-reply No to the prompt to replace existing file,
        and True to auto-reply Yes for existing asset replacement.
    :param bool | NoneType force_large: how to handle case in which a large
        (exceeding size_cutoff) asset is to be pulled; None to prompt (on a
        per-asset basis), False to effectively auto-reply No to the prompt,
        and True to auto-reply Yes
    :param float size_cutoff: maximum archive file size, in GB, to download
        with no prompt
    :param function(str, str) -> str get_json_url: how to build the request
        URL from the server URL base and the API operation ID
    :param function(str) -> function build_signal_handler: how to create
        a signal handler to use during the download; the single argument
        to this function factory is the download filepath
    :return (list[str], dict, str): a list of genome, asset, tag names
        and a key-value pair with which genome config file should be updated
        if pull succeeds, else asset key and a null value
    :raise refgenconf.UnboundEnvironmentVariablesError: if genome folder
        path contains any env. var. that's unbound
    :raise refgenconf.RefGenConfError: if the object update is requested in
        a non-writable state
    """
    self.run_plugins(PRE_PULL_HOOK)

    def _null_return():
        self.run_plugins(POST_PULL_HOOK)
        return gat, None, None

    def _raise_unpack_error():
        raise NotImplementedError(
            "Option to not extract tarballs is not yet supported."
        )

    num_servers = 0
    bad_servers = []
    no_asset_json = []
    alias = genome
    gat = [genome, asset, tag]
    if CFG_SERVERS_KEY not in self or self[CFG_SERVERS_KEY] is None:
        _LOGGER.error("You are not subscribed to any asset servers")
        return _null_return()

    good_servers = [
        s for s in self[CFG_SERVERS_KEY] if get_json_url(s, API_ID_DIGEST)
    ]

    _LOGGER.info(f"Compatible refgenieserver instances: {good_servers}")

    for server_url in good_servers:
        try:
            genome = self.get_genome_alias_digest(alias=alias)
        except yacman.UndefinedAliasError:
            _LOGGER.info(f"No local digest for genome alias: {genome}")
            if not self.set_genome_alias(
                genome=alias, servers=[server_url], create_genome=True
            ):
                continue
            genome = self.get_genome_alias_digest(alias=alias)

        num_servers += 1
        try:
            determined_tag = (
                send_data_request(
                    get_json_url(server_url, API_ID_DEFAULT_TAG).format(
                        genome=genome, asset=asset
                    )
                )
                if tag is None
                else tag
            )
        except DownloadJsonError as e:
            _LOGGER.warning(
                f"Could not retrieve tag from: {server_url}. Caught exception: {e}"
            )
            bad_servers.append(server_url)
            continue
        else:
            determined_tag = str(determined_tag)
            _LOGGER.debug(f"Determined tag: {determined_tag}")
            unpack or _raise_unpack_error()
        gat = [genome, asset, determined_tag]
        url_asset_attrs = get_json_url(server_url, API_ID_ASSET_ATTRS).format(
            genome=genome, asset=asset
        )
        url_genome_attrs = get_json_url(server_url, API_ID_GENOME_ATTRS).format(
            genome=genome
        )
        url_archive = get_json_url(server_url, API_ID_ARCHIVE).format(
            genome=genome, asset=asset
        )

        try:
            archive_data = send_data_request(
                url_asset_attrs, params={"tag": determined_tag}
            )
        except DownloadJsonError:
            no_asset_json.append(server_url)
            if num_servers == len(good_servers):
                _LOGGER.error(
                    f"'{genome}/{asset}:{determined_tag}' not "
                    f"available on any of the following servers: "
                    f"{', '.join(self[CFG_SERVERS_KEY])}"
                )
                return _null_return()
            continue
        else:
            _LOGGER.debug("Determined server URL: {}".format(server_url))
            genome_archive_data = send_data_request(url_genome_attrs)

        if sys.version_info[0] == 2:
            archive_data = asciify_json_dict(archive_data)

        # local directory that the asset data will be stored in
        tag_dir = os.path.dirname(self.filepath(*gat))
        # local target path for the saved archive
        tardir = os.path.join(self.data_dir, genome, asset)
        tarpath = os.path.join(tardir, asset + "__" + determined_tag + ".tgz")
        # check if the genome/asset:tag exists and get request user decision
        if os.path.exists(tag_dir):

            def preserve():
                _LOGGER.info(f"Preserving existing: {tag_dir}")
                return _null_return()

            if force is False:
                return preserve()
            elif force is None:
                if not query_yes_no(f"Replace existing ({tag_dir})?", "no"):
                    return preserve()
                else:
                    _LOGGER.debug(f"Overwriting: {tag_dir}")
            else:
                _LOGGER.debug(f"Overwriting: {tag_dir}")

        # check asset digests local-server match for each parent
        [
            self._chk_digest_if_avail(
                genome, x, archive_data[CFG_ASSET_CHECKSUM_KEY]
            )
            for x in archive_data[CFG_ASSET_PARENTS_KEY]
            if CFG_ASSET_PARENTS_KEY in archive_data
        ]

        bundle_name = "{}/{}:{}".format(*gat)
        archsize = archive_data[CFG_ARCHIVE_SIZE_KEY]
        _LOGGER.debug(f"'{bundle_name}' archive size: {archsize}")

        if not force_large and _is_large_archive(archsize, size_cutoff):
            if force_large is False:
                _LOGGER.info(
                    "Skipping pull of {}/{}:{}; size: {}".format(*gat, archsize)
                )
                return _null_return()
            if not query_yes_no(
                "This archive exceeds the size cutoff ({} > {:.1f}GB). "
                "Do you want to proceed?".format(archsize, size_cutoff)
            ):
                _LOGGER.info(
                    "Skipping pull of {}/{}:{}; size: {}".format(*gat, archsize)
                )
                return _null_return()

        if not os.path.exists(tardir):
            _LOGGER.debug(f"Creating directory: {tardir}")
            os.makedirs(tardir)

        # Download the file from `url` and save it locally under `filepath`:
        _LOGGER.info(f"Downloading URL: {url_archive}")
        try:
            signal.signal(signal.SIGINT, build_signal_handler(tarpath))
            _download_url_progress(
                url_archive, tarpath, bundle_name, params={"tag": determined_tag}
            )
        except HTTPError:
            _LOGGER.error(
                "Asset archive '{}/{}:{}' is missing on the "
                "server: {s}".format(*gat, s=server_url)
            )
            if server_url == self[CFG_SERVERS_KEY][-1]:
                # if this was the last server on the list, return
                return _null_return()
            else:
                _LOGGER.info("Trying next server")
                # set the tag value back to what user requested
                determined_tag = tag
                continue
        except ConnectionRefusedError as e:
            _LOGGER.error(str(e))
            _LOGGER.error(
                f"Server {server_url}/{API_VERSION} refused "
                f"download. Check your internet settings"
            )
            return _null_return()
        except ContentTooShortError as e:
            _LOGGER.error(str(e))
            _LOGGER.error(f"'{bundle_name}' download incomplete")
            return _null_return()
        else:
            _LOGGER.info(f"Download complete: {tarpath}")

        new_checksum = checksum(tarpath)
        old_checksum = archive_data and archive_data.get(CFG_ARCHIVE_CHECKSUM_KEY)
        if old_checksum and new_checksum != old_checksum:
            _LOGGER.error(
                f"Downloaded archive ('{tarpath}') checksum "
                f"mismatch: ({new_checksum}, {old_checksum})"
            )
            return _null_return()
        else:
            _LOGGER.debug(f"Matched checksum: '{old_checksum}'")
        # successfully downloaded tarball; untar it
        if unpack and tarpath.endswith(".tgz"):
            _LOGGER.info(f"Extracting asset tarball: {tarpath}")
            untar(tarpath, tardir)
            os.remove(tarpath)

        if self.file_path:
            with self as rgc:
                [
                    rgc.chk_digest_update_child(
                        gat[0], x, "{}/{}:{}".format(*gat), server_url
                    )
                    for x in archive_data[CFG_ASSET_PARENTS_KEY]
                    if CFG_ASSET_PARENTS_KEY in archive_data
                ]
                rgc.update_tags(
                    *gat,
                    data={
                        attr: archive_data[attr]
                        for attr in ATTRS_COPY_PULL
                        if attr in archive_data
                    },
                )
                rgc.set_default_pointer(*gat)
                rgc.update_genomes(genome=genome, data=genome_archive_data)
        else:
            [
                self.chk_digest_update_child(
                    gat[0], x, "{}/{}:{}".format(*gat), server_url
                )
                for x in archive_data[CFG_ASSET_PARENTS_KEY]
                if CFG_ASSET_PARENTS_KEY in archive_data
            ]
            self.update_tags(
                *gat,
                data={
                    attr: archive_data[attr]
                    for attr in ATTRS_COPY_PULL
                    if attr in archive_data
                },
            )
            self.set_default_pointer(*gat)
            self.update_genomes(genome=genome, data=genome_archive_data)
        if asset == "fasta":
            self.initialize_genome(
                fasta_path=self.seek_src(*gat), alias=alias, fasta_unzipped=True
            )
        self.run_plugins(POST_PULL_HOOK)
        self._symlink_alias(*gat)
        return gat, archive_data, server_url
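
After the download, `pull` compares the checksum of the saved archive with the one published by the server, and treats a missing published checksum as acceptable. The sketch below reproduces that comparison on a temporary file. The hash function (SHA-256 here) is an assumption, since the listing does not show how `checksum` is implemented, and `file_digest` and `archive_is_valid` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def file_digest(path, blocksize=65536):
    """Hash a file in fixed-size blocks (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


def archive_is_valid(path, expected):
    """Accept when no checksum was published or when the digests match."""
    return not expected or file_digest(path) == expected


# Write a small stand-in "archive" to a temporary file
payload = b"stand-in archive contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    archive_path = tmp.name

published = hashlib.sha256(payload).hexdigest()
ok_match = archive_is_valid(archive_path, published)
ok_missing = archive_is_valid(archive_path, None)
ok_mismatch = archive_is_valid(archive_path, "0" * 64)
os.remove(archive_path)
print(ok_match, ok_missing, ok_mismatch)
```

Reading in blocks keeps memory use flat for multi-gigabyte archives, which matters for genome assets.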

remove

remove(genome, asset, tag=None, relationships=True, files=True, force=False)

Remove data associated with a specified genome:asset:tag combination. If no tags are specified, the entire asset is removed from the genome.

If no more tags are defined for the selected genome:asset after tag removal, the parent asset will be removed as well. If no more assets are defined for the selected genome after asset removal, the parent genome will be removed as well.

Parameters:

Name Type Description Default
genome str

genome to be removed

required
asset str

asset package to be removed

required
tag str

tag to be removed

None
relationships bool

whether the asset being removed should be removed from its relatives as well

True
files bool

whether the asset files from disk should be removed

True
force bool

whether the removal prompts should be skipped

False

Returns:

Type Description

updated object

Raises:

Type Description
TypeError

if genome argument type is not a list or str

Source code in refgenconf/refgenconf.py
def remove(
    self, genome, asset, tag=None, relationships=True, files=True, force=False
):
    """
    Remove data associated with a specified genome:asset:tag combination.
    If no tags are specified, the entire asset is removed from the genome.

    If no more tags are defined for the selected genome:asset after tag removal,
    the parent asset will be removed as well.
    If no more assets are defined for the selected genome after asset removal,
    the parent genome will be removed as well.

    :param str genome: genome to be removed
    :param str asset: asset package to be removed
    :param str tag: tag to be removed
    :param bool relationships: whether the asset being removed should
        be removed from its relatives as well
    :param bool files: whether the asset files from disk should be removed
    :param bool force: whether the removal prompts should be skipped
    :raise TypeError: if genome argument type is not a list or str
    :return RefGenConf: updated object
    """
    tag = tag or self.get_default_tag(genome, asset, use_existing=False)
    if files:
        req_dict = {
            "genome": self.get_genome_alias_digest(genome, fallback=True),
            "asset": asset,
            "tag": tag,
        }
        _LOGGER.debug("Attempting removal: {}".format(req_dict))
        if not force and not query_yes_no(
            "Remove '{}/{}:{}'?".format(genome, asset, tag)
        ):
            _LOGGER.info("Action aborted by the user")
            return
        removed = []
        asset_path = self.seek_src(
            genome, asset, tag, enclosing_dir=True, strict_exists=False
        )
        alias_asset_paths = self.seek(
            genome,
            asset,
            tag,
            enclosing_dir=True,
            strict_exists=False,
            all_aliases=True,
        )
        if os.path.exists(asset_path):
            removed.append(_remove(asset_path))
            removed.extend([_remove(p) for p in alias_asset_paths])
            if self.file_path:
                with self as r:
                    r.cfg_remove_assets(genome, asset, tag, relationships)
            else:
                self.cfg_remove_assets(genome, asset, tag, relationships)
        else:
            _LOGGER.warning(
                "Selected asset does not exist on disk ({}). "
                "Removing from genome config.".format(asset_path)
            )
            if self.file_path:
                with self as r:
                    r.cfg_remove_assets(genome, asset, tag, relationships)
                    return
            else:
                self.cfg_remove_assets(genome, asset, tag, relationships)
                return
        try:
            self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset]
        except (KeyError, TypeError):
            asset_dir = os.path.abspath(os.path.join(asset_path, os.path.pardir))
            alias_asset_dirs = [
                os.path.abspath(os.path.join(p, os.path.pardir))
                for p in alias_asset_paths
            ]
            _entity_dir_removal_log(asset_dir, "asset", req_dict, removed)
            removed.extend([_remove(p) for p in alias_asset_dirs])
            try:
                self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY]
            except (KeyError, TypeError):
                genome_dir = os.path.abspath(
                    os.path.join(asset_dir, os.path.pardir)
                )
                alias_genome_dirs = [
                    os.path.abspath(os.path.join(p, os.path.pardir))
                    for p in alias_asset_dirs
                ]
                _entity_dir_removal_log(genome_dir, "genome", req_dict, removed)
                removed.extend([_remove(p) for p in alias_genome_dirs])
                try:
                    if self.file_path:
                        with self as r:
                            del r[CFG_GENOMES_KEY][genome]
                    else:
                        del self[CFG_GENOMES_KEY][genome]
                except (KeyError, TypeError):
                    _LOGGER.debug(
                        "Could not remove genome '{}' from the config; it "
                        "does not exist".format(genome)
                    )
        _LOGGER.info(f"Successfully removed entities:{block_iter_repr(removed)}")
    else:
        if self.file_path:
            with self as r:
                r.cfg_remove_assets(genome, asset, tag, relationships)
        else:
            self.cfg_remove_assets(genome, asset, tag, relationships)
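
The cascading behavior described for `remove` (tag, then asset if it has no tags left, then genome if it has no assets left) can be sketched on a plain nested dict. The config shape below is heavily simplified and hypothetical, and `remove_tag` is an illustrative helper rather than a library function; it touches no files on disk.

```python
def remove_tag(config, genome, asset, tag):
    """Delete a tag, then prune the asset and genome if they become empty."""
    removed = []
    tags = config[genome][asset]
    del tags[tag]
    removed.append(f"{genome}/{asset}:{tag}")
    if not tags:  # last tag gone: drop the asset
        del config[genome][asset]
        removed.append(f"{genome}/{asset}")
        if not config[genome]:  # last asset gone: drop the genome
            del config[genome]
            removed.append(genome)
    return removed


# Hypothetical, simplified config: genome -> asset -> tag -> data
config = {
    "hg38": {
        "fasta": {"default": {}, "v2": {}},
        "bowtie2_index": {"default": {}},
    },
    "mm10": {"fasta": {"default": {}}},
}

first = remove_tag(config, "hg38", "fasta", "v2")
second = remove_tag(config, "mm10", "fasta", "default")
print(first, second, sorted(config))
```

Returning the list of removed entities matches the way the source accumulates a `removed` list for its final log message.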

remove_asset_from_relatives

remove_asset_from_relatives(genome, asset, tag)

Remove any relationship links associated with the selected asset

Parameters:

Name Type Description Default
genome str

genome to be removed from its relatives' relatives list

required
asset str

asset to be removed from its relatives' relatives list

required
tag str

tag to be removed from its relatives' relatives list

required
Source code in refgenconf/refgenconf.py
def remove_asset_from_relatives(self, genome, asset, tag):
    """
    Remove any relationship links associated with the selected asset

    :param str genome: genome to be removed from its relatives' relatives list
    :param str asset: asset to be removed from its relatives' relatives list
    :param str tag: tag to be removed from its relatives' relatives list
    """
    to_remove = "{}/{}:{}".format(
        self.get_genome_alias_digest(alias=genome, fallback=True), asset, tag
    )
    for rel_type in CFG_ASSET_RELATIVES_KEYS:
        tmp = CFG_ASSET_RELATIVES_KEYS[
            len(CFG_ASSET_RELATIVES_KEYS)
            - 1
            - CFG_ASSET_RELATIVES_KEYS.index(rel_type)
        ]
        tag_data = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
            CFG_ASSET_TAGS_KEY
        ][tag]
        if rel_type not in tag_data:
            continue
        for rel in tag_data[rel_type]:
            parsed = prp(rel)
            _LOGGER.debug("Removing '{}' from '{}' {}".format(to_remove, rel, tmp))
            try:
                self[CFG_GENOMES_KEY][parsed["namespace"] or genome][
                    CFG_ASSETS_KEY
                ][parsed["item"]][CFG_ASSET_TAGS_KEY][parsed["tag"]][tmp].remove(
                    to_remove
                )
            except (KeyError, ValueError):
                pass
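
The index arithmetic on `CFG_ASSET_RELATIVES_KEYS` picks the mirrored relationship: when walking an asset's parents, the asset is removed from each parent's children list, and vice versa. A minimal sketch on a flat registry, assuming two relationship keys named `parents` and `children` (the real key names live in `CFG_ASSET_RELATIVES_KEYS` and may differ; `mirrored` and `unlink` are hypothetical helpers):

```python
# Assumed relationship keys; the real values live in CFG_ASSET_RELATIVES_KEYS
RELATIVE_KEYS = ["parents", "children"]


def mirrored(rel_type):
    """Return the opposite relationship key (parents <-> children)."""
    return RELATIVE_KEYS[len(RELATIVE_KEYS) - 1 - RELATIVE_KEYS.index(rel_type)]


def unlink(assets, name):
    """Remove `name` from the mirrored list of every relative it points to."""
    for rel_type in RELATIVE_KEYS:
        for relative in assets[name].get(rel_type, []):
            try:
                assets[relative][mirrored(rel_type)].remove(name)
            except (KeyError, ValueError):
                pass  # relative is missing or never listed this asset


# Hypothetical flat registry keyed by "genome/asset:tag" strings
assets = {
    "hg38/fasta:default": {"children": ["hg38/bowtie2_index:default"]},
    "hg38/bowtie2_index:default": {"parents": ["hg38/fasta:default"]},
}
unlink(assets, "hg38/bowtie2_index:default")
print(assets["hg38/fasta:default"])
```

Swallowing `KeyError` and `ValueError`, as the source does, makes the cleanup tolerant of relatives that were already removed.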

remove_genome_aliases

remove_genome_aliases(digest, aliases=None)

Remove alias for a specified genome digest. This method will remove the digest both from the genomes object and from the aliases mapping in the config

Parameters:

Name Type Description Default
digest str

genome digest to remove an alias for

required
aliases list[str]

a collection of aliases to remove for the genome. If not provided, all aliases for the digest will be removed

None

Returns:

Type Description

whether the removal has been performed

Source code in refgenconf/refgenconf.py
def remove_genome_aliases(self, digest, aliases=None):
    """
    Remove alias for a specified genome digest. This method will remove the
    digest both from the genomes object and from the aliases mapping
    in the config

    :param str digest: genome digest to remove an alias for
    :param list[str] aliases: a collection of aliases to remove for the
        genome. If not provided, all aliases for the digest will be removed
    :return bool: whether the removal has been performed
    """

    def _check_and_remove_alias(rgc, d, a):
        """
        Remove genome alias only if the alias can be removed successfully and
        genome exists
        """
        if rgc[CFG_GENOMES_KEY]:
            rmd = rgc[CFG_GENOMES_KEY].remove_aliases(key=d, aliases=a)
            if not rmd:
                return rmd
            try:
                rgc[CFG_GENOMES_KEY][d][CFG_ALIASES_KEY] = rgc[
                    CFG_GENOMES_KEY
                ].get_aliases(d)
            except KeyError:
                return []
            except yacman.UndefinedAliasError:
                rgc[CFG_GENOMES_KEY][d][CFG_ALIASES_KEY] = []
            return rmd

    # get the symlink mapping before the removal for _remove_symlink_alias
    symlink_mapping = self.get_symlink_paths(genome=digest, all_aliases=True)
    if self.file_path:
        with self as r:
            removed_aliases = _check_and_remove_alias(r, digest, aliases)
    else:
        removed_aliases = _check_and_remove_alias(self, digest, aliases)
    if not removed_aliases:
        return [], []
    self._remove_symlink_alias(symlink_mapping, removed_aliases)
    return removed_aliases
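
The `aliases=None` convention (remove every alias for the digest) can be sketched on a plain digest-to-aliases mapping. This is only an illustration of the convention; `remove_aliases` below is a hypothetical standalone function, not the method of the same name called on the genomes object in the listing, and it does not touch symlinks.

```python
def remove_aliases(aliases_by_digest, digest, aliases=None):
    """Remove the given aliases (all of them if None); return those removed."""
    if digest not in aliases_by_digest:
        return []
    current = aliases_by_digest[digest]
    if aliases is None:
        targets = list(current)
    else:
        targets = [a for a in aliases if a in current]
    aliases_by_digest[digest] = [a for a in current if a not in targets]
    return targets


# Hypothetical digest -> aliases mapping
mapping = {"digest_a": ["hg38", "GRCh38", "human"]}
some = remove_aliases(mapping, "digest_a", ["GRCh38", "not_an_alias"])
rest = remove_aliases(mapping, "digest_a")
none = remove_aliases(mapping, "digest_z")
print(some, rest, none, mapping)
```

Returning the aliases that were actually removed lets the caller clean up only the matching symlinks, which is how the source uses `removed_aliases`.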

run_plugins

run_plugins(hook)

Runs all installed plugins for the specified hook.

Parameters:

Name Type Description Default
hook str

hook identifier

required
Source code in refgenconf/refgenconf.py
def run_plugins(self, hook):
    """
    Runs all installed plugins for the specified hook.

    :param str hook: hook identifier
    """
    for name, func in self.plugins[hook].items():
        _LOGGER.debug("Running {} plugin: {}".format(hook, name))
        func(self)
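
`run_plugins` expects `self.plugins` to map each hook identifier to a dict of plugin names and callables, and calls every callable with the object itself. A minimal sketch of that dispatch, using a stand-in class and made-up hook identifiers (`PluginHost`, `PRE_HOOK`, `POST_HOOK`, and `announce` are hypothetical):

```python
PRE_HOOK, POST_HOOK = "pre_pull", "post_pull"  # hypothetical hook identifiers


class PluginHost:
    """Tiny stand-in for an object that exposes a `plugins` mapping."""

    def __init__(self):
        # hook identifier -> {plugin name: callable taking the host object}
        self.plugins = {PRE_HOOK: {}, POST_HOOK: {}}
        self.log = []

    def run_plugins(self, hook):
        for name, func in self.plugins[hook].items():
            func(self)


def announce(host):
    """Example plugin: record that it ran on the host object."""
    host.log.append("announce ran")


host = PluginHost()
host.plugins[PRE_HOOK]["announce"] = announce
host.run_plugins(PRE_HOOK)   # runs the registered plugin
host.run_plugins(POST_HOOK)  # no plugins registered, so nothing happens
print(host.log)
```

Because the hook is looked up with plain indexing, an identifier that is absent from the mapping raises `KeyError` in this sketch, so every supported hook needs an entry even when it has no plugins.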

seek

seek(genome_name, asset_name, tag_name=None, seek_key=None, strict_exists=None, enclosing_dir=False, all_aliases=False, check_exist=lambda p: os.path.exists(p) or is_url(p))

Seek path to a specified genome-asset-tag alias

Parameters:

Name Type Description Default
genome_name str

name of a reference genome assembly of interest

required
asset_name str

name of the particular asset to fetch

required
tag_name str

name of the particular asset tag to fetch

None
seek_key str

name of the particular subasset to fetch

None
strict_exists

how to handle case in which path doesn't exist; True to raise IOError, False to raise RuntimeWarning, and None to do nothing at all. Default: None (do not check).

None
check_exist

how to check for asset/path existence

lambda p: exists(p) or is_url(p)
enclosing_dir bool

whether a path to the entire enclosing directory should be returned, e.g. for a fasta asset that has 3 seek_keys pointing to 3 files in an asset dir, that asset dir is returned

False
all_aliases bool

whether to return paths to all asset aliases or just the one for the specified 'genome_name' argument

False

Returns:

Type Description

path to the asset

Raises:

Type Description
TypeError

if the existence check is not a one-arg function

refgenconf.MissingGenomeError

if the named assembly isn't known to this configuration instance

refgenconf.MissingAssetError

if the named assembly is known to this configuration instance, but the requested asset is unknown
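
The `strict_exists` and `check_exist` parameters combine into a three-way policy: raise, warn, or skip the check. The sketch below shows one way to express that policy. It only verifies that `check_exist` is callable and returns the path after warning; both of those details are assumptions, and `check_path` is a hypothetical helper rather than part of the library.

```python
import os
import warnings


def check_path(path, strict_exists=None, check_exist=os.path.exists):
    """Return `path`, optionally verifying that it exists first."""
    if not callable(check_exist):
        raise TypeError("check_exist must be a one-argument callable")
    if strict_exists is None or check_exist(path):
        return path  # no check requested, or the path passed the check
    message = f"Path does not exist: {path}"
    if strict_exists:
        raise IOError(message)
    warnings.warn(message, RuntimeWarning)
    return path


missing = "/no/such/refgenie/path"  # hypothetical path


def never_exists(p):
    """Existence check that always fails, to keep the demo deterministic."""
    return False


unchecked = check_path(missing, check_exist=never_exists)  # None: no check
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = check_path(missing, strict_exists=False, check_exist=never_exists)
try:
    check_path(missing, strict_exists=True, check_exist=never_exists)
    raised = False
except IOError:
    raised = True
print(unchecked, len(caught), raised)
```

Accepting the existence check as a callable is what allows the default in the signature to treat URLs as existing alongside local paths.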

Source code in refgenconf/refgenconf.py
def seek(
    self,
    genome_name,
    asset_name,
    tag_name=None,
    seek_key=None,
    strict_exists=None,
    enclosing_dir=False,
    all_aliases=False,
    check_exist=lambda p: os.path.exists(p) or is_url(p),
):
    """
    Seek path to a specified genome-asset-tag alias

    :param str genome_name: name of a reference genome assembly of interest
    :param str asset_name: name of the particular asset to fetch
    :param str tag_name: name of the particular asset tag to fetch
    :param str seek_key: name of the particular subasset to fetch
    :param bool | NoneType strict_exists: how to handle case in which
        path doesn't exist; True to raise IOError, False to raise
        RuntimeWarning, and None to do nothing at all.
        Default: None (do not check).
    :param function(callable) -> bool check_exist: how to check for
        asset/path existence
    :param bool enclosing_dir: whether a path to the entire enclosing
        directory should be returned, e.g. for a fasta asset that has 3
        seek_keys pointing to 3 files in an asset dir, that asset dir
        is returned
    :param bool all_aliases: whether to return paths to all asset aliases or
        just the one for the specified 'genome_name' argument
    :return str: path to the asset
    :raise TypeError: if the existence check is not a one-arg function
    :raise refgenconf.MissingGenomeError: if the named assembly isn't known
        to this configuration instance
    :raise refgenconf.MissingAssetError: if the named assembly is known to
        this configuration instance, but the requested asset is unknown
    """
    tag_name = tag_name or self.get_default_tag(genome_name, asset_name)
    try:
        genome_digest = self.get_genome_alias_digest(genome_name, fallback=True)
    except yacman.UndefinedAliasError:
        raise MissingGenomeError(f"Your genomes do not include '{genome_name}'")
    genome_ids = _make_list_of_str(
        self.get_genome_alias(genome_digest, fallback=True, all_aliases=True)
    )
    idx = 0
    if genome_name in genome_ids:
        idx = genome_ids.index(genome_name)
    self._assert_gat_exists(genome_name, asset_name, tag_name)
    asset_tag_data = self[CFG_GENOMES_KEY][genome_name][CFG_ASSETS_KEY][asset_name][
        CFG_ASSET_TAGS_KEY
    ][tag_name]
    if not seek_key:
        if asset_name in asset_tag_data[CFG_SEEK_KEYS_KEY]:
            seek_val = asset_tag_data[CFG_SEEK_KEYS_KEY][asset_name]
        else:
            seek_val = ""
    else:
        try:
            seek_val = asset_tag_data[CFG_SEEK_KEYS_KEY][seek_key]
        except KeyError:
            if seek_key == "dir":
                seek_val = "."
            else:
                raise MissingSeekKeyError(
                    f"Seek key '{seek_key}' not defined for: "
                    f"'{genome_name}.{asset_name}:{tag_name}'"
                )
    if enclosing_dir:
        seek_val = ""
    fullpath = os.path.join(
        self.alias_dir, genome_digest, asset_name, tag_name, seek_val
    )
    fullpaths = [fullpath.replace(genome_digest, gid) for gid in genome_ids]
    paths_existence = [check_exist(fp) for fp in fullpaths]
    if all(paths_existence):
        return fullpaths if all_aliases else fullpaths[idx]
    nonexistent_pths = [
        fullpaths[p] for p in [i for i, x in enumerate(paths_existence) if not x]
    ]
    msg = "For genome '{}' path to the asset '{}/{}.{}:{}' doesn't exist: {}".format(
        genome_name,
        genome_name,
        asset_name,
        seek_key,
        tag_name,
        ", ".join(nonexistent_pths),
    )
    if strict_exists is None:
        _LOGGER.debug(msg)
    elif strict_exists is True:
        raise OSError(msg)
    else:
        warnings.warn(msg, RuntimeWarning)
    return fullpaths if all_aliases else fullpaths[idx]
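
The seek-key branch in the listing above decides which path fragment is appended to the tag directory. The following is a minimal sketch of that resolution on plain dictionaries, not the library's implementation: `resolve_seek_val` and `alias_paths` are hypothetical helper names, a built-in `KeyError` stands in for `MissingSeekKeyError`, and the alias paths are joined directly rather than derived by string replacement.

```python
import os


def resolve_seek_val(seek_keys, asset_name, seek_key=None, enclosing_dir=False):
    """Return the path fragment appended to the tag directory (hypothetical helper)."""
    if not seek_key:
        # No seek key requested: use the key named after the asset, if any.
        seek_val = seek_keys.get(asset_name, "")
    elif seek_key in seek_keys:
        seek_val = seek_keys[seek_key]
    elif seek_key == "dir":
        # "dir" is accepted even when it is not listed explicitly.
        seek_val = "."
    else:
        raise KeyError(f"Seek key '{seek_key}' not defined")
    # enclosing_dir overrides whatever was resolved above.
    return "" if enclosing_dir else seek_val


def alias_paths(alias_dir, genome_ids, asset_name, tag_name, seek_val):
    """Build one candidate path per genome alias (hypothetical helper)."""
    return [
        os.path.join(alias_dir, gid, asset_name, tag_name, seek_val)
        for gid in genome_ids
    ]


seek_keys = {"fasta": "hg38.fa", "fai": "hg38.fa.fai"}  # illustrative values
print(resolve_seek_val(seek_keys, "fasta", "fai"))
print(alias_paths("alias", ["hg38", "human"], "fasta", "default", "hg38.fa"))
```

Note that an undefined seek key still raises before `enclosing_dir` is considered, which matches the order of the checks in the listing.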

seek_src

seek_src(genome_name, asset_name, tag_name=None, seek_key=None, strict_exists=None, enclosing_dir=False, check_exist=lambda p: os.path.exists(p) or is_url(p))

Seek path to a specified genome-asset-tag

Parameters:

Name Type Description Default
genome_name str

name of a reference genome assembly of interest

required
asset_name str

name of the particular asset to fetch

required
tag_name str

name of the particular asset tag to fetch

None
seek_key str

name of the particular subasset to fetch

None
strict_exists

how to handle the case in which the path doesn't exist: True raises OSError (of which IOError is an alias in Python 3), False issues a RuntimeWarning, and None only logs the problem at debug level. Default: None.

None
check_exist

how to check for asset/path existence

lambda p: exists(p) or is_url(p)
enclosing_dir bool

whether a path to the entire enclosing directory should be returned, e.g. for a fasta asset that has 3 seek_keys pointing to 3 files in an asset dir, that asset dir is returned

False

Returns:

Type Description

path to the asset

Raises:

Type Description
TypeError

if the existence check is not a one-arg function

refgenconf.MissingGenomeError

if the named assembly isn't known to this configuration instance

refgenconf.MissingAssetError

if the named assembly is known to this configuration instance, but the requested asset is unknown
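
The TypeError condition can be illustrated with a small standalone check. This is a sketch under the assumption that the `finspect` name used in the source refers to `inspect.getfullargspec`; `validate_check_exist` is a hypothetical helper, not part of refgenconf.

```python
import inspect


def validate_check_exist(check_exist):
    """Raise TypeError unless check_exist is a callable with exactly one argument."""
    # `callable` is tested first, so getfullargspec only ever sees callables.
    if not callable(check_exist) or len(inspect.getfullargspec(check_exist).args) != 1:
        raise TypeError("Asset existence check must be a one-arg function.")
    return check_exist


validate_check_exist(lambda p: p.startswith("/"))  # accepted: one argument

try:
    validate_check_exist(lambda p, q: True)  # rejected: two arguments
except TypeError as err:
    print(err)
```

A custom `check_exist` is therefore easiest to pass as a plain one-parameter function or lambda.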

Source code in refgenconf/refgenconf.py
def seek_src(
    self,
    genome_name,
    asset_name,
    tag_name=None,
    seek_key=None,
    strict_exists=None,
    enclosing_dir=False,
    check_exist=lambda p: os.path.exists(p) or is_url(p),
):
    """
    Seek path to a specified genome-asset-tag

    :param str genome_name: name of a reference genome assembly of interest
    :param str asset_name: name of the particular asset to fetch
    :param str tag_name: name of the particular asset tag to fetch
    :param str seek_key: name of the particular subasset to fetch
    :param bool | NoneType strict_exists: how to handle case in which
        path doesn't exist; True to raise IOError, False to raise
        RuntimeWarning, and None to do nothing at all.
        Default: None (do not check).
    :param function(callable) -> bool check_exist: how to check for
        asset/path existence
    :param bool enclosing_dir: whether a path to the entire enclosing
        directory should be returned, e.g. for a fasta asset that has 3
        seek_keys pointing to 3 files in an asset dir, that asset dir
        is returned
    :return str: path to the asset
    :raise TypeError: if the existence check is not a one-arg function
    :raise refgenconf.MissingGenomeError: if the named assembly isn't known
        to this configuration instance
    :raise refgenconf.MissingAssetError: if the named assembly is known to
        this configuration instance, but the requested asset is unknown
    """
    tag_name = tag_name or self.get_default_tag(genome_name, asset_name)
    _LOGGER.debug(
        "getting asset: '{}/{}.{}:{}'".format(
            genome_name, asset_name, seek_key, tag_name
        )
    )
    if not callable(check_exist) or len(finspect(check_exist).args) != 1:
        raise TypeError("Asset existence check must be a one-arg function.")
    # 3 'path' key options supported
    # option1: absolute path
    # get just the absolute path value from the config
    path_val = _genome_asset_path(
        self[CFG_GENOMES_KEY],
        genome_name,
        asset_name,
        tag_name,
        enclosing_dir=True,
        no_tag=True,
        seek_key=None,
    )
    _LOGGER.debug("Trying absolute path: {}".format(path_val))
    if seek_key:
        path = os.path.join(path_val, seek_key)
    else:
        path = path_val
    if os.path.isabs(path) and check_exist(path):
        return path
    genome_name = self.get_genome_alias_digest(genome_name, fallback=True)
    # option2: relative to genome_folder/{genome} (default, canonical)
    path = _genome_asset_path(
        self[CFG_GENOMES_KEY],
        genome_name,
        asset_name,
        tag_name,
        seek_key,
        enclosing_dir,
    )
    fullpath = os.path.join(self.data_dir, genome_name, path)
    _LOGGER.debug(
        "Trying relative to genome_folder/genome/_data ({}/{}/{}): {}".format(
            self[CFG_FOLDER_KEY], genome_name, DATA_DIR, fullpath
        )
    )
    if check_exist(fullpath):
        return fullpath
    # option3: relative to the genome_folder (if option2 does not exist)
    gf_relpath = os.path.join(
        self[CFG_FOLDER_KEY],
        _genome_asset_path(
            self[CFG_GENOMES_KEY],
            genome_name,
            asset_name,
            tag_name,
            seek_key,
            enclosing_dir,
            no_tag=True,
        ),
    )
    _LOGGER.debug(
        "Trying path relative to genome_folder ({}): {}".format(
            self[CFG_FOLDER_KEY], gf_relpath
        )
    )
    if check_exist(gf_relpath):
        return gf_relpath

    msg = "For genome '{}' the asset '{}.{}:{}' doesn't exist; tried: {}".format(
        genome_name,
        asset_name,
        seek_key,
        tag_name,
        ", ".join([path, gf_relpath, fullpath]),
    )
    # return option2 if existence not enforced
    if strict_exists is None:
        _LOGGER.debug(msg)
    elif strict_exists is True:
        raise OSError(msg)
    else:
        warnings.warn(msg, RuntimeWarning)
    return fullpath
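
The listing tries several candidate locations in order and only then consults `strict_exists`. The sketch below isolates that pattern with the standard library only; `pick_path` is a hypothetical name, and it simplifies the original by treating every candidate the same way (the source additionally requires the first candidate to be an absolute path).

```python
import logging
import warnings

_LOGGER = logging.getLogger(__name__)


def pick_path(candidates, fallback, check_exist, strict_exists=None):
    """Return the first existing candidate, else report the miss and return fallback."""
    for path in candidates:
        if check_exist(path):
            return path
    msg = "None of the candidate paths exist: {}".format(", ".join(candidates))
    if strict_exists is None:
        _LOGGER.debug(msg)  # quiet mode: record the miss at debug level only
    elif strict_exists is True:
        raise OSError(msg)
    else:
        warnings.warn(msg, RuntimeWarning)
    return fallback


existing = {"/genomes/data/abc/fasta/default"}  # illustrative paths
print(
    pick_path(
        ["/abs/fasta", "/genomes/data/abc/fasta/default"],
        fallback="/genomes/data/abc/fasta/default",
        check_exist=lambda p: p in existing,
    )
)
```

Because a path is returned even when nothing exists (unless `strict_exists=True`), callers that need a guarantee should pass `strict_exists=True` and handle `OSError`.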

seekr

seekr(genome_name, asset_name, tag_name=None, seek_key=None, remote_class='http', get_url=lambda server, id: construct_request_url(server, id))

Seek a remote path to a specified genome/asset.seek_key:tag

Parameters:

Name Type Description Default
genome_name str

name of a reference genome assembly of interest

required
asset_name str

name of the particular asset to fetch

required
tag_name str

name of the particular asset tag to fetch

None
seek_key str

name of the particular subasset to fetch

None
remote_class str

remote data provider class, e.g. 'http' or 's3'

'http'
get_url

how to determine URL request, given server URL and endpoint operationID

lambda server, id: construct_request_url(server, id)

Returns:

Type Description

path to the asset

Source code in refgenconf/refgenconf.py
def seekr(
    self,
    genome_name,
    asset_name,
    tag_name=None,
    seek_key=None,
    remote_class="http",
    get_url=lambda server, id: construct_request_url(server, id),
):
    """
    Seek a remote path to a specified genome/asset.seek_key:tag

    :param str genome_name: name of a reference genome assembly of interest
    :param str asset_name: name of the particular asset to fetch
    :param str tag_name: name of the particular asset tag to fetch
    :param str seek_key: name of the particular subasset to fetch
    :param str remote_class: remote data provider class, e.g. 'http' or 's3'
    :param function(serverUrl, operationId) -> str get_url: how to determine
        URL request, given server URL and endpoint operationID
    :return str: path to the asset
    """
    good_servers = [
        s for s in self[CFG_SERVERS_KEY] if get_url(s, API_ID_ASSET_PATH)
    ]
    _LOGGER.debug(f"Compatible refgenieserver instances: {good_servers}")
    for url in good_servers:
        try:
            genome_digest = self.get_genome_alias_digest(alias=genome_name)
        except yacman.UndefinedAliasError:
            _LOGGER.info(f"No local digest for genome alias: {genome_name}")
            if not self.set_genome_alias(
                genome=genome_name, servers=[url], create_genome=True
            ):
                continue
            genome_digest = self.get_genome_alias_digest(alias=genome_name)

        asset_seek_key_url = get_url(url, API_ID_ASSET_PATH).format(
            genome=genome_digest, asset=asset_name, seek_key=seek_key or asset_name
        )
        if asset_seek_key_url is None:
            continue
        asset_seek_key_target = send_data_request(
            asset_seek_key_url,
            params={"tag": tag_name, "remoteClass": remote_class},
        )
        return asset_seek_key_target
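
The request URL in the listing is produced by filling a template with the genome digest, the asset name, and the seek key, where the seek key defaults to the asset name. A minimal sketch of that step follows; `asset_path_url` and the template string are hypothetical, since the real template is obtained from the server through `get_url`.

```python
def asset_path_url(template, genome_digest, asset_name, seek_key=None):
    """Fill a URL template with genome, asset, and seek key (hypothetical helper)."""
    return template.format(
        genome=genome_digest,
        asset=asset_name,
        # Fall back to the asset name when no seek key is given.
        seek_key=seek_key or asset_name,
    )


# Hypothetical template, for illustration only.
TEMPLATE = "http://refgenomes.example/assets/{genome}/{asset}/{seek_key}"
print(asset_path_url(TEMPLATE, "abc123", "fasta"))
print(asset_path_url(TEMPLATE, "abc123", "fasta", "fai"))
```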

set_default_pointer

set_default_pointer(genome, asset, tag, force_exists=False, force_digest=None, force_fasta=False)

Point to the selected tag by default

Parameters:

Name Type Description Default
genome str

name of a reference genome assembly of interest

required
asset str

name of the particular asset of interest

required
tag str

name of the particular asset tag to point to by default

required
force_digest str

digest to force update of. The alias will not be converted to the digest, even if provided.

None
force_fasta bool

whether setting a default tag for a fasta asset should be forced. Beware: This could lead to genome identity issues

False
force_exists bool

whether the default tag change should be forced (even if it exists)

False
Source code in refgenconf/refgenconf.py
def set_default_pointer(
    self,
    genome,
    asset,
    tag,
    force_exists=False,
    force_digest=None,
    force_fasta=False,
):
    """
    Point to the selected tag by default

    :param str genome: name of a reference genome assembly of interest
    :param str asset: name of the particular asset of interest
    :param str tag: name of the particular asset tag to point to by default
    :param str force_digest: digest to force update of. The alias will
        not be converted to the digest, even if provided.
    :param bool force_fasta: whether setting a default tag for a fasta asset
        should be forced. Beware: This could lead to genome identity issues
    :param bool force_exists: whether the default tag change should be
        forced (even if it exists)
    """
    self._assert_gat_exists(genome, asset, tag)
    asset_dict = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset]
    if (
        CFG_ASSET_DEFAULT_TAG_KEY in asset_dict
        and len(asset_dict[CFG_ASSET_DEFAULT_TAG_KEY]) > 0
    ):
        if not force_exists:
            return
        if asset == "fasta" and not force_fasta:
            raise NotImplementedError(
                "Can't change the default tag for fasta assets, "
                "this would lead to genome identity issues"
            )
    self.update_assets(
        genome, asset, {CFG_ASSET_DEFAULT_TAG_KEY: tag}, force_digest=force_digest
    )
    _LOGGER.info(f"Default tag for '{genome}/{asset}' set to: {tag}")
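
The interaction between `force_exists` and `force_fasta` is easier to see as a standalone decision function. This sketch mirrors the branching in the listing; `should_update_default` is a hypothetical name and no configuration object is involved.

```python
def should_update_default(current_default, asset, force_exists=False, force_fasta=False):
    """Decide whether the default tag would be rewritten (hypothetical helper)."""
    if current_default:
        if not force_exists:
            return False  # a default already exists and no override was requested
        if asset == "fasta" and not force_fasta:
            raise NotImplementedError(
                "Can't change the default tag for fasta assets, "
                "this would lead to genome identity issues"
            )
    return True


print(should_update_default("", "bowtie2_index"))
print(should_update_default("default", "bowtie2_index"))
print(should_update_default("default", "bowtie2_index", force_exists=True))
```

In short, an unset default is always written, while an existing one changes only with `force_exists=True`, and for `fasta` assets additionally with `force_fasta=True`.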

set_genome_alias

set_genome_alias(genome, digest=None, servers=None, overwrite=False, reset_digest=False, create_genome=False, no_write=False, get_json_url=lambda server: construct_request_url(server, API_ID_ALIAS_DIGEST))

Assign a human-readable alias to a genome identifier.

Genomes are identified by a unique identifier which is derived from the FASTA file (part of fasta asset). This way we can ensure genome provenance and compatibility with the server. This function maps a human-readable identifier to make referring to the genomes easier.

Parameters:

Name Type Description Default
genome str

name of the genome to assign to an identifier

required
digest str

identifier to use

None
overwrite bool

whether all the previously set aliases should be removed and just the current one stored

False
no_write bool

whether to skip writing the alias to the file

False

Returns:

Type Description

whether the alias has been established

Source code in refgenconf/refgenconf.py
def set_genome_alias(
    self,
    genome,
    digest=None,
    servers=None,
    overwrite=False,
    reset_digest=False,
    create_genome=False,
    no_write=False,
    get_json_url=lambda server: construct_request_url(server, API_ID_ALIAS_DIGEST),
):
    """
    Assign a human-readable alias to a genome identifier.

    Genomes are identified by a unique identifier which is derived from the
    FASTA file (part of fasta asset). This way we can ensure genome
    provenance and compatibility with the server. This function maps a
    human-readable identifier to make referring to the genomes easier.

    :param str genome: name of the genome to assign to an identifier
    :param str digest: identifier to use
    :param bool overwrite: whether all the previously set aliases should be
        removed and just the current one stored
    :param bool no_write: whether to skip writing the alias to the file
    :return bool: whether the alias has been established
    """

    def _check_and_set_alias(rgc, d, a, create=False):
        """
        Set genome alias only if the key alias can be set successfully and
        genome exists or genome creation is forced
        """
        try:
            _assert_gat_exists(rgc[CFG_GENOMES_KEY], gname=digest)
        except MissingGenomeError:
            if not create:
                raise
            rgc[CFG_GENOMES_KEY][d] = PXAM()

        sa, ra = rgc[CFG_GENOMES_KEY].set_aliases(
            aliases=a, key=d, overwrite=overwrite, reset_key=reset_digest
        )
        try:
            rgc[CFG_GENOMES_KEY][d][CFG_ALIASES_KEY] = rgc[
                CFG_GENOMES_KEY
            ].get_aliases(d)
        except KeyError:
            return [], []
        _LOGGER.info(
            f"Set genome alias ({d}: {', '.join(a) if isinstance(a, list) else a})"
        )
        return sa, ra

    if not digest:
        if isinstance(genome, list):
            if len(genome) > 1:
                raise NotImplementedError("Can look up just one digest at a time")
            else:
                genome = genome[0]
        cnt = 0
        if servers is None:
            servers = self[CFG_SERVERS_KEY]
        for server in servers:
            cnt += 1
            url_alias_template = get_json_url(server=server)
            if url_alias_template is None:
                continue
            url_alias = url_alias_template.format(alias=genome)
            _LOGGER.info(f"Setting '{genome}' identity with server: {url_alias}")
            try:
                digest = send_data_request(url_alias)
            except DownloadJsonError:
                if cnt == len(servers):
                    _LOGGER.error(
                        f"Genome '{genome}' not available on any of the "
                        f"following servers: {', '.join(servers)}"
                    )
                    return False
                continue
            _LOGGER.info(f"Determined digest for local '{genome}' alias: {digest}")
            break

    # get the symlink mapping before the removal for _remove_symlink_alias
    symlink_mapping = self.get_symlink_paths(genome=digest, all_aliases=True)
    if self.file_path and not no_write:
        with self as r:
            set_aliases, removed_aliases = _check_and_set_alias(
                rgc=r, d=digest, a=genome, create=create_genome
            )
        self._remove_symlink_alias(symlink_mapping, removed_aliases)
        self._symlink_alias(genome=digest)
    else:
        set_aliases, removed_aliases = _check_and_set_alias(
            rgc=self, d=digest, a=genome, create=create_genome
        )
    if not set_aliases:
        return False
    return True
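
When no digest is supplied, the listing asks each server in turn and stops at the first one that knows the alias. The sketch below reproduces only that fallback loop; `first_digest`, `fake_fetch`, and `LookupFailed` are hypothetical stand-ins (for the loop body, `send_data_request`, and `DownloadJsonError` respectively), and the server URLs are made up.

```python
class LookupFailed(Exception):
    """Hypothetical stand-in for a failed server request."""


def first_digest(alias, servers, fetch):
    """Return the digest from the first server that resolves the alias, else None."""
    for server in servers:
        try:
            return fetch(server, alias)
        except LookupFailed:
            continue  # try the next server
    return None


def fake_fetch(server, alias):
    """Pretend server lookup backed by an in-memory table."""
    table = {"http://b.example": {"hg38": "digest-123"}}
    try:
        return table[server][alias]
    except KeyError:
        raise LookupFailed(alias)


servers = ["http://a.example", "http://b.example"]
print(first_digest("hg38", servers, fake_fetch))
print(first_digest("mm10", servers, fake_fetch))
```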

subscribe

subscribe(urls, reset=False, no_write=False)

Add URLs to the list of genome_servers.

Use reset argument to overwrite the current list. Otherwise the current one will be appended to.

Parameters:

Name Type Description Default
urls

urls to update the genome_servers list with

required
reset bool

whether the current list should be overwritten

False
Source code in refgenconf/refgenconf.py
def subscribe(self, urls, reset=False, no_write=False):
    """
    Add URLs to the list of genome_servers.

    Use reset argument to overwrite the current list.
    Otherwise the current one will be appended to.

    :param list[str] | str urls: urls to update the genome_servers list with
    :param bool reset: whether the current list should be overwritten
    """
    if self.file_path and not no_write:
        with self as r:
            r._update_genome_servers(url=urls, reset=reset)
    else:
        self._update_genome_servers(url=urls, reset=reset)
    _LOGGER.info(f"Subscribed to: {', '.join(urls)}")

tag

tag(genome, asset, tag, new_tag, files=True, force=False)

Retags the asset selected by the tag with the new_tag. Prompts if default already exists and overrides upon confirmation.

This method does not override the original asset entry in the RefGenConf object; it creates a copy and tags it with new_tag. Additionally, if the retagged asset has any children, their parent entry is retagged with the new_tag introduced by this call. By default, the files on disk are also renamed to reflect the changes in the genome configuration file.

Parameters:

Name Type Description Default
genome str

name of a reference genome assembly of interest

required
asset str

name of particular asset of interest

required
tag str

name of the tag that identifies the asset of interest

required
new_tag str

name of the new tag

required
files bool

whether the asset files on disk should be renamed

True

Returns:

Type Description

a logical indicating whether the tagging was successful

Raises:

Type Description
ValueError

when the original tag is not specified

Source code in refgenconf/refgenconf.py
def tag(self, genome, asset, tag, new_tag, files=True, force=False):
    """
    Retags the asset selected by the tag with the new_tag.
    Prompts if default already exists and overrides upon confirmation.

    This method does not override the original asset entry in the RefGenConf
    object. It creates its copy and tags it with the new_tag.
    Additionally, if the retagged asset has any children their parent will
    be retagged as new_tag that was introduced upon this method execution.
    By default, the files on disk will be also renamed to reflect the
    genome configuration file changes

    :param str genome: name of a reference genome assembly of interest
    :param str asset: name of particular asset of interest
    :param str tag: name of the tag that identifies the asset of interest
    :param str new_tag: name of the new tag
    :param bool files: whether the asset files on disk should be renamed
    :raise ValueError: when the original tag is not specified
    :return bool: a logical indicating whether the tagging was successful
    """
    if any([c in new_tag for c in TAG_NAME_BANNED_CHARS]):
        raise ValueError(
            f"The tag name can't consist of characters: {TAG_NAME_BANNED_CHARS}"
        )
    self.run_plugins(PRE_TAG_HOOK)
    ori_path = self.seek_src(
        genome, asset, tag, enclosing_dir=True, strict_exists=True
    )
    alias_ori_path = self.seek(
        genome, asset, tag, enclosing_dir=True, strict_exists=True
    )
    new_path = os.path.abspath(os.path.join(ori_path, os.pardir, new_tag))
    if self.file_path:
        with self as r:
            if not r.cfg_tag_asset(genome, asset, tag, new_tag, force):
                sys.exit(0)
    else:
        if not self.cfg_tag_asset(genome, asset, tag, new_tag, force):
            sys.exit(0)
    if not files:
        self.run_plugins(POST_TAG_HOOK)
        return
    try:
        if os.path.exists(new_path):
            _remove(new_path)
        os.rename(ori_path, new_path)
        _LOGGER.info("Renamed directory: {}".format(new_path))
        self._symlink_alias(genome, asset, new_tag)
        _remove(alias_ori_path)
    except FileNotFoundError:
        _LOGGER.warning(
            "Could not rename original asset tag directory '{}'"
            " to the new one '{}'".format(ori_path, new_path)
        )
    else:
        if self.file_path:
            with self as r:
                r.cfg_remove_assets(genome, asset, tag, relationships=False)
        else:
            self.cfg_remove_assets(genome, asset, tag, relationships=False)
        _LOGGER.debug(
            "Asset '{}/{}' tagged with '{}' has been removed from"
            " the genome config".format(genome, asset, tag)
        )
        _LOGGER.debug(
            "Original asset has been moved from '{}' to '{}'".format(
                ori_path, new_path
            )
        )
    self.run_plugins(POST_TAG_HOOK)
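
Two steps of the listing can be tried in isolation: the tag-name validation and the computation of the renamed directory, which is a sibling of the original tag directory. This is a sketch only; `retag_path` is a hypothetical helper, and the contents of `BANNED_CHARS` are assumed for illustration rather than taken from `TAG_NAME_BANNED_CHARS`.

```python
import os

BANNED_CHARS = ["/", ":"]  # assumed for illustration, not the library constant


def retag_path(ori_path, new_tag, banned=BANNED_CHARS):
    """Validate new_tag and return the sibling directory it would be moved to."""
    if any(c in new_tag for c in banned):
        raise ValueError(f"The tag name can't contain any of: {banned}")
    # Go one level up from the original tag directory, then down into new_tag.
    return os.path.abspath(os.path.join(ori_path, os.pardir, new_tag))


print(retag_path(os.path.join("genomes", "hg38", "fasta", "default"), "v2"))
```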

unsubscribe

unsubscribe(urls, no_write=False)

Remove URLs from the list of genome_servers.

Parameters:

Name Type Description Default
urls

urls to remove from the genome_servers list

required
Source code in refgenconf/refgenconf.py
def unsubscribe(self, urls, no_write=False):
    """
    Remove URLs from the list of genome_servers.

    :param list[str] | str urls: urls to remove from the genome_servers list
    """
    unsub_list = []
    ori_servers = self[CFG_SERVERS_KEY]
    for s in urls:
        try:
            ori_servers.remove(s)
            unsub_list.append(s)
        except ValueError:
            _LOGGER.warning(
                "URL '{}' not in genome_servers list: {}".format(s, ori_servers)
            )
    if self.file_path and not no_write:
        with self as r:
            r._update_genome_servers(ori_servers, reset=True)
    else:
        self._update_genome_servers(ori_servers, reset=True)
    if unsub_list:
        _LOGGER.info("Unsubscribed from: {}".format(", ".join(unsub_list)))
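
The removal loop in the listing iterates over `urls` directly, so a bare string would be walked character by character. The sketch below shows the same bookkeeping on plain lists with the string case normalized first; `remove_servers` is a hypothetical helper and the normalization is an addition, not something the listing does.

```python
def remove_servers(current, urls):
    """Return (remaining, removed, missing) after dropping urls from current."""
    if isinstance(urls, str):
        urls = [urls]  # added safeguard: treat a single URL as a one-item list
    remaining = list(current)
    removed, missing = [], []
    for url in urls:
        if url in remaining:
            remaining.remove(url)
            removed.append(url)
        else:
            missing.append(url)  # the listing logs a warning in this case
    return remaining, removed, missing


servers = ["http://a.example", "http://b.example"]  # made-up URLs
print(remove_servers(servers, "http://a.example"))
print(remove_servers(servers, ["http://c.example"]))
```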

update_assets

update_assets(genome, asset=None, data=None, force_digest=None)

Updates the genomes in the RefGenConf object at any level. If a requested genome-asset mapping is missing, it is created.

Parameters:

Name Type Description Default
genome str

genome to be added/updated

required
asset str

asset to be added/updated

None
force_digest str

digest to force update of. The alias will not be converted to the digest, even if provided.

None
data Mapping

data to be added/updated

None

Returns:

Type Description

updated object

Source code in refgenconf/refgenconf.py
def update_assets(self, genome, asset=None, data=None, force_digest=None):
    """
    Updates the genomes in RefGenConf object at any level.
    If a requested genome-asset mapping is missing, it will be created

    :param str genome: genome to be added/updated
    :param str asset: asset to be added/updated
    :param str force_digest: digest to force update of. The alias will
        not be converted to the digest, even if provided.
    :param Mapping data: data to be added/updated
    :return RefGenConf: updated object
    """
    if _check_insert_data(genome, str, "genome"):
        genome = force_digest or self.get_genome_alias_digest(
            alias=genome, fallback=True
        )
        _safe_setdef(self[CFG_GENOMES_KEY], genome, PXAM())
        if _check_insert_data(asset, str, "asset"):
            _safe_setdef(self[CFG_GENOMES_KEY][genome], CFG_ASSETS_KEY, PXAM())
            _safe_setdef(
                self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY], asset, PXAM()
            )
            if _check_insert_data(data, Mapping, "data"):
                self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset].update(data)
    return self
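
The "create if missing, then update" behaviour in the listing can be sketched with ordinary dictionaries. In this illustration `update_assets_sketch` is a hypothetical function, plain `dict` objects replace the `PXAM` mappings, alias-to-digest conversion is left out, and the `"genomes"` and `"assets"` key names are assumed values for `CFG_GENOMES_KEY` and `CFG_ASSETS_KEY`.

```python
from collections.abc import Mapping


def update_assets_sketch(cfg, genome, asset=None, data=None):
    """Create missing genome/asset levels in cfg, then merge data into the asset."""
    # "genomes" and "assets" are assumed key names for this sketch.
    genome_entry = cfg.setdefault("genomes", {}).setdefault(genome, {})
    if isinstance(asset, str):
        asset_entry = genome_entry.setdefault("assets", {}).setdefault(asset, {})
        if isinstance(data, Mapping):
            asset_entry.update(data)
    return cfg


cfg = {}
update_assets_sketch(cfg, "abc123", "fasta", {"default_tag": "default"})
update_assets_sketch(cfg, "abc123", "fasta", {"asset_description": "DNA sequences"})
print(cfg)
```

Existing entries are preserved across calls; only the keys present in `data` are overwritten.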

update_genomes

update_genomes(genome, data=None, force_digest=None)

Updates the genomes in RefGenConf object at any level. If a requested genome is missing, it will be added

Parameters:

Name Type Description Default
genome str

genome to be added/updated

required
force_digest str

digest to force update of. The alias will not be converted to the digest, even if provided.

None
data Mapping

data to be added/updated

None

Returns:

Type Description

updated object

Source code in refgenconf/refgenconf.py
def update_genomes(self, genome, data=None, force_digest=None):
    """
    Updates the genomes in RefGenConf object at any level.
    If a requested genome is missing, it will be added

    :param str genome: genome to be added/updated
    :param str force_digest: digest to force update of. The alias will
        not be converted to the digest, even if provided.
    :param Mapping data: data to be added/updated
    :return RefGenConf: updated object
    """
    if _check_insert_data(genome, str, "genome"):
        genome = force_digest or self.get_genome_alias_digest(
            alias=genome, fallback=True
        )
        _safe_setdef(self[CFG_GENOMES_KEY], genome, PXAM({CFG_ASSETS_KEY: PXAM()}))
        if _check_insert_data(data, Mapping, "data"):
            self[CFG_GENOMES_KEY][genome].update(data)
    return self

update_relatives_assets

update_relatives_assets(genome, asset, tag=None, data=None, children=False)

A convenience method that wraps update_tags and uses it to update the relatives (parents or children) of an asset.

Parameters:

Name Type Description Default
genome str

genome to be added/updated

required
asset str

asset to be added/updated

required
tag str

tag to be added/updated

None
data list

asset parents or children to be added/updated

None
children bool

a logical indicating whether the relationship to be added is 'children'

False

Returns:

Type Description

updated object

Source code in refgenconf/refgenconf.py
def update_relatives_assets(
    self, genome, asset, tag=None, data=None, children=False
):
    """
    A convenience method that wraps update_tags and uses it to update the
    relatives (parents or children) of an asset.

    :param str genome: genome to be added/updated
    :param str asset: asset to be added/updated
    :param str tag: tag to be added/updated
    :param list data: asset parents or children to be added/updated
    :param bool children: a logical indicating whether the relationship to be
        added is 'children'
    :return RefGenConf: updated object
    """
    tag = tag or self.get_default_tag(genome, asset)
    relationship = CFG_ASSET_CHILDREN_KEY if children else CFG_ASSET_PARENTS_KEY
    if _check_insert_data(data, list, "data"):
        # creates/asserts the genome/asset:tag combination
        self.update_tags(genome, asset, tag)
        tag_data = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
            CFG_ASSET_TAGS_KEY
        ][tag]
        tag_data.setdefault(relationship, list())
        tag_data[relationship] = _extend_unique(
            tag_data[relationship],
            data,
        )
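
The relationship list is merged through `_extend_unique`, whose body is not shown here. A plausible reading, given the name, is an order-preserving merge that skips entries already present; the sketch below implements that assumption and may differ from the library's helper.

```python
def extend_unique(existing, new):
    """Return existing followed by the items of new that are not yet present."""
    merged = list(existing)
    for item in new:
        if item not in merged:
            merged.append(item)
    return merged


parents = ["hg38/fasta:default"]  # illustrative relatives
print(extend_unique(parents, ["hg38/fasta:default", "hg38/gtf:default"]))
```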

update_seek_keys

update_seek_keys(genome, asset, tag=None, keys=None, force_digest=None)

A convenience method that wraps update_tags and uses it to update the seek keys for a tagged asset.

Parameters:

Name Type Description Default
genome str

genome to be added/updated

required
asset str

asset to be added/updated

required
tag str

tag to be added/updated

None
force_digest str

digest of the genome to update. When provided, it is used directly and the genome alias is not converted to a digest.

None
keys Mapping

seek_keys to be added/updated

None

Returns:

Type Description
RefGenConf

updated object

Source code in refgenconf/refgenconf.py
def update_seek_keys(self, genome, asset, tag=None, keys=None, force_digest=None):
    """
    A convenience method which wraps the updated assets and uses it to
    update the seek keys for a tagged asset.

    :param str genome: genome to be added/updated
    :param str asset: asset to be added/updated
    :param str tag: tag to be added/updated
    :param str force_digest: digest to force update of. The alias will
        not be converted to the digest, even if provided.
    :param Mapping keys: seek_keys to be added/updated
    :return RefGenConf: updated object
    """
    tag = tag or self.get_default_tag(genome, asset)
    if _check_insert_data(keys, Mapping, "keys"):
        self.update_tags(genome, asset, tag, force_digest=force_digest)
        asset = self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset]
        _safe_setdef(asset[CFG_ASSET_TAGS_KEY][tag], CFG_SEEK_KEYS_KEY, PXAM())
        asset[CFG_ASSET_TAGS_KEY][tag][CFG_SEEK_KEYS_KEY].update(keys)
    return self
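
The merge semantics of the seek key update can be illustrated with nested dictionaries. This sketch assumes the key names `genomes`, `assets`, `tags`, and `seek_keys` for the config constants, and the function `update_seek_keys_sketch` is hypothetical; it skips alias resolution and type checking entirely.

```python
def update_seek_keys_sketch(config, genome, asset, tag, keys):
    """Create the genome/asset:tag path if needed, then merge seek keys."""
    tag_data = (
        config.setdefault("genomes", {})
        .setdefault(genome, {})
        .setdefault("assets", {})
        .setdefault(asset, {})
        .setdefault("tags", {})
        .setdefault(tag, {})
    )
    # dict.update overwrites matching keys and keeps the others.
    tag_data.setdefault("seek_keys", {}).update(keys)
    return config


config = {}
update_seek_keys_sketch(
    config, "hg38", "fasta", "default",
    {"fasta": "hg38.fa", "fai": "hg38.fa.fai"},
)
update_seek_keys_sketch(
    config, "hg38", "fasta", "default",
    {"fai": "hg38.fasta.fai", "chrom_sizes": "hg38.chrom.sizes"},
)
seek_keys = config["genomes"]["hg38"]["assets"]["fasta"]["tags"]["default"]["seek_keys"]
print(seek_keys)
```

The second call replaces the `fai` entry and adds `chrom_sizes`, while the untouched `fasta` entry is preserved.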

update_tags

update_tags(genome, asset=None, tag=None, data=None, force_digest=None)

Updates the genomes section of the RefGenConf object at any level (genome, asset, or tag). If a requested genome-asset-tag mapping is missing, it is created.

Parameters:

Name Type Description Default
genome str

genome to be added/updated

required
asset str

asset to be added/updated

None
tag str

tag to be added/updated

None
force_digest str

digest of the genome to update. When provided, it is used directly and the genome alias is not converted to a digest.

None
data Mapping

data to be added/updated

None

Returns:

Type Description
RefGenConf

updated object

Source code in refgenconf/refgenconf.py
def update_tags(self, genome, asset=None, tag=None, data=None, force_digest=None):
    """
    Updates the genomes in RefGenConf object at any level.
    If a requested genome-asset-tag mapping is missing, it will be created

    :param str genome: genome to be added/updated
    :param str asset: asset to be added/updated
    :param str tag: tag to be added/updated
    :param str force_digest: digest to force update of. The alias will
        not be converted to the digest, even if provided.
    :param Mapping data: data to be added/updated
    :return RefGenConf: updated object
    """
    if _check_insert_data(genome, str, "genome"):
        genome = force_digest or self.get_genome_alias_digest(
            alias=genome, fallback=True
        )
        _safe_setdef(self[CFG_GENOMES_KEY], genome, PXAM())
        if _check_insert_data(asset, str, "asset"):
            _safe_setdef(self[CFG_GENOMES_KEY][genome], CFG_ASSETS_KEY, PXAM())
            _safe_setdef(
                self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY], asset, PXAM()
            )
            if _check_insert_data(tag, str, "tag"):
                _safe_setdef(
                    self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset],
                    CFG_ASSET_TAGS_KEY,
                    PXAM(),
                )
                _safe_setdef(
                    self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
                        CFG_ASSET_TAGS_KEY
                    ],
                    tag,
                    PXAM(),
                )
                if _check_insert_data(data, Mapping, "data"):
                    self[CFG_GENOMES_KEY][genome][CFG_ASSETS_KEY][asset][
                        CFG_ASSET_TAGS_KEY
                    ][tag].update(data)
    return self
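
The level-by-level creation performed above can be approximated with plain dictionaries. This is a simplified sketch under stated assumptions: `update_tags_sketch` is a hypothetical function, the key names `genomes`, `assets`, and `tags` are assumed values for the config constants, and alias-to-digest resolution is omitted.

```python
from collections.abc import Mapping


def update_tags_sketch(config, genome, asset=None, tag=None, data=None):
    """Create only as many nesting levels as the arguments describe."""
    genome_entry = config.setdefault("genomes", {}).setdefault(genome, {})
    if asset is not None:
        asset_entry = genome_entry.setdefault("assets", {}).setdefault(asset, {})
        if tag is not None:
            tag_entry = asset_entry.setdefault("tags", {}).setdefault(tag, {})
            if isinstance(data, Mapping):
                tag_entry.update(data)
    return config


config = {}
# Genome level only: no "assets" section is created.
update_tags_sketch(config, "hg38")
genome_only = {name: dict(entry) for name, entry in config["genomes"].items()}

# Full genome/asset:tag path with data attached to the tag.
update_tags_sketch(config, "hg38", "fasta", "default", {"asset_path": "fasta"})
print(genome_only)
print(config)
```

Because every level uses `setdefault`, repeated calls never discard entries that already exist.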

write

write(filepath=None)

Write the contents to a file. If pre- and post-update plugins are defined, they are executed automatically.

Parameters:

Name Type Description Default
filepath str

a file path to write to

None

Returns:

Type Description
str

the path to the created file

Raises:

Type Description
OSError

when the object has no write capabilities (it was created in read-only mode), or when the file is locked by another process or object

TypeError

when the filepath cannot be determined. This happens only if the YacAttMap was initialized from a Mapping rather than read from a file.

Source code in refgenconf/refgenconf.py
def write(self, filepath=None):
    """
    Write the contents to a file.
    If pre- and post-update plugins are defined, they will be executed automatically

    :param str filepath: a file path to write to
    :raise OSError: when the object has been created in a read only mode or other
        process has locked the file
    :raise TypeError: when the filepath cannot be determined.
        This takes place only if YacAttMap initialized with a Mapping as an input,
         not read from file.
    :raise OSError: when the write is called on an object with no write capabilities
        or when writing to a file that is locked by a different object
    :return str: the path to the created files
    """
    self.run_plugins(PRE_UPDATE_HOOK)
    try:
        path = super(RefGenConf, self).write(filepath=filepath, exclude_case=True)
    except ValidationError:
        _LOGGER.error("The changes were not written to the file")
        raise
    self.run_plugins(POST_UPDATE_HOOK)
    return path
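
The ordering of hooks around the write can be sketched with a small stand-in class. Everything here is hypothetical: `SketchConfig`, its `_write_to_disk` method, the local `ValidationError`, and the hook names `pre_update` and `post_update` are placeholders, and the sketch assumes the parent write validates the config before touching the file.

```python
class ValidationError(Exception):
    """Local stand-in for the validation error caught in write()."""


class SketchConfig:
    """Hypothetical config object that records what happens, in order."""

    def __init__(self, valid=True):
        self.valid = valid
        self.events = []

    def run_plugins(self, hook):
        # The real method would call registered plugins; this only logs.
        self.events.append(hook)

    def _write_to_disk(self, filepath):
        # Placeholder for the parent-class write (assumed to validate first).
        if not self.valid:
            raise ValidationError("config does not match the schema")
        self.events.append("write")
        return filepath

    def write(self, filepath):
        self.run_plugins("pre_update")
        try:
            path = self._write_to_disk(filepath)
        except ValidationError:
            self.events.append("error_logged")
            raise
        self.run_plugins("post_update")
        return path


good = SketchConfig()
good.write("genome_config.yaml")

bad = SketchConfig(valid=False)
try:
    bad.write("genome_config.yaml")
except ValidationError:
    pass

print(good.events)  # ['pre_update', 'write', 'post_update']
print(bad.events)   # ['pre_update', 'error_logged']
```

With this structure, post-update plugins run only after a successful write, while pre-update plugins run even when the write later fails validation.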

Exceptions

The package defines several custom exceptions for error handling:

MissingGenomeError

Bases: RefgenconfError

Error type for request of unknown genome/assembly.

MissingAssetError

Bases: RefgenconfError

Error type for request of an unavailable genome asset.

RefgenconfError

Bases: Exception

Base exception type for this package
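
Because both specific errors derive from RefgenconfError, callers can handle them individually or catch the base type. The sketch below defines local stand-in classes that mirror the documented hierarchy so that it runs without the package installed; the `lookup` function and the sample data are hypothetical.

```python
class RefgenconfError(Exception):
    """Stand-in for the package's base exception."""


class MissingGenomeError(RefgenconfError):
    """Stand-in: request of an unknown genome/assembly."""


class MissingAssetError(RefgenconfError):
    """Stand-in: request of an unavailable genome asset."""


def lookup(assets, genome, asset):
    """Hypothetical lookup that raises the most specific error."""
    if genome not in assets:
        raise MissingGenomeError(f"unknown genome: {genome}")
    if asset not in assets[genome]:
        raise MissingAssetError(f"no asset '{asset}' for genome '{genome}'")
    return assets[genome][asset]


assets = {"hg38": {"fasta": "/data/hg38/fasta"}}
caught = []
for genome, asset in [("mm10", "fasta"), ("hg38", "bowtie2_index")]:
    try:
        lookup(assets, genome, asset)
    except RefgenconfError as err:
        # One handler covers every error type defined by the package.
        caught.append(type(err).__name__)

print(caught)  # ['MissingGenomeError', 'MissingAssetError']
```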

Utility Functions

select_genome_config

select_genome_config(filename=None, conf_env_vars=CFG_ENV_VARS, **kwargs)

Get path to genome configuration file.

Parameters:

Name Type Description Default
filename str

name/path of genome configuration file

None
conf_env_vars Iterable[str]

names of environment variables to consider, in priority order

CFG_ENV_VARS

Returns:

Type Description
str

path to genome configuration file

Source code in refgenconf/helpers.py
def select_genome_config(filename=None, conf_env_vars=CFG_ENV_VARS, **kwargs):
    """
    Get path to genome configuration file.

    :param str filename: name/path of genome configuration file
    :param Iterable[str] conf_env_vars: names of environment variables to
        consider; basically, a prioritized search list
    :return str: path to genome configuration file
    """
    return select_config(filename, conf_env_vars, **kwargs)
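
The function delegates to select_config, whose implementation is not shown here. As a rough sketch of what a prioritized search might look like, the hypothetical `pick_config` below assumes that an explicit, existing filename wins and that environment variables are otherwise tried in order; the variable names `FIRST_CHOICE` and `SECOND_CHOICE` are made up, and the real function may differ in how it handles missing files.

```python
import os
import tempfile


def pick_config(filename=None, env_vars=(), environ=None):
    """Return the first usable config path, or None if nothing is found."""
    environ = os.environ if environ is None else environ
    if filename and os.path.isfile(filename):
        return filename
    for name in env_vars:
        candidate = environ.get(name)
        # Skip variables that are unset or point to a missing file.
        if candidate and os.path.isfile(candidate):
            return candidate
    return None


# Create a real (empty) file so the existence check can succeed.
with tempfile.NamedTemporaryFile(suffix=".yaml", delete=False) as handle:
    cfg_path = handle.name

fake_env = {"SECOND_CHOICE": cfg_path}
selected = pick_config(
    env_vars=("FIRST_CHOICE", "SECOND_CHOICE"), environ=fake_env
)
nothing = pick_config(filename=cfg_path + ".missing", environ={})
os.remove(cfg_path)

print(selected == cfg_path)
print(nothing)
```

Passing the environment as a mapping keeps the sketch testable without modifying the process environment.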