Package refget documentation
create_refget_router
create_refget_router(sequences=False, collections=True, pangenomes=False)
Create a FastAPI router for the sequence collection API. This router provides endpoints for retrieving and comparing sequence collections. You can choose which endpoints to include by setting the sequences, collections, or pangenomes flags.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequences | bool | Include sequence endpoints | False |
collections | bool | Include sequence collection endpoints | True |
pangenomes | bool | Include pangenome endpoints | False |
Returns:
Type | Description |
---|---|
APIRouter | A FastAPI router with the specified endpoints |
Examples:
app.include_router(create_refget_router(sequences=False, pangenomes=False))
Source code in refget/refget_router.py
fasta_to_seqcol_dict
fasta_to_seqcol_dict(fasta_file_path, digest_function=sha512t24u_digest)
Convert a FASTA file into a Sequence Collection object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fasta_file_path | str | Path to the FASTA file | required |
digest_function | DigestFunction | Digest function to use. Defaults to sha512t24u_digest. | sha512t24u_digest |
Returns:
Type | Description |
---|---|
dict | A canonical sequence collection object |
Source code in refget/utilities.py
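fasta_to_seqcol_dict reads the FASTA file itself. To show roughly what such a conversion involves, here is a minimal sketch that parses FASTA text and builds the names, lengths, and sequences arrays. It works on text rather than a file path, and parse_fasta and fasta_text_to_seqcol_dict are hypothetical helpers, not part of refget. The sequence digests follow the GA4GH refget convention (sha512t24u of the uppercased sequence, "SQ." prefix); that is an assumption here, and the library's output may differ in detail.

```python
import base64
import hashlib


def sha512t24u_digest(data: bytes) -> str:
    """GA4GH sha512t24u: SHA-512, truncated to 24 bytes, base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (name, sequence) pairs; the name is the header up to the first whitespace."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_text_to_seqcol_dict(text: str, digest_function=sha512t24u_digest) -> dict:
    """Hypothetical helper: build a level-2 seqcol dict from FASTA text."""
    records = parse_fasta(text)
    return {
        "names": [name for name, _ in records],
        "lengths": [len(seq) for _, seq in records],
        # Assumed refget-style digests: uppercase the sequence, digest it, add "SQ."
        "sequences": ["SQ." + digest_function(seq.upper().encode("ascii")) for _, seq in records],
    }


seqcol = fasta_text_to_seqcol_dict(">chr1 test\nACGT\nAC\n>chr2\nTTTT\n")
print(seqcol["names"], seqcol["lengths"])  # ['chr1', 'chr2'] [6, 4]
```

Multi-line records are joined before their length is taken, which is why chr1 reports 6.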
fasta_to_digest
fasta_to_digest(fa_file_path, inherent_attrs=['names', 'sequences'])
Given a FASTA file path, return the top-level digest of its sequence collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fa_file_path | str | Path | Path to the FASTA file | required |
inherent_attrs | Optional[list] | Attributes to include in the digest. | ['names', 'sequences'] |
Returns:
Type | Description |
---|---|
str | The top-level digest for this sequence collection |
Source code in refget/utilities.py
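The inherent_attrs parameter determines which attributes feed the top-level digest. A minimal sketch of that two-step scheme, assuming the GA4GH seqcol approach: digest the canonical JSON of each attribute array (level 1), then digest the canonical JSON of the object holding those attribute digests (level 0). canonical_json only approximates RFC 8785 for simple lists of strings and integers, and seqcol_top_level_digest is a hypothetical name, not the refget implementation.

```python
import base64
import hashlib
import json


def sha512t24u_digest(data: bytes) -> str:
    """GA4GH sha512t24u: SHA-512, truncated to 24 bytes, base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(obj) -> bytes:
    # Approximation of RFC 8785 (JCS) for simple data: sorted keys, no whitespace
    return json.dumps(obj, separators=(",", ":"), sort_keys=True, ensure_ascii=False).encode("utf-8")


def seqcol_top_level_digest(seqcol: dict, inherent_attrs=("names", "sequences")) -> str:
    # Level 1: one digest per inherent attribute array
    level1 = {attr: sha512t24u_digest(canonical_json(seqcol[attr])) for attr in inherent_attrs}
    # Level 0: digest of the canonical JSON of the level-1 object
    return sha512t24u_digest(canonical_json(level1))
```

Because only inherent attributes are digested, two collections that differ solely in a non-inherent attribute (here, lengths) share a top-level digest.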
SequenceClient
SequenceClient(urls=['https://www.ebi.ac.uk/ena/cram'], raise_errors=None)
Bases: RefgetClient
A client for interacting with a refget sequences API.
Initializes the sequences client.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
urls | list | A list of base URLs of the sequences API. Defaults to ['https://www.ebi.ac.uk/ena/cram']. | ['https://www.ebi.ac.uk/ena/cram'] |
raise_errors | bool | Whether to raise errors or log them. Defaults to None, in which case the client guesses. | None |
|
Attributes:
Name | Type | Description |
---|---|---|
urls | list | The list of base URLs of the sequences API. |
Source code in refget/clients.py
get_metadata
get_metadata(digest)
Retrieves metadata for a given sequence digest.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
digest | str | The digest of the sequence. | required |
Returns:
Type | Description |
---|---|
dict | The metadata. |
Source code in refget/clients.py
get_sequence
get_sequence(digest, start=None, end=None)
Retrieves a sequence for a given digest, optionally restricted to a subsequence via start and end.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
digest | str | The digest of the sequence. | required |
Returns:
Type | Description |
---|---|
str | The sequence. |
Source code in refget/clients.py
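SequenceClient wraps HTTP calls to a refget sequences server. Assuming the endpoint layout of the GA4GH refget v2 specification (/sequence/{digest} with optional start and end query parameters, and /sequence/{digest}/metadata), the requests behind get_sequence and get_metadata would target URLs like the ones built below. The helper names and the digest "abc" are placeholders, and the client may assemble its URLs differently.

```python
from urllib.parse import urlencode


def sequence_url(base_url: str, digest: str, start=None, end=None) -> str:
    """Build a refget v2 style sequence URL; start/end are added only when given."""
    url = f"{base_url.rstrip('/')}/sequence/{digest}"
    params = {key: value for key, value in (("start", start), ("end", end)) if value is not None}
    return f"{url}?{urlencode(params)}" if params else url


def metadata_url(base_url: str, digest: str) -> str:
    """Build a refget v2 style metadata URL."""
    return f"{base_url.rstrip('/')}/sequence/{digest}/metadata"


print(sequence_url("https://www.ebi.ac.uk/ena/cram", "abc", start=0, end=10))
# https://www.ebi.ac.uk/ena/cram/sequence/abc?start=0&end=10
```

Checking against None rather than truthiness keeps start=0 in the query string.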
SequenceCollectionClient
SequenceCollectionClient(urls=['https://seqcolapi.databio.org'], raise_errors=None)
Bases: RefgetClient
A client for interacting with a refget sequence collections API.
Initializes the sequence collection client.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
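urls | list | A list of base URLs of the sequence collection API. Defaults to ["https://seqcolapi.databio.org"]. | ['https://seqcolapi.databio.org'] |
raise_errors | bool | Whether to raise errors or log them. Defaults to None, in which case the client guesses. | None |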
|
Attributes:
Name | Type | Description |
---|---|---|
urls | list | The list of base URLs of the sequence collection API. |
Source code in refget/clients.py
compare
compare(digest1, digest2)
Compares two sequence collections.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
digest1 | str | The digest of the first sequence collection. | required |
digest2 | str | The digest of the second sequence collection. | required |
Returns:
Type | Description |
---|---|
dict | The JSON response containing the comparison of the two sequence collections. |
Source code in refget/clients.py
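compare returns the server's JSON comparison of two collections. To illustrate the kind of information such a comparison carries, here is a local toy version: it reports which attributes appear in one or both collections and, for each shared attribute, element counts and whether the shared elements appear in the same order. The output keys are loosely modeled on the GA4GH seqcol comparison function but are an assumption; compare_seqcols is hypothetical, does not call the API, and treats array elements as sets, so duplicate values are counted once.

```python
def compare_seqcols(a: dict, b: dict) -> dict:
    """Toy comparison of two level-2 seqcol dicts (hypothetical output layout)."""
    keys_a, keys_b = set(a), set(b)
    shared = sorted(keys_a & keys_b)
    elements = {}
    for attr in shared:
        common = set(a[attr]) & set(b[attr])
        # Shared elements, in the order each collection lists them
        order_a = [x for x in a[attr] if x in common]
        order_b = [x for x in b[attr] if x in common]
        elements[attr] = {
            "a_count": len(a[attr]),
            "b_count": len(b[attr]),
            "a_and_b_count": len(common),
            "a_and_b_same_order": order_a == order_b,
        }
    return {
        "attributes": {
            "a_only": sorted(keys_a - keys_b),
            "b_only": sorted(keys_b - keys_a),
            "a_and_b": shared,
        },
        "array_elements": elements,
    }
```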
get_attribute
get_attribute(attribute, digest, level=2)
Retrieves a specific attribute for a given digest and detail level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attribute | str | The attribute to retrieve. | required |
digest | str | The digest of the attribute. | required |
level | int | The level of detail. Defaults to 2. | 2 |
Returns:
Type | Description |
---|---|
dict | The JSON response containing the attribute. |
Source code in refget/clients.py
get_collection
get_collection(digest, level=2)
Retrieves a sequence collection for a given digest and detail level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
digest | str | The digest of the sequence collection. | required |
level | int | The level of detail for the sequence collection. Defaults to 2. | 2 |
Returns:
Type | Description |
---|---|
dict | The JSON response containing the sequence collection. |
Source code in refget/clients.py
list_attributes
list_attributes(attribute, page=None, page_size=None)
Lists all available values for a given attribute with optional paging support.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attribute | str | The attribute to list values for. | required |
page | int | The page number to retrieve. Defaults to None. | None |
page_size | int | The number of items per page. Defaults to None. | None |
Returns:
Type | Description |
---|---|
dict | The JSON response containing the list of available values for the attribute. |
Source code in refget/clients.py
list_collections
list_collections(page=None, page_size=None, attribute=None, attribute_digest=None)
Lists all available sequence collections with optional paging and attribute filtering support.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
page | int | The page number to retrieve. Defaults to None. | None |
page_size | int | The number of items per page. Defaults to None. | None |
attribute | str | The attribute to filter by. Defaults to None. | None |
attribute_digest | str | The attribute digest to filter by. Defaults to None. | None |
Returns:
Type | Description |
---|---|
dict | The JSON response containing the list of available sequence collections. |
Source code in refget/clients.py
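list_collections and list_attributes return one page at a time. A minimal sketch of walking every page, assuming 0-based page numbers and a response dict whose items sit under a "results" key; both are assumptions, so check the server's actual response shape. iter_all is a hypothetical helper, and fetch_page is any callable taking a page number and page size, for example a lambda wrapping list_collections(page=..., page_size=...).

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], dict], page_size: int = 100) -> Iterator:
    """Yield every item across pages, stopping at the first short or empty page."""
    page = 0  # assumed 0-based paging
    while True:
        response = fetch_page(page, page_size)
        items = response.get("results", [])  # assumed response key
        yield from items
        if len(items) < page_size:
            break
        page += 1
```

With a real client this might read `iter_all(lambda p, s: client.list_collections(page=p, page_size=s))`. Stopping on a short page avoids one extra request in most cases, at the cost of one empty request when the total is an exact multiple of page_size.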
service_info
service_info()
Retrieves information about the service.
Returns:
Type | Description |
---|---|
dict | The service information. |
Source code in refget/clients.py
RefgetDBAgent
RefgetDBAgent(engine=None, postgres_str=None, schema=f'{SCHEMA_FILEPATH}/seqcol.json', inherent_attrs=['names', 'lengths', 'sequences'])
Bases: object
Primary aggregator agent; the interface to all other agents.
Parameterize it via these environment variables:
- POSTGRES_HOST
- POSTGRES_DB
- POSTGRES_USER
- POSTGRES_PASSWORD
Source code in refget/agents.py
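RefgetDBAgent can be configured through the POSTGRES_* environment variables or given an engine or postgres_str directly. One plausible way to turn those variables into a SQLAlchemy-style connection string is sketched below; whether RefgetDBAgent builds its string the same way, or which driver prefix it uses, is not stated in this reference. postgres_url_from_env is a hypothetical helper; the user and password are URL-escaped so characters like "@" do not break the URL.

```python
import os
from urllib.parse import quote_plus


def postgres_url_from_env(env=None) -> str:
    """Assemble a postgresql:// URL from POSTGRES_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    return f"postgresql://{user}:{password}@{env['POSTGRES_HOST']}/{env['POSTGRES_DB']}"
```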
calc_similarities
calc_similarities(digestA, digestB)
Calculates the Jaccard similarity between two sequence collections.
This method retrieves two sequence collections by their digests and then computes Jaccard similarities for all attributes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
digestA | str | The digest (identifier) for the first sequence collection. | required |
digestB | str | The digest (identifier) for the second sequence collection. | required |
Returns:
Type | Description |
---|---|
dict | The Jaccard similarity scores between the two sequence collections for all present and shared attributes. |
Source code in refget/agents.py
calc_similarities_seqcol_dicts
calc_similarities_seqcol_dicts(seqcolA, seqcolB)
Calculates the Jaccard similarity between two sequence collections.
This method computes Jaccard similarities between two sequence collections that are both supplied as dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seqcolA | dict | The first sequence collection in dict format. | required |
seqcolB | dict | The second sequence collection in dict format. | required |
Returns:
Type | Description |
---|---|
dict | The Jaccard similarity scores between the two sequence collections for all present and shared attributes. |
Source code in refget/agents.py
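Both similarity methods score each attribute with the Jaccard index: the size of the intersection divided by the size of the union. A minimal sketch for two level-2 seqcol dicts, assuming arrays are compared as sets of distinct elements and only attributes present in both dicts are scored; the library's handling of duplicates and empty arrays may differ. jaccard and seqcol_jaccard are hypothetical names.

```python
def jaccard(a, b) -> float:
    """Intersection size over union size of distinct elements; 0.0 if both are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def seqcol_jaccard(seqcol_a: dict, seqcol_b: dict) -> dict:
    """Jaccard score for every attribute present in both collections."""
    return {attr: jaccard(seqcol_a[attr], seqcol_b[attr]) for attr in seqcol_a.keys() & seqcol_b.keys()}
```

For names ["chr1", "chr2"] versus ["chr1", "chr3"], one of three distinct names is shared, giving 1/3.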
truncate
truncate()
Delete all records from the database
Source code in refget/agents.py
SequenceCollectionAgent
SequenceCollectionAgent(engine, inherent_attrs=None)
Bases: object
Agent for interacting with a database of sequence collections
Source code in refget/agents.py
add
add(seqcol, update=False)
Add a sequence collection to the database, or update it if it already exists and update is True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seqcol | SequenceCollection | The sequence collection to add | required |
update | bool | If True, update an existing collection if it exists | False |
|
Returns:
Type | Description |
---|---|
SequenceCollection
|
The added or updated sequence collection |
Source code in refget/agents.py
add_from_dict
add_from_dict(seqcol_dict, update=False)
Add a sequence collection from a seqcol dictionary
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seqcol_dict | dict | The sequence collection in dictionary form | required |
update | bool | If True, update an existing collection if it exists | False |
Returns:
Type | Description |
---|---|
SequenceCollection | The added or updated sequence collection |
Source code in refget/agents.py
add_from_fasta_file
add_from_fasta_file(fasta_file_path, update=False)
Given a path to a FASTA file, load the sequences into the refget database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fasta_file_path | str | Path to the FASTA file | required |
update | bool | If True, update an existing collection if it exists | False |
Returns:
Type | Description |
---|---|
SequenceCollection | The added or updated sequence collection |
Source code in refget/agents.py
add_from_fasta_file_with_name
add_from_fasta_file_with_name(fasta_file_path, human_readable_name, update=False)
Given a path to a FASTA file and a human-readable name, load the sequences into the refget database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fasta_file_path | str | Path to the FASTA file | required |
human_readable_name | str | A human-readable name for the sequence collection | required |
update | bool | If True, update an existing collection if it exists | False |
Returns:
Type | Description |
---|---|
SequenceCollection | The added or updated sequence collection |
Source code in refget/agents.py
add_from_fasta_pep
add_from_fasta_pep(pep, fa_root, update=False)
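Given a PEP file and a root directory containing the FASTA files, load the FASTA files into the refget database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pep | str | Path to the PEP file | required |
fa_root | str | Root directory containing the FASTA files | required |
update | bool | If True, update an existing collection if it exists | False |
Returns:
Type | Description |
---|---|
dict | A dictionary of the digests of the added sequence collections |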
Source code in refget/agents.py
get
get(digest, return_format='level2', attribute=None, itemwise_limit=None)
Get a sequence collection by digest
Parameters:
Name | Type | Description | Default |
---|---|---|---|
digest | str | The digest of the sequence collection | required |
return_format | str | The format in which to return the sequence collection | 'level2' |
attribute | str | Name of a single attribute to return, if only that attribute is wanted | None |
itemwise_limit | int | Limit the number of items returned in itemwise format | None |
Returns:
Type | Description |
---|---|
SequenceCollection | The sequence collection (in the requested format) |
Source code in refget/agents.py
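get can return a collection in several formats, and itemwise_limit caps the number of items in the itemwise format. Assuming "itemwise" means one record per sequence rather than one array per attribute (an interpretation not confirmed by this reference), the transposition looks roughly like this; to_itemwise is a hypothetical helper, not the agent's code.

```python
def to_itemwise(seqcol: dict, limit=None) -> list:
    """Transpose a level-2 seqcol (attribute -> array) into one dict per item."""
    attrs = list(seqcol)
    # Use the shortest array so mismatched lengths cannot raise IndexError
    n = min(len(seqcol[attr]) for attr in attrs) if attrs else 0
    if limit is not None:
        n = min(n, limit)
    return [{attr: seqcol[attr][i] for attr in attrs} for i in range(n)]


print(to_itemwise({"names": ["chr1", "chr2"], "lengths": [5, 6]}, limit=1))
# [{'names': 'chr1', 'lengths': 5}]
```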
SequenceAgent
SequenceAgent(engine)
Bases: object
Agent for interacting with a database of sequences
Source code in refget/agents.py