In [1]:
from platform import python_version
python_version()
Out[1]:
'3.12.3'
In [2]:
import refget
refget.__version__
Out[2]:
'0.8.0'
Computing digests locally
In [3]:
from refget import sha512t24u_digest, digest_fasta
Compute the digest of a short sequence:
In [4]:
sha512t24u_digest('GGAA')
Out[4]:
'YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj'
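To make the digest less of a black box, here is a minimal standard-library sketch of the GA4GH sha512t24u scheme: hash the sequence with SHA-512, keep the first 24 bytes, and encode them with URL-safe base64. The function name sha512t24u_sketch is hypothetical and not part of refget, and the sketch assumes the input is plain ASCII.

```python
import base64
import hashlib


def sha512t24u_sketch(sequence: str) -> str:
    """Hypothetical helper: GA4GH-style truncated digest of a sequence string."""
    # Hash the raw bytes of the sequence with SHA-512.
    full_digest = hashlib.sha512(sequence.encode("ascii")).digest()
    # Keep the first 24 bytes and encode them with the URL-safe base64 alphabet.
    # 24 bytes encode to exactly 32 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(full_digest[:24]).decode("ascii")


print(sha512t24u_sketch("GGAA"))
```

If the library follows the same scheme, the printed value should agree with the sha512t24u_digest result shown above.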
You can also use the digest_fasta function to compute digests for every sequence in a FASTA file. It uses a Rust implementation of the digest functions under the hood, so it is fast.
In [5]:
for x in digest_fasta('../../../test_fasta/base.fa'):
    print(f"{x.id}\t{x.length}\t{x.sha512t24u}\t{x.md5}")
chrX	8	iYtREV555dUFKg2_agSJW6suquUyPpMw	5f63cfaa3ef61f88c9635fb9d18ec945
chr1	4	YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj	31fc6ca291a32fb9df82b85e5f077e31
chr2	4	AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6	92c6a56c9e9459d8a42b96f7884710bc
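For readers who want to see what each output row contains, the following is a rough pure-Python stand-in that produces the same four fields (id, length, sha512t24u, md5) per FASTA record. It is a sketch, not the library's Rust implementation: digest_fasta_text and SeqDigest are hypothetical names, the FASTA text is an illustrative example rather than the contents of base.fa, and uppercasing residues before hashing is an assumption.

```python
import base64
import hashlib
from typing import Iterator, NamedTuple


class SeqDigest(NamedTuple):
    id: str
    length: int
    sha512t24u: str
    md5: str


def digest_fasta_text(text: str) -> Iterator[SeqDigest]:
    """Hypothetical stand-in: yield one digest record per FASTA entry."""

    def finish(name, chunks):
        # Assumption: residues are uppercased before hashing.
        seq = "".join(chunks).upper().encode("ascii")
        t24 = base64.urlsafe_b64encode(hashlib.sha512(seq).digest()[:24])
        return SeqDigest(name, len(seq), t24.decode("ascii"), hashlib.md5(seq).hexdigest())

    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield finish(name, chunks)
            # The record id is the first whitespace-delimited token of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            # Sequence lines may be wrapped, so collect them until the next header.
            chunks.append(line)
    if name is not None:
        yield finish(name, chunks)


# Illustrative two-record FASTA, made up for this example.
example = ">seqA\nACGT\n>seqB some description\nGG\nAA\n"
for rec in digest_fasta_text(example):
    print(f"{rec.id}\t{rec.length}\t{rec.sha512t24u}\t{rec.md5}")
```

For real files, digest_fasta remains the better choice; the sketch only shows where each column comes from.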
Connecting to a remote API
The refget package provides a simple Python wrapper around a remotely hosted refget sequences API. Provide one or more base URLs when constructing a SequenceClient object, and you can retrieve sequences from the remote server.
In [6]:
seq_client = refget.SequenceClient(urls=["https://beta.ensembl.org/data/refget/"])
In [7]:
seq_client.get_sequence("6681ac2f62509cfc220d78751b8dc524", start=0, end=10)
Out[7]:
'CCACACCACA'
In [8]:
seq_client.get_sequence("6681ac2f62509cfc220d78751b8dc524", start=0, end=50)
Out[8]:
'CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC'
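The start and end arguments follow the refget convention of 0-based coordinates with an inclusive start and an exclusive end, which is why end=10 returns 10 bases and the shorter result is a prefix of the longer one. As a rough illustration of what such a request looks like on the wire, the sketch below builds the URL that the refget specification describes. The helper sequence_url is hypothetical and is not how SequenceClient is necessarily implemented.

```python
from urllib.parse import urlencode


def sequence_url(base_url, digest, start=None, end=None):
    """Hypothetical helper: build a refget sequence URL with an optional range."""
    url = f"{base_url.rstrip('/')}/sequence/{digest}"
    # Only include the query parameters that were actually supplied.
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    return f"{url}?{urlencode(params)}" if params else url


print(sequence_url("https://beta.ensembl.org/data/refget/",
                   "6681ac2f62509cfc220d78751b8dc524", start=0, end=10))

# The half-open range behaves like a Python slice of the full sequence.
first_10 = "CCACACCACA"
first_50 = "CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC"
assert first_50[0:10] == first_10
```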
You can also hit the {digest}/metadata and service_info API endpoints described in the refget API specification:
In [9]:
seq_client.get_metadata("6681ac2f62509cfc220d78751b8dc524")
Out[9]:
{'metadata': {'id': '6681ac2f62509cfc220d78751b8dc524', 'md5': '6681ac2f62509cfc220d78751b8dc524', 'trunc512': '959cb1883fc1ca9ae1394ceb475a356ead1ecceff5824ae7', 'ga4gh': 'SQ.lZyxiD_ByprhOUzrR1o1bq0ezO_1gkrn', 'length': 230218, 'aliases': []}}
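The trunc512 and ga4gh fields in this response are two encodings of the same 24 bytes: trunc512 is the hexadecimal form, and ga4gh is the URL-safe base64 form with an SQ. prefix. The sketch below converts between them using only the standard library. The function names are hypothetical and not part of refget.

```python
import base64


def trunc512_to_ga4gh(trunc512_hex: str) -> str:
    """Hypothetical helper: hex-encoded truncated SHA-512 -> 'SQ.' identifier."""
    raw = bytes.fromhex(trunc512_hex)
    return "SQ." + base64.urlsafe_b64encode(raw).decode("ascii")


def ga4gh_to_trunc512(ga4gh_id: str) -> str:
    """Hypothetical helper: 'SQ.' identifier -> hex-encoded truncated SHA-512."""
    raw = base64.urlsafe_b64decode(ga4gh_id.removeprefix("SQ."))
    return raw.hex()


# Values copied from the metadata response above.
trunc512 = "959cb1883fc1ca9ae1394ceb475a356ead1ecceff5824ae7"
ga4gh = "SQ.lZyxiD_ByprhOUzrR1o1bq0ezO_1gkrn"

print(trunc512_to_ga4gh(trunc512))
print(ga4gh_to_trunc512(ga4gh))
```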
In [10]:
seq_client.service_info()
Out[10]:
{'id': 'refget.infra.ebi.ac.uk', 'name': 'Refget server', 'type': {'group': 'org.ga4gh', 'artifact': 'refget', 'version': '2.0.0'}, 'description': None, 'organization': {'name': 'EMBL-EBI', 'url': 'https://ebi.ac.uk/'}, 'contactUrl': None, 'documentationUrl': None, 'createdAt': None, 'updatedAt': None, 'environment': None, 'version': '1.0.0', 'refget': {'circular_supported': False, 'subsequence_limit': None, 'algorithms': ['md5', 'ga4gh', 'trunc512'], 'identifier_types': None}}
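The service-info response is an ordinary dictionary, so it can be inspected before sending requests, for example to check which digest algorithms a server advertises. The sketch below works on a trimmed copy of the response above. The helper supports_algorithm is hypothetical, and treating a missing or None field as "nothing advertised" is an assumption.

```python
def supports_algorithm(service_info: dict, algorithm: str) -> bool:
    """Hypothetical helper: does the server advertise this digest algorithm?"""
    # Fields may be absent or None, so fall back to empty containers.
    refget_block = service_info.get("refget") or {}
    return algorithm in (refget_block.get("algorithms") or [])


# Trimmed copy of the service-info response shown above.
info = {
    "id": "refget.infra.ebi.ac.uk",
    "type": {"group": "org.ga4gh", "artifact": "refget", "version": "2.0.0"},
    "refget": {
        "circular_supported": False,
        "subsequence_limit": None,
        "algorithms": ["md5", "ga4gh", "trunc512"],
        "identifier_types": None,
    },
}

print(supports_algorithm(info, "md5"))
print(supports_algorithm(info, "sha256"))
```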
When a requested sequence is not found, the client logs an error listing the URLs that failed. Here the server answered the unknown digest with a 500 Internal Server Error:
In [11]:
seq_client.get_sequence('BogusDigest')
ERROR:refget.clients:All URLs failed: Error from https://beta.ensembl.org/data/refget: 500 Server Error: Internal Server Error for url: https://beta.ensembl.org/data/refget/sequence/BogusDigest
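One way to avoid a round trip for obviously malformed identifiers is to check their shape locally first. The sketch below accepts the three forms seen in the metadata response earlier: a 32-character hex MD5, a 48-character hex trunc512, and an SQ.-prefixed ga4gh identifier. The helper looks_like_refget_id is hypothetical, and a given server may accept other forms (for example namespaced identifiers), so treat this as a pre-filter under that assumption rather than a definitive validator.

```python
import re

# Shapes taken from the identifiers shown in the metadata response.
_MD5 = re.compile(r"[0-9a-fA-F]{32}")
_TRUNC512 = re.compile(r"[0-9a-fA-F]{48}")
_GA4GH = re.compile(r"SQ\.[A-Za-z0-9_-]{32}")


def looks_like_refget_id(identifier: str) -> bool:
    """Hypothetical helper: True if the string has a known digest shape."""
    return any(p.fullmatch(identifier) for p in (_MD5, _TRUNC512, _GA4GH))


for candidate in ("BogusDigest",
                  "6681ac2f62509cfc220d78751b8dc524",
                  "SQ.lZyxiD_ByprhOUzrR1o1bq0ezO_1gkrn"):
    print(candidate, looks_like_refget_id(candidate))
```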