Sequences Client tutorial
Introduction
This tutorial shows how to use the refget.SequenceClient class to interact with a remote refget sequences API.
Learning objectives
- How do I connect to a remote refget sequences API?
- How do I retrieve sequences by their digest?
- How do I get sequence metadata and service information?
- How does error handling work when sequences are not found?
First, record some versions used in this tutorial:
from platform import python_version
python_version()
'3.12.3'
import refget
refget.__version__
'0.10.0'
Connecting to a remote API
The refget package provides a simple Python wrapper around a remotely hosted refget sequences API. Pass the base URL (in a list, via the urls argument) when constructing a SequenceClient object, and you can then retrieve sequences from the remote server.
from refget.clients import SequenceClient
seq_client = SequenceClient(urls=["https://beta.ensembl.org/data/refget/"])
seq_client.get_sequence("6681ac2f62509cfc220d78751b8dc524", start=0, end=10)
'CCACACCACA'
seq_client.get_sequence("6681ac2f62509cfc220d78751b8dc524", start=0, end=50)
'CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC'
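The two calls above suggest that start and end behave like a 0-based, half-open interval (start inclusive, end exclusive), which is also how the refget specification defines them. The sketch below illustrates that reading offline with ordinary Python slicing; the `subsequence` helper is a hypothetical name introduced only for this illustration and is not part of the refget package.

```python
def subsequence(sequence: str, start: int, end: int) -> str:
    """Return sequence[start:end], mirroring a 0-based, half-open range.

    Hypothetical helper for illustration only; the real client asks the
    server to do the slicing.
    """
    return sequence[start:end]


# The two responses shown above, copied verbatim from the tutorial output.
first_10 = "CCACACCACA"
first_50 = "CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC"

# Slicing the longer response locally reproduces the shorter one.
local_slice = subsequence(first_50, 0, 10)
print(local_slice == first_10)
```

Under this reading, a request with start=0 and end=10 returns exactly ten bases, and the base at position 10 is not included.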
You can also query the {digest}/metadata and service-info API endpoints described in the refget API specification:
seq_client.get_metadata("6681ac2f62509cfc220d78751b8dc524")
{'metadata': {'id': '6681ac2f62509cfc220d78751b8dc524',
'md5': '6681ac2f62509cfc220d78751b8dc524',
'trunc512': '959cb1883fc1ca9ae1394ceb475a356ead1ecceff5824ae7',
'ga4gh': 'SQ.lZyxiD_ByprhOUzrR1o1bq0ezO_1gkrn',
'length': 230218,
'aliases': []}}
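The trunc512 and ga4gh values in this response encode the same bytes: the first 24 bytes of the SHA-512 digest, written as hex in one case and as URL-safe base64 with an SQ. prefix in the other. The following is a minimal sketch of that relationship using only the standard library. The function names are hypothetical, and the assumption that digests are computed over the uppercase ASCII sequence comes from the refget specification rather than from this tutorial.

```python
import base64
import hashlib


def trunc512_to_ga4gh(trunc512_hex: str) -> str:
    """Convert a hex trunc512 digest into the 'SQ.'-prefixed ga4gh form."""
    raw = bytes.fromhex(trunc512_hex)
    # 24 bytes encode to exactly 32 base64 characters, so no padding appears.
    return "SQ." + base64.urlsafe_b64encode(raw).decode("ascii")


def sequence_digests(sequence: str) -> dict:
    """Compute md5, trunc512 and ga4gh digests for a sequence string.

    Assumption: the digest input is the uppercase ASCII sequence.
    """
    data = sequence.upper().encode("ascii")
    first_24 = hashlib.sha512(data).digest()[:24]
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "trunc512": first_24.hex(),
        "ga4gh": "SQ." + base64.urlsafe_b64encode(first_24).decode("ascii"),
    }


# The trunc512 value from the metadata response above.
converted = trunc512_to_ga4gh("959cb1883fc1ca9ae1394ceb475a356ead1ecceff5824ae7")
print(converted)
```

Converting the trunc512 value from the response should reproduce the ga4gh identifier shown alongside it, which is a quick consistency check on any metadata record.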
seq_client.service_info()
{'id': 'refget.infra.ebi.ac.uk',
'name': 'Refget server',
'type': {'group': 'org.ga4gh', 'artifact': 'refget', 'version': '2.0.0'},
'description': None,
'organization': {'name': 'EMBL-EBI', 'url': 'https://ebi.ac.uk/'},
'contactUrl': None,
'documentationUrl': None,
'createdAt': None,
'updatedAt': None,
'environment': None,
'version': '1.0.1',
'refget': {'circular_supported': False,
'subsequence_limit': None,
'algorithms': ['md5', 'ga4gh', 'trunc512'],
'identifier_types': None}}
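Because service_info() returns a plain dictionary, you can inspect it before deciding which digest type to send. The sketch below works on a trimmed copy of the response shown above; the helper names are hypothetical, and the defensive handling of missing or None fields is an assumption about how other servers might respond.

```python
# Trimmed copy of the service_info() response shown above.
service_info = {
    "id": "refget.infra.ebi.ac.uk",
    "type": {"group": "org.ga4gh", "artifact": "refget", "version": "2.0.0"},
    "refget": {
        "circular_supported": False,
        "subsequence_limit": None,
        "algorithms": ["md5", "ga4gh", "trunc512"],
        "identifier_types": None,
    },
}


def supported_algorithms(info: dict) -> list:
    """Return the digest algorithms a server advertises, or an empty list."""
    refget_section = info.get("refget") or {}
    return list(refget_section.get("algorithms") or [])


def spec_version(info: dict):
    """Return the refget specification version the server reports, if any."""
    return (info.get("type") or {}).get("version")


print(supported_algorithms(service_info))
print(spec_version(service_info))
```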
Error handling
When you request a sequence that doesn't exist, a compliant server should return a 404 error.
Note: Some servers may return a different error code (e.g., 500) for invalid digests. In either case, the client tries every configured URL before giving up.
If all URLs fail, the client raises a ConnectionError:
seq_client.get_sequence('BogusDigest')
All URLs failed: Error from https://beta.ensembl.org/data/refget: 404 Client Error: Not Found for url: https://beta.ensembl.org/data/refget/sequence/BogusDigest
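In a script you will usually want to catch this failure rather than let it propagate. The sketch below shows one way to do that. It assumes the exception is Python's built-in ConnectionError, which this tutorial does not confirm, so check the class the package actually raises. `fetch_or_none` and `FakeClient` are hypothetical names; the fake client stands in for SequenceClient so the example runs without network access.

```python
from typing import Optional


def fetch_or_none(client, digest: str, **kwargs) -> Optional[str]:
    """Return the sequence for digest, or None if every URL failed.

    Assumption: the client raises the built-in ConnectionError.
    """
    try:
        return client.get_sequence(digest, **kwargs)
    except ConnectionError as err:
        print(f"Could not retrieve {digest!r}: {err}")
        return None


class FakeClient:
    """Hypothetical offline stand-in that mimics get_sequence()."""

    def __init__(self, sequences: dict):
        self._sequences = sequences

    def get_sequence(self, digest, start=None, end=None):
        if digest not in self._sequences:
            raise ConnectionError(f"All URLs failed: no sequence for {digest}")
        return self._sequences[digest][start:end]


client = FakeClient({"6681ac2f62509cfc220d78751b8dc524": "CCACACCACA"})
found = fetch_or_none(client, "6681ac2f62509cfc220d78751b8dc524", start=0, end=4)
missing = fetch_or_none(client, "BogusDigest")
```

Returning None keeps the calling code simple, but it also hides whether the digest was unknown or the server was unreachable. Re-raise or log the error if that distinction matters.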
Summary
- SequenceClient(urls) connects to a remote refget sequences API
- client.get_sequence(digest) retrieves sequences, with optional start/end positions
- client.get_metadata(digest) returns sequence length and aliases