
Using the RefGetClient to interact with a Sequence Collections API

Introduction

The refget Python package provides a RefGetClient object, which offers a simple Python API for interacting with a remote refget server. It can interact with either a Refget Sequences API or a Refget Sequence Collections API. Here, we demonstrate how to use it with a Sequence Collections API.

Tutorial

Instantiate a RefGetClient object by giving it the base URL to the API.

from refget import RefGetClient

rgc = RefGetClient()  # This will use default API URLs
rgc = RefGetClient(seqcol_api_urls=["https://seqcolapi.databio.org"], seq_api_urls=None)  # Use the demo seqcolapi instance

Now you can use this object to call any of the API functions. Check what's available:

rgc.list_collections(page_size=5)

Retrieve a collection:

seqcol = rgc.get_collection("fLf5M0BOIPIqcfbE6R8oYwxsy-PnoV32")
seqcol
# {'lengths': [8, 4, 4],
#  'names': ['chrX', 'chr1', 'chr2'],
#  'sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw',
#   'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj',
#   'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6'],
#  'sorted_name_length_pairs': ['IWFt7HQ4XoMk34U27BKO-4szSRifP6H5',
#   'chDD8A4S8YZKNNctCimHasAA2Dn596SZ',
#   'enZNOGccwFbN9yJ3YZVifFTFCVA9hIpH']}
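The returned collection is a plain dictionary whose `names`, `lengths`, and `sequences` arrays appear to be aligned by index. As a minimal sketch, assuming that alignment holds, the arrays can be zipped into one record per sequence. The helper `seqcol_records` is hypothetical and not part of the refget package; the dictionary is copied from the output above so the example runs without a server.

```python
# Copy of the collection shown above, so this sketch runs without a server.
seqcol = {
    "lengths": [8, 4, 4],
    "names": ["chrX", "chr1", "chr2"],
    "sequences": [
        "SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw",
        "SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj",
        "SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6",
    ],
}


def seqcol_records(collection):
    """Hypothetical helper: zip the parallel arrays into one dict per sequence."""
    names = collection["names"]
    lengths = collection["lengths"]
    sequences = collection["sequences"]
    # Assumption: the three arrays are index-aligned, so they must match in size.
    if not (len(names) == len(lengths) == len(sequences)):
        raise ValueError("names, lengths, and sequences must have equal length")
    return [
        {"name": name, "length": length, "sequence": digest}
        for name, length, digest in zip(names, lengths, sequences)
    ]


records = seqcol_records(seqcol)
for record in records:
    print(record["name"], record["length"], record["sequence"])
```

With records in this shape, a digest such as `records[0]["sequence"]` can be looked up by sequence name instead of by position.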

Get a list of collections that have a certain digest for an attribute:

l = rgc.list_collections(page=1, page_size=2, attribute="lengths", attribute_digest="cGRMZIb3AVgkcAfNv39RN7hnT5Chk7RX")

List all available values of a specific attribute:

a = rgc.list_attributes("lengths", page_size=3)

Compare two sequence collections:

rgc.compare("fLf5M0BOIPIqcfbE6R8oYwxsy-PnoV32", "MFxJDHkVdTBlPvUFRbYWDZYxmycvHSRp")

Here are some other examples using a different server API:

scclient = RefGetClient(seqcol_api_urls=["http://45.88.81.158:8081/eva/webservices/seqcol"])
seqcol_1 = scclient.get_collection("3mTg0tAA3PS-R1TzelLVWJ2ilUzoWfVq", level=1)
seqcol_2 = scclient.get_collection("3mTg0tAA3PS-R1TzelLVWJ2ilUzoWfVq", level=2)
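The `level` argument presumably follows the Sequence Collections convention, in which level 2 returns the full attribute arrays and level 1 returns one digest per attribute. As a rough illustration of how a level 1 digest can relate to a level 2 array, the sketch below applies the GA4GH sha512t24u digest to a compactly serialized JSON array. The helper names are hypothetical, and the serialization is a simplification of full JSON canonicalization, so treat this as an illustration under those assumptions and not as the package's implementation.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """SHA-512, truncated to 24 bytes, then base64url-encoded (32 characters)."""
    truncated = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def attribute_digest(values) -> str:
    """Hypothetical helper: digest an array via compact JSON serialization."""
    # Assumption: compact separators approximate the canonical JSON form
    # for simple arrays of integers or strings.
    canonical = json.dumps(values, separators=(",", ":"), ensure_ascii=False)
    return sha512t24u(canonical.encode("utf-8"))


lengths_digest = attribute_digest([8, 4, 4])
print(lengths_digest)
```

Because the digest depends on element order, `[8, 4, 4]` and `[4, 8, 4]` produce different values.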

Now that you have the sequence digests, you can also retrieve the actual sequences, provided you gave the client a Sequences API URL (optional):

rgc = RefGetClient(seq_api_urls=["https://www.ebi.ac.uk/ena/cram/sequence/"])
rgc.get_sequence(seqcol['sequences'][0])

Debugging

If you want more detail about what the code is doing, you can set the logging level to DEBUG:

import logging

_LOGGER = logging.getLogger(__name__)
_LOGGER.setLevel("DEBUG")
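Setting a level on a logger does not by itself print anything unless a handler is attached. A minimal sketch of a fuller setup using only the standard library is shown here. It assumes the package emits its messages under a logger named "refget", which is an assumption and not something confirmed above.

```python
import logging

# Send log records from all loggers to stderr, including DEBUG messages.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

# Assumption: the package logs under a logger named "refget".
refget_logger = logging.getLogger("refget")
refget_logger.setLevel(logging.DEBUG)

refget_logger.debug("debug logging is enabled")
```

To quiet the output again, set the level back to `logging.WARNING` on the same logger.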