import refget
from refget import SequenceCollectionClient
Create a client:
seqcol_client = SequenceCollectionClient(urls=["http://127.0.0.1:8100"])
seqcol_client
<SequenceCollectionClient> Service ID: org.databio.seqcolapi Service Name: Sequence collections API URLs: http://127.0.0.1:8100
Now we have a client connected to our server. You can call any of the API functions on this object. For example, check which collections are available:
seqcol_client.list_collections()
{'pagination': {'page': 0, 'page_size': 100, 'total': 6}, 'results': ['XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk', 'QvT5tAQ0B8Vkxd-qFftlzEk2QyfPtgOv', 'Tpdsg75D4GKCGEHtIiDSL9Zx-DSuX5V8', 'UNGAdNDmBbQbHihecPPFxwTydTcdFKxL', 'sv7GIP1K0qcskIKF3iaBmQpaum21vH74', 'aVzHaGFlUDUNF2IEmNdzS_A8lCY0stQH']}
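The list_collections output is paginated. Each call returns one page of results together with a pagination block whose total field counts all matching collections. When a server holds more collections than fit on one page, a small helper can walk through the pages. The sketch below is hypothetical (iter_all_results is not part of refget). It assumes pages are 0-indexed, since the output above reports 'page': 0, and that list_collections accepts page and page_size keyword arguments, as in the filtered query later in this tutorial.

```python
from typing import Any, Callable, Dict, Iterator, List

# A page shaped like the list_collections() output: {"pagination": {...}, "results": [...]}
Page = Dict[str, Any]


def iter_all_results(
    fetch_page: Callable[[int, int], Page], page_size: int = 100
) -> Iterator[Any]:
    """Yield every result across all pages (hypothetical helper).

    fetch_page(page, page_size) must return a dict with a 'pagination'
    block holding 'total' and a 'results' list. Pages are ASSUMED to be
    0-indexed.
    """
    page = 0
    seen = 0
    while True:
        response = fetch_page(page, page_size)
        results: List[Any] = response["results"]
        total = response["pagination"]["total"]
        yield from results
        seen += len(results)
        # Stop once every result is seen or the server returns an empty page.
        if not results or seen >= total:
            break
        page += 1


# Possible usage against the client above (assumes the keyword arguments shown):
# all_digests = list(iter_all_results(
#     lambda page, size: seqcol_client.list_collections(page=page, page_size=size)))
```

The helper takes a callable rather than the client, so it also works for any other paginated endpoint that returns the same response shape.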
Retrieve one of these collections:
seqcol_client.get_collection("XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk")
{'lengths': [8, 4, 4], 'names': ['chrX', 'chr1', 'chr2'], 'sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw', 'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj', 'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6'], 'sorted_sequences': ['SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6', 'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj', 'SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw'], 'name_length_pairs': [{'length': 8, 'name': 'chrX'}, {'length': 4, 'name': 'chr1'}, {'length': 4, 'name': 'chr2'}]}
This gives you the level 2 representation of the sequence collection, which is the canonical, expanded representation. You can also request the more compact level 1 representation, which gives you digests for each of the attributes:
seqcol_client.get_collection("XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk", level=1)
{'lengths': 'cGRMZIb3AVgkcAfNv39RN7hnT5Chk7RX', 'names': 'Fw1r9eRxfOZD98KKrhlYQNEdSRHoVxAG', 'sequences': '0uDQVLuHaOZi1u76LjV__yrVUIz9Bwhr', 'sorted_sequences': 'KgWo6TT1Lqw6vgkXU9sYtCU9xwXoDt6M', 'name_length_pairs': 'B9MESWM8k-hK_OeQK8bZNAG74pLY0Ujq', 'sorted_name_length_pairs': 'wwE4PUok50YyEF2Ne8BBA5__zk92CZH8'}
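Each level 1 value is a digest of the matching level 2 value. The sketch below shows one way to compute such a digest. It assumes the server follows the GA4GH seqcol specification: the value is serialized as canonical JSON (RFC 8785) and then digested with sha512t24u, which keeps the first 24 bytes of SHA-512 and encodes them as base64url. The function names are hypothetical. Python's json.dumps with sorted keys and compact separators only approximates RFC 8785, and that approximation holds for integers, strings, lists and dicts but not for every float. The output has not been checked against the server here. To test the assumption, compare attribute_digest([8, 4, 4]) with the lengths digest above.

```python
import base64
import hashlib
import json
from typing import Any


def canonical_json(value: Any) -> bytes:
    """Approximate RFC 8785 (JCS) canonical JSON for simple values.

    Uses sorted keys and no insignificant whitespace. This matches JCS for
    integers, strings, lists and dicts, but NOT for every float.
    """
    return json.dumps(
        value, separators=(",", ":"), sort_keys=True, ensure_ascii=False
    ).encode("utf-8")


def sha512t24u(data: bytes) -> str:
    """GA4GH truncated digest: first 24 bytes of SHA-512, base64url-encoded."""
    digest = hashlib.sha512(data).digest()[:24]
    # 24 bytes encode to exactly 32 base64 characters, so there is no padding.
    return base64.urlsafe_b64encode(digest).decode("ascii")


def attribute_digest(value: Any) -> str:
    """Level 1 digest for one level 2 attribute value (sketch, spec ASSUMED)."""
    return sha512t24u(canonical_json(value))
```

The digest is order-sensitive. [8, 4, 4] and [4, 4, 8] therefore hash differently, which explains why the level 1 output above lists separate digests for sequences and sorted_sequences.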
These attribute digests are useful because they work just like a top-level collection digest: pass one to the get_attribute function to look up the value of that specific attribute.
For example, here we use the lengths digest to retrieve only the value of that attribute. It matches the expanded version retrieved above:
seqcol_client.get_attribute("lengths", "cGRMZIb3AVgkcAfNv39RN7hnT5Chk7RX")
[8, 4, 4]
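The same round trip can be done for every attribute in a collection, which gives a quick consistency check between the level 1 and level 2 views. The sketch below (verify_attributes is a hypothetical name) uses only the calls shown in this tutorial: get_collection(digest), get_collection(digest, level=1) and get_attribute(name, digest). Attributes that appear at level 1 but not at level 2, such as sorted_name_length_pairs in the output above, are skipped.

```python
from typing import Any, Dict, List


def verify_attributes(client: Any, collection_digest: str) -> List[str]:
    """Return names of attributes whose get_attribute() value disagrees
    with the level 2 collection (hypothetical helper)."""
    level2: Dict[str, Any] = client.get_collection(collection_digest)
    level1: Dict[str, str] = client.get_collection(collection_digest, level=1)
    mismatches = []
    for name, attr_digest in level1.items():
        # Level 1 can list attributes the level 2 response omits; skip them.
        if name not in level2:
            continue
        if client.get_attribute(name, attr_digest) != level2[name]:
            mismatches.append(name)
    return mismatches


# Possible usage with the client above; an empty list means everything matched:
# verify_attributes(seqcol_client, "XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk")
```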
We can also discover the available values of an attribute with the list_attributes function, which lists the digests of all stored values of a given attribute:
seqcol_client.list_attributes("lengths", page_size=3)
{'pagination': {'page': 0, 'page_size': 3, 'total': 3}, 'results': ['cGRMZIb3AVgkcAfNv39RN7hnT5Chk7RX', 'x5qpE4FtMkvlwpKIzvHs3a02Nex5tthp', '7-_HdxYiRf-AJLBKOTaJUdxXrUkIXs6T']}
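Because list_attributes returns digests rather than values, you can resolve each one with get_attribute to see the actual values. The sketch below (resolve_attribute_values is a hypothetical name) uses only the list_attributes(attribute, page_size=...) and get_attribute(attribute, digest) calls shown above. It reads just the first page and warns when the pagination total shows that more digests exist.

```python
import warnings
from typing import Any, Dict


def resolve_attribute_values(
    client: Any, attribute: str, page_size: int = 100
) -> Dict[str, Any]:
    """Map each digest on the first page of list_attributes() to its value."""
    response = client.list_attributes(attribute, page_size=page_size)
    results = response["results"]
    total = response["pagination"]["total"]
    if total > len(results):
        # Only one page is fetched; a larger page_size may be needed.
        warnings.warn(f"resolved {len(results)} of {total} {attribute} digests")
    return {digest: client.get_attribute(attribute, digest) for digest in results}


# Possible usage with the client above:
# resolve_attribute_values(seqcol_client, "lengths", page_size=3)
```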
One of the useful applications of attribute digests is that we can use them to discover other sequence collections that have the same values. Here's how to get a list of collections that have a certain digest for an attribute:
seqcol_client.list_collections(page=1,
                               page_size=2,
                               attribute="lengths",
                               attribute_digest="cGRMZIb3AVgkcAfNv39RN7hnT5Chk7RX")
{'pagination': {'page': 4, 'page_size': 2, 'total': 4}, 'results': ['UNGAdNDmBbQbHihecPPFxwTydTcdFKxL', 'aVzHaGFlUDUNF2IEmNdzS_A8lCY0stQH']}
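These two steps combine naturally: look up a collection's level 1 digest for an attribute, then find every other collection that shares it. The sketch below (collections_sharing is a hypothetical name) uses get_collection(digest, level=1) and the list_collections(page=..., page_size=..., attribute=..., attribute_digest=...) call shown above. It reads only the first page, with pages assumed to start at 0, and it drops the query collection itself from the results because that collection always matches its own digest.

```python
from typing import Any, List


def collections_sharing(
    client: Any, collection_digest: str, attribute: str, page_size: int = 100
) -> List[str]:
    """Digests of other collections with the same value for `attribute`."""
    level1 = client.get_collection(collection_digest, level=1)
    response = client.list_collections(
        page=0,  # ASSUMED 0-indexed; only the first page is read
        page_size=page_size,
        attribute=attribute,
        attribute_digest=level1[attribute],
    )
    # The query collection matches itself, so leave it out.
    return [d for d in response["results"] if d != collection_digest]


# Possible usage with the client above:
# collections_sharing(seqcol_client, "XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk", "lengths")
```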
You can also compare two sequence collections:
seqcol_client.compare(
    "UNGAdNDmBbQbHihecPPFxwTydTcdFKxL",
    "aVzHaGFlUDUNF2IEmNdzS_A8lCY0stQH")
{'digests': {'a': 'UNGAdNDmBbQbHihecPPFxwTydTcdFKxL', 'b': 'aVzHaGFlUDUNF2IEmNdzS_A8lCY0stQH'}, 'attributes': {'a_only': [], 'b_only': [], 'a_and_b': ['lengths', 'name_length_pairs', 'names', 'sequences', 'sorted_sequences']}, 'array_elements': {'a': {'lengths': 3, 'name_length_pairs': 3, 'names': 3, 'sequences': 3, 'sorted_sequences': 3}, 'b': {'lengths': 3, 'name_length_pairs': 3, 'names': 3, 'sequences': 3, 'sorted_sequences': 3}, 'a_and_b': {'lengths': 3, 'name_length_pairs': 1, 'names': 3, 'sequences': 3, 'sorted_sequences': 3}, 'a_and_b_same_order': {'lengths': True, 'name_length_pairs': True, 'names': False, 'sequences': True, 'sorted_sequences': True}}}
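The comparison result is detailed but dense. A small function can turn it into one label per shared attribute. The sketch below (summarize_comparison is a hypothetical name) reads only the fields visible in the output above. As a simplification, it treats matching counts in a, b and a_and_b as "the same elements", which assumes elements are matched one-to-one. It also allows a_and_b_same_order to be missing or None for an attribute, which is an assumption and is reported as unknown order.

```python
from typing import Any, Dict


def summarize_comparison(result: Dict[str, Any]) -> Dict[str, str]:
    """Label each attribute shared by both collections in a compare() result."""
    elements = result["array_elements"]
    summary: Dict[str, str] = {}
    for attr in result["attributes"]["a_and_b"]:
        n_a = elements["a"][attr]
        n_b = elements["b"][attr]
        n_shared = elements["a_and_b"][attr]
        same_order = elements["a_and_b_same_order"].get(attr)
        if n_shared == 0:
            summary[attr] = "no shared elements"
        elif n_shared == n_a == n_b:
            if same_order is True:
                summary[attr] = "identical"
            elif same_order is False:
                summary[attr] = "same elements, different order"
            else:
                summary[attr] = "same elements, order unknown"
        else:
            summary[attr] = "partial overlap"
    return summary
```

Applied to the comparison above, this labels lengths, sequences and sorted_sequences as identical, names as the same elements in a different order, and name_length_pairs as a partial overlap.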