The refget Python package
The refget Python package aims to provide a Python interface for both remote and local use of the refget protocol.
The refget protocol
Refget will consist of three standards for identifying and distributing reference genome data:
- Refget sequences: a GA4GH-approved standard for individual sequences
- Refget sequence collections: a standard for collections of sequences, under review
- Refget pangenomes: a future standard for which development is just beginning
Refget Python package utilities:
- For refget sequences:
  - A lightweight Python client for a remote refget sequences server.
  - Local caching of retrieved results, improving performance for applications that require repeated lookups.
  - A fully functioning local implementation of the refget sequences protocol for local analysis backed by either memory, SQLite, or MongoDB.
  - Convenience functions for computing refget sequence digests from Python and handling FASTA files directly.
- For refget sequence collections:
  - A lightweight Python client for a remote refget sequence collections server.
  - A local implementation of the refget sequence collections protocol.
  - Convenience functions for computing refget sequence collection digests from Python.
- For pangenome sequences: implementation is still a work in progress.
Install
pip install refget
Basic use
Retrieve results from a RESTful API
import refget
rgc = refget.RefGetClient("https://refget.herokuapp.com/sequence/")
rgc.refget("6681ac2f62509cfc220d78751b8dc524", start=0, end=10)
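For orientation, a request like the one above is ultimately a plain HTTP GET. The following is a minimal standard-library sketch of how such a request might be assembled. It assumes the server follows the GA4GH refget sequences API, where the digest is appended to the base URL and `start`/`end` are passed as query parameters. `build_refget_url` and `fetch_subsequence` are hypothetical helpers for illustration, not part of the refget package.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_refget_url(base_url, digest, start=None, end=None):
    """Assemble the URL for a (sub)sequence request (hypothetical helper)."""
    url = base_url.rstrip("/") + "/" + digest
    # Only include the range parameters that were actually given.
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_subsequence(base_url, digest, start=None, end=None, timeout=10):
    """Fetch the sequence text over HTTP (requires network access)."""
    url = build_refget_url(base_url, digest, start, end)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii")


# Building the URL needs no network access.
print(build_refget_url(
    "https://refget.herokuapp.com/sequence/",
    "6681ac2f62509cfc220d78751b8dc524",
    start=0,
    end=10,
))
```

In practice the packaged client is the more convenient route; the sketch only shows what travels over the wire.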
Compute digests locally
refget.trunc512_digest("TCGA")
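To make the digest less of a black box, here is a standard-library sketch of a truncated SHA-512 digest. It assumes `trunc512_digest` follows the TRUNC512 scheme described in the GA4GH refget sequences specification (SHA-512 of the sequence, first 24 bytes, hex-encoded). `trunc512_sketch` is a hypothetical name used only for this illustration, and the sketch does no normalization of the input.

```python
import hashlib


def trunc512_sketch(seq, n_bytes=24):
    """Hex-encode the first n_bytes of the SHA-512 hash of seq."""
    full_digest = hashlib.sha512(seq.encode("ascii")).digest()
    return full_digest[:n_bytes].hex()


digest = trunc512_sketch("TCGA")
print(digest)       # a hexadecimal string
print(len(digest))  # 48, i.e. two hex characters per byte
```

If the assumption above holds, the result should match what the package function returns for the same input.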
Insert and retrieve sequences with a local database
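# Reuses the rgc client created above.
# load_seq stores the sequence and returns its checksum.
checksum = rgc.load_seq("GGAA")
# Look the sequence up again by that checksum.
rgc.refget(checksum)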
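The idea behind a locally backed store can be sketched with the standard library alone: sequences are keyed by their digest, so inserting returns the key and retrieval is a lookup. The example below uses an in-memory SQLite database. `LocalSeqStore` is a hypothetical stand-in rather than the package's own implementation, and the truncated SHA-512 key is an assumed digest scheme.

```python
import hashlib
import sqlite3


class LocalSeqStore:
    """Hypothetical digest-keyed sequence store backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seqs (digest TEXT PRIMARY KEY, seq TEXT)"
        )

    def load_seq(self, seq):
        # Key each sequence by a truncated SHA-512 digest (assumed scheme).
        digest = hashlib.sha512(seq.encode("ascii")).digest()[:24].hex()
        self.conn.execute(
            "INSERT OR REPLACE INTO seqs VALUES (?, ?)", (digest, seq)
        )
        self.conn.commit()
        return digest

    def refget(self, digest, start=None, end=None):
        row = self.conn.execute(
            "SELECT seq FROM seqs WHERE digest = ?", (digest,)
        ).fetchone()
        if row is None:
            raise KeyError(digest)
        # Python slicing gives 0-based, end-exclusive coordinates.
        return row[0][start:end]


store = LocalSeqStore()
checksum = store.load_seq("GGAA")
print(store.refget(checksum))        # GGAA
print(store.refget(checksum, 1, 3))  # GA
```

Because the key is derived from the content, loading the same sequence twice yields the same checksum and does not create a duplicate row.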
For more details, see the tutorial.