Skip to content

Python package

The refget Python package

The refget Python package provides a Python implementation of the GA4GH Refget Specifications, which define standards for identifying and distributing reference biological sequences, like reference genomes. It provides standards at 3 levels of data: sequences, sequence collections, and pangenomes (in progress).

The refget Python package includes these capabilities:

Standard Local use
(computing digests locally)
Client
(connecting to a remote API)
API
(implementing an http interface)
Agent
(managing a SQL database)
Sequences
Sequence Collections
Pangenomes X X

Package components

The refget package provides several components for working with GA4GH refget standards:

  • Local digest functions - Python interface to fast Rust-based implementations of GA4GH digests for sequences and sequence collections.

  • RefgetStore - High-performance local storage for sequences and collections. Supports in-memory and on-disk modes, sequence retrieval by digest, FASTA export, and connecting to remote stores.

  • Clients - For interacting with remote Refget APIs: SequenceClient, SequenceCollectionClient, and FastaDrsClient.

  • Agents - For creating refget services with a PostgreSQL database backend. RefgetDBAgent is the primary interface.

  • FastAPI router - Implements the refget API endpoints. Attach to an existing FastAPI service to deploy your own sequence collections API.

  • Compliance tests - Evaluate a remote API instance against the sequence collections standard.

  • CLI - Commands for computing digests (refget fasta), managing local stores (refget store), querying remote servers (refget seqcol), and database administration (refget admin).

Install

pip install refget

Quick start

Compute a sequence collection digest from a FASTA file

refget fasta digest genome.fa

Query a remote seqcol server

# Get a collection by digest
refget seqcol show XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk

# Compare two collections
refget seqcol compare digest1 digest2

# List collections on the server
refget seqcol list

Use the Python client

from refget.clients import SequenceCollectionClient

client = SequenceCollectionClient()
collection = client.get_collection("XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk")
print(collection)

Set up a local RefgetStore

RefgetStore is basically an attempt to:

  • solve efficiency issues with the original refget sequences protocol.
  • provide a way to download the actual data in a sequence collection, which is not provided by the current sequence collection standard.
# Initialize a local store
refget store init

# Import a FASTA file
refget store add genome.fa

# Export sequences
refget store export <digest> --output output.fa