RefgetDBAgent Tutorial
This tutorial shows you how to use RefgetDBAgent to manage a PostgreSQL database of sequence collections for building a seqcol API server.
Learning objectives
- Connect to a PostgreSQL database with RefgetDBAgent
- List and retrieve sequence collections
- Compare sequence collections
- Use specialized sub-agents for different operations
Prerequisites
This tutorial requires a running PostgreSQL database. Before running the examples, set the following environment variables:
```shell
export POSTGRES_HOST=localhost
export POSTGRES_DB=refget
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=yourpassword
```
Or use the demo setup:
```shell
cd repos/refget
bash deployment/demo_up.sh  # Starts postgres + loads demo data
```
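Before you construct the agent, you can check that none of the variables above is missing. This is a minimal sketch that uses only the standard library. `missing_postgres_vars` is a hypothetical helper, not part of refget. The check also assumes that all four variables from the export commands are required, which is stricter than the agent itself may be.

```python
import os

# The four variables exported in the prerequisites (assumed to all be required).
REQUIRED_VARS = ("POSTGRES_HOST", "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD")


def missing_postgres_vars(environ=None):
    """Return the required POSTGRES_* variable names that are unset or empty.

    Hypothetical helper for this tutorial; it is not part of the refget package.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_postgres_vars()
if missing:
    print(f"Set these before creating RefgetDBAgent: {', '.join(missing)}")
else:
    print("All POSTGRES_* variables are set.")
```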
Note
For most users, the CLI (refget admin) or RefgetStore (local file-based storage) are simpler alternatives. Use RefgetDBAgent when you need direct database access for building a seqcol API server.
```python
# Initialize the database agent
# Requires POSTGRES_* environment variables to be set
from refget.agents import RefgetDBAgent

agent = RefgetDBAgent()
print(f"Connected to database via: {agent.engine}")
```
```text
Connected to database via: Engine(postgresql://seqcol:***@localhost/seqcol)
```
Querying collections
With the demo setup, one sequence collection is pre-loaded. List the available collections:
```python
# List available collections
collections = agent.seqcol.list_by_offset()
print(f"Available collections: {collections}")
```
```text
Available collections: {'pagination': {'page': 0, 'page_size': 50, 'total': 1}, 'results': [SequenceCollection(digest='XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk', names_digest='Fw1r9eRxfOZD98KKrhlYQNEdSRHoVxAG', sorted_name_length_pairs_digest='zjM1Ie9m0zFbqsAnZ6jAJSXuFpKTr40J', sequences_digest='0uDQVLuHaOZi1u76LjV__yrVUIz9Bwhr', sorted_sequences_digest='KgWo6TT1Lqw6vgkXU9sYtCU9xwXoDt6M', lengths_digest='cGRMZIb3AVgkcAfNv39RN7hnT5Chk7RX', name_length_pairs_digest='B9MESWM8k-hK_OeQK8bZNAG74pLY0Ujq')]}
```
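The listing returns a dict with a `pagination` block and a `results` list. This sketch summarizes such a response and reports whether more pages remain. It relies on two assumptions drawn from the output above: `page` is zero-based, and page `p` starts at item `p * page_size`. `describe_page` is a hypothetical helper that works on a plain dict and does not call refget.

```python
def describe_page(response):
    """Summarize a list_by_offset()-style response.

    Assumes the dict shape shown above:
    {'pagination': {'page', 'page_size', 'total'}, 'results': [...]}
    """
    info = response["pagination"]
    page, page_size, total = info["page"], info["page_size"], info["total"]
    first = page * page_size          # index of the first item on this page (assumed 0-based)
    shown = len(response["results"])  # number of items actually returned
    has_more = first + shown < total  # True if later pages hold more items
    return {"first": first, "shown": shown, "total": total, "has_more": has_more}


# Same shape as the demo output, with results abbreviated to their digests.
demo = {
    "pagination": {"page": 0, "page_size": 50, "total": 1},
    "results": ["XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk"],
}
print(describe_page(demo))
```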
Retrieving a collection
Retrieve a specific collection by its digest:
```python
# Retrieve a collection by digest
digest = "XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk"
collection = agent.seqcol.get(digest, return_format="level2")
print(f"Collection: {collection}")
```
```text
Collection: {'lengths': [8, 4, 4], 'names': ['chrX', 'chr1', 'chr2'], 'sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw', 'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj', 'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6'], 'sorted_sequences': ['SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6', 'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj', 'SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw'], 'name_length_pairs': [{'length': 8, 'name': 'chrX'}, {'length': 4, 'name': 'chr1'}, {'length': 4, 'name': 'chr2'}]}
```
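In the level-2 output, each attribute is an array, and the arrays are index-aligned: element `i` of `names`, `lengths`, and `sequences` describes the same sequence. The sketch below uses two hypothetical helpers, `to_rows` and `pairs_match`, neither of which is part of refget. `to_rows` pivots those arrays into one record per sequence, and `pairs_match` checks that `name_length_pairs` agrees with `names` and `lengths`. Both operate on a plain dict with the shape shown above.

```python
def to_rows(level2):
    """Pivot a level-2 collection (parallel arrays) into one record per sequence.

    Hypothetical helper: assumes 'names', 'lengths' and 'sequences' are
    index-aligned arrays of equal length, as in the output above.
    """
    names, lengths, seqs = level2["names"], level2["lengths"], level2["sequences"]
    if not (len(names) == len(lengths) == len(seqs)):
        raise ValueError("level-2 arrays have different lengths")
    return [
        {"name": name, "length": length, "sequence": seq}
        for name, length, seq in zip(names, lengths, seqs)
    ]


def pairs_match(level2):
    """Check that name_length_pairs agrees element-by-element with names and lengths."""
    expected = [
        {"name": name, "length": length}
        for name, length in zip(level2["names"], level2["lengths"])
    ]
    return level2["name_length_pairs"] == expected


# Subset of the level-2 output shown above.
collection = {
    "names": ["chrX", "chr1", "chr2"],
    "lengths": [8, 4, 4],
    "sequences": [
        "SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw",
        "SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj",
        "SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6",
    ],
    "name_length_pairs": [
        {"length": 8, "name": "chrX"},
        {"length": 4, "name": "chr1"},
        {"length": 4, "name": "chr2"},
    ],
}

for row in to_rows(collection):
    print(row["name"], row["length"], row["sequence"])
print("name_length_pairs consistent:", pairs_match(collection))
```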
Comparing collections
Compare two sequence collections to see which attributes and array elements they share. The demo database holds a single collection, so this example compares that collection with itself, which yields a perfect match:
```python
# Compare a collection with itself (demonstrates the comparison feature)
digest = "XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk"
comparison = agent.compare_digests(digest, digest)
print(f"Comparison result: {comparison}")
```
```text
Comparison result: {'attributes': {'a_only': [], 'b_only': [], 'a_and_b': ['lengths', 'name_length_pairs', 'names', 'sequences', 'sorted_sequences']}, 'array_elements': {'a_count': {'lengths': 3, 'name_length_pairs': 3, 'names': 3, 'sequences': 3, 'sorted_sequences': 3}, 'b_count': {'lengths': 3, 'name_length_pairs': 3, 'names': 3, 'sequences': 3, 'sorted_sequences': 3}, 'a_and_b_count': {'lengths': 3, 'name_length_pairs': 3, 'names': 3, 'sequences': 3, 'sorted_sequences': 3}, 'a_and_b_same_order': {'lengths': True, 'name_length_pairs': True, 'names': True, 'sequences': True, 'sorted_sequences': True}}}
```
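The comparison result has two parts:
- `attributes` lists the attribute names found only in A, only in B, or in both.
- `array_elements` gives, for each shared attribute, the element counts in A, in B, and in both, plus a flag for whether the shared elements appear in the same order.

This sketch condenses that result into a list of differing attributes using `differing_attributes`, a hypothetical helper that is not part of refget. It is a heuristic summary rather than an element-by-element diff, and it treats any ordering flag other than `True` (including `None`) as a difference.

```python
def differing_attributes(comparison):
    """List attributes that differ between collections A and B.

    Hypothetical helper based on the result shape shown above. An attribute is
    reported if it exists on only one side, if its element counts disagree, or
    if its shared elements are not reported as being in the same order.
    """
    attrs = comparison["attributes"]
    elems = comparison["array_elements"]
    diffs = list(attrs["a_only"]) + list(attrs["b_only"])
    for name in attrs["a_and_b"]:
        counts = {
            elems["a_count"][name],
            elems["b_count"][name],
            elems["a_and_b_count"][name],
        }
        # Differing counts, or shared elements not in the same order.
        if len(counts) != 1 or elems["a_and_b_same_order"][name] is not True:
            diffs.append(name)
    return diffs


# Illustrative input: B has a hypothetical extra attribute, and 'names' is reordered.
example = {
    "attributes": {"a_only": [], "b_only": ["extra"], "a_and_b": ["lengths", "names"]},
    "array_elements": {
        "a_count": {"lengths": 3, "names": 3},
        "b_count": {"lengths": 3, "names": 3},
        "a_and_b_count": {"lengths": 3, "names": 3},
        "a_and_b_same_order": {"lengths": True, "names": False},
    },
}
print(differing_attributes(example))  # ['extra', 'names']
```

Applied to the self-comparison result above, the helper returns an empty list.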
Sub-agents
The RefgetDBAgent provides access to specialized sub-agents:

| Sub-agent | Purpose | Key methods |
|---|---|---|
| `agent.seqcol` | Sequence collection operations | `add_from_fasta_file()`, `get()`, `list_by_offset()` |
| `agent.seq` | Individual sequence operations | `get()`, `add()`, `list()` |
| `agent.fasta_drs` | FASTA file DRS object management | `get()`, `add()`, `add_access_method()` |
| `agent.pangenome` | Pangenome operations | `get()`, `add()`, `add_from_fasta_pep()` |
| `agent.attribute` | Attribute array operations | `get()`, `list()`, `search()` |
Summary
- `RefgetDBAgent()` connects to PostgreSQL using the `POSTGRES_*` environment variables
- `agent.seqcol.list_by_offset()` and `agent.seqcol.get(digest)` query collections
- `agent.compare_digests(a, b)` compares two collections
- Sub-agents provide specialized operations: `seqcol`, `seq`, `fasta_drs`, `pangenome`, `attribute`