Computing digests locally
The refget Python package includes general-purpose functions for computing GA4GH-style digests. These functions can be used to compute digests of sequences or sequence collections.
Start by importing the digest functions, then compute the digest of a single sequence:
In [7]:
from refget import sha512t24u_digest, fasta_to_digest, fasta_to_seqcol, fasta_to_seq_digests
In [2]:
sha512t24u_digest('GGAA')
Out[2]:
'YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj'
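The name sha512t24u describes the GA4GH digest scheme: take the SHA-512 hash, truncate it to the first 24 bytes, and encode those bytes as URL-safe base64. The sketch below reproduces that scheme with the standard library only. The helper name sha512t24u_sketch is hypothetical and is not part of refget; whether the package normalizes its input in any additional way is not shown here.

```python
import base64
import hashlib


def sha512t24u_sketch(data: bytes) -> str:
    """Hypothetical helper: SHA-512, truncated to 24 bytes, base64url-encoded."""
    full_digest = hashlib.sha512(data).digest()  # 64 raw bytes
    truncated = full_digest[:24]                 # keep only the first 24 bytes
    # 24 bytes encode to exactly 32 base64url characters, so there is no padding.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


print(sha512t24u_sketch(b""))      # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
print(sha512t24u_sketch(b"GGAA"))  # a 32-character digest of the sequence bytes
```

Because the truncated hash is always 24 bytes, every digest has the same 32-character length regardless of sequence size.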
You can also compute a top-level (level 0) digest for a FASTA file like this:
In [3]:
fasta_to_digest('../../../test_fasta/base.fa')
Out[3]:
'XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk'
To get the complete level 2 representation of the sequence collection from the FASTA file, use the fasta_to_seqcol function:
In [5]:
fasta_to_seqcol('../../../test_fasta/base.fa')
Out[5]:
{'lengths': [8, 4, 4],
 'names': ['chrX', 'chr1', 'chr2'],
 'sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw',
  'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj',
  'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6'],
 'sorted_name_length_pairs': [b'{"length":4,"name":"chr1"}',
  b'{"length":4,"name":"chr2"}',
  b'{"length":8,"name":"chrX"}'],
 'sorted_sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw',
  'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj',
  'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6']}
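In the output above, the lengths, names, and sequences arrays line up by position, so the entry at a given index in each array describes the same sequence. As a small illustration, the sketch below copies those three arrays from the output and zips them into one row per sequence. The variable names are chosen for this example only.

```python
# Values copied from the output above; only the three position-aligned arrays are used.
seqcol = {
    "lengths": [8, 4, 4],
    "names": ["chrX", "chr1", "chr2"],
    "sequences": [
        "SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw",
        "SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj",
        "SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6",
    ],
}

# Pair up the i-th name, length, and sequence digest into one row per sequence.
rows = list(zip(seqcol["names"], seqcol["lengths"], seqcol["sequences"]))
for name, length, seq_digest in rows:
    print(f"{name}\t{length}\t{seq_digest}")
```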
Or, to use the lower-level function that computes only the individual digest of each sequence in the file, use the fasta_to_seq_digests function:
In [8]:
for x in fasta_to_seq_digests('../../../test_fasta/base.fa'):
    print(f"{x.id}\t{x.length}\t{x.sha512t24u}\t{x.md5}")
chrX	8	iYtREV555dUFKg2_agSJW6suquUyPpMw	5f63cfaa3ef61f88c9635fb9d18ec945
chr1	4	YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj	31fc6ca291a32fb9df82b85e5f077e31
chr2	4	AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6	92c6a56c9e9459d8a42b96f7884710bc
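To make the per-sequence fields concrete, here is a minimal pure-Python sketch that walks a FASTA stream and produces an ID, length, sha512t24u digest, and MD5 digest for each record. It is an illustration only, not the refget implementation: the helper name seq_digests_sketch, the uppercasing of sequences before hashing, and the made-up FASTA text are all assumptions.

```python
import base64
import hashlib
import io


def seq_digests_sketch(handle):
    """Hypothetical helper: yield (id, length, sha512t24u, md5) per FASTA record."""

    def finish(seq_id, chunks):
        # Assumption: sequences are uppercased before hashing.
        seq = "".join(chunks).upper().encode("ascii")
        truncated = hashlib.sha512(seq).digest()[:24]
        sha = base64.urlsafe_b64encode(truncated).decode("ascii")
        return seq_id, len(seq), sha, hashlib.md5(seq).hexdigest()

    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield finish(seq_id, chunks)
            # The ID is the header text up to the first whitespace.
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield finish(seq_id, chunks)


# Made-up two-record FASTA for illustration; this is not the contents of base.fa.
fasta_text = ">seqA demo\nACGT\nacgt\n>seqB\nGGAA\n"
records = list(seq_digests_sketch(io.StringIO(fasta_text)))
for rec in records:
    print("\t".join(str(field) for field in rec))
```

Sequence lines are accumulated per record and hashed once at the end, so a sequence wrapped across several lines yields the same digests as the same sequence on one line.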