Computing digests locally
The refget Python package includes general-purpose functions for computing GA4GH-style digests. These functions can be used to compute digests of sequences or sequence collections.
Start by importing the digest functions, then compute the digest of a single sequence string:
In [3]:
from refget import sha512t24u_digest, fasta_to_digest, fasta_to_seqcol_dict, fasta_to_seq_digests
In [13]:
sha512t24u_digest('GGAA')
Out[13]:
'YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj'
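For readers curious what a digest of this style involves, here is a minimal standard-library sketch. It assumes the GA4GH convention of hashing with SHA-512, keeping the first 24 bytes, and encoding them as URL-safe base64. The function name sha512t24u is local to this sketch and is not the refget implementation, which may handle input types or normalization differently.

```python
import base64
import hashlib


def sha512t24u(text: str) -> str:
    """Sketch of a GA4GH-style truncated digest (assumed algorithm, not refget's code)."""
    full_digest = hashlib.sha512(text.encode("utf-8")).digest()
    # 24 bytes encode to exactly 32 base64url characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(full_digest[:24]).decode("ascii")


# Compare this with the value returned by refget's sha512t24u_digest('GGAA').
print(sha512t24u("GGAA"))
```

Because the truncation length is a multiple of 3 bytes, the result is always a 32-character string drawn from the URL-safe alphabet.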
You can also compute a top-level (level 0) digest for a FASTA file like this:
In [5]:
fasta_to_digest('../../../test_fasta/base.fa')
Out[5]:
'XZlrcEGi6mlopZ2uD8ObHkQB1d0oDwKk'
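A top-level digest of this kind is typically built in layers: each attribute array is serialized to canonical JSON and digested, and the resulting object of attribute digests is serialized and digested again. The sketch below illustrates that layering under several assumptions: the helper names are hypothetical, json.dumps with sorted keys and compact separators stands in for RFC 8785 canonicalization (adequate only for plain ASCII strings and integers), and the set of attributes that feed the top-level digest is passed in explicitly rather than taken from a schema. It is not expected to reproduce the digest above.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    # Assumed GA4GH-style digest: SHA-512, first 24 bytes, base64url.
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(value) -> bytes:
    # Approximation of RFC 8785 for simple values: sorted keys, no whitespace.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def top_level_digest(collection: dict, attributes: tuple) -> str:
    # Step 1: digest each selected attribute array on its own.
    attribute_digests = {
        name: sha512t24u(canonical_json(collection[name])) for name in attributes
    }
    # Step 2: digest the object that maps attribute names to those digests.
    return sha512t24u(canonical_json(attribute_digests))


# Toy collection; the sequence identifiers are placeholders, not real digests.
toy = {
    "names": ["chrA", "chrB"],
    "lengths": [4, 2],
    "sequences": ["SQ.placeholder1", "SQ.placeholder2"],
}
print(top_level_digest(toy, ("names", "sequences")))
```

Passing the attribute list as a parameter keeps the sketch neutral about which attributes a given schema treats as contributing to the digest.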
To get the complete level 2 representation of the sequence collection from the FASTA file, use the fasta_to_seqcol_dict function:
In [7]:
fasta_to_seqcol_dict('../../../test_fasta/base.fa')
Out[7]:
{'lengths': [8, 4, 4],
 'names': ['chrX', 'chr1', 'chr2'],
 'sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw',
               'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj',
               'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6'],
 'sorted_name_length_pairs': [b'{"length":4,"name":"chr1"}',
                              b'{"length":4,"name":"chr2"}',
                              b'{"length":8,"name":"chrX"}'],
 'sorted_sequences': ['SQ.iYtREV555dUFKg2_agSJW6suquUyPpMw',
                      'SQ.YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj',
                      'SQ.AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6']}
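The sorted_name_length_pairs entries in this output can be reproduced from the names and lengths arrays alone. The sketch below is one way to do it, assuming each pair is serialized as compact JSON with sorted keys and the resulting byte strings are then sorted. The function is written for illustration and is not taken from the refget source.

```python
import json


def sorted_name_length_pairs(names, lengths):
    """Build compact JSON for each (name, length) pair, then sort the byte strings."""
    pairs = [
        json.dumps({"length": length, "name": name},
                   sort_keys=True, separators=(",", ":")).encode("utf-8")
        for name, length in zip(names, lengths)
    ]
    return sorted(pairs)


pairs = sorted_name_length_pairs(["chrX", "chr1", "chr2"], [8, 4, 4])
for pair in pairs:
    print(pair)
```

Sorting makes the attribute independent of the order in which sequences appear in the file, so two collections with the same names and lengths in a different order yield the same list.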
To compute digests for each individual sequence in the file, use the lower-level fasta_to_seq_digests function:
In [ ]:
for x in fasta_to_seq_digests('../../../test_fasta/base.fa'):
    print(f"{x.metadata.name}\t{x.metadata.length}\t{x.metadata.sha512t24u}\t{x.metadata.md5}")
chrX    8    iYtREV555dUFKg2_agSJW6suquUyPpMw    5f63cfaa3ef61f88c9635fb9d18ec945
chr1    4    YBbVX0dLKG1ieEDCiMmkrTZFt_Z5Vdaj    31fc6ca291a32fb9df82b85e5f077e31
chr2    4    AcLxtBuKEPk_7PGE_H4dGElwZHCujwH6    92c6a56c9e9459d8a42b96f7884710bc
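A table like this can also be approximated without the package, which is sometimes useful for cross-checking. The following sketch parses a small in-memory FASTA string and reports name, length, truncated SHA-512 digest, and MD5 for each record. The helper names and the toy FASTA content are hypothetical, and uppercasing the sequence before hashing is an assumption about normalization, so results may not match refget in every case.

```python
import base64
import hashlib
import io


def sha512t24u(data: bytes) -> str:
    # Assumed GA4GH-style digest: SHA-512, first 24 bytes, base64url.
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def digest_records(handle):
    """Return (name, length, sha512t24u, md5) tuples for each FASTA record."""
    records = []
    for name, seq in read_fasta(handle):
        data = seq.upper().encode("ascii")  # assumption: digests use uppercase sequence
        records.append((name, len(seq), sha512t24u(data), hashlib.md5(data).hexdigest()))
    return records


# Hypothetical FASTA content, not the base.fa file used above.
toy_fasta = ">seqA example record\nACGT\nAC\n>seqB\nGG\n"
for record in digest_records(io.StringIO(toy_fasta)):
    print(*record, sep="\t")
```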