Tutorial

I assume you've already installed refgenie. In this tutorial I'll show you a few ways to use refgenie from the command line (notebook cells that start with a ! are shell commands) and from Python.
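
If you are following along outside a notebook, the ! shortcut is not available. Here is a minimal sketch of running the same shell commands from plain Python with subprocess. The helper names refgenie_argv and run_refgenie are my own and are not part of refgenie; the sketch assumes the refgenie executable is on your PATH and that the command accepts the -c option used throughout this tutorial.

```python
import subprocess


def refgenie_argv(subcommand, *args, config="refgenie.yaml"):
    """Build the argument list for a refgenie CLI call (hypothetical helper)."""
    return ["refgenie", subcommand, *args, "-c", config]


def run_refgenie(subcommand, *args, config="refgenie.yaml"):
    """Run the command and return its standard output as text.

    Raises subprocess.CalledProcessError if refgenie exits with an error.
    """
    argv = refgenie_argv(subcommand, *args, config=config)
    result = subprocess.run(
        argv, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout


# Show the command line that would be executed, without running it
print(" ".join(refgenie_argv("listr")))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a file name contains spaces.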

To start, initialize an empty refgenie configuration file from the shell and subscribe to the desired asset server:

In [1]:

!refgenie init -c refgenie.yaml -s http://rg.databio.org

Initialized genome configuration file: /Users/mstolarczyk/code/refgenie/docs_jupyter/refgenie.yaml
Created directories:
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/data
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias

Here's what the new configuration file looks like:

In [2]:

!cat refgenie.yaml

config_version: 0.4
genome_folder: /Users/mstolarczyk/code/refgenie/docs_jupyter
genome_servers: 
 - http://rg.databio.org
genomes: null

No genomes are stored locally yet. Use listr to see which assets the subscribed server offers:

In [3]:

!refgenie listr -c refgenie.yaml

                             Remote refgenie assets                             
                       Server URL: http://rg.databio.org                        
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ genome           ┃ assets                                                    ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ rCRSd            │ fasta, bowtie2_index, bwa_index, hisat2_index,            │
│                  │ star_index, bismark_bt2_index                             │
│ hg18_cdna        │ fasta, kallisto_index                                     │
│ hs38d1           │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ hg38_cdna        │ fasta, kallisto_index, salmon_index                       │
│ human_repeats    │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ rn6_cdna         │ fasta, kallisto_index, salmon_index                       │
│ mm10_cdna        │ fasta, kallisto_index, salmon_index                       │
│ hg38_chr22       │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ hg38             │ fasta, gencode_gtf, ensembl_gtf, refgene_anno,            │
│                  │ fasta_txome, ensembl_rb, feat_annotation,                 │
│                  │ suffixerator_index, cellranger_reference, bowtie2_index,  │
│                  │ bwa_index, tallymer_index, hisat2_index, star_index,      │
│                  │ bismark_bt2_index, salmon_partial_sa_index                │
│ hg19_cdna        │ fasta, kallisto_index, salmon_index                       │
│ human_rDNA       │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ human_alu        │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, bismark_bt2_index           │
│ human_alphasat   │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ mouse_chrM2x     │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ t7               │ fasta, bowtie2_index                                      │
│ mm10             │ fasta, gencode_gtf, ensembl_gtf, refgene_anno,            │
│                  │ fasta_txome, ensembl_rb, feat_annotation,                 │
│                  │ suffixerator_index, cellranger_reference, bwa_index,      │
│                  │ bowtie2_index, hisat2_index, tallymer_index, star_index,  │
│                  │ bismark_bt2_index, salmon_partial_sa_index                │
│ dm6              │ fasta, gencode_gtf, ensembl_gtf, refgene_anno,            │
│                  │ bowtie2_index                                             │
│ hg18             │ fasta, gencode_gtf, fasta_txome, suffixerator_index,      │
│                  │ cellranger_reference, bwa_index, bowtie2_index,           │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index                                         │
│ hg19             │ fasta, gencode_gtf, ensembl_gtf, refgene_anno,            │
│                  │ fasta_txome, ensembl_rb, feat_annotation,                 │
│                  │ suffixerator_index, cellranger_reference, bwa_index,      │
│                  │ bowtie2_index, tallymer_index, hisat2_index, star_index,  │
│                  │ salmon_partial_sa_index, bismark_bt2_index                │
│ rn6              │ fasta, ensembl_gtf, refgene_anno, fasta_txome,            │
│                  │ suffixerator_index, bwa_index, bowtie2_index,             │
│                  │ tallymer_index, hisat2_index, star_index,                 │
│                  │ bismark_bt2_index, salmon_partial_sa_index                │
│ hg38_noalt_decoy │ fasta, suffixerator_index, bowtie2_index, bwa_index,      │
│                  │ tallymer_index, hisat2_index, bismark_bt2_index           │
│ mm10_primary     │ fasta, bowtie2_index, bwa_index                           │
│ hg38_primary     │ fasta, bowtie2_index, bwa_index                           │
│ hg38_mm10        │ fasta, bwa_index                                          │
└──────────────────┴───────────────────────────────────────────────────────────┘
             use refgenie listr -g <genome> for more detailed view

Now let's switch to Python and load the configuration file with refgenconf.

In [4]:

import refgenconf
rgc = refgenconf.RefGenConf(filepath="refgenie.yaml")

Use pull to download one of the assets:

In [5]:

rgc.pull("mouse_chrM2x", "fasta", "default")

Out[5]:

(['43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a', 'fasta', 'default'],
 {'asset_path': 'fasta',
  'asset_digest': '8dfe402f7d29d5b036dd8937119e4404',
  'archive_digest': 'bfb7877ee114c61a17a50bd471de47a2',
  'asset_size': '39.4KB',
  'archive_size': '9.1KB',
  'seek_keys': {'fasta': '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a.fa',
   'fai': '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a.fa.fai',
   'chrom_sizes': '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a.chrom.sizes'},
  'asset_parents': [],
  'asset_children': ['43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/suffixerator_index:default',
   '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/bowtie2_index:default',
   '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/bwa_index:default',
   '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/tallymer_index:default',
   '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/hisat2_index:default',
   '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/star_index:default',
   '43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a/bismark_bt2_index:default']},
 'http://rg.databio.org')
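
Judging from the output above, pull returns a three-part tuple: a list with the genome digest, asset name and tag; a dictionary of asset attributes; and the URL of the server the asset came from. That reading is my inference from this one result, not a statement of the documented API. Here is a small sketch that unpacks a copy of the result (abridged to a few keys) so the pieces are easier to see:

```python
# Values copied from the Out[5] cell above, abridged to a few keys
digest = "43f14ba8beed34d52edb244e26f193df6edbb467bd55d37a"
pull_result = (
    [digest, "fasta", "default"],
    {
        "asset_path": "fasta",
        "asset_size": "39.4KB",
        "archive_size": "9.1KB",
        "seek_keys": {
            "fasta": digest + ".fa",
            "fai": digest + ".fa.fai",
            "chrom_sizes": digest + ".chrom.sizes",
        },
    },
    "http://rg.databio.org",
)

# Unpack the three parts; the variable names are my own labels
(genome_digest, asset, tag), attributes, server_url = pull_result

print("pulled {}:{} from {}".format(asset, tag, server_url))
for key, filename in sorted(attributes["seek_keys"].items()):
    print("  seek key {!r} -> {}".format(key, filename))
```

In a real session you would unpack the value returned by rgc.pull directly instead of a hand-copied tuple.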

Once it's downloaded, use seek to retrieve a path to it.

In [6]:

rgc.seek("mouse_chrM2x", "fasta")

Out[6]:

'/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/mouse_chrM2x/fasta/default/mouse_chrM2x.fa'
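
The path returned above follows a readable pattern: the genome folder, then alias, then the genome name, asset name, tag, and finally the file. The sketch below rebuilds that path from its parts purely to illustrate the layout. The alias_path function is hypothetical and is not how refgenconf computes paths, so keep using seek in real code.

```python
from pathlib import PurePosixPath


def alias_path(genome_folder, genome, asset, tag, filename):
    """Compose a path following the layout seen in the seek output (illustration only)."""
    return PurePosixPath(genome_folder) / "alias" / genome / asset / tag / filename


path = alias_path(
    "/Users/mstolarczyk/code/refgenie/docs_jupyter",  # genome_folder from refgenie.yaml
    "mouse_chrM2x",
    "fasta",
    "default",
    "mouse_chrM2x.fa",
)
print(path)
```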

You can get the unique asset identifier (the asset digest) with id():

In [7]:

rgc.id("mouse_chrM2x", "fasta")

Out[7]:

'8dfe402f7d29d5b036dd8937119e4404'

Building and pulling from the command line

Instead of pulling a fasta asset, we can build one. Back in the shell, we'll download the Revised Cambridge Reference Sequence (the human mitochondrial genome, chosen because it's small):

In [8]:

!wget -O rCRSd.fa.gz http://big.databio.org/refgenie_raw/files.rCRSd.fasta.fasta

--2021-03-09 12:22:40--  http://big.databio.org/refgenie_raw/files.rCRSd.fasta.fasta
Resolving big.databio.org (big.databio.org)... 128.143.245.181, 128.143.245.182
Connecting to big.databio.org (big.databio.org)|128.143.245.181|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8399 (8.2K) [application/octet-stream]
Saving to: ‘rCRSd.fa.gz’

rCRSd.fa.gz         100%[===================>]   8.20K  --.-KB/s    in 0.006s  

2021-03-09 12:22:40 (1.35 MB/s) - ‘rCRSd.fa.gz’ saved [8399/8399]

In [9]:

!refgenie build rCRSd/fasta -c refgenie.yaml  --files fasta=rCRSd.fa.gz -R

Using 'default' as the default tag for 'rCRSd/fasta'
Recipe validated successfully against a schema: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/refgenie/schemas/recipe_schema.yaml
Building 'rCRSd/fasta:default' using 'fasta' recipe
Initializing genome: rCRSd
Loaded AnnotatedSequenceDigestList (1 sequences)
Set genome alias (94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4: rCRSd)
Created alias directories: 
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd
Saving outputs to:
- content: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4
- logs: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/_refgenie_build
### Pipeline run code and environment:

*              Command:  `/Library/Frameworks/Python.framework/Versions/3.6/bin/refgenie build rCRSd/fasta -c refgenie.yaml --files fasta=rCRSd.fa.gz -R`
*         Compute host:  MichalsMBP
*          Working dir:  /Users/mstolarczyk/code/refgenie/docs_jupyter
*            Outfolder:  /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/_refgenie_build/
*  Pipeline started at:   (03-09 12:22:41) elapsed: 0.0 _TIME_

### Version log:

*       Python version:  3.6.5
*          Pypiper dir:  `/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pypiper`
*      Pypiper version:  0.12.1
*         Pipeline dir:  `/Library/Frameworks/Python.framework/Versions/3.6/bin`
*     Pipeline version:  None

### Arguments passed to pipeline:

* `asset_registry_paths`:  `['rCRSd/fasta']`
*             `assets`:  `None`
*            `command`:  `build`
*        `config_file`:  `refgenie.yaml`
*             `docker`:  `False`
*              `files`:  `[['fasta=rCRSd.fa.gz']]`
*             `genome`:  `None`
*      `genome_config`:  `refgenie.yaml`
* `genome_description`:  `None`
*             `logdev`:  `False`
*          `new_start`:  `False`
*          `outfolder`:  `/Users/mstolarczyk/code/refgenie/docs_jupyter/data`
*             `params`:  `None`
*             `recipe`:  `None`
*            `recover`:  `True`
*       `requirements`:  `False`
*             `silent`:  `False`
*     `skip_read_lock`:  `False`
*    `tag_description`:  `None`
*          `verbosity`:  `None`
*            `volumes`:  `None`

----------------------------------------

Target to produce: `/Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/_refgenie_build/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4_fasta__default.flag`  

> `cp rCRSd.fa.gz /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.fa.gz` (63575)
<pre>
psutil.ZombieProcess process still exists but it's a zombie (pid=63575)
Warning: couldn't add memory use for process: 63575
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB.  
  PID: 63575;   Command: cp;    Return code: 0; Memory used: 0GB


> `gzip -df /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.fa.gz` (63576)
<pre>
psutil.ZombieProcess process still exists but it's a zombie (pid=63576)
Warning: couldn't add memory use for process: 63576
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB.  
  PID: 63576;   Command: gzip;  Return code: 0; Memory used: 0GB


> `samtools faidx /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.fa` (63577)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.001GB.  
  PID: 63577;   Command: samtools;  Return code: 0; Memory used: 0.001GB


> `cut -f 1,2 /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.fa.fai > /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.chrom.sizes` (63578)
<pre>
psutil.ZombieProcess process still exists but it's a zombie (pid=63578)
Warning: couldn't add memory use for process: 63578
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.001GB.  
  PID: 63578;   Command: cut;   Return code: 0; Memory used: 0GB


> `touch /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/_refgenie_build/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4_fasta__default.flag` (63580)
<pre>
psutil.ZombieProcess process still exists but it's a zombie (pid=63580)
Warning: couldn't add memory use for process: 63580
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.001GB.  
  PID: 63580;   Command: touch; Return code: 0; Memory used: 0GB

Asset digest: 4eb430296bc02ed7e4006624f1d5ac53
Default tag for '94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta' set to: default

### Pipeline completed. Epilogue
*        Elapsed time (this run):  0:00:00
*  Total elapsed time (all runs):  0:00:00
*         Peak memory (this run):  0.0015 GB
*        Pipeline completed time: 2021-03-09 12:22:41
Finished building 'fasta' asset
Created alias directories: 
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/fasta/default
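
The log above shows that the fasta recipe unpacks the input, indexes it with samtools faidx, and cuts the first two columns of the index into a chrom.sizes file. To make it clearer what that file contains, here is a rough pure-Python sketch that computes sequence names and lengths from a gzipped FASTA file. It is an illustration under my own assumptions, not the recipe itself, and chrom_sizes is a hypothetical helper.

```python
import gzip
import os


def chrom_sizes(fasta_gz_path):
    """Return [(sequence_name, length), ...] for a gzipped FASTA file."""
    sizes = []
    name, length = None, 0
    with gzip.open(fasta_gz_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one
                if name is not None:
                    sizes.append((name, length))
                # Keep only the header text up to the first whitespace
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


# Only runs if the file downloaded earlier is in the working directory
if os.path.exists("rCRSd.fa.gz"):
    for seq_name, seq_length in chrom_sizes("rCRSd.fa.gz"):
        print("{}\t{}".format(seq_name, seq_length))
```

Comparing this output with the chrom_sizes file of the built asset is a quick sanity check that the build read the input you expected.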

The asset should now be available for local use. Let's call refgenie list to check:

In [10]:

!refgenie list -c refgenie.yaml --genome rCRSd

                        Local refgenie assets                         
             Server subscriptions: http://rg.databio.org              
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ genome    ┃ asset (seek_keys)                          ┃ tags      ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ rCRSd     │ fasta (fasta, fai, chrom_sizes)            │ default   │
└───────────┴────────────────────────────────────────────┴───────────┘

We can retrieve the path to this asset with:

In [11]:

!refgenie seek rCRSd/fasta -c refgenie.yaml

/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/fasta/default/rCRSd.fa

Naturally, we can do the same thing from within Python:

In [12]:

rgc = refgenconf.RefGenConf("refgenie.yaml")
rgc.seek("rCRSd", "fasta")

Out[12]:

'/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/fasta/default/rCRSd.fa'
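
Notice that the command line takes a single registry path such as rCRSd/fasta, while the Python call takes the genome and asset as separate arguments. If you have registry paths as strings and want to call the Python API, a small helper can split them. This is a sketch of my own that handles only the genome/asset and genome/asset:tag forms seen in this tutorial; split_registry_path is hypothetical and is not a refgenconf function.

```python
def split_registry_path(path, default_tag="default"):
    """Split 'genome/asset' or 'genome/asset:tag' into (genome, asset, tag)."""
    genome, slash, rest = path.partition("/")
    asset, _, tag = rest.partition(":")
    if not slash or not genome or not asset:
        raise ValueError("expected 'genome/asset' or 'genome/asset:tag', got {!r}".format(path))
    # The build log above reports 'default' as the tag used when none is given
    return genome, asset, tag or default_tag


print(split_registry_path("rCRSd/fasta"))
print(split_registry_path("rCRSd/bowtie2_index:default"))
```

With an rgc object loaded as above, the first two elements can be passed straight to seek, for example rgc.seek(*split_registry_path("rCRSd/fasta")[:2]).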

Now, if we have bowtie2-build in our $PATH we can build the bowtie2_index asset with no further requirements.
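
A quick way to confirm that an executable is on the PATH from Python is shutil.which from the standard library. The sketch below is a convenience check of my own, not something refgenie runs for you, and missing_tools is a hypothetical helper name.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(["bowtie2-build"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH")
```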

Let's check the requirements with refgenie build --requirements:

In [13]:

!refgenie build rCRSd/bowtie2_index -c refgenie.yaml --requirements

'bowtie2_index' recipe requirements: 
- assets:
    fasta (fasta asset for genome); default: fasta

Since I already have the fasta asset, I don't need anything else to build the bowtie2_index asset.

In [14]:

!refgenie build rCRSd/bowtie2_index -c refgenie.yaml

Using 'default' as the default tag for 'rCRSd/bowtie2_index'
Recipe validated successfully against a schema: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/refgenie/schemas/recipe_schema.yaml
Building 'rCRSd/bowtie2_index:default' using 'bowtie2_index' recipe
Saving outputs to:
- content: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4
- logs: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/_refgenie_build
### Pipeline run code and environment:

*              Command:  `/Library/Frameworks/Python.framework/Versions/3.6/bin/refgenie build rCRSd/bowtie2_index -c refgenie.yaml`
*         Compute host:  MichalsMBP
*          Working dir:  /Users/mstolarczyk/code/refgenie/docs_jupyter
*            Outfolder:  /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/_refgenie_build/
*  Pipeline started at:   (03-09 12:22:45) elapsed: 0.0 _TIME_

### Version log:

*       Python version:  3.6.5
*          Pypiper dir:  `/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pypiper`
*      Pypiper version:  0.12.1
*         Pipeline dir:  `/Library/Frameworks/Python.framework/Versions/3.6/bin`
*     Pipeline version:  None

### Arguments passed to pipeline:

* `asset_registry_paths`:  `['rCRSd/bowtie2_index']`
*             `assets`:  `None`
*            `command`:  `build`
*        `config_file`:  `refgenie.yaml`
*             `docker`:  `False`
*              `files`:  `None`
*             `genome`:  `None`
*      `genome_config`:  `refgenie.yaml`
* `genome_description`:  `None`
*             `logdev`:  `False`
*          `new_start`:  `False`
*          `outfolder`:  `/Users/mstolarczyk/code/refgenie/docs_jupyter/data`
*             `params`:  `None`
*             `recipe`:  `None`
*            `recover`:  `False`
*       `requirements`:  `False`
*             `silent`:  `False`
*     `skip_read_lock`:  `False`
*    `tag_description`:  `None`
*          `verbosity`:  `None`
*            `volumes`:  `None`

----------------------------------------

Target to produce: `/Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/_refgenie_build/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4_bowtie2_index__default.flag`  

> `bowtie2-build /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.fa /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4` (63609)
<pre>
Settings:
  Output files: "/Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 8284
Using parameters --bmax 6213 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 6213 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 33136 (target: 6212)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 33136 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 33137 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 10248
fchr[G]: 20610
fchr[T]: 24948
fchr[$]: 33136
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4205567 bytes to primary EBWT file: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.1.bt2
Wrote 8292 bytes to secondary EBWT file: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 33136
    bwtLen: 33137
    sz: 8284
    bwtSz: 8285
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 2072
    offsSz: 8288
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 173
    numLines: 173
    ebwtTotLen: 11072
    ebwtTotSz: 11072
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:00:00
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
  Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 8284
Using parameters --bmax 6213 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 6213 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 33136 (target: 6212)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 33136 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 33137 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 10248
fchr[G]: 20610
fchr[T]: 24948
fchr[$]: 33136
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4205567 bytes to primary EBWT file: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.rev.1.bt2
Wrote 8292 bytes to secondary EBWT file: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 33136
    bwtLen: 33137
    sz: 8284
    bwtSz: 8285
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 2072
    offsSz: 8288
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 173
    numLines: 173
    ebwtTotLen: 11072
    ebwtTotSz: 11072
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00
</pre>

Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB.  
  PID: 63609;   Command: bowtie2-build; Return code: 0; Memory used: 0.003GB


> `touch /Users/mstolarczyk/code/refgenie/docs_jupyter/data/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index/default/_refgenie_build/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4_bowtie2_index__default.flag` (63611)
<pre>
psutil.ZombieProcess process still exists but it's a zombie (pid=63611)
Warning: couldn't add memory use for process: 63611
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB.  
  PID: 63611;   Command: touch; Return code: 0; Memory used: 0GB

Asset digest: 1262e30d4a87db9365d501de8559b3b4
Default tag for '94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/bowtie2_index' set to: default

### Pipeline completed. Epilogue
*        Elapsed time (this run):  0:00:01
*  Total elapsed time (all runs):  0:00:00
*         Peak memory (this run):  0.0028 GB
*        Pipeline completed time: 2021-03-09 12:22:46
Finished building 'bowtie2_index' asset
Created alias directories: 
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/bowtie2_index/default

We can see a list of available recipes like this:

In [15]:

!refgenie list -c refgenie.yaml --recipes

bismark_bt1_index, bismark_bt2_index, blacklist, bowtie2_index, bwa_index, cellranger_reference, dbnsfp, dbsnp, ensembl_gtf, ensembl_rb, epilog_index, fasta, fasta_txome, feat_annotation, gencode_gtf, hisat2_index, kallisto_index, refgene_anno, salmon_index, salmon_partial_sa_index, salmon_sa_index, star_index, suffixerator_index, tallymer_index, tgMap

We can get the unique digest for any asset with refgenie id:

In [16]:

!refgenie id rCRSd/fasta -c refgenie.yaml

4eb430296bc02ed7e4006624f1d5ac53

Versions

In [17]:

from platform import python_version 
python_version()

Out[17]:

'3.6.5'

In [18]:

!refgenie --version

refgenie 0.10.0-dev | refgenconf 0.10.0-dev