Refgenie from within Python
Third-party python tools can rely on our Python object for access to refgenie assets. For this we have a Python package called refgenconf
which provides a class with methods to access local and remote genome assets.
Installing
You should already have refgenconf
if you've installed refgenie
, but if needed you can also install it separately with some variant of pip install refgenconf
.
Quick start
Create a RefGenConf
object, which is the package's main data type. You just need to give it a refgenie genome configuration file (in YAML format). You can create a template using refgenie init
.
As a general rule, the CLI functions are available from within Python under the same names, e.g. refgenie list ...
is available as RefGenConf.list()
method.
import refgenconf
rgc = refgenconf.RefGenConf("genome_config.yaml")
Now, you can interact with it:
print(rgc)
Use this to show all available remote assets:
rgc.listr()
In a tool, you're probably most interested in using refgenie to locate reference genome assets, for which you want to use the get_asset
function. For example:
# identify genome (perhaps provided by user)
genome = "hg38"
# get the local path to bowtie2 indexes:
bt2idx = rgc.seek(genome, "bowtie2_index")
# run bowtie2...
This enables you to write python software that will work on any computing environment without having to worry about passing around brittle environment-specific file paths. See this tutorial for more comprehensive example of how to work with refgenconf
as a tool developer.
See the complete refgenconf python API for more details.