Refgenie Recipe and Asset Class System
Overview
Refgenie’s new recipe and asset class system introduces a flexible, extensible, and user-driven way to define and manage reference genome assets. It removes the previous reliance on asset types defined inside refgenie, so users and tool developers can define, share, and distribute their own asset types and recipes.
Key Concepts
Asset Classes
- Asset classes define the structure of an asset type (e.g., a FASTA, a GTF, or an index), including its seek keys: named pointers to the files or directories that make up the asset.
- Asset classes are no longer hardcoded in refgenie. Instead, they are defined in external YAML or JSON files, which can be created, modified, and shared by anyone.
- This means new asset types can be introduced without modifying refgenie’s source code.
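To make the idea concrete, the sketch below treats an asset class as plain data loaded from a definition file and uses it to resolve each seek key to a path. The schema (`name`, `description`, `seek_keys`, and a `{genome}` placeholder in path templates) and the `AssetClass` type are hypothetical stand-ins, not refgenie's actual asset class format; JSON is used only so the example runs with the Python standard library alone.

```python
import json
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical asset class definition. The field names and the "{genome}"
# placeholder are illustrative only, not refgenie's real schema.
FASTA_CLASS_JSON = """
{
  "name": "fasta",
  "description": "Genome sequence with FASTA index and chromosome sizes",
  "seek_keys": {
    "fasta": "{genome}.fa",
    "fai": "{genome}.fa.fai",
    "chrom_sizes": "{genome}.chrom.sizes"
  }
}
"""


@dataclass
class AssetClass:
    name: str
    description: str
    seek_keys: dict  # seek key -> path template, relative to the asset directory

    @classmethod
    def from_json(cls, text: str) -> "AssetClass":
        data = json.loads(text)
        return cls(data["name"], data.get("description", ""), dict(data["seek_keys"]))

    def resolve(self, asset_dir: str, genome: str) -> dict:
        """Map every seek key to a concrete path inside one asset directory."""
        return {
            key: str(PurePosixPath(asset_dir) / template.format(genome=genome))
            for key, template in self.seek_keys.items()
        }


fasta_class = AssetClass.from_json(FASTA_CLASS_JSON)
paths = fasta_class.resolve("/data/refgenie/hg38/fasta", "hg38")
print(paths["fai"])  # /data/refgenie/hg38/fasta/hg38.fa.fai
```

Because the class is only data, supporting a new asset type in this model means writing a new definition file, not changing the code that reads it.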
Recipes
- Recipes describe how to build an asset of a given class from input assets.
- Recipes are also defined externally and can be distributed independently of refgenie itself.
- Recipes specify the required input asset classes, parameters, and the steps to build the output asset.
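A recipe, as described above, bundles required input classes, parameters, and build steps. The following is a minimal sketch of that shape, assuming a hypothetical `Recipe` type whose steps are shell command templates; it is not refgenie's recipe format. It only renders the commands after checking that every input has the required class and does not execute anything.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recipe:
    """Hypothetical recipe model: inputs by class, parameters, command templates."""
    name: str
    output_class: str                              # asset class this recipe builds
    inputs: dict                                   # input name -> required asset class
    params: dict = field(default_factory=dict)     # parameter name -> default value
    commands: list = field(default_factory=list)   # command templates, run in order

    def render(self, provided: dict, overrides: Optional[dict] = None) -> list:
        """Return the build commands with inputs and parameters filled in.

        `provided` maps input name -> (asset class, path). Commands are only
        rendered here, not executed.
        """
        values = dict(self.params)
        values.update(overrides or {})
        for input_name, required_class in self.inputs.items():
            if input_name not in provided:
                raise ValueError(f"missing input '{input_name}'")
            actual_class, path = provided[input_name]
            if actual_class != required_class:
                raise TypeError(
                    f"input '{input_name}' needs class '{required_class}', got '{actual_class}'"
                )
            values[input_name] = path
        return [template.format(**values) for template in self.commands]


bowtie2_recipe = Recipe(
    name="bowtie2_index",
    output_class="bowtie2_index",
    inputs={"fasta": "fasta"},
    params={"threads": 1, "prefix": "hg38"},
    commands=["bowtie2-build --threads {threads} {fasta} {prefix}"],
)

cmds = bowtie2_recipe.render(
    {"fasta": ("fasta", "/data/refgenie/hg38/fasta/hg38.fa")},
    overrides={"threads": 8},
)
print(cmds[0])  # bowtie2-build --threads 8 /data/refgenie/hg38/fasta/hg38.fa hg38
```

Checking inputs by class rather than by a fixed asset name is what lets the same recipe accept any asset of the right class.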
Data Channels
- Both asset classes and recipes can be distributed through data channels—remote or local repositories that refgenie can subscribe to.
- This enables community-driven sharing and rapid adoption of new asset types and build methods.
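Subscribing to several channels raises the question of which definition wins when two channels provide the same name. The sketch below models channels as in-memory collections and resolves names in subscription order. `Channel`, `Subscriptions`, and the "earlier subscription wins" rule are hypothetical choices for illustration; refgenie's actual channel format, precedence rules, and the versioning mentioned under Benefits are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical data channel: a named collection of definitions."""
    name: str
    asset_classes: dict = field(default_factory=dict)  # class name -> definition
    recipes: dict = field(default_factory=dict)        # recipe name -> definition


class Subscriptions:
    """Resolve definitions across subscribed channels; earlier subscriptions win."""

    def __init__(self):
        self._channels = []

    def subscribe(self, channel: Channel) -> None:
        self._channels.append(channel)

    def asset_class(self, name: str):
        """Return (channel name, definition) for the first channel defining `name`."""
        for channel in self._channels:
            if name in channel.asset_classes:
                return channel.name, channel.asset_classes[name]
        raise KeyError(f"asset class '{name}' not found in any subscribed channel")

    def available_recipes(self) -> dict:
        """Recipe name -> providing channel, honoring subscription order."""
        found = {}
        for channel in self._channels:
            for recipe_name in channel.recipes:
                found.setdefault(recipe_name, channel.name)
        return found


local = Channel("local", asset_classes={"fasta": {"seek_keys": ["fasta", "fai"]}})
community = Channel(
    "community",
    asset_classes={
        "fasta": {"seek_keys": ["fasta", "fai", "chrom_sizes"]},
        "salmon_index": {"seek_keys": ["dir"]},
    },
    recipes={"salmon_index": {"inputs": {"fasta": "fasta"}}},
)

subs = Subscriptions()
subs.subscribe(local)
subs.subscribe(community)
print(subs.asset_class("fasta")[0])         # local
print(subs.asset_class("salmon_index")[0])  # community
```

Letting a local channel shadow a community one is one reasonable policy (it allows local overrides), but other precedence schemes are equally plausible.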
Benefits
Decoupling and Extensibility
- No more internal asset type lock-in: Refgenie no longer restricts users to a fixed set of asset types. Anyone can define new asset classes and recipes.
- Community-driven innovation: Tool developers and users can publish and share new asset types and build recipes, fostering a collaborative ecosystem.
- Distribution via data channels: Asset classes and recipes can be versioned and distributed through data channels, making it easy to adopt new standards or methods.
Solving the 1:1 Asset-Recipe Coupling Problem
Previously, recipes were tightly coupled to asset types. For example, a genome could have only one fasta asset (e.g., hg38/fasta), and all recipes requiring a FASTA as input would use this single asset. This design made it impossible to have multiple assets of the same type under a genome namespace (e.g., both hg38/fasta and hg38/fasta_txome).
With the new system:
- Multiple assets of the same class can coexist under a genome (e.g., hg38/fasta, hg38/fasta_txome, etc.).
- Recipes can specify which asset of a given class to use as input, allowing for more complex and flexible workflows.
- This decoupling enables scenarios where, for example, you can build a transcriptome FASTA (fasta_txome) alongside the primary genome FASTA, and use either as input to downstream recipes.
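Once a genome can hold several assets of one class, something has to decide which asset feeds each recipe input. The sketch below uses a hypothetical `bind_inputs` helper over a genome namespace that maps asset names to classes. The rule it applies is an assumption for illustration: an explicit user choice is honored if its class matches, and otherwise an input is bound only when exactly one asset of the required class exists. The `gencode_gtf` asset name is likewise made up.

```python
from typing import Optional

# Hypothetical genome namespace: asset name -> asset class.
hg38_assets = {
    "fasta": "fasta",
    "fasta_txome": "fasta",
    "gencode_gtf": "gtf",
}


def bind_inputs(assets: dict, required: dict, choices: Optional[dict] = None) -> dict:
    """Pick one asset per recipe input.

    `required` maps input name -> asset class; `choices` maps input name ->
    asset name picked by the user. Without a choice, the input is bound only
    if exactly one asset of the required class exists.
    """
    choices = choices or {}
    bound = {}
    for input_name, asset_class in required.items():
        if input_name in choices:
            asset = choices[input_name]
            if assets.get(asset) != asset_class:
                raise ValueError(f"'{asset}' is not an asset of class '{asset_class}'")
        else:
            candidates = sorted(a for a, c in assets.items() if c == asset_class)
            if len(candidates) != 1:
                raise ValueError(
                    f"input '{input_name}' is ambiguous or missing: {candidates}"
                )
            asset = candidates[0]
        bound[input_name] = asset
    return bound


# Build a transcriptome index from fasta_txome instead of the genome FASTA.
print(bind_inputs(hg38_assets, {"fasta": "fasta"}, {"fasta": "fasta_txome"}))
# {'fasta': 'fasta_txome'}
print(bind_inputs(hg38_assets, {"gtf": "gtf"}))
# {'gtf': 'gencode_gtf'}
```

Refusing to guess when two fasta-class assets exist avoids silently building from the wrong sequence, at the cost of requiring an explicit choice in that case.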
Example Workflow
- Define an asset class: Create a YAML file describing the structure of a new asset type (e.g., a custom index).
- Write a recipe: Create a recipe file specifying how to build this asset from input assets.
- Distribute via data channel: Publish the asset class and recipe to a data channel for others to use.
- Build assets: Users can now build multiple assets of the same class under a genome, and recipes can consume any of these as inputs.
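In the last step, one useful check is whether each built asset actually provides every seek key its class declares. The sketch below uses a hypothetical `missing_seek_keys` helper and assumes, for illustration only, that each asset lives in its own directory and that seek keys are paths relative to it; refgenie's real on-disk layout and validation may differ. It simulates two builds of the same class under one genome, one of them incomplete.

```python
import tempfile
from pathlib import Path


def missing_seek_keys(asset_dir: Path, seek_keys: dict) -> list:
    """Return the seek keys whose files or directories are absent from asset_dir."""
    return sorted(key for key, rel in seek_keys.items() if not (asset_dir / rel).exists())


# Hypothetical seek keys that a fasta-class definition might declare.
fasta_seek_keys = {"fasta": "hg38.fa", "fai": "hg38.fa.fai", "chrom_sizes": "hg38.chrom.sizes"}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two assets of the same class under one genome.
    for asset_name in ("fasta", "fasta_txome"):
        (root / asset_name).mkdir()
    for rel in fasta_seek_keys.values():
        (root / "fasta" / rel).touch()          # complete build
    (root / "fasta_txome" / "hg38.fa").touch()  # incomplete build: no index files

    report = {
        name: missing_seek_keys(root / name, fasta_seek_keys)
        for name in ("fasta", "fasta_txome")
    }

print(report)  # {'fasta': [], 'fasta_txome': ['chrom_sizes', 'fai']}
```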
Conclusion
The recipe and asset class system in refgenie makes the platform more modular, extensible, and community-driven. By decoupling asset types from internal definitions and enabling external, user-defined recipes and asset classes, refgenie supports a much broader range of use cases and workflows.