Setting Up Data Channels for Refgenie
Introduction
This guide walks you through creating your own data channel to distribute custom asset classes and recipes for the refgenie ecosystem.
Learning objectives
- How do I set up my own data channel?
- How can I test a local or remote data channel to make sure it's set up correctly?
Overview
A data channel is a collection of asset classes and recipes hosted at a URL that can be consumed by refgenie clients. Setting up your own data channel allows you to:
- Share custom asset classes and recipes with your organization or the community
- Maintain control over your genomic asset definitions
- Create specialized workflows for your specific use cases
- Contribute to the refgenie ecosystem
Prerequisites
- A GitHub account (for GitHub Pages hosting) or web server
- Basic knowledge of YAML syntax
- Python 3.x installed (for validation tools)
Step 1: Repository Structure
The easiest way to start is to clone the Official Recipes Repository, delete its recipes and asset classes, and add your own. Alternatively, create your own repository with the following structure:
my-data-channel/
  asset_classes/           # Directory for asset class definitions
    *.yaml                 # Asset class YAML files
  recipes/                 # Directory for recipe definitions
    *.yaml                 # Recipe YAML files
  index.yaml               # Index file listing all available files
  build_index.py           # Script to generate index.yaml
  data_channel_check.py    # Script to validate channel content
  index.html               # Optional landing page
Step 2: Create Asset Classes
Asset classes define the structure and seek keys for a type of asset. Create YAML files in the asset_classes/ directory:
Example: asset_classes/my_index_asset_class.yaml
name: my_index
version: 0.0.1
description: Custom index format for my tool
seek_keys:
  index_file:
    value: "{genome}.idx"
    description: Main index file
    type: file
  metadata:
    value: "{genome}.meta"
    description: Index metadata
    type: file
parents: []
For more details, see Asset Class specification.
Step 3: Create Recipes
Recipes describe how to build an asset of a given asset class from inputs. Create YAML files in the recipes/ directory:
Example: recipes/my_index_asset_recipe.yaml
name: my_index
version: 0.0.1
output_asset_class: my_index
description: Build custom index from FASTA file
input_files:
  fasta:
    description: Input FASTA file (gzipped or not)
input_params: null
input_assets: null
docker_image: docker.io/myorg/my-indexer:latest
command_templates:
  - my-indexer build {{values.files["fasta"]}} -o {{values.output_folder}}/{{values.genome_digest}}.idx
  - my-indexer meta {{values.output_folder}}/{{values.genome_digest}}.idx > {{values.output_folder}}/{{values.genome_digest}}.meta
custom_properties:
  version: "my-indexer --version | head -1"
default_asset: "my-indexer-{{values.custom_properties.version}}"
For more details, see Recipe specification.
Step 4: Generate the Index File
Next, create the index file. The build_index.py script does this, and it runs automatically via a GitHub Action. You can also run it manually to generate your index:
python build_index.py
The generated index.yaml will look like:
asset_class:
  dir: asset_classes
  files:
    - my_index_asset_class.yaml
recipe:
  dir: recipes
  files:
    - my_index_asset_recipe.yaml
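To make the index format concrete, here is a minimal sketch of how such an index could be generated using only the Python standard library. It is not the actual build_index.py shipped with the official repository; the function names `scan_channel` and `render_index` are hypothetical, and the directory names are taken from the layout in Step 1.

```python
from pathlib import Path

# Assumed mapping from index.yaml section name to directory name (see Step 1).
SECTIONS = {"asset_class": "asset_classes", "recipe": "recipes"}


def scan_channel(root: Path) -> dict:
    """Collect the sorted *.yaml file names found in each channel directory."""
    found = {}
    for section, dirname in SECTIONS.items():
        directory = root / dirname
        names = [p.name for p in directory.glob("*.yaml")] if directory.is_dir() else []
        found[section] = sorted(names)
    return found


def render_index(found: dict) -> str:
    """Render the index as YAML text in the layout shown above."""
    lines = []
    for section, dirname in SECTIONS.items():
        files = found.get(section, [])
        lines.append(f"{section}:")
        lines.append(f"  dir: {dirname}")
        if files:
            lines.append("  files:")
            lines.extend(f"    - {name}" for name in files)
        else:
            lines.append("  files: []")
    return "\n".join(lines) + "\n"
```

Calling `render_index(scan_channel(Path(".")))` from the repository root returns the text for index.yaml. The sketch writes file names unquoted, so it assumes they contain no characters that are special in YAML.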
Step 5: Validate Your Data Channel
Use the included validation script to ensure your channel is properly formatted:
python data_channel_check.py .
Step 6: Host Your Data Channel
Option A: GitHub Pages (Recommended)
- Push your repository to GitHub
- Enable GitHub Pages in repository settings (Settings > Pages)
- Select source branch (usually main or master)
- Your channel will be available at: https://[username].github.io/[repository-name]/
You can test it with python data_channel_check.py https://[username].github.io/[repository-name]/
The demo repository also has GitHub Actions already set up to automatically update index.yaml when files change.
Option B: Static Web Server
Upload your files to any static web server ensuring:
- All YAML files are accessible via HTTP/HTTPS
- CORS headers allow cross-origin requests (if needed)
- The index.yaml file is at the root of your channel URL
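Before uploading anywhere, it can help to preview the channel over HTTP on your own machine. The following is a minimal sketch using Python's built-in http.server module; the names `CORSRequestHandler` and `make_preview_server` are hypothetical, and the permissive `Access-Control-Allow-Origin: *` header is an assumption suitable only for local testing.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve static files and add a permissive CORS header to every response."""

    def end_headers(self):
        # Allow any origin; tighten this for anything beyond local testing.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_preview_server(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Create a server for `directory` on localhost; port 0 picks a free port."""
    handler = partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For example, `make_preview_server(".", 8000).serve_forever()` run from the channel root serves index.yaml at http://127.0.0.1:8000/index.yaml until interrupted.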
Data Channel API Specification
Instead of hosting static files, you can implement a server that provides the Data Channel API. This allows for dynamic generation of asset classes and recipes, database-backed storage, authentication, and other advanced features.
API Overview
A data channel server must provide the following HTTP endpoints:
1. Index Endpoint
URL: /index.yaml (or base URL)
Method: GET
Content-Type: text/yaml or application/x-yaml
Response Format:
asset_class:
  dir: asset_classes
  files:
    - file1.yaml
    - file2.yaml
recipe:
  dir: recipes
  files:
    - recipe1.yaml
    - recipe2.yaml
The index specifies:
- dir: The relative path to the directory containing the files
- files: List of available YAML files
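As an illustration of how the dir and files fields combine into the URLs served by the endpoints below, here is a small sketch. It takes the index as an already-parsed dictionary and assumes that dir is resolved relative to the location of the index; the function name `channel_file_urls` is hypothetical and is not part of refgenie.

```python
from urllib.parse import urljoin


def channel_file_urls(index_url: str, index: dict) -> dict:
    """Map each index section to the absolute URLs of the files it lists."""
    urls = {}
    for section, entry in index.items():
        # A trailing "/" makes urljoin treat the directory as a path segment.
        directory = entry["dir"].strip("/") + "/"
        base = urljoin(index_url, directory)
        urls[section] = [urljoin(base, name) for name in entry.get("files", [])]
    return urls


index = {
    "asset_class": {"dir": "asset_classes", "files": ["file1.yaml", "file2.yaml"]},
    "recipe": {"dir": "recipes", "files": ["recipe1.yaml", "recipe2.yaml"]},
}
urls = channel_file_urls("https://example.org/channel/index.yaml", index)
# urls["recipe"][0] is "https://example.org/channel/recipes/recipe1.yaml"
```

The `index_url` should be either the index.yaml URL or a base URL ending in "/". Without the trailing slash, `urljoin` replaces the last path segment instead of appending to it.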
2. Asset Class Endpoints
URL Pattern: /{asset_class_dir}/{filename}
Method: GET
Content-Type: text/yaml or application/x-yaml
Response: Asset class definition YAML conforming to the Asset Class specification
3. Recipe Endpoints
URL Pattern: /{recipe_dir}/{filename}
Method: GET
Content-Type: text/yaml or application/x-yaml
Response: Recipe definition YAML conforming to the Recipe specification
API Requirements
To be compatible with refgenie clients, every response from your server must be valid, parseable YAML.
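The three endpoints can be sketched with Python's standard library alone. The example below is a minimal illustration, not a reference implementation: the in-memory `DOCUMENTS` table, the `ChannelHandler` class, and the `make_server` helper are all hypothetical names, and a real server would likely generate the YAML from a database or other backend.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory content keyed by request path.
DOCUMENTS = {
    "/index.yaml": (
        "asset_class:\n"
        "  dir: asset_classes\n"
        "  files:\n"
        "    - my_index_asset_class.yaml\n"
        "recipe:\n"
        "  dir: recipes\n"
        "  files: []\n"
    ),
    "/asset_classes/my_index_asset_class.yaml": "name: my_index\nversion: 0.0.1\n",
}


class ChannelHandler(BaseHTTPRequestHandler):
    """Answer GET requests for the index, asset class, and recipe paths."""

    def do_GET(self):
        path = self.path.split("?", 1)[0]
        if path == "/":
            path = "/index.yaml"  # the base URL also returns the index
        body = DOCUMENTS.get(path)
        if body is None:
            self.send_error(404, "Not in this data channel")
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/yaml")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove to see request logs


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the operating system pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), ChannelHandler)
```

Running `make_server(8000).serve_forever()` would expose the index at http://127.0.0.1:8000/index.yaml. Authentication and dynamic generation would be added inside `do_GET`.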
Step 7: Use Your Data Channel
Once hosted, test your channel with refgenie:
# Add your channel
refgenie1 data_channel add my-channel https://[username].github.io/[repository-name]/index.yaml
# Sync asset classes and recipes
refgenie1 data_channel sync my-channel --exists-ok
# List available asset classes
refgenie1 asset_class list
# Build an asset using your recipe
refgenie1 build genome/my_index --files fasta=/path/to/genome.fa
Best Practices
- Version Control: Always version your asset classes and recipes using semantic versioning
- Documentation: Include clear descriptions in all YAML files
- Validation: Run validation scripts before publishing updates
- Testing: Test recipes locally before publishing
- Backwards Compatibility: Avoid breaking changes to existing definitions
- Naming Conventions: Use consistent, descriptive names for assets and recipes
- Docker Images: Use specific version tags for Docker images, not latest
- Security: Never include sensitive information in public channels
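The Docker image guideline is easy to check mechanically. The sketch below is one possible check, not part of refgenie or its validation scripts; the function name `is_pinned` is hypothetical, and it treats an image as pinned when it carries a digest or any tag other than latest.

```python
def is_pinned(image: str) -> bool:
    """Return True if the image has a digest or a tag other than 'latest'."""
    if "@" in image:
        return True  # digest reference such as name@sha256:...
    # Look only at the last path component so a registry port is not
    # mistaken for a tag (for example localhost:5000/my-indexer).
    last = image.rsplit("/", 1)[-1]
    if ":" not in last:
        return False  # no tag: Docker falls back to 'latest'
    tag = last.rsplit(":", 1)[1]
    return tag != "latest"


print(is_pinned("docker.io/myorg/my-indexer:latest"))  # False
print(is_pinned("docker.io/myorg/my-indexer:1.2.0"))   # True
```

A check like this could run over the docker_image field of each recipe before publishing.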
Troubleshooting
Common Issues
Index file not updating:
- Ensure build_index.py is run after adding/modifying files
- Check GitHub Actions logs if using CI/CD
404 errors when accessing channel:
- Verify GitHub Pages is enabled and deployed
- Check the exact URL structure matches your repository name
- Ensure all files are committed and pushed
YAML validation errors:
- Use a YAML linter to check syntax
- Ensure all required fields are present
- Check for proper indentation (spaces, not tabs)
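Tab characters in indentation are a common cause of YAML parse errors and are hard to spot by eye. As a rough aid, the following sketch reports the lines whose leading whitespace contains a tab; the function name `find_tab_indents` is hypothetical, and the check does not replace a full YAML linter.

```python
def find_tab_indents(text: str) -> list:
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


sample = "seek_keys:\n\tindex_file:\n    value: x\n"
print(find_tab_indents(sample))  # [2]
```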
Recipe execution failures:
- Verify Docker image is accessible
- Check command templates for syntax errors
- Ensure all template variables are properly escaped
Contributing to the Official Channel
To contribute to the official refgenie recipes channel:
- Fork the repository: https://github.com/refgenie/recipes
- Add your asset classes and recipes
- Run validation: python data_channel_check.py .
- Submit a pull request with a clear description
The refgenie team will review and merge approved contributions.