## The Problem
Bioinformatics pipelines are complex. A typical whole-genome sequencing analysis involves dozens of tools, specific parameter combinations, quality thresholds, and file format conversions. Even experienced bioinformaticians constantly reference documentation for:
- Which aligner to use (BWA-MEM2, Bowtie2, STAR, Minimap2?)
- Correct GATK command sequences for variant calling
- Quality thresholds for filtering (what’s a good Ti/Tv ratio?)
- nf-core pipeline configurations
- R/Python code for differential expression analysis
What if Claude Code could provide expert-level guidance for all of this, instantly?
## The Solution: Claude Code Skills
Claude Code supports skills: markdown files that inject domain expertise into the conversation. Unlike MCP servers, which require running infrastructure, skills are just prompts that load when invoked.
I built a comprehensive genomics skill set that covers the major NGS workflows:
| Skill | Command | Coverage |
|---|---|---|
| Main | `/genomics` | Overview, tool selection, routing |
| WGS/WES | `/genomics:wgs` | GATK, DeepVariant, Mutect2 |
| RNA-seq | `/genomics:rnaseq` | DESeq2, Seurat, Scanpy |
| ChIP/ATAC | `/genomics:chipseq` | MACS2, HOMER, TOBIAS |
| Annotation | `/genomics:annotation` | VEP, SnpEff, ANNOVAR |
| QC | `/genomics:qc` | FastQC, MultiQC, Picard |
| CNV/SV | `/genomics:cnv` | GATK CNV, CNVkit, Manta |
## Why Skills Over MCP?
I considered building an MCP server (like the AWS HealthOmics MCP), but skills made more sense for this use case:
| Aspect | Skill | MCP Server |
|---|---|---|
| Purpose | Guidance & code generation | Runtime execution |
| Infrastructure | None (just markdown) | Server process |
| Latency | Instant | Network overhead |
| Maintenance | Edit text files | Deploy/monitor service |
| Best for | “How do I…” questions | “Run this workflow” actions |
Skills excel at providing expertise - knowing which tool to use, correct parameters, quality thresholds, and best practices. MCP servers excel at execution - actually running pipelines on cloud infrastructure.
## Skill Structure
Each skill file is structured to maximize Claude’s effectiveness:
```markdown
# WGS/WES Variant Calling Pipeline Skill

You are an expert in whole genome sequencing...

## Workflow Overview
FASTQ → QC → Alignment → Post-processing → Variant Calling → Filtering

## Standard Pipeline

### 1. Quality Control
[FastQC commands with parameters]

### 2. Alignment
[BWA-MEM2 commands with read groups]

### 3. Post-Alignment Processing
[GATK MarkDuplicates, BQSR commands]

## Quality Metrics to Check
| Metric | Good Value | Concern |
|--------|------------|---------|
| Mapping rate | >95% | <90% |
...

## Common Issues & Solutions
1. Low mapping rate: Check read quality, contamination...
```
Key elements:
- Role definition - “You are an expert in…”
- Workflow overview - Visual pipeline structure
- Complete commands - Copy-paste ready with realistic parameters
- Quality thresholds - Concrete numbers, not vague guidance
- Troubleshooting - Common issues and solutions
## What’s Included
### WGS/WES Pipeline (`/genomics:wgs`)
Full GATK best practices workflow:
```bash
# Alignment with read groups
bwa-mem2 mem -t 16 \
  -R "@RG\tID:${SAMPLE}\tSM:${SAMPLE}\tPL:ILLUMINA\tLB:lib1" \
  ${REFERENCE} ${R1} ${R2} | \
  samtools sort -@ 8 -m 2G -o ${SAMPLE}.sorted.bam -

# BQSR
gatk BaseRecalibrator \
  -R ${REFERENCE} \
  -I ${SAMPLE}.dedup.bam \
  --known-sites ${DBSNP} \
  --known-sites ${MILLS_INDELS} \
  -O ${SAMPLE}.recal_data.table

# Variant calling
gatk HaplotypeCaller \
  -R ${REFERENCE} \
  -I ${SAMPLE}.bqsr.bam \
  -O ${SAMPLE}.g.vcf.gz \
  -ERC GVCF
```
Plus DeepVariant, Mutect2 for somatic, hard filtering vs VQSR guidance, and nf-core/sarek integration.
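The read-group string in the alignment step is easy to get wrong and must stay consistent across a cohort. Here is a minimal sketch of a helper that assembles it; the function name and defaults are mine, not part of the skill set:

```python
# Hypothetical helper for the @RG string passed to `bwa-mem2 mem -R` above.
# Emits literal "\t" sequences, which bwa-mem2 expands to tabs; the ID/SM/PL/LB
# fields mirror the pipeline's conventions.

def read_group(sample: str, platform: str = "ILLUMINA", library: str = "lib1") -> str:
    """Build a read-group string for `bwa-mem2 mem -R`."""
    fields = {"ID": sample, "SM": sample, "PL": platform, "LB": library}
    return "@RG\\t" + "\\t".join(f"{key}:{value}" for key, value in fields.items())

# read_group("NA12878") -> @RG\tID:NA12878\tSM:NA12878\tPL:ILLUMINA\tLB:lib1
```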
### RNA-seq Analysis (`/genomics:rnaseq`)
Covers both traditional alignment and pseudo-alignment approaches:
```bash
# Salmon quantification
salmon quant -i salmon_index \
  -l A \
  -1 sample_R1.fq.gz \
  -2 sample_R2.fq.gz \
  -p 8 \
  --validateMappings \
  -o salmon_quant/sample
```
Complete DESeq2 workflow in R:
```r
library(DESeq2)
library(tximport)

# Import Salmon counts
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

# Run DESeq2
dds <- DESeqDataSetFromTximport(txi, colData = sample_info, design = ~ condition)
dds <- DESeq(dds)
res <- lfcShrink(dds, coef = "condition_treatment_vs_control", type = "apeglm")
```
Also includes single-cell analysis with Seurat and Scanpy, pathway analysis with clusterProfiler, and visualization code.
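Once `res` is exported (e.g. with `write.csv`), downstream filtering often happens outside R. A sketch in Python, assuming the table was parsed into dicts with `gene`, `padj`, and `log2FoldChange` keys; the thresholds are common defaults, not mandates:

```python
# Hypothetical post-processing of an exported DESeq2 results table.
# Genes removed by independent filtering have padj reported as NA.

def significant_genes(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return gene names passing adjusted-p and fold-change cutoffs."""
    hits = []
    for row in rows:
        padj, lfc = row.get("padj"), row.get("log2FoldChange")
        if padj in (None, "", "NA") or lfc in (None, "", "NA"):
            continue  # skip genes dropped by independent filtering
        if float(padj) < padj_cutoff and abs(float(lfc)) > lfc_cutoff:
            hits.append(row["gene"])
    return hits
```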
### CNV/SV Analysis (`/genomics:cnv`)
The newest addition - copy number and structural variant detection:
```bash
# GATK Somatic CNV
gatk DenoiseReadCounts \
  -I tumor.counts.hdf5 \
  --count-panel-of-normals cnv_pon.hdf5 \
  --standardized-copy-ratios tumor.standardizedCR.tsv \
  --denoised-copy-ratios tumor.denoisedCR.tsv

gatk ModelSegments \
  --denoised-copy-ratios tumor.denoisedCR.tsv \
  --allelic-counts tumor.allelicCounts.tsv \
  --output-prefix tumor \
  -O segments_output/
```
Covers CNVkit for WES, Manta/DELLY/GRIDSS for structural variants, SURVIVOR for merging calls, and AnnotSV for annotation.
### Quality Control (`/genomics:qc`)
Comprehensive QC at every stage:
| Stage | Tools | Key Metrics |
|---|---|---|
| Raw reads | FastQC, fastp | Q30%, adapter content |
| Aligned | Picard, mosdepth | Mapping rate, coverage |
| Variants | bcftools stats | Ti/Tv ratio, het/hom |
| Samples | VerifyBamID, Somalier | Contamination, relatedness |
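The Ti/Tv ratio in the variants row is a quick sanity check: germline whole-genome call sets typically land around 2.0-2.1 (higher in exonic regions), and a much lower value suggests false positives. A sketch of the computation over biallelic SNVs; the `(ref, alt)` tuple input is an illustrative format, not a fixed API:

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions;
# everything else among SNVs is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def titv_ratio(snvs):
    """Compute Ti/Tv from an iterable of (ref, alt) pairs for biallelic SNVs."""
    snvs = list(snvs)
    ti = sum(1 for pair in snvs if pair in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")
```

In practice this number comes straight out of `bcftools stats`; the point of the sketch is only to make the metric concrete.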
### Annotation (`/genomics:annotation`)
VEP with all the plugins you actually need:
```bash
vep -i input.vcf.gz \
  --cache --offline --assembly GRCh38 \
  --vcf \
  --plugin CADD,whole_genome_SNVs.tsv.gz \
  --plugin SpliceAI,snv=spliceai_scores.masked.snv.hg38.vcf.gz \
  --plugin AlphaMissense,file=AlphaMissense_hg38.tsv.gz \
  --plugin REVEL,file=revel_scores.tsv.gz \
  --custom clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNDN \
  -o annotated.vcf
```

(ClinVar is attached via `--custom` rather than a plugin, and `--vcf` keeps the output in VCF format.)
Plus filtering strategies for rare disease and cancer, ACMG classification guidance, and database references.
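As one illustration of a rare-disease filtering strategy, the sketch below keeps rare, predicted-deleterious variants with severe consequences. The field names (`gnomAD_AF`, `CADD_PHRED`, `Consequence`) and thresholds are assumptions about how the VEP output was parsed, not fixed rules:

```python
# Consequence terms VEP reports that we treat as "severe" for this sketch.
SEVERE = {"stop_gained", "frameshift_variant", "splice_acceptor_variant",
          "splice_donor_variant", "missense_variant"}

def _num(value, default=0.0):
    """Parse a numeric annotation field, tolerating missing/'NA' values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def candidate_variant(rec, max_af=0.001, min_cadd=20.0):
    """rec: dict of annotation fields for one variant (field names assumed)."""
    af = _num(rec.get("gnomAD_AF"))        # absent population AF -> treat as novel (0)
    cadd = _num(rec.get("CADD_PHRED"))
    consequences = set((rec.get("Consequence") or "").split("&"))
    return af <= max_af and cadd >= min_cadd and bool(consequences & SEVERE)
```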
## Installation
Clone and use directly:
```bash
git clone https://github.com/maratgaliev/claude-skill-genomic-pipelines.git
cd claude-skill-genomic-pipelines
# Claude Code will auto-detect skills
```
Or copy to existing project:
```bash
cp -r claude-skill-genomic-pipelines/.claude /path/to/your/project/
```
## Usage Examples
Designing a WGS pipeline:

```
/genomics:wgs
I need to set up a germline variant calling pipeline for 50 WGS samples.
We're using GRCh38 and want to use DeepVariant. What's the recommended workflow?
```

Troubleshooting RNA-seq:

```
/genomics:rnaseq
My RNA-seq analysis shows very few differentially expressed genes (only 12 with padj < 0.05).
I have 3 replicates per condition. What could be wrong?
```

CNV analysis:

```
/genomics:cnv
I have tumor/normal WES data and need to detect copy number alterations.
Should I use GATK CNV or CNVkit? What are the tradeoffs?
```
## Project Structure
```
.claude/
├── settings.json              # Skill registration
└── skills/
    ├── genomics.md            # Main routing skill
    ├── genomics-wgs.md        # 400+ lines of WGS/WES guidance
    ├── genomics-rnaseq.md     # Bulk + scRNA-seq
    ├── genomics-chipseq.md    # Epigenomics
    ├── genomics-annotation.md
    ├── genomics-qc.md
    └── genomics-cnv.md        # CNV/SV analysis
```
## Extending the Skills
To add new capabilities:

1. Create a new skill file in `.claude/skills/`
2. Register it in `.claude/settings.json`:

```json
{
  "skills": {
    "genomics:metagenomics": {
      "path": "skills/genomics-metagenomics.md",
      "description": "Metagenome analysis (Kraken2, MetaPhlAn, assembly)",
      "invocable": true
    }
  }
}
```
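A quick sanity check for the registration step, assuming the `settings.json` layout shown above: verify that every registered skill path actually points at a file. The helper name and the layout assumption are mine:

```python
import json
from pathlib import Path

def missing_skill_files(settings_path=".claude/settings.json"):
    """Return names of registered skills whose markdown file is missing."""
    settings = Path(settings_path)
    root = settings.parent  # skill paths are relative to .claude/
    skills = json.loads(settings.read_text()).get("skills", {})
    return [name for name, meta in skills.items()
            if not (root / meta["path"]).is_file()]
```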
Potential additions:
- Metagenomics (Kraken2, MetaPhlAn, MAG assembly)
- Long-read analysis (ONT, PacBio specific workflows)
- Spatial transcriptomics
- Multi-omics integration
- Clinical reporting templates
## Conclusion
Skills provide a lightweight way to inject domain expertise into Claude Code. For bioinformatics, this means:
- Instant access to best practices and correct parameters
- Complete commands ready to copy and adapt
- Quality thresholds based on community standards
- Troubleshooting guidance for common issues
The genomics skill set covers the major NGS workflows and can be extended for specific needs. No infrastructure required - just markdown files that make Claude an expert bioinformatician.
Repository: github.com/maratgaliev/claude-skill-genomic-pipelines
## Resources
- Claude Code Documentation
- nf-core Pipelines - Production-ready Nextflow pipelines
- GATK Best Practices
- Biostars - Bioinformatics Q&A