Building a Genomics Skill for Claude Code


The Problem

Bioinformatics pipelines are complex. A typical whole-genome sequencing analysis involves dozens of tools, specific parameter combinations, quality thresholds, and file format conversions. Even experienced bioinformaticians constantly reference documentation for:

  • Which aligner to use (BWA-MEM2, Bowtie2, STAR, Minimap2?)
  • Correct GATK command sequences for variant calling
  • Quality thresholds for filtering (what’s a good Ti/Tv ratio?)
  • nf-core pipeline configurations
  • R/Python code for differential expression analysis

What if Claude Code could provide expert-level guidance for all of this, instantly?

The Solution: Claude Code Skills

Claude Code supports skills - markdown files that inject domain expertise into the conversation. Unlike MCP servers that require infrastructure, skills are just prompts that load when invoked.

I built a comprehensive genomics skill set that covers the major NGS workflows:

| Skill | Command | Coverage |
|-------|---------|----------|
| Main | /genomics | Overview, tool selection, routing |
| WGS/WES | /genomics:wgs | GATK, DeepVariant, Mutect2 |
| RNA-seq | /genomics:rnaseq | DESeq2, Seurat, Scanpy |
| ChIP/ATAC | /genomics:chipseq | MACS2, HOMER, TOBIAS |
| Annotation | /genomics:annotation | VEP, SnpEff, ANNOVAR |
| QC | /genomics:qc | FastQC, MultiQC, Picard |
| CNV/SV | /genomics:cnv | GATK CNV, CNVkit, Manta |

Why Skills Over MCP?

I considered building an MCP server (like the AWS HealthOmics MCP), but skills made more sense for this use case:

| Aspect | Skill | MCP Server |
|--------|-------|------------|
| Purpose | Guidance & code generation | Runtime execution |
| Infrastructure | None (just markdown) | Server process |
| Latency | Instant | Network overhead |
| Maintenance | Edit text files | Deploy/monitor service |
| Best for | “How do I…” questions | “Run this workflow” actions |

Skills excel at providing expertise - knowing which tool to use, correct parameters, quality thresholds, and best practices. MCP servers excel at execution - actually running pipelines on cloud infrastructure.

Skill Structure

Each skill file is structured to maximize Claude’s effectiveness:

# WGS/WES Variant Calling Pipeline Skill

You are an expert in whole genome sequencing...

## Workflow Overview
FASTQ → QC → Alignment → Post-processing → Variant Calling → Filtering

## Standard Pipeline
### 1. Quality Control
[FastQC commands with parameters]

### 2. Alignment
[BWA-MEM2 commands with read groups]

### 3. Post-Alignment Processing
[GATK MarkDuplicates, BQSR commands]

## Quality Metrics to Check
| Metric | Good Value | Concern |
|--------|------------|---------|
| Mapping rate | >95% | <90% |
...

## Common Issues & Solutions
1. Low mapping rate: Check read quality, contamination...

Key elements:

  • Role definition - “You are an expert in…”
  • Workflow overview - Visual pipeline structure
  • Complete commands - Copy-paste ready with realistic parameters
  • Quality thresholds - Concrete numbers, not vague guidance
  • Troubleshooting - Common issues and solutions
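
One way to keep skill files consistent with the elements listed above is a small lint check. The following is a minimal sketch, not part of the repository: the function name `check_skill` and the list of required headings are assumptions taken from the example skill file shown earlier, not a format that Claude Code enforces.

```python
import re

# Hypothetical checklist: heading names mirror the example skill file above.
REQUIRED_SECTIONS = [
    "Workflow Overview",
    "Standard Pipeline",
    "Quality Metrics to Check",
    "Common Issues & Solutions",
]


def check_skill(markdown_text: str) -> list[str]:
    """Return a list of problems found in a skill file (empty list means OK)."""
    problems = []
    # Collect the text of every level-2 heading ("## Title"), ignoring "#" and "###".
    headings = re.findall(r"^##[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE)
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # The role definition is assumed to be a "You are an expert ..." sentence.
    if "You are an expert" not in markdown_text:
        problems.append("missing role definition")
    return problems
```

Calling `check_skill(Path(".claude/skills/genomics-wgs.md").read_text())` would then return an empty list for a file that follows the template, or one message per missing element.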

What’s Included

WGS/WES Pipeline (/genomics:wgs)

Full GATK best practices workflow:

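# Alignment with read groups (bwa-mem2 converts the literal \t to tabs)
bwa-mem2 mem -t 16 \
    -R "@RG\tID:${SAMPLE}\tSM:${SAMPLE}\tPL:ILLUMINA\tLB:lib1" \
    "${REFERENCE}" "${R1}" "${R2}" | \
    samtools sort -@ 8 -m 2G -o "${SAMPLE}.sorted.bam" -

# Mark duplicates (produces the dedup BAM that BQSR reads; metrics filename is illustrative)
gatk MarkDuplicates \
    -I "${SAMPLE}.sorted.bam" \
    -O "${SAMPLE}.dedup.bam" \
    -M "${SAMPLE}.dup_metrics.txt"

# BQSR, step 1: build the recalibration table
gatk BaseRecalibrator \
    -R "${REFERENCE}" \
    -I "${SAMPLE}.dedup.bam" \
    --known-sites "${DBSNP}" \
    --known-sites "${MILLS_INDELS}" \
    -O "${SAMPLE}.recal_data.table"

# BQSR, step 2: apply the table (produces the BAM that HaplotypeCaller reads)
gatk ApplyBQSR \
    -R "${REFERENCE}" \
    -I "${SAMPLE}.dedup.bam" \
    --bqsr-recal-file "${SAMPLE}.recal_data.table" \
    -O "${SAMPLE}.bqsr.bam"

# Variant calling in GVCF mode (one gVCF per sample, for later joint genotyping)
gatk HaplotypeCaller \
    -R "${REFERENCE}" \
    -I "${SAMPLE}.bqsr.bam" \
    -O "${SAMPLE}.g.vcf.gz" \
    -ERC GVCF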

Plus DeepVariant, Mutect2 for somatic, hard filtering vs VQSR guidance, and nf-core/sarek integration.

RNA-seq Analysis (/genomics:rnaseq)

Covers both traditional alignment and pseudo-alignment approaches:

# Salmon quantification
salmon quant -i salmon_index \
    -l A \
    -1 sample_R1.fq.gz \
    -2 sample_R2.fq.gz \
    -p 8 \
    --validateMappings \
    -o salmon_quant/sample

Complete DESeq2 workflow in R:

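library(DESeq2)
library(tximport)

# Import Salmon counts
# `files`: named character vector of quant.sf paths, one per sample
# `tx2gene`: two-column data frame mapping transcript IDs to gene IDs
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

# Run DESeq2
# `sample_info`: data frame with one row per sample and a `condition` column
dds <- DESeqDataSetFromTximport(txi, colData = sample_info, design = ~ condition)
dds <- DESeq(dds)

# Shrink log2 fold changes (requires the apeglm package).
# The coefficient name assumes condition levels "control" (reference) and
# "treatment"; confirm the exact name with resultsNames(dds).
res <- lfcShrink(dds, coef = "condition_treatment_vs_control", type = "apeglm")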

Also includes single-cell analysis with Seurat and Scanpy, pathway analysis with clusterProfiler, and visualization code.
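
The `padj` column that DESeq2 reports is, by default, a Benjamini-Hochberg adjusted p-value (computed after DESeq2's independent filtering step). For intuition about what that adjustment does, here is a minimal pure-Python sketch. The function name is illustrative, and this is neither DESeq2's implementation nor code from the skill files.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so each adjusted value is
    # no larger than the one ranked after it (monotonicity).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Because every raw p-value is scaled by `n / rank`, a study with few replicates and many tested genes can end up with very few genes below `padj < 0.05` even when many raw p-values look small.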

CNV/SV Analysis (/genomics:cnv)

The newest addition - copy number and structural variant detection:

# GATK Somatic CNV
gatk DenoiseReadCounts \
    -I tumor.counts.hdf5 \
    --count-panel-of-normals cnv_pon.hdf5 \
    --standardized-copy-ratios tumor.standardizedCR.tsv \
    --denoised-copy-ratios tumor.denoisedCR.tsv

gatk ModelSegments \
    --denoised-copy-ratios tumor.denoisedCR.tsv \
    --allelic-counts tumor.allelicCounts.tsv \
    --output-prefix tumor \
    -O segments_output/

Covers CNVkit for WES, Manta/DELLY/GRIDSS for structural variants, SURVIVOR for merging calls, and AnnotSV for annotation.

Quality Control (/genomics:qc)

Comprehensive QC at every stage:

| Stage | Tools | Key Metrics |
|-------|-------|-------------|
| Raw reads | FastQC, fastp | Q30%, adapter content |
| Aligned | Picard, mosdepth | Mapping rate, coverage |
| Variants | bcftools stats | Ti/Tv ratio, het/hom |
| Samples | VerifyBamID, Somalier | Contamination, relatedness |
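
To make the Ti/Tv metric concrete: it is the number of transitions (A<->G, C<->T) divided by the number of transversions (every other single-base change). The sketch below computes it from plain (REF, ALT) pairs. The function name and input shape are assumptions for illustration; in practice `bcftools stats` reports the ratio directly.

```python
# The four purine<->purine and pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snvs):
    """Compute Ti/Tv from an iterable of (ref, alt) allele pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, multi-nucleotide variants and non-variant records.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        # Skip ambiguous or symbolic alleles such as N or *.
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ti/Tv is undefined")
    return ti / tv
```

Commonly cited expectations are roughly 2.0 to 2.1 for human whole-genome data and closer to 3 for exomes. Treat these as rules of thumb rather than thresholds taken from the skill files.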

Annotation (/genomics:annotation)

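VEP with the plugins you actually need. ClinVar is not distributed as a VEP plugin, so it is attached with --custom (positional syntax shown here; newer VEP releases also accept key=value pairs), and --vcf is needed for the output to actually be VCF:

vep -i input.vcf.gz \
    --cache --offline --assembly GRCh38 \
    --plugin CADD,whole_genome_SNVs.tsv.gz \
    --plugin SpliceAI,snv=spliceai_scores.masked.snv.hg38.vcf.gz \
    --plugin AlphaMissense,file=AlphaMissense_hg38.tsv.gz \
    --plugin REVEL,file=revel_scores.tsv.gz \
    --custom clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN \
    --vcf -o annotated.vcf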

Plus filtering strategies for rare disease and cancer, ACMG classification guidance, and database references.

Installation

Clone and use directly:

git clone https://github.com/maratgaliev/claude-skill-genomic-pipelines.git
cd claude-skill-genomic-pipelines
# Claude Code will auto-detect skills

Or copy to existing project:

cp -r claude-skill-genomic-pipelines/.claude /path/to/your/project/

Usage Examples

Designing a WGS pipeline:

/genomics:wgs

I need to set up a germline variant calling pipeline for 50 WGS samples.
We're using GRCh38 and want to use DeepVariant. What's the recommended workflow?

Troubleshooting RNA-seq:

/genomics:rnaseq

My RNA-seq analysis shows very few differentially expressed genes (only 12 with padj < 0.05).
I have 3 replicates per condition. What could be wrong?

CNV analysis:

/genomics:cnv

I have tumor/normal WES data and need to detect copy number alterations.
Should I use GATK CNV or CNVkit? What are the tradeoffs?

Project Structure

.claude/
├── settings.json          # Skill registration
└── skills/
    ├── genomics.md        # Main routing skill
    ├── genomics-wgs.md    # 400+ lines of WGS/WES guidance
    ├── genomics-rnaseq.md # Bulk + scRNA-seq
    ├── genomics-chipseq.md # Epigenomics
    ├── genomics-annotation.md
    ├── genomics-qc.md
    └── genomics-cnv.md    # CNV/SV analysis
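
The filenames above appear to follow a convention: `genomics.md` backs `/genomics`, and `genomics-<name>.md` backs `/genomics:<name>`. Assuming that convention holds (it is inferred from the listing, not documented behavior), a short sketch can list the commands a project exposes. Both function names are hypothetical.

```python
from pathlib import Path


def command_for(filename: str) -> str:
    """Map a skill filename to its slash command (assumed naming convention)."""
    stem = filename.removesuffix(".md")
    # Split on the first hyphen only: "genomics-wgs" -> ("genomics", "wgs").
    family, _, sub = stem.partition("-")
    return f"/{family}:{sub}" if sub else f"/{family}"


def list_commands(project_root: str) -> dict[str, str]:
    """Return {command: filename} for every markdown file in .claude/skills/."""
    skills_dir = Path(project_root) / ".claude" / "skills"
    return {command_for(p.name): p.name for p in sorted(skills_dir.glob("*.md"))}
```

Running `list_commands(".")` from the project root gives a quick way to compare the files on disk against the entries registered in `settings.json`.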

Extending the Skills

To add new capabilities:

  1. Create a new skill file in .claude/skills/
  2. Register it in .claude/settings.json:
{
  "skills": {
    "genomics:metagenomics": {
      "path": "skills/genomics-metagenomics.md",
      "description": "Metagenome analysis (Kraken2, MetaPhlAn, assembly)",
      "invocable": true
    }
  }
}
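
Editing the JSON by hand works, but the registration step can also be scripted. The sketch below assumes the `skills` schema shown above (`path`, `description`, `invocable`). The helper name `register_skill` is hypothetical, and the script is not part of the repository.

```python
import json
from pathlib import Path


def register_skill(settings_path, name, skill_path, description):
    """Add or update one skill entry, leaving all other settings untouched."""
    settings_file = Path(settings_path)
    # Start from the existing settings if the file is already there.
    if settings_file.exists():
        settings = json.loads(settings_file.read_text())
    else:
        settings = {}
    skills = settings.setdefault("skills", {})
    skills[name] = {
        "path": skill_path,
        "description": description,
        "invocable": True,
    }
    settings_file.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

For the metagenomics example this would be called as `register_skill(".claude/settings.json", "genomics:metagenomics", "skills/genomics-metagenomics.md", "Metagenome analysis (Kraken2, MetaPhlAn, assembly)")`.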

Potential additions:

  • Metagenomics (Kraken2, MetaPhlAn, MAG assembly)
  • Long-read analysis (ONT, PacBio specific workflows)
  • Spatial transcriptomics
  • Multi-omics integration
  • Clinical reporting templates

Conclusion

Skills provide a lightweight way to inject domain expertise into Claude Code. For bioinformatics, this means:

  • Instant access to best practices and correct parameters
  • Complete commands ready to copy and adapt
  • Quality thresholds based on community standards
  • Troubleshooting guidance for common issues

The genomics skill set covers the major NGS workflows and can be extended for specific needs. No infrastructure required - just markdown files that make Claude an expert bioinformatician.

Repository: github.com/maratgaliev/claude-skill-genomic-pipelines

Resources