Snakemake Workflow Manager
Aims
Actions
A small selection
LANG
)Won’t be covered in this course.
Quite straightforward ;-)
mamba create -y -n snake-pit python=3.10 snakemake
conda activate snake-pit
… using {input}
and {output}
… using wildcards
{input}
and {output}
are still available and will list all file namesoutput_files
could also have been generated by reading a sample sheetREF = "static/hs38.fasta"
rule output:
input: "case_1.bam"
rule map_reads:
input:
left="input/{sample}_R1.fastq.gz",
right="input/{sample}_R2.fastq.gz",
output:
bam="output/{sample}.bam",
shell:
r"""
seqtk mergpe {input.left} {input.right} \
| bwa mem -t 4 -R "@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}" \
{REF} - \
| samblaster -M \
| samtools sort -O BAM -o {output.bam}
"""
REF = "static/hs38.fasta"
rule output:
input: "output/case_1.bam"
rule map_reads:
input:
left="input/{sample}_R1.fastq.gz",
right="input/{sample}_R2.fastq.gz",
output:
bam="output/{sample}.bam",
threads: 32,
shell:
r"""
seqtk mergpe {input.left} {input.right} \
| bwa mem -t {threads} -R "@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}" \
{REF} - \
| samblaster -M \
| samtools sort -O BAM -o {output.bam}
"""
config: "config.yaml"
rule output:
input: [f"output/{sample}.bam" for sample in config.samples]
rule map_reads:
input:
left="input/{sample}_R1.fastq.gz",
right="input/{sample}_R2.fastq.gz",
output:
bam="output/{sample}.bam",
shell:
r"""
seqtk mergpe {input.left} {input.right} \
| bwa mem -t {config.map_threads} -R "@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}" \
{REF} - \
| samblaster -M \
| samtools sort -@ {config.sort_threads} -O BAM -o {output.bam}
"""
config.yaml
ref: "static/hs38.fasta"
map_threads: 32
sort_threads: 4
samples:
- case_1
- case_2
config: "config.yaml"
rule output:
input: [f"output/{sample}.bam" for sample in config.samples]
rule map_reads:
input:
left="input/{sample}_R1.fastq.gz",
right="input/{sample}_R2.fastq.gz",
output:
bam="output/{sample}.bam",
threads: config.map_threads
resources:
mem_mb=16*1024,
runtime=24*60, # in minutes == 1day
shell:
r"""
seqtk mergpe {input.left} {input.right} \
| bwa mem -t {config.map_threads} -R "@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}" \
{REF} - \
| samblaster -M \
| samtools sort -@ {config.sort_threads} -O BAM -o {output.bam}
"""
(same config.yaml)
config: "config.yaml"
rule output:
input: [f"output/{sample}.bam" for sample in config.samples]
rule map_reads:
input:
left="input/{sample}_R1.fastq.gz",
right="input/{sample}_R2.fastq.gz",
output:
bam="output/{sample}.bam",
threads: config.map_threads
resources:
mem_mb=16*1024,
runtime=24*60, # in minutes == 1day
shell:
r"""
export TMPDIR=$(mktemp -d)
trap "rm -rf $TMPDIR" EXIT
seqtk mergpe {input.left} {input.right} \
| bwa mem -t {config.map_threads} -R "@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}" \
{REF} - \
| samblaster -M \
| samtools sort -T $TMPDIR/tmp -@ {config.sort_threads} -O BAM -o {output.bam}
"""
config: "config.yaml"
rule output:
input: [f"output/{sample}.bam" for sample in config.samples]
rule map_reads:
input:
left="input/{sample}_R1.fastq.gz",
right="input/{sample}_R2.fastq.gz",
output:
bam="output/{sample}.bam",
threads: config.map_threads
resources:
mem_mb=16*1024,
runtime=24*60, # in minutes == 1day
wildcard_constraints:
sample=r"[A-Za-z0-9_\-]+",
shell:
r"""
export TMPDIR=$(mktemp -d)
trap "rm -rf $TMPDIR" EXIT
seqtk mergpe {input.left} {input.right} \
| bwa mem -t {config.map_threads} -R "@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}" \
{REF} - \
| samblaster -M \
| samtools sort -T $TMPDIR/tmp -@ {config.sort_threads} -O BAM -o {output.bam}
"""
In the Snakefile
rule map_reads:
conda: "envs/map_reads.yaml"
envs/map_reads.yaml
:
run locally with given number of cores
$ ls
Snakefile
config.yaml
intput/
$ snakemake --cores=32 output/case_1.bam
(this will not enforce threads or memory)
$ snakemake --jobs=4 --slurm \
--default-resources slurm_partition=long \
output/case_1.bam
.snakemake/slurm_logs/rule_RULE_NAME/JOBID.log
You can also put functions as inputs:
def foo_input(wildcards):
sample_upper wildcards.sample.upper()
return f"input/{sample_upper}.fastq.gz"
rule foo:
input: foo_input
If you want to return named input files, use unpack()
def foo_input(wildcards):
sample_upper wildcards.sample.upper()
return {
"left": f"input/{sample_upper}_1.fastq.gz",
"right": f"input/{sample_upper}_1.fastq.gz",
}
rule foo:
input: unpack(foo_input)
script:
r"""
# ...
seqtk mergepe {input.left} {input.right} \
# ...
"""
Similarly, the resources can be a callable as well:
def foo_threads(wildcards, input):
return random.choice([23, 42])
def foo_resources(wildcards, input, threads, attempt):
# give 20% more resources per attempt
multiplier = 1 + 0.2 * (attempt - 1)
return {
"mem_mb": 1024 * threads * multiplier,
"runtime": 60 * 60 * 24 * multiplier,
}
rule foo:
threads: threads
resources: foo_resources