Snakemake with Slurm

This page describes how to use Snakemake with Slurm.

Prerequisites

This page assumes that you have Miniconda properly set up with Bioconda.
It also assumes that you have already activated the Miniconda base environment with source miniconda/bin/activate.

Environment Setup

We first create a new environment called snakemake-slurm that contains the snakemake package, and then activate it.

host:~$ conda create -y -n snakemake-slurm snakemake
[...]
#
# To activate this environment, use
#
#     $ conda activate snakemake-slurm
#
# To deactivate an active environment, use
#
#     $ conda deactivate
host:~$ conda activate snakemake-slurm
(snakemake-slurm) host:~$

Snakemake Workflow Setup

Next, we create a minimal workflow and check that it runs properly with local Snakemake execution (no cluster submission yet!).

host:~$ mkdir -p snake-slurm
host:~$ cd snake-slurm
host:snake-slurm$ cat >Snakefile <<"EOF"
rule default:
    input: "the-result.txt"

rule mkresult:
    output: "the-result.txt"
    shell: r"sleep 1m; touch the-result.txt"
EOF
host:snake-slurm$ snakemake --cores=1
[...]
host:snake-slurm$ ls
Snakefile  the-result.txt
host:snake-slurm$ rm the-result.txt

Snakemake and Slurm

You have two options:

Use snakemake --profile=cubi-v1 together with the Snakemake resource configuration shown below (STRONGLY PREFERRED).
Use the snakemake --cluster='sbatch ...' command.

Did you notice that we sneaked in a sleep 1m? While the workflow runs through Slurm, you can check in a second terminal session that the job has indeed been submitted to Slurm:

host:~$ squeue -u holtgrem_c
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               325     debug snakejob holtgrem  R       0:47      1 med0127

Threads & Resources

The cubi-v1 profile (stored in /etc/xdg/snakemake/cubi-v1 on all cluster nodes) supports the following settings in your Snakemake rules:

threads: the number of threads to run the job with
memory, in a syntax understood by Slurm, EITHER
- resources.mem/resources.mem_mb: the memory to allocate for the whole job (mem_mb is given in megabytes), OR
- resources.mem_per_thread: the memory to allocate for each thread.
resources.time: the time limit of the job, in a syntax supported by Slurm, e.g. HH:MM:SS or D-HH:MM:SS
resources.partition: the partition to submit your job to (by default, Slurm will pick a fitting partition for you)
resources.nodes: the number of nodes to schedule your job on (defaults to 1; keep this value unless you want to use MPI)

You will need Snakemake >=7.0.2 for this.
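The rules above (memory given either per job or per thread, a time limit in one of the two formats shown) can be checked before submitting. The following is a minimal sketch in plain Python; check_resources is a hypothetical helper, not part of the cubi-v1 profile or of Snakemake, and it deliberately accepts only the HH:MM:SS and D-HH:MM:SS time formats mentioned here, although Slurm itself accepts more.

```python
import re

# Hypothetical checker, NOT part of the cubi-v1 profile: it only mirrors the
# resource rules listed above.

# Only the two Slurm time formats named above: HH:MM:SS or D-HH:MM:SS.
# Slurm accepts further formats, which this sketch rejects on purpose.
TIME_RE = re.compile(r"^(?:\d+-)?\d{1,2}:\d{2}:\d{2}$")


def check_resources(resources: dict) -> list:
    """Return a list of problems found in a rule's resources."""
    problems = []
    # Memory is given EITHER for the whole job OR per thread, not both.
    has_total = "mem" in resources or "mem_mb" in resources
    if has_total and "mem_per_thread" in resources:
        problems.append("use either mem/mem_mb or mem_per_thread, not both")
    time = resources.get("time")
    if time is not None and not TIME_RE.match(str(time)):
        problems.append(f"time {time!r} is neither HH:MM:SS nor D-HH:MM:SS")
    # nodes defaults to 1; other values only make sense for MPI jobs.
    if resources.get("nodes", 1) != 1:
        problems.append("nodes != 1 is only useful for MPI jobs")
    return problems
```

For example, check_resources({"mem": "8G", "time": "04:00:00"}) returns an empty list, while mixing mem and mem_per_thread yields one problem.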

Here is how to call Snakemake (-j1 limits Snakemake to one concurrently running job):

host:snake-slurm$ snakemake --profile=cubi-v1 -j1

To set rule-specific resources:

rule myrule:
    threads: 1
    resources:
        mem='8G',
        time='04:00:00',
    input: # ...
    output: # ...
    shell: # ...
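If you prefer resources.mem_mb over a string such as mem='8G', you need the value in megabytes. A minimal sketch of that conversion, assuming Slurm's convention that a bare number means megabytes and that the suffixes K, M, G and T are powers of 1024; slurm_mem_to_mb is a hypothetical helper, not a Snakemake or Slurm function.

```python
import math
import re

# Hypothetical helper: convert a Slurm-style memory string such as "8G"
# into megabytes, e.g. to fill resources.mem_mb.
# Assumption: bare numbers are megabytes, K/M/G/T are powers of 1024.
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}
_MEM_RE = re.compile(r"^(\d+)([KMGT]?)$")


def slurm_mem_to_mb(mem: str) -> int:
    match = _MEM_RE.match(mem.strip())
    if match is None:
        raise ValueError(f"unsupported memory string: {mem!r}")
    value = int(match.group(1))
    unit = match.group(2) or "M"  # no suffix means megabytes
    # Round up so that we never request less memory than asked for.
    return math.ceil(value * _UNIT_TO_MB[unit])
```

With this sketch, slurm_mem_to_mb("8G") gives 8192, so mem='8G' would correspond to mem_mb=8192.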

You can, of course, combine this with Snakemake resource callables:

def snps_mem(wildcards, attempt):
    # attempt is 1 on the first try and increases each time a failed job is restarted
    mem = 2 * attempt
    return f"{mem}G"

rule snps:
    threads: 1
    resources:
        mem=snps_mem,
        time='04:00:00',
    input: # ...
    output: # ...
    shell: # ...
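A common variation is to grow the resources exponentially on each retry but cap them, so that a job that keeps failing does not request more memory than any node has. The sketch below is illustrative only: scaled_mem, scaled_time and the 64G cap are hypothetical choices, and the attempt number only increases if failed jobs are restarted, e.g. with --restart-times.

```python
# Hypothetical resource callables for use in a Snakefile. Snakemake calls them
# with the rule's wildcards and the attempt number (1 on the first try).

MAX_MEM_GB = 64  # assumed upper limit; adjust to the nodes you can use


def scaled_mem(wildcards, attempt):
    """Start with 2G and double the memory on every retry, up to MAX_MEM_GB."""
    mem_gb = min(2 * 2 ** (attempt - 1), MAX_MEM_GB)
    return f"{mem_gb}G"


def scaled_time(wildcards, attempt):
    """Allow 4 hours on the first attempt and 8 hours on any later one."""
    return "04:00:00" if attempt == 1 else "08:00:00"
```

In a rule you would then write mem=scaled_mem, time=scaled_time under resources, and run, for instance, snakemake --profile=cubi-v1 -j1 --restart-times=2 so that failed jobs are retried with more memory and time.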

Custom logging directory

By default, Slurm writes the log files into Snakemake's working directory, with names of the form slurm-$jobid.out.

To change this behaviour, set the environment variable SBATCH_DEFAULTS to override the --output parameter. For instance, to write your log files into slurm_logs with a file name pattern of $name-$jobid, consider the following snippet for your submission script:

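#!/bin/bash
#
#SBATCH --job-name=snakemake_main_job
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=48:10:00
#SBATCH --mem-per-cpu=300M
# Log of this main job. Slurm does not create the directory, so slurm_logs
# must already exist when you submit this script.
#SBATCH --output=slurm_logs/%x-%j.log

# Make sure the log directory exists and route the logs of the spawned jobs there.
mkdir -p slurm_logs
export SBATCH_DEFAULTS=" --output=slurm_logs/%x-%j.log"

date
srun snakemake --use-conda -j1 --profile=cubi-v1
date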

The main Snakemake job will be named snakemake_main_job, and the jobs spawned from it will be named after the corresponding rules in the Snakefile.
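With this pattern, %x expands to the job name and %j to the job ID, so the log of a given job can be located from its squeue entry. A minimal sketch of that expansion; slurm_log_path is a hypothetical helper that handles only %x, %j and the literal %%, not the other placeholders Slurm supports.

```python
import re

# Hypothetical helper: expand the %x (job name) and %j (job ID) placeholders
# of a Slurm --output pattern; "%%" stands for a literal "%".
# Other Slurm filename placeholders are NOT handled by this sketch.


def slurm_log_path(pattern: str, job_name: str, job_id: int) -> str:
    replacements = {"x": job_name, "j": str(job_id), "%": "%"}
    return re.sub(r"%([xj%])", lambda m: replacements[m.group(1)], pattern)
```

For the mkresult rule from the example workflow running as job 325, slurm_log_path("slurm_logs/%x-%j.log", "mkresult", 325) returns "slurm_logs/mkresult-325.log".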