nf-core/configs: DKFZ configuration

To use, run the pipeline with -profile dkfz. This downloads and applies dkfz.config, which is pre-configured for the Deutsches Krebsforschungszentrum (DKFZ) / ODCF LSF cluster in Heidelberg, Germany.

This configuration is tested with Nextflow 25.10.0 (available on the cluster as a module).

The profile only configures the cluster itself (LSF executor, dynamic queue selection, scratch, resource limits and the /omics bind-mount). Pick a container engine on the command line, e.g. -profile dkfz,apptainer or -profile dkfz,conda.

⚠️ Use Apptainer/Singularity (or Conda), not Docker. On the ODCF cluster Docker is only available through LSF’s docker-generic application profile. Nextflow’s Docker support calls docker run directly on the node, which this setup does not allow, so -profile dkfz,docker will not work. Use -profile dkfz,apptainer instead.

Before you use this profile

  1. Load Nextflow via the environment module system on a submission host. Check the pipeline’s README for the required Nextflow version:

    module load Nextflow/25.10.0

  2. Submit from a submission host (bsub01.lsf.dkfz.de / bsub02.lsf.dkfz.de). Do not run heavy work on the login/worker nodes. Wrap the Nextflow driver itself in a bsub job (see below).

  3. The shared /omics filesystem is bind-mounted into every container automatically. Keep the container image cache there as well: point NXF_APPTAINER_CACHEDIR (or NXF_SINGULARITY_CACHEDIR when using Singularity) at a path under /omics so images are cached on shared storage that the worker nodes can reach:

    export NXF_APPTAINER_CACHEDIR=/omics/groups/<your-group>/.../apptainer_cache
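The cache location can also be set from a Nextflow config instead of the environment. The following is a minimal sketch, assuming a hypothetical custom.config passed with -c and using Nextflow's apptainer.cacheDir option. The path is the same placeholder as above and has to be filled in for your group.

```groovy
// custom.config (hypothetical file name); use with: nextflow run ... -c custom.config
apptainer {
    // Store pulled images on shared storage so worker nodes can reach them.
    // Placeholder path: replace <your-group>/... with your own directory.
    cacheDir = '/omics/groups/<your-group>/.../apptainer_cache'
}
```

When running with the singularity profile instead, the equivalent option lives in the singularity scope (singularity.cacheDir).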

Queues

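Queue selection is automatic, based on each task’s requested time and memory. The memory check comes first, so a task requesting more than 200 GB goes to highmem regardless of its time. GPU tasks are routed separately, as described under Enable GPU support.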

Queue     Selected when                      Limit
short     no time given, or time <= 10.min   10 min
medium    time <= 1.h                        1 hour
long      time <= 10.h                       10 hours
verylong  time > 10.h                        no hard limit
highmem   memory > 200.GB                    up to ~4 TB

Note: highmem is the only queue that accepts requests above 200 GB (and it rejects requests below 200 GB).
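Because the queue follows from the requested resources, a process is steered to a different queue by changing its time or memory rather than by naming a queue. The following is a minimal sketch, assuming a hypothetical custom.config passed with -c. The process names FOO:BAR:SORT and FOO:BAR:ASSEMBLE are placeholders, not names from any real pipeline.

```groovy
// custom.config (hypothetical file name); use with: nextflow run ... -c custom.config
process {
    // 6 h of wall time is above 1.h and at most 10.h, so this task lands in 'long'.
    withName: 'FOO:BAR:SORT' {
        time = 6.h
    }
    // More than 200.GB of memory routes the task to 'highmem', whatever its time.
    withName: 'FOO:BAR:ASSEMBLE' {
        memory = 300.GB
        time   = 24.h
    }
}
```

Leaving the queue directive alone keeps the profile's selection logic in charge, so retries that raise time or memory can still move a task to a more suitable queue.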

Resource limits, retries and containers

  • Every task is capped to what the cluster can provide via process.resourceLimits (64 CPUs, 1000 GB memory, 720 h). Requests above these are capped automatically.
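  • Unlabelled processes default to a safe 1 CPU / 6 GB / 10 min.
  • Tasks that fail with no exit status, or with exit code 104, 255 or 130–145 (137 usually means out-of-memory or preemption), are retried up to 3 times. Any other failure ends the run once already-submitted tasks have finished (errorStrategy 'finish').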
  • The shared /omics filesystem is bound into every container via containerOptions, with --nv added for accelerator tasks. If one of your modules sets its own containerOptions, re-add --bind /omics there.
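Re-adding the bind for a single process could look like the sketch below. It assumes a hypothetical custom.config passed with -c. The process name FOO:BAR:MY_TOOL and the extra path /path/to/extra are placeholders for illustration only.

```groovy
// custom.config (hypothetical file name); use with: nextflow run ... -c custom.config
process {
    // 'FOO:BAR:MY_TOOL' and /path/to/extra are placeholders.
    withName: 'FOO:BAR:MY_TOOL' {
        // This value replaces the profile's containerOptions for this process,
        // so the /omics bind (and --nv for GPU tasks) has to be repeated here.
        containerOptions = {
            def opts = '--bind /omics --bind /path/to/extra'
            task.accelerator ? "${opts} --nv" : opts
        }
    }
}
```

The closure mirrors the profile's own pattern: CPU tasks get only the binds, while tasks with an accelerator request also get --nv.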

Enable GPU support

This profile turns any task that requests a GPU through Nextflow’s standard accelerator directive into a correct DKFZ GPU submission. It selects the GPU queue, builds the LSF -gpu num=<n>:j_exclusive=yes[:gmem=<n>G] request, and adds --nv so the GPU is visible inside the container.

How a task acquires an accelerator request depends on the pipeline:

  • nf-core pipelines mark GPU-capable processes with the process_gpu label and only switch the accelerator on when the run includes the gpu profile. So add gpu to your profile list:

    nextflow run <pipeline> -profile dkfz,gpu,apptainer --input ... --outdir ...

  • Custom / non-nf-core pipelines just declare accelerator on the GPU process:

    process MY_GPU_TASK {
        accelerator 1
        container 'docker://nvcr.io/...'

        script:
        "my_gpu_tool ..."
    }

    nextflow run main.nf -profile dkfz,apptainer --outdir ...

Tasks without an accelerator request are unaffected and run on the normal CPU queues.

Choosing the GPU queue

The --dkfz_gpu_queue parameter selects which GPU queue all GPU jobs are submitted to (default gpu):

  • gpu — default (RTX 2080 Ti … V100/A100-DGX), 72 h wall time
  • gpu-lowprio — same nodes as gpu but low priority; use for large job batches
  • gpu-pro — high-end A100/H200/L40S/GH200, 142 h wall time — requires a separate access application to the DKFZ Data Science Board

Number of GPUs and GPU memory per process

The profile builds the LSF request as -gpu num=<n>:j_exclusive=yes[:gmem=<n>G] (DKFZ requires j_exclusive=yes and rejects mode=exclusive_process). Two things are tunable per process:

  • Number of GPUs — the accelerator directive (default 1).
  • GPU memory (optional) — set ext.gpu_memory to a Nextflow memory value to pin the job to GPUs with at least that much VRAM. When ext.gpu_memory is unset, gmem is omitted and LSF assigns any free GPU.

Approximate values to target each GPU tier (request at or just below the card’s usable VRAM):

ext.gpu_memory  Targets                          Queue
10.GB           RTX 2080 Ti (11 GB)              gpu
15.GB           V100 16 GB                       gpu
23.GB           TITAN RTX / Quadro RTX (24 GB)   gpu
31.GB           V100 32 GB                       gpu
40.GB           A100 40 GB                       gpu-pro only
46.GB           L40S                             gpu-pro only
98.GB           GH200                            gpu-pro only
141.GB          H200                             gpu-pro only

Set these directly on the process, or per process name from config (e.g. nf-core’s conf/modules.config):

process {
    // 2 GPUs, any free GPU (no gmem constraint)
    withName: 'FOO:BAR:ALIGN_GPU' {
        accelerator = 2
    }
    // 1 big-memory GPU
    withName: 'FOO:BAR:FOLD' {
        accelerator    = 1
        ext.gpu_memory = 40.GB   // -> A100/L40S/H200; also set --dkfz_gpu_queue gpu-pro
    }
}

⚠️ Requesting 40.GB or more only works on gpu-pro. On the plain gpu queue such a request stays in PEND indefinitely. Independently of the queue, use at most 12 CPUs and ~45 GB host RAM per GPU (DKFZ GPU usage policy).
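One way to stay within the per-GPU policy is to cap the host resources of GPU processes in a custom config. The following is a minimal sketch, assuming an nf-core pipeline whose GPU processes carry the process_gpu label and request a single GPU each. The file name custom.config is hypothetical.

```groovy
// custom.config (hypothetical file name); use with: nextflow run ... -c custom.config
process {
    withLabel: 'process_gpu' {
        // Assumes one GPU per task. The policy limits are per GPU,
        // so scale both values if a process requests more accelerators.
        cpus   = 12
        memory = 45.GB
    }
}
```

For custom pipelines without the process_gpu label, the same settings can be applied with a withName selector on the GPU process.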

Running Nextflow on the cluster

Run the Nextflow driver inside an LSF job rather than on a submission host directly. Make a script and submit it with bsub < my_script.sh:

#!/bin/bash
#BSUB -J nf_pipeline
#BSUB -o nf_pipeline.%J.log
#BSUB -q long
#BSUB -n 2
#BSUB -R "rusage[mem=8G]"
#BSUB -W 10:00
 
module load Nextflow/25.10.0
 
# Cache images on shared storage so worker nodes can reach them:
export NXF_APPTAINER_CACHEDIR=/omics/groups/<your-group>/.../apptainer_cache
 
nextflow run <pipeline> \
    -profile dkfz,apptainer \
    --input samplesheet.csv \
    --outdir results

Add gpu to -profile (e.g. -profile dkfz,gpu,apptainer) to send process_gpu tasks to a GPU queue.

Config file

See config file on GitHub

conf/dkfz
// Institutional profile for the DKFZ / ODCF LSF cluster.
 
params {
    config_profile_description = 'Deutsches Krebsforschungszentrum (DKFZ) ODCF HPC cluster profile'
    config_profile_contact     = 'Abid Abrar (abid.abrar@dkfz-heidelberg.de), Kübra Narcı (kuebra.narci@dkfz-heidelberg.de)'
    config_profile_name        = 'DKFZ Cluster'
    config_profile_url         = 'https://www.dkfz.de'
 
    max_cpus   = 64
    max_memory = '1000.GB'
    max_time   = '720.h'
 
    // GPU queue for GPU jobs (options: gpu (default), gpu-lowprio, gpu-pro)
    dkfz_gpu_queue = 'gpu'
}
 
apptainer {
    enabled    = true
    autoMounts = true
}
 
// Ignore the custom dkfz_gpu_queue param in nf-schema validation
validation.ignoreParams = ['dkfz_gpu_queue']
 
process {
    executor = 'lsf'
    scratch  = '$CLUSTER_SCRATCHDIR'
 
    // Retry transient failures: no exit status, signals 130–145 (137 = OOM/preempt), 104/255 (I/O drops)
    errorStrategy = { (task.exitStatus == null || task.exitStatus == Integer.MAX_VALUE || task.exitStatus in ((130..145) + [104, 255])) ? 'retry' : 'finish' }
    maxRetries    = 3
    cache         = 'lenient'
 
    // Cap every task to the cluster ceiling: 64 cores, 1000 GB RAM, 720 h (30 day) wall time
    resourceLimits = [
        cpus  : 64,
        memory: 1000.GB,
        time  : 720.h,
    ]
 
    // Low defaults for unlabelled processes
    cpus   = 1
    memory = 6.GB
    time   = 10.min
 
    // GPU tasks go to a GPU queue; everything else to a CPU queue by time/memory.
    queue = {
        if (task.accelerator) {
            return params.dkfz_gpu_queue
        } else if (task.memory && task.memory > 200.GB) {
            return 'highmem'
        } else if (!task.time || task.time <= 10.min) {
            return 'short'
        } else if (task.time <= 1.h) {
            return 'medium'
        } else if (task.time <= 10.h) {
            return 'long'
        } else {
            return 'verylong'
        }
    }
 
    // GPU request: built only when the task has an `accelerator`. An nf-core
    // `process_gpu` task run without `-profile gpu` has no accelerator, so it stays on CPU.
    // j_exclusive=yes is mandatory on this cluster.
    // An optional `ext.gpu_memory` pins the job to GPUs with at least that much VRAM.
    clusterOptions = {
        if (!task.accelerator) {
            return null
        }
        def gpu = "-gpu num=${task.accelerator.request}:j_exclusive=yes"
        if (task.ext.gpu_memory) {
            gpu += ":gmem=${task.ext.gpu_memory.toGiga()}G"
        }
        return gpu
    }
 
    // Bind /omics into every container; add --nv for GPU tasks.
    containerOptions = { task.accelerator ? '--bind /omics --nv' : '--bind /omics' }
}
 
executor {
    name            = 'lsf'
    perJobMemLimit  = true
    perTaskReserve  = false
    queueSize       = 10
    submitRateLimit = '1 sec'
    exitReadTimeout = '30 min'
}