GPU-Enabled Workflows with Docker and WDL

Optimize bioinformatics with Google Cloud: learn how to create GPU-enabled Docker containers, store them in Artifact Registry, and use them to run WDL workflows with Cromwell.

Aug 18, 2023

As bioinformatics data grows in size and complexity, running GPU-accelerated tools inside Docker containers is a practical way to speed up analyses.

In this blog, we will learn how to:

create a GPU-enabled Docker container
push the custom Docker image to Google Artifact Registry
craft a WDL script that uses the GPU-enabled Docker image
run a WDL using Cromwell

GPU-Enabled Docker Containers with WDL for Bioinformatics Workflows

0. Setting Up the Environment:

Start by making sure Docker is installed on your machine or VM. Next, choose a container registry such as Google Artifact Registry or AWS Elastic Container Registry and set it up according to the provider's instructions. Finally, run the following Linux command to check that the host machine's GPU is configured properly:

nvidia-smi

If everything is in order, you'll see detailed information about your GPU.
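Before going further, it can help to confirm that the command-line tools used in this post are on your PATH. The following is a minimal sketch assuming a POSIX shell; the helper name check_tool is made up for illustration, and the tool list simply mirrors the commands that appear in this post:

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check each tool used in this post; report problems without aborting.
for tool in docker nvidia-smi gcloud java; do
  check_tool "$tool" || true
done
```

Any tool reported as missing needs to be installed before the corresponding step will work.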

1. Build a GPU-Enabled Docker Image

To begin, create a file named Dockerfile with the following content:

FROM nvidia/cuda:12.2.0-devel-ubuntu20.04

This line sets up a GPU-enabled environment using NVIDIA's CUDA 12.2 development image based on Ubuntu 20.04.

From here, you can further customize the image to include specific tools or libraries relevant to your bioinformatics workflow, like so:

FROM nvidia/cuda:12.2.0-devel-ubuntu20.04

WORKDIR /usr/src/app

RUN apt-get update && apt-get install -y \
    git \
    wget
    
ENV CONDA_DIR=/opt/conda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py39_23.5.2-0-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p $CONDA_DIR

ENV PATH=$CONDA_DIR/bin:$PATH

RUN git clone https://github.com/example/example && \
    cd example && \
    python3 setup.py install

Build a Docker image named ‘example’ with the following command:

docker build -t example .

Test the image locally:

docker run --rm --gpus=all -it example

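If you see a GPU driver incompatibility error (for example, because the image expects a newer CUDA version than the host driver reports), you can disable the container's requirement check as a workaround: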

docker run --rm --gpus=all --env NVIDIA_DISABLE_REQUIRE=1 -it example
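An interactive shell confirms that the container starts, but not necessarily that it can see the GPU. One way to check is to run nvidia-smi inside the container. This is a sketch rather than a required step; the function name gpu_smoke_test is hypothetical, and it assumes the NVIDIA Container Toolkit is installed on the host so that the --gpus flag works:

```shell
# Hypothetical helper: run nvidia-smi inside an image to confirm that the
# container can see the host GPU. $1 is the image name.
gpu_smoke_test() {
  docker run --rm --gpus=all "$1" nvidia-smi
}
```

Calling `gpu_smoke_test example` should print the same kind of GPU table that nvidia-smi prints on the host.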

2. Push the Image to the Artifact Registry

Once you've built and tested the Docker image locally, you can store it in a cloud registry such as Google Artifact Registry (GAR) so that cloud workers can pull it.

First, authenticate your local Docker client with GAR so you can push and pull images from that registry:

gcloud auth configure-docker us-central1-docker.pkg.dev

Next, tag the image with the full registry path including the location (us-central1), project ID (project-1), repository name (repository-1) and image name (example):

docker tag example us-central1-docker.pkg.dev/project-1/repository-1/example

Finally, push the tagged image to the registry:

docker push us-central1-docker.pkg.dev/project-1/repository-1/example
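The commands above rely on Docker's default tag, latest. For reproducible workflows it is often safer to push an explicit version tag. The sketch below assembles the registry path from its parts and wraps the tag and push steps; the function names image_uri and tag_and_push and the tag v1 are placeholders chosen for illustration, while the other values are the ones used in this post:

```shell
# Build an Artifact Registry image path of the form
#   LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG
image_uri() {
  printf '%s-docker.pkg.dev/%s/%s/%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Tag a local image and push it. $1: local image, $2: full registry path.
tag_and_push() {
  docker tag "$1" "$2" && docker push "$2"
}

# "v1" is a placeholder tag; pick whatever versioning scheme suits you.
uri=$(image_uri us-central1 project-1 repository-1 example v1)
echo "$uri"
```

Running `tag_and_push example "$uri"` then tags and pushes the image. If you push a versioned tag, use the same full path, including the tag, wherever the image is referenced later.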

3. Writing the WDL Task

Now let’s create a file named example_workflow.wdl to define a task to run an example command using the GPU-enabled Docker image we created earlier.

task example {
  String sequence_input

  runtime {
    docker: "us-central1-docker.pkg.dev/project-1/repository-1/example"
    memory: "16 GB"
    cpu: "8"
    disks: "local-disk 500 HDD"
    gpuCount: 1
    bootDiskSizeGb: 100
  }

  command <<<
    example [...] > result
    tar -cvf result.tar result
  >>>
  output {
    File example_output = "result.tar"
  }
}

workflow example_workflow {
  String sequence_input

  call example {
    input:
      sequence_input = sequence_input
  }
  output {
    File example_output_workflow = example.example_output
  }
}

4. Running the WDL with Cromwell

The next step is to execute the workflow. We'll use Cromwell, a WDL execution engine:

java -jar cromwell.jar run example_workflow.wdl -i inputs.json

You'll need to create an inputs.json file with the necessary inputs. For the example workflow we defined earlier, the inputs file might look like this:

{
  "example_workflow.sequence_input": "your_sequence_here"
}
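If you run the workflow repeatedly with different inputs, generating the inputs file from the shell and validating it before launch can catch mistakes early. The sketch below is one way to do that. The function names are hypothetical, and validate_workflow assumes a womtool.jar (the validation tool published alongside Cromwell releases) in the current directory:

```shell
# Hypothetical helper: write an inputs file for example_workflow.
# $1: value for sequence_input; $2: output path (defaults to inputs.json).
# Note: $1 is inserted verbatim, so it must not contain quotes or backslashes.
write_inputs() {
  cat > "${2:-inputs.json}" <<EOF
{
  "example_workflow.sequence_input": "$1"
}
EOF
}

# Check the WDL and the inputs file with womtool before running anything.
# Assumes womtool.jar is in the current directory.
validate_workflow() {
  java -jar womtool.jar validate "$1" --inputs "$2"
}
```

For example, `write_inputs your_sequence_here` followed by `validate_workflow example_workflow.wdl inputs.json` reproduces the inputs file shown above and checks it against the workflow.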

Cromwell will provide log output as the workflow runs, allowing you to monitor its progress.
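One caveat worth checking against the Cromwell runtime-attributes documentation linked at the end of this post: attributes such as gpuCount are acted on by cloud backends like Google Cloud, not by Cromwell's default local backend, so a GPU run typically needs a backend configuration file. The wrapper below is a sketch of passing such a file with the standard -Dconfig.file Java property; the function name run_workflow and the file name google.conf are assumptions, and the contents of that file are outside the scope of this example:

```shell
# Hypothetical wrapper: run a workflow with an optional Cromwell config file.
# $1: WDL file, $2: inputs JSON, $3: optional backend config file.
# Assumes cromwell.jar is in the current directory.
run_workflow() {
  if [ -n "${3:-}" ]; then
    java -Dconfig.file="$3" -jar cromwell.jar run "$1" -i "$2"
  else
    java -jar cromwell.jar run "$1" -i "$2"
  fi
}
```

For example, `run_workflow example_workflow.wdl inputs.json google.conf` runs the workflow with the backend described in google.conf, while omitting the third argument falls back to the plain command shown earlier.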

With these steps, you can create, push, and use GPU-enabled Docker containers in your bioinformatics workflows using WDL.

WDL specifications: https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md
Setting WDL runtime attributes for Cromwell: https://cromwell.readthedocs.io/en/stable/RuntimeAttributes/
