Introduction
If you need to download a tool to do a quick and dirty job, there’s more than one way to skin a cat. Below are examples of a few different methods of installation:
Using ready-made executables
These rare unicorns do not require any compilation: you just have to download the executable and place it in your $PATH.
See below for an example using the nextflow executable:
Navigate to the nextflow GitHub repository, and view the latest releases.
wget -nv https://github.com/nextflow-io/nextflow/releases/download/v21.12.1-edge/nextflow && chmod +x ./nextflow
./nextflow -v
mv ./nextflow /usr/bin/
You can now access the nextflow executable anywhere on your VM.
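If you do not have write access to /usr/bin, the same idea works with a per-user directory. The following is a minimal sketch rather than part of the original walkthrough: the helper name install_local, the BIN_DIR variable and the $HOME/bin default are all assumptions made for illustration.

```shell
# Hypothetical helper: copy an executable into a per-user bin directory.
# BIN_DIR defaults to $HOME/bin (an assumption; any directory on $PATH works).
install_local() {
    src="$1"
    bin_dir="${BIN_DIR:-$HOME/bin}"
    mkdir -p "$bin_dir" || return 1
    cp "$src" "$bin_dir/" || return 1
    # Owner may write; everyone may read and execute.
    chmod 755 "$bin_dir/$(basename "$src")" || return 1
    printf 'installed %s\n' "$bin_dir/$(basename "$src")"
}

# Usage:
# install_local ./nextflow
```

The executable is only found by name if the chosen directory is on your $PATH.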
Compiling from source code
Many bioinformatics tools are written in C or C++, so it is worthwhile knowing how to compile these tools manually.
We will install samtools from source:
Navigate to the samtools GitHub repository, and view the latest releases.
wget -nv https://github.com/samtools/samtools/releases/download/1.14/samtools-1.14.tar.bz2
tar -xf samtools-1.14.tar.bz2
cd samtools-1.14/
make
./samtools --version
mv samtools /usr/bin
Warning
You may run into errors here if you don’t have the necessary libraries installed.
Installation instructions are usually in the README file and are quite simple to follow:
Building samtools
=================
The typical simple case of building Samtools using the HTSlib bundled within
this Samtools release tarball is done as follows:
cd .../samtools-1.14 # Within the unpacked release directory
./configure
make
You may wish to copy the resulting samtools executable into somewhere on your
$PATH, or run it where it is.
Rather than running-in-place like that, the next simplest typical case is to
install samtools etc properly into a directory of your choosing. Building for
installation using the HTSlib bundled within this Samtools release tarball,
and building the various HTSlib utilities such as bgzip is done as follows:
cd .../samtools-1.14 # Within the unpacked release directory
./configure --prefix=/path/to/location
make all all-htslib
make install install-htslib
The main thing to look out for is the Makefile for compiling C code.
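Before running make, it can help to confirm that the basic build commands are present. The sketch below is an illustration only: missing_tools is a hypothetical helper and the tool list in the usage line is an assumption, so adjust it to whatever the README asks for. It checks commands only and says nothing about missing development libraries.

```shell
# Hypothetical helper: print each command from the argument list that is
# not available on $PATH. Prints nothing when everything is found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (the tool list is an assumption, adjust it to the README):
# missing_tools make gcc tar bzip2
```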
Via the Conda repository
Activate the base conda environment and install samtools into it:
conda activate
conda install bioconda::samtools
samtools --version
conda deactivate
Using Singularity
There are plenty of container images for samtools available.
The syntax for downloading from quay.io is as follows: docker://quay.io/biocontainers/<tool_name>:<tool_tag>. All of this information can be accessed at https://quay.io/repository/biocontainers/samtools.
singularity pull --name samtools.img docker://quay.io/biocontainers/samtools:1.13--h8c37831_0
singularity shell -B $(pwd) samtools.img
samtools --version
Using Docker
Similar to Singularity, but the image is stored in Docker’s local image cache rather than as a ‘physical’ file you can move.
docker pull staphb/samtools
docker run -it staphb/samtools
samtools --version
Building a project
It makes much more sense to utilise a container or a dedicated environment for your project - this will enhance the reproducibility of your project should you return to it at a later date.
Conda Environments
The main advantage of conda is that one can create a ‘clean slate’ environment for a project - a directory that contains a specific collection of conda packages you have installed that will not interfere with other environments or your system.
To create a new environment, run the following command:
$ conda create -n test_env
Activate/deactivate the environment using:
$ conda activate test_env
$ conda deactivate
Installing packages
There are two ways to install packages using conda. Install the latest available version:
$ conda activate test_env
$ conda install bioconda::fastqc
Or specify the package version:
$ conda activate test_env
$ conda install bioconda::fastqc=0.11.9
Warning
Be very careful using pinned versions of packages. In some scenarios a pinned package will require outdated dependencies, causing a conflict when conda solves the environment.
YAML Files
The preferred, reproducible method for installing conda packages is to use a YAML file.
See below for a YAML file to recapitulate the test_env we created above:
Note
Delete test_env - we will recreate it using YAML files as a proof of concept: conda env remove --name test_env
name: test_env
channels:
- bioconda
dependencies:
- fastqc
Save the file and name it environment.yml. Now create the environment using conda:
$ conda env create -f environment.yml && conda clean -a
$ conda activate test_env
$ fastqc -h
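The environment file can also be written straight from the shell, which is convenient on a remote VM. This is a minimal sketch using a here-document; the two-space indentation is a stylistic choice, and the contents mirror the test_env file above.

```shell
# Write the environment file with a here-document.
# The quoted 'EOF' stops the shell from expanding anything inside the block.
cat > environment.yml <<'EOF'
name: test_env
channels:
  - bioconda
dependencies:
  - fastqc
EOF

# Print the file to confirm its contents before handing it to conda.
cat environment.yml
```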
Executable directory
Where have the environments and packages been installed?
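The location depends on how conda is installed (conda env list prints the full path of each environment). In this setup the environments are stored under: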
$ ls -la ~/.conda/envs/
To take a look at the executables in the test_env environment:
$ ls -la ~/.conda/envs/test_env/bin/
Docker Containers
Note
You will need a Dockerhub account to store your images remotely.
Dockerfile
To create a Docker container, we need to construct a Dockerfile which contains instructions on which base image to use, and installation rules.
We will create a conda environment within our Docker container, so we will need a valid environment.yml file. For the sake of demonstration, we will use the example given above:
name: test_env
channels:
- bioconda
dependencies:
- fastqc
In the directory where your environment.yml file is located, create a Dockerfile:
FROM nfcore/base:1.14
LABEL authors="Barry Digby" \
description="Docker container containing fastqc"
WORKDIR ./
COPY environment.yml ./
RUN conda env create -f environment.yml && conda clean -a
ENV PATH=/opt/conda/envs/test_env/bin:$PATH
We are using a pre-built Ubuntu image developed by nf-core (FROM nfcore/base:1.14) that comes with Conda pre-installed.
Note
In your Dockerhub account, create a repository called ‘test’. We will build and push the docker image in the following section.
Build image
To build the image, run the following command:
$ docker build -t USERNAME/test $(pwd)
Check image
You can shell into your image to double check that the tools have been installed correctly:
$ docker images # check images in cache
$ docker run -it USERNAME/test
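The same check can be run without opening an interactive shell. The sketch below is illustrative: check_tool is a hypothetical helper, and it assumes the tool accepts a --version flag (fastqc does).

```shell
# Hypothetical helper: run "<tool> --version" inside an image and remove the
# container afterwards (--rm). Returns the exit status of docker run.
check_tool() {
    image="$1"
    tool="$2"
    docker run --rm "$image" "$tool" --version
}

# Usage (USERNAME/test is the image built in the previous step):
# check_tool USERNAME/test fastqc
```

A non-zero exit status suggests the tool is missing from the image or not on its $PATH.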
Push to Dockerhub
Now that the image has been created, push it to Dockerhub.
The first push requires you to log in:
$ docker login
$ sudo chmod 666 /var/run/docker.sock
$ docker push USERNAME/test
Advanced use
There will be scenarios in which your tool of choice is not in the Anaconda repository, meaning you cannot install it via the environment.yml file.
You will have to provide install instructions to the Dockerfile.
Note
This is fairly tedious: you have to perform a dry-run locally before providing the instructions to the Dockerfile.
Let’s pretend that Bowtie2 is not available via the Anaconda repository - go to the Github repository containing the latest release: https://github.com/BenLangmead/bowtie2
1. Download the latest release (2.4.X) of Bowtie2. Make sure to download the Source code (tar.gz) file.
2. Untar the archive file by running tar -xvzf v2.4.5.tar.gz.
3. Move to the unzipped directory and figure out if you need to compile the source code. (There is a Makefile present - we do need to compile the code.)
4. In the bowtie2-2.4.5/ directory, run the command make to compile the code.
5. Do you need to change permissions for the executables?
6. Move the executables to somewhere in your $PATH. This can be done in two ways:
   a. By moving the executables to a directory in your $PATH such as /usr/local/bin, /usr/bin etc., like so: sudo mv bowtie2-2.4.5/bowtie2* /usr/local/bin/
   b. By manually adding a directory to your $PATH: export PATH="/data/bowtie2-2.4.5/:$PATH"
Test the install by printing the documentation:
bowtie2 -h
You will need to perform each of the above tasks in your Dockerfile - which is done ‘blind’ hence the need for a dry-run.
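During a local dry-run, the export PATH step is easy to repeat by accident, which leaves duplicate entries in $PATH. The sketch below shows one way to make that step safe to re-run; path_prepend is a hypothetical helper, not something Bowtie2 or Docker provides.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed there.
path_prepend() {
    dir="${1%/}"    # drop one trailing slash so /a/b/ and /a/b compare equal
    case ":$PATH:" in
        *":$dir:"*) ;;                          # already present, do nothing
        *) PATH="$dir:$PATH"; export PATH ;;
    esac
}

# Usage, with the directory from the dry-run steps:
# path_prepend /data/bowtie2-2.4.5/
```

Inside a Dockerfile the equivalent step is a single ENV PATH=... instruction, so this helper is only useful for the local dry-run.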
Note
Whilst the nf-core image we are using contains a handful of tools, containers are usually a clean slate: you have to install basics such as unzip and curl yourself.
FROM nfcore/base:1.14
LABEL authors="Barry Digby" \
description="Docker container containing stuff"
# We need to install tar
RUN apt-get update && apt-get install --yes tar && apt-get clean
# Install our conda environment, if you want to.
WORKDIR ./
COPY environment.yml ./
RUN conda env create -f environment.yml && conda clean -a
ENV PATH=/opt/conda/envs/test_env/bin:$PATH
# Chain the commands together
RUN mkdir -p /usr/src/scratch && \
    cd /usr/src/scratch && \
    wget https://github.com/BenLangmead/bowtie2/archive/refs/tags/v2.4.5.tar.gz && \
    tar -xvzf v2.4.5.tar.gz && \
    cd bowtie2-2.4.5/ && \
    make
ENV PATH=/usr/src/scratch/bowtie2-2.4.5/:$PATH
Note
Use RUN commands sparingly! Chain your commands together where possible - each RUN command will create a new layer in the Docker image - causing unnecessary bloat.
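To see the effect of chaining, you can compare the history of an image before and after merging RUN commands. This is a rough sketch: count_history_entries is a hypothetical helper, and it counts every history entry, including metadata-only instructions such as ENV and LABEL, so treat the number as an upper bound on the layers that hold files.

```shell
# Hypothetical helper: count the entries in an image's build history.
# "docker history -q" prints one ID (or <missing>) per entry.
count_history_entries() {
    docker history -q "$1" | wc -l | tr -d ' '
}

# Usage:
# count_history_entries USERNAME/test
```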