Commit 97d3d5da authored by nservant

update doc
The nf-core/hic pipeline comes with documentation about the pipeline, found in
the `docs/` directory:
1. [Installation](https://nf-co.re/usage/installation)
2. Pipeline configuration
    * [Local installation](https://nf-co.re/usage/local_installation)
    * [Adding your own system config](https://nf-co.re/usage/adding_own_config)
    * [Reference genomes](https://nf-co.re/usage/reference_genomes)
3. [Running the pipeline](docs/usage.md)
4. [Output and how to interpret the results](docs/output.md)
5. [Troubleshooting](https://nf-co.re/usage/troubleshooting)
# nf-core/hic: Configuration for other clusters
It is entirely possible to run this pipeline on other clusters, though you will
need to set up your own config file so that the pipeline knows how to work with
your cluster.
> If you think that there are other people using the pipeline who would benefit
> from your configuration (eg. other common cluster setups), please let us know.
> We can add a new configuration and profile which can be used by specifying
> `-profile <name>` when running the pipeline. The config file will then be
> hosted at `nf-core/configs` and will be pulled automatically before the
> pipeline is executed.
If you are the only person to be running this pipeline, you can create your
config file as `~/.nextflow/config` and it will be applied every time you run
Nextflow. Alternatively, save the file anywhere and reference it when running
the pipeline with `-c path/to/config` (see the
[Nextflow documentation](https://www.nextflow.io/docs/latest/config.html)
for more).
A basic configuration comes with the pipeline, which loads the
[`conf/base.config`](../../conf/base.config) by default. This means that you
only need to configure the specifics for your system and overwrite any defaults
that you want to change.
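For instance, a small custom config overriding a couple of those defaults might look like the sketch below. The resource values and the `withName` selector target are illustrative assumptions, not the pipeline's shipped settings; check the pipeline's `main.nf` and `conf/base.config` for the real process names and defaults.

```nextflow
// Illustrative overrides only - values and the process name are assumptions
process {
  cpus = 2
  memory = 8.GB
  time = 12.h

  // Raise the cap for one hypothetical heavy step
  withName: 'bowtie2_end_to_end' {
    cpus = 8
    memory = 32.GB
  }
}
```

Save this anywhere and pass it with `-c path/to/config`; settings given here take precedence over those in `conf/base.config`.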
## Cluster Environment
By default, the pipeline uses the `local` Nextflow executor - in other words, all
jobs are run in the login session. If you're using a simple server, this may be
fine. If you're using a compute cluster, this is bad as all jobs will run on
the head node.
To specify your cluster environment, add the following line to your config
file:
```nextflow
process.executor = 'YOUR_SYSTEM_TYPE'
```
Many different cluster types are supported by Nextflow. For more information,
please see the
[Nextflow documentation](https://www.nextflow.io/docs/latest/executor.html).
Note that you may need to specify cluster options, such as a project or queue.
To do so, use the `clusterOptions` config option:
```nextflow
process {
  executor = 'slurm'
  clusterOptions = '-A myproject'
}
```
## Software Requirements
To run the pipeline, several software packages are required. How you satisfy
these requirements is essentially up to you and depends on your system.
If possible, we _highly_ recommend using either Docker or Singularity.
Please see the [`installation documentation`](../installation.md) for how to
run using the below as a one-off. These instructions are about configuring a
config file for repeated use.
### Docker
Docker is a great way to run nf-core/hic, as it manages all software
installations and allows the pipeline to be run in an identical software
environment across a range of systems.
Nextflow has
[excellent integration](https://www.nextflow.io/docs/latest/docker.html)
with Docker, and beyond installing the two tools, not much else is required -
Nextflow will automatically fetch the
[nfcore/hic](https://hub.docker.com/r/nfcore/hic/) image that we have created
and hosted on dockerhub at run time.
To add docker support to your own config file, add the following:
```nextflow
docker.enabled = true
process.container = "nfcore/hic"
```
Note that the dockerhub organisation name annoyingly can't have a hyphen,
so is `nfcore` and not `nf-core`.
### Singularity image
Many HPC environments are not able to run Docker due to security issues.
[Singularity](http://singularity.lbl.gov/) is a tool designed to run on such
HPC systems which is very similar to Docker.
To specify singularity usage in your pipeline config file, add the following:
```nextflow
singularity.enabled = true
process.container = "shub://nf-core/hic"
```
If you intend to run the pipeline offline, Nextflow will not be able to
automatically download the Singularity image for you.
Instead, you'll have to do this yourself manually first, transfer the image
file and then point to that.
First, pull the image file where you have an internet connection:
```bash
singularity pull --name nf-core-hic.simg shub://nf-core/hic
```
Then transfer this file and point the config file to the image:
```nextflow
singularity.enabled = true
process.container = "/path/to/nf-core-hic.simg"
```
### Conda
If you're not able to use Docker or Singularity, you can instead use conda to
manage the software requirements.
To use conda in your own config file, add the following:
```nextflow
process.conda = "$baseDir/environment.yml"
```
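If you move between systems, the engine settings above can also be kept side by side as named profiles in a single personal config and selected with `-profile <name>`. This is a sketch only - the pipeline already ships its own docker/singularity/conda profiles, so a personal version like this is only needed for custom setups:

```nextflow
// Sketch: one personal profile per software engine
profiles {
  docker {
    docker.enabled = true
    process.container = 'nfcore/hic'
  }
  singularity {
    singularity.enabled = true
    process.container = 'shub://nf-core/hic'
  }
  conda {
    process.conda = "$baseDir/environment.yml"
  }
}
```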
# nf-core/hic: Local Configuration
If running the pipeline in a local environment, we highly recommend using
either Docker or Singularity.
## Docker
Docker is a great way to run `nf-core/hic`, as it manages all software
installations and allows the pipeline to be run in an identical software
environment across a range of systems.
Nextflow has
[excellent integration](https://www.nextflow.io/docs/latest/docker.html) with
Docker, and beyond installing the two tools, not much else is required.
The `nf-core/hic` pipeline comes with a configuration profile for docker, making
it very easy to use. This also comes with the required presets to use the AWS
iGenomes resource, meaning that if using common reference genomes you just
specify the reference ID and it will be automatically downloaded from AWS S3.
First, install docker on your system:
[Docker Installation Instructions](https://docs.docker.com/engine/installation/)
Then, simply run the analysis pipeline:
```bash
nextflow run nf-core/hic -profile docker --genome '<genome ID>'
```
Nextflow will recognise `nf-core/hic` and download the pipeline from GitHub.
The `-profile docker` configuration uses the
[nfcore/hic](https://hub.docker.com/r/nfcore/hic/) image that we have created
and hosted on dockerhub; this image is downloaded at run time.
For more information about how to work with reference genomes, see
[`docs/configuration/reference_genomes.md`](reference_genomes.md).
### Pipeline versions
The public docker images are tagged with the same version numbers as the code,
which you can use to ensure reproducibility. When running the pipeline,
specify the pipeline version with `-r`, for example `-r 1.0`. This uses
pipeline code and docker image from this tagged version.
## Singularity image
Many HPC environments are not able to run Docker due to security issues.
[Singularity](http://singularity.lbl.gov/) is a tool designed to run on such
HPC systems which is very similar to Docker. Even better, it can create
images directly from dockerhub.
To use the singularity image for a single run, use `-with-singularity`.
This will download the docker container from dockerhub and create a singularity
image for you dynamically.
If you intend to run the pipeline offline, Nextflow will not be able to
automatically download the Singularity image for you. Instead, you'll have
to do this yourself manually first, transfer the image file and then point to
that.
First, pull the image file where you have an internet connection:
> NB: The "tag" at the end of this command corresponds to the pipeline version.
> Here, we're pulling the docker image for version 1.0 of the nf-core/hic
> pipeline.
> Make sure that this tag corresponds to the version of the pipeline that
> you're using.
```bash
singularity pull --name nf-core-hic-1.0.img docker://nfcore/hic:1.0
```
Then transfer this file and run the pipeline with this path:
```bash
nextflow run /path/to/nf-core-hic -with-singularity /path/to/nf-core-hic-1.0.img
```
# nf-core/hic: Reference Genomes Configuration
The nf-core/hic pipeline needs a reference genome for alignment and annotation.
These paths can be supplied on the command line at run time (see the
[usage docs](../usage.md)),
but for convenience it's often better to save these paths in a nextflow config
file.
See below for instructions on how to do this.
Read [Adding your own system](adding_your_own.md) to find out how to set up
custom config files.
## Adding paths to a config file
Specifying long paths every time you run the pipeline is a pain.
To make this easier, the pipeline comes configured to understand reference
genome keywords which correspond to preconfigured paths, meaning that you can
just specify `--genome ID` when running the pipeline.
Note that this genome key can also be specified in a config file if you always
use the same genome.
To use this system, add paths to your config file using the following template:
```nextflow
params {
  genomes {
    'YOUR-ID' {
      fasta = '<PATH TO FASTA FILE>/genome.fa'
    }
    'OTHER-GENOME' {
      // [..]
    }
  }
  // Optional - default genome. Ignored if --genome 'OTHER-GENOME' specified
  // on command line
  genome = 'YOUR-ID'
}
```
You can add as many genomes as you like as long as they have unique IDs.
## Illumina iGenomes
To make the use of reference genomes easier, Illumina has developed a
centralised resource called
[iGenomes](https://support.illumina.com/sequencing/sequencing_software/igenome.html).
Multiple reference index types are held together with consistent structure for
multiple genomes.
We have put a copy of iGenomes up onto AWS S3 hosting and this pipeline is
configured to use this by default.
The hosting fees for AWS iGenomes are currently kindly funded by a grant from
Amazon.
The pipeline will automatically download the required reference files when you
run the pipeline.
For more information about the AWS iGenomes, see
[AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/).
Downloading the files takes time and bandwidth, so we recommend making a local
copy of the iGenomes resource.
Once downloaded, you can customise the variable `params.igenomes_base` in your
custom configuration file to point to the reference location.
For example:
```nextflow
params.igenomes_base = '/path/to/data/igenomes/'
```
# nf-core/hic: Installation
To start using the nf-core/hic pipeline, follow the steps below:
1. [Install Nextflow](#1-install-nextflow)
2. [Install the pipeline](#2-install-the-pipeline)
* [Automatic](#21-automatic)
* [Offline](#22-offline)
* [Development](#23-development)
3. [Pipeline configuration](#3-pipeline-configuration)
    * [Software deps: Docker](#31-software-deps-docker)
    * [Software deps: Singularity](#32-software-deps-singularity)
    * [Software deps: conda](#33-software-deps-conda)
    * [Configuration profiles](#34-configuration-profiles)
4. [Reference genomes](#4-reference-genomes)
## 1) Install Nextflow
Nextflow runs on most POSIX systems (Linux, Mac OSX etc). It can be installed
by running the following commands:
```bash
# Make sure that Java v8+ is installed:
java -version
# Install Nextflow
curl -fsSL get.nextflow.io | bash
# Add Nextflow binary to your PATH:
mv nextflow ~/bin/
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin
```
See [nextflow.io](https://www.nextflow.io/) for further instructions on how to
install and configure Nextflow.
## 2) Install the pipeline
### 2.1) Automatic
This pipeline itself needs no installation - Nextflow will automatically fetch
it from GitHub if `nf-core/hic` is specified as the pipeline name.
### 2.2) Offline
The above method requires an internet connection so that Nextflow can download
the pipeline files. If you're running on a system that has no internet
connection, you'll need to download and transfer the pipeline files manually:
```bash
wget https://github.com/nf-core/hic/archive/master.zip
mkdir -p ~/my-pipelines/nf-core/
unzip master.zip -d ~/my-pipelines/nf-core/
cd ~/my_data/
nextflow run ~/my-pipelines/nf-core/hic-master
```
To stop Nextflow from looking for updates online, you can tell it to run in
offline mode by specifying the following environment variable in your
`~/.bashrc` file:
```bash
export NXF_OFFLINE='TRUE'
```
### 2.3) Development
If you would like to make changes to the pipeline, it's best to make a fork on
GitHub and then clone the files. Once cloned you can run the pipeline directly
as above.
## 3) Pipeline configuration
By default, the pipeline loads a basic server configuration,
[`conf/base.config`](../conf/base.config).
This uses a number of sensible defaults for process requirements and is
suitable for running on a simple (if powerful!) local server.
Be warned of two important points about this default configuration:
1. The default profile uses the `local` executor
* All jobs are run in the login session. If you're using a simple server,
this may be fine. If you're using a compute cluster, this is bad as all jobs
will run on the head node.
* See the
[nextflow docs](https://www.nextflow.io/docs/latest/executor.html) for
information about running with other hardware backends. Most job scheduler
systems are natively supported.
2. Nextflow will expect all software to be installed and available on the
`PATH`
* It's expected to use an additional config profile for docker, singularity
or conda support. See below.
### 3.1) Software deps: Docker
First, install docker on your system:
[Docker Installation Instructions](https://docs.docker.com/engine/installation/)
Then, running the pipeline with the option `-profile docker` tells Nextflow to
enable Docker for this run. An image containing all of the software
requirements will be automatically fetched and used from
[dockerhub](https://hub.docker.com/r/nfcore/hic).
### 3.2) Software deps: Singularity
If you're not able to use Docker then
[Singularity](http://singularity.lbl.gov/) is a great alternative.
The process is very similar: running the pipeline with the option
`-profile singularity` tells Nextflow to enable singularity for this run.
An image containing all of the software requirements will be automatically
fetched and used from singularity hub.
If running offline with Singularity, you'll need to download and transfer the
Singularity image first:
```bash
singularity pull --name nf-core-hic.simg shub://nf-core/hic
```
Once transferred, use `-with-singularity` and specify the path to the image
file:
```bash
nextflow run /path/to/nf-core-hic -with-singularity nf-core-hic.simg
```
Remember to pull updated versions of the singularity image if you update the
pipeline.
### 3.3) Software deps: conda
If you're not able to use Docker _or_ Singularity, you can instead use conda to
manage the software requirements.
This is slower and less reproducible than the above, but is still better than
having to install all requirements yourself!
The pipeline ships with a conda environment file and nextflow has built-in
support for this.
To use it, first ensure that you have conda installed (we recommend
[miniconda](https://conda.io/miniconda.html)), then follow the same pattern
as above and use the flag `-profile conda`.
### 3.4) Configuration profiles
See [`docs/configuration/adding_your_own.md`](configuration/adding_your_own.md)
## 4) Reference genomes
See [`docs/configuration/reference_genomes.md`](configuration/reference_genomes.md)
# nf-core/hic: Troubleshooting
## Input files not found
If no files, only one input file, or only read one and not read two are
picked up, then something is wrong with your input file declaration:
1. The path must be enclosed in quotes (`'` or `"`)
2. The path must have at least one `*` wildcard character. This applies even
if you are only running one paired-end sample.
3. When using the pipeline with paired-end data, the path must use `{1,2}` or
`{R1,R2}` notation to specify read pairs.
4. If you are running single-end data, make sure to specify `--singleEnd`
If the pipeline can't find your files then you will get the following error:
```bash
ERROR ~ Cannot find any reads matching: *{1,2}.fastq.gz
```
Note that if your sample name is "messy" then you have to be very particular
with your glob specification. A file name like
`L1-1-D-2h_S1_L002_R1_001.fastq.gz` can be difficult enough for a human to
read. Specifying `*{1,2}*.gz` won't work whilst `*{R1,R2}*.gz` will.
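You can sanity-check a glob against your own file names in the shell before running the pipeline. This is a throwaway demo using the example file name above, in a temporary directory; nothing here comes from the pipeline itself:

```bash
# Throwaway demo: check which files a glob actually matches
demo=$(mktemp -d) && cd "$demo"
touch L1-1-D-2h_S1_L002_R1_001.fastq.gz L1-1-D-2h_S1_L002_R2_001.fastq.gz

# '{R1,R2}' anchors the pattern on the read token, so each alternative
# matches exactly one file of the pair:
ls *R1*.gz   # the read-one file only
ls *R2*.gz   # the read-two file only

# '*{1,2}*.gz' also matches digits inside the sample name itself
# (the '1' in 'L1', the '2' in 'L002'), which is why Nextflow cannot
# use it to pair these reads unambiguously.
```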
## Data organization
The pipeline can't take a list of multiple input files - it takes a glob
expression. If your input files are scattered in different paths then we
recommend that you generate a directory with symlinked files. If running
in paired end mode please make sure that your files are sensibly named so
that they can be properly paired. See the previous point.
## Extra resources and getting help
If you still have an issue with running the pipeline then feel free to
contact us.
Have a look at the [pipeline website](https://github.com/nf-core/hic) to
find out how.
If you have problems that are related to Nextflow and not our pipeline then
check out the [Nextflow gitter channel](https://gitter.im/nextflow-io/nextflow)
or the [google group](https://groups.google.com/forum/#!forum/nextflow).