Add benchmark README + figures

Mia Croiset authored 1 month ago

d7e9372f

Introduction

The meta Hi-C pipeline groups several Hi-C pipelines for the analysis of chromosome conformation capture data.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

The pipeline is based on the nf-core/hic pipeline and the hicstuff pipeline. It is split into two workflows for now, named hicpro and hicstuff.

Workflow summary (hicpro)

Read QC (FastQC)
Hi-C data processing
1. HiC-Pro
  1. Mapping using a two-step strategy to rescue reads spanning the ligation sites (bowtie2)
  2. Detection of valid interaction products
  3. Duplicate removal
  4. Generate raw and normalized contact maps (iced)
Create genome-wide contact maps at various resolutions (cooler)
Contact map normalization using a balancing algorithm (cooler)
Export to various contact map formats (HiC-Pro, cooler)
Quality controls (HiC-Pro, HiCExplorer)
Compartments calling (cooltools)
TADs calling (HiCExplorer, cooltools)
Quality control report (MultiQC)

Workflow summary (hicstuff)

Hi-C data preparation
Processing
1. Mapping
2. Merge and filter
3. Fragment attribution
4. Matrix generation

Usage

Prepare the environment

To run the pipeline on the PSMN, first set up your PSMN environment if you have not already done so.

Then clone this repository into your scratch/Bio directory, or locally on your computer:

git clone git@gitbio.ens-lyon.fr:LBMC/hub/hic.git

Then cd into the cloned directory.

Get started

Prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
HIC_ES_4,SRR5339783_1.fastq.gz,SRR5339783_2.fastq.gz
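For more than a few samples, writing this file by hand is error-prone. The following is a minimal sketch that builds a samplesheet with the header shown above from a directory of FASTQ files. It is not part of the pipeline: the function name make_samplesheet is hypothetical, and it assumes that mates are named <prefix>_1.fastq.gz and <prefix>_2.fastq.gz and that <prefix> can be reused as the sample name. Edit the first column afterwards if you want names such as HIC_ES_4.

```shell
# Hypothetical helper, not shipped with the pipeline.
# Assumption: mates are named <prefix>_1.fastq.gz and <prefix>_2.fastq.gz,
# and <prefix> is reused as the sample name.
make_samplesheet() {
    fastq_dir=$1
    echo "sample,fastq_1,fastq_2"
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        # Skip the unexpanded glob pattern when no file matches.
        [ -e "$r1" ] || continue
        # Derive the expected mate by swapping the _1 suffix for _2.
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no mate found for $r1, skipping" >&2
            continue
        fi
        sample=$(basename "$r1" _1.fastq.gz)
        echo "$sample,$r1,$r2"
    done
}
```

For example, make_samplesheet path/to/fastq > samplesheet.csv writes one row per complete pair and reports unpaired files on standard error.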

Each row represents one pair of FASTQ files (paired-end reads). You can now run the pipeline using:

nextflow run main.nf \
   -profile psmn \
   --workflow <hicpro/hicstuff> \
   --input samplesheet.csv \
   --fasta <path/to/genome.fasta> \
   --outdir <OUTDIR> \
   --digestion <dpnii/hindiii/arima/mboi>

If you're not running the pipeline on the PSMN, make sure you have Docker installed and use -profile docker instead.
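A mistyped workflow or enzyme name is easiest to catch before Nextflow starts. The sketch below wraps the command above in a small shell function that rejects values not listed in this README. The function name run_hic and the DRY_RUN variable are hypothetical; the flags are the ones shown above. It assumes main.nf is in the current directory, and the pipeline itself may accept values not listed here.

```shell
# Hypothetical wrapper, not shipped with the pipeline.
# Usage: run_hic <profile> <workflow> <samplesheet> <fasta> <outdir> <digestion>
run_hic() {
    profile=$1
    workflow=$2
    samplesheet=$3
    fasta=$4
    outdir=$5
    digestion=$6

    # Only the two workflows named in this README are accepted.
    case "$workflow" in
        hicpro|hicstuff) ;;
        *) echo "error: --workflow must be hicpro or hicstuff" >&2; return 1 ;;
    esac

    # Only the digestion values listed in this README are accepted.
    case "$digestion" in
        dpnii|hindiii|arima|mboi) ;;
        *) echo "error: --digestion must be dpnii, hindiii, arima or mboi" >&2; return 1 ;;
    esac

    # With DRY_RUN set to a non-empty value, the command is printed, not run.
    ${DRY_RUN:+echo} nextflow run main.nf \
        -profile "$profile" \
        --workflow "$workflow" \
        --input "$samplesheet" \
        --fasta "$fasta" \
        --outdir "$outdir" \
        --digestion "$digestion"
}
```

For example, after setting DRY_RUN=1, calling run_hic docker hicstuff samplesheet.csv genome.fasta results dpnii prints the full command so it can be checked before a real run.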

For detailed options, please refer to the parameter documentation for hicpro or hicstuff.

Pipeline output

To see the results of a test run on a full-size dataset, refer to the results tab on the nf-core website pipeline page. These results come from the original nf-core/hic pipeline, which corresponds to the hicpro workflow. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/hic was originally written by Nicolas Servant. hicstuff was originally written by Romain Koszul's lab. This pipeline was modified by Mia Croiset.

Support

For further information or help, don't hesitate to get in touch on Element or by email.