diff --git a/docs/output.md b/docs/output.md
index 2953d3209a3c9b7ec639d74394c70345090cf508..20bf6b9ed02ad93c8d07343855a3219aad6eb63b 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -9,20 +9,20 @@ The directories listed below will be created in the results directory after the

 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

-- [From raw data to valid pairs](#from-raw-data-to-valid-pairs)
-  - [HiC-Pro](#hicpro)
-    - [Reads alignment](#reads-alignment)
-    - [Valid pairs detection](#valid-pairs-detection)
-    - [Duplicates removal](#duplicates-removal)
-    - [Contact maps](#hicpro-contact-maps)
-- [Hi-C contact maps](#hic-contact-maps)
-- [Downstream analysis](#downstream-analysis)
-  - [Distance decay](#distance-decay)
-  - [Compartments calling](#compartments-calling)
-  - [TADs calling](#tads-calling)
-- [MultiQC](#multiqc) - aggregate report and quality controls, describing
+* [From raw data to valid pairs](#from-raw-data-to-valid-pairs)
+    * [HiC-Pro](#hicpro)
+        * [Reads alignment](#reads-alignment)
+        * [Valid pairs detection](#valid-pairs-detection)
+        * [Duplicates removal](#duplicates-removal)
+        * [Contact maps](#hicpro-contact-maps)
+* [Hi-C contact maps](#hic-contact-maps)
+* [Downstream analysis](#downstream-analysis)
+    * [Distance decay](#distance-decay)
+    * [Compartments calling](#compartments-calling)
+    * [TADs calling](#tads-calling)
+* [MultiQC](#multiqc) - aggregate report and quality controls, describing
   results of the whole pipeline
-- [Export](#exprot) - additionnal export for compatibility with downstream
+* [Export](#exprot) - additionnal export for compatibility with downstream
   analysis tool and visualization

 ## From raw data to valid pairs
@@ -50,17 +50,17 @@ mapping step.

 **Output directory: `results/hicpro/mapping`**

-- `*bwt2pairs.bam` - final BAM file with aligned paired data
+* `*bwt2pairs.bam` - final BAM file with aligned paired data

 if `--save_aligned_intermediates` is specified, additional mapping file results
 are available ;

-- `*.bam` - Aligned reads (R1 and R2) from end-to-end alignment
-- `*_unmap.fastq` - Unmapped reads after end-to-end alignment
-- `*_trimmed.fastq` - Trimmed reads after end-to-end alignment
-- `*_trimmed.bam` - Alignment of trimmed reads
-- `*bwt2merged.bam` - merged BAM file after the two-steps alignment
-- `*.mapstat` - mapping statistics per read mate
+* `*.bam` - Aligned reads (R1 and R2) from end-to-end alignment
+* `*_unmap.fastq` - Unmapped reads after end-to-end alignment
+* `*_trimmed.fastq` - Trimmed reads after end-to-end alignment
+* `*_trimmed.bam` - Alignment of trimmed reads
+* `*bwt2merged.bam` - merged BAM file after the two-steps alignment
+* `*.mapstat` - mapping statistics per read mate

 Usually, a high fraction of reads is expected to be aligned on the genome
 (80-90%). Among them, we usually observed a few percent (around 10%) of step 2
@@ -79,14 +79,14 @@ reference genome and the digestion protocol.

 Invalid pairs are classified as follow:

-- Dangling end, i.e. unligated fragments (both reads mapped on the same
+* Dangling end, i.e. unligated fragments (both reads mapped on the same
   restriction fragment)
-- Self circles, i.e. fragments ligated on themselves (both reads mapped on the
+* Self circles, i.e. fragments ligated on themselves (both reads mapped on the
   same restriction fragment in inverted orientation)
-- Religation, i.e. ligation of juxtaposed fragments
-- Filtered pairs, i.e. any pairs that do not match the filtering criteria on
+* Religation, i.e. ligation of juxtaposed fragments
+* Filtered pairs, i.e. any pairs that do not match the filtering criteria on
   inserts size, restriction fragments size
-- Dumped pairs, i.e. any pairs for which we were not able to reconstruct the
+* Dumped pairs, i.e. any pairs for which we were not able to reconstruct the
   ligation product.

 Only valid pairs involving two different restriction fragments are used to
@@ -102,12 +102,12 @@ can thus be discarded using the `--min_cis_dist` parameter.

 **Output directory: `results/hicpro/valid_pairs`**

-- `*.validPairs` - List of valid ligation products
-- `*.DEpairs` - List of dangling-end products
-- `*.SCPairs` - List of self-circle products
-- `*.REPairs` - List of religation products
-- `*.FiltPairs` - List of filtered pairs
-- `*RSstat` - Statitics of number of read pairs falling in each category
+* `*.validPairs` - List of valid ligation products
+* `*.DEpairs` - List of dangling-end products
+* `*.SCPairs` - List of self-circle products
+* `*.REPairs` - List of religation products
+* `*.FiltPairs` - List of filtered pairs
+* `*RSstat` - Statitics of number of read pairs falling in each category

 Of note, these results are saved only if `--save_pairs_intermediates` is used.
 The `validPairs` are stored using a simple tab-delimited text format ;
@@ -138,7 +138,7 @@ removed (see `--keep_dups` to disable duplicates filtering).

 **Output directory: `results/hicpro/valid_pairs`**

-- `*allValidPairs` - combined valid pairs from all read chunks
+* `*allValidPairs` - combined valid pairs from all read chunks

 Additional quality controls such as fragment size distribution can be extracted
 from the list of valid interaction products.
@@ -160,7 +160,7 @@ detection of valid pairs.

 **Output directory: `results/hicpro/valid_pairs/pairix`**

-- `*pairix` - compressed and indexed pairs file
+* `*pairix` - compressed and indexed pairs file

 #### Statistics

@@ -169,10 +169,10 @@ All results are available in `results/hicpro/stats`.

 **Output directory: `results/hicpro/stats`**

-- *mapstat - mapping statistics per read mate
-- *pairstat - R1/R2 pairing statistics
-- *RSstat - Statitics of number of read pairs falling in each category
-- *mergestat - statistics about duplicates removal and valid pairs information
+* \*mapstat - mapping statistics per read mate
+* \*pairstat - R1/R2 pairing statistics
+* \*RSstat - Statitics of number of read pairs falling in each category
+* \*mergestat - statistics about duplicates removal and valid pairs information

 #### Contact maps

@@ -192,15 +192,15 @@ is specified on the command line.

 **Output directory: `results/hicpro/matrix`**

-- `*.matrix` - genome-wide contact maps
-- `*_iced.matrix` - genome-wide iced contact maps
+* `*.matrix` - genome-wide contact maps
+* `*_iced.matrix` - genome-wide iced contact maps

 The contact maps are generated for all specified resolutions
 (see `--bin_size` argument).
 A contact map is defined by :

-- A list of genomic intervals related to the specified resolution (BED format).
-- A matrix, stored as standard triplet sparse format (i.e. list format).
+* A list of genomic intervals related to the specified resolution (BED format).
+* A matrix, stored as standard triplet sparse format (i.e. list format).

 Based on the observation that a contact map is symmetric and usually sparse,
 only non-zero values are stored for half of the matrix. The user can specified
@@ -254,8 +254,8 @@ Here, we use the implementation available in the [`cooltools`](https://cooltools

 Results are available in **`results/compartments/`** folder and includes :

-- `*cis.vecs.tsv`: eigenvectors decomposition along the genome
-- `*cis.lam.txt`: eigenvalues associated with the eigenvectors
+* `*cis.vecs.tsv`: eigenvectors decomposition along the genome
+* `*cis.lam.txt`: eigenvalues associated with the eigenvectors

 ### TADs calling

@@ -266,8 +266,8 @@ TADs calling remains a challenging task, and even if many methods have been prop

 Currently, the pipeline proposes two approaches :

-- Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in **`results/tads/insulation`**.
-- [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at **`results/tads/hicexplorer`**.
+* Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in **`results/tads/insulation`**.
+* [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at **`results/tads/hicexplorer`**.

 Usually, TADs results are presented as simple BED files, or bigWig files, with the position of boundaries along the genome.

@@ -276,10 +276,10 @@ Usually, TADs results are presented as simple BED files, or bigWig files, with t
 <details markdown="1">
 <summary>Output files</summary>

-- `multiqc/`
-  - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-  - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-  - `multiqc_plots/`: directory containing static images from the report in various formats.
+* `multiqc/`
+    * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
+    * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
+    * `multiqc_plots/`: directory containing static images from the report in various formats.

 </details>

@@ -292,10 +292,10 @@ Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQ
 <details markdown="1">
 <summary>Output files</summary>

-- `pipeline_info/`
-  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
-  - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline.
-  - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
+* `pipeline_info/`
+    * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
+    * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline.
+    * Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.

 </details>