# Usage
This document presents the usage and parameters of the **hicstuff workflow**. For the parameters of the **hicpro workflow**, see the [hicpro usage page](usage.md). Samplesheet input and core arguments are detailed there.
# Parameters
## Inputs
Inputs are the same as for the hicpro workflow.
### `--input`
Use this to specify the location of your input FastQ files. For example:
```bash
--input 'path/to/data/sample_*_{1,2}.fastq'
```
Please note the following requirements:
1. The path must be enclosed in quotes
2. The path must have at least one `*` wildcard character
3. When using the pipeline with paired-end data, the path must use `{1,2}`
notation to specify read pairs.
If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`
Note that the Hi-C data analysis workflow requires paired-end data.
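Since the workflow needs paired-end data, it can help to confirm that every `_1` file has its `_2` mate before launching a run. The following is a minimal sketch, not part of the pipeline: `check_pairs` is a hypothetical helper, and it assumes uncompressed files named as in the `sample_*_{1,2}.fastq` example above.

```bash
# Hypothetical helper: succeed only if every *_1.fastq file in a directory
# has a matching *_2.fastq mate.
check_pairs() {
    rc=0
    for r1 in "$1"/*_1.fastq; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_1.fastq}_2.fastq"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Toy layout, for demonstration only: sample_B has no second mate.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample_A_1.fastq" "$demo_dir/sample_A_2.fastq" "$demo_dir/sample_B_1.fastq"
check_pairs "$demo_dir" || echo "fix the input before launching the pipeline"
```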
## Reference genomes
The pipeline config files come bundled with paths to the Illumina iGenomes reference
index files. If running with docker or AWS, the configuration is set up to use the
[AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.
### `--genome` (using iGenomes)
There are many different species supported in the iGenomes references. To run
the pipeline, you must specify which to use with the `--genome` flag.
You can find the keys to specify the genomes in the
[iGenomes config file](https://github.com/nf-core/hic/blob/master/conf/igenomes.config).
### `--fasta`
If you prefer, you can specify the full path to your reference genome when you
run the pipeline:
```bash
--fasta '[path to Fasta reference]'
```
### `--bwt2_index`
The bowtie2 indexes are required to align the data with the HiC-Pro workflow. If
`--bwt2_index` is not specified, the pipeline will either use the iGenomes
bowtie2 indexes (see the `--genome` option) or build the indexes on-the-fly
(see the `--fasta` option).
```bash
--bwt2_index '[path to bowtie2 index]'
```
### `--chromosome_size`
The Hi-C pipeline also requires a two-column text file with the
chromosome name and the chromosome size (tab-separated).
If not specified, this file will be automatically created by the pipeline.
In the latter case, the `--fasta` reference genome has to be specified.
```bash
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
chr5 180915260
chr6 171115067
chr7 159138663
chr8 146364022
chr9 141213431
chr10 135534747
(...)
```
```bash
--chromosome_size '[path to chromosome size file]'
```
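To make the expected layout concrete, here is one way such a two-column file could be produced by hand from a FASTA file. This is only a sketch: `chrom_sizes` is a hypothetical helper, the toy sequences are made up, and the pipeline's own generation step may work differently.

```bash
# Hypothetical helper: print "name<TAB>length" for every sequence in a FASTA file.
chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)    # keep the text up to the first whitespace
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}

# Toy reference with two short sequences, for demonstration only.
size_dir=$(mktemp -d)
printf '>chrA demo\nACGTAC\nGT\n>chrB\nACG\n' > "$size_dir/toy.fa"
chrom_sizes "$size_dir/toy.fa" > "$size_dir/toy.chrom.sizes"
cat "$size_dir/toy.chrom.sizes"
```

The resulting file could then be passed with `--chromosome_size`.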
### `--restriction_fragments`
Finally, Hi-C experiments based on restriction enzyme digestion require a BED
file with coordinates of restriction fragments.
```bash
chr1 0 16007 HIC_chr1_1 0 +
chr1 16007 24571 HIC_chr1_2 0 +
chr1 24571 27981 HIC_chr1_3 0 +
chr1 27981 30429 HIC_chr1_4 0 +
chr1 30429 32153 HIC_chr1_5 0 +
chr1 32153 32774 HIC_chr1_6 0 +
chr1 32774 37752 HIC_chr1_7 0 +
chr1 37752 38369 HIC_chr1_8 0 +
chr1 38369 38791 HIC_chr1_9 0 +
chr1 38791 39255 HIC_chr1_10 0 +
(...)
```
If not specified, this file will be automatically created by the pipeline.
In this case, the `--fasta` reference genome will be used.
Note that the `--digestion` or `--restriction_site` parameter is mandatory to create this file.
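The BED layout above can be illustrated with a toy in-silico digestion. This sketch is not the pipeline's implementation: `digest_fasta` is a hypothetical helper, the caret notation for the cut position (`A^AGCTT`) is an assumption made for this example, only one site on the forward strand is handled, and the approach is far too slow for a real genome.

```bash
# Hypothetical helper: list restriction fragments of a FASTA file as BED lines.
# usage: digest_fasta <fasta> <site>, with "^" marking the cut position
digest_fasta() {
    awk -v pat="$2" '
        function emit(   start, rest, shift, pos, cut, n) {
            if (name == "") return
            start = 0; shift = 0; n = 0; rest = seq
            while ((pos = index(rest, site)) > 0) {
                cut = shift + pos - 1 + off        # 0-based cut coordinate
                if (cut > start) {
                    printf "%s\t%d\t%d\tHIC_%s_%d\t0\t+\n", name, start, cut, name, ++n
                    start = cut
                }
                rest = substr(rest, pos + 1)       # resume one base after this match
                shift += pos
            }
            if (length(seq) > start)
                printf "%s\t%d\t%d\tHIC_%s_%d\t0\t+\n", name, start, length(seq), name, ++n
        }
        BEGIN {
            off = index(pat, "^") - 1              # bases before the cut position
            if (off < 0) off = 0                   # no caret: cut at the site start
            site = pat
            sub(/\^/, "", site)
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }
        { seq = seq toupper($0) }
        END { emit() }
    ' "$1"
}

# Toy sequence containing two AAGCTT sites, for demonstration only.
frag_dir=$(mktemp -d)
printf '>chrT\nCCAAGCTTGG\nGGAAGCTTCC\n' > "$frag_dir/toy.fa"
digest_fasta "$frag_dir/toy.fa" 'A^AGCTT'
```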
## Digestion of the reads
### `--cutsite`
Generates new gzipped FastQ files from the original FastQ files. Reads are cut at their religation sites, and new read pairs are created from the fragments obtained after cutting at the digestion sites.
Default: false
```bash
--cutsite
```
> :warning: **Warning**: This option cannot be used at the same time as the iterative mapping.
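A wrapper script can catch this kind of conflict before a run is submitted. The sketch below is an illustration only: `check_exclusive` is a hypothetical helper that inspects an argument list, and it does not reflect how the pipeline itself validates parameters. The same check applies to `--filter_pcr` and `--filter_pcr_picard`, which are also mutually exclusive.

```bash
# Hypothetical pre-flight check: fail if two mutually exclusive flags are both
# present in an argument list.
# usage: check_exclusive <flag_a> <flag_b> <arguments...>
check_exclusive() {
    flag_a=$1
    flag_b=$2
    shift 2
    seen_a=0
    seen_b=0
    for arg in "$@"; do
        case "$arg" in
            "$flag_a") seen_a=1 ;;
            "$flag_b") seen_b=1 ;;
        esac
    done
    if [ "$seen_a" -eq 1 ] && [ "$seen_b" -eq 1 ]; then
        echo "error: $flag_a and $flag_b cannot be combined" >&2
        return 1
    fi
    return 0
}

# Example argument list that mixes digestion of the reads with iterative mapping.
check_exclusive --cutsite --iteralign --cutsite --cutsite_seed 20 --iteralign \
    || echo "choose either --cutsite or --iteralign"
```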
### `--cutsite_mode`
Digestion mode. Three values are possible: "all", "for_vs_rev", "pile".
Default: for_vs_rev
```bash
--cutsite_mode '[all | for_vs_rev | pile]'
```
### `--cutsite_seed`
Minimum size of a fragment, i.e. the seed size used in mapping (shorter reads will not be mapped).
Default: 20
```bash
--cutsite_seed '[Size of fragment]'
```
## Hicstuff specific parameters
The following options are defined in the `nextflow.config` file, and can be
updated either using a custom configuration file (see `-c` option) or using
command line parameters.
### Mapping
#### `--hicstuff_bwt2_align_opts`
Bowtie2 alignment option for normal mode end-to-end mapping.
Default: '--very-sensitive-local'
```bash
--hicstuff_bwt2_align_opts '[Options for bowtie2 mapping on full reads]'
```
#### `--iteralign`
Truncate reads from a fastq file to 20 basepairs and iteratively extend and re-align the unmapped reads to optimize the proportion of uniquely aligned reads in a 3C library.
Default: false
```bash
--iteralign
```
Use [`--min_mapq`](usage.md#min_mapq) to set the minimum quality for the iterative alignment.
#### `--hicstuff_read_len`
Read length in the FastQ file. If set to None, the length of the first read is used. If the file contains reads of different lengths, set this value to the longest read length.
Default: None
```bash
--hicstuff_read_len '[Wanted read length]'
```
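When reads have different lengths, the longest one can be looked up before setting `--hicstuff_read_len`. The following is a minimal sketch: `max_read_len` is a hypothetical helper, and it assumes standard four-line FastQ records (no wrapped sequence lines).

```bash
# Hypothetical helper: print the longest read length found in FastQ input.
# Sequence lines are every 4th line, starting at line 2.
# Reads from the given file, or from standard input when no file is given.
max_read_len() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }' "$@"
}

# Toy FastQ with reads of length 6 and 4, for demonstration only.
reads_dir=$(mktemp -d)
printf '@r1\nACGTAC\n+\nIIIIII\n@r2\nACGT\n+\nIIII\n' > "$reads_dir/toy.fastq"
max_read_len "$reads_dir/toy.fastq"
```

For gzipped files, the input can be decompressed on the fly, for example with `gzip -dc reads.fastq.gz | max_read_len`.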
#### `--save_bam`
If specified, save BAM files after mapping (works for both normal and iterative modes). Default: false
```bash
--save_bam
```
### Fragment enzyme
#### `--hicstuff_min_size`
Minimum size a contig must have to be kept. Default: 0
```bash
--hicstuff_min_size '[Minimum size value]'
```
#### `--hicstuff_circular`
Use if the genome is circular. Default: false
```bash
--hicstuff_circular
```
#### `--hicstuff_output_contigs`
Name of info contigs file. Default: 'info_contigs.txt'
```bash
--hicstuff_output_contigs '[Name of info contigs file]'
```
#### `--hicstuff_output_frags`
Name of fragments list file. Default: 'fragments_list.txt'
```bash
--hicstuff_output_frags '[Name of fragments file]'
```
#### `--hicstuff_frags_plot`
Whether fragments plot should be generated. Default: false
```bash
--hicstuff_frags_plot
```
#### `--hicstuff_frags_plot_path`
Name of fragments plot file. Default: 'frags_hist.pdf'
```bash
--hicstuff_frags_plot_path '[Name of fragments plot file]'
```
#### `--save_fragments`
If specified, save fragments file. Default: false
```bash
--save_fragments
```
Use [`--min_mapq`](usage.md#min_mapq) to set the minimum quality for the iterative alignment.
#### `--hicstuff_valid_pairs`
Name of valid pairs file. Default: 'valid.pairs'
```bash
--hicstuff_valid_pairs '[Name of valid pairs file]'
```
#### `--hicstuff_valid_idx`
Name of valid pairs index file. Default: 'valid_idx.pairs'
```bash
--hicstuff_valid_idx '[Name of valid pairs index file]'
```
See also [`--hicstuff_circular`](#hicstuff_circular).
#### `--save_pairs`
If specified, save pair files. Default: false
```bash
--save_pairs
```
### Matrix
#### `--hicstuff_matrix`
Common name of matrix files. Default: 'abs_fragments_contacts_weighted.txt'
```bash
--hicstuff_matrix '[Name of matrix file]'
```
> :warning: **Warning**: Hicstuff builds matrices based on the fragments file.
>> Matrices with a fixed bin size are built in the COOLER subworkflow.
>> See [bin_size](usage.md#bin_size) to change the bin size of fixed-bin matrices.
#### `--skip_plot_matrix`
Do not plot matrices. Default: false
```bash
--skip_plot_matrix
```
### Hicstuff options
#### `--filter_event`
Filter spurious or uninformative 3C events. Requires a restriction enzyme. Default: false
```bash
--filter_event
```
#### `--hicstuff_valid_idx_filtered`
Name of filtered valid pairs index file. Default: false
```bash
--hicstuff_valid_idx_filtered '[Name of filtered valid pairs index file]'
```
#### `--hicstuff_plot_events`
Whether plots should be generated at different steps of the pipeline. Default: false
```bash
--hicstuff_plot_events
```
#### `--hicstuff_dist_plot`
Prefix of distance plot file during filter events. Default: 'dist'
```bash
--hicstuff_dist_plot '[Prefix of distance plot file]'
```
#### `--hicstuff_pie_plot`
Prefix of distribution plot file during filter events. Default: 'distrib'
```bash
--hicstuff_pie_plot '[Prefix of pie plot file]'
```
#### `--distance_law`
If true, generates a distance law file giving the contact probability as a function of genomic distance, for each chromosome, or for each chromosome arm if a file with the centromere positions is provided. The values are neither normalized nor averaged. Default: false
```bash
--distance_law
```
#### `--hicstuff_centro_file`
Path to a file with the positions of the centromeres, separated by spaces and listed in the same order as the chromosomes. Default: 'None'
```bash
--hicstuff_centro_file '[Path of centromeres file]'
```
#### `--hicstuff_base`
Base used to construct the logspace of the bins for the distance law. Default: 1.1
```bash
--hicstuff_base '[Base number]'
```
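To get a feel for what the base controls, the sketch below prints a log-spaced progression of bin edges (1, base, base^2, ...) up to a maximum distance. It is an illustration only: `log_bins` is a hypothetical helper, and the exact way hicstuff derives its bins may differ.

```bash
# Hypothetical helper: print log-spaced bin edges 1, base, base^2, ... <= max.
# usage: log_bins <base> <max>
log_bins() {
    awk -v base="$1" -v max="$2" 'BEGIN {
        if (base <= 1) exit 1           # a base of 1 or less would never grow
        for (edge = 1; edge <= max; edge *= base) printf "%.3f\n", edge
    }'
}

# With the default base, bins grow by 10% at each step.
log_bins 1.1 2
```

A larger base gives fewer, wider bins; a base closer to 1 gives a finer resolution at the cost of noisier estimates per bin.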
#### `--hicstuff_distance_out_file`
Name of distance law table file. Default: 'distance_law.txt'
```bash
--hicstuff_distance_out_file '[Name of distance law file]'
```
See also [`--hicstuff_circular`](#hicstuff_circular).
#### `--hicstuff_rm_centro`
If the distance law is computed, this is the number of kb to remove around each centromere position given in the centromere file. Default: 'None'
```bash
--hicstuff_rm_centro '[Number of kb to remove]'
```
#### `--hicstuff_distance_plot`
Whether distance law table should be plotted. Default: false
```bash
--hicstuff_distance_plot
```
#### `--hicstuff_distance_out_plot`
Name of the distance law plot file, used when `--hicstuff_distance_plot` is set. Default: distance_law.txt
```bash
--hicstuff_distance_out_plot '[Name of distance law plot file]'
```
#### `--filter_pcr`
If true, PCR duplicates are filtered based on genomic positions. Pairs where both reads have exactly the same coordinates are considered duplicates, and only one of them is kept. Default: false
```bash
--filter_pcr
```
> :warning: **Warning**: if true, `--filter_pcr_picard` **must** be false
#### `--hicstuff_filter_pcr_out_file`
Name of pair file after PCR filtering. Default: 'valid_idx_pcrfree.pairs'
```bash
--hicstuff_filter_pcr_out_file '[Name of pcr free pair file]'
```
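The idea behind position-based duplicate filtering can be sketched on a small pairs file. This is an illustration, not the pipeline's code: `dedup_pairs` is a hypothetical helper, the column layout (read ID, chr1, pos1, chr2, pos2, strand1, strand2) is an assumption, and only the coordinates are compared, so strands are ignored here.

```bash
# Hypothetical helper: keep the first pair seen for each set of coordinates.
# Assumed columns: readID chr1 pos1 chr2 pos2 strand1 strand2
dedup_pairs() {
    awk '
        /^#/ { print; next }                          # pass header lines through
        { key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 }
        !(key in seen) { seen[key] = 1; print }
    ' "$1"
}

# Toy pairs file in which r1 and r2 share the same coordinates.
pairs_dir=$(mktemp -d)
printf '## toy header\nr1\tchr1\t100\tchr1\t500\t+\t-\nr2\tchr1\t100\tchr1\t500\t+\t-\nr3\tchr1\t100\tchr2\t900\t+\t+\n' \
    > "$pairs_dir/toy.pairs"
dedup_pairs "$pairs_dir/toy.pairs"
```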
#### `--filter_pcr_picard`
If specified, duplicate reads are filtered using the Picard MarkDuplicates method. If true, `--filter_pcr` **must** be false. Default: false
#### `--save_bam_intermediates`
If specified, save BAM files after PICARD pcr filtering. Default: false
```bash
--save_bam_intermediates
```