hicstuff_usage.md 8.95 KiB
Mia Croiset committed
# Usage
This document presents the usage and parameters of the **hicstuff workflow**. For the parameters of the **hicpro workflow**, see [here](usage.md); samplesheet input and core arguments are detailed there.

# Parameters

## Inputs
Inputs are the same as for the hicpro workflow.

### `--input`

Use this to specify the location of your input FastQ files. For example:

```bash
--input 'path/to/data/sample_*_{1,2}.fastq'
```

Please note the following requirements:

1. The path must be enclosed in quotes
2. The path must have at least one `*` wildcard character
3. When using the pipeline with paired end data, the path must use `{1,2}`
   notation to specify read pairs.

If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`

Note that the Hi-C data analysis workflow requires paired-end data.

## Reference genomes

The pipeline config files come bundled with paths to the Illumina iGenomes reference
index files. If running with docker or AWS, the configuration is set up to use the
[AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.

### `--genome` (using iGenomes)

There are many different species supported in the iGenomes references. To run
the pipeline, you must specify which to use with the `--genome` flag.

You can find the keys to specify the genomes in the
[iGenomes config file](https://github.com/nf-core/hic/blob/master/conf/igenomes.config).

### `--fasta`

If you prefer, you can specify the full path to your reference genome when you
run the pipeline:

```bash
--fasta '[path to Fasta reference]'
```

### `--bwt2_index`

The bowtie2 indexes are required to align the data with the HiC-Pro workflow. If the
`--bwt2_index` is not specified, the pipeline will either use the iGenomes
bowtie2 indexes (see `--genome` option) or build the indexes on-the-fly
(see `--fasta` option).

```bash
--bwt2_index '[path to bowtie2 index]'
```

### `--chromosome_size`

The Hi-C pipeline also requires a two-column text file with the
chromosome name and the chromosome size (tab-separated).
If not specified, this file will be automatically created by the pipeline.
In the latter case, the `--fasta` reference genome has to be specified.

```bash
   chr1    249250621
   chr2    243199373
   chr3    198022430
   chr4    191154276
   chr5    180915260
   chr6    171115067
   chr7    159138663
   chr8    146364022
   chr9    141213431
   chr10   135534747
   (...)
```

```bash
--chromosome_size '[path to chromosome size file]'
```
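
When the pipeline creates this file for you, nothing else is needed. If you would rather prepare or check it by hand, the sketch below shows one way to derive the two columns from a FASTA file with `awk`. The file names and the toy sequences are hypothetical and only serve the illustration; this is not necessarily how the pipeline builds the file.

```bash
# Toy FASTA standing in for a real reference (hypothetical name and content)
workdir=$(mktemp -d)
cat > "$workdir/toy.fa" <<'EOF'
>chrA description text
ACGTACGTAC
GGATCC
>chrB
ACGT
EOF

# For each record, sum the lengths of its sequence lines
# and print "name<TAB>size"
awk '
    /^>/ {
        if (name != "") printf "%s\t%d\n", name, len
        name = substr($1, 2)   # header text up to the first whitespace
        len = 0
        next
    }
    { len += length($0) }
    END { if (name != "") printf "%s\t%d\n", name, len }
' "$workdir/toy.fa" > "$workdir/toy.chrom.sizes"

cat "$workdir/toy.chrom.sizes"
```

The resulting file can then be passed with `--chromosome_size`.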

### `--restriction_fragments`

Finally, Hi-C experiments based on restriction enzyme digestion require a BED
file with coordinates of restriction fragments.

```bash
   chr1   0       16007   HIC_chr1_1    0   +
   chr1   16007   24571   HIC_chr1_2    0   +
   chr1   24571   27981   HIC_chr1_3    0   +
   chr1   27981   30429   HIC_chr1_4    0   +
   chr1   30429   32153   HIC_chr1_5    0   +
   chr1   32153   32774   HIC_chr1_6    0   +
   chr1   32774   37752   HIC_chr1_7    0   +
   chr1   37752   38369   HIC_chr1_8    0   +
   chr1   38369   38791   HIC_chr1_9    0   +
   chr1   38791   39255   HIC_chr1_10   0   +
   (...)
```

If not specified, this file will be automatically created by the pipeline.
In this case, the `--fasta` reference genome will be used.
Note that the `--digestion` or `--restriction_site` parameter is mandatory to create this file.
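
To make the BED layout above more concrete, here is a minimal sketch of an in-silico digestion written with `awk`. It assumes a single restriction site (`GATC`) and assumes the cut falls immediately before the first base of the site; the toy FASTA, the file names and the `digest` helper are all hypothetical. The pipeline's own digestion step may differ, and this sketch is not meant for full-size genomes.

```bash
workdir=$(mktemp -d)
# Toy reference (hypothetical); sequence lines are joined per chromosome
cat > "$workdir/toy.fa" <<'EOF'
>chrA
AAGATCTTTT
GATCAA
EOF

awk -v site="GATC" '
    # Emit one BED line per fragment of the current chromosome
    function digest(    rest, i, pos, start, cut, n) {
        if (name == "") return
        rest = seq; pos = 0; start = 0; n = 0
        # index() gives the 1-based position of the next site, 0 if none
        while ((i = index(rest, site)) > 0) {
            cut = pos + i - 1              # 0-based coordinate just before the site
            if (cut > start) {
                n++
                printf "%s\t%d\t%d\tHIC_%s_%d\t0\t+\n", name, start, cut, name, n
                start = cut
            }
            pos += i                       # continue one base after the match start
            rest = substr(rest, i + 1)
        }
        if (length(seq) > start) {         # last fragment, up to the chromosome end
            n++
            printf "%s\t%d\t%d\tHIC_%s_%d\t0\t+\n", name, start, length(seq), name, n
        }
    }
    /^>/ { digest(); name = substr($1, 2); seq = ""; next }
    { seq = seq toupper($0) }
    END { digest() }
' "$workdir/toy.fa" > "$workdir/toy_fragments.bed"

cat "$workdir/toy_fragments.bed"
```

On the toy sequence this yields three fragments covering the 16 bases without gaps or overlaps, which is the property to look for in a real fragments file.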

## Hicstuff specific parameters

The following options are defined in the `nextflow.config` file, and can be
updated either using a custom configuration file (see `-c` option) or using
command line parameters.
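
As an illustration of the custom configuration route, the sketch below writes a small config file that sets a few of the parameters documented on this page. The file name `hicstuff_custom.config` is hypothetical, and the values are simply the documented defaults, except `save_pairs`, which is switched on.

```bash
# Write a hypothetical custom configuration file in the current directory
cat > hicstuff_custom.config <<'EOF'
// Override a few hicstuff workflow parameters
params {
    hicstuff_min_qual = 30
    hicstuff_bin      = 10000
    save_pairs        = true
}
EOF

# The file is then handed to Nextflow with the -c option, for example:
#   nextflow run <pipeline> -c hicstuff_custom.config [other options]
cat hicstuff_custom.config
```

Parameters given on the command line and parameters set in such a file refer to the same names, without the leading `--` in the config file.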

### Mapping

#### `--hicstuff_bwt2_align_opts`

Bowtie2 alignment option for end-to-end mapping.
Default: '--very-sensitive-local'

```bash
--hicstuff_bwt2_align_opts '[Options for bowtie2 mapping on full reads]'
```

#### `--save_bam`

If specified, save BAM files after mapping. Default: false

```bash
--save_bam
```

### Fragment enzyme

#### `--hicstuff_min_size`
Minimum size a contig must have to be kept. Default: 0

```bash
--hicstuff_min_size '[Minimum size value]'
```

#### `--hicstuff_circular`
Use if the genome is circular. Default: false

```bash
--hicstuff_circular
```

#### `--hicstuff_output_contigs`
Name of info contigs file. Default: 'info_contigs.txt'

```bash
--hicstuff_output_contigs '[Name of info contigs file]'
```

#### `--hicstuff_output_frags`
Name of fragments list file. Default: 'fragments_list.txt'

```bash
--hicstuff_output_frags '[Name of fragments file]'
```

#### `--hicstuff_frags_plot`
Whether fragments plot should be generated. Default: false

```bash
--hicstuff_frags_plot
```

#### `--hicstuff_frags_plot_path`
Name of fragments plot file. Default: 'frags_hist.pdf'

```bash
--hicstuff_frags_plot_path '[Name of fragments plot file]'
```

#### `--save_fragments`

If specified, save fragments file. Default: false

```bash
--save_fragments
```

### Bam2pairs

#### `--hicstuff_valid_pairs`
Name of valid pairs file. Default: 'valid.pairs'

```bash
--hicstuff_valid_pairs '[Name of valid pairs file]'
```

#### `--hicstuff_valid_idx`
Name of valid pairs index file. Default: 'valid_idx.pairs'

```bash
--hicstuff_valid_idx '[Name of valid pairs index file]'
```

#### `--hicstuff_min_qual`
Minimum mapping quality required to keep a pair of Hi-C reads. Default: 30

```bash
--hicstuff_min_qual '[Minimum quality value]'
```

See also [`--hicstuff_circular`](#hicstuff_circular), which applies to this step as well.

#### `--save_pairs`

If specified, save pair files. Default: false

```bash
--save_pairs
```


### Matrix

#### `--hicstuff_matrix`
Common name of matrix files. Default: 'abs_fragments_contacts_weighted.txt'

```bash
--hicstuff_matrix '[Name of matrix file]'
```

#### `--hicstuff_bin`
Binsize for plotting matrix. Default: 10000

```bash
--hicstuff_bin [binsize]
```

> :warning: **Warning**: Depending on the size of your input, the default bin size may not be suitable and can make the pipeline fail.
>>10000 is the default, chosen for the human genome.
>>For yeast, for example, you may want to use a smaller bin size.
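
One way to get a feel for a candidate bin size before launching the pipeline is to count how many bins each chromosome would be split into. The sketch below does this with `awk` on a chromosome size file; it reuses the first two lines of the example shown under `--chromosome_size`, and the file names are hypothetical. The bin count is only a rough guide and is not a check performed by the pipeline.

```bash
workdir=$(mktemp -d)
# First two lines of the chromosome size example shown earlier on this page
printf 'chr1\t249250621\nchr2\t243199373\n' > "$workdir/chrom.sizes"

bin=10000   # candidate value for --hicstuff_bin

# Number of bins per chromosome = size / bin, rounded up
awk -v bin="$bin" '
    {
        n = int(($2 + bin - 1) / bin)
        total += n
        printf "%s\t%d bins\n", $1, n
    }
    END { printf "total\t%d bins\n", total }
' "$workdir/chrom.sizes" > "$workdir/bins.txt"

cat "$workdir/bins.txt"
```

A genome whose chromosomes yield only a handful of bins at the chosen size is a hint that a smaller `--hicstuff_bin` value is worth trying.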

### Hicstuff options

#### `--filter_event`
Filter spurious or uninformative 3C events. Requires a restriction enzyme. Default: false

```bash
--filter_event
```

#### `--hicstuff_valid_idx_filtered`
Name of filtered valid pairs index file. Default: false

```bash
--hicstuff_valid_idx_filtered '[Name of filtered valid pairs index file]'
```

#### `--hicstuff_plot_events`
Whether plots should be generated at different steps of the pipeline. Default: false

```bash
--hicstuff_plot_events
```

#### `--hicstuff_dist_plot`
Prefix of distance plot file during filter events. Default: 'dist'

```bash
--hicstuff_dist_plot '[Prefix of distance plot file]'
```

#### `--hicstuff_pie_plot`
Prefix of distribution plot file during filter events. Default: 'distrib'

```bash
--hicstuff_pie_plot '[Prefix of pie plot file]'
```

#### `--distance_law`
If specified, generates a distance law file giving the contact probability as a function of genomic distance, for each chromosome or, if a file with the centromere positions is provided, for each chromosome arm. The values are neither normalized nor averaged. Default: false

```bash
--distance_law
```

#### `--hicstuff_centro_file`
Path to a file containing the positions of the centromeres, separated by a space and listed in the same order as the chromosomes. Default: 'None'

```bash
--hicstuff_centro_file '[Path of centromeres file]'
```

#### `--hicstuff_base`
Base used to construct the log-spaced bins for the distance law. Default: 1.1

```bash
--hicstuff_base '[Base number]'
```
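
To give an intuition for what the base controls, the sketch below prints geometrically growing bin edges for a small distance range. The exact binning rule used by hicstuff is not described in this document, so the rule used here (integer part of `base^k`, with repeated values dropped) and the `max_dist` value are assumptions made purely for illustration.

```bash
workdir=$(mktemp -d)
base=1.1       # same value as the documented default of --hicstuff_base
max_dist=100   # hypothetical upper bound, kept small for readability

# Multiply the edge by the base at each step and keep the integer part,
# skipping values that repeat the previous edge
awk -v base="$base" -v max="$max_dist" 'BEGIN {
    edge = 1; prev = 0
    while (edge <= max) {
        e = int(edge)
        if (e != prev) print e
        prev = e
        edge *= base
    }
}' > "$workdir/edges.txt"

cat "$workdir/edges.txt"
```

With a base close to 1 the edges stay dense at short distances; a larger base gives fewer, wider bins.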

#### `--hicstuff_distance_out_file`
Name of distance law table file. Default: 'distance_law.txt'

```bash
--hicstuff_distance_out_file '[Name of distance law file]'
```

See also [`--hicstuff_circular`](#hicstuff_circular), which applies to the distance law as well.

#### `--hicstuff_rm_centro`
If the distance law is computed, this is the number of kb that will be removed around each centromere position given in the centromere file. Default: 'None'

```bash
--hicstuff_rm_centro '[Number of kb to remove]'
```

#### `--hicstuff_distance_plot`
Whether distance law table should be plotted. Default: false

```bash
--hicstuff_distance_plot
```

#### `--hicstuff_distance_out_plot`
Name of the distance law plot file, used when `--hicstuff_distance_plot` is set. Default: 'distance_law.txt'

```bash
--hicstuff_distance_out_plot '[Name of distance law plot file]'
```

#### `--hicstuff_filter_pcr`
If specified, PCR duplicates are filtered based on genomic positions. Pairs where both reads have exactly the same coordinates are considered duplicates, and only one of them is kept. Default: false

```bash
--hicstuff_filter_pcr
```
> :warning: **Warning**: if true, `--filter_pcr_picard` **must** be false
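
The position-based rule described above can be sketched on a toy table with `awk`. The column layout (read ID, chromosome and position of each mate, then both strands), the file names and the decision to include the strands in the comparison are assumptions made for this illustration; the actual pairs files produced by the pipeline may contain header lines and additional columns.

```bash
workdir=$(mktemp -d)
# Toy pairs table (hypothetical): readID chr1 pos1 chr2 pos2 strand1 strand2
cat > "$workdir/toy.pairs" <<'EOF'
read1 chr1 100 chr2 500 + -
read2 chr1 100 chr2 500 + -
read3 chr1 100 chr2 900 + -
read4 chr3 42 chr3 4200 - +
read5 chr1 100 chr2 500 + -
EOF

# Keep only the first pair seen for each combination of coordinates and strands
awk '!seen[$2, $3, $4, $5, $6, $7]++' "$workdir/toy.pairs" \
    > "$workdir/toy_pcrfree.pairs"

cat "$workdir/toy_pcrfree.pairs"
```

Here `read2` and `read5` share all coordinates with `read1` and are dropped, while `read3` differs in its second position and is kept.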

#### `--hicstuff_filter_pcr_out_file`
Name of pair file after PCR filtering. Default: 'valid_idx_pcrfree.pairs'

```bash
--hicstuff_filter_pcr_out_file '[Name of pcr free pair file]'
```

#### `--filter_pcr_picard`
If specified, duplicate reads are filtered using the Picard MarkDuplicates method. If true, `--hicstuff_filter_pcr` **must** be false. Default: false

```bash
--filter_pcr_picard
```

#### `--save_bam_intermediates`

If specified, save BAM files after Picard PCR duplicate filtering. Default: false

```bash
--save_bam_intermediates
```