Mia Croiset authored
Usage
This document presents the usage and parameters of the hicstuff workflow. The parameters of the hicpro workflow are documented separately; samplesheet input and core arguments are detailed there.
Parameters
Inputs
Inputs are the same as for the hicpro workflow.
--input
Use this to specify the location of your input FastQ files. For example:
--input 'path/to/data/sample_*_{1,2}.fastq'
Please note the following requirements:
- The path must be enclosed in quotes
- The path must have at least one * wildcard character
- When using the pipeline with paired-end data, the path must use {1,2} notation to specify read pairs
If left unspecified, a default pattern is used: data/*{1,2}.fastq.gz
Note that the Hi-C data analysis workflow requires paired-end data.
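The pairing requirement above can be checked before launching the pipeline. The following is a minimal sketch in POSIX shell, not part of the pipeline itself. The file names and the `check_pairs` helper are hypothetical, and the placeholder files exist only to make the example runnable.

```shell
# Demo directory with empty placeholder files (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/sample_A_1.fastq" "$demo_dir/sample_A_2.fastq" \
      "$demo_dir/sample_B_1.fastq"

# check_pairs DIR: for every forward-read file (*_1.fastq) in DIR,
# report the ones whose mate (*_2.fastq) is missing.
check_pairs() {
    for r1 in "$1"/sample_*_1.fastq; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_1.fastq}_2.fastq"           # swap the _1 suffix for _2
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $(basename "$r1")"
        fi
    done
}

pairs_report=$(check_pairs "$demo_dir")
echo "$pairs_report"
```

An empty report means every `_1` file has a matching `_2` file, which is what the `{1,2}` notation expects.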
Reference genomes
The pipeline config files come bundled with paths to the Illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the AWS-iGenomes resource.
--genome (using iGenomes)
There are many different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the --genome flag.
You can find the keys to specify the genomes in the iGenomes config file.
--fasta
If you prefer, you can specify the full path to your reference genome when you run the pipeline:
--fasta '[path to Fasta reference]'
--bwt2_index
The bowtie2 indexes are required to align the data with the HiC-Pro workflow. If --bwt2_index is not specified, the pipeline will either use the iGenomes bowtie2 indexes (see the --genome option) or build the indexes on-the-fly (see the --fasta option).
--bwt2_index '[path to bowtie2 index]'
--chromosome_size
The Hi-C pipeline also requires a two-column, tab-separated text file with the chromosome name and the chromosome size.
If not specified, this file will be automatically created by the pipeline. In the latter case, the --fasta reference genome has to be specified.
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
chr5 180915260
chr6 171115067
chr7 159138663
chr8 146364022
chr9 141213431
chr10 135534747
(...)
--chromosome_size '[path to chromosome size file]'
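To illustrate the expected layout, here is a minimal sketch that derives a two-column, tab-separated size file from a FASTA file with awk. It is not the pipeline's own implementation. The demo sequences and the `chrom_sizes` helper are hypothetical.

```shell
# Tiny demo FASTA (hypothetical content, for illustration only).
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>chr1 demo sequence
ACGTACGTAC
GTACG
>chr2
ACGT
EOF

# chrom_sizes FASTA: print "<name><TAB><length>" for each sequence,
# summing the lengths of multi-line records.
chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)   # drop ">" and any description after the name
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}

sizes_file=$(mktemp)
chrom_sizes "$fasta" > "$sizes_file"
cat "$sizes_file"
```

The resulting file could then be passed with --chromosome_size. A common alternative is to take the first two columns of the index written by `samtools faidx`.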
--restriction_fragments
Finally, Hi-C experiments based on restriction enzyme digestion require a BED file with coordinates of restriction fragments.
chr1 0 16007 HIC_chr1_1 0 +
chr1 16007 24571 HIC_chr1_2 0 +
chr1 24571 27981 HIC_chr1_3 0 +
chr1 27981 30429 HIC_chr1_4 0 +
chr1 30429 32153 HIC_chr1_5 0 +
chr1 32153 32774 HIC_chr1_6 0 +
chr1 32774 37752 HIC_chr1_7 0 +
chr1 37752 38369 HIC_chr1_8 0 +
chr1 38369 38791 HIC_chr1_9 0 +
chr1 38791 39255 HIC_chr1_10 0 +
(...)
If not specified, this file will be automatically created by the pipeline. In this case, the --fasta reference genome will be used.
Note that the --digestion or --restriction_site parameter is mandatory to create this file.
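The structure of the fragment file can be illustrated with a small in-silico digestion. This is a rough sketch in awk and may differ from how the pipeline builds the file. The `digest_fasta` helper, the demo sequence, and passing the site as a plain motif plus a cut offset are all assumptions of this example. The motif AAGCTT with offset 1 corresponds to HindIII (A^AGCTT).

```shell
# Tiny demo FASTA (hypothetical content, for illustration only).
demo_fasta=$(mktemp)
cat > "$demo_fasta" <<'EOF'
>chr1
CCAAGCTTGG
GGAAGCTTCC
EOF

# digest_fasta FASTA SITE OFFSET
# Print 6-column BED fragments, cutting OFFSET bases into each SITE occurrence.
digest_fasta() {
    awk -v site="$2" -v offset="$3" '
        function emit_fragments(   rest, pos, hit, start, cut, n) {
            if (name == "") return
            rest = toupper(seq); pos = 0; start = 0; n = 0
            while ((hit = index(rest, site)) > 0) {
                cut = pos + hit - 1 + offset      # 0-based cut coordinate
                n++
                printf "%s\t%d\t%d\tHIC_%s_%d\t0\t+\n", name, start, cut, name, n
                start = cut
                pos += hit                        # bases consumed so far
                rest = substr(rest, hit + 1)      # keep searching past this hit
            }
            n++                                   # last fragment up to sequence end
            printf "%s\t%d\t%d\tHIC_%s_%d\t0\t+\n", name, start, length(seq), name, n
        }
        /^>/ { emit_fragments(); name = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit_fragments() }
    ' "$1"
}

digest_fasta "$demo_fasta" AAGCTT 1
```

Each fragment starts where the previous one ends, as in the excerpt above. The repeated `substr` calls make this sketch slow on real genomes, so it is meant for illustration only.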
Digestion of the reads
--cutsite
Generates new gzipped FastQ files from the original FastQ files. The reads are cut at their religation sites, and new read pairs are created from the fragments obtained after cutting at the digestion sites. Default: false
--cutsite
⚠️ Warning: This option cannot be used at the same time as the iterative mapping.
--cutsite_mode
Digestion mode to use. Three values are possible: "all", "for_vs_rev", "pile". Default: for_vs_rev
--cutsite_mode '[all | for_vs_rev | pile]'
--cutsite_seed
Minimum size of a fragment, i.e. the seed size used in mapping; smaller reads won't be mapped. Default: 0
--cutsite_seed '[Size of fragment]'
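The three digestion options are typically passed together. The sketch below uses a hypothetical `cutsite_flags` helper that validates the values before printing the flags. It assumes that --cutsite_seed takes a non-negative integer, which the default of 0 suggests but the text above does not state.

```shell
# cutsite_flags MODE SEED: validate the values, then print the flags.
cutsite_flags() {
    mode=$1
    seed=$2
    case "$mode" in
        all|for_vs_rev|pile) ;;                   # the three documented modes
        *) echo "invalid --cutsite_mode: $mode" >&2; return 1 ;;
    esac
    case "$seed" in
        ''|*[!0-9]*) echo "invalid --cutsite_seed: $seed" >&2; return 1 ;;
    esac
    echo "--cutsite --cutsite_mode $mode --cutsite_seed $seed"
}

# Example values only; 20 is not a recommendation.
cutsite_flags for_vs_rev 20
```

The printed flags can be appended to the pipeline command line alongside --input and the reference genome options.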
Hicstuff specific parameters
The following options are defined in the nextflow.config file, and can be updated either using a custom configuration file (see the -c option) or using command line parameters.
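As a sketch of the custom configuration route, the snippet below writes a small override file from the shell. The values are examples only. The parameter names are those documented in this section, relying on Nextflow's convention that a `--name` command line parameter corresponds to `params.name` in a config file.

```shell
# Write a hypothetical override file; a temporary path is used for the demo.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
// Example overrides for the hicstuff workflow (illustrative values).
params {
    hicstuff_bwt2_align_opts = '--very-sensitive-local'
    hicstuff_min_size        = 0
    hicstuff_circular        = false
}
EOF

cat "$config_file"
# The file could then be supplied to the pipeline run with: -c "$config_file"
```

Parameters given on the command line take precedence over those set in a config file, so a shared file like this can still be adjusted per run.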
Mapping
--hicstuff_bwt2_align_opts
Bowtie2 alignment options for mapping full reads in normal (non-iterative) mode. Default: '--very-sensitive-local'
--hicstuff_bwt2_align_opts '[Options for bowtie2 mapping on full reads]'
--iteralign
Truncate reads from a fastq file to 20 basepairs and iteratively extend and re-align the unmapped reads to optimize the proportion of uniquely aligned reads in a 3C library. Default: false
--iteralign
Use --hicstuff_min_qual to set the minimum quality for the iterative alignment.
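Because --cutsite cannot be combined with iterative mapping, a wrapper script may want to reject such a combination before starting a run. This is a minimal sketch with a hypothetical `check_exclusive` helper. It assumes that "iterative mapping" in the --cutsite warning refers to the --iteralign flag.

```shell
# check_exclusive ARGS...: fail if both --cutsite and --iteralign are present.
check_exclusive() {
    has_cutsite=0
    has_iteralign=0
    for arg in "$@"; do
        case "$arg" in
            --cutsite)   has_cutsite=1 ;;
            --iteralign) has_iteralign=1 ;;
        esac
    done
    if [ "$has_cutsite" -eq 1 ] && [ "$has_iteralign" -eq 1 ]; then
        echo "error: --cutsite and --iteralign cannot be combined" >&2
        return 1
    fi
    return 0
}

# Accepted: only one of the two options is present.
check_exclusive --iteralign --save_bam && echo "ok"
```

The check matches the flags exactly, so related options such as --cutsite_mode do not trigger it.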
--hicstuff_read_len
Read length in the FastQ file. If set to None, the length of the first read is used. If the file contains reads of different lengths, set this value to the longest read length. Default: None
--hicstuff_read_len '[Wanted read length]'
--save_bam
If specified, save BAM files after mapping (works for normal and iterative mode). Default: false
--save_bam
Fragment enzyme
--hicstuff_min_size
Minimum size a contig must have to be kept. Default: 0
--hicstuff_min_size '[Minimum size value]'
--hicstuff_circular
Use if the genome is circular. Default: false
--hicstuff_circular
--hicstuff_output_contigs
Name of info contigs file. Default: 'info_contigs.txt'
--hicstuff_output_contigs '[Name of info contigs file]'
--hicstuff_output_frags
Name of fragments list file. Default: 'fragments_list.txt'