BigWig visu
Description
This project contains three main submodules, found in the src/ directory:
- The module `bed_handler` performs operations on bed files. It was designed for a particular project, and it is unlikely that you will want to use it.
- The module `gc_content` creates violin plots displaying the GC content of the two or more bed files given as input. It also runs a Wilcoxon test to see whether the regions contained in those bed files differ in their GC content.
- The module `visu` displays the coverage from bigwig files, created from different conditions, in specific genomic regions defined in two or more bed files.
Prerequisites
This project requires python>=3.8 to work correctly and the following modules must be installed:
- Lazyparser>=0.2.0
- Pandas>=1.0.3
- Loguru>=0.5.3
- numpy>=1.17.4
- pyfaidx>=0.5.7
- biopython>=1.75
- seaborn>=0.10.1
- matplotlib>=3.1.2
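A quick way to see which of these packages are present in the current environment is sketched below. It is not part of the project: the helper name `installed_versions` is hypothetical, the distribution names are assumed to match the list above, and the script only reports installed versions without comparing them to the minimum versions required.

```python
from importlib import metadata

# Distribution names, assumed to match the prerequisites listed above.
REQUIRED = ["lazyparser", "pandas", "loguru", "numpy",
            "pyfaidx", "biopython", "seaborn", "matplotlib"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` must be installed before launching the modules.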
Usage
gc_content module
To launch the gc_content module you must enter the following command at the root of this project:
$ python3 -m src.gc_content [PARAMS]
Where [PARAMS] corresponds to the parameters given to the program. The available parameters are listed below:
| Required parameters | Description |
|---|---|
| -B / --beds | A list of bed files containing the regions whose GC content will be displayed |
| -b / --bed_names | A list of names identifying each bed file given in the -B / --beds parameter |
| -g / --genome | A Fasta file containing the entire genome of the organism of interest |
| Optional parameters | Description |
|---|---|
| -e / --environment | Number of nucleotides to display around the genomic intervals defined in the bed files (default 0) |
| -f / --ft_names | A name identifying the kind of genomic intervals defined in the bed files (default: interval) |
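To make the quantities concrete, here is a minimal sketch of a GC-content computation on an interval widened by a number of flanking nucleotides. It is an illustration only, not the module's code: the function names are hypothetical, the interval is assumed to be 0-based and half-open as in bed files, and treating `--environment` as a symmetric extension on both sides is an assumption.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def extend_interval(start, end, environment, chrom_length):
    """Widen a 0-based, half-open interval by `environment` nucleotides
    on each side, clipped to the chromosome bounds."""
    return max(0, start - environment), min(chrom_length, end + environment)


# Toy chromosome standing in for a sequence read from the Fasta file.
chrom = "ATATGCGCGCATAT"
start, end = extend_interval(4, 10, environment=2, chrom_length=len(chrom))
print(start, end, gc_fraction(chrom[start:end]))
```

Bases other than A, C, G and T (for example N) are ignored in this sketch, so a fully masked region yields 0.0 rather than an error.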
Note that you can also display the help for this module by typing:
$ python3 -m src.gc_content --help
List elements must be separated by a space when writing the command. For example, to pass three bed files to the -B parameter you can type:
$ python3 -m src.gc_content -B bed1.bed bed2.bed bed3.bed [...]
Here [...] stands for the rest of the command.
visu module
To launch the visu module you must enter the following command at the root of this project:
$ python3 -m src.visu [PARAMS]
Where [PARAMS] corresponds to the parameters given to the program. The available parameters are listed below:
| Required parameters | Description |
|---|---|
| -d / --design | A tabulated file containing 3 columns. The first column contains a bigwig filename, the second contains the condition name and the last one contains the replicate of the condition. |
| -B / --bw_folder | The folder containing the bigwig files listed in the first column of the 'design' table |
| -r / --region_beds | A list of one or more bed files containing the genomic intervals to display |
| -R / --region_names | A list of names identifying the genomic intervals inside the bed files given with the -r / --region_beds parameter |
Example of the content in the design file:
| bigwig | condition | replicate |
|---|---|---|
| bw1.bw | Cond1 | R1 |
| bw2.bw | Cond1 | R2 |
| bw1.bw | Cond2 | R1 |
| bw2.bw | Cond2 | R2 |
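As an illustration of how such a design table can be read, the sketch below groups the bigwig files by condition using only the standard library. It is not the module's own parser: the function name `read_design` is hypothetical, and both the header row and the tab separator are assumptions about the file layout.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the content of a design file; the header row and the
# tab separators are assumptions made for this sketch.
DESIGN_TEXT = (
    "bigwig\tcondition\treplicate\n"
    "bw1.bw\tCond1\tR1\n"
    "bw2.bw\tCond1\tR2\n"
    "bw1.bw\tCond2\tR1\n"
    "bw2.bw\tCond2\tR2\n"
)


def read_design(handle):
    """Group (bigwig, replicate) pairs by condition."""
    by_condition = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_condition[row["condition"]].append((row["bigwig"], row["replicate"]))
    return dict(by_condition)


design = read_design(io.StringIO(DESIGN_TEXT))
for condition, files in design.items():
    print(condition, files)
```

With a real file, `io.StringIO(DESIGN_TEXT)` would be replaced by an open file handle.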
| Optional parameters | Description |
|---|---|
| -n / --nb_bin | An integer giving the number of bins used to represent the genomic intervals given with the -r / --region_beds argument (default 100) |
| -f / --figure_type | The kind of representation wanted: barplot or metagene (default metagene) |
| -N / --norm | The number of a bin whose coverage will be set to 1, or 'None' for no normalisation (default 'None'). This parameter can also take a file, described after this table |
| -s / --show_replicates | 'y' to create a figure showing the coverage for all replicates, 'n' to display only the average coverage across the conditions defined in the -d / --design parameter (default y) |
| -e / --environment | A list of two integers. The first is the number of nucleotides to display around the genomic intervals of interest defined with the -r / --region_beds parameter, and the second is the number of bins to use (default 0 0) |
| -b / --border_name | A list of two strings: the names of the left and right borders to display in the figures between the genomic intervals defined with -r / --region_beds and their environment (default '' '') |
| -o / --output | Folder where the figures will be stored (default .) |
| -y / --ylim | A list of two integers giving the y-axis range of the figure (default None) |
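The effect of the --nb_bin and --norm parameters can be pictured with the sketch below. It is a simplified illustration rather than the module's implementation: the function names are hypothetical, averaging the per-nucleotide coverage within each bin is an assumption, and so is the 0-based bin index used for normalisation.

```python
def bin_coverage(values, nb_bin):
    """Average per-nucleotide coverage into nb_bin consecutive bins."""
    n = len(values)
    if nb_bin <= 0 or n < nb_bin:
        raise ValueError("need at least one value per bin")
    binned = []
    for i in range(nb_bin):
        # Integer arithmetic spreads any remainder evenly over the bins.
        lo = i * n // nb_bin
        hi = (i + 1) * n // nb_bin
        chunk = values[lo:hi]
        binned.append(sum(chunk) / len(chunk))
    return binned


def normalise_to_bin(binned, norm_bin):
    """Divide every bin by the coverage of norm_bin so that bin becomes 1."""
    reference = binned[norm_bin]
    if reference == 0:
        raise ValueError("cannot normalise to a bin with zero coverage")
    return [value / reference for value in binned]


coverage = [2, 2, 4, 4, 6, 6, 8, 8]
binned = bin_coverage(coverage, nb_bin=4)
print(binned)
print(normalise_to_bin(binned, norm_bin=1))
```

Binning makes intervals of different lengths comparable, since every interval ends up described by the same number of values.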
The file given to the --norm parameter must be formatted like this:
| condition | region | replicate | coef |
|---|---|---|---|
| Cond1 | interval | R1 | 0.5 |
| Cond1 | interval | R2 | 0.4 |
| Cond2 | interval | R1 | 0.84 |
| Cond2 | interval | R2 | 0.2 |
- The column condition must contain the same conditions as those defined in the `condition` column of the design file.
- The column replicate must contain the same replicate names as the `replicate` column of the design file.
- The column coef contains a value used to normalise the coverage.
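These consistency rules can be checked before launching the module. The sketch below is a hypothetical helper, not part of the project: it assumes both files are tab-separated with a header row, and it only compares the condition and replicate columns, without saying anything about how the module applies the coef values.

```python
import csv
import io

# Stand-ins for the two files; header rows and tab separators are assumed.
DESIGN_TEXT = (
    "bigwig\tcondition\treplicate\n"
    "bw1.bw\tCond1\tR1\n"
    "bw2.bw\tCond1\tR2\n"
    "bw1.bw\tCond2\tR1\n"
    "bw2.bw\tCond2\tR2\n"
)
NORM_TEXT = (
    "condition\tregion\treplicate\tcoef\n"
    "Cond1\tinterval\tR1\t0.5\n"
    "Cond1\tinterval\tR2\t0.4\n"
    "Cond2\tinterval\tR1\t0.84\n"
    "Cond2\tinterval\tR2\t0.2\n"
)


def read_rows(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def mismatched_pairs(design_rows, norm_rows):
    """Return the (condition, replicate) pairs found in only one table."""
    design_pairs = {(r["condition"], r["replicate"]) for r in design_rows}
    norm_pairs = {(r["condition"], r["replicate"]) for r in norm_rows}
    return sorted(design_pairs ^ norm_pairs)


def load_coefficients(norm_rows):
    """Index the coef values by (condition, region, replicate)."""
    return {(r["condition"], r["region"], r["replicate"]): float(r["coef"])
            for r in norm_rows}


design_rows = read_rows(DESIGN_TEXT)
norm_rows = read_rows(NORM_TEXT)
print(mismatched_pairs(design_rows, norm_rows))
print(load_coefficients(norm_rows))
```

An empty list from `mismatched_pairs` means every condition and replicate pair appears in both files.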