BigWig visu
Description
This project contains three main submodules that can be found in the src/
directory:
- The module bed_handler performs some operations on bed files. This module was designed for a particular project and it is unlikely that you will want to use it.
- The module gc_content creates violin plots displaying the GC content of the two or more bed files given as input. It also performs a Wilcoxon test to check whether the regions contained in those bed files differ in their GC content.
- The module visu displays the coverage from bigwig files, created from different conditions, in specific genomic regions defined in two or more bed files.
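To make the GC-content comparison concrete, here is a minimal sketch of how a GC percentage could be computed per region. It is not the code of the gc_content module: the function name `gc_content`, the choice to ignore ambiguous bases such as N, the percentage scale and the example sequences are all assumptions made for illustration.

```python
from statistics import median


def gc_content(sequence: str) -> float:
    """Return the GC percentage of a nucleotide sequence (0.0 if it is empty)."""
    seq = sequence.upper()
    # Assumption: only unambiguous bases are counted, so 'N' does not dilute the value.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Hypothetical sequences standing in for the regions of two bed files.
regions_a = ["GGCCAT", "GCGCGC", "ATGCNN"]
regions_b = ["ATATAT", "AATTGC", "TTTTAA"]

gc_a = [gc_content(seq) for seq in regions_a]
gc_b = [gc_content(seq) for seq in regions_b]

# One value per region: this is the kind of distribution a violin plot would show.
print(median(gc_a), median(gc_b))
```

The two lists of per-region values are what a statistical test would then compare between bed files.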
Prerequisites
This project requires python>=3.8 to work correctly, and the following modules must be installed:
- Lazyparser>=0.2.0
- Pandas>=1.0.3
- Loguru>=0.5.3
- numpy>=1.17.4
- pyfaidx>=0.5.7
- biopython>=1.75
- seaborn>=0.10.1
- matplotlib>=3.1.2
Build the Sphinx documentation
The documentation was built using Sphinx.
To build the documentation (which contains this document and the API of the program), go to the doc folder and enter:
$ make html
Usage
gc_content module
To launch the gc_content module, enter the following command at the root of this project:
$ python3 -m src.gc_content [PARAMS]
Here, [PARAMS] corresponds to the parameters given to the program. The available parameters are listed below.
Required parameters | Description |
---|---|
-B / --beds | A list of bed files containing the regions for which we want to display the coverage |
-b / --bed_names | A list of names identifying each bed file given with the -B / --beds parameter |
-g / --genome | A Fasta file containing the entire genome of an organism of interest |

Optional parameters | Description |
---|---|
-e / --environment | Number of nucleotides to display around the genomic intervals defined in the bed files (default 0) |
-f / --ft_names | A name identifying the kind of genomic intervals defined in the bed files (default: interval) |
Note that you can also display the help for this module by typing:
$ python3 -m src.gc_content --help
The elements of a list must be separated by a space when writing the command. For example, to pass 3 bed files to the -B parameter, you can type:
$ python3 -m src.gc_content -B bed1.bed bed2.bed bed3.bed [...]
The [...] represents the rest of the command to write.
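If the module is launched from a script, the command can be assembled as an argument list. The sketch below follows the example command above, where each list item is passed as a separate argument. The helper `build_gc_content_command`, the names `set1` to `set3`, the file `genome.fa` and the check that there is one name per bed file are assumptions made for illustration, not part of the project.

```python
import shlex


def build_gc_content_command(beds, bed_names, genome):
    """Assemble the argument list used to launch the gc_content module."""
    # Assumption: each bed file is identified by exactly one name.
    if len(beds) != len(bed_names):
        raise ValueError("each bed file needs exactly one name")
    return [
        "python3", "-m", "src.gc_content",
        "-B", *beds,
        "-b", *bed_names,
        "-g", genome,
    ]


command = build_gc_content_command(
    ["bed1.bed", "bed2.bed", "bed3.bed"],
    ["set1", "set2", "set3"],  # hypothetical names
    "genome.fa",               # hypothetical Fasta file
)

# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the root of the project.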
visu module
To launch the visu module, enter the following command at the root of this project:
$ python3 -m src.visu [PARAMS]
Here, [PARAMS] corresponds to the parameters given to the program. The available parameters are listed below.
Required parameters | Description |
---|---|
-d / --design | A tabulated file containing 3 columns. The first column contains a bigwig filename, the second contains the condition name and the last one contains the replicate of the condition. |
-B / --bw_folder | The folder containing the bigwig files mentioned in the first column of the 'design' table |
-r / --region_beds | A list of one or more bed files containing the genomic intervals to display |
-R / --region_names | A list of names identifying the genomic intervals inside the bed files given with the -r / --region_beds parameter. |
Example of the content in the design file:
bigwig | condition | replicate |
---|---|---|
bw1.bw | Cond1 | R1 |
bw2.bw | Cond1 | R2 |
bw1.bw | Cond2 | R1 |
bw2.bw | Cond2 | R2 |
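As an illustration of this layout, the following sketch reads a design table and groups the bigwig files by condition. It is not the parser used by the visu module: the function `read_design`, the tab separator and the presence of a header line are assumptions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical design content, tab-separated, with a header line (an assumption).
design_text = (
    "bigwig\tcondition\treplicate\n"
    "bw1.bw\tCond1\tR1\n"
    "bw2.bw\tCond1\tR2\n"
    "bw1.bw\tCond2\tR1\n"
    "bw2.bw\tCond2\tR2\n"
)


def read_design(handle):
    """Group (replicate, bigwig file) pairs by condition."""
    reader = csv.DictReader(handle, delimiter="\t")
    by_condition = defaultdict(list)
    for row in reader:
        by_condition[row["condition"]].append((row["replicate"], row["bigwig"]))
    return dict(by_condition)


design = read_design(io.StringIO(design_text))
for condition, replicates in design.items():
    print(condition, replicates)
```

With a real file, `open(path, newline="")` can replace the `io.StringIO` object.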
Optional parameters | Description |
---|---|
-n / --nb_bin | An integer corresponding to the number of bins used to represent the genomic intervals given with the -r / --region_beds argument. (default 100) |
-f / --figure_type | The kind of representation wanted (barplot or metagene) (default metagene) |
-N / --norm | A number corresponding to a bin for which the coverage will become 1. 'None' for no normalisation (default 'None'). Note that this parameter can also take a file (description after this table) |
-s / --show_replicates | 'y' to create a figure showing the coverage for all replicates, 'n' to display only the average coverage across the conditions defined with the -d / --design parameter. (default y) |
-e / --environment | A list of two integers. The first corresponds to the number of nucleotides to display around the genomic intervals of interest defined with the -r / --region_beds parameter and the second corresponds to the number of bins to use (default 0 0) |
-b / --border_name | A list of two strings. The names of the left and right borders to display in the figures between the genomic intervals defined with -r / --region_beds and their environment (default '' '') |
-o / --output | Folder where the figures will be stored (default .) |
-y / --ylim | A list of two integers that correspond to the y-axis range in the figure. (default None) |
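The effect of the number of bins and of a numeric --norm value can be sketched as follows. This is only one possible reading of the descriptions in the table, not the implementation of the visu module: the functions `bin_coverage` and `normalise_on_bin`, the use of a mean per bin, the 0-based bin index and the example values are assumptions.

```python
from statistics import mean


def bin_coverage(values, nb_bin):
    """Average per-nucleotide coverage values into nb_bin consecutive bins."""
    n = len(values)
    if nb_bin <= 0 or nb_bin > n:
        raise ValueError("nb_bin must be between 1 and len(values)")
    # Integer arithmetic gives bin edges that cover every position exactly once.
    edges = [i * n // nb_bin for i in range(nb_bin + 1)]
    return [mean(values[start:end]) for start, end in zip(edges, edges[1:])]


def normalise_on_bin(binned, norm_bin):
    """Scale the profile so that the bin at index norm_bin (0-based) equals 1."""
    reference = binned[norm_bin]
    if reference == 0:
        raise ValueError("cannot normalise on a bin with zero coverage")
    return [value / reference for value in binned]


coverage = [2, 2, 4, 4, 6, 6, 8, 8]  # hypothetical coverage over 8 nucleotides
binned = bin_coverage(coverage, nb_bin=4)
normalised = normalise_on_bin(binned, norm_bin=0)
print(binned, normalised)
```

Binning makes intervals of different lengths comparable, since every interval ends up described by the same number of values.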
The file given to the --norm parameter must be formatted like this:
condition | region | replicate | coef |
---|---|---|---|
Cond1 | interval | R1 | 0.5 |
Cond1 | interval | R2 | 0.4 |
Cond2 | interval | R1 | 0.84 |
Cond2 | interval | R2 | 0.2 |
- The column condition must contain the same conditions as the condition column of the design file.
- The column replicate must contain the same replicate names as the replicate column of the design file.
- The column coef contains a value used to normalise the coverage.
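These consistency rules can be checked before launching the program. The sketch below is a hypothetical helper, not part of the project: the function `check_norm_table` and the representation of both tables as lists of tuples are assumptions.

```python
def check_norm_table(design_rows, norm_rows):
    """Return the (condition, replicate) pairs of the design missing from the norm table."""
    design_pairs = {(cond, rep) for _bigwig, cond, rep in design_rows}
    norm_pairs = {(cond, rep) for cond, _region, rep, _coef in norm_rows}
    return sorted(design_pairs - norm_pairs)


# Rows copied from the two example tables of this document.
design_rows = [
    ("bw1.bw", "Cond1", "R1"),
    ("bw2.bw", "Cond1", "R2"),
    ("bw1.bw", "Cond2", "R1"),
    ("bw2.bw", "Cond2", "R2"),
]
norm_rows = [
    ("Cond1", "interval", "R1", 0.5),
    ("Cond1", "interval", "R2", 0.4),
    ("Cond2", "interval", "R1", 0.84),
    ("Cond2", "interval", "R2", 0.2),
]

missing = check_norm_table(design_rows, norm_rows)
print(missing)  # an empty list means every design entry has a coefficient
```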