Commit 07905234 authored by nfontrod

README.md: add readme

parent debea658
# BigWig visu
## Description
This project contains three main submodules that can be found in the `src/` directory:
* The module `bed_handler` performs some operations on bed files. This module was designed for a particular project, and it is unlikely that you will need it.
* The module `gc_content` creates violin plots displaying the GC content of the two or more bed files given as input. It also performs a Wilcoxon test to check whether the regions contained in those bed files differ in their GC content.
* The module `visu` displays the coverage from bigwig files, created from different conditions, in specific genomic regions defined in two or more bed files.
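
To make the quantity plotted by `gc_content` concrete, here is a minimal sketch of a GC-content computation for one sequence. It is not taken from the project's code: the function name `gc_content` and the choice to ignore ambiguous bases such as `N` are assumptions made for illustration, and the Wilcoxon test is not reproduced here.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence.

    Hypothetical helper: the module's actual implementation may differ.
    """
    seq = sequence.upper()
    # Count only unambiguous bases so that N (or other symbols) do not
    # dilute the ratio. This is an assumption, not the project's rule.
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        return float("nan")
    return 100.0 * (seq.count("G") + seq.count("C")) / valid


if __name__ == "__main__":
    for region in ("ATGC", "GGCCGG", "atgcNN"):
        print(region, gc_content(region))
```

One such value per region, grouped by bed file, is the kind of data a violin plot would summarise.
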
## Prerequisites
This project requires `python>=3.8` to work correctly and the following modules must be installed:
* Lazyparser>=0.2.0
* Pandas>=1.0.3
* Loguru>=0.5.3
* numpy>=1.17.4
* pyfaidx>=0.5.7
* biopython>=1.75
* seaborn>=0.10.1
* matplotlib>=3.1.2
## Usage
### `gc_content` module
To launch the `gc_content` module you must enter the following command at the root of this project:
```console
$ python3 -m src.gc_content [PARAMS]
```
Here, `[PARAMS]` stands for the parameters given to the program. The available parameters are listed below:

| Required parameters | Description |
|:--------------------|:-----------------------------------------------------------------------:|
| `-B / --beds` | A list of bed files containing the regions whose GC content will be displayed |
| `-b / --bed_names` | A list of names identifying each bed file given with the `-B / --beds` parameter |
| `-g / --genome` | A Fasta file containing the entire genome of the organism of interest |

| Optional parameters | Description |
|:--------------------|:-----------------------------------------------------------------------:|
| `-e / --environment` | Number of nucleotides to include around the genomic intervals defined in the bed files (default 0) |
| `-f / --ft_names` | A name identifying the kind of genomic intervals defined in the bed files (default: interval) |

Note that you can also display the help for this module by typing:
```console
$ python3 -m src.gc_content --help
```
The elements of a list must be separated by spaces when you write the command. For example, to give 3 bed files to the `-B` parameter, you can type:
```console
$ python3 -m src.gc_content -B bed1.bed bed2.bed bed3.bed [...]
```
The `[...]` stands for the remaining part of the command.
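
If the module is launched from a Python script rather than typed by hand, the command can be assembled as an argument list. The sketch below is only an illustration: the helper name `build_gc_content_command` is hypothetical, and it assumes that list values are passed as separate arguments, as in the example above. Only flags documented in this README are used.

```python
import sys
from typing import List, Sequence


def build_gc_content_command(
    beds: Sequence[str], bed_names: Sequence[str], genome: str
) -> List[str]:
    """Build the argument list for `python3 -m src.gc_content` (hypothetical helper)."""
    if len(beds) < 2:
        # The module description mentions two or more bed files.
        raise ValueError("at least two bed files are expected")
    if len(beds) != len(bed_names):
        raise ValueError("one name is needed for each bed file")
    return [
        sys.executable, "-m", "src.gc_content",
        "-B", *beds,
        "-b", *bed_names,
        "-g", genome,
    ]


if __name__ == "__main__":
    # The file names below are placeholders.
    cmd = build_gc_content_command(
        ["bed1.bed", "bed2.bed"], ["group1", "group2"], "genome.fa"
    )
    print(" ".join(cmd))
```

The resulting list can be given to `subprocess.run(cmd, check=True)`, with the working directory set to the root of this project.
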
### `visu` module
To launch the `visu` module you must enter the following command at the root of this project:
```console
$ python3 -m src.visu [PARAMS]
```
Here, `[PARAMS]` stands for the parameters given to the program. The available parameters are listed below:

| Required parameters | Description |
|:--------------------|:-----------------------------------------------------------------------:|
| `-d / --design` | A tab-separated file containing 3 columns. The first column contains a bigwig filename, the second contains the condition name and the last one contains the replicate of the condition. |
| `-B / --bw_folder` | The folder containing the bigwig files mentioned in the first column of the 'design' table |
| `-r / --region_beds` | A list of one or more bed files containing the genomic intervals to display |
| `-R / --region_names` | A list of names identifying the genomic intervals inside the bed files given with the `-r / --region_beds` parameter |

Example of the content of the design file:

| bigwig | condition | replicate |
|:-------:|:----------:|:---------:|
| bw1.bw | Cond1 | R1 |
| bw2.bw | Cond1 | R2 |
| bw1.bw | Cond2 | R1 |
| bw2.bw | Cond2 | R2 |
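
As a quick sanity check before running `visu`, a design file can be parsed and validated with the standard library. The sketch below rests on assumptions that the README does not confirm: that the file is tab-separated and that it may start with a header row like the one shown in the table above. The names `DesignRow` and `read_design` are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class DesignRow(NamedTuple):
    bigwig: str
    condition: str
    replicate: str


def read_design(handle: TextIO, has_header: bool = True) -> List[DesignRow]:
    """Read a 3-column, tab-separated design table (assumed layout)."""
    rows: List[DesignRow] = []
    skip_header = has_header
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        if skip_header:
            skip_header = False
            continue
        rows.append(DesignRow(*fields))
    return rows


# In-memory example with the same layout as the table above.
example = (
    "bigwig\tcondition\treplicate\n"
    "bw1.bw\tCond1\tR1\n"
    "bw2.bw\tCond1\tR2\n"
)
design = read_design(io.StringIO(example))
```

Set `has_header=False` if your design file contains only data rows.
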

| Optional parameters | Description |
|:--------------------|:-----------------------------------------------------------------------:|
| `-n / --nb_bin` | An integer corresponding to the number of bins used to represent the genomic intervals given with the `-r / --region_beds` parameter (default 100) |
| `-f / --figure_type` | The kind of representation wanted: barplot or metagene (default metagene) |
| `-N / --norm` | The number of the bin whose coverage will be set to 1, or 'None' for no normalisation (default 'None'). Note that this parameter can also take a file (described after this table) |
| `-s / --show_replicates` | 'y' to create a figure showing the coverage for all replicates, 'n' to display only the average coverage across the conditions defined with the `-d / --design` parameter (default y) |
| `-e / --environment` | A list of two integers. The first corresponds to the number of nucleotides to display around the genomic intervals of interest defined with the `-r / --region_beds` parameter and the second corresponds to the number of bins to use (default 0 0) |
| `-b / --border_name` | A list of two strings: the names of the left and right borders displayed in the figures between the genomic intervals defined with `-r / --region_beds` and their environment |
| `-o / --output` | Folder where the figures will be stored |
| `-y / --ylim` | A list of two integers corresponding to the y-axis range in the figure (default None) |

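
To illustrate what `--nb_bin` and a numeric `--norm` mean, the sketch below averages a per-nucleotide coverage vector into bins and then rescales the bins so that a chosen bin equals 1. This is one plausible reading of the table, not the project's code: the function names are hypothetical, and the bin index is assumed to be 0-based here, which may not match the tool.

```python
from typing import List, Sequence


def bin_coverage(values: Sequence[float], nb_bin: int) -> List[float]:
    """Average per-nucleotide coverage into nb_bin bins of near-equal width."""
    n = len(values)
    if nb_bin <= 0 or nb_bin > n:
        raise ValueError("nb_bin must be between 1 and the number of values")
    bins = []
    for i in range(nb_bin):
        # Integer arithmetic keeps bin boundaries contiguous and non-empty.
        start = i * n // nb_bin
        end = (i + 1) * n // nb_bin
        chunk = values[start:end]
        bins.append(sum(chunk) / len(chunk))
    return bins


def normalise_by_bin(bins: Sequence[float], norm_bin: int) -> List[float]:
    """Divide every bin by the coverage of norm_bin, which therefore becomes 1."""
    reference = bins[norm_bin]
    if reference == 0:
        raise ValueError("cannot normalise by a bin with zero coverage")
    return [value / reference for value in bins]


if __name__ == "__main__":
    coverage = [1, 1, 3, 3, 5, 5]
    binned = bin_coverage(coverage, 3)
    print(binned)
    print(normalise_by_bin(binned, 1))
```
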

The file given to the `--norm` parameter must be formatted like this:

| condition | region | replicate | coef |
|:------:|:------:|:----------:|:---------:|
| Cond1 | interval | R1 | 0.5 |
| Cond1 | interval | R2 | 0.4 |
| Cond2 | interval | R1 | 0.84 |
| Cond2 | interval | R2 | 0.2 |

* The column `condition` must contain the same conditions as the `condition` column of the design file.
* The column `replicate` must contain the same replicate names as the `replicate` column of the design file.
* The column `coef` contains the value used to normalise the coverage.
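
The consistency rules above can be checked before launching `visu`. The sketch below is a hypothetical helper, not part of the project: it assumes the normalisation file is tab-separated with the header row shown in the table, and it only reports the (condition, replicate) pairs of the design that have no coefficient. How the coefficient is applied to the coverage is not shown here.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple

Key = Tuple[str, str, str]  # (condition, region, replicate)


def read_norm_table(handle: TextIO) -> Dict[Key, float]:
    """Read the normalisation table, assumed tab-separated with a header row."""
    coefs: Dict[Key, float] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["condition"], row["region"], row["replicate"])
        coefs[key] = float(row["coef"])
    return coefs


def missing_pairs(
    coefs: Dict[Key, float], design_pairs: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (condition, replicate) pairs of the design without a coefficient."""
    norm_pairs = {(condition, replicate) for condition, _, replicate in coefs}
    return sorted(set(design_pairs) - norm_pairs)


# In-memory example with the same layout as the table above.
example = (
    "condition\tregion\treplicate\tcoef\n"
    "Cond1\tinterval\tR1\t0.5\n"
    "Cond1\tinterval\tR2\t0.4\n"
)
coefs = read_norm_table(io.StringIO(example))
```

An empty list returned by `missing_pairs` means every design entry has a matching coefficient.
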