diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fd1c95fb07be1e0f3f043cc320087c1632a37e94
--- /dev/null
+++ b/README.md
@@ -0,0 +1,108 @@
+# BigWig visu
+
+## Description
+
+This project contains three main submodules that can be found in the `src/` directory:
+* The module `bed_handler` performs some operations on bed files. This module was designed for a particular project and it is unlikely that you will want to use it.
+* The module `gc_content` creates violin plots displaying the GC content of the regions defined in two or more bed files given as input. It also performs a Wilcoxon test to check whether the regions contained in those bed files differ in their GC content.
+* The module `visu` displays the coverage from bigwig files, created from different conditions, in specific genomic regions defined in two or more bed files.
+
+## Prerequisites
+
+This project requires `python>=3.8` to work correctly, and the following modules must be installed:
+* Lazyparser>=0.2.0
+* Pandas>=1.0.3
+* Loguru>=0.5.3
+* numpy>=1.17.4
+* pyfaidx>=0.5.7
+* biopython>=1.75
+* seaborn>=0.10.1
+* matplotlib>=3.1.2
+
+## Usage
+
+### `gc_content` module
+
+To launch the `gc_content` module, enter the following command at the root of this project:
+
+```console
+$ python3 -m src.gc_content [PARAMS]
+```
+
+Here, `[PARAMS]` stands for the parameters given to the program. The available parameters are listed below:
+
+| Required parameters | Description |
+|:--------------------|:-----------------------------------------------------------------------:|
+| `-B / --beds` | A list of bed files containing the regions whose GC content we want to display |
+| `-b / --bed_names` | A list of names identifying each bed file given in the `-B / --beds` parameter |
+| `-g / --genome` | A FASTA file containing the entire genome of the organism of interest |
+
+| Optional parameters | Description |
+|:--------------------|:-----------------------------------------------------------------------:|
+| `-e / --environment` | Number of nucleotides to display around the genomic intervals defined in the bed files (default 0) |
+| `-f / --ft_names` | A name identifying the kind of genomic intervals defined in the bed files (default: interval) |
+
+Note that you can also display the help for this module by typing:
+
+```console
+$ python3 -m src.gc_content --help
+```
+
+The elements of a list must be separated by spaces when you write the command. For example, to give 3 bed files to the `-B` parameter, you can type:
+
+```console
+$ python3 -m src.gc_content -B bed1.bed bed2.bed bed3.bed [...]
+```
+
+The `[...]` represents the rest of the command.
+
+### `visu` module
+
+
+To launch the `visu` module, enter the following command at the root of this project:
+
+```console
+$ python3 -m src.visu [PARAMS]
+```
+
+Here, `[PARAMS]` stands for the parameters given to the program. The available parameters are listed below:
+
+| Required parameters | Description |
+|:--------------------|:-----------------------------------------------------------------------:|
+| `-d / --design` | A tab-separated file containing 3 columns. The first column contains a bigwig filename, the second contains the condition name and the last one contains the replicate of the condition. |
+| `-B / --bw_folder` | The folder containing the bigwig files listed in the first column of the design table |
+| `-r / --region_beds` | A list of one or more bed files containing the genomic intervals to display |
+| `-R / --region_names` | A list of names identifying the genomic intervals inside the bed files given with the `-r / --region_beds` parameter |
+
+Example of the content of the design file:
+
+| bigwig | condition | replicate |
+|:-------:|:----------:|:---------:|
+| bw1.bw | Cond1 | R1 |
+| bw2.bw | Cond1 | R2 |
+| bw3.bw | Cond2 | R1 |
+| bw4.bw | Cond2 | R2 |
+
+| Optional parameters | Description |
+|:--------------------|:-----------------------------------------------------------------------:|
+| `-n / --nb_bin` | An integer corresponding to the number of bins used to represent the genomic intervals given with the `-r / --region_beds` argument (default 100) |
+| `-f / --figure_type` | The kind of representation wanted: barplot or metagene (default metagene) |
+| `-N / --norm` | A number corresponding to a bin for which the coverage will become 1, or 'None' for no normalisation (default 'None'). Note that this parameter can also take a file (described after this table) |
+| `-s / --show_replicates` | 'y' to create a figure showing the coverage for all replicates, 'n' to display only the average coverage of each condition defined in the `-d / --design` parameter (default y) |
+| `-e / --environment` | A list of two integers. The first corresponds to the number of nucleotides to display around the genomic intervals of interest defined with the `-r / --region_beds` parameter and the second corresponds to the number of bins used to represent this environment (default 0 0) |
+| `-b / --border_name` | A list of two strings: the names of the left and right borders displayed in the figures between the genomic intervals defined with `-r / --region_beds` and their environment |
+| `-o / --output` | Folder where the figures will be stored |
+| `-y / --ylim` | A list of two integers corresponding to the y-axis range of the figure (default None) |
+
+The file given to the `--norm` parameter must be defined like this:
+
+| condition | region | replicate | coef |
+|:------:|:------:|:----------:|:---------:|
+| Cond1 | interval | R1 | 0.5 |
+| Cond1 | interval | R2 | 0.4 |
+| Cond2 | interval | R1 | 0.84 |
+| Cond2 | interval | R2 | 0.2 |
+
+* The `condition` column must contain the same conditions as the `condition` column of the design file.
+* The `replicate` column must contain the same replicate names as the `replicate` column of the design file.
+* The `coef` column contains the value used to normalise the coverage.
\ No newline at end of file
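The README lists rules that a `--norm` file must follow relative to the design file. These rules can be checked before `visu` is launched. Below is a minimal sketch, and it is not part of the project's `src/` modules. `check_norm_file` is a hypothetical helper. It assumes both files are tab-separated and begin with a header row that uses the column names shown in the README tables. The README does not say whether a header row is expected.

```python
import io

import pandas as pd


def check_norm_file(design_file, norm_file):
    """Return the problems found in a --norm file (an empty list if none).

    Hypothetical helper: assumes both files are tab-separated and start
    with a header row using the column names shown in the README tables.
    """
    design = pd.read_csv(design_file, sep="\t")
    norm = pd.read_csv(norm_file, sep="\t")
    problems = []
    # Conditions and replicates must be the same as in the design file.
    for column in ("condition", "replicate"):
        expected = set(design[column].astype(str))
        found = set(norm[column].astype(str))
        if found != expected:
            problems.append(
                f"{column}: missing {sorted(expected - found)}, "
                f"unknown {sorted(found - expected)}"
            )
    # Each coefficient must be a number.
    if not pd.api.types.is_numeric_dtype(norm["coef"]):
        problems.append("coef: the column must only contain numbers")
    return problems


if __name__ == "__main__":
    design_tsv = (
        "bigwig\tcondition\treplicate\n"
        "bw1.bw\tCond1\tR1\n"
        "bw2.bw\tCond1\tR2\n"
        "bw3.bw\tCond2\tR1\n"
        "bw4.bw\tCond2\tR2\n"
    )
    norm_tsv = (
        "condition\tregion\treplicate\tcoef\n"
        "Cond1\tinterval\tR1\t0.5\n"
        "Cond1\tinterval\tR2\t0.4\n"
        "Cond2\tinterval\tR1\t0.84\n"
        "Cond2\tinterval\tR2\t0.2\n"
    )
    # In-memory files stand in for real paths in this demo.
    print(check_norm_file(io.StringIO(design_tsv), io.StringIO(norm_tsv)))
```

When run as a script, the demo prints an empty list because the sample conditions and replicates match. The helper compares sets, so row order and repeated values (such as `Cond1` appearing on several rows) do not affect the result.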