
BigWig visu

Description

This project contains three main submodules, all located in the src/ directory:

  • The bed_handler module performs operations on bed files. It was designed for one particular project, so you are unlikely to need it.
  • The gc_content module creates violin plots of the GC content of two or more bed files given as input. It also runs a Wilcoxon test to check whether the regions contained in those bed files differ in GC content.
  • The visu module displays the coverage from bigwig files, created from different conditions, in specific genomic regions defined in two or more bed files.

Prerequisites

This project requires Python >= 3.8 and the following modules:

  • Lazyparser>=0.2.0
  • Pandas>=1.0.3
  • Loguru>=0.5.3
  • numpy>=1.17.4
  • pyfaidx>=0.5.7
  • biopython>=1.75
  • seaborn>=0.10.1
  • matplotlib>=3.1.2
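
The version constraints above can be checked before running the modules. The following is a minimal sketch, not part of this project: it assumes that the distribution names on PyPI are the lowercase names listed above, and the helpers version_tuple, is_older and check_requirements are hypothetical names introduced only for this illustration.

```python
from importlib import metadata

# Minimum versions copied from the list above. The keys are ASSUMED to be
# the distribution names under which the packages are installed.
REQUIREMENTS = {
    "lazyparser": "0.2.0",
    "pandas": "1.0.3",
    "loguru": "0.5.3",
    "numpy": "1.17.4",
    "pyfaidx": "0.5.7",
    "biopython": "1.75",
    "seaborn": "0.10.1",
    "matplotlib": "3.1.2",
}


def version_tuple(version):
    """Turn '1.17.4' into (1, 17, 4), stopping at the first non-numeric part."""
    parts = []
    for chunk in version.split("."):
        digits = ""
        for char in chunk:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed, minimum):
    """Return True if `installed` is strictly older than `minimum`."""
    left, right = version_tuple(installed), version_tuple(minimum)
    width = max(len(left), len(right))
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    left += (0,) * (width - len(left))
    right += (0,) * (width - len(right))
    return left < right


def check_requirements(requirements):
    """Return a list of human-readable problems (empty if everything is fine)."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if is_older(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(REQUIREMENTS):
        print(problem)
```

The comparison is deliberately simple and ignores pre-release suffixes; a stricter check would need a dedicated version-parsing library.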

Build the Sphinx documentation

The documentation was built using Sphinx.

To build the documentation (which contains this document and the API of the program), go to the doc folder and run:

$ make html

Usage

gc_content module

To launch the gc_content module, run the following command from the root of this project:

$ python3 -m src.gc_content [PARAMS]

Where [PARAMS] corresponds to the parameters given to the program. The available parameters are listed below.

Required parameters   Description
-B / --beds           A list of bed files containing the regions whose GC content will be displayed
-b / --bed_names      A list of names identifying each bed file given with the -B / --beds parameter
-g / --genome         A FASTA file containing the entire genome of the organism of interest

Optional parameters   Description
-e / --environment    Number of nucleotides to display around the genomic intervals defined in the bed files (default: 0)
-f / --ft_names       A name identifying the kind of genomic intervals defined in the bed files (default: interval)

Note that you can also display the help for this module by typing:

$ python3 -m src.gc_content --help

The elements of a list must be separated by a space when you write the command. For example, to pass 3 bed files to the -B parameter, you can type:

$ python3 -m src.gc_content -B bed1.bed bed2.bed bed3.bed [...]

Here, [...] stands for the rest of the command.
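
To make the quantity plotted by this module concrete, here is a minimal sketch of how the GC content of one interval could be computed, with and without an environment around it. It is not the code of the gc_content module: the function names are hypothetical, and it assumes 0-based half-open coordinates (as in bed files), that -e / --environment widens the interval by the same number of nucleotides on both sides, and that bases other than A, C, G and T are ignored.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the A, C, G and T bases.

    Returns NaN when the sequence contains none of these four bases.
    """
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    if gc + at == 0:
        return float("nan")
    return 100.0 * gc / (gc + at)


def extend_interval(start, end, environment, chrom_size):
    """Widen a 0-based, half-open interval by `environment` nucleotides
    on each side, without leaving the chromosome."""
    return max(0, start - environment), min(chrom_size, end + environment)


# Toy chromosome used only for this illustration.
chromosome = "AAAAGGCCTTTT"

# GC content of the interval [4, 8) alone, then with 2 extra nucleotides
# on each side.
start, end = 4, 8
interval_gc = gc_content(chromosome[start:end])
wide_start, wide_end = extend_interval(start, end, 2, len(chromosome))
wide_gc = gc_content(chromosome[wide_start:wide_end])
```

With this toy sequence, the interval alone is entirely G and C, while the widened interval also picks up the flanking A and T bases and therefore has a lower GC content.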

visu module

To launch the visu module, run the following command from the root of this project:

$ python3 -m src.visu [PARAMS]

Where [PARAMS] corresponds to the parameters given to the program. The available parameters are listed below.

Required parameters      Description
-d / --design            A tab-separated file with 3 columns. The first column contains a bigwig filename, the second the condition name and the last the replicate of the condition.
-B / --bw_folder         The folder containing the bigwig files listed in the first column of the design file
-r / --region_beds       A list of one or more bed files containing the genomic intervals to display
-R / --region_names      A list of names identifying the genomic intervals inside the bed files given with the -r / --region_beds parameter

Example of the content in the design file:

bigwig condition replicate
bw1.bw Cond1 R1
bw2.bw Cond1 R2
bw1.bw Cond2 R1
bw2.bw Cond2 R2
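
As an illustration of how such a design file can be read, the sketch below groups the bigwig files by condition using only the standard library. It is not the parser used by the visu module: read_design is a hypothetical helper, and the sketch assumes that the file is tab-separated and starts with the header line shown above.

```python
import csv
import io
from collections import defaultdict

# The example design above, written with tab separators.
DESIGN_TEXT = (
    "bigwig\tcondition\treplicate\n"
    "bw1.bw\tCond1\tR1\n"
    "bw2.bw\tCond1\tR2\n"
    "bw1.bw\tCond2\tR1\n"
    "bw2.bw\tCond2\tR2\n"
)


def read_design(handle):
    """Group the rows of a design file by condition.

    Returns a dict mapping each condition to a list of
    (replicate, bigwig filename) pairs, in file order.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["condition"]].append((row["replicate"], row["bigwig"]))
    return dict(groups)


# io.StringIO stands in for an open file; with a real file, pass the
# handle returned by open(path, newline="") instead.
design = read_design(io.StringIO(DESIGN_TEXT))
```

Grouping by condition mirrors the way the replicates of one condition are averaged when -s / --show_replicates is set to 'n'.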

Optional parameters      Description
-n / --nb_bin            An integer giving the number of bins used to represent the genomic intervals given with the -r / --region_beds parameter (default: 100)
-f / --figure_type       The kind of representation wanted: barplot or metagene (default: metagene)
-N / --norm              The number of the bin whose coverage will be scaled to 1, or 'None' for no normalisation (default: 'None'). This parameter can also take a file (described after this table).
-s / --show_replicates   'y' to create a figure showing the coverage for all replicates, 'n' to display only the average coverage across the conditions defined with the -d / --design parameter (default: y)
-e / --environment       A list of two integers. The first is the number of nucleotides to display around the genomic intervals of interest defined with the -r / --region_beds parameter, and the second is the number of bins used to represent this environment (default: 0 0)
-b / --border_name       A list of two strings: the names of the left and right borders to display in the figures between the genomic intervals defined with -r / --region_beds and their environment (default: '' '')
-o / --output            The folder where the figures will be stored (default: .)
-y / --ylim              A list of two integers giving the y-axis range of the figure (default: None)
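
The roles of -n / --nb_bin and of a numeric -N / --norm value can be sketched as follows. This is only an illustration under stated assumptions, not the code of the visu module: the function names are hypothetical, bins are assumed to be numbered from 0, and the coverage of an interval is assumed to be available as one value per nucleotide.

```python
def bin_coverage(values, nb_bin):
    """Average per-nucleotide coverage values into `nb_bin` bins of
    nearly equal width."""
    if nb_bin < 1 or nb_bin > len(values):
        raise ValueError("nb_bin must be between 1 and len(values)")
    bins = []
    for index in range(nb_bin):
        # Integer arithmetic spreads the remainder evenly over the bins.
        start = index * len(values) // nb_bin
        end = (index + 1) * len(values) // nb_bin
        chunk = values[start:end]
        bins.append(sum(chunk) / len(chunk))
    return bins


def normalise_on_bin(bins, norm_bin):
    """Scale all bins so that the bin `norm_bin` (0-based) becomes 1."""
    reference = bins[norm_bin]
    if reference == 0:
        raise ValueError("cannot normalise on a bin with zero coverage")
    return [value / reference for value in bins]


# Toy coverage of an 8-nucleotide interval, summarised in 4 bins and
# then normalised on the first bin.
coverage = [2, 2, 4, 4, 8, 8, 6, 6]
binned = bin_coverage(coverage, 4)
normalised = normalise_on_bin(binned, 0)
```

Binning lets intervals of different lengths be compared on a common axis, which is what makes an averaged metagene profile possible.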

The file given to the --norm parameter must be formatted like this:

condition region replicate coef
Cond1 interval R1 0.5
Cond1 interval R2 0.4
Cond2 interval R1 0.84
Cond2 interval R2 0.2
  • The condition column must contain the same conditions as the condition column of the design file.
  • The replicate column must contain the same replicate names as the replicate column of the design file.
  • The coef column contains the value used to normalise the coverage.
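
The two consistency rules above can be checked before launching the visu module. The sketch below is an illustration only: it assumes that both files are tab-separated with the header lines shown in this document, and read_table, coefficient_table and missing_coefficients are hypothetical helpers. How the coefficient is applied to the coverage is not shown here, since it is not described in this document.

```python
import csv
import io

DESIGN_TEXT = (
    "bigwig\tcondition\treplicate\n"
    "bw1.bw\tCond1\tR1\n"
    "bw2.bw\tCond1\tR2\n"
    "bw1.bw\tCond2\tR1\n"
    "bw2.bw\tCond2\tR2\n"
)

NORM_TEXT = (
    "condition\tregion\treplicate\tcoef\n"
    "Cond1\tinterval\tR1\t0.5\n"
    "Cond1\tinterval\tR2\t0.4\n"
    "Cond2\tinterval\tR1\t0.84\n"
    "Cond2\tinterval\tR2\t0.2\n"
)


def read_table(text):
    """Parse tab-separated text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def coefficient_table(norm_rows):
    """Map (condition, region, replicate) to its coefficient as a float."""
    return {
        (row["condition"], row["region"], row["replicate"]): float(row["coef"])
        for row in norm_rows
    }


def missing_coefficients(design_rows, norm_rows):
    """Return the (condition, replicate) pairs of the design file that
    have no line in the normalisation file."""
    wanted = {(row["condition"], row["replicate"]) for row in design_rows}
    available = {(row["condition"], row["replicate"]) for row in norm_rows}
    return sorted(wanted - available)


design_rows = read_table(DESIGN_TEXT)
norm_rows = read_table(NORM_TEXT)
coefficients = coefficient_table(norm_rows)
missing = missing_coefficients(design_rows, norm_rows)
```

An empty list of missing pairs means that every condition and replicate of the design file has a coefficient.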