
ChIA-PET network

Description

The goal of this module is to build a database of interaction data recovered from 51 ChIA-PET projects, and then to analyse those data to study the link between the spatial organization of the human genome and the regulation of alternative splicing.

This project is still under active development.

Prerequisites

  1. This project is written in Python 3.8 and uses the modules listed in requirements.txt. To install them with pip, enter in your terminal:
pip install -r requirements.txt
  2. PmagicGEO, a script that retrieves the metadata associated with a GSE or a GSM (GEO classification).

  3. The data folder must contain the following files:

data/
  bed/
    exon.bed # bed file containing every FasterDB exon
    exon_orf.bed # bed file containing the ORF of exons
    gene.bed # bed file containing every FasterDB gene
    intron.bed # bed file containing every FasterDB intron
  metadata_files/
    chia_pet_list_GSM.txt # A text file containing the selected public ChIA-PET experiments
    chia_pet_list.csv # A text file containing some metadata for the selected ChIA-PET experiments
  splicing_lore_data/
    ase_event.txt # A list of exon skipping events recovered from public datasets analysed with FaRLine
    splicing_lore_projects.txt # A list of the public experiments (with some associated metadata) analysed with FaRLine to recover the exon skipping events stored in ase_event.txt
  interactions_files/
    chr_sizes_hg19_nochr.txt # A text file containing the size of every chromosome in hg19
    chia_pet/ # folder containing bed files of interactions recovered from the selected public ChIA-PET datasets
      GSM1327093.bed
      ...
      GSM970216.bed
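
Before building the database, it can help to verify that the layout above is in place. The following is only a minimal sketch, not part of the project: the helper name `missing_inputs` and the `REQUIRED_FILES` list are hypothetical, and the paths are simply copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the tree above. The chia_pet/ folder is checked
# separately because the GSM*.bed files it holds depend on the selected datasets.
REQUIRED_FILES = [
    "bed/exon.bed",
    "bed/exon_orf.bed",
    "bed/gene.bed",
    "bed/intron.bed",
    "metadata_files/chia_pet_list_GSM.txt",
    "metadata_files/chia_pet_list.csv",
    "splicing_lore_data/ase_event.txt",
    "splicing_lore_data/splicing_lore_projects.txt",
    "interactions_files/chr_sizes_hg19_nochr.txt",
]


def missing_inputs(data_dir):
    """Return the relative paths of expected inputs that are absent from data_dir."""
    root = Path(data_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    chia_pet = root / "interactions_files" / "chia_pet"
    # Assumption: at least one interaction bed file is expected in chia_pet/.
    if not chia_pet.is_dir() or not any(chia_pet.glob("*.bed")):
        missing.append("interactions_files/chia_pet/*.bed")
    return missing


if __name__ == "__main__":
    for rel in missing_inputs("data"):
        print(f"missing: data/{rel}")
```

Run from the repository root, this prints one line per missing input and nothing when the data folder is complete.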

Associated publications:

FaRLine: Benoit-Pilven C, Marchet C, Chautard E, Lima L, Lambert MP, Sacomoto G, Rey A, Cologne A, Terrone S, Dulaurier L, Claude JB, Bourgeois CF, Auboeuf D, Lacroix V. Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data. Sci Rep. 2018 Mar 9;8(1):4307. doi: 10.1038/s41598-018-21770-7. PMID: 29523794; PMCID: PMC5844962.

FasterDB: Mallinjoud P, Villemin JP, Mortada H, Polay Espinoza M, Desmet FO, Samaan S, Chautard E, Tranchevent LC, Auboeuf D. Endothelial, epithelial, and fibroblast cells exhibit specific splicing programs independently of their tissue of origin. Genome Res. 2014 Mar;24(3):511-21. doi: 10.1101/gr.162933.113. Epub 2013 Dec 4. PMID: 24307554; PMCID: PMC3941115.

Creation of the database

To launch the creation of the database, enter the following command line:

$ python3 -m src.db_utils

This will create a database with the following structure:

[Figure: database structure]

Downloading some ENCODE eCLIP peaks

To automatically download the ENCODE eCLIP peaks of the selected projects, which are used to create some figures, you must have the following file in the data folder:

CLIP_bed/experiment_report_2021_1_22_16h_39m.tsv

It contains the ENCODE eCLIP experiments done by depleting a splicing factor or a transcription factor.

To download them, launch:

$ python3 -m src.download_encode_eclip

Creation of barplot figures indicating whether exons regulated by the same splicing factor are more often co-localised

To launch this submodule, make sure you have built the database described in the Creation of the database section.

Then, to launch the figure creation, just type:

$ python3 -m src.figures_utils

This will create the folder results/figures_all_chia_pet_datasets

[Figure: barplot]

Where:

  • DDX5/17 down corresponds to exons less included in proteins when DDX5/17 is depleted.
  • DDX5/17 up corresponds to exons more included in proteins when DDX5/17 is depleted.
  • Control down corresponds to 1000 control sets of randomly selected exons, each containing as many exons as the 'DDX5/17 down' set.
  • Control up corresponds to 1000 control sets of randomly selected exons, each containing as many exons as the 'DDX5/17 up' set.
  • The y-axis represents the number of exons directly in contact.
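
The comparison between a regulated exon set and its control sets can be sketched as follows. This is an illustration under stated assumptions, not the code in src.figures_utils: the function names, the `contacts` dictionary (exon identifier mapped to the set of exons it touches) and the toy exon identifiers are all hypothetical, and "directly in contact" is assumed to mean that an exon touches at least one other exon of the same set.

```python
import random


def count_in_contact(exons, contacts):
    """Count the exons that touch at least one other exon of the same set."""
    exon_set = set(exons)
    return sum(
        1
        for exon in exon_set
        if any(p in exon_set and p != exon for p in contacts.get(exon, ()))
    )


def control_counts(all_exons, test_exons, contacts, n_sets=1000, seed=0):
    """Repeat the count on n_sets random exon sets of the same size as test_exons."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = sorted(set(all_exons))
    size = len(set(test_exons))
    return [
        count_in_contact(rng.sample(pool, size), contacts)
        for _ in range(n_sets)
    ]


# Toy data (hypothetical identifiers): e1-e2 and e3-e4 are in contact.
contacts = {"e1": {"e2"}, "e2": {"e1"}, "e3": {"e4"}, "e4": {"e3"}}
all_exons = ["e1", "e2", "e3", "e4", "e5", "e6"]
regulated = ["e1", "e2", "e5"]

observed = count_in_contact(regulated, contacts)
controls = control_counts(all_exons, regulated, contacts)
```

With real data, `observed` would be one regulated bar of the plot and `controls` the distribution behind the matching control bar.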