ChIA-PET network
Description
The goal of this module is to build a database of interaction data recovered from 51 ChIA-PET projects, and then to analyse these data to study the link between the spatial organization of the human genome and the regulation of alternative splicing.
This project is still under active development.
Prerequisites
- This project is coded in python3.8 and uses the modules listed in requirements.txt. To install them with pip, enter in your terminal:
pip install -r requirements.txt
- PmagicGEO - a script that retrieves the metadata associated with a GSE or a GSM (GEO classification)
- You need to have the following files under your data folder:
data/
    bed/
        exon.bed # BED file containing every FasterDB exon
        exon_orf.bed # BED file containing the ORF of exons
        gene.bed # BED file containing every FasterDB gene
        intron.bed # BED file containing every FasterDB intron
    metadata_files/
        chia_pet_list_GSM.txt # A text file containing the selected public ChIA-PET experiments
        chia_pet_list.csv # A text file containing some metadata for the selected ChIA-PET experiments
    splicing_lore_data/
        ase_event.txt # A list of exon skipping events recovered from public datasets analysed with FaRLine
        splicing_lore_projects.txt # A list of the public experiments, with some associated metadata, analysed with FaRLine to recover the exon skipping events stored in ase_event.txt
    interactions_files/
        chr_sizes_hg19_nochr.txt # A text file containing the size of every chromosome in hg19
        chia_pet/ # Folder containing BED files of interactions recovered from the selected public ChIA-PET datasets
            GSM1327093.bed
            ...
            GSM970216.bed
Associated publications:
FaRLine: Benoit-Pilven C, Marchet C, Chautard E, Lima L, Lambert MP, Sacomoto G, Rey A, Cologne A, Terrone S, Dulaurier L, Claude JB, Bourgeois CF, Auboeuf D, Lacroix V. Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data. Sci Rep. 2018 Mar 9;8(1):4307. doi: 10.1038/s41598-018-21770-7. PMID: 29523794; PMCID: PMC5844962.
FasterDB: Mallinjoud P, Villemin JP, Mortada H, Polay Espinoza M, Desmet FO, Samaan S, Chautard E, Tranchevent LC, Auboeuf D. Endothelial, epithelial, and fibroblast cells exhibit specific splicing programs independently of their tissue of origin. Genome Res. 2014 Mar;24(3):511-21. doi: 10.1101/gr.162933.113. Epub 2013 Dec 4. PMID: 24307554; PMCID: PMC3941115.
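Before building the database, it can help to confirm that the input files listed above are in place. The following is a minimal sketch, not part of the project: the function name `find_missing` is hypothetical, and the nesting of the paths under data/ is an assumption based on the order of the listing above.

```python
from pathlib import Path

# Relative paths taken from the listing above. Their nesting under data/
# is an assumption inferred from the order of that listing.
REQUIRED_FILES = [
    "bed/exon.bed",
    "bed/exon_orf.bed",
    "bed/gene.bed",
    "bed/intron.bed",
    "metadata_files/chia_pet_list_GSM.txt",
    "metadata_files/chia_pet_list.csv",
    "splicing_lore_data/ase_event.txt",
    "splicing_lore_data/splicing_lore_projects.txt",
    "interactions_files/chr_sizes_hg19_nochr.txt",
]


def find_missing(data_dir, required=REQUIRED_FILES):
    """Return the required relative paths that are not files under data_dir."""
    root = Path(data_dir)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing("data")
    if missing:
        print("Missing input files:")
        for rel in missing:
            print(f"  data/{rel}")
    else:
        print("All listed input files were found.")
```

The ChIA-PET BED files themselves are not checked here, since the full list of GSM identifiers is not given in this document.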
Creation of the database
To launch the creation of the database, just enter the following command line:
$ python3 -m src.db_utils
This will create a database having the following structure:
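This document does not state which database engine is used or where the file is written. Assuming the result is a SQLite file (an assumption, not confirmed here), its tables and columns can be listed with Python's standard `sqlite3` module. The function name `describe_sqlite` and the example path are hypothetical.

```python
import sqlite3
from contextlib import closing


def describe_sqlite(db_path):
    """Return {table_name: [column names]} for a SQLite database file.

    Assumes the database is SQLite; this is not confirmed by the project.
    Note that sqlite3.connect creates an empty file if db_path does not exist.
    """
    schema = {}
    with closing(sqlite3.connect(db_path)) as conn:
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (name,) in tables:
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            # Each row of table_info is (cid, name, type, notnull, default, pk).
            schema[name] = [col[1] for col in columns]
    return schema


# Example call (hypothetical path, replace with the real database location):
# for table, columns in describe_sqlite("results/my_database.db").items():
#     print(table, columns)
```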
Downloading ENCODE eCLIP peaks
To automatically download ENCODE eCLIP peaks from the selected projects, which are used to create some figures, you must have the following file in the data folder: CLIP_bed/experiment_report_2021_1_22_16h_39m.tsv.
It lists ENCODE eCLIP experiments done by depleting a splicing factor or a transcription factor.
To download them, launch:
$ python3 -m src.download_encode_eclip
Creation of barplot figures indicating if exons regulated by the same splicing factor are more often co-localised.
To launch this submodule, make sure you have built the database as described in the Creation of the database section.
Then, to launch the figure creation, just type:
$ python3 -m src.figures_utils
This will create the folder results/figures_all_chia_pet_datasets
Where:
- DDX5/17 down corresponds to exons less included in proteins when DDX5/17 is depleted.
- DDX5/17 up corresponds to exons more included in proteins when DDX5/17 is depleted.
- Control down corresponds to 1000 control sets of randomly selected exons, each containing as many exons as the 'DDX5/17 down' set.
- Control up corresponds to 1000 control sets of randomly selected exons, each containing as many exons as the 'DDX5/17 up' set.
- The y-axis represents the number of exons directly in contact.
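
The comparison between a regulated exon set and its 1000 control sets can be sketched as follows. This is an illustration only, not the project's code: the function names are hypothetical, contacts are assumed to be available as pairs of exon identifiers, and "directly in contact" is read here as "in contact with at least one other exon of the same set", which this document does not specify.

```python
import random


def count_exons_in_contact(exons, contacts):
    """Count exons of the set that contact at least one other exon of the set.

    `contacts` is an iterable of (exon_a, exon_b) pairs (assumed format).
    """
    exon_set = set(exons)
    in_contact = set()
    for a, b in contacts:
        if a != b and a in exon_set and b in exon_set:
            in_contact.update((a, b))
    return len(in_contact)


def control_counts(all_exons, set_size, contacts, n_controls=1000, seed=0):
    """Draw n_controls random exon sets of set_size exons and count contacts in each."""
    rng = random.Random(seed)  # fixed seed so the control sets are reproducible
    pool = sorted(all_exons)
    return [
        count_exons_in_contact(rng.sample(pool, set_size), contacts)
        for _ in range(n_controls)
    ]


if __name__ == "__main__":
    # Toy data: exon identifiers are plain integers here.
    toy_contacts = [(1, 2), (2, 3), (4, 5)]
    regulated = [1, 2, 4]
    observed = count_exons_in_contact(regulated, toy_contacts)
    controls = control_counts(range(10), len(regulated), toy_contacts)
    print("observed:", observed)
    print("mean of controls:", sum(controls) / len(controls))
```

Each control set has the same number of exons as the regulated set, so the observed count can be compared directly with the distribution of control counts.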