chia-pet_network

  • Fontrodona Nicolas's avatar
    nfontrod authored
    src/figures_utils/TF_venn_of_project_with_common_origin.py: figure to create venn diagram between projects of the same origin
    ad1073b1

    ChIA-PET network

    Description

    The goal of this module is to build a database of interaction data recovered from 51 ChIA-PET projects, and then to analyse those data to study the link between the spatial organization of the human genome and the regulation of alternative splicing.

    This project is still under active development.

    Prerequisites

    1. This project is written in Python 3.8 and uses the modules listed in requirements.txt. To install them with pip, enter in your terminal:
    pip install -r requirements.txt
    2. PmagicGEO - a script that retrieves the metadata associated with a GSE or a GSM (GEO classification)

    3. The following files must be present in your data folder:

    data/
      bed/
        exon.bed # bed file containing every FasterDB exon
        exon_orf.bed # bed file containing the ORF of exons
        gene.bed # bed file containing every FasterDB gene
        intron.bed # bed file containing every FasterDB intron
      metadata_files/
        chia_pet_list_GSM.txt # A text file containing the selected public ChIA-PET experiments
        chia_pet_list.csv # A CSV file containing some metadata for the selected ChIA-PET experiments
      splicing_lore_data/
        ase_event.txt # A list of exon skipping events recovered from public datasets analysed with FaRLine
        splicing_lore_projects.txt # A list of the public experiments analysed with FaRLine, with some associated metadata, used to recover the exon skipping events stored in ase_event.txt
      interactions_files/
        chr_sizes_hg19_nochr.txt # A text file containing the size of every chromosome in hg19
        chia_pet/ # folder containing bed files of interactions recovered from the selected public ChIA-PET datasets
          GSM1327093.bed
          ...
          GSM970216.bed
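
Before building the database, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_data_files` and the rule that chia_pet/ must hold at least one .bed file are assumptions made for illustration.

```python
from pathlib import Path

# Relative paths taken from the data/ layout listed above.
# The chia_pet/ folder is checked separately because its contents vary.
REQUIRED_FILES = [
    "bed/exon.bed",
    "bed/exon_orf.bed",
    "bed/gene.bed",
    "bed/intron.bed",
    "metadata_files/chia_pet_list_GSM.txt",
    "metadata_files/chia_pet_list.csv",
    "splicing_lore_data/ase_event.txt",
    "splicing_lore_data/splicing_lore_projects.txt",
    "interactions_files/chr_sizes_hg19_nochr.txt",
]


def missing_data_files(data_dir):
    """Return the required paths (relative to data_dir) that are absent.

    Hypothetical helper, not provided by the project.
    """
    root = Path(data_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    chia_pet = root / "interactions_files" / "chia_pet"
    # Assumption: at least one interaction bed file is expected in chia_pet/.
    if not any(chia_pet.glob("*.bed")):
        missing.append("interactions_files/chia_pet/*.bed")
    return missing


if __name__ == "__main__":
    for rel in missing_data_files("data"):
        print(f"missing: data/{rel}")
```

Run from the repository root, the script prints one line per missing entry and nothing when the layout is complete.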

    Associated publications:

    FaRLine: Benoit-Pilven C, Marchet C, Chautard E, Lima L, Lambert MP, Sacomoto G, Rey A, Cologne A, Terrone S, Dulaurier L, Claude JB, Bourgeois CF, Auboeuf D, Lacroix V. Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data. Sci Rep. 2018 Mar 9;8(1):4307. doi: 10.1038/s41598-018-21770-7. PMID: 29523794; PMCID: PMC5844962.

    FasterDB: Mallinjoud P, Villemin JP, Mortada H, Polay Espinoza M, Desmet FO, Samaan S, Chautard E, Tranchevent LC, Auboeuf D. Endothelial, epithelial, and fibroblast cells exhibit specific splicing programs independently of their tissue of origin. Genome Res. 2014 Mar;24(3):511-21. doi: 10.1101/gr.162933.113. Epub 2013 Dec 4. PMID: 24307554; PMCID: PMC3941115.

    Creation of the database

    To launch the creation of the database, enter the following command line:

    $ python3 -m src.db_utils

    This will create a database with the following structure:

    database

    Downloading some ENCODE eCLIP peaks

    To automatically download the ENCODE eCLIP peaks of the selected projects, which are used to create some figures, you must have the following file in the data folder:

    "CLIP_bed" / "experiment_report_2021_1_22_16h_39m.tsv".

    It lists ENCODE eCLIP experiments done by depleting a splicing factor or a transcription factor.

    To download them, launch:

    $ python3 -m src.download_encode_eclip

    Creation of barplot figures indicating if exons regulated by the same splicing factor are more often co-localised.

    To launch this submodule, make sure you have built the database as described in the Creation of the database section.

    Then, to launch figure creation, just type:

    $ python3 -m src.figures_utils

    This will create the folder results/figures_all_chia_pet_datasets

    barplot

    Where:

    • DDX5/17 down corresponds to exons less included in proteins when DDX5/17 is depleted.
    • DDX5/17 up corresponds to exons more included in proteins when DDX5/17 is depleted.
    • Control down corresponds to 1000 control sets of randomly selected exons, each containing as many exons as the 'DDX5/17 down' set.
    • Control up corresponds to 1000 control sets of randomly selected exons, each containing as many exons as the 'DDX5/17 up' set.
    • The y-axis represents the number of exons directly in contact.
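
The control sets described above can be illustrated with a short sketch. This is not the code in src.figures_utils: the function names, the choice to draw controls only from exons outside the regulated set, and the definition of "in contact" (an exon counts if it touches at least one other exon of the same set) are all assumptions made for illustration.

```python
import random


def draw_control_sets(all_exons, regulated_exons, n_sets=1000, seed=0):
    """Draw n_sets random exon sets, each as large as regulated_exons.

    Hypothetical helper. Raises ValueError if the pool is smaller than
    the regulated set.
    """
    rng = random.Random(seed)
    regulated = set(regulated_exons)
    # Assumption: controls are drawn from exons outside the regulated set.
    pool = sorted(set(all_exons) - regulated)
    size = len(regulated)
    return [rng.sample(pool, size) for _ in range(n_sets)]


def count_exons_in_contact(exons, contacts):
    """Count exons of the set that touch at least one other exon of the set.

    contacts is an iterable of (exon_a, exon_b) pairs (assumed format).
    """
    members = set(exons)
    in_contact = set()
    for a, b in contacts:
        if a != b and a in members and b in members:
            in_contact.update((a, b))
    return len(in_contact)


if __name__ == "__main__":
    # Toy data: integer exon identifiers and a handful of contacts.
    all_exons = range(200)
    regulated = [1, 2, 3, 4]
    contacts = [(1, 2), (3, 50), (10, 11)]
    observed = count_exons_in_contact(regulated, contacts)
    controls = draw_control_sets(all_exons, regulated, n_sets=1000)
    control_counts = [count_exons_in_contact(c, contacts) for c in controls]
    print("observed:", observed)
    print("mean control:", sum(control_counts) / len(control_counts))
```

Comparing the observed count with the distribution of control counts is what a barplot of this kind summarises; the exact statistic used by the project is not stated here.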