Methods
A logistic regression analysis was performed to test whether internal exons down-regulated* by DDX5/17 or U1-70K** and internal exons not down-regulated by these proteins (control exons) differ in their closeness to CTCF peaks. We modeled the closeness to a CTCF peak as a function of the exon group using the glm function, with family = binomial(“logit”), in R (R Core Team 2018). An exon was considered close to a CTCF peak if it contained the peak, or if the peak was located at a distance of 1 b up to 1 or 2 kb upstream of, downstream of, or around the exon. Differences between every pair of exon groups were tested with Tukey’s test (emmeans function from the R package emmeans). The same statistical analysis was also performed on only the first or last exon of genes having at least one internal exon. Its purpose was to test whether genes with at least one exon down-regulated* by DDX5/17 or U1-70K and the other genes (control) have a first or last exon with a different closeness to CTCF peaks. Genes containing more than one exon with heterogeneous regulation were removed from the analysis.
* Other case tested: ... up- or down-regulated ...
** Other proteins tested:
- SRSF1
- RBMX
- RBM25
- HNRNPK
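The closeness criterion described in the Methods paragraph can be sketched as follows. This is only an illustration in Python, not the code used for the analysis (which was run in R). The names Interval, distance, is_close and exon_response are hypothetical, coordinates are assumed to be 0-based and half-open, adjacent intervals are assumed to be at distance 1 b, and only the "around the exon" case (no strand information) is covered. A peak that partially overlaps an exon without being contained in it is not counted as close here, which is one literal reading of the text.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    """Genomic interval; start is inclusive, end is exclusive (assumption)."""
    chrom: str
    start: int
    end: int


def distance(exon: Interval, peak: Interval) -> Optional[int]:
    """Distance in bases between exon and peak.

    Returns None on different chromosomes, 0 when the intervals overlap,
    and 1 for directly adjacent intervals (assumed convention).
    """
    if exon.chrom != peak.chrom:
        return None
    if peak.start < exon.end and exon.start < peak.end:
        return 0
    if peak.start >= exon.end:
        return peak.start - exon.end + 1
    return exon.start - peak.end + 1


def is_close(exon: Interval, peak: Interval, max_dist: int = 1000) -> bool:
    """True if the exon contains the peak or the peak lies 1..max_dist b away."""
    if exon.chrom != peak.chrom:
        return False
    if exon.start <= peak.start and peak.end <= exon.end:
        return True
    d = distance(exon, peak)
    return d is not None and 1 <= d <= max_dist


def exon_response(exon: Interval, peaks: List[Interval], max_dist: int = 1000) -> int:
    """Binary response (1 = close to at least one peak) for a logistic model."""
    return int(any(is_close(exon, p, max_dist) for p in peaks))
```

The value returned by exon_response for each exon would play the role of the binary outcome modeled by the logistic regression, with the exon group as the explanatory variable. Setting max_dist to 1000 or 2000 corresponds to the 1 kb and 2 kb thresholds.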
Weblogo PAS
The script create_pas_weblogo was developed to identify motifs located within the last 50 nucleotides of the last exons of genes regulated or not by DDX5/17.
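The kind of input a sequence logo is built from can be illustrated with a short sketch. The code below is not the create_pas_weblogo script, whose implementation is not described here. It only shows, under stated assumptions, how the last n nucleotides of each last-exon sequence could be extracted and summarized as per-position nucleotide frequencies. The function names are hypothetical, sequences are assumed to be given 5' to 3' in transcript orientation, and sequences shorter than n are assumed to be skipped.

```python
from collections import Counter
from typing import Dict, List


def last_nucleotides(seq: str, n: int = 50) -> str:
    """Return the last n nucleotides of a sequence, in upper case."""
    return seq[-n:].upper()


def position_frequencies(seqs: List[str], n: int = 50) -> List[Dict[str, float]]:
    """Per-position A/C/G/T frequencies over the last n nucleotides.

    Sequences shorter than n are skipped (assumption), so that all
    retained windows are aligned on the 3' end of the exon.
    """
    tails = [last_nucleotides(s, n) for s in seqs if len(s) >= n]
    if not tails:
        return []
    freqs = []
    for i in range(n):
        counts = Counter(tail[i] for tail in tails)
        freqs.append({base: counts.get(base, 0) / len(tails) for base in "ACGT"})
    return freqs
```

Computing one such frequency table for the regulated genes and one for the control genes would allow the two sets of last exons to be compared position by position.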