DESeq2-wrapper

sudo apt-get install pandoc
> library(devtools)
> install_gitlab("LBMC/regards/deseq2-wrapper", host = "https://gitbio.ens-lyon.fr", quiet = FALSE)
mkdir deseq2-wrapper
git clone http://gitbio.ens-lyon.fr/LBMC/regards/deseq2-wrapper.git deseq2-wrapper
cd deseq2-wrapper
R
library(devtools)
install(".")
# my_R_file.R

#!/bin/Rscript
library(DESeqwrapper)
cli_run_deseq2()
$ Rscript my_R_file.R --help

usage: my_R_file.R [--] [--help] [--opts OPTS] [--output OUTPUT]
       [--filter FILTER] [--gene_expression_threshold
       GENE_EXPRESSION_THRESHOLD] [--basemean_threshold
       BASEMEAN_THRESHOLD] [--lfc_threshold LFC_THRESHOLD] design path
       formula contrasts

Wrapper to perform DESeq2 enrichment analysis

positional arguments:
  design                           A file containing the design of
                                   interest
  path                             The folder where the tsv files are
                                   stored (default .)
  formula                          The design formula for DESeq2
  contrasts                        The comparisons to perform in the
                                   form of deseq2 contrasts: each
                                   comparison must be composed of
                                   three features separated by a '-'.
                                   The first feature corresponds to a
                                   column given in the design file and
                                   present in the formula, the two
                                   others are the names of the
                                   numerator level and of the
                                   denominator level of the fold change

flags:
  -h, --help                       show this help message and exit

optional arguments:
  -x, --opts                       RDS file containing argument values
  -o, --output                     folder where the results will be
                                   created [default: .]
  -f, --filter                     A file containing the gene IDs to
                                   keep [default: ]
  -g, --gene_expression_threshold  A threshold to keep genes with at
                                   least this average expression across
                                   all samples prior to deseq2 analysis
                                   [default: 2]
  -b, --basemean_threshold         A threshold to keep significant
                                   genes with at least this baseMean
                                   (after deseq2 analysis) [default: 0]
  -l, --lfc_threshold              A threshold to keep significant
                                   genes with at least this log2fc
                                   [default: 0]
  -c, --gene_names                 Optional: A file containing the
                                   correspondence between geneIDs and
                                   gene names. The first column
                                   contains geneIDs and the header is
                                   'gene'. The second column contains
                                   the corresponding gene names and the
                                   header is 'gene_name' [default: ]
  -n, --norm                       A two-column dataframe containing
                                   the columns sample and size_factor.
                                   The sample column must correspond to
                                   the sample names given in the design
                                   file. size_factor corresponds to the
                                   scaling value to apply to count data
                                   in deseq2 [default: ]
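The `contrasts` argument packs three fields into a single `-`-separated string, so a typo in a column or level name is easy to miss. The following is a minimal sketch of a pre-check, assuming a tab-separated design file with a header row like the design table shown in the R example of this README. The helper `check_contrast` and the file `design_example.txt` are hypothetical and not part of the package. The sketch prints the command line instead of running it, and it cannot handle level names that themselves contain a `-`.

```shell
#!/bin/sh
# Hypothetical example design file (tab-separated), mirroring the README's table.
printf 'count_files\tsample\tcondition\n' > design_example.txt
printf '276_DMSO_DMSO_0_no_spike-in.tsv\t276_DMSO_DMSO_0\tDMSO\n' >> design_example.txt
printf '276_BRAF_DMSO_0_no_spike-in.tsv\t276_BRAF_DMSO_0\tBRAF\n' >> design_example.txt

# check_contrast DESIGN CONTRAST  (hypothetical helper)
# Succeeds when CONTRAST has the form "column-numerator-denominator", the
# column is in the design header, and both levels occur in that column.
check_contrast() {
    awk -F '\t' -v contrast="$2" '
        BEGIN { if (split(contrast, part, "-") != 3) { bad = 1; exit } }
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == part[1]) col = i; next }
        col && $col == part[2] { num = 1 }
        col && $col == part[3] { den = 1 }
        END { exit (bad || !col || !num || !den) }
    ' "$1"
}

contrast="condition-BRAF-DMSO"
if check_contrast design_example.txt "$contrast"; then
    # Positional order follows the usage line: design path formula contrasts.
    echo "Rscript my_R_file.R --output results design_example.txt . '~ condition' $contrast"
else
    echo "invalid contrast: $contrast" >&2
fi
```

Only one contrast is shown here, since the help text does not say how several contrasts are passed on the command line.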
sample  size_factor
276_DMSO_DMSO_0 1
277_DMSO_DMSO_0 2
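The three lines above show what a `--norm` file looks like. Because its `sample` column has to match the sample names of the design file, a quick consistency check before launching the analysis can save a failed run. This is a sketch under assumptions: both files are taken to be tab-separated with a header row, and the helper `samples_missing_from_norm` and the two example file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical inputs: a two-sample design file and a matching size-factor file.
printf 'count_files\tsample\tcondition\n' > design_example.txt
printf '276_DMSO_DMSO_0_no_spike-in.tsv\t276_DMSO_DMSO_0\tDMSO\n' >> design_example.txt
printf '277_DMSO_DMSO_0_no_spike-in.tsv\t277_DMSO_DMSO_0\tDMSO\n' >> design_example.txt
printf 'sample\tsize_factor\n276_DMSO_DMSO_0\t1\n277_DMSO_DMSO_0\t2\n' > norm_example.txt

# samples_missing_from_norm DESIGN NORM  (hypothetical helper)
# Prints every sample named in DESIGN that has no row in NORM.
samples_missing_from_norm() {
    awk -F '\t' '
        # Header of each file: locate its "sample" column.
        FNR == 1 { col = 0; for (i = 1; i <= NF; i++) if ($i == "sample") col = i; next }
        # First file read (the norm table): remember its samples.
        NR == FNR { seen[$col] = 1; next }
        # Second file read (the design table): report unknown samples.
        !($col in seen) { print $col }
    ' "$2" "$1"
}

missing=$(samples_missing_from_norm design_example.txt norm_example.txt)
if [ -z "$missing" ]; then
    echo "all design samples have a size factor"
else
    echo "samples without a size factor: $missing" >&2
fi
```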
> library(DESeqwrapper)
> ?run_deseq2 #display the help of run_deseq2 function
> # example usage
> design <- "path_to/design.txt"
> (design_table <- read.table(design, sep="\t", header = TRUE))
                      count_files          sample condition
1 276_DMSO_DMSO_0_no_spike-in.tsv 276_DMSO_DMSO_0      DMSO
2 277_DMSO_DMSO_0_no_spike-in.tsv 277_DMSO_DMSO_0      DMSO
3 278_DMSO_DMSO_0_no_spike-in.tsv 278_DMSO_DMSO_0      DMSO
4 276_BRAF_DMSO_0_no_spike-in.tsv 276_BRAF_DMSO_0      BRAF
5 277_BRAF_DMSO_0_no_spike-in.tsv 277_BRAF_DMSO_0      BRAF
6 278_BRAF_DMSO_0_no_spike-in.tsv 278_BRAF_DMSO_0      BRAF
> path <- "path/to/htseq_files"
> my_formula <- "~ condition"
> filter_list <- c("gene1", "...", "genen") # A vector containing the IDs of the genes to keep. Set it to NULL (the default) to keep all genes
> output_folder <- "results/"
> my_contrasts <- list(c("condition", "BRAF", "DMSO"), # contrasts: a list in which each element gives the design column used for the comparison, then the test condition, then the control condition
+ c("condition", "DMSO", "BRAF")) # more than one differential expression analysis can be run this way (here BRAF vs DMSO and DMSO vs BRAF)
> res <- run_deseq2(design_table, path, my_contrasts,
+             my_formula, filter_list,
+             min_expression = 2, output_folder = output_folder,
+             basemean_threshold = 0, lfc_threshold = 0)
> # Note that lfc_threshold, basemean_threshold and min_expression correspond to the lfc_threshold, basemean_threshold and gene_expression_threshold parameters of the CLI program (see above), respectively.
> # res is a named list with one element per comparison defined in my_contrasts
> str(res, max.level=1) # 2 comparisons in my_contrasts, so we have a list of 2 elements
List of 2
 $ BRAF_vs_DMSO:List of 4
 $ DMSO_vs_BRAF:List of 4
> str(res$BRAF_vs_DMSO, max.level=1)
List of 4
 $ data      :'data.frame':     1 obs. of  4 variables:
 $ dds       :Formal class 'DESeqDataSet' [package "DESeq2"] with 8 slots
 $ results   :'data.frame':     11893 obs. of  7 variables:
 $ de_results:'data.frame':     1620 obs. of  7 variables:
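The design table lists one count file per sample, and these files are presumably read from the folder given as `path`. The sketch below checks that every listed file is actually there before the analysis is started. It assumes a tab-separated design file with a `count_files` header, as in the table above. The helper `missing_count_files` and the example file and folder names are hypothetical.

```shell
#!/bin/sh
# Hypothetical layout: a design file plus a folder holding one of the two count files.
mkdir -p htseq_example
printf 'count_files\tsample\tcondition\n' > design_example.txt
printf '276_DMSO_DMSO_0_no_spike-in.tsv\t276_DMSO_DMSO_0\tDMSO\n' >> design_example.txt
printf '276_BRAF_DMSO_0_no_spike-in.tsv\t276_BRAF_DMSO_0\tBRAF\n' >> design_example.txt
: > htseq_example/276_DMSO_DMSO_0_no_spike-in.tsv

# missing_count_files DESIGN FOLDER  (hypothetical helper)
# Prints each entry of the count_files column that is not a file in FOLDER.
missing_count_files() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "count_files") col = i; next }
        col { print $col }
    ' "$1" | while IFS= read -r name; do
        [ -f "$2/$name" ] || printf '%s\n' "$name"
    done
}

missing_count_files design_example.txt htseq_example
```

With the example layout, only the BRAF count file is reported, since it was deliberately not created.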
result_folder
├── DE_plots_TEST_vs_CTRL.pdf
├── DE_statistics.txt
├── Differential_expression_TEST_vs_CTRL_lfct_0.585.txt
├── Differential_expression_TEST_vs_CTRL_lfct_0.585_baseMt_10_sig.txt
├── plots_general_view.pdf
├── readcounts_norm_[N]genes.csv
└── readcounts_raw_[N]genes.csv
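The per-comparison file names in the tree above embed the comparison name and what look like the log2 fold change and baseMean thresholds. The sketch below builds those names and reports which ones exist in an output folder. The naming pattern is inferred from the tree only, not from the package source, so the package may format the numbers differently. The helpers `expected_outputs` and `report_outputs` are hypothetical.

```shell
#!/bin/sh
# expected_outputs COMPARISON LFC_THRESHOLD BASEMEAN_THRESHOLD  (hypothetical helper)
# Prints the per-comparison file names, following the pattern seen in the
# result tree (an inferred pattern, not taken from the package source).
expected_outputs() {
    printf 'DE_plots_%s.pdf\n' "$1"
    printf 'Differential_expression_%s_lfct_%s.txt\n' "$1" "$2"
    printf 'Differential_expression_%s_lfct_%s_baseMt_%s_sig.txt\n' "$1" "$2" "$3"
}

# report_outputs FOLDER COMPARISON LFC_THRESHOLD BASEMEAN_THRESHOLD  (hypothetical helper)
# Marks each expected file as present or missing in FOLDER.
report_outputs() {
    expected_outputs "$2" "$3" "$4" | while IFS= read -r name; do
        if [ -f "$1/$name" ]; then
            echo "present  $name"
        else
            echo "missing  $name"
        fi
    done
}

report_outputs results BRAF_vs_DMSO 0.585 10
```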