diff --git a/session_1/slides.Rmd b/session_1/slides.Rmd new file mode 100644 index 0000000000000000000000000000000000000000..afa5846791657956601aab7c51997d8a9c9c5825 --- /dev/null +++ b/session_1/slides.Rmd @@ -0,0 +1,552 @@ +--- +title: "R#1: Introduction to R and RStudio" +author: Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr) +date: 10 Oct 2019 +output: + beamer_presentation: + theme: "metropolis" + slide_level: 3 + fig_caption: false + df_print: tibble + highlight: tango + latex_engine: xelatex +--- +## R#1: Introduction to R and RStudio +The goal of this practical is to familiarize yourself with R and the RStudio +environment. + +The objectives of this session will be to: + +- Understand the purpose of each pane in RStudio +- Do basic computation with R +- Define variables and assign data to variables +- Manage a workspace in R +- Call functions +- Manage packages + +## Acknowledgments + +\begin{columns} + \begin{column}{0.5\textwidth} + \includegraphics[width=\textwidth]{img/software_carpentry_logo} + {\bf https://software-carpentry.org/} + http://swcarpentry.github.io/r-novice-gapminder/ + \end{column} + \begin{column}{0.5\textwidth} + \includegraphics[width=\textwidth]{img/r_for_data_science.png} + \end{column} +\end{columns} + +# Some R background + +\includegraphics[width=40pt]{img/Rlogo.png} +is a programming language and free software environment for statistical +computing and graphics supported by the *R Foundation for Statistical Computing*. + +# Some R background + +\includegraphics[width=40pt]{img/Rlogo.png} + +- Created by **Ross Ihaka** and **Robert Gentleman** +- initial version released in 1995 +- free and open-source implementation of the S programming language +- currently developed by the **R Development Core Team**. 
+ +# Some R background + +Reasons to use \includegraphics[width=40pt]{img/Rlogo.png} + +- It’s free, well documented, and runs almost everywhere +- it has a large (and growing) user base among scientists +- it has a large library of external packages available for performing diverse tasks. + +- **15,068** available packages on https://cran.r-project.org/ +- **3,087** available packages on http://www.bioconductor.org +- **122,720** available repositories on https://github.com/ + +# Some R background + +\includegraphics[width=\textwidth]{img/R_terminal.png} + +# RStudio, the R IDE + +\begin{block}{IDE: Integrated development environment} +application that provides {\bf comprehensive facilities} to computer programmers for +software development +\end{block} + +- free +- open source + +## An interface + +\includegraphics[width=\textwidth]{img/RStudio.png} + +## The same console as before + +\includegraphics[width=\textwidth]{img/RStudio_console.png} + +# R as a calculator + +- Add: `+` +- Divide: `/` +- Multiply: `*` +- Subtract: `-` +- Exponents: `^` or `**` +- Parentheses: `(`, `)` + +# R as a calculator + +```R +1 + 100 +1 + +``` + +\pause + +```R +3 + 5 * 2 +``` + +```R +(3 + 5) * 2 +``` + +\pause + +```R +(3 + (5 * (2 ^ 2))) # hard to read +3 + 5 * 2 ^ 2 # clear, if you remember the rules +3 + 5 * (2 ^ 2) # if you forget some rules, this might help +``` + +\pause + +```R +2/10000 +``` + +\pause + +`2e-4` is shorthand for `2 * 10^(-4)` + +```R +5e3 +``` + +## Mathematical functions + +```R +log(1) # natural logarithm +``` + +\pause + +```R +log10(10) # base-10 logarithm +``` + +\pause + +```R +exp(0.5) +``` + +\pause + +Compute the factorial of `9` + +\pause + +```R +factorial(9) +``` + +## Comparing things + +equality (note the two equal signs, read as "is equal to") +```R +1 == 1 +``` +\pause + +inequality (read as "is not equal to") +```R +1 != 2 +``` +\pause + +less than +```R +1 < 2 +``` +\pause + +less than or equal to +```R +1 <= 1 +``` +\pause + +greater than 
+```R +1 > 0 +``` + +## Variables and assignment + +`<-` is the assignment operator in R (read as "the left member takes the value of the right member"). + +```R +x <- 1/40 +``` + +```R +x +``` + +## The environment + +\includegraphics[width=\textwidth]{img/RStudio_environment.png} + +## Variables and assignment + +```R +log(x) +x <- 100 +log(x) +``` +\pause + +```R +x <- x + 1 +y <- x * 2 +``` + +\pause + +```R +z <- "x" +x + z +``` + +## Variables and assignment + +Variable names can contain letters, numbers, underscores and periods. + +They cannot start with a number nor contain spaces at all. + +Different people use different conventions for long variable names; these include: + +```R +periods.between.words +underscores_between_words +camelCaseToSeparateWords +``` + +What you use is up to you, but be consistent. + +\pause +It is also possible to use the `=` operator for assignment, but **don’t do it!** + +## Variables and assignment + +Which of the following are valid R variable names? + +``` +min_height +max.height +_age +.mass +MaxLength +min-length +2widths +celsius2kelvin +``` + +**http://perso.ens-lyon.fr/laurent.modolo/R/1_a** + +## Functions are also variables + +```R +logarithm <- log +``` + +\pause + +An R function can have several arguments + +```R +function (x, base = exp(1)) +``` + +- `base` is a named argument; unnamed arguments are read from left to right +- named arguments break the reading order +- named arguments make your code more readable + +\pause + +To know more about the `log` function we can read its manual. 
+ +```R +help(log) +``` + +\pause + +```R +?log +``` + +## Various output + +\includegraphics[width=\textwidth]{img/RStudio_outputs.png} + +## Functions are also variables + +Test that your `logarithm` function can work in base 10 + +\pause + +```R +10^logarithm(12, base = 10) +``` + +## Functions are also variables + +We can also define our own function with +```R +function_name <- function(a, b){ + result_1 <- operation1(a, b) + result_2 <- operation2(result_1, b) + return(result_2) +} +``` + +\pause + +write a function to test the base of the logarithm function + +\pause + +```R +base_test <- function(x, base){ + log_result <- logarithm(x, base=base) + exp_result <- base^log_result + test_result <- x == exp_result + return(test_result) +} +``` + +**http://perso.ens-lyon.fr/laurent.modolo/R/1_b** + +## Functions are also variables + +```R +base_test <- function(x, base){ + print(x) + log_result <- logarithm(x, base=base) + print(log_result) + exp_result <- base^log_result + print(exp_result) + print(x) + test_result <- x == exp_result + return(test_result) +} +``` + +**http://perso.ens-lyon.fr/laurent.modolo/R/1_c** + +## Functions are also variables + +```R +base_test <- function(x, base){ + print(x) + log_result <- logarithm(x, base=base) + print(log_result) + exp_result <- base^log_result + print(exp_result) + print(x) + test_result <- isTRUE(all.equal(x, exp_result)) + return(test_result) +} +``` + +**http://perso.ens-lyon.fr/laurent.modolo/R/1_d** + +## Functions are also variables + +```R +base_test <- function(x, base){ + return(isTRUE(all.equal(x, base^logarithm(x, base=base)))) +} +``` + +**http://perso.ens-lyon.fr/laurent.modolo/R/1_e** + +## The environment + +\includegraphics[width=\textwidth]{img/RStudio_environment.png} + +## A code editor + +\includegraphics[width=\textwidth]{img/RStudio_editor.png} + +## A code editor + +RStudio offers you great flexibility in running code from within the editor window. 
There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can + +- click on the Run button above the editor panel, or +- select “Run Lines” from the “Code” menu, or +- hit Ctrl+Return in Windows or Linux or Cmd+Return on OS X. To run a block of code, select it and then Run. + +If you have modified a line of code within a block of code you have just run, there is no need to reselect the section and Run; you can use the next button along, Rerun the previous region. This will run the previous code block including the modifications you have made. + +## A code editor + +Copy your `logarithm` and `base_test` functions into a `tp_1.R` file + +\pause + +We can now clean your environment + +```R +rm(x) +``` + +\pause + +```R +?rm +``` + +\pause + +```R +ls() +``` + +\pause + +```R +rm(list = ls()) +``` + +## Installing packages + +```R +install.packages("tidyverse") +``` + +```R +install.packages("ggplot2") +``` + +## Installing packages + +\includegraphics[width=\textwidth]{img/RStudio_outputs.png} + +## Loading packages + +```R +sessionInfo() +``` + +\pause + +```R +library(tidyverse) +``` + +\pause + +```R +sessionInfo() +``` + +\pause + +```R +unloadNamespace("tidyverse") +``` + +\pause + +```R +sessionInfo() +``` + +# Complex variable type + +## Vector (aka list) + +```R +c(1, 2, 3, 4, 5) +``` + +\pause + +```R +1:5 +``` + +\pause + +```R +2^(1:5) +``` + +\pause + +```R +x <- 1:5 +2^x +``` + +\pause + +```R +log(x) +logarithm(x) +base_test(x, base = 10) +``` + +## Vector (aka list) + +```R +typeof(x) +``` + +\pause + +```R +typeof(x + 0.5) +``` + +\pause + +```R +is.vector(x) +``` + +\pause + +```R +y <- c(a = 1, b = 2, c = 3, d = 4, e = 5) +typeof(y) +is.vector(y) +``` + +\pause + +```R +x == y +``` + +\pause + +```R +all.equal(x, y) +``` + + diff --git a/session_3/slides.Rmd b/session_3/slides.Rmd new file mode 100644 index 0000000000000000000000000000000000000000..38ad95939f4ea11da89544c886b116f4a42165cb --- /dev/null +++ b/session_3/slides.Rmd @@ 
-0,0 +1,98 @@ +--- +title: "R#3: stats with ggplot2" +author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)" +date: "08 Nov 2019" +output: + beamer_presentation: + theme: metropolis + slide_level: 3 + fig_caption: no + df_print: tibble + highlight: tango + latex_engine: xelatex + slidy_presentation: + highlight: tango +--- + +```{r setup, include=FALSE, cache=TRUE} +knitr::opts_chunk$set(echo = FALSE) +library(tidyverse) +``` + +## R#3: stats with ggplot2 +The goal of this practical is to practice advanced features of `ggplot2`. + +The objectives of this session will be to: +- learn about statistical transformations +- practice position adjustments +- change the coordinate systems + +## `ggplot2` statistical transformations + +We are going to use the `diamonds` data set included in `tidyverse`. + +- Use the `help` and `View` commands to explore this data set. +- Try the `str` command: what information is displayed? + +## `ggplot2` statistical transformations + +```{r diamonds_barplot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut)) +``` + +More diamonds are available with high quality cuts. + +## `ggplot2` statistical transformations + +On the x-axis, the chart displays cut, a variable from diamonds. On the y-axis, it displays count, but count is not a variable in diamonds! + +The algorithm used to calculate new values for a graph is called a **stat**, short for statistical transformation. The figure below describes how this process works with `geom_bar()`. + +\includegraphics[width=\textwidth]{img/visualization-stat-bar.png} + +## `ggplot2` statistical transformations + +You can generally use geoms and stats interchangeably. 
For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`: + +```{r diamonds_stat_count, eval=FALSE, message=FALSE} +ggplot(data = diamonds) + + stat_count(mapping = aes(x = cut)) +``` + +## `ggplot2` statistical transformations + +Every geom has a default stat; and every stat has a default geom. This means that you can typically use geoms without worrying about the underlying statistical transformation. There are three reasons you might need to use a stat explicitly: + +- You might want to override the default stat. **3_a** +- You might want to override the default mapping from transformed variables to aesthetics. **3_b** +- You might want to draw greater attention to the statistical transformation in your code. **3_c** + +## Statistical transformation challenge + +- What does `geom_col()` do? How is it different to `geom_bar()`? +- What variables does `stat_smooth()` compute? What parameters control its behaviour? +- In our proportion bar chart, we need to set `group = 1`. Why? In other words what is the problem with these two graphs? 
+ +```{r diamonds_stats_challenge, eval=FALSE, message=FALSE} +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut, y = ..prop..)) +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..)) +``` + +## Position adjustments +You can colour a bar chart using either the `colour` aesthetic, or, more usefully, `fill`: + +```{r diamonds_barplot_color, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut, colour = cut)) +``` + +## Position adjustments +You can colour a bar chart using either the `colour` aesthetic, or, more usefully, `fill`: + +```{r diamonds_barplot_fill, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut, fill = cut)) +``` \ No newline at end of file diff --git a/web/3_a b/web/3_a new file mode 100644 index 0000000000000000000000000000000000000000..e383cce7aac91f4fc266afbc3de624b66c2288c5 --- /dev/null +++ b/web/3_a @@ -0,0 +1,15 @@ +demo <- tribble( + ~cut, ~freq, + "Fair", 1610, + "Good", 4906, + "Very Good", 12082, + "Premium", 13791, + "Ideal", 21551 +) + +ggplot(data = demo) + + geom_bar(mapping = aes(x = cut, y = freq), stat = "identity") + +# (Don’t worry that you haven’t seen <- or tribble() before. You might be able +# to guess at their meaning from the context, and you’ll learn exactly what +# they do soon!) 
\ No newline at end of file diff --git a/web/3_b b/web/3_b new file mode 100644 index 0000000000000000000000000000000000000000..ef64ecc9c037c646266b292359267b6d3cab97d2 --- /dev/null +++ b/web/3_b @@ -0,0 +1,4 @@ +# you might want to display a bar chart of proportion, rather than count: + +ggplot(data = diamonds) + + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1)) \ No newline at end of file diff --git a/web/3_c b/web/3_c new file mode 100644 index 0000000000000000000000000000000000000000..d5d848265c9cb95dff92fdc081cf799f4746c29a --- /dev/null +++ b/web/3_c @@ -0,0 +1,15 @@ +# you might use stat_summary(), which summarises the y values for each unique x +# value, to draw attention to the summary that you’re computing: + +ggplot(data = diamonds) + + stat_summary( + mapping = aes(x = cut, y = depth) + ) + +ggplot(data = diamonds) + + stat_summary( + mapping = aes(x = cut, y = depth), + fun.ymin = min, + fun.ymax = max, + fun.y = median + ) \ No newline at end of file