
Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Showing
with 2769 additions and 781 deletions
This diff is collapsed.
session_6/img/overview_joins.png

50.5 KiB

session_6/img/overview_set.png

11.5 KiB

session_6/img/pivot_longer.png

21.1 KiB

session_6/img/pivot_wider.png

21.7 KiB

This diff is collapsed.
---
title: "R#6: tidydata"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "19 Dec 2019"
output:
  slidy_presentation:
    highlight: tango
  beamer_presentation:
    theme: metropolis
    slide_level: 3
    fig_caption: no
    df_print: tibble
    highlight: tango
    latex_engine: xelatex
---
```{r setup, include=FALSE, echo = F}
library(tidyverse)
library(nycflights13)
flights2 <- flights %>%
  select(year:day, hour, origin, dest, tailnum, carrier)
```
## Tidydata
There are three interrelated rules which make a dataset tidy:
- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
```{r load_data, eval=T, message=T}
library(tidyverse)
```
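## Tidydata
As a small illustration of why these rules are convenient, here is a sketch that assumes the `table1` example dataset shipped with tidyr (it is not used elsewhere in these slides). Because each variable already has its own column, a derived variable is a single vectorised expression.
```{r tidy_rate, eval=T, message=T}
library(tidyverse)
# table1 (assumed example dataset from tidyr): one row per country and year,
# with columns country, year, cases and population
tidy_rate <- table1 %>%
  mutate(rate = cases / population * 10000) # TB cases per 10,000 people
tidy_rate
```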
## pivot longer
```{r table4a, eval=T, message=T}
table4a # number of TB cases
```
## pivot longer
```{r pivot_longer, eval=T, message=T}
table4a %>%
  pivot_longer(-country,
               names_to = "year",
               values_to = "case")
```
## pivot wider
```{r table2, eval=T, message=T}
table2
```
## pivot wider
```{r pivot_wider, eval=T, message=T}
table2 %>%
  pivot_wider(names_from = type,
              values_from = count)
```
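## pivot longer and pivot wider
The two verbs are inverses of each other. The sketch below lengthens `table4a` and then widens it again; the names `long` and `wide` are illustrative only.
```{r pivot_round_trip, eval=T, message=T}
library(tidyverse)
# lengthen: the year columns become values of a single "year" column
long <- table4a %>%
  pivot_longer(-country,
               names_to = "year",
               values_to = "case")
# widen: the values of "year" become column names again
wide <- long %>%
  pivot_wider(names_from = year,
              values_from = case)
wide
```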
## Relational data
Sometimes the information is split across several tables
```{r airlines, eval=F, echo = T}
library(nycflights13)
flights
airlines
airports
weather
flights2 <- flights %>%
  select(year:day, hour, origin, dest, tailnum, carrier)
```
## Relational data
```{r airlines_dag, echo=FALSE, out.width='100%'}
knitr::include_graphics('img/relational-nycflights.png')
```
## joins
```{r joins, echo=FALSE, out.width='100%'}
knitr::include_graphics('img/join-venn.png')
```
## `inner_join()`
Matches pairs of observations whenever their keys are equal
```{r inner_join, eval=T}
flights2 %>%
  inner_join(airlines)
```
## `left_join()`
Keeps all observations in `x`
```{r left_join, eval=T}
flights2 %>%
  left_join(airlines)
```
## `right_join()`
Keeps all observations in `y`
```{r right_join, eval=T}
flights2 %>%
  right_join(airlines)
```
## `full_join()`
Keeps all observations in `x` and `y`
```{r full_join, eval=T}
flights2 %>%
  full_join(airlines)
```
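## Comparing the four joins
The difference between the four joins is easier to see on two tiny tables where some keys do not match. This is a sketch on made-up data: `x`, `y`, `val_x` and `val_y` are hypothetical names, not part of nycflights13.
```{r join_toy, eval=T, message=T}
library(tidyverse)
# key 3 exists only in x, key 4 exists only in y
x <- tribble(
  ~key, ~val_x,
  1, "x1",
  2, "x2",
  3, "x3"
)
y <- tribble(
  ~key, ~val_y,
  1, "y1",
  2, "y2",
  4, "y3"
)
# number of rows returned by each join
n_rows <- c(
  inner = nrow(inner_join(x, y, by = "key")),
  left  = nrow(left_join(x, y, by = "key")),
  right = nrow(right_join(x, y, by = "key")),
  full  = nrow(full_join(x, y, by = "key"))
)
n_rows
```
Only the keys 1 and 2 match, so `inner_join()` returns 2 rows, `left_join()` and `right_join()` return 3 rows each, and `full_join()` returns 4 rows, with `NA` filling the missing values.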
## Defining the key columns
The default, `by = NULL`, uses all variables that appear in both tables, a so-called natural join.
```{r left_join_weather, eval=T}
flights2 %>%
  left_join(weather)
```
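## Defining the key columns
To see which columns a natural join will use, one can compare the column names of the two tables. The sketch below does this and then passes the result explicitly to `by`; `common` and `explicit` are illustrative names.
```{r natural_join_keys, eval=T, message=T}
library(tidyverse)
library(nycflights13)
flights2 <- flights %>%
  select(year:day, hour, origin, dest, tailnum, carrier)
# columns present in both tables: these are the keys of the natural join
common <- intersect(names(flights2), names(weather))
common
# the same join with the keys spelled out
explicit <- flights2 %>%
  left_join(weather, by = common)
```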
## Defining the key columns
A character vector: `by = "x"`. This is like a natural join, but uses only some of the common variables.
```{r left_join_tailnum, eval=T, echo = T}
flights2 %>%
  left_join(planes, by = "tailnum")
```
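## Defining the key columns
Both `flights2` and `planes` have a `year` column that is not used as a key here, so the join disambiguates them with suffixes. A minimal sketch; the suffix labels `_flight` and `_built` are arbitrary choices for illustration.
```{r left_join_suffix, eval=T, message=T}
library(tidyverse)
library(nycflights13)
flights2 <- flights %>%
  select(year:day, hour, origin, dest, tailnum, carrier)
# default suffixes: year.x comes from flights2, year.y from planes
joined <- flights2 %>%
  left_join(planes, by = "tailnum")
joined %>%
  select(tailnum, year.x, year.y)
# the suffix argument replaces .x and .y with more readable labels
renamed <- flights2 %>%
  left_join(planes, by = "tailnum", suffix = c("_flight", "_built"))
renamed %>%
  select(tailnum, year_flight, year_built)
```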
## Defining the key columns
A named character vector: `by = c("a" = "b")`. This will match variable `a` in table `x` to variable `b` in table `y`.
```{r left_join_airport, eval=T, echo = T}
flights2 %>%
  left_join(airports, by = c("dest" = "faa"))
```
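## Defining the key columns
The same pattern works for the departure airport. In this sketch the output keeps the key name from `x` (`origin`), and the `faa` column of `airports` does not appear in the result; `with_origin` is an illustrative name.
```{r left_join_origin, eval=T, message=T}
library(tidyverse)
library(nycflights13)
flights2 <- flights %>%
  select(year:day, hour, origin, dest, tailnum, carrier)
# match origin (in flights2) to faa (in airports)
with_origin <- flights2 %>%
  left_join(airports, by = c("origin" = "faa"))
names(with_origin)
```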
## Filtering joins
Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables. There are two types:
- `semi_join(x, y)` keeps all observations in `x` that have a match in `y`.
- `anti_join(x, y)` drops all observations in `x` that have a match in `y`.
## Filtering joins
```{r top_dest, eval=T, echo = T}
top_dest <- flights %>%
  count(dest, sort = TRUE) %>%
  head(10)
flights %>%
  semi_join(top_dest)
```
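## Filtering joins
`anti_join()` is the complement of `semi_join()`, which makes it useful for diagnosing mismatches between tables. The sketch below looks for flights whose `tailnum` has no entry in `planes`; `no_plane` and `has_plane` are illustrative names.
```{r anti_join_planes, eval=T, message=T}
library(tidyverse)
library(nycflights13)
# flights without a matching plane
no_plane <- flights %>%
  anti_join(planes, by = "tailnum")
no_plane %>%
  count(tailnum, sort = TRUE)
# flights with a matching plane
has_plane <- flights %>%
  semi_join(planes, by = "tailnum")
# together the two filtering joins partition the rows of flights
nrow(has_plane) + nrow(no_plane) == nrow(flights)
```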
## Set operations
These expect the x and y inputs to have the same variables, and treat the observations like sets:
- `intersect(x, y)`: return only observations in both `x` and `y`.
- `union(x, y)`: return unique observations in `x` and `y`.
- `setdiff(x, y)`: return observations in `x`, but not in `y`.
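## Set operations
A minimal sketch of the three operations on two made-up tables that share the same variables; `df1` and `df2` are hypothetical and only the row (1, 1) is common to both.
```{r set_ops_toy, eval=T, message=T}
library(tidyverse)
df1 <- tribble(
  ~x, ~y,
  1, 1,
  2, 1
)
df2 <- tribble(
  ~x, ~y,
  1, 1,
  1, 2
)
# rows present in both tables
in_both <- intersect(df1, df2)
# unique rows across the two tables
in_either <- union(df1, df2)
# rows of df1 that are absent from df2
only_df1 <- setdiff(df1, df2)
in_both
in_either
only_df1
```
Note that `setdiff()` is not symmetric: `setdiff(df2, df1)` returns the row (1, 2) instead.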
This diff is collapsed.
FROM rocker/tidyverse
RUN apt-get update \
    && apt-get install -y \
        libxt6 \
        cargo
RUN Rscript -e "install.packages('rmdformats')"
#session 1
RUN Rscript -e "install.packages('rvest')"
RUN Rscript -e "install.packages('remotes'); remotes::install_github('rlesur/klippy')"
#session 3
RUN Rscript -e "install.packages('gganimate')"
RUN Rscript -e "install.packages('gifski')"
RUN Rscript -e "install.packages('openxlsx')"
#session 4
RUN Rscript -e "install.packages(c('ghibli', 'nycflights13','viridis','ggrepel'))"
\ No newline at end of file
#!/bin/bash
set -euo pipefail
TAG="v2022"
IMAGE_NAME="r_for_beginners"
DOCKERFILE_DIR="."
REPO="carinerey/$IMAGE_NAME"
echo "## Build docker: $REPO:$TAG ##"
docker build -t "$REPO:$TAG" "$DOCKERFILE_DIR"
echo "## Build docker: $REPO ##"
docker build -t "$REPO" "$DOCKERFILE_DIR"
# Push only when the first argument is "push_yes" (empty default when no argument is given)
if [[ "${1:-}" == "push_yes" ]]
then
    echo "## Push docker ##"
    docker push "$REPO:$TAG"
    docker push "$REPO"
fi
#! /usr/bin/bash
# USAGE
#wget -qO - http://perso.ens-lyon.fr/laurent.modolo/R/create_users_from_mail.sh | bash -s usertest@mail.fr usertest2@mail.f
# wget -qO - http://perso.ens-lyon.fr/laurent.modolo/R/create_users_from_mail.sh | tr -d '\r' | bash -s usertest@mail.fr usertest2@mail.f
# Store the arguments in an array so that each mail address stays one element
USERMAILS=("$@")
for USERMAIL in "${USERMAILS[@]}"
do
USERNAME=$(echo ${USERMAIL} | sed -E 's/(.*)@.*/\1/')
adduser ${USERNAME} --gecos 'First Last,RoomNumber,WorkPhone,HomePhone' --disabled-password
PASSWD=$(openssl rand -base64 20)
echo "${USERNAME}:${PASSWD}" | chpasswd
adduser ${USERNAME} --gecos 'First Last,RoomNumber,WorkPhone,HomePhone' --disabled-password --force-badname > /dev/null
PASSWD=$(openssl rand -base64 10)
echo "${USERNAME}:${PASSWD}" | chpasswd > /dev/null
echo "======================================================================="
echo "${USERMAIL}:"
echo "${USERNAME}"
......
This diff is collapsed.
File moved
This diff is collapsed.