---
title: "R#4: data transformation"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "08 Nov 2019"
output:
  beamer_presentation:
    theme: metropolis
    slide_level: 3
    fig_caption: no
    df_print: tibble
    highlight: tango
    latex_engine: xelatex
  slidy_presentation:
    highlight: tango
---
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
R#4: data transformation
The goal of this practical is to practice data transformation with the tidyverse.
The objectives of this session will be to:
- Filter rows with filter()
- Arrange rows with arrange()
- Select columns with select()
- Add new variables with mutate()
- Combine multiple operations with the pipe %>%
nycflights13
nycflights13::flights contains all 336,776 flights that departed from New York City in 2013. The data comes from the US Bureau of Transportation Statistics and is documented in ?flights.
library(nycflights13)
library(tidyverse)
nycflights13
flights
- int stands for integers.
- dbl stands for doubles, or real numbers.
- chr stands for character vectors, or strings.
- dttm stands for date-times (a date + a time).
- lgl stands for logical, vectors that contain only TRUE or FALSE.
- fct (fctr in older versions of tibble) stands for factors, which R uses to represent categorical variables with fixed possible values.
- date stands for dates.
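To see how these abbreviations map onto the actual columns of flights, the short sketch below lists every column with glimpse() and then looks up the base R class of four of them. The variable name col_classes is only illustrative and is not part of the original practical.

```r
library(nycflights13)
library(dplyr)

# One line per column: name, type abbreviation, first few values
glimpse(flights)

# Base R class of each column, to compare with the abbreviations above
# (col_classes is an illustrative name, not part of the practical)
col_classes <- vapply(flights, function(col) class(col)[1], character(1))

# int -> integer, dbl -> numeric, chr -> character, dttm -> POSIXct
col_classes[c("year", "dep_delay", "carrier", "time_hour")]
```

The flights table has no lgl, fct or date columns, so those abbreviations will only show up in other datasets.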
Filter rows with filter()
filter() allows you to subset observations based on their values.
filter(flights, month == 1, day == 1)
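In the call above, the conditions separated by commas are combined with a logical AND: a row is kept only if every condition is TRUE. The sketch below checks this by writing the same filter with an explicit &. The names jan1_commas and jan1_and are illustrative only.

```r
library(nycflights13)
library(dplyr)

# Comma-separated conditions: a row must satisfy all of them
jan1_commas <- filter(flights, month == 1, day == 1)

# The same selection written with an explicit logical AND
jan1_and <- filter(flights, month == 1 & day == 1)

# Both calls keep the same number of rows
nrow(jan1_commas) == nrow(jan1_and)

# Every remaining row is a flight from January 1st
all(jan1_commas$month == 1 & jan1_commas$day == 1)
```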
Filter rows with filter()
dplyr functions never modify their inputs, so if you want to save the result, you’ll need to use the assignment operator <-.
jan1 <- filter(flights, month == 1, day == 1)
R either prints out the results or saves them to a variable. To do both, wrap the assignment in parentheses:
(dec25 <- filter(flights, month == 12, day == 25))
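The claim that dplyr functions never modify their inputs can be checked directly. The sketch below counts the rows of flights before and after a call to filter(). The names n_before and n_after are illustrative only.

```r
library(nycflights13)
library(dplyr)

n_before <- nrow(flights)

# The filtered rows are stored in jan1; nothing is printed
jan1 <- filter(flights, month == 1, day == 1)

# flights itself is untouched by the call above
n_after <- nrow(flights)

n_before == n_after    # the input still has all its rows
nrow(jan1) < n_before  # only the new object holds the subset
```

Without the assignment, the filtered table would be printed once and then lost.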