title: "R.4: data transformation"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr), Hélène Polvèche [hpolveche@istem.fr](mailto:hpolveche@istem.fr)"
date: "2022"
library(fontawesome)
if ("conflicted" %in% .packages()) {
conflicted::conflicts_prefer(dplyr::filter)
}
rm(list = ls())
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
Introduction
The goal of this session is to practice data transformation with the tidyverse.
The objectives will be to:
- Filter rows with filter()
- Arrange rows with arrange()
- Select columns with select()
- Add new variables with mutate()
For this session, we are going to work with a new dataset included in the nycflights13 package.
Solution
install.packages("nycflights13")
library("tidyverse")
library("nycflights13")
Dataset: nycflights13
nycflights13::flights contains all 336,776 flights that departed from New York City in 2013.
The data come from the US Bureau of Transportation Statistics and are documented in the ?flights help page:
?flights
You can display the first rows of the dataset to have an overview of the data.
flights
You can use the function colnames(dataset) to get all the column names of a table:
colnames(flights)
Data type
In programming languages, variables can have different types.
When you display a tibble, you can see the type of each column.
Here is a list of common variable types that you will encounter:
- int stands for integers.
- dbl stands for doubles or real numbers.
- chr stands for character vectors or strings.
- dttm stands for date-times (a date + a time).
- lgl stands for logical, vectors that contain only TRUE or FALSE.
- fct stands for factors, which R uses to represent categorical variables with fixed possible values.
- date stands for dates.
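As a small illustration of these abbreviations, the sketch below builds a toy tibble with one column per type and prints it. The table example_types and its column names are made up for this example and are not part of nycflights13. It assumes the pillar package is available (it is installed as a dependency of tibble) and uses pillar::type_sum() to list the label of each column.

```r
library(tibble)

# Hypothetical toy table: one column per type listed above
example_types <- tibble(
  an_int = 1:3,
  a_dbl = c(1.5, 2, 3.25),
  a_chr = c("a", "b", "c"),
  a_dttm = as.POSIXct(
    c("2013-01-01 05:15:00", "2013-01-01 05:29:00", "2013-01-01 05:40:00"),
    tz = "UTC"
  ),
  a_lgl = c(TRUE, FALSE, TRUE),
  a_fct = factor(c("low", "high", "low")),
  a_date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03"))
)

# Printing the tibble shows the type abbreviation under each column name
example_types

# The same abbreviations, collected as a named character vector
type_labels <- vapply(example_types, pillar::type_sum, character(1))
type_labels
```

Running the same vapply() call on flights gives a quick summary of the column types of the real dataset.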
It's important for you to know about and understand the different types because certain operations are only allowed between certain types. For instance, you cannot add an int to a chr, but you can add an int to a dbl, and the result will be a dbl.
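The int, dbl and chr example above can be checked directly in base R. This is a minimal sketch; the variable names are chosen only for illustration, and the error is caught with tryCatch() so that the rest of the code still runs.

```r
an_int <- 2L   # the L suffix creates an integer
a_dbl <- 0.5   # a double
a_chr <- "3"   # a character string, even though it looks like a number

# int + dbl is allowed: the integer is promoted to a double
int_plus_dbl <- an_int + a_dbl
typeof(int_plus_dbl)

# int + chr is not allowed: R raises an error, whose message is kept here
int_plus_chr <- tryCatch(
  an_int + a_chr,
  error = function(e) conditionMessage(e)
)
int_plus_chr

# Converting the string explicitly makes the addition valid
int_plus_converted <- an_int + as.integer(a_chr)
typeof(int_plus_converted)
```

When a column was read as chr but holds numbers, converting it with as.integer() or as.numeric() before doing arithmetic is one way to avoid this error.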