`nycflights13::flights` contains all 336,776 flights that departed from New York City in 2013.
The data comes from the US Bureau of Transportation Statistics and is documented in `?flights`.
```R
?flights
```
You can display the first rows of the dataset to get an overview of the data.
```{r display_data, include=TRUE}
flights
```
To list all the column names of a table, you can use the function `colnames(dataset)`.
```{r display_colnames, include=TRUE}
colnames(flights)
```
## Data type
In programming languages, not all variables are equal.
...
You cannot add an **int** to a **chr**, but you can add an **int** to a **dbl**.
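The rule about adding an **int** to a **dbl** but not to a **chr** can be checked directly in the console. The sketch below uses base R only; the variable names are made up for the illustration.

```R
# Hypothetical values, one per type
n_int <- 2L      # integer (int)
x_dbl <- 0.5     # double (dbl)
s_chr <- "3"     # character (chr)

# int + dbl works: the integer is promoted to a double
sum_num <- n_int + x_dbl
typeof(sum_num)  # "double"

# int + chr fails: R raises an error instead of guessing a conversion
res <- tryCatch(n_int + s_chr, error = function(e) "error")
res              # "error"
```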
Variable **types** are important to keep in mind for comparisons.
The `filter()` function allows you to subset observations based on their values.
<div class="pencadre">
A good reflex when you meet a new function from a package is to read its help page with `?function_name` to learn how to use it and which arguments it takes.
```R
?filter
```
What is the result of the following `filter` command?
```{r filter_month_day, include=TRUE, eval=FALSE}
filter(flights, month == 1, day == 1)
```
</div>
## Use tests to filter on a column
You can use the relational operators (`<`,`>`,`==`,`<=`,`>=`,`!=`) to test a column and keep the rows for which the result is `TRUE`.
```{r filter_sup_eq, include=TRUE, eval=FALSE}
filter(flights, air_time >= 680)
filter(flights, carrier == "HA")
filter(flights, origin != "JFK")
```
The operator `%in%` is very useful to test whether a value belongs to a vector of candidate values.
```{r filter_sup_inf, include=TRUE, eval=FALSE}
filter(flights, carrier %in% c("OO","AS"))
filter(flights, month %in% c(5,6,7,12))
```
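To see what `%in%` returns on its own, here is a minimal base R sketch on a made-up vector of carrier codes (the `carriers` vector is an assumption for the example, not taken from `flights`). `filter()` then keeps the rows where this logical vector is `TRUE`.

```R
# Hypothetical carrier codes
carriers <- c("OO", "UA", "AS", "DL")

# One TRUE/FALSE per element of `carriers`
keep <- carriers %in% c("OO", "AS")
keep                      # TRUE FALSE TRUE FALSE

# The same test written with == and |
keep_or <- carriers == "OO" | carriers == "AS"
identical(keep, keep_or)  # TRUE
```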
`dplyr` functions never modify their inputs, so if you want to save the result, you’ll need to use the assignment operator, `<-`.
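As a small illustration of this behaviour, the sketch below filters a made-up three-row table (`toy` and `jan` are hypothetical names) and checks that the input is left untouched unless the result is assigned.

```R
library(dplyr)

# Hypothetical three-row table
toy <- data.frame(month = c(1, 1, 12), day = c(1, 2, 25))

# The result is printed, but `toy` itself is not modified
filter(toy, month == 1)
nrow(toy)   # 3

# The filtered rows are kept only once they are assigned
jan <- filter(toy, month == 1)
nrow(jan)   # 2
```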
<div class="pencadre">
Save the result of `filter(flights, month == 1, day == 1)` in a `jan1` variable.
Save the flights longer than 680 minutes in a `long_flights` variable.
</div>
<details><summary>Solution</summary>
<p>
```{r filter_day_sav, include=TRUE}
jan1 <- filter(flights, month == 1, day == 1)
long_flights <- filter(flights, air_time >= 680)
```
</p>
</details>
## Logical operators to filter on several columns
Multiple arguments to `filter()` are combined with **AND**: every expression must be `TRUE` in order for a row to be included in the output.
```{r filter_month_day_sav, include=TRUE}
filter(flights, month == 12, day == 25)
```
In R you can use the symbols `&` (and), `|` (or), `!` (not) and the function `xor()` to build other kinds of tests.
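As an illustration, the sketch below applies each of these operators to a small made-up table. The table `toy` and its values are assumptions for the example; only the column names are borrowed from `flights`.

```R
library(dplyr)

# Hypothetical four-row table
toy <- data.frame(
  month   = c(11, 12, 12, 1),
  carrier = c("HA", "HA", "UA", "UA")
)

# AND: both tests must be TRUE (1 row)
and_rows <- filter(toy, month == 12 & carrier == "HA")
# OR: at least one test must be TRUE (3 rows)
or_rows  <- filter(toy, month == 12 | carrier == "HA")
# NOT: invert a test (2 rows)
not_rows <- filter(toy, !(carrier == "HA"))
# XOR: exactly one of the two tests is TRUE (2 rows)
xor_rows <- filter(toy, xor(month == 12, carrier == "HA"))
```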

<div class="pencadre">
Display the `long_flights` variable and predict the results of
```{r logical_operators_exemples2, eval=FALSE}
```
</div>
Combining logical operators is a powerful programmatic way to select subsets of data.
Keep in mind, however, that long logical expressions can be hard to read and understand, so it may be easier to apply several small successive filters instead of one long one.
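A minimal sketch of this idea, on a made-up table: the same subset is obtained once with a single long expression and once with two small successive filters. `toy`, `one_step`, `summer` and `two_steps` are hypothetical names.

```R
library(dplyr)

# Hypothetical four-row table with flights-like columns
toy <- data.frame(
  month    = c(1, 6, 7, 12),
  carrier  = c("HA", "AS", "HA", "OO"),
  air_time = c(700, 300, 690, 120)
)

# One long logical expression
one_step <- filter(toy, (month == 6 | month == 7) & carrier == "HA" & air_time >= 680)

# The same selection as two small, readable steps
summer    <- filter(toy, month %in% c(6, 7))
two_steps <- filter(summer, carrier == "HA", air_time >= 680)

# Both approaches keep the same single flight
one_step
two_steps
```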
<div class="pencadre">
R either prints out the results, or saves them to a variable.
What happens when you put your variable assignment code between parentheses `(` `)`?
```{r filter_month_day_sav_display, eval=FALSE}
(dec25 <- filter(flights, month == 12, day == 25))
```
</div>
## Missing values
One important feature of R that can make comparisons tricky is missing values, or `NA`s, for **Not Available**.
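A short base R sketch of why such comparisons are tricky: a test that involves an unknown value is itself unknown. The `delay` vector is made up for the example.

```R
# Hypothetical delays, one of them missing
delay <- c(5, NA, 30)

# Comparing with NA gives NA, not TRUE or FALSE
gt10 <- delay > 10
gt10                  # FALSE NA TRUE
na_eq <- NA == NA
na_eq                 # NA

# is.na() is the reliable way to detect missing values
missing <- is.na(delay)
missing               # FALSE TRUE FALSE
```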