@@ -228,7 +228,7 @@ One important feature of R that can make comparison tricky is missing values, or
Indeed, each variable type can contain either a value of that type (e.g., `2` for an **int**) or nothing.
The *nothing recorded in a variable* status is represented by the `NA` symbol.
Because operations on `NA` values generally cannot produce a meaningful result, if you have an `NA` somewhere in your operation, the result will usually be `NA` too:
```{r filter_logical_operators_NA, include=TRUE}
NA > 5
...
@@ -245,16 +245,19 @@ is.na(NA)
`filter()` only includes rows where the condition is `TRUE`; it excludes both `FALSE` and `NA` values. If you want to preserve missing values, ask for them explicitly:
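
A minimal sketch of this behaviour, assuming a small made-up table `df` instead of `flights` (both `df` and the chunk name are illustrative, not part of the tutorial data):

```{r filter_keep_na_sketch, include=TRUE}
library(dplyr)

# Hypothetical toy table with one missing value
df <- tibble(x = c(1, NA, 3))

# The row where the condition evaluates to NA is dropped with the FALSE one
filter(df, x > 1)

# Asking for the missing values explicitly keeps the NA row as well
filter(df, is.na(x) | x > 1)
```

The second call combines `is.na()` with the original condition using `|`, so a row is kept if it is either missing or satisfies the test.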
`select()` lets you quickly zoom in on a useful subset of columns using operations based on the names of the variables.
You can select by column names:
```{r select_ymd_a, include=TRUE}
select(flights, year, month, day)
```
By defining a range of columns:
```{r select_ymd_b, include=TRUE}
select(flights, year:day)
```
Or, you can use a minus sign (`-`) to remove columns:
```{r select_ymd_c, include=TRUE}
select(flights, -(year:day))
```
You can also rename the selected columns on the fly:
```{r select_ymd_d, include=TRUE}
select(flights, Y = year, M = month, D = day)
```
## Helper functions
There are a number of helper functions you can use within `select()`:

- `starts_with("abc")`: matches column names that begin with `"abc"`.
- `ends_with("xyz")`: matches column names that end with `"xyz"`.
- `contains("ijk")`: matches column names that contain `"ijk"`.
- `num_range("x", 1:3)`: matches `x1`, `x2` and `x3`.
- `where(test_function)`: selects the columns for which `test_function` returns `TRUE`.

See `?select` for more details.
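
The helpers listed above can be tried out on a small table. This is only a sketch: the table `df`, its column names, and the chunk name are made up for illustration and are not part of `flights`.

```{r select_helpers_sketch, include=TRUE}
library(dplyr)

# Hypothetical toy table whose column names exercise each helper
df <- tibble(abc_1 = 1:2, val_xyz = c("a", "b"), x1 = 3:4, x2 = 5:6, x3 = 7:8)

select(df, starts_with("abc"))   # abc_1
select(df, ends_with("xyz"))     # val_xyz
select(df, contains("_"))        # abc_1, val_xyz
select(df, num_range("x", 1:3))  # x1, x2, x3
select(df, where(is.numeric))    # every column except val_xyz
```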
## Challenges
<div class="pencadre">
<p>
- Brainstorm as many ways as possible to select only `dep_time`, `dep_delay`, `arr_time`, and `arr_delay` from `flights`. You can combine several selection arguments with `|`, `&` and `!`.
- `all_of()` is for strict selection. If any of the variables in the character vector is missing, an error is thrown.
- `any_of()` doesn't check for missing variables. It is especially useful with negative selections, when you want to make sure a variable is removed.
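```{r challenge_select_b2, eval=FALSE}
# `vars` is assumed to be the character vector of column names defined earlier;
# "toto" is added as a name that is not a column of `flights`
vars <- c(vars, "toto")
select(flights, any_of(vars))  # ignores the missing "toto"
select(flights, all_of(vars))  # throws an error because "toto" is missing
```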
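
The negative-selection use of `any_of()` mentioned above could look like the following sketch. The table `df`, the vector `drop`, and the chunk name are hypothetical and only serve the illustration.

```{r any_of_negative_sketch, include=TRUE}
library(dplyr)

# Hypothetical toy table and a list of columns we want gone
df <- tibble(a = 1, b = 2)
drop <- c("b", "toto")  # "toto" is not a column of df

# any_of() removes the columns that exist and ignores the others
select(df, -any_of(drop))

# all_of() is strict: the same call with all_of() raises an error
res <- try(select(df, -all_of(drop)), silent = TRUE)
inherits(res, "try-error")
```

Wrapping the strict call in `try()` is just a way to let the chunk run to the end despite the error.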
</p>
</details>
- Select all columns which contain character values. What about numeric values?
</p>
</div>
<details><summary>Solution</summary>
<p>
```{r challenge_select_e1, eval=FALSE}
select(flights, where(is.character))
select(flights, where(is.numeric))
```
</p>
</details>
- Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?