Commit 878f2d8f authored by Laurent Modolo

update session 7 and 8

parent bc19eac5
1 merge request: !6 Switch to main as default branch
...@@ -6,7 +6,7 @@ output: ...@@ -6,7 +6,7 @@ output:
rmdformats::downcute: rmdformats::downcute:
self_contain: true self_contain: true
use_bookdown: true use_bookdown: true
default_style: "dark" default_style: "light"
lightbox: true lightbox: true
css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css" css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css"
--- ---
......
...@@ -6,7 +6,7 @@ output: ...@@ -6,7 +6,7 @@ output:
rmdformats::downcute: rmdformats::downcute:
self_contain: true self_contain: true
use_bookdown: true use_bookdown: true
default_style: "dark" default_style: "light"
lightbox: true lightbox: true
css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css" css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css"
--- ---
...@@ -276,7 +276,7 @@ flights %>% ...@@ -276,7 +276,7 @@ flights %>%
mutate(wday = strftime(time_hour,'%A')) %>% mutate(wday = strftime(time_hour,'%A')) %>%
group_by(wday) %>% group_by(wday) %>%
mutate( mutate(
prop_cancel_day = sum(canceled)/sum(!canceled), prop_cancel_day = sum(canceled)/n(),
av_delay = mean(dep_delay, na.rm = TRUE) av_delay = mean(dep_delay, na.rm = TRUE)
) %>% ) %>%
ungroup() %>% ungroup() %>%
......
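The change in the hunk above replaces a ratio of canceled to non-canceled flights with a proportion of all flights. A minimal sketch of the difference, using a made-up logical vector (an assumption for illustration, not the `flights` data):

```r
# Hypothetical toy data: one canceled flight out of four.
canceled <- c(TRUE, FALSE, FALSE, FALSE)

# Old expression: canceled flights divided by non-canceled flights (odds).
odds <- sum(canceled) / sum(!canceled)

# New expression: canceled flights divided by all flights (proportion).
# Inside a dplyr verb, n() plays the role of length(canceled) here.
proportion <- sum(canceled) / length(canceled)

print(c(odds = odds, proportion = proportion))
```

The proportion is also what `mean(canceled)` returns, since `TRUE` counts as 1 and `FALSE` as 0.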
--- ---
title: '#7 String & RegExp' title: "R.7: String & RegExp"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)" author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "08 Nov 2019" date: "2021"
always_allow_html: yes
output: output:
beamer_presentation: rmdformats::downcute:
theme: metropolis self_contain: true
slide_level: 3 use_bookdown: true
fig_caption: no default_style: "light"
df_print: tibble lightbox: true
highlight: tango css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css"
latex_engine: xelatex
slidy_presentation:
highlight: tango
--- ---
```{r setup, include=FALSE, cache=TRUE}
knitr::opts_chunk$set(echo = FALSE) ```{r setup, include=FALSE}
library(tidyverse) rm(list=ls())
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
``` ```
```{r klippy, echo=FALSE, include=TRUE}
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
# Introduction
In the previous sessions, we often overlooked a particular type of data, the **string**.
In R a sequence of characters is stored as a string.
## String basics In this session you will learn the distinctive features of the string type and how we can use strings of characters within a programming language, which is itself composed of particular strings of characters such as function names and variables.
<div class="pencadre">
As usual we will need the `tidyverse` library.
</div>
<details><summary>Solution</summary>
<p>
```{r load_data, eval=T, message=F}
library(tidyverse)
``` ```
</p>
</details>
# String basics
## String definition
A string can be defined within double `"` or single `'` quotes:
```{r string_def, eval=F, message=T}
string1 <- "This is a string" string1 <- "This is a string"
string2 <- 'If I want to include a "quote" string2 <- 'If I want to include a "quote"
inside a string, I use single quotes' inside a string, I use single quotes'
...@@ -37,38 +64,35 @@ If you forget to close a quote, you’ll see +, the continuation character: ...@@ -37,38 +64,35 @@ If you forget to close a quote, you’ll see +, the continuation character:
+ HELP I'M STUCK + HELP I'M STUCK
``` ```
If this happen to you, press Escape and try again! If this happens to you, press `Escape` and try again!
## String basics To include a literal single or double quote in a string you can use \\ to *escape* it:
To include a literal single or double quote in a string you can use \ to “escape” it: ```{r string_def_escape, eval=F, message=T}
```
double_quote <- "\"" # or '"' double_quote <- "\"" # or '"'
single_quote <- '\'' # or "'" single_quote <- '\'' # or "'"
``` ```
if you want to include a literal backslash, you’ll need to double it up: `"\\"`. If you want to include a literal backslash, you’ll need to double it up: `"\\"`.
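A small base R sketch of the escaping rules above. The variable names are illustrative only; the point is that an escape sequence typed with two characters is stored as a single character.

```r
# Each of these strings holds exactly one character.
double_quote <- "\""
backslash <- "\\"
n_chars <- nchar(c(double_quote, backslash))
print(n_chars)

# Escaped quotes inside a longer string.
sentence <- "She said \"hi\""
writeLines(sentence)  # shows the string content without the escapes
```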
## String basics
## String representation
the printed representation of a string is not the same as string itself The printed representation of a string is not the same as the string itself:
``` ```{r string_rep_escape_a, eval=T, message=T}
x <- c("\"", "\\") x <- c("\"", "\\")
x x
#> [1] "\"" "\\" ```
```{r string_rep_escape_b, eval=T, message=T}
writeLines(x) writeLines(x)
#> "
#> \
``` ```
## String basics Some characters have a special representation, they are called **special characters**.
The most common are `"\n"`, newline, and `"\t"`, tabulation, but you can see the complete list by requesting help on `"`: `?'"'`
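As a quick illustration of special characters (a sketch with a made-up string), `print()` shows the escaped representation while `writeLines()` shows how the tab and newline are actually rendered:

```r
# A hypothetical two-line, tab-separated string.
x <- "name\tvalue\nA\t1"

# Each special character counts as a single character.
n <- nchar(x)
print(n)

print(x)       # escaped representation
writeLines(x)  # rendered tabs and newline
```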
Special characters: ## String operation
The most common are `"\n"`, newline, and `"\t"`, tab, but you can see the complete list by requesting help on `"`: `?'"'` You can perform basic operations on strings, such as:
## String basics
- String length - String length
...@@ -87,9 +111,8 @@ x <- c("Apple", "Banana", "Pear") ...@@ -87,9 +111,8 @@ x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3) str_sub(x, 1, 3)
``` ```
## String basics
- Subsetting strings - Subsetting strings
negative numbers count backwards from end negative numbers count backwards from the end
```{r str_sub2, eval=T, message=FALSE, cache=T} ```{r str_sub2, eval=T, message=FALSE, cache=T}
str_sub(x, -3, -1) str_sub(x, -3, -1)
``` ```
...@@ -104,13 +127,25 @@ str_to_lower(x) ...@@ -104,13 +127,25 @@ str_to_lower(x)
str_sort(x) str_sort(x)
``` ```
## Matching patterns with regular expressions # Matching patterns with regular expressions
Regexps are a very terse language that allow you to describe patterns in strings. Regexps are a very terse language that allows you to describe patterns in strings.
To learn regular expressions, we’ll use `str_view()` and `str_view_all()`. These functions take a character vector and a regular expression, and show you how they match. To learn regular expressions, we’ll use `str_view()` and `str_view_all()`. These functions take a character vector and a regular expression, and show you how they match.
## Matching patterns with regular expressions <div class="pencadre">
You need to install the `htmlwidgets` package to use these functions.
</div>
<details><summary>Solution</summary>
<p>
```{r load_htmlwidgets, eval=T, message=F}
library(htmlwidgets)
```
</p>
</details>
The most basic regular expression is the exact match.
```{r str_view, eval=T, message=FALSE, cache=T} ```{r str_view, eval=T, message=FALSE, cache=T}
x <- c("apple", "banana", "pear") x <- c("apple", "banana", "pear")
...@@ -124,12 +159,14 @@ x <- c("apple", "banana", "pear") ...@@ -124,12 +159,14 @@ x <- c("apple", "banana", "pear")
str_view(x, ".a.") str_view(x, ".a.")
``` ```
But if “`.`” matches any character, how do you match the character “`.`”?
You need to use an “escape” to tell the regular expression you want to match it exactly, not use its special behavior.
## Matching patterns with regular expressions Like strings, regexps use the backslash, `\`, to escape special behaviour.
So to match an `.`, you need the regexp `\.`. Unfortunately this creates a problem.
But if “`.`” matches any character, how do you match the character “`.`”? You need to use an “escape” to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, `\`, to escape special behaviour. So to match an ., you need the regexp `\.`. Unfortunately this creates a problem. We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings. So to create the regular expression `\.` we need the string "`\\.`". We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings.
So to create the regular expression `\.` we need the string "`\\.`".
## Matching patterns with regular expressions
```{r str_viewdotescape, eval=T, message=FALSE, cache=T} ```{r str_viewdotescape, eval=T, message=FALSE, cache=T}
dot <- "\\." dot <- "\\."
...@@ -137,12 +174,7 @@ writeLines(dot) ...@@ -137,12 +174,7 @@ writeLines(dot)
str_view(c("abc", "a.c", "bef"), "a\\.c") str_view(c("abc", "a.c", "bef"), "a\\.c")
``` ```
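To see the effect of escaping the dot without the HTML viewer, the same patterns can be checked with `str_detect()`, which returns one logical value per string. This is a sketch that assumes `stringr` is installed; `fixed()` is shown as an alternative that treats the whole pattern literally.

```r
library(stringr)

x <- c("abc", "a.c", "bef")

# Unescaped dot: matches any character between "a" and "c".
any_char <- str_detect(x, "a.c")

# Escaped dot: matches only a literal ".".
literal_dot <- str_detect(x, "a\\.c")

# fixed() disables regular expression syntax altogether.
literal_fixed <- str_detect(x, fixed("a.c"))

print(rbind(any_char, literal_dot, literal_fixed))
```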
## Matching patterns with regular expressions If `\` is used as an escape character in regular expressions, how do you match a literal `\`? Well, you need to escape it, creating the regular expression `\\`. To create that regular expression, you need to use a string, which also needs to escape `\`. That means to match a literal `\` you need to write "`\\\\`" — you need four backslashes to match one!
If `\` is used as an escape character in regular expressions, how do you match a literal `\`? Well you need to escape it, creating the regular expression `\\`. To create that regular expression, you need to use a string, which also needs to escape `\`. That means to match a literal `\` you need to write "`\\\\`" — you need four backslashes to match one!
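The two layers of escaping can be checked step by step. In this sketch (assuming `stringr` is installed), the string written as `"a\\b"` contains a single backslash, and the four-backslash pattern finds it:

```r
library(stringr)

x <- "a\\b"      # the three characters a, \ and b
writeLines(x)
n <- nchar(x)

# String "\\\\" -> regular expression \\ -> one literal backslash.
found_regex <- str_detect(x, "\\\\")

# With fixed(), only the string-level escape is needed.
found_fixed <- str_detect(x, fixed("\\"))

print(c(n = n, found_regex = found_regex, found_fixed = found_fixed))
```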
## Matching patterns with regular expressions
```{r str_viewbackslashescape, eval=T, message=FALSE, cache=T} ```{r str_viewbackslashescape, eval=T, message=FALSE, cache=T}
x <- "a\\b" x <- "a\\b"
...@@ -152,34 +184,26 @@ str_view(x, "\\\\") ...@@ -152,34 +184,26 @@ str_view(x, "\\\\")
## Exercises ## Exercises
- Explain why each of these strings don’t match a \: "`\`", "`\\`", "`\\\`". - Explain why each of these strings doesn’t match a \: "`\`", "`\\`", "`\\\`".
- How would you match the sequence `"'\`? - How would you match the sequence `"'\`?
- What patterns will the regular expression `\..\..\..` match? How would you represent it as a string? - What patterns will the regular expression `\..\..\..` match? How would you represent it as a string?
## Anchors ## Anchors
- `^` match the start of the string. Until now we searched for patterns anywhere in the target string. But we can use anchors to be more precise.
- `$` match the end of the string.
- `^` Match the start of the string.
- `$` Match the end of the string.
```{r str_viewanchors, eval=T, cache=T} ```{r str_viewanchors, eval=T, cache=T}
x <- c("apple", "banana", "pear") x <- c("apple", "banana", "pear")
str_view(x, "^a") str_view(x, "^a")
``` ```
## Anchors
- `^` match the start of the string.
- `$` match the end of the string.
```{r str_viewanchorsend, eval=T, cache=T} ```{r str_viewanchorsend, eval=T, cache=T}
str_view(x, "a$") str_view(x, "a$")
``` ```
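A compact way to compare the anchors is to look at the logical result of `str_detect()` on a few made-up strings (a sketch, assuming `stringr` is installed):

```r
library(stringr)

x <- c("banana split", "banana", "ripe banana")

starts <- str_detect(x, "^banana")   # pattern at the start
ends <- str_detect(x, "banana$")     # pattern at the end
whole <- str_detect(x, "^banana$")   # pattern is the entire string

print(data.frame(x, starts, ends, whole))
```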
## Anchors
- `^` match the start of the string.
- `$` match the end of the string.
```{r str_viewanchorsstartend, eval=T, cache=T} ```{r str_viewanchorsstartend, eval=T, cache=T}
x <- c("apple pie", "apple", "apple cake") x <- c("apple pie", "apple", "apple cake")
str_view(x, "^apple$") str_view(x, "^apple$")
...@@ -187,36 +211,33 @@ str_view(x, "^apple$") ...@@ -187,36 +211,33 @@ str_view(x, "^apple$")
## Exercises ## Exercises
- How would you match the literal string `"$^$"`? - How would you match the literal string `"$^$"`?
- Given the corpus of common words in stringr::words, create regular expressions that find all words that: - Given the corpus of common words in stringr::words, create regular expressions that find all words that:
- Start with “y”. - Start with “y”.
- End with “x” - End with “x”
- Are exactly three letters long. (Don’t cheat by using `str_length()`!) - Are exactly three letters long. (Don’t cheat by using `str_length()`!)
- Have seven letters or more. - Have seven letters or more.
Since this list is long, you might want to use the match argument to str_view() to show only the matching or non-matching words. Since this list is long, you might want to use the match argument to `str_view()` to show only the matching or non-matching words.
## Character classes and alternatives ## Character classes and alternatives
In regular expressions, we have special characters and patterns that match groups of characters.
- `\d`: matches any digit. - `\d`: matches any digit.
- `\s`: matches any whitespace (e.g. space, tab, newline). - `\s`: matches any whitespace (e.g. space, tab, newline).
- `[abc]`: matches a, b, or c. - `[abc]`: matches a, b, or c.
- `[^abc]`: matches anything except a, b, or c. - `[^abc]`: matches anything except a, b, or c.
``` ```{r str_viewanchorsstartend_b, eval=T, cache=T}
str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c") str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")
str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c") str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")
str_view(c("abc", "a.c", "a*c", "a c"), "a[ ]") str_view(c("abc", "a.c", "a*c", "a c"), "a[ ]")
``` ```
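The character classes listed above can also be tried with `str_detect()` and `str_extract()`. This is a sketch on two made-up strings, assuming `stringr` is installed; note that `\d` and `\s` need a doubled backslash when written as R strings.

```r
library(stringr)

x <- c("room 101", "nodigits")

has_digit <- str_detect(x, "\\d")              # any digit
has_space <- str_detect(x, "\\s")              # any whitespace
first_vowel <- str_extract(x, "[aeiou]")       # first character in the set
first_non_vowel <- str_extract(x, "[^aeiou]")  # first character not in the set

print(data.frame(x, has_digit, has_space, first_vowel, first_non_vowel))
```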
## Character classes and alternatives You can use alternation to pick between one or more alternative patterns. For example, `abc|d..f` will match either `abc`, or `deaf`. Note that the precedence for `|` is low, so that `abc|xyz` matches `abc` or `xyz` not `abcyz` or `abxyz`. Like with mathematical expressions, if precedence ever gets confusing, use parentheses to make it clear what you want:
You can use alternation to pick between one or more alternative patterns. For example, abc|d..f will match either ‘“abc”’, or "deaf". Note that the precedence for | is low, so that abc|xyz matches abc or xyz not abcyz or abxyz. Like with mathematical expressions, if precedence ever gets confusing, use parentheses to make it clear what you want:
``` ```{r str_viewanchorsstartend_c, eval=T, cache=T}
str_view(c("grey", "gray"), "gr(e|a)y") str_view(c("grey", "gray"), "gr(e|a)y")
``` ```
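The low precedence of `|` can be checked directly. In this sketch (assuming `stringr` is installed), anchors are added so that each pattern must match the whole string, which makes the two groupings easy to tell apart:

```r
library(stringr)

x <- c("abc", "xyz", "abcyz", "abxyz")

# Alternation between the two complete words.
either_word <- str_detect(x, "^(abc|xyz)$")

# Parentheses restrict the alternation to a single letter.
one_letter <- str_detect(x, "^ab(c|x)yz$")

print(data.frame(x, either_word, one_letter))
```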
...@@ -225,25 +246,25 @@ str_view(c("grey", "gray"), "gr(e|a)y") ...@@ -225,25 +246,25 @@ str_view(c("grey", "gray"), "gr(e|a)y")
Create regular expressions to find all words that: Create regular expressions to find all words that:
- Start with a vowel. - Start with a vowel.
- That only contain consonants. (Hint: thinking about matching “not”-vowels.) - Only contain consonants. (Hint: think about matching “not”-vowels.)
- End with ed, but not with eed. - End with ed, but not with eed.
- End with ing or ise. - End with ing or ise.
## Repetition ## Repetition
Now that you know how to search for groups of characters you can define the number of times you want to see them.
- `?`: 0 or 1 - `?`: 0 or 1
- `+`: 1 or more - `+`: 1 or more
- `*`: 0 or more - `*`: 0 or more
``` ```{r str_view_repetition, eval=T, cache=T}
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII" x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"
str_view(x, "CC?") str_view(x, "CC?")
str_view(x, "CC+") str_view(x, "CC+")
str_view(x, 'C[LX]+') str_view(x, 'C[LX]+')
``` ```
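To read the matched text as plain strings rather than in the viewer, `str_extract()` returns the first match of each pattern. A sketch on the same sentence, assuming `stringr` is installed:

```r
library(stringr)

x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"

optional_c <- str_extract(x, "CC?")    # one C, then an optional second C
repeated_c <- str_extract(x, "CC+")    # one C, then one or more C
c_then_lx <- str_extract(x, "C[LX]+")  # C followed by one or more L or X
l_then_x <- str_extract(x, "LX*")      # L followed by zero or more X

print(c(optional_c, repeated_c, c_then_lx, l_then_x))
```

By default the quantifiers are greedy: each one takes the longest run it can.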
## Repetition
You can also specify the number of matches precisely: You can also specify the number of matches precisely:
- `{n}`: exactly n - `{n}`: exactly n
...@@ -251,7 +272,7 @@ You can also specify the number of matches precisely: ...@@ -251,7 +272,7 @@ You can also specify the number of matches precisely:
- `{,m}`: at most m - `{,m}`: at most m
- `{n,m}`: between n and m - `{n,m}`: between n and m
``` ```{r str_view_repetition_b, eval=T, cache=T}
str_view(x, "C{2}") str_view(x, "C{2}")
str_view(x, "C{2,}") str_view(x, "C{2,}")
str_view(x, "C{2,3}") str_view(x, "C{2,3}")
...@@ -272,16 +293,14 @@ str_view(x, "C{2,3}") ...@@ -272,16 +293,14 @@ str_view(x, "C{2,3}")
## Grouping ## Grouping
You learned about parentheses as a way to disambiguate complex expressions. Parentheses also create a numbered capturing group (number 1, 2 etc.). A capturing group stores the part of the string matched by the part of the regular expression inside the parentheses. You can refer to the same text as previously matched by a capturing group with backreferences, like `\1`, `\2` etc. You learned about parentheses as a way to disambiguate complex expressions. Parentheses also create a numbered capturing group (number 1, 2 etc.). A capturing group stores the part of the string matched by the part of the regular expression inside the parentheses. You can refer to the same text as previously matched by a capturing group with back references, like `\1`, `\2` etc.
``` ```{r str_view_grouping, eval=T, cache=T}
str_view(fruit, "(..)\\1", match = TRUE) str_view(fruit, "(..)\\1", match = TRUE)
``` ```
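Back references work both inside a pattern and inside a replacement. The sketch below uses three made-up words and assumes `stringr` is installed:

```r
library(stringr)

x <- c("banana", "coconut", "apple")

# In the pattern: a pair of characters immediately repeated.
doubled <- str_extract(x, "(..)\\1")

# In the replacement: swap the first two characters.
swapped <- str_replace(x, "(.)(.)", "\\2\\1")

print(data.frame(x, doubled, swapped))
```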
## Exercises ## Exercises
- Describe, in words, what these expressions will match: - Describe, in words, what these expressions will match:
- `"(.)\1\1"` - `"(.)\1\1"`
- `"(.)(.)\\2\\1"` - `"(.)(.)\\2\\1"`
...@@ -295,32 +314,34 @@ str_view(fruit, "(..)\\1", match = TRUE) ...@@ -295,32 +314,34 @@ str_view(fruit, "(..)\\1", match = TRUE)
## Detect matches ## Detect matches
``` ```{r str_view_match, eval=T, cache=T}
x <- c("apple", "banana", "pear") x <- c("apple", "banana", "pear")
str_detect(x, "e") str_detect(x, "e")
``` ```
How many common words start with t? How many common words start with t?
``` ```{r str_view_match_b, eval=T, cache=T}
sum(str_detect(words, "^t")) sum(str_detect(words, "^t"))
``` ```
What proportion of common words end with a vowel? What proportion of common words ends with a vowel?
``` ```{r str_view_match_c, eval=T, cache=T}
mean(str_detect(words, "[aeiou]$")) mean(str_detect(words, "[aeiou]$"))
``` ```
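The `sum()` and `mean()` idioms above work because `str_detect()` returns a logical vector, and `TRUE` counts as 1 in arithmetic. A sketch on four made-up words, assuming `stringr` is installed:

```r
library(stringr)

x <- c("apple", "banana", "pear", "kiwi")

ends_with_vowel <- str_detect(x, "[aeiou]$")
n_matches <- sum(ends_with_vowel)      # number of TRUE values
prop_matches <- mean(ends_with_vowel)  # share of TRUE values

print(ends_with_vowel)
print(c(n_matches = n_matches, prop_matches = prop_matches))
```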
## Combining detection ## Combining detection
Find all words containing at least one vowel, and negate Find all words containing at least one vowel, and negate
```
```{r str_view_detection, eval=T, cache=T}
no_vowels_1 <- !str_detect(words, "[aeiou]") no_vowels_1 <- !str_detect(words, "[aeiou]")
``` ```
Find all words consisting only of consonants (non-vowels) Find all words consisting only of consonants (non-vowels)
```
```{r str_view_detection_b, eval=T, cache=T}
no_vowels_2 <- str_detect(words, "^[^aeiou]+$") no_vowels_2 <- str_detect(words, "^[^aeiou]+$")
identical(no_vowels_1, no_vowels_2) identical(no_vowels_1, no_vowels_2)
``` ```
...@@ -373,8 +394,6 @@ has_noun %>% ...@@ -373,8 +394,6 @@ has_noun %>%
str_extract(noun) str_extract(noun)
``` ```
## Grouped matches
`str_extract()` gives us the complete match; `str_match()` gives each individual component. `str_extract()` gives us the complete match; `str_match()` gives each individual component.
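The difference is easiest to see on a single string. This sketch assumes `stringr` is installed; the sentence and the pattern are illustrative assumptions, with two capturing groups so that `str_match()` has components to return.

```r
library(stringr)

x <- "the quick fox"
pattern <- "(a|the) ([^ ]+)"  # an article, a space, then a word

# Complete match as a character vector.
full <- str_extract(x, pattern)

# Matrix: complete match first, then one column per group.
parts <- str_match(x, pattern)

print(full)
print(parts)
```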
```{r noun_regex_match, eval=T, cache=T} ```{r noun_regex_match, eval=T, cache=T}
...@@ -384,11 +403,11 @@ has_noun %>% ...@@ -384,11 +403,11 @@ has_noun %>%
## Exercises ## Exercises
- Find all words that come after a number like one, two, three etc. Pull out both the number and the word. - Find all words that come after a `number` like `one`, `two`, `three` etc. Pull out both the number and the word.
## Replacing matches ## Replacing matches
Instead of replacing with a fixed string you can use backreferences to insert components of the match. In the following code, I flip the order of the second and third words. Instead of replacing with a fixed string, you can use back references to insert components of the match. In the following code, I flip the order of the second and third words.
```{r replacing_matches, eval=T, cache=T} ```{r replacing_matches, eval=T, cache=T}
sentences %>% sentences %>%
...@@ -408,4 +427,6 @@ sentences %>% ...@@ -408,4 +427,6 @@ sentences %>%
sentences %>% sentences %>%
head(5) %>% head(5) %>%
str_split("\\s") str_split("\\s")
``` ```
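Both operations can be checked on one short, made-up sentence, where the result is easy to verify by eye (a sketch, assuming `stringr` is installed):

```r
library(stringr)

x <- "the quick brown fox"

# Swap the second and third words using back references.
flipped <- str_replace(x, "(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2")

# str_split() returns a list with one element per input string.
pieces <- str_split(x, "\\s")[[1]]

print(flipped)
print(pieces)
```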
\ No newline at end of file
## See you in [R.8: Factors](http://perso.ens-lyon.fr/laurent.modolo/R/session_8/)
\ No newline at end of file
--- ---
title: '#8 Factors' title: "R.8: Factors"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)" author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "31 Jan 2020" date: "2021"
always_allow_html: yes
output: output:
slidy_presentation: rmdformats::downcute:
highlight: tango self_contain: true
beamer_presentation: use_bookdown: true
theme: metropolis default_style: "light"
slide_level: 3 lightbox: true
fig_caption: no css: "http://perso.ens-lyon.fr/laurent.modolo/R/src/style.css"
df_print: tibble
highlight: tango
latex_engine: xelatex
--- ---
```{r setup, include=FALSE, cache=TRUE}
```{r setup, include=FALSE}
rm(list=ls())
knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
```
```{r klippy, echo=FALSE, include=TRUE}
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
# Introduction
In this session, you will learn more about the factor type in R.
Factors can be very useful, but you have to be mindful of the implicit conversions from simple vectors to factors!
They are the source of a lot of pain for R programmers.
<div class="pencadre">
As usual we will need the `tidyverse` library.
</div>
<details><summary>Solution</summary>
<p>
```{r load_data, eval=T, message=F}
library(tidyverse) library(tidyverse)
``` ```
</p>
</details>
## Creating factors # Creating factors
Imagine that you have a variable that records month: Imagine that you have a variable that records month:
...@@ -41,8 +64,6 @@ x2 <- c("Dec", "Apr", "Jam", "Mar") ...@@ -41,8 +64,6 @@ x2 <- c("Dec", "Apr", "Jam", "Mar")
sort(x1) sort(x1)
``` ```
## Creating factors
You can fix both of these problems with a factor. You can fix both of these problems with a factor.
```{r sort_month_factor, eval=T, cache=T} ```{r sort_month_factor, eval=T, cache=T}
...@@ -55,8 +76,6 @@ y1 ...@@ -55,8 +76,6 @@ y1
sort(y1) sort(y1)
``` ```
## Creating factors
And any values not in the set will be converted to NA: And any values not in the set will be converted to NA:
```{r sort_month_factor2, eval=T, cache=T} ```{r sort_month_factor2, eval=T, cache=T}
...@@ -72,16 +91,14 @@ f2 ...@@ -72,16 +91,14 @@ f2
levels(f2) levels(f2)
``` ```
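The behaviours described in this section can be reproduced with base R's `factor()` on a short vector. In this sketch the level set and the misspelled value `"Apx"` are made up for illustration:

```r
# Hypothetical level set, in calendar order.
month_levels <- c("Jan", "Feb", "Mar")
x <- c("Mar", "Jan", "Apx")

# Character vectors sort alphabetically.
sorted_chr <- sort(x)

# Factors sort by level order, and unknown values become NA.
f <- factor(x, levels = month_levels)
sorted_fct <- as.character(sort(f))  # sort() drops NA by default
missing <- is.na(f)

print(sorted_chr)
print(f)
print(sorted_fct)
```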
## General Social Survey # General Social Survey
```{r race_count, eval=T, cache=T} ```{r race_count, eval=T, cache=T}
gss_cat %>% gss_cat %>%
count(race) count(race)
``` ```
## General Social Survey By default, `ggplot2` will drop levels that don’t have any values. You can force them to display with:
By default, ggplot2 will drop levels that don’t have any values. You can force them to display with:
```{r race_plot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} ```{r race_plot, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(gss_cat, aes(race)) + ggplot(gss_cat, aes(race)) +
...@@ -89,7 +106,7 @@ ggplot(gss_cat, aes(race)) + ...@@ -89,7 +106,7 @@ ggplot(gss_cat, aes(race)) +
scale_x_discrete(drop = FALSE) scale_x_discrete(drop = FALSE)
``` ```
## Modifying factor order # Modifying factor order
It’s often useful to change the order of the factor levels in a visualisation. It’s often useful to change the order of the factor levels in a visualisation.
...@@ -104,27 +121,17 @@ relig_summary <- gss_cat %>% ...@@ -104,27 +121,17 @@ relig_summary <- gss_cat %>%
ggplot(relig_summary, aes(tvhours, relig)) + geom_point() ggplot(relig_summary, aes(tvhours, relig)) + geom_point()
``` ```
**8_a**
## Modifying factor order
It is difficult to interpret this plot because there’s no overall pattern. We can improve it by reordering the levels of relig using `fct_reorder()`. `fct_reorder()` takes three arguments: It is difficult to interpret this plot because there’s no overall pattern. We can improve it by reordering the levels of relig using `fct_reorder()`. `fct_reorder()` takes three arguments:
- `f`, the factor whose levels you want to modify. - `f`, the factor whose levels you want to modify.
- `x`, a numeric vector that you want to use to reorder the levels. - `x`, a numeric vector that you want to use to reorder the levels.
- Optionally, `fun`, a function that’s used if there are multiple values of `x` for each value of `f`. The default value is `median`. - Optionally, `fun`, a function that’s used if there are multiple values of `x` for each value of `f`. The default value is `median`.
## Modifying factor order
```{r tv_hour_order, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} ```{r tv_hour_order, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) + ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
geom_point() geom_point()
``` ```
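What `fct_reorder()` does to the levels can be seen on a tiny example. This sketch assumes `forcats` is installed and passes arguments by position, because recent `forcats` versions name them `.f`, `.x` and `.fun`; the data are made up.

```r
library(forcats)

# One value per level: levels are sorted by increasing x.
f <- factor(c("a", "b", "c"))
x <- c(3, 1, 2)
lv_single <- levels(fct_reorder(f, x))

# Several values per level: the default summary (median) is used.
f2 <- factor(c("a", "a", "b", "b"))
x2 <- c(1, 10, 2, 3)  # medians: a = 5.5, b = 2.5
lv_median <- levels(fct_reorder(f2, x2))

print(lv_single)
print(lv_median)
```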
**8_b**
## Modifying factor order
As you start making more complicated transformations, I’d recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as: As you start making more complicated transformations, I’d recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as:
```{r tv_hour_order_mutate, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} ```{r tv_hour_order_mutate, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
...@@ -133,9 +140,8 @@ relig_summary %>% ...@@ -133,9 +140,8 @@ relig_summary %>%
ggplot(aes(tvhours, relig)) + ggplot(aes(tvhours, relig)) +
geom_point() geom_point()
``` ```
**8_c**
## `fct_reorder2()` # `fct_reorder2()`
Another type of reordering is useful when you are colouring the lines on a plot. `fct_reorder2()` reorders the factor by the `y` values associated with the largest `x` values. This makes the plot easier to read because the line colours line up with the legend. Another type of reordering is useful when you are colouring the lines on a plot. `fct_reorder2()` reorders the factor by the `y` values associated with the largest `x` values. This makes the plot easier to read because the line colours line up with the legend.
...@@ -146,23 +152,14 @@ by_age <- gss_cat %>% ...@@ -146,23 +152,14 @@ by_age <- gss_cat %>%
group_by(age) %>% group_by(age) %>%
mutate(prop = n / sum(n)) mutate(prop = n / sum(n))
``` ```
**8_d**
## `fct_reorder2()`
```{r fct_reorder2a, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} ```{r fct_reorder2a, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(by_age, aes(age, prop, colour = marital)) + ggplot(by_age, aes(age, prop, colour = marital)) +
geom_line(na.rm = TRUE) geom_line(na.rm = TRUE)
``` ```
**8_e**
## `fct_reorder2()`
```{r fct_reorder2b, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE} ```{r fct_reorder2b, cache = TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) + ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
geom_line() + geom_line() +
labs(colour = "marital") labs(colour = "marital")
``` ```
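The reordering rule of `fct_reorder2()` can be checked on a minimal, made-up example with two groups and two `x` values. The sketch assumes `forcats` is installed and relies on the default behaviour (order by the `y` value at the largest `x`, in decreasing order):

```r
library(forcats)

f <- factor(c("a", "a", "b", "b"))
x <- c(1, 2, 1, 2)
y <- c(5, 1, 2, 9)  # at the largest x (2): a has y = 1, b has y = 9

# b ends highest on the right of the plot, so it comes first.
lv <- levels(fct_reorder2(f, x, y))
print(lv)
```

With this ordering, the first legend entry corresponds to the line that ends highest at the right edge of the plot.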
\ No newline at end of file
**8_f**
\ No newline at end of file