---
title: '#8 Factors'
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "31 Jan 2020"
always_allow_html: yes
output:
  slidy_presentation:
    highlight: tango
  beamer_presentation:
    theme: metropolis
    slide_level: 3
    fig_caption: no
    df_print: tibble
    highlight: tango
    latex_engine: xelatex
---
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
Creating factors
Imagine that you have a variable that records month:
x1 <- c("Dec", "Apr", "Jan", "Mar")
Using a string to record this variable has two problems:
- There are only twelve possible months, and there’s nothing saving you from typos:
x2 <- c("Dec", "Apr", "Jam", "Mar")
- It doesn’t sort in a useful way:
sort(x1)
Creating factors
You can fix both of these problems with a factor.
month_levels <- c(
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
y1 <- factor(x1, levels = month_levels)
y1
sort(y1)
Creating factors
Any values not in the set are converted to NA. factor() does this silently, while readr’s parse_factor(), used here, also raises a warning:
y2 <- parse_factor(x2, levels = month_levels)
y2
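When the conversion is silent, it can help to list which input values were lost. The following is a minimal base R sketch, assuming the x2 and month_levels objects defined above; the names y2_silent and bad_values are introduced here only for illustration.

```r
# Same levels and data as in the slides above
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x2 <- c("Dec", "Apr", "Jam", "Mar")

# factor() turns unknown values into NA without any warning
y2_silent <- factor(x2, levels = month_levels)

# x2 has no missing values, so every NA in y2_silent is an unknown level
bad_values <- x2[is.na(y2_silent)]
bad_values
```

Here bad_values contains only "Jam", the typo in x2.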
Sometimes you’d prefer that the order of the levels match the order of the first appearance in the data.
f2 <- x1 %>% factor() %>% fct_inorder()
f2
levels(f2)
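To make the effect of fct_inorder() more concrete, the sketch below compares it with the default level order and with a base R equivalent. It assumes the x1 vector from above; f_default, f_inorder and f_base are illustrative names.

```r
library(forcats)

x1 <- c("Dec", "Apr", "Jan", "Mar")

# Default: levels are the sorted unique values
f_default <- factor(x1)
levels(f_default)

# fct_inorder(): levels follow the order of first appearance
f_inorder <- fct_inorder(factor(x1))
levels(f_inorder)

# Base R way to get the same level order
f_base <- factor(x1, levels = unique(x1))
identical(levels(f_inorder), levels(f_base))
```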
General Social Survey
gss_cat %>%
count(race)
General Social Survey
By default, ggplot2 will drop levels that don’t have any values. You can force them to display with:
ggplot(gss_cat, aes(race)) +
geom_bar() +
scale_x_discrete(drop = FALSE)
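Empty levels are also dropped when counting. As a sketch, assuming the gss_cat data shipped with forcats and the .drop argument of dplyr's count(), the table can be made to keep them as well; race_counts is an illustrative name.

```r
library(dplyr)
library(forcats)

# All levels defined for race, including any with no observations
levels(gss_cat$race)

# .drop = FALSE keeps levels that have zero rows in the data
race_counts <- gss_cat %>%
  count(race, .drop = FALSE)
race_counts
```

The result has one row per level of race, with n equal to 0 for levels that never occur.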
Modifying factor order
It’s often useful to change the order of the factor levels in a visualisation.
relig_summary <- gss_cat %>%
group_by(relig) %>%
summarise(
age = mean(age, na.rm = TRUE),
tvhours = mean(tvhours, na.rm = TRUE),
n = n()
)
ggplot(relig_summary, aes(tvhours, relig)) + geom_point()
8_a
Modifying factor order
It is difficult to interpret this plot because there’s no overall pattern. We can improve it by reordering the levels of relig using fct_reorder(). fct_reorder() takes three arguments:
- .f, the factor whose levels you want to modify.
- .x, a numeric vector that you want to use to reorder the levels.
- Optionally, .fun, a function that’s used if there are multiple values of .x for each value of .f. The default value is median.
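The role of the summary function is easier to see on a toy example with several values per level. This is a small sketch on made-up data (f and x are hypothetical vectors, not part of gss_cat).

```r
library(forcats)

# Toy data: two values of x for each level of f
f <- factor(c("a", "a", "b", "b", "c", "c"))
x <- c(10, 20, 1, 2, 5, 100)

# Default summary is the median: a = 15, b = 1.5, c = 52.5
by_median <- fct_reorder(f, x)
levels(by_median)  # "b" "a" "c"

# With the minimum instead: a = 10, b = 1, c = 5
by_min <- fct_reorder(f, x, .fun = min)
levels(by_min)     # "b" "c" "a"
```

Only the order of the levels changes; the values stored in the factor stay the same.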
Modifying factor order
ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
geom_point()
8_b
Modifying factor order
As you start making more complicated transformations, I’d recommend moving them out of aes() and into a separate mutate() step. For example, you could rewrite the plot above as:
relig_summary %>%
mutate(relig = fct_reorder(relig, tvhours)) %>%
ggplot(aes(tvhours, relig)) +
geom_point()
8_c
fct_reorder2()
Another type of reordering is useful when you are colouring the lines on a plot. fct_reorder2() reorders the factor by the y values associated with the largest x values. This makes the plot easier to read because the line colours line up with the legend.
by_age <- gss_cat %>%
filter(!is.na(age)) %>%
count(age, marital) %>%
group_by(age) %>%
mutate(prop = n / sum(n))
8_d
fct_reorder2()
ggplot(by_age, aes(age, prop, colour = marital)) +
geom_line(na.rm = TRUE)
8_e
fct_reorder2()
ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
geom_line() +
labs(colour = "marital")
8_f
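What fct_reorder2() does to the levels can be checked on a toy example. The sketch below uses made-up vectors (f, x and y are hypothetical) and relies on the function's defaults, which take the y value at the largest x for each level and sort the levels in decreasing order of that value.

```r
library(forcats)

# Toy data: each level has one y value at x = 1 and one at x = 2
f <- factor(c("a", "a", "b", "b", "c", "c"))
x <- c(1, 2, 1, 2, 1, 2)
y <- c(5, 1, 2, 9, 7, 4)

# y at the largest x (x = 2): a = 1, b = 9, c = 4
reordered <- fct_reorder2(f, x, y)
levels(reordered)  # "b" "c" "a"
```

In a line plot, this puts the legend entries in the same top-to-bottom order as the right-hand ends of the lines.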