Commit 9adeb3c1 authored by Gilquin

fix: correct code formatting and warnings

correct inconsistent code formatting with styler package
correct deprecated warnings
parent 4f0ed879
1 merge request: !9 fix: correct code formatting and warnings
......@@ -71,8 +71,11 @@ if (! require("rvest")) {
}
library(rvest)
url <- 'https://www.bioconductor.org/packages/release/bioc/'
biocPackages <- url %>% read_html() %>% html_table() %>%.[[1]]
url <- "https://www.bioconductor.org/packages/release/bioc/"
biocPackages <- url %>%
read_html() %>%
html_table() %>%
.[[1]]
bioconductor_packages <- nrow(biocPackages)
```
......
......@@ -7,8 +7,9 @@ date: "2022"
```{r include=FALSE}
library(fontawesome)
if("conflicted" %in% .packages())
if ("conflicted" %in% .packages()) {
conflicted::conflicts_prefer(dplyr::filter)
}
```
```{r setup, include=FALSE}
......@@ -22,7 +23,8 @@ library("tidyverse")
tmp <- tempfile(fileext = ".zip")
download.file("http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip",
tmp,
quiet = TRUE)
quiet = TRUE
)
unzip(tmp, exdir = "data-raw")
new_class_level <- c(
"Compact Cars",
......@@ -632,13 +634,13 @@ p2
```{r,fig.width=8, fig.height=4.5, message=FALSE}
plot_grid(p1, p2, labels = c('A', 'B'), label_size = 12)
plot_grid(p1, p2, labels = c("A", "B"), label_size = 12)
```
You can also save it in a file.
```{r, eval=F}
p_final = plot_grid(p1, p2, labels = c('A', 'B'), label_size = 12)
p_final <- plot_grid(p1, p2, labels = c("A", "B"), label_size = 12)
ggsave("test_plot_1_and_2.png", p_final, width = 20, height = 8, units = "cm")
```
......@@ -650,13 +652,15 @@ Use the `cowplot` documentation to reproduce this plot and save it.
```{r, echo=F}
p1 <- ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point() + theme_bw()
geom_point() +
theme_bw()
p2 <- ggplot(data = new_mpg, mapping = aes(x = cty, y = hwy, color = class)) +
geom_point() + theme_bw()
geom_point() +
theme_bw()
p_row <- plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), labels = c('A', 'B'), label_size = 12)
p_legend <- get_legend(p1 + theme(legend.position = "top"))
p_row <- plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), labels = c("A", "B"), label_size = 12)
p_legend <- get_plot_component(p1, "guide-box-top", return_all = TRUE)
plot_grid(p_row, p_legend, nrow = 2, rel_heights = c(1, 0.2))
```
......@@ -665,13 +669,15 @@ plot_grid(p_row, p_legend, nrow = 2, rel_heights = c(1,0.2))
<p>
```{r , echo = TRUE, eval = F}
p1 <- ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point() + theme_bw()
geom_point() +
theme_bw()
p2 <- ggplot(data = new_mpg, mapping = aes(x = cty, y = hwy, color = class)) +
geom_point() + theme_bw()
geom_point() +
theme_bw()
p_row <- plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), labels = c('A', 'B'), label_size = 12)
p_legend <- get_legend(p1 + theme(legend.position = "top"))
p_row <- plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), labels = c("A", "B"), label_size = 12)
p_legend <- get_plot_component(p1, "guide-box-top", return_all = TRUE)
p_final <- plot_grid(p_row, p_legend, nrow = 2, rel_heights = c(1, 0.2))
p_final
......
......@@ -125,7 +125,7 @@ ggplot(data = demo, mapping = aes(x = cut, y = freq)) +
You might want to override the default mapping from transformed variables to aesthetics (e.g., proportion).
```{r 3_b, include=TRUE, fig.width=8, fig.height=4.5}
ggplot(data = diamonds, mapping = aes(x = cut, y = ..prop.., group = 1)) +
ggplot(data = diamonds, mapping = aes(x = cut, y = after_stat(prop), group = 1)) +
geom_bar()
```
......@@ -136,7 +136,7 @@ In our proportion bar chart, we need to set `group = 1`. Why?
<details><summary>Solution</summary>
<p>
```{r diamonds_stats_challenge, include=TRUE, message=FALSE, fig.width=8, fig.height=4.5}
ggplot(data = diamonds, mapping = aes(x = cut, y = ..prop..)) +
ggplot(data = diamonds, mapping = aes(x = cut, y = after_stat(prop))) +
geom_bar()
```
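A rough way to see why `group = 1` matters, sketched in base R on a hypothetical toy vector (`cut_toy` is made up for illustration and is not the `diamonds` data). The sketch assumes that `prop` is a groupwise proportion, so when every bar is its own group each count ends up divided by itself. It only mimics the arithmetic, not what ggplot2 does internally.

```r
# Hypothetical toy data standing in for diamonds$cut
cut_toy <- c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal")
counts <- table(cut_toy)

# One group for all bars (group = 1): each count is divided by the grand total
prop_one_group <- as.numeric(counts) / sum(counts)

# One group per bar (no group = 1): each count is divided by its own group total
prop_per_bar <- as.numeric(counts) / as.numeric(counts)

prop_one_group
prop_per_bar
```

With a single group the proportions differ between bars and sum to 1. With one group per bar every proportion equals 1, which is why all bars would have the same height.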
......@@ -155,7 +155,6 @@ value, to draw attention to the summary that you are computing.
<details><summary>Solution</summary>
<p>
```{r 3_c, include=TRUE, fig.width=8, fig.height=4.5, message=FALSE}
ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) +
stat_summary()
```
......@@ -421,7 +420,7 @@ ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
geom_point() +
scale_x_log10() +
transition_time(year) +
labs(title = 'Year: {as.integer(frame_time)}')
labs(title = "Year: {as.integer(frame_time)}")
```
</p>
</details>
\ No newline at end of file
......@@ -7,8 +7,9 @@ date: "2022"
```{r include=FALSE}
library(fontawesome)
if("conflicted" %in% .packages())
if ("conflicted" %in% .packages()) {
conflicted::conflicts_prefer(dplyr::filter)
}
```
```{r setup, include=FALSE}
......@@ -49,7 +50,7 @@ library("nycflights13")
### Data set : nycflights13
`nycflights13::flights` contains all $336 \ 776$ flights that departed from New York City in 2013.
`nycflights13::flights` contains all 336,776 flights that departed from New York City in 2013.
The data comes from the US Bureau of Transportation Statistics, and is documented in `?flights`
```R
......@@ -233,7 +234,8 @@ is.na(NA)
`filter()` only includes rows where the condition is `TRUE`; it excludes both `FALSE` and `NA` values. If you want to preserve missing values, ask for them explicitly:
```{r filter_logical_operators_test_NA2, include=TRUE}
df <- tibble( x = c("A","B","C"),
df <- tibble(
x = c("A", "B", "C"),
y = c(1, NA, 3)
)
df
......@@ -308,7 +310,8 @@ arrange(flights, distance, desc(dep_delay))
Missing values are always sorted at the end:
```{r arrange_NA, include=TRUE}
df <- tibble( x = c("A","B","C"),
df <- tibble(
x = c("A", "B", "C"),
y = c(1, NA, 3)
)
df
......@@ -485,7 +488,7 @@ It's often useful to add new columns that are functions of existing columns. Tha
First, let's create a thinner dataset to work on, `flights_thin`, that contains:
- columns from `year` to `day`
- columns that ends with `delays`
- columns that ends with `delay`
- the `distance` and `air_time` columns
- the `dep_time` and `sched_dep_time` columns
......@@ -585,7 +588,8 @@ mutate(
HH = dep_time %/% 100,
MM = dep_time %% 100,
dep_time2 = HH * 60 + MM,
.after = "dep_time" )
.after = "dep_time"
)
```
or `.keep = "used"` to keep only the columns used for the calculation, which can be useful for debugging,
......@@ -596,7 +600,8 @@ mutate(
HH = dep_time %/% 100,
MM = dep_time %% 100,
dep_time2 = HH * 60 + MM,
.keep = "used" )
.keep = "used"
)
```
In one row (or you can also remove columns HH and MM using select):
......@@ -605,7 +610,8 @@ In one row (or you can also remove columns HH and MM using select):
mutate(
flights_thin_toy,
dep_time2 = dep_time %/% 100 * 60 + dep_time %% 100,
.after = "dep_time" )
.after = "dep_time"
)
```
**Note**: You can also directly replace a column by the result of the mutate operation,
......@@ -613,7 +619,8 @@ mutate(
```{r mutate_challenges_a4, include=TRUE, eval = F}
mutate(
flights_thin_toy,
dep_time = dep_time * 60 + dep_time)
dep_time = dep_time * 60 + dep_time
)
```
</p>
</details>
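The `%/%` and `%%` arithmetic used in these solutions can be checked on a few values without any data frame. This is a minimal sketch in base R; `hhmm_to_minutes()` and the sample times are hypothetical names introduced here for illustration.

```r
# Convert clock times stored as HHMM numbers (e.g. 517 for 5:17)
# into minutes since midnight.
hhmm_to_minutes <- function(hhmm) {
  hours <- hhmm %/% 100   # integer division keeps the HH part
  minutes <- hhmm %% 100  # remainder keeps the MM part
  hours * 60 + minutes
}

hhmm_to_minutes(c(517, 2359, 5))
```

For instance, 517 (5:17) becomes 5 * 60 + 17 = 317 minutes, and 2359 becomes 23 * 60 + 59 = 1439 minutes.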
......@@ -624,10 +631,8 @@ mutate(
```{r mutate_challenges_b, eval=F, message=F, cache=T}
mutate(
flights,
dep_time = (dep_time %/% 100) * 60 +
dep_time %% 100,
sched_dep_time = (sched_dep_time %/% 100) * 60 +
sched_dep_time %% 100
dep_time = (dep_time %/% 100) * 60 + dep_time %% 100,
sched_dep_time = (sched_dep_time %/% 100) * 60 + sched_dep_time %% 100
)
```
......@@ -825,10 +830,13 @@ With `mutate()` and `ifelse()` [fonctions](https://dplyr.tidyverse.org/reference
<p>
```{r sig}
(tab.sig <- mutate(tab,
(
tab.sig <- mutate(
tab,
sig = baseMean > 20 & padj < 0.05 & abs(log2FoldChange) >= 1.5,
UpDown = ifelse(sig, ### we can use in the same mutate a column created by a previous line
ifelse(log2FoldChange > 0, "Up", "Down"), "NO")
ifelse(log2FoldChange > 0, "Up", "Down"), "NO"
)
)
)
```
......
......@@ -7,8 +7,9 @@ date: "2022"
```{r include=FALSE}
library(fontawesome)
if("conflicted" %in% .packages())
if ("conflicted" %in% .packages()) {
conflicted::conflicts_prefer(dplyr::filter)
}
```
```{r setup, include=FALSE}
......@@ -49,8 +50,10 @@ Find the 10 most delayed flights using the ranking function `min_rank()`.
<details><summary>Solution</summary>
<p>
```{r pipe_example_a, include=TRUE}
flights_md <- mutate(flights,
most_delay = min_rank(desc(dep_delay)))
flights_md <- mutate(
flights,
most_delay = min_rank(desc(dep_delay))
)
flights_md <- filter(flights_md, most_delay < 10)
flights_md <- arrange(flights_md, most_delay)
```
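To see what the ranking does, here is a rough base R equivalent on a hypothetical vector of delays (`delays_toy` is invented for this sketch). It assumes `min_rank(desc(x))` behaves like `rank(-x, ties.method = "min", na.last = "keep")`: ties share the smallest rank and missing values stay `NA`.

```r
# Hypothetical departure delays, including a tie and a missing value
delays_toy <- c(5, 120, NA, 120, 30)

# Rank from the largest delay (rank 1) downwards
most_delay <- rank(-delays_toy, ties.method = "min", na.last = "keep")
most_delay

# Keep ranks 1 and 2; which() drops the NA rank
delays_toy[which(most_delay <= 2)]
```

Both delays of 120 get rank 1, so the next distinct value gets rank 3. Note also that a cutoff written as `<= 10` keeps ranks 1 through 10, whereas `< 10` stops at rank 9.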
......@@ -164,7 +167,7 @@ summ_delay_filghts <- flights %>%
ggplot(summ_delay_filghts, mapping = aes(x = avg_distance, y = avg_delay, size = n_flights)) +
geom_point() +
geom_smooth(method = lm, se = FALSE) +
theme(legend.position='none')
theme(legend.position = "none")
```
<div class="pencadre">
......@@ -195,7 +198,7 @@ flights %>%
ggplot(mapping = aes(x = avg_distance, y = avg_delay, size = n_flights)) +
geom_point() +
geom_smooth(method = lm, se = FALSE) +
theme(legend.position='none')
theme(legend.position = "none")
```
</p>
</details>
......@@ -246,7 +249,7 @@ flights %>%
canceled = is.na(dep_time) | is.na(arr_time)
) %>%
filter(canceled) %>%
mutate(wday = strftime(time_hour,'%A')) %>%
mutate(wday = strftime(time_hour, "%A")) %>%
group_by(wday) %>%
summarise(
cancel_day = n()
......@@ -270,7 +273,7 @@ flights %>%
mutate(
canceled = is.na(dep_time) | is.na(arr_time)
) %>%
mutate(wday = strftime(time_hour,'%A')) %>%
mutate(wday = strftime(time_hour, "%A")) %>%
group_by(wday) %>%
summarise(
prop_cancel_day = sum(canceled) / n(),
......@@ -297,7 +300,7 @@ flights %>%
mutate(
canceled = is.na(dep_time) | is.na(arr_time)
) %>%
mutate(wday = strftime(time_hour,'%A')) %>%
mutate(wday = strftime(time_hour, "%A")) %>%
group_by(day) %>%
mutate(
prop_cancel_day = sum(canceled) / sum(!canceled),
......@@ -312,14 +315,18 @@ flights %>%
) %>%
ggplot(mapping = aes(x = mean_av_delay, y = mean_cancel_day, color = wday)) +
geom_point() +
geom_errorbarh(mapping = aes(
geom_errorbarh(
mapping = aes(
xmin = -sd_av_delay + mean_av_delay,
xmax = sd_av_delay + mean_av_delay
)) +
geom_errorbar(mapping = aes(
)
) +
geom_errorbar(
mapping = aes(
ymin = -sd_cancel_day + mean_cancel_day,
ymax = sd_cancel_day + mean_cancel_day
))
)
)
```
</p>
</details>
......@@ -338,14 +345,19 @@ flights %>%
sd_delay = sd(arr_delay, na.rm = T),
) %>%
ggplot() +
geom_errorbar(mapping = aes(
geom_errorbar(
mapping = aes(
x = hour,
ymax = mean_delay + sd_delay,
ymin = mean_delay - sd_delay)) +
geom_point(mapping = aes(
ymin = mean_delay - sd_delay
)
) +
geom_point(
mapping = aes(
x = hour,
y = mean_delay,
))
)
)
```
</p>
</details>
......
......@@ -72,10 +72,12 @@ knitr::include_graphics('img/pivot_longer.png')
```
```{r, eval = F}
wide_example <- tibble(X1 = c("A","B"),
wide_example <- tibble(
X1 = c("A", "B"),
X2 = c(1, 2),
X3 = c(0.1, 0.2),
X4 = c(10,20))
X4 = c(10, 20)
)
```
If you have a wide dataset, such as `wide_example`, that you want to make longer, you will use the `pivot_longer()` function.
......@@ -90,7 +92,8 @@ wide_example %>%
... or the reverse selection (-X1):
```{r, eval = F}
wide_example %>% pivot_longer(-X1)
wide_example %>%
pivot_longer(-X1)
```
You can specify the names of the columns that will hold the tidied data (by default, they are `name` and `value`):
......@@ -132,7 +135,8 @@ For this we need to :
table4a %>%
pivot_longer(-country,
names_to = "year",
values_to = "case")
values_to = "case"
)
```
</p>
</details>
......@@ -148,8 +152,10 @@ If you have a long dataset, that you want to make wider, you will use the `pivot
You have to specify which column contains the names of the output columns (`names_from`), and which column contains the cell values (`values_from`).
```{r, eval = F}
long_example %>% pivot_wider(names_from = V1,
values_from = V2)
long_example %>% pivot_wider(
names_from = V1,
values_from = V2
)
```
......@@ -168,8 +174,10 @@ You can use the `pivot_wider` function to make your table wider and have one obs
```{r pivot_wider, eval=T, message=T}
table2 %>%
pivot_wider(names_from = type,
values_from = count)
pivot_wider(
names_from = type,
values_from = count
)
```
</p>
</details>
......
......@@ -7,8 +7,9 @@ date: "2022"
```{r include=FALSE}
library(fontawesome)
if("conflicted" %in% .packages())
if ("conflicted" %in% .packages()) {
conflicted::conflicts_prefer(dplyr::filter)
}
```
```{r setup, include=FALSE}
......
......@@ -7,8 +7,9 @@ date: "2022"
```{r include=FALSE}
library(fontawesome)
if("conflicted" %in% .packages())
if ("conflicted" %in% .packages()) {
conflicted::conflicts_prefer(dplyr::filter)
}
```
```{r setup, include=FALSE}
......@@ -79,7 +80,9 @@ y2
Sometimes you'd prefer that the order of the levels match the order of the first appearance in the data.
```{r inorder_month_factor, eval=T, cache=T}
f2 <- x1 %>% factor() %>% fct_inorder()
f2 <- x1 %>%
factor() %>%
fct_inorder()
f2
levels(f2)
```
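For comparison, the same ordering can be sketched in base R, assuming `fct_inorder()` amounts to using the unique values, in order of first appearance, as the levels. The vector `months_toy` is a hypothetical example, not the `x1` defined earlier.

```r
# Hypothetical month labels with one repeated value
months_toy <- c("Dec", "Apr", "Jan", "Mar", "Apr")

# Default factor(): levels are sorted alphabetically
levels(factor(months_toy))

# Levels in order of first appearance in the data
f_inorder <- factor(months_toy, levels = unique(months_toy))
levels(f_inorder)
```

The repeated `"Apr"` does not create a new level, because `unique()` keeps only the first occurrence of each value.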
......@@ -111,7 +114,8 @@ relig_summary <- gss_cat %>%
tvhours = mean(tvhours, na.rm = TRUE),
n = n()
)
ggplot(relig_summary, aes(x = tvhours, y = relig)) + geom_point()
ggplot(relig_summary, aes(x = tvhours, y = relig)) +
geom_point()
```
It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of the factor `relig` using `fct_reorder()`. `fct_reorder()` takes three arguments:
......@@ -174,6 +178,6 @@ For example [rmarkdown](https://rmarkdown.rstudio.com/) is a great way to turn y
- [a comprehensive guide](https://bookdown.org/yihui/rmarkdown/)
- [the cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown-2.0.pdf)
In addition most packages will provide **vignette**s on how to perform an analysis from scratch. On the [cran.r-project.org](https://cran.r-project.org/web/packages/ggplot2/index.html) or [bioconductor.org](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) websites (specialised on R packages for biologists), you will have direct links to a package vignettes.
In addition most packages will provide **vignette**s on how to perform an analysis from scratch. On the [cran.r-project.org](https://cran.r-project.org/web/packages/ggplot2/index.html) or [bioconductor.org](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) (specialised on R packages for biologists) websites, you will have direct links to a package vignettes.
Finally, don't forget to search the web for your problems or errors in R. For instance, [stackoverflow](https://stackoverflow.com/) contains high-quality and well-curated answers.
\ No newline at end of file