diff --git a/session_4/slides.Rmd b/session_4/slides.Rmd index f8de0a4aa3a9dc2205b68815dae52d0d6ce80fad..3aec2bdfbe8fc8e5e577a2b19de4ce82a9f9012d 100644 --- a/session_4/slides.Rmd +++ b/session_4/slides.Rmd @@ -3,8 +3,6 @@ title: "R#4: data transformation" author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)" date: "08 Nov 2019" output: - slidy_presentation: - highlight: tango beamer_presentation: theme: metropolis slide_level: 3 @@ -12,6 +10,8 @@ output: df_print: tibble highlight: tango latex_engine: xelatex + slidy_presentation: + highlight: tango --- ```{r setup, include=FALSE, cache=TRUE} knitr::opts_chunk$set(echo = FALSE) @@ -277,7 +277,7 @@ filter(flights_md, most_delay < 10) ## Combining multiple operations with the pipe -We don't want to create useless intermediate variables so we can use the pipe opperator: `%>%` +We don't want to create useless intermediate variables so we can use the pipe operator: `%>%` (`ctrl + shift + M`). ```{r pipe_example_a, eval=F, message=F, cache=T} @@ -289,7 +289,7 @@ flights_md <- arrange(flights_md, most_delay) ## Combining multiple operations with the pipe -We don't want to create useless intermediate variables so we can use the pipe opperator: `%>%` +We don't want to create useless intermediate variables so we can use the pipe operator: `%>%` (`ctrl + shift + M`). ```{r pipe_example_b, eval=F, message=F, cache=T} @@ -303,7 +303,7 @@ flights %>% Behind the scenes, `x %>% f(y)` turns into `f(x, y)`, and `x %>% f(y) %>% g(z)` turns into `g(f(x, y), z)` and so on. You can use the pipe to rewrite multiple operations in a way that you can read left-to-right, top-to-bottom. 
-You can access the transmited variables with `.` +You can access the transmitted variables with `.` ```{r pipe_example_c, eval=F, message=F, cache=T} flights %>% diff --git a/web/slides_4.html b/web/slides_4.html index a255f1302fe33393805f078095fce1f548078d83..555c913bfacb114a4fd05d04ff0a305eaff5d9fd 100644 --- a/web/slides_4.html +++ b/web/slides_4.html @@ -10,6 +10,69 @@ <title>R#4: data transformation</title> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> +pre > code.sourceCode { white-space: pre; position: relative; } +pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } +pre > code.sourceCode > span:empty { height: 1.2em; } +code.sourceCode > span { color: inherit; text-decoration: inherit; } +div.sourceCode { margin: 1em 0; } +pre.sourceCode { margin: 0; } +@media screen { +div.sourceCode { overflow: auto; } +} +@media print { +pre > code.sourceCode { white-space: pre-wrap; } +pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +} +pre.numberSource code + { counter-reset: source-line 0; } +pre.numberSource code > span + { position: relative; left: -4em; counter-increment: source-line; } +pre.numberSource code > span > a:first-child::before + { content: counter(source-line); + position: relative; left: -1em; text-align: right; vertical-align: baseline; + border: none; display: inline-block; + -webkit-touch-callout: none; -webkit-user-select: none; + -khtml-user-select: none; -moz-user-select: none; + -ms-user-select: none; user-select: none; + padding: 0 4px; width: 4em; + color: #aaaaaa; + } +pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } +div.sourceCode + { background-color: #f8f8f8; } +@media screen { +pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } +} +code span.al { color: #ef2929; } /* Alert */ +code span.an { color: #8f5902; font-weight: bold; font-style: italic; } /* Annotation */ +code span.at { 
color: #c4a000; } /* Attribute */ +code span.bn { color: #0000cf; } /* BaseN */ +code span.cf { color: #204a87; font-weight: bold; } /* ControlFlow */ +code span.ch { color: #4e9a06; } /* Char */ +code span.cn { color: #000000; } /* Constant */ +code span.co { color: #8f5902; font-style: italic; } /* Comment */ +code span.cv { color: #8f5902; font-weight: bold; font-style: italic; } /* CommentVar */ +code span.do { color: #8f5902; font-weight: bold; font-style: italic; } /* Documentation */ +code span.dt { color: #204a87; } /* DataType */ +code span.dv { color: #0000cf; } /* DecVal */ +code span.er { color: #a40000; font-weight: bold; } /* Error */ +code span.ex { } /* Extension */ +code span.fl { color: #0000cf; } /* Float */ +code span.fu { color: #000000; } /* Function */ +code span.im { } /* Import */ +code span.in { color: #8f5902; font-weight: bold; font-style: italic; } /* Information */ +code span.kw { color: #204a87; font-weight: bold; } /* Keyword */ +code span.op { color: #ce5c00; font-weight: bold; } /* Operator */ +code span.ot { color: #8f5902; } /* Other */ +code span.pp { color: #8f5902; font-style: italic; } /* Preprocessor */ +code span.sc { color: #000000; } /* SpecialChar */ +code span.ss { color: #4e9a06; } /* SpecialString */ +code span.st { color: #4e9a06; } /* String */ +code span.va { color: #000000; } /* Variable */ +code span.vs { color: #4e9a06; } /* VerbatimString */ +code span.wa { color: #8f5902; font-weight: bold; font-style: italic; } /* Warning */ + </style> + <style type="text/css"> body { margin: 0 0 0 0; @@ -383,9 +446,12 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="nycflights13" class="slide section level2"> <h1><strong>nycflights13</strong></h1> <p><code>nycflights13::flights</code>contains all 336,776 flights that departed from New York City in 2013. 
The data comes from the US Bureau of Transportation Statistics, and is documented in <code>?flights</code></p> +<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1"></a><span class="kw">library</span>(nycflights13)</span> +<span id="cb1-2"><a href="#cb1-2"></a><span class="kw">library</span>(tidyverse)</span></code></pre></div> </div> <div id="nycflights13-1" class="slide section level2"> <h1><strong>nycflights13</strong></h1> +<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1"></a>flights</span></code></pre></div> <ul> <li><strong>int</strong> stands for integers.</li> <li><strong>dbl</strong> stands for doubles, or real numbers.</li> @@ -399,6 +465,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="filter-rows-with-filter" class="slide section level2"> <h1>Filter rows with <code>filter()</code></h1> <p><code>filter()</code> allows you to subset observations based on their values.</p> +<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1"></a><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">1</span>, day <span class="op">==</span><span class="st"> </span><span class="dv">1</span>)</span></code></pre></div> <pre><code>## # A tibble: 842 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> @@ -420,7 +487,9 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="filter-rows-with-filter-1" class="slide section level2"> <h1>Filter rows with <code>filter()</code></h1> <p><code>dplyr</code> functions never modify their inputs, so if you want to save the result, you’ll need to use the assignment operator, <code><-</code></p> +<div class="sourceCode" id="cb5"><pre 
class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1"></a>jan1 <-<span class="st"> </span><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">1</span>, day <span class="op">==</span><span class="st"> </span><span class="dv">1</span>)</span></code></pre></div> <p>R either prints out the results, or saves them to a variable.</p> +<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1"></a>(dec25 <-<span class="st"> </span><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">12</span>, day <span class="op">==</span><span class="st"> </span><span class="dv">25</span>))</span></code></pre></div> <pre><code>## # A tibble: 719 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> @@ -447,6 +516,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="logical-operators-1" class="slide section level2"> <h1>Logical operators</h1> <p>Test the following operations:</p> +<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1"></a><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">11</span> <span class="op">|</span><span class="st"> </span>month <span class="op">==</span><span class="st"> </span><span class="dv">12</span>)</span></code></pre></div> <pre><code>## # A tibble: 55,403 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> @@ -464,6 +534,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly ## # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, ## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, ## # minute <dbl>, 
time_hour <dttm></code></pre> +<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a><span class="kw">filter</span>(flights, month <span class="op">%in%</span><span class="st"> </span><span class="kw">c</span>(<span class="dv">11</span>, <span class="dv">12</span>))</span></code></pre></div> <pre><code>## # A tibble: 55,403 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> @@ -481,6 +552,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly ## # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, ## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, ## # minute <dbl>, time_hour <dttm></code></pre> +<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1"></a><span class="kw">filter</span>(flights, <span class="op">!</span>(arr_delay <span class="op">></span><span class="st"> </span><span class="dv">120</span> <span class="op">|</span><span class="st"> </span>dep_delay <span class="op">></span><span class="st"> </span><span class="dv">120</span>))</span></code></pre></div> <pre><code>## # A tibble: 316,050 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> @@ -498,6 +570,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly ## # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, ## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, ## # minute <dbl>, time_hour <dttm></code></pre> +<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1"></a><span class="kw">filter</span>(flights, arr_delay <span class="op"><=</span><span class="st"> </span><span class="dv">120</span>, dep_delay <span class="op"><=</span><span class="st"> 
</span><span class="dv">120</span>)</span></code></pre></div> <pre><code>## # A tibble: 316,050 x 19 ## year month day dep_time sched_dep_time dep_delay arr_time ## <int> <int> <int> <int> <int> <dbl> <int> @@ -519,14 +592,20 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="missing-values" class="slide section level2"> <h1>Missing values</h1> <p>One important feature of R that can make comparison tricky are missing values, or <code>NA</code>s (“not availables”).</p> +<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1"></a><span class="ot">NA</span> <span class="op">></span><span class="st"> </span><span class="dv">5</span></span></code></pre></div> <pre><code>## [1] NA</code></pre> +<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1"></a><span class="dv">10</span> <span class="op">==</span><span class="st"> </span><span class="ot">NA</span></span></code></pre></div> <pre><code>## [1] NA</code></pre> +<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1"></a><span class="ot">NA</span> <span class="op">+</span><span class="st"> </span><span class="dv">10</span></span></code></pre></div> <pre><code>## [1] NA</code></pre> +<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1"></a><span class="ot">NA</span> <span class="op">/</span><span class="st"> </span><span class="dv">2</span></span></code></pre></div> <pre><code>## [1] NA</code></pre> </div> <div id="missing-values-1" class="slide section level2"> <h1>Missing values</h1> +<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1"></a><span class="ot">NA</span> <span class="op">==</span><span class="st"> </span><span 
class="ot">NA</span></span></code></pre></div> <pre><code>## [1] NA</code></pre> +<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1"></a><span class="kw">is.na</span>(<span class="ot">NA</span>)</span></code></pre></div> <pre><code>## [1] TRUE</code></pre> </div> <div id="filter-challenges" class="slide section level2"> @@ -544,8 +623,12 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="arrange-rows-with-arrange" class="slide section level2"> <h1>Arrange rows with <code>arrange()</code></h1> <p><code>arrange()</code> works similarly to <code>filter()</code> except that instead of selecting rows, it changes their order.</p> +<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1"></a><span class="kw">arrange</span>(flights, year, month, day)</span></code></pre></div> <p>Use <code>desc()</code> to re-order by a column in descending order:</p> +<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1"></a><span class="kw">arrange</span>(flights, <span class="kw">desc</span>(dep_delay))</span></code></pre></div> <p>Missing values are always sorted at the end:</p> +<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="#cb30-1"></a><span class="kw">arrange</span>(<span class="kw">tibble</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">2</span>, <span class="ot">NA</span>)), x)</span> +<span id="cb30-2"><a href="#cb30-2"></a><span class="kw">arrange</span>(<span class="kw">tibble</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">2</span>, <span class="ot">NA</span>)), <span class="kw">desc</span>(x))</span></code></pre></div> </div> <div id="arrange-challenges" 
class="slide section level2"> <h1>Arrange challenges</h1> @@ -558,6 +641,9 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="select-columns-with-select" class="slide section level2"> <h1>Select columns with <code>select()</code></h1> <p><code>select()</code> allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.</p> +<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1"></a><span class="kw">select</span>(flights, year, month, day)</span> +<span id="cb31-2"><a href="#cb31-2"></a><span class="kw">select</span>(flights, year<span class="op">:</span>day)</span> +<span id="cb31-3"><a href="#cb31-3"></a><span class="kw">select</span>(flights, <span class="op">-</span>(year<span class="op">:</span>day))</span></code></pre></div> </div> <div id="select-columns-with-select-1" class="slide section level2"> <h1>Select columns with <code>select()</code></h1> @@ -574,19 +660,38 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <div id="select-challenges" class="slide section level2"> <h1>Select challenges</h1> <ul> -<li><p>Brainstorm as many ways as possible to select <code>dep_time</code>, <code>dep_delay</code>, <code>arr_time</code>, and <code>arr_delay</code> from <code>flights</code>.</p></li> -<li><p>What does the <code>one_of()</code> function do? Why might it be helpful in conjunction with this vector?</p></li> -<li><p>Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?</p></li> +<li>Brainstorm as many ways as possible to select <code>dep_time</code>, <code>dep_delay</code>, <code>arr_time</code>, and <code>arr_delay</code> from <code>flights</code>.</li> +<li>What does the <code>one_of()</code> function do? 
Why might it be helpful in conjunction with this vector?</li> +</ul> +<div class="sourceCode" id="cb32"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb32-1"><a href="#cb32-1"></a>vars <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"year"</span>, <span class="st">"month"</span>, <span class="st">"day"</span>, <span class="st">"dep_delay"</span>, <span class="st">"arr_delay"</span>)</span></code></pre></div> +<ul> +<li>Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?</li> </ul> +<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="#cb33-1"></a><span class="kw">select</span>(flights, <span class="kw">contains</span>(<span class="st">"TIME"</span>))</span></code></pre></div> </div> <div id="add-new-variables-with-mutate" class="slide section level2"> <h1>Add new variables with <code>mutate()</code></h1> <p>It’s often useful to add new columns that are functions of existing columns. 
That’s the job of <code>mutate()</code>.</p> +<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="#cb34-1"></a>flights_sml <-<span class="st"> </span><span class="kw">select</span>(flights, </span> +<span id="cb34-2"><a href="#cb34-2"></a> year<span class="op">:</span>day, </span> +<span id="cb34-3"><a href="#cb34-3"></a> <span class="kw">ends_with</span>(<span class="st">"delay"</span>), </span> +<span id="cb34-4"><a href="#cb34-4"></a> distance, </span> +<span id="cb34-5"><a href="#cb34-5"></a> air_time</span> +<span id="cb34-6"><a href="#cb34-6"></a>)</span> +<span id="cb34-7"><a href="#cb34-7"></a><span class="kw">mutate</span>(flights_sml,</span> +<span id="cb34-8"><a href="#cb34-8"></a> <span class="dt">gain =</span> dep_delay <span class="op">-</span><span class="st"> </span>arr_delay,</span> +<span id="cb34-9"><a href="#cb34-9"></a> <span class="dt">speed =</span> distance <span class="op">/</span><span class="st"> </span>air_time <span class="op">*</span><span class="st"> </span><span class="dv">60</span></span> +<span id="cb34-10"><a href="#cb34-10"></a>)</span></code></pre></div> <p><strong>4_a</strong></p> </div> <div id="add-new-variables-with-mutate-1" class="slide section level2"> <h1>Add new variables with <code>mutate()</code></h1> <p>You can refer to columns that you’ve just created:</p> +<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="#cb35-1"></a><span class="kw">mutate</span>(flights,</span> +<span id="cb35-2"><a href="#cb35-2"></a> <span class="dt">gain =</span> dep_delay <span class="op">-</span><span class="st"> </span>arr_delay,</span> +<span id="cb35-3"><a href="#cb35-3"></a> <span class="dt">hours =</span> air_time <span class="op">/</span><span class="st"> </span><span class="dv">60</span>,</span> +<span id="cb35-4"><a href="#cb35-4"></a> <span class="dt">gain_per_hour =</span> gain <span 
class="op">/</span><span class="st"> </span>hours</span> +<span id="cb35-5"><a href="#cb35-5"></a>)</span></code></pre></div> </div> <div id="useful-creation-functions" class="slide section level2"> <h1>Useful creation functions</h1> @@ -603,6 +708,13 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <li>Currently <code>dep_time</code> and <code>sched_dep_time</code> are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.</li> </ul> +<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="#cb36-1"></a><span class="kw">mutate</span>(</span> +<span id="cb36-2"><a href="#cb36-2"></a> flights,</span> +<span id="cb36-3"><a href="#cb36-3"></a> <span class="dt">dep_time =</span> (dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span></span> +<span id="cb36-4"><a href="#cb36-4"></a><span class="st"> </span>dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span>,</span> +<span id="cb36-5"><a href="#cb36-5"></a> <span class="dt">sched_dep_time =</span> (sched_dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span></span> +<span id="cb36-6"><a href="#cb36-6"></a><span class="st"> </span>sched_dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span></span> +<span id="cb36-7"><a href="#cb36-7"></a>)</span></code></pre></div> <p><strong>4_b</strong></p> </div> <div id="mutate-challenges-1" class="slide section level2"> @@ -611,6 +723,13 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly 
<li>Compare 
<code>dep_time</code>, <code>sched_dep_time</code>, and <code>dep_delay</code>. How would you expect those three numbers to be related?</li> </ul> +<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1"></a><span class="kw">mutate</span>(</span> +<span id="cb37-2"><a href="#cb37-2"></a> flights,</span> +<span id="cb37-3"><a href="#cb37-3"></a> <span class="dt">dep_time =</span> (dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span><span class="st"> </span></span> +<span id="cb37-4"><a href="#cb37-4"></a><span class="st"> </span>dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span>,</span> +<span id="cb37-5"><a href="#cb37-5"></a> <span class="dt">sched_dep_time =</span> (sched_dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span></span> +<span id="cb37-6"><a href="#cb37-6"></a><span class="st"> </span>sched_dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span></span> +<span id="cb37-7"><a href="#cb37-7"></a>)</span></code></pre></div> <p><strong>4_c</strong></p> </div> <div id="mutate-challenges-2" class="slide section level2"> @@ -619,20 +738,34 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly <li>Find the 10 most delayed flights using a ranking function. How do you want to handle ties? 
Carefully read the documentation for <code>min_rank()</code></li> </ul> +<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="#cb38-1"></a>flights_md <-<span class="st"> </span><span class="kw">mutate</span>(flights, <span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay)))</span> +<span id="cb38-2"><a href="#cb38-2"></a><span class="kw">filter</span>(flights_md, most_delay <span class="op"><</span><span class="st"> </span><span class="dv">10</span>)</span></code></pre></div> <p><strong>4_d</strong></p> </div> <div id="combining-multiple-operations-with-the-pipe" class="slide section level2"> <h1>Combining multiple operations with the pipe</h1> -<p>We don’t want to create useless intermediate variables so we can use the pipe opperator: <code>%>%</code> (<code>ctrl + shift + M</code>).</p> +<p>We don’t want to create useless intermediate variables so we can use the pipe operator: <code>%>%</code> (<code>ctrl + shift + M</code>).</p> +<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1"></a>flights_md <-<span class="st"> </span><span class="kw">mutate</span>(flights,</span> +<span id="cb39-2"><a href="#cb39-2"></a> <span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay)))</span> +<span id="cb39-3"><a href="#cb39-3"></a>flights_md <-<span class="st"> </span><span class="kw">filter</span>(flights_md, most_delay <span class="op"><</span><span class="st"> </span><span class="dv">10</span>)</span> +<span id="cb39-4"><a href="#cb39-4"></a>flights_md <-<span class="st"> </span><span class="kw">arrange</span>(flights_md, most_delay)</span></code></pre></div> </div> <div id="combining-multiple-operations-with-the-pipe-1" class="slide section level2"> <h1>Combining multiple operations with the pipe</h1> -<p>We don’t want to create useless 
intermediate variables so we can use the pipe opperator: <code>%>%</code> (<code>ctrl + shift + M</code>).</p> +<p>We don’t want to create useless intermediate variables so we can use the pipe operator: <code>%>%</code> (<code>ctrl + shift + M</code>).</p> +<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1"></a>flights <span class="op">%>%</span></span> +<span id="cb40-2"><a href="#cb40-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay))) <span class="op">%>%</span><span class="st"> </span></span> +<span id="cb40-3"><a href="#cb40-3"></a><span class="st"> </span><span class="kw">filter</span>(most_delay <span class="op"><</span><span class="st"> </span><span class="dv">10</span>) <span class="op">%>%</span><span class="st"> </span></span> +<span id="cb40-4"><a href="#cb40-4"></a><span class="st"> </span><span class="kw">arrange</span>(most_delay)</span></code></pre></div> </div> <div id="combining-multiple-operations-with-the-pipe-2" class="slide section level2"> <h1>Combining multiple operations with the pipe</h1> <p>Behind the scenes, <code>x %>% f(y)</code> turns into <code>f(x, y)</code>, and <code>x %>% f(y) %>% g(z)</code> turns into <code>g(f(x, y), z)</code> and so on. 
You can use the pipe to rewrite multiple operations in a way that you can read left-to-right, top-to-bottom.</p> -<p>You can access the transmited variables with <code>.</code></p> +<p>You can access the transmitted variables with <code>.</code></p> +<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1"></a>flights <span class="op">%>%</span></span> +<span id="cb41-2"><a href="#cb41-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay))) <span class="op">%>%</span><span class="st"> </span></span> +<span id="cb41-3"><a href="#cb41-3"></a><span class="st"> </span><span class="kw">filter</span>(., most_delay <span class="op"><</span><span class="st"> </span><span class="dv">10</span>) <span class="op">%>%</span><span class="st"> </span></span> +<span id="cb41-4"><a href="#cb41-4"></a><span class="st"> </span><span class="kw">arrange</span>(., most_delay)</span></code></pre></div> <p>Working with the pipe is one of the key criteria for belonging to the <code>tidyverse</code>. The only exception is <code>ggplot2</code>: it was written before the pipe was discovered. Unfortunately, the next iteration of <code>ggplot2</code>, <code>ggvis</code>, which does use the pipe, isn’t quite ready for prime time yet.</p> </div>