Unverified Commit 292540a1 authored by Laurent Modolo

typo fix in session4

parent f4879e7b
......@@ -3,8 +3,6 @@ title: "R#4: data transformation"
author: "Laurent Modolo [laurent.modolo@ens-lyon.fr](mailto:laurent.modolo@ens-lyon.fr)"
date: "08 Nov 2019"
output:
slidy_presentation:
highlight: tango
beamer_presentation:
theme: metropolis
slide_level: 3
......@@ -12,6 +10,8 @@ output:
df_print: tibble
highlight: tango
latex_engine: xelatex
slidy_presentation:
highlight: tango
---
```{r setup, include=FALSE, cache=TRUE}
knitr::opts_chunk$set(echo = FALSE)
......@@ -277,7 +277,7 @@ filter(flights_md, most_delay < 10)
## Combining multiple operations with the pipe
-We don't want to create useless intermediate variables so we can use the pipe opperator: `%>%`
+We don't want to create useless intermediate variables so we can use the pipe operator: `%>%`
(`ctrl + shift + M`).
```{r pipe_example_a, eval=F, message=F, cache=T}
......@@ -289,7 +289,7 @@ flights_md <- arrange(flights_md, most_delay)
## Combining multiple operations with the pipe
-We don't want to create useless intermediate variables so we can use the pipe opperator: `%>%`
+We don't want to create useless intermediate variables so we can use the pipe operator: `%>%`
(`ctrl + shift + M`).
```{r pipe_example_b, eval=F, message=F, cache=T}
......@@ -303,7 +303,7 @@ flights %>%
Behind the scenes, `x %>% f(y)` turns into `f(x, y)`, and `x %>% f(y) %>% g(z)` turns into `g(f(x, y), z)` and so on. You can use the pipe to rewrite multiple operations in a way that you can read left-to-right, top-to-bottom.
-You can access the transmited variables with `.`
+You can access the transmitted variables with `.`
```{r pipe_example_c, eval=F, message=F, cache=T}
flights %>%
......
......@@ -10,6 +10,69 @@
<title>R#4: data transformation</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ background-color: #f8f8f8; }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ef2929; } /* Alert */
code span.an { color: #8f5902; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #c4a000; } /* Attribute */
code span.bn { color: #0000cf; } /* BaseN */
code span.cf { color: #204a87; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4e9a06; } /* Char */
code span.cn { color: #000000; } /* Constant */
code span.co { color: #8f5902; font-style: italic; } /* Comment */
code span.cv { color: #8f5902; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #8f5902; font-weight: bold; font-style: italic; } /* Documentation */
code span.dt { color: #204a87; } /* DataType */
code span.dv { color: #0000cf; } /* DecVal */
code span.er { color: #a40000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #0000cf; } /* Float */
code span.fu { color: #000000; } /* Function */
code span.im { } /* Import */
code span.in { color: #8f5902; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #204a87; font-weight: bold; } /* Keyword */
code span.op { color: #ce5c00; font-weight: bold; } /* Operator */
code span.ot { color: #8f5902; } /* Other */
code span.pp { color: #8f5902; font-style: italic; } /* Preprocessor */
code span.sc { color: #000000; } /* SpecialChar */
code span.ss { color: #4e9a06; } /* SpecialString */
code span.st { color: #4e9a06; } /* String */
code span.va { color: #000000; } /* Variable */
code span.vs { color: #4e9a06; } /* VerbatimString */
code span.wa { color: #8f5902; font-weight: bold; font-style: italic; } /* Warning */
</style>
<style type="text/css">
body
{
margin: 0 0 0 0;
......@@ -383,9 +446,12 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="nycflights13" class="slide section level2">
<h1><strong>nycflights13</strong></h1>
<p><code>nycflights13::flights</code> contains all 336,776 flights that departed from New York City in 2013. The data comes from the US Bureau of Transportation Statistics, and is documented in <code>?flights</code>.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1"></a><span class="kw">library</span>(nycflights13)</span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="kw">library</span>(tidyverse)</span></code></pre></div>
</div>
<div id="nycflights13-1" class="slide section level2">
<h1><strong>nycflights13</strong></h1>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1"></a>flights</span></code></pre></div>
<ul>
<li><strong>int</strong> stands for integers.</li>
<li><strong>dbl</strong> stands for doubles, or real numbers.</li>
......@@ -399,6 +465,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="filter-rows-with-filter" class="slide section level2">
<h1>Filter rows with <code>filter()</code></h1>
<p><code>filter()</code> allows you to subset observations based on their values.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1"></a><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">1</span>, day <span class="op">==</span><span class="st"> </span><span class="dv">1</span>)</span></code></pre></div>
<pre><code>## # A tibble: 842 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
......@@ -420,7 +487,9 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="filter-rows-with-filter-1" class="slide section level2">
<h1>Filter rows with <code>filter()</code></h1>
<p><code>dplyr</code> functions never modify their inputs, so if you want to save the result, you’ll need to use the assignment operator, <code>&lt;-</code>.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1"></a>jan1 &lt;-<span class="st"> </span><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">1</span>, day <span class="op">==</span><span class="st"> </span><span class="dv">1</span>)</span></code></pre></div>
<p>R either prints out the results, or saves them to a variable. To do both, wrap the assignment in parentheses:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1"></a>(dec25 &lt;-<span class="st"> </span><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">12</span>, day <span class="op">==</span><span class="st"> </span><span class="dv">25</span>))</span></code></pre></div>
<pre><code>## # A tibble: 719 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
......@@ -447,6 +516,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="logical-operators-1" class="slide section level2">
<h1>Logical operators</h1>
<p>Test the following operations:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1"></a><span class="kw">filter</span>(flights, month <span class="op">==</span><span class="st"> </span><span class="dv">11</span> <span class="op">|</span><span class="st"> </span>month <span class="op">==</span><span class="st"> </span><span class="dv">12</span>)</span></code></pre></div>
<pre><code>## # A tibble: 55,403 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
......@@ -464,6 +534,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
## # arr_delay &lt;dbl&gt;, carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;,
## # origin &lt;chr&gt;, dest &lt;chr&gt;, air_time &lt;dbl&gt;, distance &lt;dbl&gt;, hour &lt;dbl&gt;,
## # minute &lt;dbl&gt;, time_hour &lt;dttm&gt;</code></pre>
<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a><span class="kw">filter</span>(flights, month <span class="op">%in%</span><span class="st"> </span><span class="kw">c</span>(<span class="dv">11</span>, <span class="dv">12</span>))</span></code></pre></div>
<pre><code>## # A tibble: 55,403 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
......@@ -481,6 +552,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
## # arr_delay &lt;dbl&gt;, carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;,
## # origin &lt;chr&gt;, dest &lt;chr&gt;, air_time &lt;dbl&gt;, distance &lt;dbl&gt;, hour &lt;dbl&gt;,
## # minute &lt;dbl&gt;, time_hour &lt;dttm&gt;</code></pre>
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1"></a><span class="kw">filter</span>(flights, <span class="op">!</span>(arr_delay <span class="op">&gt;</span><span class="st"> </span><span class="dv">120</span> <span class="op">|</span><span class="st"> </span>dep_delay <span class="op">&gt;</span><span class="st"> </span><span class="dv">120</span>))</span></code></pre></div>
<pre><code>## # A tibble: 316,050 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
......@@ -498,6 +570,7 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
## # arr_delay &lt;dbl&gt;, carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;,
## # origin &lt;chr&gt;, dest &lt;chr&gt;, air_time &lt;dbl&gt;, distance &lt;dbl&gt;, hour &lt;dbl&gt;,
## # minute &lt;dbl&gt;, time_hour &lt;dttm&gt;</code></pre>
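<p>A minimal sketch of why the filters on this slide come in equivalent pairs. It uses a hypothetical four-row tibble <code>toy</code> instead of <code>flights</code>: <code>%in%</code> is shorthand for a chain of <code>==</code> tests joined by <code>|</code>, and De Morgan's law says <code>!(a | b)</code> is the same as <code>!a &amp; !b</code>.</p>

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- tibble(
  month     = c(1, 11, 12, 12),
  arr_delay = c(5, 130, 20, 90),
  dep_delay = c(0, 10, 200, 30)
)

# month == 11 | month == 12 keeps the same rows as month %in% c(11, 12)
by_or <- filter(toy, month == 11 | month == 12)
by_in <- filter(toy, month %in% c(11, 12))
identical(by_or, by_in)

# De Morgan: !(a | b) is equivalent to !a & !b
not_or  <- filter(toy, !(arr_delay > 120 | dep_delay > 120))
and_not <- filter(toy, arr_delay <= 120, dep_delay <= 120)
identical(not_or, and_not)
```

<p>Separating conditions with a comma inside <code>filter()</code> combines them with "and", which is why the last call needs no explicit <code>&amp;</code>.</p>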
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1"></a><span class="kw">filter</span>(flights, arr_delay <span class="op">&lt;=</span><span class="st"> </span><span class="dv">120</span>, dep_delay <span class="op">&lt;=</span><span class="st"> </span><span class="dv">120</span>)</span></code></pre></div>
<pre><code>## # A tibble: 316,050 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
......@@ -519,14 +592,20 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="missing-values" class="slide section level2">
<h1>Missing values</h1>
<p>One important feature of R that can make comparison tricky is missing values, or <code>NA</code>s (“not availables”).</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1"></a><span class="ot">NA</span> <span class="op">&gt;</span><span class="st"> </span><span class="dv">5</span></span></code></pre></div>
<pre><code>## [1] NA</code></pre>
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1"></a><span class="dv">10</span> <span class="op">==</span><span class="st"> </span><span class="ot">NA</span></span></code></pre></div>
<pre><code>## [1] NA</code></pre>
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1"></a><span class="ot">NA</span> <span class="op">+</span><span class="st"> </span><span class="dv">10</span></span></code></pre></div>
<pre><code>## [1] NA</code></pre>
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1"></a><span class="ot">NA</span> <span class="op">/</span><span class="st"> </span><span class="dv">2</span></span></code></pre></div>
<pre><code>## [1] NA</code></pre>
</div>
<div id="missing-values-1" class="slide section level2">
<h1>Missing values</h1>
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1"></a><span class="ot">NA</span> <span class="op">==</span><span class="st"> </span><span class="ot">NA</span></span></code></pre></div>
<pre><code>## [1] NA</code></pre>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1"></a><span class="kw">is.na</span>(<span class="ot">NA</span>)</span></code></pre></div>
<pre><code>## [1] TRUE</code></pre>
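<p>The practical consequence for <code>filter()</code> can be sketched with a hypothetical three-row tibble <code>df</code>: a comparison involving <code>NA</code> yields <code>NA</code>, and <code>filter()</code> only keeps rows where the condition is <code>TRUE</code>, so missing values are dropped unless you ask for them.</p>

```r
library(dplyr)

# Hypothetical toy data with one missing value
df <- tibble(x = c(1, NA, 3))

# NA > 1 is NA, so the row with the missing value is dropped
kept <- filter(df, x > 1)

# Ask for missing values explicitly to keep them
kept_na <- filter(df, is.na(x) | x > 1)

kept
kept_na
```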
</div>
<div id="filter-challenges" class="slide section level2">
......@@ -544,8 +623,12 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="arrange-rows-with-arrange" class="slide section level2">
<h1>Arrange rows with <code>arrange()</code></h1>
<p><code>arrange()</code> works similarly to <code>filter()</code> except that instead of selecting rows, it changes their order.</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1"></a><span class="kw">arrange</span>(flights, year, month, day)</span></code></pre></div>
<p>Use <code>desc()</code> to re-order by a column in descending order:</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1"></a><span class="kw">arrange</span>(flights, <span class="kw">desc</span>(dep_delay))</span></code></pre></div>
<p>Missing values are always sorted at the end:</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="#cb30-1"></a><span class="kw">arrange</span>(<span class="kw">tibble</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">2</span>, <span class="ot">NA</span>)), x)</span>
<span id="cb30-2"><a href="#cb30-2"></a><span class="kw">arrange</span>(<span class="kw">tibble</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">2</span>, <span class="ot">NA</span>)), <span class="kw">desc</span>(x))</span></code></pre></div>
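<p>If missing values should come first instead, one possible approach (a sketch, not something the slides prescribe) is to sort on <code>is.na()</code> before sorting on the column itself:</p>

```r
library(dplyr)

df <- tibble(x = c(5, 2, NA))

# NA goes last whether the sort is ascending or descending
asc <- arrange(df, x)        # x: 2, 5, NA
dsc <- arrange(df, desc(x))  # x: 5, 2, NA

# Sorting on is.na(x) first moves the missing value to the top
na_first <- arrange(df, desc(is.na(x)), x)  # x: NA, 2, 5
```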
</div>
<div id="arrange-challenges" class="slide section level2">
<h1>Arrange challenges</h1>
......@@ -558,6 +641,9 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="select-columns-with-select" class="slide section level2">
<h1>Select columns with <code>select()</code></h1>
<p><code>select()</code> allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.</p>
<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1"></a><span class="kw">select</span>(flights, year, month, day)</span>
<span id="cb31-2"><a href="#cb31-2"></a><span class="kw">select</span>(flights, year<span class="op">:</span>day)</span>
<span id="cb31-3"><a href="#cb31-3"></a><span class="kw">select</span>(flights, <span class="op">-</span>(year<span class="op">:</span>day))</span></code></pre></div>
</div>
<div id="select-columns-with-select-1" class="slide section level2">
<h1>Select columns with <code>select()</code></h1>
......@@ -574,19 +660,38 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<div id="select-challenges" class="slide section level2">
<h1>Select challenges</h1>
<ul>
<li><p>Brainstorm as many ways as possible to select <code>dep_time</code>, <code>dep_delay</code>, <code>arr_time</code>, and <code>arr_delay</code> from <code>flights</code>.</p></li>
<li><p>What does the <code>one_of()</code> function do? Why might it be helpful in conjunction with this vector?</p></li>
<li><p>Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?</p></li>
<li>Brainstorm as many ways as possible to select <code>dep_time</code>, <code>dep_delay</code>, <code>arr_time</code>, and <code>arr_delay</code> from <code>flights</code>.</li>
<li>What does the <code>one_of()</code> function do? Why might it be helpful in conjunction with this vector?</li>
</ul>
<div class="sourceCode" id="cb32"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb32-1"><a href="#cb32-1"></a>vars &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;year&quot;</span>, <span class="st">&quot;month&quot;</span>, <span class="st">&quot;day&quot;</span>, <span class="st">&quot;dep_delay&quot;</span>, <span class="st">&quot;arr_delay&quot;</span>)</span></code></pre></div>
<ul>
<li>Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?</li>
</ul>
<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="#cb33-1"></a><span class="kw">select</span>(flights, <span class="kw">contains</span>(<span class="st">&quot;TIME&quot;</span>))</span></code></pre></div>
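<p>One way to explore these two questions without loading the full dataset is sketched below. The one-row tibble <code>toy</code> is hypothetical and only borrows a few column names from <code>flights</code>. Note that current tidyselect documentation marks <code>one_of()</code> as superseded by <code>any_of()</code> and <code>all_of()</code>, although it still works.</p>

```r
library(dplyr)

# Hypothetical one-row tibble borrowing a few column names from flights
toy <- tibble(
  year = 2013, month = 1, day = 1,
  dep_time = 517, dep_delay = 2, arr_time = 830, arr_delay = 11
)

vars <- c("year", "month", "day", "dep_delay", "arr_delay")

# one_of() selects the columns named in a character vector
from_vector <- names(select(toy, one_of(vars)))

# contains() ignores case by default, so "TIME" still matches dep_time
any_case <- names(select(toy, contains("TIME")))

# ignore.case = FALSE makes the match case-sensitive: nothing is selected
exact_case <- names(select(toy, contains("TIME", ignore.case = FALSE)))
```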
</div>
<div id="add-new-variables-with-mutate" class="slide section level2">
<h1>Add new variables with <code>mutate()</code></h1>
<p>It’s often useful to add new columns that are functions of existing columns. That’s the job of <code>mutate()</code>.</p>
<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="#cb34-1"></a>flights_sml &lt;-<span class="st"> </span><span class="kw">select</span>(flights, </span>
<span id="cb34-2"><a href="#cb34-2"></a> year<span class="op">:</span>day, </span>
<span id="cb34-3"><a href="#cb34-3"></a> <span class="kw">ends_with</span>(<span class="st">&quot;delay&quot;</span>), </span>
<span id="cb34-4"><a href="#cb34-4"></a> distance, </span>
<span id="cb34-5"><a href="#cb34-5"></a> air_time</span>
<span id="cb34-6"><a href="#cb34-6"></a>)</span>
<span id="cb34-7"><a href="#cb34-7"></a><span class="kw">mutate</span>(flights_sml,</span>
<span id="cb34-8"><a href="#cb34-8"></a> <span class="dt">gain =</span> dep_delay <span class="op">-</span><span class="st"> </span>arr_delay,</span>
<span id="cb34-9"><a href="#cb34-9"></a> <span class="dt">speed =</span> distance <span class="op">/</span><span class="st"> </span>air_time <span class="op">*</span><span class="st"> </span><span class="dv">60</span></span>
<span id="cb34-10"><a href="#cb34-10"></a>)</span></code></pre></div>
<p><strong>4_a</strong></p>
</div>
<div id="add-new-variables-with-mutate-1" class="slide section level2">
<h1>Add new variables with <code>mutate()</code></h1>
<p>You can refer to columns that you’ve just created:</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="#cb35-1"></a><span class="kw">mutate</span>(flights,</span>
<span id="cb35-2"><a href="#cb35-2"></a> <span class="dt">gain =</span> dep_delay <span class="op">-</span><span class="st"> </span>arr_delay,</span>
<span id="cb35-3"><a href="#cb35-3"></a> <span class="dt">hours =</span> air_time <span class="op">/</span><span class="st"> </span><span class="dv">60</span>,</span>
<span id="cb35-4"><a href="#cb35-4"></a> <span class="dt">gain_per_hour =</span> gain <span class="op">/</span><span class="st"> </span>hours</span>
<span id="cb35-5"><a href="#cb35-5"></a>)</span></code></pre></div>
</div>
<div id="useful-creation-functions" class="slide section level2">
<h1>Useful creation functions</h1>
......@@ -603,6 +708,13 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<li>Currently <code>dep_time</code> and <code>sched_dep_time</code> are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.</li>
</ul>
<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="#cb36-1"></a><span class="kw">mutate</span>(</span>
<span id="cb36-2"><a href="#cb36-2"></a> flights,</span>
<span id="cb36-3"><a href="#cb36-3"></a> <span class="dt">dep_time =</span> (dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span></span>
<span id="cb36-4"><a href="#cb36-4"></a><span class="st"> </span>dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span>,</span>
<span id="cb36-5"><a href="#cb36-5"></a> <span class="dt">sched_dep_time =</span> (sched_dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span></span>
<span id="cb36-6"><a href="#cb36-6"></a><span class="st"> </span>sched_dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span></span>
<span id="cb36-7"><a href="#cb36-7"></a>)</span></code></pre></div>
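<p>The arithmetic in this conversion can be checked by hand on a few values. The sketch below uses a hypothetical vector <code>hhmm</code>. The final step assumes midnight is recorded as 2400, which is an assumption worth verifying against the data.</p>

```r
# Clock times stored as HHMM numbers, e.g. 1530 means 15:30
hhmm <- c(517, 1530, 2400)

hours   <- hhmm %/% 100  # integer division: 5, 15, 24
minutes <- hhmm %% 100   # remainder: 17, 30, 0

since_midnight <- hours * 60 + minutes  # 317, 930, 1440

# Assumption: midnight is stored as 2400; %% 1440 maps it back to 0
wrapped <- since_midnight %% 1440       # 317, 930, 0
```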
<p><strong>4_b</strong></p>
</div>
<div id="mutate-challenges-1" class="slide section level2">
......@@ -611,6 +723,13 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<li>Compare <code>dep_time</code>, <code>sched_dep_time</code>, and <code>dep_delay</code>. How would you expect those three numbers to be related?</li>
</ul>
<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1"></a><span class="kw">mutate</span>(</span>
<span id="cb37-2"><a href="#cb37-2"></a> flights,</span>
<span id="cb37-3"><a href="#cb37-3"></a> <span class="dt">dep_time =</span> (dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span><span class="st"> </span></span>
<span id="cb37-4"><a href="#cb37-4"></a><span class="st"> </span>dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span>,</span>
<span id="cb37-5"><a href="#cb37-5"></a> <span class="dt">sched_dep_time =</span> (sched_dep_time <span class="op">%/%</span><span class="st"> </span><span class="dv">100</span>) <span class="op">*</span><span class="st"> </span><span class="dv">60</span> <span class="op">+</span></span>
<span id="cb37-6"><a href="#cb37-6"></a><span class="st"> </span>sched_dep_time <span class="op">%%</span><span class="st"> </span><span class="dv">100</span></span>
<span id="cb37-7"><a href="#cb37-7"></a>)</span></code></pre></div>
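<p>Once both times are in minutes, you would expect <code>dep_delay</code> to equal their difference. A rough sketch on two hypothetical rows shows where that expectation breaks: a flight scheduled before midnight that leaves after it. The helper <code>to_minutes()</code> is introduced here for illustration only.</p>

```r
library(dplyr)

# Hypothetical rows shaped like flights; the second departs after midnight
toy <- tibble(
  dep_time       = c(517, 5),
  sched_dep_time = c(515, 2359),
  dep_delay      = c(2, 6)
)

# Illustrative helper: HHMM clock time to minutes since midnight
to_minutes <- function(hhmm) (hhmm %/% 100) * 60 + hhmm %% 100

checked <- mutate(toy,
  dep_min   = to_minutes(dep_time),
  sched_min = to_minutes(sched_dep_time),
  diff      = dep_min - sched_min,       # 2, -1434
  # adding a day before taking the remainder handles the midnight wrap
  diff_wrap = (diff + 1440) %% 1440      # 2, 6
)
```

<p>The wrap-around correction is only a sketch: it would turn an early departure (a negative delay) into a large positive number, so it should not be applied blindly.</p>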
<p><strong>4_c</strong></p>
</div>
<div id="mutate-challenges-2" class="slide section level2">
......@@ -619,20 +738,34 @@ Laurent Modolo <a href="mailto:laurent.modolo@ens-lyon.fr">laurent.modolo@ens-ly
<li>Find the 10 most delayed flights using a ranking function. How do you want to handle ties? Carefully read the documentation for <code>min_rank()</code>.</li>
</ul>
<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="#cb38-1"></a>flights_md &lt;-<span class="st"> </span><span class="kw">mutate</span>(flights, <span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay)))</span>
<span id="cb38-2"><a href="#cb38-2"></a><span class="kw">filter</span>(flights_md, most_delay <span class="op">&lt;</span><span class="st"> </span><span class="dv">10</span>)</span></code></pre></div>
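<p>How <code>min_rank()</code> treats ties, and how the comparison operator changes the row count, can be sketched on a hypothetical vector of four delays. Applied to the code above, a strict <code>&lt; 10</code> keeps ranks 1 to 9 only, so <code>&lt;= 10</code> may be closer to "the 10 most delayed".</p>

```r
library(dplyr)

# Hypothetical delays with one tie
delays <- c(50, 40, 40, 10)

# min_rank() gives tied values the same rank, then skips ahead
ranks <- min_rank(desc(delays))  # 1, 2, 2, 4

toy <- tibble(dep_delay = delays, most_delay = ranks)

# Asking for the "top 2": a strict < 2 keeps only rank 1
strict <- filter(toy, most_delay < 2)

# <= 2 keeps rank 1 and both rows tied at rank 2
inclusive <- filter(toy, most_delay <= 2)
```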
<p><strong>4_d</strong></p>
</div>
<div id="combining-multiple-operations-with-the-pipe" class="slide section level2">
<h1>Combining multiple operations with the pipe</h1>
-<p>We don’t want to create useless intermediate variables so we can use the pipe opperator: <code>%&gt;%</code> (<code>ctrl + shift + M</code>).</p>
+<p>We don’t want to create useless intermediate variables so we can use the pipe operator: <code>%&gt;%</code> (<code>ctrl + shift + M</code>).</p>
<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1"></a>flights_md &lt;-<span class="st"> </span><span class="kw">mutate</span>(flights,</span>
<span id="cb39-2"><a href="#cb39-2"></a> <span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay)))</span>
<span id="cb39-3"><a href="#cb39-3"></a>flights_md &lt;-<span class="st"> </span><span class="kw">filter</span>(flights_md, most_delay <span class="op">&lt;</span><span class="st"> </span><span class="dv">10</span>)</span>
<span id="cb39-4"><a href="#cb39-4"></a>flights_md &lt;-<span class="st"> </span><span class="kw">arrange</span>(flights_md, most_delay)</span></code></pre></div>
</div>
<div id="combining-multiple-operations-with-the-pipe-1" class="slide section level2">
<h1>Combining multiple operations with the pipe</h1>
-<p>We don’t want to create useless intermediate variables so we can use the pipe opperator: <code>%&gt;%</code> (<code>ctrl + shift + M</code>).</p>
+<p>We don’t want to create useless intermediate variables so we can use the pipe operator: <code>%&gt;%</code> (<code>ctrl + shift + M</code>).</p>
<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1"></a>flights <span class="op">%&gt;%</span></span>
<span id="cb40-2"><a href="#cb40-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay))) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb40-3"><a href="#cb40-3"></a><span class="st"> </span><span class="kw">filter</span>(most_delay <span class="op">&lt;</span><span class="st"> </span><span class="dv">10</span>) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb40-4"><a href="#cb40-4"></a><span class="st"> </span><span class="kw">arrange</span>(most_delay)</span></code></pre></div>
</div>
<div id="combining-multiple-operations-with-the-pipe-2" class="slide section level2">
<h1>Combining multiple operations with the pipe</h1>
<p>Behind the scenes, <code>x %&gt;% f(y)</code> turns into <code>f(x, y)</code>, and <code>x %&gt;% f(y) %&gt;% g(z)</code> turns into <code>g(f(x, y), z)</code> and so on. You can use the pipe to rewrite multiple operations in a way that you can read left-to-right, top-to-bottom.</p>
-<p>You can access the transmited variables with <code>.</code></p>
+<p>You can access the transmitted variables with <code>.</code></p>
<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1"></a>flights <span class="op">%&gt;%</span></span>
<span id="cb41-2"><a href="#cb41-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">most_delay =</span> <span class="kw">min_rank</span>(<span class="kw">desc</span>(dep_delay))) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb41-3"><a href="#cb41-3"></a><span class="st"> </span><span class="kw">filter</span>(., most_delay <span class="op">&lt;</span><span class="st"> </span><span class="dv">10</span>) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb41-4"><a href="#cb41-4"></a><span class="st"> </span><span class="kw">arrange</span>(., most_delay)</span></code></pre></div>
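<p>The rewriting rule for the pipe can be checked directly. This is a small sketch on a hypothetical tibble <code>toy</code>, comparing each piped expression with its nested equivalent and showing the dot placing the left-hand side in a position other than the first argument.</p>

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(dep_delay = c(3, 120, 45))

# x %>% f(y) is f(x, y)
piped  <- toy %>% filter(dep_delay > 10)
nested <- filter(toy, dep_delay > 10)
identical(piped, nested)

# x %>% f(y) %>% g(z) is g(f(x, y), z)
piped2  <- toy %>% filter(dep_delay > 10) %>% arrange(desc(dep_delay))
nested2 <- arrange(filter(toy, dep_delay > 10), desc(dep_delay))
identical(piped2, nested2)

# The dot sends the left-hand side to a named argument: rep("a", times = 3)
dot_example <- 3 %>% rep("a", times = .)
```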
<p>Working with the pipe is one of the key criteria for belonging to the <code>tidyverse</code>. The only exception is <code>ggplot2</code>: it was written before the pipe was discovered. Unfortunately, the next iteration of <code>ggplot2</code>, <code>ggvis</code>, which does use the pipe, isn’t quite ready for prime time yet.</p>
</div>
......