---
title: Unix Streams and pipes
author: "Laurent Modolo"
---
```{r include = FALSE}
if (!require("fontawesome")) {
  install.packages("fontawesome")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>

Objective: Understand the function of streams and pipes in Unix systems

When you read a file, you start at the top and read from left to right: you read a flux of information which stops at the end of the file.

Unix streams work much the same way: instead of opening a file as one big block of data, a process can read it as a flux. There are 3 standard Unix streams:

0. **stdin** the **st**an**d**ard **in**put
1. **stdout** the **st**an**d**ard **out**put
2. **stderr** the **st**an**d**ard **err**or

Historically, **stdin** was the card reader or the keyboard, while the two others were the card puncher or the display.

The command `cat` simply reads from **stdin** and displays the results on **stdout**:

```sh
cat
I can talk with
myself
```

It can also read files and display the results on **stdout**:

```sh
cat .bashrc
```



## Streams manipulation
You can use the `>` character to redirect a flux toward a file. The following command makes a copy of your `.bashrc` file:

```sh
cat .bashrc > my_bashrc
```

Check the results of your command with `less`.

Following the same principle, create a `my_cal` file containing the **cal**endar of this month. Check the results with the command `less`.

Reuse the same command with the unnamed option `1999`. Check the results with the command `less`. What happened?

Try the following command

```sh
cal -N 2 > my_cal
```

What is the content of `my_cal`? What happened?

The `>` operator can take a stream number as a prefix: the syntax to redirect **stdout** to a file is `1>`, which is also the default (equivalent to `>`). Here the `-N` option doesn't exist, so `cal` throws an error. Errors are sent to **stderr**, which has the number 2, so a plain `>` does not capture them.
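
The numbered form of `>` can be sketched with a command that writes to both streams at once. The example below is only an illustration: the scratch directory created with `mktemp -d` and the file names `exists.txt`, `missing.txt`, `out.log` and `err.log` are hypothetical, and `ls` is used because it prints a listing on **stdout** and a complaint on **stderr** when one of its arguments is missing.

```sh
# Hypothetical scratch directory and file names, used only for this sketch
workdir=$(mktemp -d)
touch "$workdir/exists.txt"

# 1> captures stdout (the listing), 2> captures stderr (the error message).
# ls exits with a failure status because one file is missing, hence the || echo.
ls "$workdir/exists.txt" "$workdir/missing.txt" \
  1> "$workdir/out.log" 2> "$workdir/err.log" || echo "ls reported a failure"

cat "$workdir/out.log"   # contains only the path of exists.txt
cat "$workdir/err.log"   # contains only the complaint about missing.txt
```

Each stream ends up in its own file, which is handy when you want to keep results and diagnostics apart.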

Save the error message in `my_cal` and check the results with `less`.

We have seen that `>` overwrites the content of the file. Try the following commands:

```sh
cal 2020 > my_cal
cal >> my_cal
cal -N 2 2>> my_cal
```

Check the results with the command `less`.
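
The difference between `>` and `>>` can also be seen on a tiny file. This is a minimal sketch: the scratch directory and the file name `notes.txt` are hypothetical, and `echo` is used simply because it prints its arguments on **stdout**.

```sh
# Hypothetical scratch directory and file name, used only for this sketch
workdir=$(mktemp -d)

echo "first"  >  "$workdir/notes.txt"   # creates the file
echo "second" >  "$workdir/notes.txt"   # overwrites it: "first" is gone
echo "third"  >> "$workdir/notes.txt"   # appends after "second"

cat "$workdir/notes.txt"   # prints "second" then "third"
```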

The `>` operator sends the stream from the left to the file on the right. Try the following:

```sh
cat < my_cal
```

What is the function of the command `<`?

You can use several redirections on the same process. Try the following command:

```sh
cat <<EOF > my_notes
```

Type some text and type `EOF` on a new line. `EOF` stands for **e**nd **o**f **f**ile; it's a conventional sequence used to indicate the start and the end of a file in a stream.

What happened? Can you check the content of `my_notes`? How would you modify this command to add new notes?
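
Since `EOF` is only a convention, any word can play the role of the delimiter after `<<`. The sketch below uses a hypothetical delimiter `END_OF_NOTES` and a hypothetical file `shopping_list` in a scratch directory; inside the text, `EOF` is then treated as ordinary content.

```sh
# Hypothetical scratch directory and file name, used only for this sketch
workdir=$(mktemp -d)

# Everything up to the line containing only END_OF_NOTES is fed to cat on stdin
cat <<END_OF_NOTES > "$workdir/shopping_list"
milk
EOF is just ordinary text here
eggs
END_OF_NOTES

cat "$workdir/shopping_list"   # prints the three lines between the delimiters
```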

Finally, you can redirect a stream toward another stream with the following syntax:

```sh
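# Send stdout to the file, then send stderr (2) to wherever stdout (1) now points
cal -N 2 > my_redirection 2>&1
# Same idea, appending to the file instead of overwriting it
cal >> my_redirection 2>&1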
```
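
A widely used form of stream-to-stream redirection is `2>&1`, which points **stderr** at wherever **stdout** currently goes. The following sketch assumes a hypothetical helper function `talk` that writes one line on each stream, and a hypothetical file `both.log` in a scratch directory.

```sh
# Hypothetical scratch directory, used only for this sketch
workdir=$(mktemp -d)

# Hypothetical helper: one line on stdout, one line on stderr
talk() {
  echo "regular output"
  echo "error message" >&2   # >&2 sends this line to stderr
}

# stdout goes to the file first, then stderr is pointed at the same place
talk >  "$workdir/both.log" 2>&1
# Same thing, appending to the file
talk >> "$workdir/both.log" 2>&1

cat "$workdir/both.log"   # four lines: both messages, twice
```

The order matters: `2>&1` copies the destination that **stdout** has at that moment, so it is written after the redirection of **stdout** to the file.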



## Pipes

The last stream manipulation that we are going to see is the pipe, which transforms the **stdout** of a process into the **stdin** of the next. Pipes are useful to chain multiple simple operations. The pipe operator is `|`:

```sh
cal 2020 | less
```

What is the difference with this command?

```sh
cal 2020 | cat | cat | less
```
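
A pipe only carries **stdout**; anything written on **stderr** bypasses it and still reaches the terminal. The sketch below illustrates this with `ls` on one existing and one missing file; the scratch directory and the file names `exists.txt`, `missing.txt` and `piped.log` are hypothetical.

```sh
# Hypothetical scratch directory and file names, used only for this sketch
workdir=$(mktemp -d)
touch "$workdir/exists.txt"

# The listing (stdout) travels through the pipe into cat and lands in piped.log.
# The complaint about missing.txt (stderr) is printed on the terminal instead.
ls "$workdir/exists.txt" "$workdir/missing.txt" | cat > "$workdir/piped.log"

cat "$workdir/piped.log"   # contains only the path of exists.txt
```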



The command `zcat` has the same function as the command `cat` but for compressed files in [`gzip` format](https://en.wikipedia.org/wiki/Gzip).

The command `wget` downloads files from a URL to the corresponding file. Don't run the following command, which would download the human genome:

```sh
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
```

We are going to use the `-q` switch, which silences `wget` (no download progress bar or such), and the option `-O`, which allows us to set the name of the output file. By convention, many Unix commands accept `-` as the output file name to mean "write the output on the **stdout** stream".

Analyze the following command. What would it do?

```sh
wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz | gzip -dc | less
```

Remember that most Unix commands process input and output line by line, which means that you can process huge datasets without intermediate files or huge RAM capacity.
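
This streaming behaviour can be tried locally without downloading anything. The sketch below is an assumption-laden stand-in for the genome example: `seq` generates a million numbered lines, `gzip -c` compresses them on the fly, `gzip -dc` decompresses them, and `head -n 3` keeps only the first three lines. `seq` and `head` are not otherwise used in this session, and the scratch directory and the file name `first_lines.txt` are hypothetical.

```sh
# Hypothetical scratch directory and file name, used only for this sketch
workdir=$(mktemp -d)

# No intermediate file holds the million lines: each process reads its stdin
# and writes its stdout as the data flows through the pipes.
seq 1 1000000 | gzip -c | gzip -dc | head -n 3 > "$workdir/first_lines.txt"

cat "$workdir/first_lines.txt"   # prints 1, 2 and 3 on separate lines
```

Once `head` has read its three lines it exits, and the processes upstream stop shortly after instead of working through the whole dataset.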

> We have used the following commands:
>
> - `cat`/ `zcat` to display information in **stdout**
> - `>` / `>>` / `<` / `<<` to redirect a flux
> - `|` the pipe operator to connect processes
> - `wget` to download files

[You can head to the next session to apply pipe and stream manipulation.](./8_text_manipulation.html)