---
title: Unix Streams and pipes
author: "Laurent Modolo"
---

```{r include = FALSE}
if (!require("fontawesome")) {
  install.packages("fontawesome")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
```

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>

Objective: Understand the function of streams and pipes in Unix systems

When you read a file, you start at the top and go from left to right: you read a flow of information which stops at the end of the file. Unix streams work in much the same way: instead of opening a file as a whole block of data, a process can read it as a flow.

There are 3 standard Unix streams:

0. **stdin** the **st**an**d**ard **in**put
1. **stdout** the **st**an**d**ard **out**put
2. **stderr** the **st**an**d**ard **err**or

Historically, **stdin** has been the card reader or the keyboard, while the two others were the card puncher or the display.

The command `cat` simply reads from **stdin** and displays the result on **stdout**

```sh
cat
I can talk with myself
```

It can also read files and display the result on **stdout**

```sh
cat .bashrc
```

## Streams manipulation

You can use the `>` character to redirect a stream toward a file. The following command makes a copy of your `.bashrc` file.

```sh
cat .bashrc > my_bashrc
```

Check the results of your command with `less`.

Following the same principle, create a `my_cal` file containing the **cal**endar of this month. Check the results with the command `less`.

Reuse the same command with the unnamed option `1999`. Check the results with the command `less`. What happened?

Try the following command

```sh
cal -N 2 > my_cal
```

What is the content of `my_cal`? What happened?

The `>` operator can take a stream number: the syntax to redirect **stdout** to a file is `1>`, which is also the default (so `1>` is equivalent to `>`).
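
As a minimal sketch of the numbered form, the two redirections below should produce identical files. The file names and the `demo_dir` and `same` variables are made up for illustration, and `mktemp` and `cmp` are assumed to be available on your system.

```sh
# Work in a throwaway directory so that no existing file is overwritten.
demo_dir=$(mktemp -d)

# Both commands redirect stdout (stream number 1) to a file.
echo "hello streams" > "$demo_dir/implicit.txt"
echo "hello streams" 1> "$demo_dir/explicit.txt"

# cmp -s prints nothing and succeeds only when the two files are identical.
if cmp -s "$demo_dir/implicit.txt" "$demo_dir/explicit.txt"; then
  same="yes"
else
  same="no"
fi
echo "identical: $same"
```
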
In the `cal -N 2 > my_cal` command, the `-N` option doesn't exist, so `cal` throws an error. Errors are sent to **stderr**, which has the number 2.

Save the error message in `my_cal` and check the results with `less`.

We have seen that `>` overwrites the content of the file. Try the following commands:

```sh
cal 2020 > my_cal
cal >> my_cal
cal -N 2 2>> my_cal
```

Check the results with the command `less`.

The command `>` sends the stream from the left to the file on the right. Try the following:

```sh
cat < my_cal
```

What is the function of the command `<`?

You can use different redirections on the same process. Try the following command:

```sh
cat <<EOF > my_notes
```

Type some text and type `EOF` on a new line. `EOF` stands for **e**nd **o**f **f**ile; it's a conventional sequence used to mark the start and the end of a file in a stream.

What happened? Can you check the content of `my_notes`?

How would you modify this command to add new notes?

Finally, you can redirect a stream toward another stream. The syntax `2>&1` sends **stderr** to wherever **stdout** currently points, so it must come after the redirection of **stdout**:

```sh
cal -N 2 > my_redirection 2>&1
cal >> my_redirection 2>&1
```

## Pipes

The last stream manipulation that we are going to see is the pipe, which transforms the **stdout** of a process into the **stdin** of the next. Pipes are useful to chain multiple simple operations. The pipe operator is `|`

```sh
cal 2020 | less
```

What is the difference with this command?

```sh
cal 2020 | cat | cat | less
```

The command `zcat` has the same function as the command `cat` but for compressed files in [`gzip` format](https://en.wikipedia.org/wiki/Gzip).

The command `wget` downloads a file from a URL and saves it under the corresponding file name. Don't run the following command, which would download the human genome:

```sh
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
```

We are going to use the `-q` switch, which silences `wget` (no download progress bar or such), and the option `-O`, which allows us to set the name of the output file.
In Unix, setting the output file to `-` conventionally lets you write the output to the **stdout** stream.

Analyze the following command. What would it do?

```sh
wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz | gzip -dc | less
```

Remember that most Unix commands process input and output line by line, which means that you can process huge datasets without intermediate files or huge RAM capacity.

> We have used the following commands:
>
> - `cat` / `zcat` to display information in **stdout**
> - `>` / `>>` / `<` / `<<` to redirect a stream
> - `|` the pipe operator to connect processes
> - `wget` to download files

[You can head to the next session to apply pipe and stream manipulation.](./8_text_manipulation.html)