---
title: Unix Streams and pipes



---

# Streams and pipes

[![cc_by_sa](/Users/laurent/Documents/formations/2020_08_UNIX/img/cc_by_sa.png)](http://creativecommons.org/licenses/by-sa/4.0/)

Objective: understand the function of streams and pipes in Unix systems

When you read a file, you start at the top and read from left to right: you read a flow of information that stops at the end of the file.

Unix streams work much the same way: instead of opening a file as one big block of data, a process can handle it as a flow. There are three standard Unix streams, each identified by a number:

0. **stdin**, the **st**an**d**ard **in**put
1. **stdout**, the **st**an**d**ard **out**put
2. **stderr**, the **st**an**d**ard **err**or

Historically, **stdin** has been the card reader or the keyboard, while the two others were the card puncher or the display.

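The command `cat` simply reads from **stdin** and displays the result on **stdout**. Type a few lines, then press `Ctrl+D` at the start of a new line to close **stdin** and stop `cat`: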

```sh
cat
I can talk with
myself
```

It can also read files and display their content on **stdout**:

```sh
cat .bashrc
```



## Streams manipulation

You can use the `>` character to redirect a stream to a file. The following command makes a copy of your `.bashrc` file:

```sh
cat .bashrc > my_bashrc
```

Check the results of your command with `less`.

Following the same principle, create a `my_cal` file containing the **cal**endar of the current month. Check the result with the command `less`.

Reuse the same command with the positional argument `1999`. Check the result with the command `less`. What happened?

Try the following command:

```sh
cal -N 2 > my_cal
```

What is the content of `my_cal`? What happened?

The `>` operator can be prefixed with a stream number: the syntax to redirect **stdout** to a file is `1>`, which is also the default (equivalent to `>`). Here the `-N` option doesn't exist, so `cal` throws an error. Errors are sent to **stderr**, which has the number 2.
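To see the two output streams separately without relying on `cal`, here is a minimal sketch with a hypothetical function `speak` (not a real command) that writes one line to **stdout** and one line to **stderr** (inside the function, `>&2` sends the second `echo` to stream 2). The `1>` and `2>` redirections then split them into two files, whose names are only examples:

```sh
# hypothetical helper: one line on stdout (1), one line on stderr (2)
speak() {
  echo "this goes to stdout"
  echo "this goes to stderr" >&2
}

# each stream lands in its own file
speak 1> out.txt 2> err.txt
cat out.txt    # this goes to stdout
cat err.txt    # this goes to stderr
```

Running `speak > out.txt` alone would still print the **stderr** line on the terminal, because `>` only redirects stream 1.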

Save the error message in `my_cal` and check the results with `less`.

We have seen that `>` overwrites the content of the file. Try the following commands:

```sh
cal 2020 > my_cal
cal >> my_cal
cal -N 2 2>> my_cal
```

Check the results with the command `less`.

The `>` operator sends the stream from the command on the left to the file on the right. Try the following:

```sh
cat < my_cal
```

What is the function of the `<` operator?

You can use different redirection on the same process. Try the following command:

```sh
cat <<EOF > my_notes
```

Type some text, then type `EOF` alone on a new line. `EOF` stands for **e**nd **o**f **f**ile; it is a conventional word used to mark the start (`<<EOF`) and the end (a line containing only `EOF`) of the text sent to the command.

What happened? Can you check the content of `my_notes`? How would you modify this command to add new notes?
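The same `<<` redirection (called a *here-document*) also works inside a script, where the lines to send are written directly after the command. In this minimal sketch the delimiter is `END` instead of `EOF`, to show that the word itself is only a convention; the file name `greeting.txt` is just an example:

```sh
# everything between "<<END" and the line containing only "END"
# is sent to the stdin of cat, whose stdout goes to greeting.txt
cat <<END > greeting.txt
Hello
from a here-document
END

cat greeting.txt
```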

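Finally, you can redirect a stream toward another stream with the `n>&m` syntax. For instance, `2>&1` sends **stderr** (2) to wherever **stdout** (1) currently points:

```sh
# send stdout to the file, then send stderr to the same place as stdout
cal -N 2 > my_redirection 2>&1
# same, but appending instead of overwriting
cal >> my_redirection 2>&1
```

In `bash`, `&> file` and `&>> file` are shorthands for `> file 2>&1` and `>> file 2>&1`.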
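With `2>&1`, stream 2 is made to point wherever stream 1 points at that moment, and redirections are applied from left to right, so their order matters. This sketch uses a hypothetical `speak` function (not a real command) that writes one line to each output stream; the file names are only examples:

```sh
# hypothetical helper: one line on stdout (1), one line on stderr (2)
speak() {
  echo "to stdout"
  echo "to stderr" >&2
}

# stdout goes to the file first, then stderr follows it: both lines end up in both.txt
speak > both.txt 2>&1

# stderr is pointed at the current stdout (the terminal) before stdout moves:
# only "to stdout" ends up in only_out.txt, "to stderr" is still printed on screen
speak 2>&1 > only_out.txt
```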



## Pipes

The last stream manipulation that we are going to see is the pipe, which connects the **stdout** of a process to the **stdin** of the next one. Pipes are useful to chain multiple simple operations. The pipe operator is `|`:

```sh
cal 2020 | less
```

What is the difference with this command?

```sh
cal 2020 | cat | cat | less
```
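A pipe only connects **stdout** to the next **stdin**; **stderr** bypasses it. In this sketch the hypothetical `speak` function (not a real command) writes one line to each stream, and `wc -l` counts the lines it receives through the pipe. Adding `2>&1` before the pipe sends **stderr** to wherever **stdout** points, so both lines go through:

```sh
# hypothetical helper: one line on stdout (1), one line on stderr (2)
speak() {
  echo "to stdout"
  echo "to stderr" >&2
}

speak | wc -l         # counts 1 line; "to stderr" is printed directly on the terminal
speak 2>&1 | wc -l    # counts 2 lines: stderr was merged into stdout before the pipe
```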



The command `zcat` has the same function as the command `cat` but for compressed files in [`gzip` format](https://en.wikipedia.org/wiki/Gzip).
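To try `zcat` without downloading anything, you can build a small compressed file yourself. This is a minimal sketch; the file name `numbers.txt` is only an example. The last line uses `gzip -dc` (decompress to **stdout**), which the GNU `gzip` documentation describes as equivalent to `zcat`:

```sh
# create a small text file, then compress it (gzip replaces it with numbers.txt.gz)
printf 'one\ntwo\nthree\n' > numbers.txt
gzip -f numbers.txt

# display the compressed file on stdout without decompressing it on disk
zcat numbers.txt.gz
# equivalent spelling: decompress (-d) to stdout (-c)
gzip -dc numbers.txt.gz
```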

The command `wget` downloads files from a URL and saves them under the corresponding file name. Don't run the following command, which would download the human genome:

```sh
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
```

We are going to use the `-q` switch, which silences `wget` (no download progress bar or other messages), and the `-O` option, which allows us to set the name of the output file. By Unix convention, setting the output file to `-` writes the output to the **stdout** stream.

Analyze the following command: what would it do?
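Many commands follow the same `-` convention for input: as an input file name, `-` means **stdin**. For example, `cat` accepts `-` among its file arguments. In this minimal sketch the file names are only examples:

```sh
printf 'header\n' > header.txt
printf 'footer\n' > footer.txt

# "-" is replaced by whatever arrives on stdin, here the output of printf
printf 'body\n' | cat header.txt - footer.txt > page.txt
cat page.txt    # header, body and footer on three lines
```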

```sh
wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz | gzip -dc | less
```

Remember that most Unix commands process their input and output line by line, which means that you can process huge datasets without intermediate files or a large amount of RAM.
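As an illustration of this streaming behaviour, the sketch below pushes a million lines through compression and decompression without writing any intermediate file. `seq` (print a sequence of numbers) and `wc -l` (count lines) are standard commands not otherwise covered in this session:

```sh
# generate 1,000,000 lines, compress them, decompress them, then count them:
# each command works on the stream as it arrives, nothing is stored on disk
seq 1 1000000 | gzip -c | gzip -dc | wc -l    # counts 1000000 lines
```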

> We have used the following commands:
>
> - `cat` / `zcat` to display information on **stdout**
> - `>` / `>>` / `<` / `<<` to redirect a stream
> - `|` the pipe operator to connect processes
> - `wget` to download files

[You can head to the next session to apply pipe and stream manipulation.](http://perso.ens-lyon.fr/laurent.modolo/unix/8_text_manipulation.html)