7_streams_and_pipes.Rmd

    title: Unix Streams and pipes
    author: "Laurent Modolo"
    output:
      rmdformats::downcute:
        self_contain: true
        use_bookdown: true
        default_style: "light"
        lightbox: true
        css: "./www/style_Rmd.css"
    
    if (!require("fontawesome")) {
      install.packages("fontawesome")
    }
    if (!require("klippy")) {
      install.packages("remotes")
      remotes::install_github("rlesur/klippy")
    }
    library(fontawesome)
    knitr::opts_chunk$set(echo = TRUE)
    knitr::opts_chunk$set(comment = NA)
    klippy::klippy(
      position = c('top', 'right'),
      color = "white",
      tooltip_message = 'Click to copy',
      tooltip_success = 'Copied !')
    

    cc_by_sa

    Objective: Understand the function of streams and pipes in Unix systems

    When you read a file, you start at the top and read from left to right: you follow a flux of information that stops at the end of the file.

    Unix streams work in much the same way: instead of opening a file as one big block of data, a process can handle it as a flux. There are 3 standard Unix streams:

    1. stdin the standard input
    2. stdout the standard output
    3. stderr the standard error

    Historically, stdin was the card reader or the keyboard, while the two others were the card puncher or the display.

    The command cat simply reads from stdin and displays the results on stdout

    cat
    I can talk with
    myself

    It can also read files and display the results on stdout

    cat .bashrc

    Streams manipulation

    You can use the > character to redirect a flux toward a file. The following command makes a copy of your .bashrc file.

    cat .bashrc > my_bashrc

    Check the results of your command with less.

    Following the same principle, create a my_cal file containing the calendar of this month. Check the results with the command less.

    Reuse the same command with the unnamed argument 1999. Check the results with the command less. What happened?

    Try the following command:

    cal -N 2 > my_cal

    What is the content of my_cal? What happened?

    The > operator can take a number as a prefix: the syntax to redirect stdout to a file is 1>, which is also the default (equivalent to >). Here the -N option doesn't exist, so cal throws an error. Errors are sent to stderr, which has the number 2.

    Save the error message in my_cal and check the results with less.
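The numbered prefixes can be seen side by side in a minimal sketch. It uses ls instead of cal, and the file names hello.txt, no_such_file, out.txt and err.txt are made up for the illustration. The exact wording of the error message depends on your version of ls.

```shell
# Work in a throwaway directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > hello.txt

# ls prints the name of the existing file on stdout (1) and complains
# about the missing one on stderr (2). Each stream goes to its own file.
# '|| true' only keeps a strict script running: ls exits with an error here.
ls hello.txt no_such_file 1> out.txt 2> err.txt || true

cat out.txt   # the normal output: hello.txt
cat err.txt   # the error message about no_such_file
```

Nothing is printed by the ls command itself, because both of its streams were redirected.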

    We have seen that > overwrites the content of the file. Try the following commands:

    cal 2020 > my_cal
    cal >> my_cal
    cal -N 2 2>> my_cal

    Check the results with the command less.
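The difference between > and >> can also be checked on a small file. This is a sketch with echo rather than cal, so that the content is short enough to read at a glance. The file name my_log is hypothetical.

```shell
workdir=$(mktemp -d)
log="$workdir/my_log"

echo "first"  > "$log"    # creates the file and writes one line
echo "second" > "$log"    # overwrites: "first" is lost
echo "third" >> "$log"    # appends: the file now holds two lines

cat "$log"
# prints:
# second
# third
```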

    The > operator sends the stream from the left to the file on the right. Try the following:

    cat < my_cal

    What is the function of the < operator?

    You can use different redirections on the same process. Try the following command:

    cat <<EOF > my_notes

    Type some text, then type EOF on a new line. EOF stands for end of file; it is a conventional sequence used to mark the start and the end of a file in a stream.

    What happened? Can you check the content of my_notes? How would you modify this command to add new notes?
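The same construct also works inside a script, where the text is written in the script itself instead of being typed at the keyboard. The sketch below assumes a hypothetical delimiter END_OF_NOTES to show that EOF is only a convention: any word works, as long as the same word closes the block on a line of its own.

```shell
workdir=$(mktemp -d)

# Everything between the two END_OF_NOTES lines is sent to cat on stdin,
# and the stdout of cat is redirected to the file.
cat <<END_OF_NOTES > "$workdir/my_notes"
buy coffee
read the bash manual
END_OF_NOTES

cat "$workdir/my_notes"
# prints:
# buy coffee
# read the bash manual
```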

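    Finally, you can redirect a stream toward another stream. The syntax 2>&1 sends stderr to wherever stdout currently points:

    cal -N 2 > my_redirection 2>&1
    cal >> my_redirection 2>&1

    In bash, &> and &>> are shorthands for these two forms.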
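A common way to merge stderr into stdout is the 2>&1 form, sketched here with ls so that one command produces both a normal line and an error line. The file names are made up for the example. Order matters: redirections are read from left to right, so 2>&1 is placed after the redirection of stdout.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > hello.txt

# First stdout is pointed at both.txt, then stderr is pointed at the
# same place as stdout. '|| true' only hides the error status of ls.
ls hello.txt no_such_file > both.txt 2>&1 || true

# both.txt now holds the name hello.txt and the error about no_such_file.
cat both.txt
```

Written the other way round (2>&1 > both.txt), stderr would still go to the terminal, because it is attached to stdout before stdout is moved to the file.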

    Pipes

    The last stream manipulation that we are going to see is the pipe, which connects the stdout of a process to the stdin of the next. Pipes are useful to chain multiple simple operations. The pipe operator is |

    cal 2020 | less

    What is the difference with this command?

    cal 2020 | cat | cat | less
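Pipes become more useful when each step does a different job. In the sketch below, printf, sort, head and wc are not covered in this session; they are assumed here only as small stand-ins for "produce lines", "order them", "keep the first one" and "count them".

```shell
# printf writes three lines on stdout, sort reads them on stdin and
# writes them back in alphabetical order, head keeps the first one.
printf 'pear\napple\nfig\n' | sort | head -n 1
# prints: apple

# The same input can be sent to another command: wc -l counts the lines
# it receives on stdin.
printf 'pear\napple\nfig\n' | wc -l
```

No intermediate file is created at any point: each command starts working as soon as the previous one produces output.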

    The command zcat has the same function as the command cat but for compressed files in gzip format.

    The command wget downloads files from a URL to the corresponding file. Don't run the following command, which would download the human genome:

    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

    We are going to use the -q switch, which silences wget (no download progress bar or such), and the -O option, which allows us to set the name of the output file. In Unix, setting the output file to - usually allows you to write the output on the stdout stream.

    Analyze the following command. What would it do?

    wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz | gzip -dc | less

    Remember that most Unix commands process input and output line by line, which means that you can process huge datasets without intermediate files or huge RAM capacity.

    We have used the following commands:
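The same idea can be tried safely without any download. In this sketch, a tiny gzip file named sample.txt.gz (a made-up name) stands in for the large remote file, and head, which is not covered in this session, stands in for less so that the example does not wait for the keyboard.

```shell
workdir=$(mktemp -d)

# Build a small compressed file: the text is compressed on the fly and
# only the .gz file is written to disk.
printf 'line 1\nline 2\nline 3\n' | gzip -c > "$workdir/sample.txt.gz"

# Decompress to stdout and keep only the first two lines.
# The uncompressed text is never stored in a file.
gzip -dc "$workdir/sample.txt.gz" | head -n 2
# prints:
# line 1
# line 2
```

With a file of several gigabytes the pipeline would look the same; only the first stage (here gzip -dc on a local file) would change.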

    • cat / zcat to display information on stdout
    • > / >> / < / << to redirect a flux
    • | the pipe operator to connect processes
    • wget to download files

    You can head to the next session to apply pipe and stream manipulation.