*.html
.DS_Store
.Rproj.user
/.quarto/
/_book/
*_cache/
# This file is a template, and might need editing before it works on your project.
# Full project: https://gitlab.com/pages/plain-html
pages:
  stage: deploy
  image: ghcr.io/quarto-dev/quarto
  script:
    - quarto -v
    - |
      quarto render
      mkdir public
      cp -r _book/* public/
  interruptible: true
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
---
title: SSH
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contained: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Learn the basics of SSH connections in GNU/Linux
In the previous section, we have seen how to run scripts and complex commands on your computer. In this session, we are going to learn to do that over the network.
Most of the content from this session is from [wikipedia.org](https://wikipedia.org)
## Network
First, before talking about how to communicate over a network, we need to define what a network is in computer science. We can distinguish between two types of networks: **circuit switching** networks and **packet switching** networks.
### Circuit switching
Circuit switching is the historical telephone network architecture. When device A wants to communicate with device B, it has to establish a connection over the network. In a circuit switching network, the connections between a chain of nodes (hopefully the shortest chain) are established and fixed. Device A connects to the closest node and asks for a connection to device B; this node does the same thing with the node closest to device B, and so on until the connection reaches device B.
If you try to call someone who is already in a phone conversation, the line will be occupied.
![http://www.tcpipguide.com/free/diagrams/funcircuitswitching.png](./img/funcircuitswitching.png)
### Packet switching
Packet switching is a method of grouping data sent over the network into packets. Each packet has a header and a payload. The header can be read by each node to direct the packet to its destination; it also informs the receiving host of the packet order. The payload contains the data that we want to transmit over the network. In packet switching, the network bandwidth is not pre-allocated as in circuit switching. Each packet is called a datagram:
> “A self-contained, independent entity of data carrying sufficient information to be routed from the source to the destination computer without reliance on earlier exchanges between this source and destination computer and the transporting network.”
![https://en.wikipedia.org/wiki/Packet_switching#/media/File:Packet_Switching.gif](./img/packet_switching.gif)
In a packet switching network, when you send a stream of data (video, sound, etc.), you have the illusion of continuity, just like with process switching handled by the scheduler.
## **Internet Protocol** (IP)
> The **Internet Protocol** (**IP**) is the principal [communications protocol](https://en.wikipedia.org/wiki/Communications_protocol) in the [Internet protocol suite](https://en.wikipedia.org/wiki/Internet_protocol_suite) for relaying [datagrams](https://en.wikipedia.org/wiki/Datagram) across network boundaries. Its [routing](https://en.wikipedia.org/wiki/Routing) function enables [internetworking](https://en.wikipedia.org/wiki/Internetworking), and essentially establishes the [Internet](https://en.wikipedia.org/wiki/Internet).
The first major version of IP, [Internet Protocol Version 4](https://en.wikipedia.org/wiki/IPv4) (IPv4), is the dominant protocol of the Internet. Its successor is [Internet Protocol Version 6](https://en.wikipedia.org/wiki/IPv6) (IPv6), which has been in increasing [deployment](https://en.wikipedia.org/wiki/IPv6_deployment) on the public Internet since c. 2006.
### IPv4
An **IPv4** address is composed of 4 numbers ranging from 0 to 255 separated by `.`, which gives an address space of 4,294,967,296 (2^32) addresses. Some combinations of **IPv4** addresses are restricted:

| Address block | Address range | Number of addresses | Scope | Description |
| ------------- | ------------- | ------------------- | ----- | ----------- |
| 240.0.0.0/4 | 240.0.0.0–255.255.255.254 | 268435455 | Internet | Reserved for future use.[[15\]](https://en.wikipedia.org/wiki/IPv4#cite_note-rfc3232-15) (Former Class E network). |
| 255.255.255.255/32 | 255.255.255.255 | 1 | Subnet | Reserved for the "limited [broadcast](https://en.wikipedia.org/wiki/Broadcast_address)" destination address.[[6\]](https://en.wikipedia.org/wiki/IPv4#cite_note-rfc6890-6)[[16\]](https://en.wikipedia.org/wiki/IPv4#cite_note-rfc919-16) |
### IPv6
An **IPv6** address is composed of 8 groups of 4 hexadecimal digits separated by `:`.
The digits are in hexadecimal format (base 16, ranging from 0 to 9 and A to F).
Compared to **IPv4**, **IPv6** allows for 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses (approximately 3.4×10^38).
An example of such an address is: **2001:0db8:0000:0000:0000:ff00:0042:8329**
To display your VM IP addresses, you can use the following command: `ip address show`.
Local **IPv6** addresses start with **fe80::**
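As a quick illustration, the protocol version can be selected directly (a small sketch assuming the `ip` tool from iproute2, which ships with most modern distributions):

```sh
# Show only the IPv4 addresses of your interfaces
ip -4 address show
# Show only the IPv6 addresses; link-local ones start with fe80::
ip -6 address show
```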
### **Domain Name System** (**DNS**)
Instead of using IP addresses in your everyday life, you usually use domain names. The DNS is composed of many DNS servers that are hierarchically organized and decentralized. By querying the DNS with a particular domain name, the correct name server will return the corresponding IP address. Most network tools accept either domain names (URLs) or IP addresses.
![dns resolver](./img/dns_resolver.svg)
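To see name resolution in action, you can query a name yourself (a small sketch; `nslookup` is available on most distributions and `getent` goes through the system resolver, which also checks `/etc/hosts`):

```sh
# Ask the DNS which IP addresses stand behind a domain name
nslookup wikipedia.org
# Resolve the name through the system resolver
getent hosts wikipedia.org
```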
### Transmission Control Protocol (**TCP**)
The **Transmission Control Protocol** (**TCP**) is one of the main [protocols](https://en.wikipedia.org/wiki/Communications_protocol) of the [Internet protocol suite](https://en.wikipedia.org/wiki/Internet_protocol_suite). TCP provides reliable, ordered, and error-checked delivery of a stream of data between applications running on hosts communicating over an IP network:
- data arrives in-order
- data has minimal error (i.e., correctness)
- duplicate data is discarded
- lost or discarded packets are resent
- includes traffic congestion control
- Heavyweight (lots of checks)
### **User Datagram Protocol** (**UDP**)
UDP uses a simple [connectionless communication](https://en.wikipedia.org/wiki/Connectionless_communication) model with a minimum of protocol mechanisms.
- Multicast (a single datagram packet can be automatically routed without duplication to a group of subscribers)
- Lightweight (no ordering of messages, no tracking connections, etc. It is a very simple transport layer designed on top of IP)
### Port
Higher-level communication protocols like TCP and UDP also define **ports**. A **port** is a communication endpoint. When software wants to communicate over TCP or UDP, it does so using a specific **port**. Each system has **port** numbers ranging from **0** to **65535**. **Ports** numbered from **0** through **1023** are system **ports** used by well-known processes (you need specific rights to use them).
Here is a list of notable port numbers:
| Number | Assignment |
| ------ | ------------------------------------------------------------ |
| 21 | [File Transfer Protocol](https://en.wikipedia.org/wiki/File_Transfer_Protocol) (FTP) Command Control |
| 22 | [Secure Shell](https://en.wikipedia.org/wiki/Secure_Shell) (SSH) Secure Login |
| 23 | [Telnet](https://en.wikipedia.org/wiki/Telnet) remote login service, unencrypted text messages |
| 25 | [Simple Mail Transfer Protocol](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol) (SMTP) e-mail routing |
| 53 | [Domain Name System](https://en.wikipedia.org/wiki/Domain_Name_System) (DNS) service |
| 67, 68 | [Dynamic Host Configuration Protocol](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol) (DHCP) |
| 80 | [Hypertext Transfer Protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) (HTTP) used in the [World Wide Web](https://en.wikipedia.org/wiki/World_Wide_Web) |
| 194 | [Internet Relay Chat](https://en.wikipedia.org/wiki/Internet_Relay_Chat) (IRC) |
| 443 | [HTTP Secure](https://en.wikipedia.org/wiki/HTTP_Secure) (HTTPS) HTTP over TLS/SSL |
Nowadays, **ports** provide multiplexing, so multiple services or communication sessions can use the same **port** number.
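To relate ports to the services running on your own machine, you can list the listening sockets (a quick sketch; `ss` comes with iproute2, `netstat` is an older alternative):

```sh
# -t TCP, -u UDP, -l listening sockets only, -n numeric port numbers
ss -tuln
```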
## SSH
There are numerous other protocols ([RTP](https://en.wikipedia.org/wiki/Real-time_Transport_Protocol) for example). But most of them run over the **TCP** and **UDP** protocols. **SSH** or **Secure Shell** is one of them. **SSH** is a [cryptographic](https://en.wikipedia.org/wiki/Cryptography) [network protocol](https://en.wikipedia.org/wiki/Network_protocol) for operating network services [securely over an unsecured network](https://noratrieb.dev/blog/posts/ssh-security/).
**SSH** uses a client-server architecture: you use an **SSH client** to connect to an **SSH server**. By default, most Linux distributions don't come with an **SSH server** installed. For the IFB, **SSH** is the default way to connect to your VMs, so you should have an **SSH** server up and running.
Find the name of the **SSH** server process
<details><summary>Solution</summary>
<p>
`ps -el | grep "ssh"`
</p>
</details>
SSH uses [public-key cryptography (or asymmetric cryptography)](https://en.wikipedia.org/wiki/Public-key_cryptography) to secure its communications.
### Public-key cryptography
[Public-key cryptography (or asymmetric cryptography)](https://en.wikipedia.org/wiki/Public-key_cryptography) is a cryptographic system which uses pairs of [keys](https://en.wikipedia.org/wiki/Cryptographic_key): *public keys* (which may be known to others) and *private keys* (which must never be known by anyone except the owner).
A cryptographic algorithm is used to generate a pair of *public* and *private* keys from a large random number. Then, the three following schemes can be used to secure communication:
### Communicate with the server
The server sends its public key to the client on the first connection.
![public_key_encryption](./img/public_key_encryption.png)
### Share a secret
Can be used to share public keys (see [Diffie-Hellman](https://fr.wikipedia.org/wiki/%C3%89change_de_cl%C3%A9s_Diffie-Hellman)).
![public_key_shared_secret](./img/public_key_shared_secret.png)
### Authentication
- The server sends a random string of characters to the client
- The client encrypts the random string with its private key and sends it back to the server
- The server decrypts the message with the client's public key and compares it to the random string
![private_key_signing](./img/private_key_signing.png)
## SSH Server
By default, on the IFB, password authentication is disabled to enforce the use of public key-based authentication. To learn `ssh` commands, we are going to enable this option on your VMs. Find the `sshd` configuration file and open it with the editor of your choice.
<details><summary>Solution</summary>
<p>
`vim /etc/ssh/sshd_config`
</p>
</details>
This file is owned by **root**, so you need to get **root** access on your account.
<details><summary>Solution</summary>
<p>
`docker run -it --volume /:/root/chroot alpine sh -c "chroot /root/chroot /bin/bash"`
</p>
</details>
Using the `sudo` command, edit the configuration file to:

* add a `#` in front of the line `Include /etc/ssh/sshd_config.d/*.conf` to comment it out
* remove the `#` in front of the line `PasswordAuthentication yes` to uncomment it
* add the following lines at the end of the file:

```
AllowUsers etudiant student
PermitRootLogin no
```
The `sshd` (SSH Daemon) process is launched and managed by `systemd`. You can manage `systemd` services with the `systemctl` command. Try this command without any arguments. You can search for `sshd` by typing `/sshd` and pressing `enter`. You can leave the `systemctl` view by pressing `q`.
`sudo systemctl status sshd`
</p>
</details>
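After editing the configuration, the daemon has to reload it. A short sketch using standard `systemd` and `sshd` commands (this restart step is not shown in the excerpt above, so treat it as an assumption about the workflow):

```sh
# Check that the configuration file is syntactically valid
sudo sshd -t
# Restart the SSH daemon so it picks up the new configuration
sudo systemctl restart sshd
# Verify that it is up and running again
sudo systemctl status sshd
```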
You are going to create an account for another member of the formation to connect to your VM.
```sh
sudo useradd -m -s /bin/bash -g users student
sudo passwd student
```
Give the password and your IP address to another member of your choice. The IP address is the one you used to connect to your VM.
## SSH client
To connect to an SSH server you can use the following command:
```sh
ssh login@IP_adress
```
Use this command to connect to another student's VM.
On the first connection, `ssh` prompts you to accept the public key of the server (its key fingerprint). Thanks to that, if someone later tries to fool you by impersonating the ssh server, they won't be able to do it without the corresponding private key.
You can close the connection by pressing `ctrl` + `d` or with the command `exit`.
Check the content of the `~/.ssh/` folder. Where is the server public key saved?
Congratulations, you are connected to a VM through another VM!
### Key authentication
Every time you want to connect to the ssh server, you have to type your account password; this password is encrypted and sent over the network. Instead, you can use a pair of private and public keys to authenticate yourself.
First, you have to generate a pair of keys with the command:
```sh
ssh-keygen -t ed25519 -C "your.mail@ens-lyon.fr"
```
The option `-t` specifies the algorithm to use, while `-C` specifies a comment associated with the key (generally, the email of the person generating the key). You can check the **man**ual and the internet to compare the different algorithms available.
It is a good practice to name a pair of keys after the name of the server on which you want to use those keys.
You can use the name `/home/etudiant/.ssh/id_ed25519_otherVM`
Then, as an additional security measure, you can restrict the usage of your private key by defining a password. You will need the password and the key file to authenticate yourself.
The generated keys are in the folder `~/.ssh/`
Then you need to make a copy of your public key (`.pub`) on the sshd server:

```sh
ssh-copy-id -i ~/.ssh/id_ed25519_otherVM.pub login@IP_adresse
```
Note that, for security reasons, only you should be able to read and write within your `.ssh` folder (you don't want someone else to tamper with your keys). You can use the command `chmod 600 .ssh/*`.
You can try to log on the server using the key with the following command:
```sh
ssh login@IP_adress -i ~/.ssh/id_ed25519_otherVM
```
`ssh` should ask for your key password instead of the student account password.
Congratulations, you authenticated yourself on a remote server without sending your password over the network!
## SSH based tools
Sometimes, you want to do other things than execute commands on a remote computer, for example, transfer files over the network.
### scp
The `scp` command comes with the `ssh` client installation. You can use it to transfer files from your computer to the ssh server:
```sh
scp local/path login@IP_adress:remote/path
```
> You can use a relative remote path, where the ":" corresponds to your home folder on the remote server.
You can also retrieve files from the server:
```sh
scp login@IP_adress:remote/path local/path
```
To transfer a directory, you can use the `-r` switch.
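For instance, a small sketch that mirrors the placeholder login and paths used above:

```sh
# Recursively copy a whole local directory into your home folder on the remote server
scp -r my_project/ login@IP_adress:
```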
### rsync
`scp` is a basic command for file transfer. If you want an advanced progress bar and file integrity checking, you can use the `rsync` command instead.
For example, `rsync -auv local/path login@IP_adress:remote/path` will only transfer files from `local/path` that are not already present in `remote/path` (`-a` archive mode, `-u` skip files that are newer on the receiver, `-v` verbose). The `-c` switch will compute a checksum of the file locally and remotely to be certain that they are identical.
### sshfs
You can use the `sshfs` command to mount a remote folder over ssh on your computer.
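A minimal sketch, assuming `sshfs` is installed and reusing the placeholder login from above:

```sh
# Mount the remote home folder onto an empty local directory
mkdir -p ~/remote_home
sshfs login@IP_adress: ~/remote_home
# Work on the files as if they were local, then unmount
fusermount -u ~/remote_home
```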
## SSH tips
### IFB authentication
The default authentication method for the IFB uses keys generated with the `rsa` algorithm:

```sh
ssh-keygen -t rsa -b 4096 -C "your.mail@ens-lyon.fr"
```
The `-b` option sets the size of the key.
Instead of using the `ssh-copy-id` command, you are going to copy-paste your public key into your [IFB configuration page.](https://biosphere.france-bioinformatique.fr/cloudweb_account/settings/edit)
You can now use the [RainBio catalogue](https://biosphere.france-bioinformatique.fr/catalogue/) to launch any available VM and connect to it with SSH from your current VM.
### SSH configuration
Long ssh commands can be tedious to type. This is why we are now going to explore the last file in the `.ssh` folder: `.ssh/config`.
This file is decomposed in different `Host` sections, like the following to connect yourself to [the ssh server of the ens](https://instella.ens-lyon.fr/stella/intra/ent-ssh.html):
```yaml
Host ens
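# The option values below are illustrative placeholders (not in the original document); adapt them to your ENS account
HostName ssh.ens-lyon.fr
User your_ens_login
IdentityFile ~/.ssh/id_ed25519_ens
IdentitiesOnly yes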
PreferredAuthentications publickey,password,
```
- `HostName` defines the server URL or IP address
- `User` is the login to use
- `Identit*` options define the key authentication mechanism
- `PreferredAuthentications` tells the order of the authentication mechanisms to try
With this configuration, you can use the command:
```sh
ssh ens
```
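The `Host *` block itself is not visible in this excerpt; the following is a plausible sketch matching the description below (the option names are standard `ssh_config` keywords, but the exact values are assumptions):

```yaml
Host *
  # Compress the data sent over the connection
  Compression yes
  # Reuse one connection per server through a socket file
  ControlMaster auto
  ControlPath ~/.ssh/socket-%r@%h-%p
  # Keep the connection alive for 3600 seconds after the last session
  ControlPersist 3600
```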
Here we say that we want to enable compression for all connections, and that we want each connection to stay alive for 3600 seconds. The connection is maintained with socket files in the `~/.ssh/` folder, with names starting with `socket-`. This also means that if you connect more than once to the same server, the same connection will be reused.
Sometimes you want to connect to an ssh server through an intermediate (or several intermediate) ssh servers. To do that, you can use the **ProxyJump** option. For example, you can connect to a computer running an ssh server within the ENS with the following config:
```yaml
Host work-ens
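# Illustrative placeholders (not in the original document): the target machine name and login are assumptions
HostName my-workstation.ens-lyon.fr
User your_ens_login
# Go through the "ens" host defined above before reaching this machine
ProxyJump ens
```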
With the command `ssh work-ens`, the `ssh` client is going to first connect to `ens` and then jump to the target host.
> - scp to copy files
> - rsync to copy files
In the next session, we are going to learn how to [install system-wide programs](./11_install_system_programs.html) like the ones managed by `systemd`.
---
title: Install system programs
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contained: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Learn how to install programs in GNU/Linux
As we have seen in the [4 Unix file system](http://perso.ens-lyon.fr/laurent.modolo/unix/4_unix_file_system.html#lib-and-usrlib) session, programs are files that contain instructions for the computer to do things. Those files can be in binary or text format (with a [shebang](http://perso.ens-lyon.fr/laurent.modolo/unix/9_batch_processing.html#shebang)). Any of those files, present in a folder of the [**PATH**](http://perso.ens-lyon.fr/laurent.modolo/unix/9_batch_processing.html#path) variable, is executable anywhere by the user. For system-wide installation, the program files are copied into shared folders contained in the [**PATH**](http://perso.ens-lyon.fr/laurent.modolo/unix/9_batch_processing.html#path) variable.
Developers don't want to reinvent the wheel each time they need complex instructions in their programs; this is why they use shared libraries of pre-written code. This allows for quicker development, fewer bugs (we only have to debug the library once and use it many times), and also [better memory management](http://perso.ens-lyon.fr/laurent.modolo/unix/6_unix_processes.html#processes-tree) (we only load the library once and it can be used by different programs).
## Package Manager
However, interdependencies between programs and libraries can be a nightmare to handle manually; this is why, most of the time, you will install a program with a [package manager](https://en.wikipedia.org/wiki/Package_manager). [Package managers](https://en.wikipedia.org/wiki/Package_manager) are system tools that automatically handle all the dependencies of a program. They rely on **repositories** of programs and libraries which contain all the information about the dependency trees and the corresponding files (**packages**).
System-wide installation steps:
- The user asks the package manager to install a program
- The **package manager** queries its repository lists to search for the most recent **package** version of the program (or a specific version)
- The **package manager** constructs the dependency tree of the program
- The **package manager** checks that the new dependency tree is compatible with every other installed program
- The **package manager** installs the program **package** and all its dependency **packages** in their correct versions
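To make these steps concrete, you can ask `apt` to show what it would resolve for a given package (a quick sketch; `r-base-core` is the package installed later in this session):

```sh
# List the direct dependencies of a package
apt depends r-base-core
# Show the package metadata, including the candidate version
apt show r-base-core
```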
The main difference between GNU/Linux distributions is the package manager they use:
- Gentoo: [portage](https://en.wikipedia.org/wiki/Portage_(software))
- Alpine: [apk](https://wiki.alpinelinux.org/wiki/Alpine_newbie_apk_packages)
Package managers install the packages in **root**-owned folders, so you need **root** access to use them.
<details><summary>Solution</summary>
<p>
`docker run -it --volume /:/root/chroot alpine sh -c "chroot /root/chroot /bin/bash"`
</p>
</details>
### Installing R
**R** is a complex program that relies on lots of dependencies. Your current VM runs on Ubuntu, so we are going to use the `apt` tool (`apt-get` is the older version of the `apt` command, and `synaptic` is a graphical interface for `apt-get`).
You can check the **r-base** package dependencies on the website [packages.ubuntu.com](https://packages.ubuntu.com/focal/r-base). Not too many dependencies? Check the sub-package **r-base-core**.
You can check the **man**ual of the `apt` command to install **r-base-core**.
`sudo apt install r-base-core`
</p>
</details>
What is the **R** version that you installed? Is there a newer version of **R**?
### Adding a new repository
You can check the list of repositories that `apt` checks in the file `/etc/apt/sources.list`.
You can add the official CRAN repository to your repository list:
```sh
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
```
The command `lsb_release -sc` automatically fetches your distribution **release name**.
Then you must add the public key of this repository:
```sh
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
```
### Updating the repository list
You can now use `apt` to update your repository list and try to reinstall **r-base-core**
<details><summary>Solution</summary>
<p>
`sudo apt update`
</p>
</details>
The command gives you a way to list all the upgradable **packages**. Which version of **R** can you install now?
You can upgrade all the upgradable **packages**.
`sudo apt upgrade`
</p>
</details>
With the combination of `update` and `upgrade`, you can keep your whole system up to date; even the kernel files are just another package. You can use `apt` to search for the various versions of the `linux-image` package.
<details><summary>Solution</summary>
<p>
`sudo apt search linux-image`
</details>
## Language specific package manager
While it is not a good idea to have different **package managers** on the same system (they don't know how dependencies are handled by the other managers), you will also encounter language-specific package managers:
- `ppm` for Perl
- `pip` for Python
- `install.packages` for R
- ...
These **package managers** allow you to make installations local to the user, which is advisable to avoid any conflict with the system **package manager**.
For example, you can use the following command to install `glances` system-wide with `pip`:
```sh
sudo pip3 install glances
```
You can now try to install `glances` with `apt`.

What is the `glances` version installed with `apt`, and what is the one installed with `pip`? What is the version of `glances` in your **PATH**?
Next time, use `pip` with the `--user` switch.
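A quick sketch of what that looks like (the `--user` switch is standard `pip` behaviour; the files land in `~/.local`, whose `bin` folder must be in your **PATH**):

```sh
# Install only for the current user, without touching system folders
pip3 install --user glances
# The executable ends up in ~/.local/bin
ls ~/.local/bin/glances
```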
## Manual installation
Sometimes, a specific tool that you want to use will not be available through a **package manager**. If you are lucky, you will find a **package** for your distribution. For `apt`, the **packages** are `.deb` files.
For example, you can download `simplenote` version 2.7.0 for your architecture [here](https://github.com/Automattic/simplenote-electron/releases/tag/v2.7.0).
You can then use `apt` to install this file.
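A sketch of that step (the exact `.deb` file name depends on the release you downloaded, so treat it as a placeholder):

```sh
# Installing a local .deb file with apt also resolves its dependencies
sudo apt install ./simplenote-linux-2.7.0-amd64.deb
```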
## From sources
If the program is open source, you can also [download the sources](https://github.com/Automattic/simplenote-electron/archive/v2.7.0.tar.gz) and build them.
`wget https://github.com/Automattic/simplenote-electron/archive/v2.7.0.tar.gz`
You can use the command `tar -xvf` to extract this archive.
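For example, with the archive name produced by the `wget` command above:

```sh
# x: extract, v: verbose, f: the archive file to read
tar -xvf v2.7.0.tar.gz
```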
When you go into the `simplenote-electron-2.7.0` folder, you can see a `Makefile`. This means that you can use the `make` command to build Simplenote from those files. `make` is a tool that reads recipes (`Makefiles`) to build programs.
You can try to install `node` and `npx` with `apt`. What happened?
<details><summary>Solution</summary>
<p>
`sudo apt install libnss3`
</p>
</details>
What now? Installing dependencies manually is an iterative process...
<details><summary>Solution</summary>
<p>
`sudo apt install libnss3 libatk1.0-dev libatk-bridge2.0-0 libgdk-pixbuf2.0-0 lib…`
</p>
</details>
Yay, we should have every lib!
What now? A nodejs dependency is missing... After some querying on the internet, we can find the solution...
<details><summary>Solution</summary>
<p>
`npm install --save-dev electron-window-state`
</p>
</details>
And now you understand why program packaging takes time in a project, and why it's important!
You can finalize the installation with the command `make install`. Usually, the command to build a tool is available in the `README.md` file of the project.
Read the `README` file of the [fastp](https://github.com/OpenGene/fastp) program to see which methods of installation are available.
> - make to build programs from sources
Installing programs and maintaining different versions of a program on the same system is a difficult task. In the next session, we will learn how to use [virtualization](./12_virtualization.html) to make this job easier.
---
title: Virtualization
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contained: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Learn how to build virtual images or containers of a system
If a computer can run any program, it can also run a program simulating another computer.
You can save the state of the whole **guest** system using a **snapshot**. The **snapshots** can then be executed on any other **hypervisor**. This has several benefits:

- If the **host** has a hardware failure, the **snapshots** can be executed on another **host** to avoid service interruption
- For scalable systems, as many **guest** systems as necessary can be launched adaptively on many **host** systems to handle peak consumption. When the peak is over, we can easily stop the additional **guest** systems.
- For computing sciences, a **snapshot** of a suite of tools allows you to run the same computation again, as it also captures all the software (and simulated hardware) environment.
To avoid the overhead of simulating every component of the **guest** system (the **hypervisor** would have to run code that simulates a given hardware and code that simulates the **guest** programs running on this hardware), some parts of the **host** system can be shared (with control) with the **guest** system.
There are different levels of virtualisation which correspond to different levels of isolation between the virtual machine (**guest**) and the real computer (**host**).
## Full virtualization
A key challenge for full virtualization is the interception and simulation of privileged operations, such as I/O instructions. The effects of every operation performed within a given virtual machine must be kept within that virtual machine; virtual operations cannot be allowed to alter the state of any other virtual machine, the control program, or the hardware. Some machine instructions can be executed directly by the hardware, since their effects are entirely contained within the elements managed by the control program, such as memory locations and arithmetic registers. But other instructions that would "pierce the virtual machine" cannot be allowed to execute directly; they must instead be trapped and simulated. Such instructions either access or affect state information that is outside the virtual machine.
## Paravirtualization
In paravirtualization, the virtual hardware of the **guest** system is similar to the hardware of the **host**. The goal is to reduce the portion of the **guest** execution time spent simulating hardware that is identical to the **host** hardware. Paravirtualization provides specially defined **hooks** that allow the **guest** and **host** to request and acknowledge these tasks, which would otherwise be executed in the virtual domain (where execution performance is worse).
A hypervisor provides the virtualization of the underlying computer system. In [full virtualization](https://en.wikipedia.org/wiki/Full_virtualization), a guest operating system runs unmodified on a hypervisor. However, improved performance and efficiency is achieved by having the guest operating system communicate with the hypervisor. By allowing the guest operating system to indicate its intent to the hypervisor, each can cooperate to obtain better performance when running in a virtual machine. This type of communication is referred to as paravirtualization.
## OS-level virtualization
**OS-level virtualization** is an [operating system](https://en.wikipedia.org/wiki/Operating_system) paradigm in which the [kernel](https://en.wikipedia.org/wiki/Kernel_(computer_science)) allows the existence of multiple isolated [user space](https://en.wikipedia.org/wiki/User_space) instances. Such instances, called **containers**, may look like real computers from the point of view of programs running in them. Programs running inside a container can only see the container's contents and the devices assigned to the container.
## VirtualBox
VirtualBox is owned by Oracle; you can add the official Oracle repository to get the latest version, then install it and add your user to the `vboxusers` group:
```sh
sudo apt install virtualbox
sudo usermod -G vboxusers -a $USER
```
The first thing that we need to do with VirtualBox is to create a new virtual machine. We want to install Ubuntu 20.04 on it.
```sh
VBoxManage createvm --name Ubuntu20.04 --register
```
We then create a virtual hard disk for this VM:
```sh
VBoxManage createhd --filename Ubuntu20.04 --size 14242
```

We set the virtual RAM:

```sh
VBoxManage modifyvm Ubuntu20.04 --memory 1024
```
We add a virtual IDE storage device from which we can boot:
```sh
VBoxManage storagectl Ubuntu20.04 --name IDE --add ide --controller PIIX4 --bootable on
```
And add an Ubuntu image to this IDE device:
```sh
wget https://releases.ubuntu.com/20.10/ubuntu-20.10-live-server-amd64.iso
VBoxManage storageattach Ubuntu20.04 --storagectl IDE --port 0 --device 0 --type dvddrive --medium "/home/etudiant/ubuntu-20.10-live-server-amd64.iso"
```
Then, add a network interface:
```sh
VBoxManage modifyvm Ubuntu20.04 --nic1 nat --nictype1 82540EM --cableconnected1 on
```

And then start the VM to launch the `ubuntu-20.10-live-server-amd64.iso` installation:

```sh
VBoxManage startvm Ubuntu20.04
```
Why did this last command fail? Which kind of virtualisation is VirtualBox using?
## Docker
Docker is an **OS-level virtualization** system where the virtualization is managed by the `docker` daemon.
You can use the `systemctl` command and the `/` key to search for this daemon.
Like VirtualBox, you can install system programs within a container.
Prebuilt containers can be found on different sources like [the docker hub](https://hub.docker.com/) or [the biocontainers registry](https://biocontainers.pro/registry).
Launching a container:

```sh
docker run -it alpine:latest
```
You can check your username
<details><summary>Solution</summary>
<p>
`whoami`

</p>
</details>

Launching a background container:

```sh
docker run -d -p 8787:8787 -e PASSWORD=yourpasswordhere rocker/rstudio:3.2.0
```
You can check the running container with:
```sh
docker ps
```

Deleting a container image:

```sh
docker rmi rocker/rstudio:3.2.0
```
Try to run the `mcr.microsoft.com/windows/servercore:ltsc2019` container. What is happening?
### Building your own container
You can also create your own container by writing a container recipe. For Docker this file is named `Dockerfile`.
The first line of a `Dockerfile` contains a `FROM` statement. You don't start from scratch like in VirtualBox, but from a bare distribution:
```dockerfile
FROM ubuntu:20.04
```
From this point, you can add instructions
`COPY` will copy files from the `Dockerfile` directory to a path inside the container
```dockerfile
COPY .bashrc /
```
`RUN` will execute commands inside the container:
```dockerfile
RUN apt update && apt install -y htop
```

You can then build an image from this recipe with the `docker build` command:

```sh
docker build ./ -t 'ubuntu_with_htop'
```
Like Docker, Singularity is an **OS-level virtualization**. This main difference with docker is that the user is the same within and outside a container. Singularity is available on the [neuro.debian.net](http://neuro.debian.net/install_pkg.html?p=singularity-container) repository, you can add this source with the following commands:
......@@ -243,7 +219,7 @@ Launching a container
singularity run docker://alpine:latest
```
You can check your user name
You can check your username
<details><summary>Solution</summary>
<p>
`whoami`

</p>
</details>

Executing a command within a container:
```sh
singularity exec docker://alpine:latest apk
```
---
title: Understanding a computer
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contained: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: understand the relations between the different components of a computer
## Which parts are necessary to define a computer?
## Computer components
### CPU (Central Processing Unit)
![CPU](./img/amd-ryzen-7-1700-cpu-inhand1-2-1500x1000.jpg){width=100%}
### Memory
#### RAM (Random Access Memory)
![RAM](./img/ram.png){width=100%}
#### HDD (Hard Disk Drive) / SSD (Solid-State Drive)
![SSD](./img/SSD.jpeg){width=100%}
![HDD](./img/hdd.png){width=100%}
### Motherboard
![motherboard](./img/motherboard.jpg){width=100%}
### GPU (Graphical Processing Unit)
![GPU](./img/foundation-100046736-orig.jpg){width=100%}
## Alimentation
### Alimentation
![Alim](./img/LD0003357907_2.jpg){width=100%}
---
# Computer model: universal Turing machine
## Computer model: universal Turing machine
![width:20% height:20%](./img/lego_turing_machine.jpg){width=100%}
---
# As simple as a Turing machine ?
## As simple as a Turing machine?
![universal_truing_machine](./img/universal_truing_machine.png){width=100%}
![Universal Turing Machine](./img/universal_truing_machine.png){width=100%}
- A tape divided into cells, one next to the other. Each cell contains a symbol from some finite alphabet.
- A head that can read and write symbols on the tape and move the tape left and right one (and only one) cell at a time.
......@@ -86,17 +59,17 @@ Objective: understand the relations between the different components of a comput
---
# Basic Input Output System (BIOS)
## Basic Input Output System (BIOS)
> Used to perform hardware initialization during the booting process (power-on startup), and to provide runtime services for operating systems and programs.
- comes pre-installed on a personal computer's system board
- the first software to run when powered on
- in modern PCs initializes and tests the system hardware components, and loads a boot loader from a mass memory device
- Comes pre-installed on a personal computer's system board
- The first software to run when powered on
- In modern PCs, it initializes and tests the system hardware components, and loads a boot loader from a mass memory device
---
# Unified Extensible Firmware Interface (UEFI)
## Unified Extensible Firmware Interface (UEFI)
Advantages:
......@@ -113,24 +86,24 @@ Disadvantages:
---
# Operating System (OS)
## Operating System (OS)
> A system software that manages computer hardware, software resources, and provides common services for computer programs.
- The first thing loaded by the BIOS/UEFI
- The first thing on the tape of a Turing machine
## Kernel
### Kernel
> The kernel provides the most basic level of control over all of the computer's hardware devices. It manages memory access for programs in the RAM, it determines which programs get access to which hardware resources, it sets up or resets the CPU's operating states for optimal operation at all times, and it organizes the data for long-term non-volatile storage with file systems on such media as disks, tapes, flash memory, etc.
> The kernel provides the most basic level of control over all of the computer's hardware devices. It manages memory access for programs in the RAM, determines which programs get access to which hardware resources, sets up or resets the CPU's operating states for optimal operation at all times, and it organizes the data for long-term non-volatile storage with file systems on such media as disks, tapes, flash memory, etc.
[Kernel](./img/220px-Kernel_Layout.svg.png){width=100%}
![Kernel](./img/220px-Kernel_Layout.svg.png){width=100%}
---
# UNIX
## UNIX
> Unix is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix,
> Unix is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix.
[![Unix history](./img/1920px-Unix_timeline.en.svg.png){width=100%}](https://upload.wikimedia.org/wikipedia/commons/b/b5/Linux_Distribution_Timeline_21_10_2021.svg)
......@@ -140,18 +113,18 @@ The ones you are likely to encounter:
- [BSD (Berkeley Software Distribution) variant](https://www.freebsd.org/)
- [GNU/Linux](https://www.kernel.org/)
The philosophy of UNIX is to have a large number of small software which do few things but to them well.
The philosophy of UNIX is to have a large number of small programs which each do a few things, but do them well.
# GNU/Linux
## GNU/Linux
Linux is the name of the kernel which software, to get a full OS, Linux is part of the [GNU Project](https://www.gnu.org/).
Linux is the name of the kernel which, combined with the software of the [GNU Project](https://www.gnu.org/), gives a full OS.
The GNU with Richard Stallman introduced the notion of Free Software:
The GNU project, with Richard Stallman, introduced the notion of Free Software:
1. The freedom to run the program as you wish, for any purpose.
2. The freedom to study how the program works, and change it so it does your computing as you wish. Access to the source code is a precondition for this.
3. The freedom to redistribute copies so you can help others.
4. The freedom to distribute copies of your modified versions to others. By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
4. The freedom to distribute copies of your modified versions to others. By doing this, you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
You can find a [list of software licenses](https://www.gnu.org/licenses/license-list.html)
......@@ -160,5 +133,7 @@ You can find a [list of software licenses](https://www.gnu.org/licenses/license-
Your browser does not support the video tag.
</video>
See this [presentation](https://plmlab.math.cnrs.fr/gdurif/presentation_foss/-/blob/main/presentation/presentation_DURIF_foss.pdf) (in French) for a quick introduction to **software licenses** and **free/open source software**.
[Instead of installing GNU/Linux on your computer, you are going to learn to use the IFB Cloud.](./2_using_the_ifb_cloud.html)
---
title: IFB (Institut Français de bio-informatique) Cloud
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contain: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
[![cc_by_sa](./img/cc_by_sa.png)](http://creativecommons.org/licenses/by-sa/4.0/)
Objective: Start and connect to an appliance on the IFB cloud
Instead of working on your computer where you don't have an Unix-like OS or have limited right, we are going to use the [IFB (Institut Français de bio-informatique) Cloud]( https://biosphere.france-bioinformatique.fr/).
Instead of working on your computer where you don't have a Unix-like OS or have limited rights, we are going to use the [IFB (Institut Français de bio-informatique) Cloud]( https://biosphere.france-bioinformatique.fr/).
# Creating an IFB account
## Creating an IFB account
1. Access the [**https://biosphere.france-bioinformatique.fr/**](https://biosphere.france-bioinformatique.fr/) website
2. On the top right of the screen click on <img src="./img/signin_ifb.png" alt="sign in" style="zoom:150%;" />
3. Then click on ![login](./img/login_ifb.png)
4. Use the **Incremental search field** to select your identity provider (CNRS / ENS de Lyon / etc.)
5. Login
6. Complete the form with your **Name**, **First Name**, **Town** and **Zip Code**. You can ignore the other field and click on **accept**.
6. Complete the form with your **Name**, **First Name**, **Town** and **Zip Code**. You can ignore the other fields and click on **accept**.
7. Go to your **Groups** parameters on the top right ![group_selection_ifb](./img/group_selection_ifb.png)
8. Click on ![join_a_group](./img/join_a_group.png) and type **LBMC Unix 2020**
8. Click on ![join_a_group](./img/join_a_group.png) and type **CAN UNIX 2023**
9. You can click on the **+** sign to register and wait to be accepted in the group
# Starting the LBMC Unix 2020 appliance
## Starting the LBMC Unix 2022 appliance
To follow this practical you will need to start the **[LBMC Unix 2020](https://biosphere.france-bioinformatique.fr/catalogue/appliance/177/)** appliance from the [IFB Cloud](https://biosphere.france-bioinformatique.fr/) and click on the ![start](./img/start_VM.png) button after login with your account.
To follow this practical you will need to start the **[LBMC Unix 2022](https://biosphere.france-bioinformatique.fr/catalogue/appliance/177/)** appliance from the [IFB Cloud](https://biosphere.france-bioinformatique.fr/) and click on the ![start](./img/start_VM.png) button after login with your account.
In the IFB jargon, appliance means **virtual machine** (VM). Remember how a universal Turing machine can run any program? A virtual machine is a simulation program, simulating a physical computer. VMs have the following advantages:
- Copies of the VM will be identical (there will be no differences between your running *LBMC Unix 2020 appliance* and mine )
- Upon starting the VM is reset to the *LBMC Unix 2020 appliance* state
- Copies of the VM will be identical (there will be no differences between your running *LBMC Unix 2022 appliance* and mine)
- Upon starting, the VM is reset to the *LBMC Unix 2022 appliance* state
- You can break everything in your VM, terminate it and start a new one.
To access your appliance, you can go to the [**myVM** tab](https://biosphere.france-bioinformatique.fr/cloud/)
......@@ -82,13 +47,13 @@ You will need to start this appliance at the start of each session of this cours
The ![hourglass](./img/wait_my_appliances_ifb.png) symbol indicates that your appliance is starting.
# Accessing the LBMC Unix 2020
## Accessing the LBMC Unix 2022
You can open the **https** link next to the termination button of your appliance in a new tab. You will have the following message
You can open the **https** link next to the termination button of your appliance in a new tab. You will have the following message.
![ssl warning](./img/ssl_warning.png)
This means that the https connection is encrypted with a certificate unknown to your browser. As this certificate is going to be destroyed when you terminate your appliance, we don't want to pay a certification authority to validate it. Therefore you can safely add an exception for this certificate.
This means that the https connection is encrypted with a certificate unknown to your browser. As this certificate is going to be destroyed when you terminate your appliance, we don't want to pay a certification authority to validate it. Therefore, you can safely add an exception for this certificate.
![ssl exception](./img/ssl_exception.png)
......@@ -114,8 +79,8 @@ To copy / paste your password, you will need to perform a right-click and select
![past from browser](./img/shellinabox_past_from_browser.png)
Then paste your password in the dialog box.
Then, paste your password in the dialog box.
Don't worry the password will not be displayed (not even in the form of `*****`, so someone looking at your screen will not be able to guess it's length), you can press **enter** to log on your VM.
Don't worry, the password will not be displayed (not even as `*****`, so someone looking at your screen cannot guess its length); you can press **enter** to log on your VM.
[First steps in a terminal.](./3_first_steps_in_a_terminal.html)
---
title: First step in a terminal
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contain: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: learn to use basic terminal commands
[![cc_by_sa](./img/cc_by_sa.png)](http://creativecommons.org/licenses/by-sa/4.0/)
Objective: learn to use basic terminal command
Congratulations, you are now connected to your VM!
Congratulations you are now connected on your VM !
The first thing that you can see is a welcome message (yes GNU/Linux users are polite and friendly), and information on your distribution.
The first thing that you can see is a welcome message (yes, GNU/Linux users are polite and friendly), and information on your distribution.
> A **Linux distribution** (often abbreviated as **distro**) is an [operating system](https://en.wikipedia.org/wiki/Operating_system) made from a software collection that is based upon the [Linux kernel](https://en.wikipedia.org/wiki/Linux_kernel)
What is the distribution installed on your VM ?
What is the distribution installed on your VM?
You can go to this distribution website and have a look at the list of firms using it.
# Shell
## Shell
A command-line interpreter (or shell), is a software designed to read lines of text entered by a user to interact with an OS.
A command-line interpreter (or shell) is a software designed to read lines of text entered by a user to interact with an OS.
To simplify, the shell executes the following infinite loop:
1. read a line
2. translate this line as a program execution with its parameters
3. launch the corresponding program with the parameters
3. wait for the program to finish
1. Read a line
2. Translate this line as a program execution with its parameters
3. Launch the corresponding program with the parameters
4. Wait for the program to finish
5. Go back to 1.
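As an illustration, the heart of this loop can be sketched in a few lines of shell (a toy example only, not how a real shell is implemented):

```sh
while true; do
  printf '$ '             # display a minimal prompt
  read -r line || break   # 1. read a line (stop at end of input)
  $line                   # 2-3. translate and launch it as a program with its parameters
done                      # 4-5. wait for it to finish, then loop again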
When you open a terminal on an Unix-like OS, you will have a **prompt** displayed: it can end with a **$** or a **%** character depending on your configuration. As long as you see your prompt, it means that you are in step **1.**, if no prompt is visible, you are in step **4.** or you have set up a very minimalist configuration for your shell.
When you open a terminal on a Unix-like OS, you will have a **prompt** displayed: it can end with a `$` or a `%` character depending on your configuration. As long as you see your prompt, it means that you are in step **1.**, if no prompt is visible, you are in step **4.** or you have set up a very minimalist configuration for your shell.
<img src="./img/prompt.png" alt="prompt" style="zoom:150%;" />
The blinking square or vertical bar represents your **cursor**. Shell predates graphical interfaces, so most of the time you wont be able to move this cursor with your mouse, but with the directional arrows (left and right).
The blinking square or vertical bar represents your **cursor**. Shell predates graphical interfaces, so most of the time you won't be able to move this cursor with your mouse, but with the **directional arrows** (left and right).
On the IFB, your prompt is a **$**
On the IFB, your prompt is a `$`:
```sh
etudiant@VM:~$
```
You can identify the following information from your prompt: **etudiant** is your login and **VM** is the name of your VM (**~** is where you are on the computer, but we will come back to that later).
You can identify the following information from your prompt: **etudiant** is your login and **VM** is the name of your VM (`~` is where you are on the computer, i.e. the current working directory, but we will come back to that later).
On Ubuntu 20.04, the default shell is [Bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) while on recent version of macOS its [zsh](https://en.wikipedia.org/wiki/Z_shell). There are [many different shell](https://en.wikipedia.org/wiki/List_of_command-line_interpreters), for example, Ubuntu 20.04 also has [sh](https://en.wikipedia.org/wiki/Bourne_shell) installed.
On Ubuntu 20.04, the default shell is [Bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) while on recent versions of macOS it's [zsh](https://en.wikipedia.org/wiki/Z_shell). There are [many different shells](https://en.wikipedia.org/wiki/List_of_command-line_interpreters); for example, Ubuntu 20.04 also has [sh](https://en.wikipedia.org/wiki/Bourne_shell) installed.
# Launching Programs
## Launching Programs
You can launch every program present on your computer from the shell. The syntax will always be the following:
......@@ -84,9 +55,9 @@ You can launch every program present on your computer from the shell. The syntax
etudiant@VM:~$ program_name option_a option_b option_c [...] option_n
```
And pressing **enter** to execute your command.
And pressing the **Enter** key to execute your command.
For example, we can launch the `cal` software by typing the following command and pressing **enter**:
For example, we can launch the `cal` software by typing the following command and pressing **Enter**:
```sh
cal
......@@ -99,24 +70,24 @@ When you launch a command, various things can happen:
- Files can be read or written
- etc.
We can pass argument to the `cal` software the following way.
We can pass an argument to the `cal` software in the following way:
```sh
cal -3
```
What is the effect of the `-3` parameter ?
What is the effect of the `-3` parameter?
You can add as many parameters as you want to your command, try `-3 -1` what is the meaning of the `-1` parameter ?
You can add as many parameters as you want to your command. Try `-3 -1`. What is the meaning of the `-1` parameter?
The `-d` option display the month of a given date in a `yyyy-mm` format. Try to display your month of birth.
The `-d` option displays the month of a date given in a `yyyy-mm` format. Try to display your month of birth.
Traditionally, parameters are *named* which means that they are in the form of:
Traditionally, parameters are *named* in one of two ways:
* `-X` for an on/off option (like `cal -3`)
* `-X something` for an input option (like `cal -d yyyy-mm`)
Here the name of the parameter is `X`, but software can also accept list of unnamed parameters. Try the following:
Here, the name of the parameter is `X`, but software can also accept a list of unnamed parameters. Try the following:
```sh
cal 2
......@@ -124,11 +95,11 @@ cal 1999
cal 2 1999
```
What is the difference for the parameter value `2` in the first and third command ?
What is the difference for the parameter value `2` in the first and third commands?
# Moving around
## Moving around
For the `cal` program, the position in the file system is not important (it’s not going to change the calendar). However, for most tools that are able to read or write files, its important to know where you are. This is the first real difficulty with command line interface: you need to remember where you are.
For the `cal` program, the position in the file system is not important (it will not change the calendar). However, for most tools that can read or write files, it's important to know where you are. This is the first real difficulty with the command line interface: **you need to remember where you are**.
If you are lost, you can **p**rint your **w**orking **d**irectory (i.e., where you are now, working) with the command.
......@@ -136,63 +107,61 @@ If you are lost, you can **p**rint your **w**orking **d**irectory (i.e., where y
pwd
```
Like `cal`, the `pwd` command return textual information
Like `cal`, the `pwd` command returns textual information.
By default when you log on an Unix system, you are in your **HOME** directory. Every user (except one) should have its home directory in the `/home/`folder.
By default, when you log on a Unix system, you are in your **HOME** directory. Every user (except one) should have their home directory in the `/home/` folder.
To **c**hange **d**irectory you can type the command `cd`, `cd` take one argument: the path of the directory where you want to go. go to the `/home` directory.
To **c**hange **d**irectory you can type the command `cd`. `cd` takes one argument: the path of the directory where you want to go. To go to the `/home` directory you can use:
```sh
cd /home
```
The `cd` command doesn’t return any textual information, but change the environment of the shell (you can confirm it with ` pwd`) ! You can also see this change in your prompt:
The `cd` command doesn't return any textual information, but changes the environment of the shell (you can confirm it with `pwd`)! You can also see this change in your prompt:
```sh
`etudiant@VM:/home$`
```
What happens when you type `cd` without any argument ?
What happens when you type `cd` with no arguments?
What is the location shown in your prompt ? Is it coherent with the `pwd` information ? Can you `cd` to the `pwd` path ?
What is the location shown in your prompt? Is it coherent with the `pwd` information? Can you `cd` to the `pwd` path?
When we move around a file system, we often want to see what is in a given folde. We want to **l**i**s**t the directory content. Go back to the `/home` directory and use to the `ls` command see how many have a home directory there.
When we move around a file system, we often want to see what is in a folder. We want to **l**i**s**t its content. Go back to the `/home` directory and use the `ls` command to see how many users have a home directory there.
We will see various options for the `ls` command, throughout this course. Try the `-a` option.
We will see various options for the `ls` command throughout this course. Try the `-a` option.
```sh
ls -a
```
What changed compared to the `ls` command without this option ?
What changed compared to the `ls` command without this option?
Go to your home folder with the bare `cd` command and run the `ls -a` command again. The `-a` option makes the `ls` command list hidden files and folders. On Unix systems, hidden files and folders are all files and folders whose name starts with a "**.**".
Go to your home folder with the bare `cd` command and run the `ls -a` command again. The `-a` option makes the `ls` command list hidden files and folders. On Unix systems, hidden files and folders are all files and folders whose name starts with a `.`.
Can you `cd` to "**.**" ?
Can you `cd` to `.`?
```sh
cd .
```
What happened ?
What happened?
Can you cd to "**..**" ?
Can you cd to `..`?
```sh
cd ..
```
What appended ?
What happened?
Repeat 3 times the previous command (you can use the upper directional arrow to repeat the last command).
Repeat the previous command 3 times (you can use the up arrow key, `↑`, to repeat the last command).
What append ?
What happened?
You can use the `-l` option in combination with the `-a` option to know more about those folders.
> We have seen the commands :
> We have seen the commands:
>
> - `cal` for calendar
> - `cd` for change directory
......
---
title: GNU/Linux file system
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contain: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
[![cc_by_sa](./img/cc_by_sa.png)](http://creativecommons.org/licenses/by-sa/4.0/)
Objective: Understand how files are organized in Unix
> On a UNIX system, everything is a file ; if something is not a file, it is a process.
> On a UNIX system, everything is a file; if something is not a file, it is a process.
>
> Machtelt Garrels
......@@ -58,19 +29,19 @@ This file system is organized as a tree. As you have seen, every folder has a pa
Every file can be accessed by an **absolute path** starting at the root. Your user home folder can be accessed with the path `/home/etudiant/`. Go to your user home folder.
We can also access file with a **relative path**, using the special folder "**..**". From your home folder, go to the *ubuntu* user home folder without passing by the root (we will see use of the "**.**" folder later).
We can also access files with a **relative path**, using the special folder `..`. From your home folder, go to the *ubuntu* user home folder without passing by the root (we will see use of the `.` folder later).
# File Types
## File Types
As you may have guessed, every file type is not the same. We have already seen that common file and folder are different. Here are the list of file types:
As you may have guessed, every file type is not the same. We have already seen that common file and folder are different. Here is the list of file types:
- **-** common files
- **d** folders
- **l** links
- **b** disk
- **c** special files
- **s** socket
- **p** named pipes
- `-` common files
- `d` folders
- `l` links
- `b` disk
- `c` special files
- `s` socket
- `p` named pipes
To see the file type you can type the command
......@@ -78,128 +49,127 @@ To see the file type you can type the command
ls -la
```
The first column will tell you the type of the file (here we have only the type "**-**" and "**d**" ). We will come back on the other information later. An other less used command to get fine technical information on a file is the command `stat [file_name]`. Can you get the same information as `ls -la` with `stat` ?
The first column will tell you the type of the file (here we have only the types `-` and `d`). We will come back to the other types later. Another less-used command to get fine technical information on a file is the command `stat [file_name]`. Can you get the same information as `ls -la` with `stat`?
# Common Structure
## Common Structure
From the root of the system (**/**), most of the Unix-like distribution will share the same folder arborescence. On macOS, the names will be different because when you sell the most advanced system in the world you need to rename things, with more advanced names.
From the root of the system (`/`), most Unix-like distributions will share the same folder tree structure. On macOS, the names will be different because when you sell the most advanced system in the world you need to rename things, with more advanced names.
## `/home`
### `/home`
You already know this one. You will find all your file and your configuration files here. Which configuration file can you identify in your home ?
You already know this one. You will find all your files and your configuration files here. Which configuration files can you identify in your home?
## `/boot`
### `/boot`
You can find the Linux kernel and the boot manager there. What is the name of your boot manager (process by elimination) ?
You can find the Linux kernel and the boot manager there. What is the name of your boot manager (proceed by elimination)?
You can see a new type of file here, the type "**l**". What it the version of the **vmlinuz** kernel ?
You can see a new type of file here, the type `l`. What is the version of the **vmlinuz** kernel?
## `/root`
### `/root`
The home directory of the super user, also called root (we will go back on him later). Can you check its configuration file ?
The home directory of the superuser, also called root (we will go back to it later). Can you check its configuration file?
## `/bin`, `/sbin`, `/usr/bin` and `/opt`
### `/bin`, `/sbin`, `/usr/bin` and `/opt`
The folder containing the programs used by the system and its users. Programs are simple file readable by a computer, these files are often in **bin**ary format which means that its extremely difficult for a human to read them.
The folder containing the programs used by the system and its users. Programs are simple files readable by a computer. These files are often in **bin**ary format so it's extremely difficult for a human to read them.
What is the difference between **/bin** and **/usr/bin** ?
What is the difference between `/bin` and `/usr/bin`?
**/sbin** stand for system binary. What are the names of the programs to power off and restart your system ?
`/sbin` stands for system binary. What are the names of the programs used to power off and restart your system?
**/opt** is where you will find the installation of non-conventional programs (if you dont follow [the guide of good practice of the LBMC](http://www.ens-lyon.fr/LBMC/intranet/services-communs/pole-bioinformatique/ressources/good_practice_LBMC), you can put your bioinformatics tools with crapy installation procedure there).
`/opt` is where you will find the installation of non-conventional programs (if you don't follow [the guide of good practice of the LBMC](http://www.ens-lyon.fr/LBMC/intranet/services-communs/pole-bioinformatique/ressources/good_practice_LBMC), you can put your bioinformatics tools with crappy installation procedures there).
## `/lib` and `/usr/lib`
### `/lib` and `/usr/lib`
Those folder contains system libraries. Libraries are a collection of pieces of codes usable by programs.
Those folders contain system libraries. Libraries are collections of pieces of code usable by programs.
What is the difference between **/lib** and **/usr/lib**.
What is the difference between `/lib` and `/usr/lib`?
Search information on the `/lib/gnupg` library on the net.
Search information about the `/lib/gnupg` library on the net.
## `/etc`
### `/etc`
The place where system configuration file and default configuration file are. What is the name of the default configuration file for `bash` ?
The place where system configuration files and default configuration files are. What is the name of the default configuration file for `bash`?
## `/dev`
### `/dev`
Contains every peripheric
Contains all peripheral devices, such as hard disks, disk partitions, and so on.
What is the type of the file `stdout` (you will have to follow the links)?
With the command `ls -l` can you identify files of type "**b**" ?
With the command `ls -l` can you identify files of type `b`?
Using `less` can you visualize the content of the file `urandom` ? What about the file `random` ?
Using `less` can you visualize the content of the file `urandom`? What about the file `random`?
What is the content of `/dev/null`?
## `/var`
### `/var`
Storage space for variables and temporary files, like system logs, locks, or file waiting to be printed...
Storage space for variables and temporary files, like system logs, locks, or files waiting to be printed...
In the file `auth.log` you can see the creation of the `ubuntu` and `etudiant `account. To visualize a file you can use the command
In the file `/var/log/auth.log` you can see the creation of the `ubuntu` and `etudiant` accounts. To visualize a file you can use the command:
```sh
less [file_path]
```
You can navigate the file with the navigation arrows. Which group the user `ubuntu` belongs to that the user `etudiant`don’t ?
To close the `less` you can press `Q`. Try the opposite of `less`, what are the differences ?
You can navigate the file with the navigation arrows. Which group does the user `ubuntu` belong to that the user `etudiant` doesn't?
What is the type of the file `autofs.fifo-var-autofs-ifb` in the `run` folder ? From **fifo** in the name, can you guess the function of the "**p**" file ?
To close the `less` window you can press `Q`. Try the opposite of `less`. What are the differences?
There are few examples of the last type of file in the `run` folder, in which color the command `ls -l` color them ?
There are a few examples of type `p` files in the `run` folder. In which color does the command `ls -l` display them?
## `/tmp`
### `/tmp`
Temporary space. **Erased at each shutdown of the system !**
Temporary space. **Erased at each shutdown of the system!**
## `/proc`
### `/proc`
Information on the system resources. This file system is virtual. What do we mean by that ?
Information about the system resources. This file system is virtual. What do we mean by that?
One of the columns of the command `ls -l` show the size of the files. Try is on the `/etc` folder. You can add the `-h` option to have human readable file size.
One column of the command `ls -l` shows the size of the files. Try it on the `/etc` folder. You can add the `-h` option to get human-readable file sizes.
What are the sizes of the files in the `/proc` folder ?
What are the sizes of the files in the `/proc` folder?
From the `cpuinfo` file get the brand of the cpu simulated by your VM.
From the `cpuinfo` file, get the brand of the CPU simulated by your VM.
From the `meminfo` file retrieve the total size of RAM
From the `meminfo` file, retrieve the total size of RAM.
# Links
## Links
With the command `ls -l` we have seen some links, the command `stat` can give us more information on them
With the command `ls -l` we have seen some links. The command `stat` can give us more information on them
```sh
stat /var/run
```
What is the kind of link for `/var/run` ?
What is the link type of `/var/run`?
Most of the time, when you are going to work with links, you will work with this kind of link. You can create a **l**i**n**k with the command `ln` and the option `-s` for **s**ymbolic.
You will work with this kind of link most of the time. You can create a **l**i**n**k with the command `ln` and the option `-s` for a **s**ymbolic link.
The first argument after the option of the `ln` command is the target of the link, the second argument is the link itself:
The first argument after the option of the `ln` command is the target of the link; the second argument is the link itself:
```sh
cd
touch .bash_history
ln -s .bash_history bash_history_slink
ls -la
```
What are the differences between the two following commands ?
What are the differences between the following two commands?
```sh
stat bash_history_slink
stat .bash_history
```
Symbolic links can bridge across, file system, if the target of the link disappears the link will be broken.
Symbolic links can bridge across file systems. If the target of the link disappears, the link will be broken.
You can delete a file with the command `rm`
You can delete a file with the command `rm`.
**There is no trash with the command `rm` double-check your command before pressing enter !**
**There is no trash with the command `rm`; double-check your command before pressing enter!**
Delete your `.bash_history` file, what happened to the `bash_history_slink` ?
Delete your `.bash_history` file, what happened to the `bash_history_slink`?
The command `ln` without the `-s` option creates hard links. Try the following commands:
......@@ -210,15 +180,14 @@ stat .bashrc
ln .bashrc bashrc_linkb
```
Use `stat` to also study `bashrc_linka` and `bashrc_linkb`.
Use `stat` to check both `bashrc_linka` and `bashrc_linkb`.
What happen when you delete `bashrc_linka` ?
What happens when you delete `bashrc_linka`?
To understand the notion of **Inode** we need to know more about storage systems.
# Disk and partition
## Disk and partition
On a computer, data is physically stored on a medium (HDD, SSD, USB key, punch card...)
......@@ -226,15 +195,15 @@ On a computer, the data are physically stored on a media (HDD, SSD, USB key, pun
(Punched cards in storage at a U.S. Federal records center in 1959. All the data visible here would fit on a 4 GB flash drive.)
You cannot dump data directly into the disk, you need to organize things to be able to find them back.
You cannot dump data directly in the disk; you need to organize things to find them back.
![disk](./img/disk.png)
Each media is divided into partition:
Each media is divided into partitions:
![partitions](./img/partition.png)
The media is divided into one or many partition, each of which have a file system type. Examples of file system type are:
The media is divided into one or many partitions, each of which has a file system type. Examples of file system types are:
- fat32, exFAT
- ext3, ext4
......@@ -242,21 +211,21 @@ The media is divided into one or many partition, each of which have a file syste
- NTFS
- ...
The file system handle the physical position of each file on the media. The position of the file in the index of file is called **Inode**.
The file system handles the physical position of each file on the media. The position of a file in the file index is called **Inode**.
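As a quick illustration (the `-i` option of `ls` is not otherwise used in this course), you can display the inode number of a file:

```sh
ls -i ~/.bashrc     # the number printed before the file name is its inode
stat ~/.bashrc      # stat also reports the Inode field
```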
The action of attaching a given media to the Unix file system tree, is called mounting a partition or media. To have a complete list of information on what is mounted where, you can use the `mount `command without argument.
The action of attaching a given media to the Unix file system tree is called mounting a partition or media. To have a complete list of information on what is mounted where, you can use the `mount` command without arguments.
```sh
mount
```
Find which disk is mounted at the root of the file tree.
Find out which disk is mounted at the root of the file tree.
> We have seen the commands:
>
> - `stat` to display information on a file
> - `less` to visualise the content of a file
> - `ln` to create link
> - `less` to visualize the content of a file
> - `ln` to create links
> - `mount` to list mount points
[Thats all for the Unix file system, we will come back to it from time to time but for now you can head to the next section.](./5_users_and_rights.html)
[That's all for the Unix file system. We will come back to it from time to time, but for now you can head to the next section.](./5_users_and_rights.html)
---
title: Users and rights
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contain: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
[![cc_by_sa](./img/cc_by_sa.png)](http://creativecommons.org/licenses/by-sa/4.0/)
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Understand how rights works in GNU/Linux
GNU/Linux and other Unix-like OS are multiuser, this means that they are designed to work with multiple users connected simultaneously to the same computer.
GNU/Linux and other Unix-like OS are multi-user: they are designed to work with multiple users connected simultaneously to the same computer.
There is always at least one user: the **root** user
- Its the super user
- It's the super user
- it has every right (we can say that it ignores the rights system)
- this account should only be used to administer the system.
- this account should only be used to administrate the system.
There can also be other users who
......@@ -48,7 +23,7 @@ There can also be other users who
- belong to groups
- the groups also have rights
# File rights
## File rights
Each file is associated with a set of rights:
......@@ -63,7 +38,7 @@ Check your set of rights on your `.bashrc` file
ls -l ~/.bashrc
```
The first column of the `ls -l` output show the status of the rights on the file
The first column of the `ls -l` output shows the status of the rights on the file.
![user_rights](./img/user_right.png)
......@@ -80,8 +55,8 @@ The first column of the `ls -l` output show the status of the rights on the file
- the 1st character is the type of the file (we already know this one)
- the 3 following characters (2 to 4) are the **user** rights on the file
- the characters 5 to 7 are the **group** rights on the file
- the characters 8 to 10 are the **others** rights on the file (anyone not the **user** nor in the **group**)
- the characters 5 to 7 are the **group** rights on the file
- the characters 8 to 10 are the **others'** rights on the file (anyone not the **user** nor in the **group**)
To change the file rights you can use the command `chmod`
......@@ -103,7 +78,7 @@ chmod o+r .bashrc
chmod u-x,g-w,o= .bashrc
```
What can you conclude on the symbols `+` , `=`, `-` and `,` with the `chmod` command ?
What can you conclude on the symbols `+`, `=`, `-` and `,` with the `chmod` command?
> ### Numeric notation
>
......@@ -121,15 +96,15 @@ What can you conclude on the symbols `+` , `=`, `-` and `,` with the `chmod` com
> | `-r--r--r--` | 0444 | read |
> | `-r-xr-xr-x` | 0555 | read & execute |
> | `-rw-rw-rw-` | 0666 | read & write |
> | `-rwxr-----` | 0740 | owner can read, write, & execute; group can only read; others have no permissions |
> | `-rwxr-----` | 0740 | owner can read, write, & execute; group can only read; others have no permission |
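For example, with the numeric notation (a sketch reusing the `.bashrc` file manipulated above):

```sh
chmod 0644 ~/.bashrc    # -rw-r--r--: user can read and write, group and others can only read
ls -l ~/.bashrc         # check the result
```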
The default group of your user is the first in the list of the groups you belong to. You can use the command `groups` to display this list. What is your default group ?
The default group of your user is the first in the list of the groups you belong to. You can use the command `groups` to display this list. What is your default group?
The command `id` show the same information, but with some differences what are they ?
The command `id` shows the same information, but with some differences. What are they?
Can you cross this additional information with the content of the file `/etc/passwd` and `/etc/group` ?
Can you cross this additional information with the content of the files `/etc/passwd` and `/etc/group`?
What is the user *id* of **root** ?
What is the user *id* of **root**?
When you create an empty file, system default rights and your default groups are used. You can use the command `touch` to create a file.
......@@ -137,7 +112,7 @@ When you create an empty file, system default rights and your default groups are
touch my_first_file.txt
```
What are the default rights when you crate a file ?
What are the default rights when you create a file?
You can create folders with the command `mkdir` (**m**a**k**e **dir**ectories).
......@@ -145,23 +120,23 @@ You can create folders with the command `mkdir` (**m**a**k**e **dir**ectories).
mkdir my_first_dir
```
What are the default rights when you create a directory ? Try to remove the execution rights, what appends then ?
What are the default rights when you create a directory? Try to remove the execution rights. What happens then?
You can see the **/root** home directory. Can you see its content ? Why ?
You can see the **/root** home directory. Can you see its content? Why?
Create a symbolic link (`ln -s`) to your **.bashrc** file, what are the default rights to symbolic links ?
Create a symbolic link (`ln -s`) to your **.bashrc** file. What are the default rights to symbolic links?
Can you remove the writing right of this link ? What happened ?
Can you remove the writing right of this link? What happened?
# Users and Groups
## Users and Groups
We have seen how to change the right associated with the group, but what about changing the group itself ? The command `chgrp` allows you to do just that:
We have seen how to change the rights associated with the group, but what about changing the group itself? The command `chgrp` allows you to do just that:
```sh
chgrp audio .bashrc
```
Now the next step is to change the owner of a file, you can use the command `chown` for that.
Now the next step is to change the owner of a file. You can use the command `chown` for that.
```sh
chown ubuntu my_first_file.txt
......@@ -173,9 +148,9 @@ You can change the user and the group with this command:
chown ubuntu:audio my_first_file.txt
```
What are the rights on the program `mkdir` (the command `which` can help you find where program file are) ?
What are the rights on the program `mkdir` (the command `which` can help you find the program path)?
Can you remove the execution rights for the others ?
Can you remove the execution rights for the others?
The command `cp` allows you to **c**o**p**y files from one destination to another.
......@@ -183,13 +158,13 @@ The command `cp` allows you to **c**o**p**y file from one destination to another
man cp
```
Copy the `mkdir` tool to your home directory. Can you remove execution rights for the others on your copy of `mkdir` ? Can you read the contentof the `mkdir` file ?
Copy the `mkdir` tool to your home directory. Can you remove execution rights for the others on your copy of `mkdir`? Can you read the content of the `mkdir` file?
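One possible way to do the copy (a sketch; adapt the path to the output of `which` on your system):

```sh
which mkdir           # find where the mkdir program is, e.g. /usr/bin/mkdir
cp /usr/bin/mkdir ~/  # copy it to your home directory
ls -l ~/mkdir         # check the rights on your copy
```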
You cannot change the owner of a file, but you can always allow another user to copy it and change the rights on its copy.
# Getting admin access
## Getting admin access
Currently you dont have administrative access to your VM, this means that you dont have the password to the *root* account. Another way to get administrative access in Linux is to use the `sudo` command.
Currently, you don't have administrative access to your VM, this means that you don't have the password to the *root* account. Another way to get administrative access in Linux is to use the `sudo` command.
You can read the documentation (manual) of the `sudo` command with the command `man`
......@@ -201,11 +176,11 @@ Like for the command, `less` you can close `man` by pressing **Q**.
![sandwich](https://imgs.xkcd.com/comics/sandwich.png)
On Ubuntu, only members of the group **sudo** can use the `sudo` command. Are you in this group ?
On Ubuntu, only members of the group **sudo** can use the `sudo` command. Are you in this group?
**The root user can do everything in your VM, for example it can delete everything from the `/` directory but its not a good idea (see the [Peter Parker principle](https://en.wikipedia.org/wiki/With_great_power_comes_great_responsibility))**
**The root user can do everything in your VM, for example, it can delete everything from the `/` directory but it's not a good idea (see the [Peter Parker principle](https://en.wikipedia.org/wiki/With_great_power_comes_great_responsibility))**
One advantage of using a command line interface is that you can easily reuse command written by others. Copy and paste the following command in your terminal to add yourself in the **sudo** group.
One advantage of using a command-line interface is that you can easily reuse commands written by others. Copy and paste the following command in your terminal to add yourself to the **sudo** group.
```sh
docker run -it --volume /:/root/chroot alpine sh -c "chroot /root/chroot /bin/bash -c 'usermod -a -G sudo etudiant'"
......@@ -223,9 +198,9 @@ sudo id
You can try the `chown` command again with the `sudo` command.
Check the content of the file `/etc/shadow` , what is the utility of this file (you can get help from the `man` command).
Check the content of the file `/etc/shadow`, what is the utility of this file (you can get help from the `man` command).
# Creating Users
## Creating Users
You can add a new user to your system with the command `useradd`
......@@ -233,14 +208,14 @@ You can add a new user to your system with the command `useradd`
sudo useradd -m -s /bin/bash -g users -G adm,docker student
```
- `-m` create a hone directory
- `-m` create a home directory
- `-s` specify the shell to use
- `-g` the default group
- `-G` the additional groups
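You can then check that the account was created with the expected groups, reusing the `id` command seen earlier:

```sh
id student    # shows the UID, default group and additional groups of the new user
```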
To log into another account you can use the command `su`
To log into another account, you can use the command `su`.
What is the difference between the two following command ?
What is the difference between the two following commands?
```sh
su student
......@@ -250,9 +225,9 @@ su student
sudo su student
```
What append when you don't specify a login with the `su` command ?
What happens when you don't specify a login with the `su` command?
# Creating groups
## Creating groups
You can add new groups to your system with the command `groupadd`
......@@ -260,7 +235,7 @@ You can add new groups to your system with the command `groupadd`
sudo groupadd dummy
```
Then you can add users to these group with the command `usermod`
Then you can add users to this group with the command `usermod`
```sh
sudo usermod -a -G dummy student
......@@ -272,7 +247,7 @@ And check the result:
groups student
```
To remove an user from a group you can rewrite it's list of group with the command `usermod`
To remove a user from a group, you can rewrite its list of groups with the command `usermod`
```sh
sudo usermod -G student student
......@@ -280,11 +255,11 @@ sudo usermod -G student student
Check the results.
# Security-Enhanced Linux
## Security-Enhanced Linux
While what you have seen in this section hold true for every Unix system, additionnal rules can be applied to control the rights in Linux. This is what is called [SE Linux](https://en.wikipedia.org/wiki/Security-Enhanced_Linux) (**s**ecurity-**e**nhanced **Linux**)
While what you have seen in this section holds true for every Unix system, we can apply additional rules to control the rights in Linux. This is what is called [SE Linux](https://en.wikipedia.org/wiki/Security-Enhanced_Linux) (**s**ecurity-**e**nhanced **Linux**).
When SE Linux is enabled on a system, every **processes** can be assigned a set of right. This is how, on Android for example, some programs can access your GPS while other cannot etc. In this case it's not the user rights that prevail, but the **process** launched by the user.
When SE Linux is enabled on a system, every **process** can be assigned a set of rights. This is how, on Android for example, some programs can access your GPS while others cannot, etc. In this case, it's not the user rights that prevail, but the **process** launched by the user.
> We have seen the commands:
>
......@@ -298,6 +273,6 @@ When SE Linux is enabled on a system, every **processes** can be assigned a set
> - `sudo` to borrow **root** rights
> - `groupadd` to create groups
> - `groups` to list groups
> - `usermod`to manipulate user's to groups
> - `usermod` to manipulate users' groups
[To understand more about processes you can head to the next section.](./6_unix_processes.html)
[To understand more about processes, you can head to the next section.](./6_unix_processes.html)
---
title: Unix Processes
author: "Laurent Modolo"
output:
rmdformats::downcute:
self_contain: true
use_bookdown: true
default_style: "light"
lightbox: true
css: "./www/style_Rmd.css"
---
```{r include = FALSE}
if (!require("fontawesome")) {
install.packages("fontawesome")
}
if (!require("klippy")) {
install.packages("remotes")
remotes::install_github("rlesur/klippy")
}
library(fontawesome)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA)
klippy::klippy(
position = c('top', 'right'),
color = "white",
tooltip_message = 'Click to copy',
tooltip_success = 'Copied !')
```
# Unix Processes
[![cc_by_sa](./img/cc_by_sa.png)](http://creativecommons.org/licenses/by-sa/4.0/)
Objective: Understand how process works in GNU/Linux
A program is a list of instructions to be executed on a computer. These instructions are written in one or many files in a format readable by the computer (binary files) or interpretable by the computer (text files). The interpretable format needs to be processed by an interpreter, which is itself in binary format.
The execution of a program on the system is represented as one or many processes. The program is the file of instructions, while the process is the instructions being read.
`mkdir` is the program: when you type `mkdir my_folder`, you launch an `mkdir` process.
Your shell is a process to manipulate other processes.
> In multitasking operating systems, processes (running programs) need a way to create new processes, e.g. to run other programs. [Fork](https://en.wikipedia.org/wiki/Fork_(system_call)) and its variants are typically the only way of doing so in Unix-like systems. For a process to start the execution of a different program, it first forks to create a copy of itself. Then, the copy, called the "[child process](https://en.wikipedia.org/wiki/Child_process)", calls the [exec](https://en.wikipedia.org/wiki/Exec_(system_call)) system call to overlay itself with the other program: it ceases execution of its former program in favor of the other.
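A small illustration of this parent/child relation, using the bash variables `$$` (PID of the current shell) and `$PPID` (PID of its parent), which are not otherwise covered in this course:

```sh
echo $$                 # the PID of your current bash
bash -c 'echo $PPID'    # a child bash prints its parent PID: the same number as above
```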
Some commands in your shell don't have an associated process: for example, there is no `cd` program, it's a functionality of your shell. The `cd` command tells your `bash` process to do something; it does not fork another process.
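You can check whether a command is a shell builtin or a separate program with the bash builtin `type` (shown here as a complement; it is not used elsewhere in this course):

```sh
type cd       # cd is a shell builtin
type mkdir    # mkdir is an external program, e.g. /usr/bin/mkdir
```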
# Process attributes
- PID : the **p**rocess **id**entifier is an integer, at a given time each PID is unique to a process
- PPID : the **p**arent **p**rocess **id**entifier is the PID of the process that has started the current process
- UID : the **u**ser **id**entifier is the identifier of the user that has started the process; except with SE Linux, the process will have the same rights as the user launching it.
- PGID : the **p**rocess **g**roup **id**entifier (like users, processes have groups)
You can use the command `ps` to see the processes launched by your user
```sh
ps
```
As for the command `ls`, you can use the switch `-l` to have more details.
Open another tab in your browser to log on your VM again. Keep this tab open; we are going to use both of them in this session.
In this new tab, you are going to launch a `less` process.
```sh
less .bashrc
```
Come back to your first tab. Can you see your `less` process with the command `ps`?
The `ps` option `-u [login]` lists all the processes whose `UID` is the `UID` associated with `[login]`:
```sh
ps -l -u etudiant
```
Is the number of `bash` processes consistent with the number of tabs you opened?
What is the PPID of the `less` process? Can you identify in which `bash` process `less` is running?
Did you launch the `systemd` and `(sd-pam)` processes?
`pam` stands for [**p**luggable **a**uthentication **m**odules](https://www.linux.com/news/understanding-pam/); it's a collection of tools that handle identification and resource access restriction. From the PPID of `(sd-pam)`, can you find which process launched `(sd-pam)`? What is the PPID of this process?
The option `-C` allows you to filter processes by name:
```sh
ps -l -C systemd
```
Who launched the first `systemd` process?
# Process tree
From PPID to PPID, you can guess that, like the file system, processes are organized in a tree. The command `pstree` can give you a nice representation of this tree.
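For example (a sketch; the exact options may vary between `pstree` versions):

```sh
pstree -p etudiant    # the process tree of the user etudiant, with PIDs
```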
The following `ps` command shows information on the process with PID 1
```sh
ps -l -p 1
```
Is this output coherent with what you know about PIDs and the previous `ps` commands? Can you look at the corresponding program (with the command `which`)?
Can you look for information on PID 0?
The root of the process tree is PID 1.
What is the UID of the `dockerd` process? Can you guess why we were able to gain `sudo` access in the previous section by using a `docker` command?
`ps` gives you a static snapshot of the processes, but processes are dynamic. To see them running you can use the command `top`. While `top` is functional, most systems have `htop`, which has a more accessible interface. You can test `top` and `htop`.
Like `ps`, you can use `-u etudiant` with `htop` to display only your user's processes.
With the `F6` key, you can change the column on which to sort your processes.
- Which process is consuming the most CPU?
- Which process is consuming the most memory?
What is the difference between `M_SIZE` (`VIRT` column), `M_RESIDENT` (`RES` column) and `M_SHARE` (`SHR` column)? To which value does the column `MEM%` correspond?
- `M_SIZE` : The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used (If an application requests 1 GB of memory but uses only 1 MB, then `VIRT` will report 1 GB).
- `M_RESIDENT` : what’s currently in the physical memory. This does not include the swapped out memory and some of the memory may be shared with other processes (If a process uses 1 GB of memory and it calls `fork()`, the result of forking will be two processes whose `RES` is both 1 GB but only 1 GB will actually be used since Linux uses copy-on-write).
- `M_SHARE` : The amount of shared memory used by a task. It simply reflects memory that could be potentially shared with other processes.
Wait, what is swapped-out memory?
> Linux divides its physical RAM (random access memory) into chucks of memory called pages. Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined sizes of the physical memory and the swap space is the amount of virtual memory available.
And as you HDD (even your fast SSD) is way slower than your RAM, when you run out of RAM and the system start to swap out memory, things will start to go really slowly on your computer. Generally, you want to avoid swapping. The swap space is often a dedicated partition in the *Linux_swap* format.
From the `htop` command, what is the size of the swap space on your VM ?
You have control over all the process launched with your UID. To test this control we are going to use the well-named command `stress`. Check the **man**ual of the `stress` command.
Launch the `stress` for 1 cpu and 3600 second.
You don’t have a prompt, it means that the last command (`stress`) is running.
# Terminate
Instead of taking a nap and come back at the end of this session, we may want to interrupt this command. The first way to do that is to ask the system to terminate the `stress` process.
From your terminal you can press `ctrl` + `c`. This short cut terminates the current process, it works everywhere except for programs like `man` or `less` which can be closed with the key `q`.
Launch another long `stress` process and switch to your other terminal tab and list your active process.
```sh
ps -l -u etudiant
```
You ask `stress` to launch a worker using 100% of one cpu (you can also see that with `htop`). You can see that the `stress` process you launched (with the PPID of your `bash`) forked another `stress` process.
Another way to terminate a process is with the command `kill`. `kill` is used to send a signal to a process with the command:
```sh
kill -15 PID
```
The `-15` option is the default option for `kill` so you can also write `kill PID`.
We can do the same thing as with the command `ctrl` + `c`: ask nicely the process to terminate itself. The `-15` signal is called the SIGTERM.
> On rare occasions a buggy process will not be able to listen to signals anymore. The signal `-9` will kill a process (not nicely at all). The `-9` signal is called the SIGKILL signal. There are 64 different signals that you can send with the `kill` command.
Use the `kill` command to terminate the worker process of your stress command. Go to the other tab where stress was running. What is the difference with your previous `ctrl` + `c` ?
In your current terminal type the `bash` command, nothing happens. You have a shell within a shell. Launch a long `stress` command and switch to the other tab.
You can use the `ps` command to check that `sleep` is running within a `bash` within a `bash`
```sh
ps -l --forest -u etudiant
```
Nicely terminate the intermediate `bash`. What happened ?
Try not nicely. What happened ?
A process with a PPID of 1 is called a **daemon**, daemons are processes that run in the background. Congratulations you created your first daemon.
Kill the remaining `stress `processes with the command `pkill`. You can check the **man**ual on how to do that.
# Suspend
Launch `htop` then press `ctrl` + `z`. What happened ?
```sh
ps -l -u etudiant
```
The manual of the `ps` command say the following about process state
> ```
> D uninterruptible sleep (usually IO)
> I Idle kernel thread
> R running or runnable (on run queue)
> S interruptible sleep (waiting for an event to complete)
> T stopped by job control signal
> t stopped by debugger during the tracing
> W paging (not valid since the 2.6.xx kernel)
> X dead (should never be seen)
> Z defunct ("zombie") process, terminated but not reaped by its parent
> ```
You can use the command `fg` to put `htop` in the **f**ore**g**round.
Close `htop` and type two time de following command (`-i 1` is for simulating **io** **i**nput **o**utput operations) :
```sh
stress -i 1 -t 3600 &
```
Type the command `jobs`. What do you see ? You can specify which `stress` process you want to bring to the foreground with the command `fg %N` with `N` the number of the job.
```sh
fg %2
```
Bring the 2nd `htop` to the foreground. Put it back to the background with `ctrl` + `z`. What is now the differences between your two `stress` processes ?
The command `bg` allow you to resume a job stopped in the background. You can restart your stopped `stress` process with this command. You can use the `kill %N` syntax to kill your two `stress` processes.
# Priority
We have seen that we can launch a `stress `process to use 100% of a cpu. Launch two `stress` process like that in the background.
What happened ? What can you see with the `htop` command ?
> In Linux the [Scheduler](https://en.wikipedia.org/wiki/Completely_Fair_Scheduler), is a system process that manages the order of execution of the task by the CPU(s). Linux and most Unix systems are also multiprocesses OS which means that your OS is constantly switching between process that has access to the CPU(s). From a universal Turing machine point of view, the head of the machine would be constantly switching back and forth on the tape.
You are working on a computer with a graphical interface, think about the processes drawing your windows, the processes reading and rendering your mouse, checking for your mail, loading and rendering your web pages, reading your keystrokes to send them back over the network to the NSA. The scheduler of your OS has to jungle between everything’s without losing anything (don’t be hard on windows OS).
The **nice** state of a process indicates it’s priority for the scheduler. **nice** value range from -20 (the highest priority) to 19 (the lowest priority). The default **nice** value is 0. The command `renice` allows you to change the **nice** value of a process:
```sh
renice -n N -p PID
```
With `N` the **nice** value.
Use `renice` to set the **nice** value of the first `stress` process worker to 19. Use the command `htop` to check the result.
Can we increase the difference between the two processes ? Use re renice command to set the **nice** value of the second `stress` process worker to -20. What happened ?
Only the *root* user can lower the **nice** value of a process. You can also start start a new process with a given **nice** value with the command `nice`:
```sh
nice -n 10 stress -c 1 -t 3600 &
```
Without root access you can only set value lower than 0.
> We have seen the commands:
>
> - `ps` to display processes
> - `pstree` to display a tree of processes
> - `which` to display the PATH of a program
> - `top`/`htop` for a dynamic view of processes
> - `stress` to stress your system
> - `kill`/`pkill` to stop a process
> - `fg` to bring to the foreground a background processes
> - `jobs` to display background processes
> - `bg` to start a background process
> - `stress` to launch mock computation
> - `nice`/`renince` to change the nice value of a process
[To learn how to articulate processes you can head to the next section.](./7_streams_and_pipes.html)
---
title: Unix Processes
author: "Laurent Modolo"
---
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Understand how processes work in GNU/Linux

A program is a list of instructions to be executed on a computer. These instructions are written in one or many files in a format readable by the computer (binary files), or interpretable by the computer (text files). The interpretable format needs to be processed by an interpreter, which is itself in binary format.

The execution of a program on the system is represented as one or many processes. The program is the file of instructions, while the process is the instructions being read.
`mkdir` is an example of a program. When you type `mkdir my_folder`, you launch a `mkdir` process.
Your shell is a process to manipulate other processes.
> In multitasking operating systems, processes (running programs) need a way to create new processes, e.g., to run other programs. [Fork](https://en.wikipedia.org/wiki/Fork_(system_call)) and its variants are typically the only way of doing so in Unix-like systems. For a process to start the execution of a different program, it first forks to create a copy of itself. Then, the copy, called the "[child process](https://en.wikipedia.org/wiki/Child_process)", calls the [exec](https://en.wikipedia.org/wiki/Exec_(system_call)) system to overlay itself with the other program: it ceases execution of its former program in favor of the other.
Some commands in your shell don't have an associated process. For example, there is no `cd` program: it's a functionality of your shell. The `cd` command tells your `bash` process to do something, not to fork another process.
## Process attributes
- **PID** : the **p**rocess **id**entifier is an integer, at a given time each **PID** is unique to a process
- **PPID** : the **p**arent **p**rocess **id**entifier is the **PID** of the process that has started the current process
- **UID** : the **u**ser **id**entifier is the identifier of the user that has started the process; except for SE Linux, the process will have the same rights as the user launching it.
- **PGID** : the **p**rocess **g**roup **id**entifier (like users, processes have groups)
You can use the command `ps` to see the processes launched by your user.
```sh
ps
```
Like for the command `ls`, you can use the switch `-l` to get more details.

Open another tab in your browser to log in to your VM again. Keep these tabs open. We are going to use both of them in this session.
In this new tab, you are going to launch a `less` process.
```sh
less .bashrc
```
Come back to your first tab, can you see your `less` process with the command `ps`?
The `ps` option `-u [login]` lists all the processes whose **UID** is the **UID** associated with `[login]`.
```sh
ps -l -u etudiant
```
Is the number of `bash` processes consistent with the number of tabs you opened?
What is the **PPID** of the `less` process? Can you identify in which `bash` process `less` is running?
Did you launch the `systemd` and `(sd-pam)` process?
**pam** stands for [**p**luggable **a**uthentication **m**odules](https://www.linux.com/news/understanding-pam/), it's a collection of tools that handle identification and resource access restriction. From the **PPID** of the `(sd-pam)` can you find which process launched `(sd-pam)`? What is the **PPID** of this process?
The option `-C` allows you to filter processes by name
```sh
ps -l -C systemd
```
Who launched the first `systemd` process?
## Processes tree
From **PPID** to **PPID**, you can guess that like the file system, processes are organized in a tree. The command `pstree` can give you a nice representation of this tree.
The following `ps` command shows information on the process with **PID 1**
```sh
ps -l -p 1
```
Is this output coherent with what you know about **PID** and the previous `ps` command? Can you look at the corresponding program (with the command `which`)?
Can you look for information on **PID 0**?
The root of the processes tree is the **PID 1**.
What is the **UID** of the `dockerd` process? Can you guess why we could gain `sudo` access in the previous section by using a `docker` command?
`ps` gives you a static snapshot of the processes, but processes are dynamic. To see them running, you can use the command `top`. While `top` is functional, most systems have `htop` with a more accessible interface. You can test `top` and `htop`.

Like `ps`, you can use `-u etudiant` with `htop` to only display your user processes.
With the `F6` key, you can change the column on which to sort your process.
- Which process is consuming most of the CPU?
- Which process is consuming most of the memory?
What is the difference between `M_SIZE` (`VIRT` column), `M_RESIDENT` (`RES` column) and `M_SHARE` (`SHR` column)? To which value does the column `MEM%` correspond?
- `M_SIZE`: The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used (if an application requests 1 GB of memory but uses only 1 MB, then `VIRT` will report 1 GB).
- `M_RESIDENT`: What's currently in the physical memory. This does not include the swapped out memory and some of the memory may be shared with other processes (If a process uses 1 GB of memory and it calls `fork()`, the result of forking will be two processes whose `RES` is both 1 GB but only 1 GB will actually be used since Linux uses copy-on-write).
- `M_SHARE`: The amount of shared memory used by a task. It simply reflects memory that could be potentially shared with other processes.
Wait, what is swapped-out memory?

> Linux divides its physical RAM (random access memory) into chunks of memory called pages. Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined size of the physical memory and the swap space is the amount of virtual memory available.
And as your HDD (even your fast SSD) is way slower than your RAM, when you run out of RAM and the system starts to swap out memory, things will start to go really slowly on your computer. Generally, you want to avoid swapping. The swap space is often a dedicated partition in the *Linux_swap* format.
From the `htop` command, what is the size of the swap space on your VM?
You have control over all the processes launched with your UID. To test this control, we are going to use the well-named command `stress`. Check the **man**ual of the `stress` command.
Launch `stress` for 1 CPU and 3600 seconds.
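With the options described in the manual, such a call could look like this (`-c` sets the number of CPU workers, `-t` the duration in seconds):

```sh
stress -c 1 -t 3600
```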
You don't have a prompt; it means that the last command (`stress`) is running.
## Terminate
Instead of taking a nap and coming back at the end of this session, we may want to interrupt this command. The first way to do that is to ask the system to terminate the `stress` process.
From your terminal you can press `ctrl` + `c`. This shortcut terminates the current process. It works everywhere except for programs like `man` or `less` which can be closed with the key `q`.
Launch another long `stress` process and switch to your other terminal tab, then list your active process.
```sh
ps -l -u etudiant
```
You ask `stress` to launch a worker using 100% of one CPU (you can also see that with `htop`). You can see that the `stress` process you launched (with the PPID of your `bash`) forked another `stress` process.
Another way to terminate a process is with the command `kill`. `kill` is used to send a signal to a process with the command:
```sh
kill -15 PID
```
The `-15` option is the default option for `kill` so you can also write `kill PID`.
We can do the same thing as with the command `ctrl` + `c`: ask nicely the process to terminate itself. The `-15` signal is called the **SIGTERM**.
> On rare occasions, a buggy process cannot listen to signals anymore. The signal `-9` will kill a process (not nicely at all). The `-9` signal is called the **SIGKILL** signal. There are 64 different signals that you can send with the `kill` command.
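For example, with a placeholder PID (replace `24601` with a real PID shown by `ps`):

```sh
kill -15 24601   # politely ask process 24601 to terminate (SIGTERM)
kill -9 24601    # force-kill it if it ignores SIGTERM (SIGKILL)
```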
Use the `kill` command to terminate the worker process of your stress command. Go to the other tab where stress was running. What is the difference with your previous `ctrl` + `c`?
In your current terminal, type the `bash` command, nothing happens. You have a shell within a shell. Launch a long `stress` command and switch to the other tab.
You can use the `ps` command to check that `stress` is running within a `bash` within a `bash`
```sh
ps -l --forest -u etudiant
```
Nicely terminate the intermediate `bash`. What happened?
Try not nicely. What happened?
A process with a **PPID** of **1** is called a **daemon**; daemons are processes that run in the background. Congratulations, you created your first daemon.

Kill the remaining `stress` processes with the command `pkill`. You can check the **man**ual on how to do that.
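One possible invocation, matching the processes by name:

```sh
pkill stress
```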
## Suspend
Launch `htop` then press `ctrl` + `z`. What happened?
```sh
ps -l -u etudiant
```
The manual of the `ps` command says the following about process state:
```
D uninterruptible sleep (usually IO)
I Idle kernel thread
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped by a job control signal
t stopped by debugger during the tracing
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
```
You can use the command `fg` to put `htop` in the **f**ore**g**round.
Close `htop` and type the following command twice (`-i 1` simulates **i**nput/**o**utput operations):
```sh
stress -i 1 -t 3600 &
```
Type the command `jobs`. What do you see? You can specify which `stress` process you want to bring to the foreground with the command `fg %N` with `N` the number of the job.
```sh
fg %2
```
Bring the 2nd `stress` to the foreground. Put it back to the background with `ctrl` + `z`. What is now the difference between your two `stress` processes?
The command `bg` allows you to resume a job stopped in the background. You can restart your stopped `stress` process with this command. You can use the `kill %N` syntax to kill your two `stress` processes.
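Putting the job-control commands together, one possible session looks like this (the `ctrl` + `z` step has to be typed interactively):

```sh
stress -i 1 -t 3600 &   # job %1, started directly in the background
stress -i 1 -t 3600 &   # job %2
jobs                    # list both jobs and their states
fg %2                   # bring job %2 to the foreground (then press ctrl + z to stop it)
bg %2                   # resume the stopped job %2 in the background
kill %1 %2              # terminate both jobs
```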
## Priority
We have seen that we can launch a `stress` process to use 100% of a CPU. Launch two `stress` processes like that in the background.
What happened? What can you see with the `htop` command?
> In Linux, the [Scheduler](https://en.wikipedia.org/wiki/Completely_Fair_Scheduler) is a system process that manages the order of execution of the tasks by the CPU(s). Linux and most Unix systems are also multiprocessing OSes, so your OS is constantly switching between the processes that have access to the CPU(s). From a universal Turing machine point of view, the head of the machine would be constantly switching back and forth on the tape.

You are working on a computer with a graphical interface: think about the processes drawing your windows, the processes reading and rendering your mouse, checking for your mail, loading and rendering your web pages, reading your keystrokes to send them back over the network to the NSA. The scheduler of your OS has to juggle between everything without losing anything (don't be too hard on the Windows OS).
The **nice** state of a process indicates its priority for the scheduler. **nice** value ranges from **-20** (the highest priority) to **19** (the lowest priority). The default **nice** value is **0**. The command `renice` allows you to change the **nice** value of a process:
```sh
renice -n N -p PID
```
With `N` the **nice** value.
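For example, with a placeholder PID taken from `ps` or `htop`:

```sh
ps -l -C stress        # find the PID of the stress worker
renice -n 19 -p 24601  # 24601 is a placeholder for that PID
```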
Use `renice` to set the **nice** value of the first `stress` process worker to **19**. Use the command `htop` to check the result.
Can we increase the difference between the two processes? Use the `renice` command to set the **nice** value of the second `stress` process worker to **-20**. What happened?
Only the *root* user can lower the **nice** value of a process. You can also start a new process with a given **nice** value with the command `nice`:
```sh
nice -n 10 stress -c 1 -t 3600 &
```
Without root access, you can only set values greater than 0.
> We have seen the commands:
>
> - `ps` to display processes
> - `pstree` to display a tree of processes
> - `which` to display the PATH of a program
> - `top`/`htop` for a dynamic view of processes
> - `stress` to stress your system
> - `kill`/`pkill` to stop a process
> - `fg` to bring a background process to the foreground
> - `jobs` to display background processes
> - `bg` to start a background process
> - `stress` to launch a mock computation
> - `nice`/`renice` to change the nice value of a process
[To learn how to articulate processes, you can head to the next section.](./7_streams_and_pipes.html)
---
title: Unix Streams and pipes
author: "Laurent Modolo"
---
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Understand the function of streams and pipes in Unix systems
When you read a file, you start at the top, from left to right; you read a flux of information which stops at the end of the file.
Unix streams are much the same thing. Instead of opening a file as a whole bunch of data, a process can process it as a flux. There are 3 standard Unix streams:
0. **stdin** the **st**an**d**ard **in**put
1. **stdout** the **st**an**d**ard **out**put
2. **stderr** the **st**an**d**ard **err**or
Historically, **stdin** has been the card reader or the keyboard, while the two others were the card puncher or the display.
The command `cat` simply reads from **stdin** and displays the results on **stdout**
```sh
cat
```

It can also read files and display the results on **stdout**

```sh
cat .bashrc
```
## Streams manipulation
You can use the `>` character to redirect a flux toward a file. The following command makes a copy of your `.bashrc` files.
```sh
cat .bashrc > my_bashrc
```

Check the results of your command with `less`.
Following the same principle, create a `my_cal` file containing the **cal**endar of this month. Check the results with the command `less`.
Reuse the same command with the unnamed option `1999`. Check the results with the command `less`. What happened?
Try the following command
```sh
cal -N 2 > my_cal
```
What is the content of `my_cal`? What happened?
The `>` operator can take a stream number: the syntax to redirect **stdout** to a file is `1>`, which is also the default (equivalent to `>`). Here the `-N` option doesn't exist, so `cal` throws an error. Errors are sent to **stderr**, which has the number 2.
Save the error message in `my_cal` and check the results with `less`.
We have seen that `>` overwrites the content of the file. Try the following commands:
```sh
cal 2020 > my_cal
cal -N 2 2>> my_cal
```
Check the results with the command `less`.
The command `>` sends the stream from the left to the file on the right. Try the following:
```sh
cat < my_cal
```

You can use different redirections on the same process. Try the following command

```sh
cat <<EOF > my_notes
```
Type some text and type `EOF` on a new line. `EOF` stands for **e**nd **o**f **f**ile, it's a conventional sequence used to indicate the start and the end of a file in a stream.
What happened? Can you check the content of `my_notes`? How would you modify this command to add new notes?
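One possible modification (a sketch): append with `>>` instead of overwriting with `>`:

```sh
cat <<EOF >> my_notes
another note
EOF
```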
Finally, you can redirect a stream toward another stream with the following syntax:
```sh
cal -N 2 &> my_redirection
cal 2 &>> my_redirection
```
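To recap the redirection operators seen in this section (the file names are arbitrary):

```sh
cal > my_cal          # stdout overwrites my_cal
cal >> my_cal         # stdout is appended to my_cal
cal -N 2> my_errors   # stderr (the error caused by the invalid -N option) goes to my_errors
cal -N 2 &> my_all    # stdout and stderr both go to my_all
cat < my_cal          # my_cal is read and fed to cat through stdin
```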
## Pipes
The last stream manipulation that we are going to see is the pipe, which transforms the **stdout** of a process into the **stdin** of the next. Pipes are useful to chain multiple simple operations. The pipe operator is `|`

```sh
cal 2020 | less
```
What is the difference with this command?
```sh
cal 2020 | cat | cat | less
```
The command `zcat` has the same function as the command `cat` but for compressed files in [`gzip` format](https://en.wikipedia.org/wiki/Gzip).
The command `wget` downloads files from a URL to the corresponding file. Don't run the following command, which would download the human genome:

```sh
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
```
We are going to use the `-q` switch, which silences `wget` (no download progress bar or such), and the option `-O`, which allows us to set the name of the output file. In Unix, setting the output file to `-` allows you to write the output to the **stdout** stream.
Analyze the following command. What would it do?
```sh
wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz | gzip -dc | less
```
Remember that most Unix commands process input and output line by line. As a result, you can process huge datasets without intermediate files or huge RAM capacity.
> We have used the following commands:
>
> - `cat` / `zcat` to display information in **stdout**
> - `>` / `>>` / `<` / `<<` to redirect a flux
---
title: Text manipulation
author: "Laurent Modolo"
---
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Learn simple ways to work with text files in Unix
One of the great things with command-line tools is that they are simple and fast. So they work great for handling large files. And as bioinformaticians, you have to handle large files, so you need to use command-line tools for that.
## Text search
The file [hg38.ncbiRefSeq.gtf.gz](http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz) contains the RefSeq annotation for hg38 in [GTF format](http://www.genome.ucsc.edu/FAQ/FAQformat.html#format4)
We can download files with the `wget` command. Here the annotation is in **gz** format which is a compressed format. You can use the `gzip` tool to handle **gz** files.
One useful command to check large text files is the `head` command.

```sh
wget -q -O - http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz | gzip -dc | head
```
You can change the number of lines displayed with the option `-n number_of_line`. The command `tail` has the same function as `head` but starting from the end of the file.
Try the `tail` for the same number of lines displayed. Does the computation take the same time?
Download the `hg38.ncbiRefSeq.gtf.gz` file in your `~/`.
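For example, one way to do it:

```sh
cd ~/
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
```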
The program `grep string` allows you to search for *string* through a file or a stream.

```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep "chr2" | head
```
What is the last annotation on chromosome 1?
You can count things in a text file with the command `wc`. Read the `wc` **man**ual to see how you can count lines in a file.
Does the number of *3UTR* match the number of *5UTR*?
How many transcripts does the gene *CCR7* have?
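One possible approach (a sketch, not necessarily the intended solution; `-w` restricts `grep` to whole-word matches):

```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep -w "CCR7" | grep -w "transcript" | wc -l
```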
## Regular expression
When you do a lot of text searching, you will encounter regular expressions (regexp), which allow you to perform fuzzy searches. To run `grep` in regexp mode, you can use the switch `-E`, or `-P` for Perl-like regexp (which offers additional features).
The most basic form of regexp is the exact match:
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -E "gene_id"
```
You can use the `.` wildcard character to match anything
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -E "...._id"
```

There are different special characters in regexp, but you can use `\` to escape them.

```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -E "\."
```
### Character classes and alternatives
There are several special patterns that match more than one character. You've already seen `.`, which matches any character apart from a newline. There are four other useful tools:
- `\d`: matches any digit.
- `\s`: matches any whitespace (e.g. space, tab, newline).
- `[abc]`: matches a, b, or c.
- `[^abc]`: matches anything except a, b, or c.

Search for two digits followed by an uppercase letter and one digit.
<details><summary>Solution</summary>
<p>
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep -E "[0-9][0-9][A-Z][0-9]"
```
We need to use the flag `-P` to use the character class `\d`:
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep -P "\d\d[A-Z]\d"
```
</p>
</details>
### Anchors
By default, regular expressions will match any part of a string. It's often useful to *anchor* the regular expression so that it matches from the start or end of the string. You can use either:
- `^` to match the start of the string.
- `$` to match the end of the string.
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -E "c"
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -E "^c"
```
### Repetition
The next step up in power involves controlling how many times a pattern matches:

- `?`: 0 or 1
- `+`: 1 or more
- `*`: 0 or more
What is the following regexp going to match?
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -E "[a-z]*_[a-z]*\s\"[1-3]\""
```

You can also specify the number of matches precisely:

- `{n}`: exactly n
- `{n,}`: n or more
- `{,m}`: at most m
- `{n,m}`: between n and m
What is the following regexp going to match?
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep -E "^[a-z]{3}[2-3]\s.*exon\s\d{4,5}\s\d{4,5}.*"
```
How many gene names over 16 characters does the annotation contain?
<details><summary>Solution</summary>
<p>
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep -P "transcript\s.*gene_name\s\"\S{16,}\";" | wc -l
```
</p>
</details>
### Grouping and back references
You can group matches using `()`. For example, the following regexp matches doublets of *12*.
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | grep -E "(12){2}"
```
Grouping is also used for back references in text replacement. You can use the command `sed` for text replacement. The syntax of `sed` for replacement is the following: `sed -E 's|regexp|\n|g'` where `n` is the grouping number. `s` stands for substitute and `g` stands for global (so if there are several possible substitutions on a line, `sed` won't stop at the first one).
Try the following replacement regexp

```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | sed -E 's|(transcript_).{2}|\1number|g'
```
Try to write a `sed` command to swap *ncbiRefSeq* with *transcript_id*.
<details><summary>Solution</summary>
<p>
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | sed -E 's|(ncbiRefSeq)(.*)(transcript_id)(.*)|\3\2\1\4|g'
```
</p>
</details>
Regexp can be very complex, see for example [a regex to validate an email on stackoverflow](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression/201378#201378). The [regex101 website](https://regex101.com/) is a good place for beginners: it helps you construct your regexp step by step.
## Sorting
GTF files should be sorted by chromosome, starting position and end position. But you can change that with the command `sort`. To select the column to sort on, you can use the option `-k n,n`, where `n` is the column number.
You need to specify where sort keys start *and where they end*, otherwise (as in when you use `-k 3` instead of `-k 3,3`) they end at the end of the line.

```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head -n 10000 | sort -k 4,4 -k 5,5 | head
```
You can add more options to the sorting of each column, for example `r` for reverse order, `d` for dictionary order, or `n` for numeric order.
What will the following command do?
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head -n 10000 | sort -k 3,3d -k 4,4n | head
```

```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head -n 10000 | sort -k 1,1 -k 4,4n -k 5,5n -c
```
## Field extractor
Sometimes, rather than using a complex regexp, we want to extract a particular column from a file. You can use the command `cut` to do that.
The following command extracts the 3rd column of the annotation:
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | cut -f 3
```

You can change the field separator with the option `-d`; set it to `";"` to extract attributes from the last column.
<details><summary>Solution</summary>
<p>
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | grep -w "transcript" | cut -f 9 | cut -f 2,3 -d ";"
```
</p>
</details>
## Concatenation
There are different tools to concatenate files from the command line: `cat` for vertical concatenation and `paste` for horizontal concatenation.

```sh
cat .bashrc .bashrc | wc -l
```
What will be the result of the following command?
```sh
gzip -dc hg38.ncbiRefSeq.gtf.gz | head | paste - -
```
## Text editor
You often have access to different text editors from the command line. Two of the most popular ones are `vim` and `nano`.
`nano` is more friendly to use than `vim` but also very limited.
To open a text file you can type `editor file_path`.
In `nano` everything is written at the bottom, so you only have to remember that `^` is the symbol for the key `Ctrl`.
Open your `.bashrc` file and delete any comment line (starting with the `#` character).
`vim` is a child of the project `vi` (which should also be available on your system), which brings it more functionality. The workings of `vim` can be a little strange at first, but you have to understand that on a US keyboard, the distance that your fingers have to travel while using `vim` is minimal.
You have 3 modes in `vim`:
- The **normal** mode, where you can navigate the file and enter a command with the `:` key. You can come back to this mode by pressing `Esc`
- The **insert** mode, where you can write things. You enter this mode with the `i` key or any other key insertion key (for example, `a` to insert after the cursor or `A` to insert at the end of the line)
- The **visual** mode where you can select text for copy/paste action. You can enter this mode with the `v` key.
If you want to learn more about `vim`, you can start with the <https://vim-adventures.com/> website. Once you master `vim` everything is faster, but you will have to practice a lot.
> We have used the following commands:
>
> - `gzip` to extract `tar.gz` files
> - `grep` to search text files
> - `wc` to count things
> - `sed` to search and replace string of text
> - `sort` to sort files on specific field
> - `sed` to search and replace strings of text
> - `sort` to sort files on specific fields
> - `cut` to extract a specific field
> - `cat` / `paste` for concatenation
> - `nano` / `vim` for text editing
In the next session, we are going to apply the logic of pipes and text manipulation to [batch processing.](./9_batch_processing.html)
---
title: Batch processing
author: "Laurent Modolo"
---
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
<img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" />
</a>
Objective: Learn basics of batch processing in GNU/Linux
In the previous section, we have seen how to handle streams and text. We can use this knowledge to generate lists of commands instead of text. This is called batch processing.
In everyday life, you may want to run commands sequentially without using pipes.
To run `CMD1` and then run `CMD2` you can use the `;` operator

```sh
CMD1 ; CMD2
```
To run `CMD1` and then run `CMD2` if `CMD1` didn't throw an error, you can use the `&&` operator, which is safer than the `;` operator.
```sh
CMD1 && CMD2
```

You can also use the `||` to manage errors and run `CMD2` if `CMD1` failed.

```sh
CMD1 || CMD2
```
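A small illustration, reusing the invalid `-N` option from the streams section to make `cal` fail:

```sh
cal && echo "cal succeeded"     # echo runs because cal exits without error
cal -N && echo "cal succeeded"  # cal fails because -N does not exist, so echo is skipped
cal -N || echo "cal failed"     # here echo runs precisely because cal failed
```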
## Executing a list of commands
The easiest option to execute a list of commands is to use `xargs`. `xargs` reads arguments from **stdin** and uses them as arguments for a command. In UNIX systems, the command `echo` sends a string of characters to **stdout**. We are going to use this command to learn more about `xargs`.
```sh
echo "hello world"
```
In general, a string of characters differs from a command when it's placed between quotes.
Why are the two following commands equivalent?
```sh
echo "file1 file2 file3" | xargs touch
touch file1 file2 file3
```
You can display the command executed by `xargs` with the switch `-t`.
By default, the number of arguments sent by `xargs` is defined by the system. You can change it with the option `-n N`, where `N` is the number of arguments sent. Use the option `-t` and `-n` to run the previous command as 3 separate `touch` commands.
<details><summary>Solution</summary>
<p>
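One possible answer (a sketch; `-n 1` makes `xargs` run one `touch` per file name):

```sh
echo "file1 file2 file3" | xargs -t -n 1 touch
```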
</p>
</details>

What if the file names were separated by `;` instead of spaces?

```sh
echo "file1;file2;file3"
```
<details><summary>Solution</summary>
<p>
```sh
echo "file1;file2;file3" | xargs -t -d \; touch
echo -n "file1;file2;file3" | xargs -t -d\; touch
```
</p>
</details>
To reuse the arguments sent to `xargs`, you can use the option `-I`, which defines a string that will be replaced by the argument. Try the following command. What does the **man**ual say about the `-c` option of the command `cut`?
```sh
ls -l file* | cut -c 44- | xargs -t -I % ln -s % link_%
```
The command `xargs` is also often combined with the command `find`. The command `find` is a powerful command to search for files.
Modify the following command to make a non-hidden copy of all the files with a name starting with *.bash* in your home folder

```sh
find /tmp/ -type d | xargs -t rm -R
```
## Writing `awk` commands
`xargs` is a simple solution for writing batch commands, but if you want to write more complex ones, you are going to need to learn `awk`. `awk` is a programming language by itself, but you don't need to know everything about `awk` to use it.
You can think of `awk` as an `xargs -I $N` command where `$1` corresponds to the first column, `$2` to the second column, etc.
There are also some predefined variables that you can use:
- `$0` corresponds to all the columns.
- `FS` the field separator used
- `NF` the number of fields separated by `FS`
- `NR` the number of records already read
An `awk` program is a chain of commands with the form `motif { action }`
- the `motif` defines where the `action` is executed
- the `action` is what you want to do
The `motif` can be:
- a regexp
- the keyword `BEGIN` or `END` (before reading the first line, and after reading the last line)
- a comparison like `<`, `<=`, `==`, `>=`, `>` or `!=`
- a combination of the three separated by `&&` (AND), `||` (OR) and `!` (negation)
- a range of lines `motif_1,motif_2`
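As a small illustration of the `motif { action }` structure, assuming the `hg38.ncbiRefSeq.gtf.gz` file from the previous section is still in your home folder:

```sh
# the motif is a comparison on the first column (the chromosome),
# the END motif prints the total after the last line has been read
gzip -dc hg38.ncbiRefSeq.gtf.gz | awk '$1 == "chr1" { n += 1 } END { print n " annotations on chr1" }'
```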
With `awk` you can:
* Count the number of lines in a file
```sh
awk '{ print NR " : " $0 }' file
```
Modify this command to only display the total number of lines with awk (like `wc -l`)
<details><summary>Solution</summary>
<p>
```sh
awk 'END{ print NR }' file
```
</p>
</details>
* Convert a tabulated sequences file into fasta format
```sh
awk -vOFS='' '{print ">",$1,"\n",$2,"\n";}' two_column_sample_tab.txt > sample1.fa
```
Modify this command to only get a list of sequence names in a fasta file
<details><summary>Solution</summary>
<p>
```sh
awk -vFS='\t' -vOFS='' '{print $1 "\n";}' two_column_sample_tab.txt > seq_name.txt
```
</p>
</details>
* Convert a multiline fasta file into a single line fasta file
```sh
awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' sample.fa > sample_singleline.fa
```
* Convert fasta sequences to uppercase
```sh
awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' sample.fa > sample_upper.fa
```
Modify this command to only get the sequences of a fasta file in lowercase
<details><summary>Solution</summary>
<p>
```sh
awk '/^[^>]/ {print(tolower($0))}' sample.fa > seq_name_lower.txt
```
</p>
</details>
* Return a list of sequence_id sequence_length from a fasta file
```sh
awk 'BEGIN {OFS = "\n"}; /^>/ {print(substr(sequence_id, 2)" "sequence_length); sequence_length = 0; sequence_id = $0}; /^[^>]/ {sequence_length += length($0)}; END {print(substr(sequence_id, 2)" "sequence_length)}' sample.fa
```
* Count the number of bases in a fastq.gz file
```sh
gzip -dc sample.fq.gz | awk 'NR%4 == 2 {basenumber += length($0)} END {print basenumber}'
```
* Extract the reads of at least 20 bp from a fastq file
```sh
awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) >= 20){print header, seq, qheader, qseq}}' < input.fq > output.fq
```
## Writing a bash script
When you start writing complex commands, you might want to save them to reuse them later.
You can find everything that you are typing in your `bash` in the `~/.bash_history` file, but working with this file can be tedious, as it also contains all the commands that you mistype. A good solution, for reproducibility purposes, is to write `bash` scripts. A bash script is simply a text file that contains a sequence of `bash` commands.
As you use `bash` in your terminal, you can execute a `bash` script with the following command:

```sh
source myscript.sh
```
It's usual to write the `.sh` extension for `shell` scripts.
Write a bash script named `download_hg38.sh` that downloads the [hg38.ncbiRefSeq.gtf.gz](http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz) file, then extracts it and says that it has done it.
The `\` character, like in regexp, cancels the special meaning of what follows; you can use it to split your one-liner scripts over many lines and still use the `&&` operator.
<details><summary>Solution</summary>
<p>
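A minimal version of such a script could look like this (the `gzip -d` step and the exact wording of the final message are one possible choice):

```sh
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz && \
gzip -d hg38.ncbiRefSeq.gtf.gz && \
echo "download and extraction complete"
```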
</p>
</details>
### shebang
In your first bash script, the only thing saying that your script is a bash script is its extension. But most of the time UNIX system doesn't care about file extension, a text file is a text file.
To tell the system that your text file is a bash script, you need to add a **shebang**. A **shebang** is a special first line that starts with a `#!` followed by the path of the interpreter for your script.
For example, for a bash script in a system where `bash` is installed in `/bin/bash`, the **shebang** is:
```bash
#!/bin/bash
```
When you are not sure `which` is the path of the tools available to interpret your script, you can use the following shebang:
```bash
#!/usr/bin/env bash
```
You can add a **shebang** to your script and give yourself the e**x**ecutable right on it.
<details><summary>Solution</summary>
<p>
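A possible solution (a sketch, assuming the script sits in your current directory): add the shebang as the very first line of `download_hg38.sh`, then give yourself the execute right on the file:
```sh
chmod u+x download_hg38.sh
```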
</p>
</details>
Now you can execute your script with the command:
```sh
./download_hg38.sh
```
Congratulations, you wrote your first program!
### PATH
Where did `/usr/bin/env` find the information about your bash? Why did we have to write `./` before our script name if we are in the same folder?
This is all linked to the **PATH** bash variable. Just as in many programming languages, `bash` has what we call *variables*. *Variables* are named storage for temporary information. You can print a list of all your environment variables (variables loaded in your `bash` memory) with the command `printenv`.
To create a new variable, you can use the following syntax:
```sh
VAR_NAME="text"
VAR_NAME2=2
```
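Note that `bash` is strict about this syntax: there must be no space around the `=`, otherwise the variable name is interpreted as a command.
```sh
VAR_NAME="text"    # works
VAR_NAME = "text"  # fails: bash tries to run a command named VAR_NAME
```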
Create an `IDENTITY` variable with your first and last names.
<details><summary>Solution</summary>
<p>
```sh
IDENTITY="First name Last Name"
```
</p>
</details>
It's good practice to write your `bash` variable names in uppercase with `_` in place of spaces.
You can access the value of an existing `bash` variable with the `$VAR_NAME` syntax.
To display the value of your `IDENTITY` variable with `echo`, you can write:
```sh
echo $IDENTITY
```
When you want to mix a variable value and text, you can use the two following syntaxes:
```sh
echo "my name is "$IDENTITY
echo "my name is ${IDENTITY}"
```
Going back to the `printenv` output, you can see a **PWD** variable that stores your current path, a **SHELL** variable that stores your current shell, and a **PATH** variable that stores a lot of file paths, each separated by `:`.
The **PATH** variable contains every folder where to look for executable programs. Executable programs can be binary files or text files with a **shebang**.
You can display the content of your **PATH** variable with:
```sh
echo $PATH
```
You can create a `scripts` folder and move your `download_hg38.sh` script into it. Then we can modify the `PATH` variable to include the `scripts` folder in it.
> Don't erase your `PATH` variable!
<details><summary>Solution</summary>
<p>
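A possible solution (a sketch; the `mkdir` and `mv` steps assume `download_hg38.sh` currently sits in your working directory):
```sh
mkdir -p ~/scripts
mv download_hg38.sh ~/scripts/
PATH=$PATH:~/scripts/
```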
</p>
</details>
You can check the result of your command with `echo $PATH`
Try to call your `download_hg38.sh` from anywhere in the file tree. Congratulations, you installed your first UNIX program!
### Arguments
You can pass arguments to your bash scripts with the following command:
```sh
my_script.sh arg1 arg2 arg3
```
From within the script:
- `$0` will give you the name of the script (`my_script.sh`)
- `$1`, `$2`, `$3`, `$n` will give you the value of the arguments (`arg1`, `arg2`, `arg3`, `argn`)
- `$$` the process id of the current shell
- `$#` the total number of arguments passed to the script
- `$@` the value of all the arguments passed to the script
- `$?` the exit status of the last executed command
- `$!` the process id of the last command executed in the background
You can write the following `variables.sh` script in your `scripts` folder:
```sh
#!/bin/bash
echo "Name of the script: $0"
echo "Total number of arguments: $#"
echo "Values of all the arguments: $@"
```
And you can try to call it with some arguments!
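For example (with a hypothetical `/home/user/scripts/` install location; the exact value of `$0` depends on how the script was found), a call could look like this:
```sh
variables.sh one two three
# Name of the script: /home/user/scripts/variables.sh
# Total number of arguments: 3
# Values of all the arguments: one two three
```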
> We have used the following commands:
> - `echo` to display text
> - `xargs` to build and execute command lines from standard input
> - `awk` to run more complex text-processing programs
> - `;`, `&&` and `||` to chain commands
> - `source` to load a script
> - `shebang` to specify the language of a script
> - `PATH` to install scripts
In the next session, we are going to learn how to execute commands on other computers with [ssh](./10_network_and_ssh.html).
project:
type: book
book:
title: "UNIX command line"
author:
- "Laurent Modolo"
date: "2023-10-09"
chapters:
- index.md
- 1_understanding_a_computer.qmd
- 2_using_the_ifb_cloud.qmd
- 3_first_steps_in_a_terminal.qmd
- 4_unix_file_system.qmd
- 5_users_and_rights.qmd
- 6_unix_processes.qmd
- 7_streams_and_pipes.qmd
- 8_text_manipulation.qmd
- 9_batch_processing.qmd
- 10_network_and_ssh.qmd
- 11_install_system_programs.qmd
- 12_virtualization.qmd
body-footer: "License: Creative Commons [CC-BY-SA-4.0](http://creativecommons.org/licenses/by-sa/4.0/).<br>Made with [Quarto](https://quarto.org/)."
navbar:
search: true
right:
- icon: git
href: https://gitbio.ens-lyon.fr/can/unix-command-line
text: Sources
# bibliography: references.bib
format:
html:
theme:
light: flatly
dark: darkly
execute:
cache: true
---
title: # Unix / command line training course
---
## Unix / command line training course {.unnumbered}
1. [Understanding a computer](./1_understanding_a_computer.html)
2. [Using the IFB cloud](./2_using_the_ifb_cloud.html)
......
# IFB cloud group description for UNIX training
## Short name
CAN UNIX 2023
## Full name
UNIX command line training
## Website
https://gitbio.ens-lyon.fr/can/unix-command-line/
---
# Detailed description
## Summary (20 lines)
The [Conseil d'Analyse Numérique (CAN)](https://www.sfr-biosciences.fr/la-sfr/conseil-analyse-numerique/) of the UAR [SFR BioSciences](https://www.sfr-biosciences.fr) (Lyon) organizes a **"UNIX command line"** training course for the members of the affiliated biology laboratories (among others, those hosted at the ENS de Lyon), which will start in early October.
This training course will run over about fourteen weeks during the fall semester, at a rate of 1h30 per week (depending on demand, it will take place every year during the fall semester).
The objectives of this training course are:
- understand the general functioning of a computer
- interact with a UNIX-type system through a command line interface
- use remote resources through the command line
- install and use command line programs
- automatically manipulate text data
- learn basic knowledge of system administration on this type of system
- learn basic knowledge of system virtualization
## Practical information
- Dates: 1h30/week from October 2023 to January 2024
- Location: Centre Blaise Pascal (CBP), ENS de Lyon
- Trainers: Laurent Modolo, Ghislain Durif, Mia Croiset
- Number of participants: 12
## Requested resources
We will use instances of the "LBMC Unix 2022" *appliance* already used last year. The practicals focus on getting started with bash commands, so the smallest VMs are sufficient; for a 1h30 practical we will need at most 2 vCPU per VM.
### Tools and environments
- a standard Linux distribution (Ubuntu)
- docker
- bash
- shellinabox (for terminal access through a web interface)
All these tools are included in the "LBMC Unix 2022" *appliance*.
### Computing resources
*Indicate the estimated amount of computing and storage over the total duration of your training course.*
* Max VM size per participant: 2 vCPU, 2 GB RAM, `ifb.tr.medium` flavor (2c 2GB)
* Total number of vCPU hours (vCPU.h): 2 vCPU * (12 participants + 2 trainers) * 14 sessions * 1h30 = 588 vCPU.h
* Shared storage volume: <1 GB
* Specific needs
  - large memory (RAM > 1 TB): NO
  - GPU: NO
  - high processor frequency (> 3 GHz): NO
  - parallelism: 2 vCPU/VM max
Availability of the resources after the training course? NO
---
## Start
10/10/2023
## End
31/01/2024
# Appel à formateurs/formatrices pour formations "UNIX ligne de commandes" et "R pour les débutant(e)s", automne 2024
Bonjour (english below)
---
TL;DR appel à formateurs/formatrices pour des formations hebdomadaire au semestre d'automne "UNIX ligne de commandes" (1 créneau) et "R pour les débutant(e)s" (4 créneaux), liens pour s'inscrire ci-dessous
---
Le CAN (https://www.sfr-biosciences.fr/la-sfr/conseil-analyse-numerique/) va organiser deux formations sur le semestre d'automne qui débuteront débuts octobre : "UNIX ligne de commandes" et "R pour les débutant(e)s". En option, une extension à Python d'une partie du contenu de la formation "R pour les débutant(e)s" sera proposée sur deux séances supplémentaires.
Ces formations s'étaleront sur une dizaine de semaines, jusqu'en décembre (sauf pendant les vacances de Toussaint et Noël, avec la possibilité de rajouter quelques séances en janvier si besoin) à raison de 1h30 de travaux pratiques par semaine avec 10 personnes par créneau.
Ces formations sont accessibles à tous les membres (permanents et non permanents) des laboratoires suivants (partenaires de la SFR BioSciences) : CIRI, IGFL, LBMC, RDP, MMSB, LBTI, IVPC, IBCP.
Afin d'animer ces formations, nous sommes à la recherche de formateurs et formatrices volontaires, idéalement au moins 2 par créneau.
Informations importantes :
- il n'y a pas de préparation, les supports sont prêts,
- c'est 1h30 par semaine, on peut facilement échanger si un jour on n'est pas dispo,
- les personnes formées sont vraiment débutantes donc les questions ne seront pas complexes et toute aide sera la bienvenue, il faut juste être à l'aise avec R ou avec la ligne de commande sur UNIX (e.g. OS Linux), pas besoin d'être un expert ou une experte,
- et surtout c'est super enrichissant!
Si vous êtes intéressé(e), vous pouvez vous inscrire via les liens suivants :
- UNIX ligne de commandes : https://framaforms.org/appel-a-formatrices-et-formateurs-pour-la-formation-unix-ligne-de-commande-call-for-trainers-for-0
- R pour les débutant(e)s : https://framaforms.org/appel-a-formatrices-et-formateurs-pour-la-formation-r-pour-les-debutantes-call-for-trainers-for-r-0
Planning :
- lundi 13h-14h30 : R pour les débutant(e)s
- mardi 11h-12h30 : UNIX ligne de commandes
- mercredi 11h-12h30 : R pour les débutant(e)s
- jeudi 11h-12h30 : R pour les débutant(e)s (in ENGLISH)
- vendredi 11h-12h30 : R pour les débutant(e)s
Si vous avez des questions, vous pouvez contacter:
- Laurent Gilquin (laurent.gilquin@ens-lyon.fr)
Merci d'avance,
Bien cordialement,
----
# Call for trainers for "R for beginners" and "UNIX command line" training sessions, fall 2024
Hi,
---
TL;DR call for trainers for the weekly training sessions during the fall semester: "UNIX command line" (1 slot) and "R for beginners" (4 slots), links to register below
---
The CAN (https://www.sfr-biosciences.fr/la-sfr/conseil-analyse-numerique/) will organize two training sessions during the fall semester (starting in early October): "UNIX command line" and "R for beginners". As an option, an extension to Python of part of the "R for beginners" course content will be offered over two additional sessions.
These training sessions will run over about ten weeks, until December (except during the Toussaint and Christmas holidays, with the possibility of adding a few sessions in January if necessary), with 1h30 of tutorial/practical every week and 10 trainees per slot.
These training sessions are available to all (permanent and non-permanent) members of the following labs (SFR BioSciences partners): CIRI, IGFL, LBMC, RDP, MMSB, LBTI, IVPC, IBCP.
We are looking for volunteers to be trainers, ideally 2 per slot.
Important information:
- no preparation required, training materials are ready,
- it is only 1h30 per week (or more if you want to teach several slots), and it is easy to swap slots with other trainers if you are not available one week,
- trainees will be beginners, so questions will not be too complex and any help is welcome; you just need to be a regular UNIX command line user (e.g. on a Linux OS) or R user, no need to be an expert,
- and also it is highly rewarding!
If you are interested, please register using the following links:
- UNIX command line: https://framaforms.org/appel-a-formatrices-et-formateurs-pour-la-formation-unix-ligne-de-commande-call-for-trainers-for-0
- R for beginners: https://framaforms.org/appel-a-formatrices-et-formateurs-pour-la-formation-r-pour-les-debutantes-call-for-trainers-for-r-0
Schedule:
- Monday 13h-14h30 : R pour les débutant(e)s
- Tuesday 11h-12h30 : UNIX ligne de commandes
- Wednesday 11h-12h30 : R pour les débutant(e)s
- Thursday 11h-12h30 : R pour les débutant(e)s (in ENGLISH)
- Friday 11h-12h30 : R pour les débutant(e)s
If you have any questions, please contact:
- Laurent Gilquin (laurent.gilquin@ens-lyon.fr)
Thanks in advance,
Best regards,
# Formations "UNIX ligne de commandes" et "R pour les débutant(e)s", automne 2024
Bonjour (english below)
---
TL;DR formations hebdomadaire au semestre d'automne "UNIX ligne de commandes" (10 places sur 1 créneau) et "R pour les débutant(e)s" (40 places sur 4 créneaux), liens pour s'inscrire ci-dessous (merci de bien lire les informations ci-dessous)
---
Le CAN (https://www.sfr-biosciences.fr/la-sfr/conseil-analyse-numerique/) va organiser deux formations sur le semestre d'automne qui débuteront débuts octobre : "UNIX ligne de commandes" et "R pour les débutant(e)s". En option, une extension à Python d'une partie du contenu de la formation "R pour les débutant(e)s" sera proposée sur deux séances supplémentaires.
Ces formations s'étaleront sur une dizaine de semaines, jusqu'en décembre (sauf pendant les vacances de Toussaint et Noël, avec la possibilité de rajouter quelques séances en janvier si besoin) à raison de 1h30 de travaux pratiques par semaine.
Ces formations sont accessibles à tous les membres (permanents et non permanents) des laboratoires suivants (partenaires de la SFR BioSciences): CIRI, IGFL, LBMC, RDP, MMSB, LBTI, IVPC, IBCP.
Prérequis : avoir un compte @ens-lyon.fr (pour accéder aux ordinateurs des salles de TP) ou avoir un ordinateur portable avec accès à internet via eduroam et un navigateur internet récent (aucune installation spécifique nécessaire, les TPs se font via une plateforme spécifique accessible par son navigateur Internet).
Il y aura 1 créneau hebdomadaire (en français ou anglais suivant la demande) pour la formation UNIX (soit 10 places) et 4 créneaux hebdomadaires pour la formation R (soit 40 places au total), dont 1 créneau en anglais.
Si vous êtes intéressé(e), vous pouvez vous inscrire via les liens suivants :
- UNIX ligne de commandes : https://framaforms.org/formation-unix-ligne-de-commande-unix-command-line-training-session-1720512376
- R pour les débutant(e)s : https://framaforms.org/formation-r-pour-les-debutantes-r-for-beginners-training-session-1720512419
IMPORTANT : En vous inscrivant vous vous engagez à venir sur l'ensemble de la formation (sauf absence ponctuelle en cas d'impératif professionnel ou personnel évidemment). Les attestations de formation ne seront délivrées qu'aux personnes ayant suivi au moins 80% des séances de leur groupe.
Planning :
- lundi 13h-14h30 : R pour les débutant(e)s
- mardi 11h-12h30 : UNIX ligne de commandes
- mercredi 11h-12h30 : R pour les débutant(e)s
- jeudi 11h-12h30 : R pour les débutant(e)s (in ENGLISH)
- vendredi 11h-12h30 : R pour les débutant(e)s
Si vous avez des questions, vous pouvez contacter:
- Laurent Gilquin (laurent.gilquin@ens-lyon.fr)
# Note : un e-mail différent sera envoyé pour en appeler aux formateurs et formatrices volontaires
Les objectifs de ces formations sont :
Pour "UNIX ligne de commandes" :
- comprendre le fonctionnement général d'un ordinateur
- interagir avec un système de type UNIX par une interface en ligne de commande
- utiliser des ressources distantes en ligne de commande
- installer et utiliser des programmes en ligne de commandes
- manipuler des données de types textuelles de manière automatisée
- avoir des notions d'administration sur ce type de système
- avoir des notions en virtualisation de système
Pour "R pour les débutant(e)s" :
- apprendre les bases du langages R
- apprendre à utiliser l’IDE Rstudio
- importer des tables de données
- filtrer et trier des tables de données
- réorganiser des tables de données
- réaliser des figures
- manipuler des EXpressions REGulières (regex)
Bien cordialement,
----
# "R for beginners" and "UNIX command line" training sessions, fall 2024
Hi,
---
TL;DR weekly training sessions during the fall semester: "UNIX command line" (10 places, 1 slot) and "R for beginners" (40 places, 4 slots), links to register below (please read the information below carefully)
---
The CAN (https://www.sfr-biosciences.fr/la-sfr/conseil-analyse-numerique/) will organize two training sessions during the fall semester (starting in early October): "UNIX command line" and "R for beginners". As an option, an extension to Python of part of the "R for beginners" course content will be offered over two additional sessions.
These training sessions will run over about ten weeks, until December (except during the Toussaint and Christmas holidays, with the possibility of adding a few sessions in January if necessary), with 1h30 of tutorial/practical every week.
These training sessions are available to all (permanent and non-permanent) members of the following labs (SFR BioSciences partners): CIRI, IGFL, LBMC, RDP, MMSB, LBTI, IVPC, IBCP.
Requirement: having an @ens-lyon.fr account (to access the computers during the sessions) or having a laptop with working eduroam Internet access and a recent web browser (no specific installation is required; all practicals are done via a dedicated platform accessible through the web browser).
There will be 1 weekly slot (in French or in English depending on the registered participants) for the UNIX training (10 places) and 4 weekly slots for the R training (40 places in total), including 1 slot in English.
If you are interested, please register using the following links:
- UNIX command line: https://framaforms.org/formation-unix-ligne-de-commande-unix-command-line-training-session-1720512376
- R for beginners: https://framaforms.org/formation-r-pour-les-debutantes-r-for-beginners-training-session-1720512419
IMPORTANT: By registering, you agree to attend the whole course (except, of course, for occasional absences due to professional or personal obligations). Training certificates will only be delivered to people who have attended at least 80% of their group's sessions.
Schedule:
- Monday 13h-14h30 : R pour les débutant(e)s
- Tuesday 11h-12h30 : UNIX ligne de commandes
- Wednesday 11h-12h30 : R pour les débutant(e)s
- Thursday 11h-12h30 : R pour les débutant(e)s (in ENGLISH)
- Friday 11h-12h30 : R pour les débutant(e)s
If you have any questions, please contact:
- Laurent Gilquin (laurent.gilquin@ens-lyon.fr)
# Note: a separate e-mail will be sent to call for volunteer trainers
The contents of these training sessions are the following:
For "UNIX command line":
- understand the general functioning of a computer
- interact with a UNIX-type system through the command line interface
- use remote resources through command line
- install and use software and programs through command line
- automatically manipulate text data
- learn basic knowledge of system administration
- learn basic knowledge of system virtualization
For "R for beginners":
- learn basic knowledge of R language programming
- use Rstudio IDE (Integrated Development Environment)
- import data tables/arrays
- filter and sort data tables
- reorganize data tables
- generate graphics and plots
- manipulate REGular EXpressions (regex)
Best regards,