Laurent Modolo authored
Batch processing
Objective: Learn the basics of batch processing in GNU/Linux
In the previous section, we have seen how to handle streams and text. We can use this knowledge to generate lists of commands instead of text. This is called batch processing.
In everyday life, you may want to run commands sequentially without using pipes.
To run `CMD1` and then run `CMD2`, you can use the `;` operator:
CMD1 ; CMD2
To run `CMD1` and then run `CMD2` only if `CMD1` didn't throw an error, you can use the `&&` operator, which is safer than the `;` operator:
CMD1 && CMD2
You can also use the `||` operator to manage errors and run `CMD2` only if `CMD1` failed:
CMD1 || CMD2
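The difference between the three operators can be sketched with the standard `true` and `false` commands, which always succeed and always fail respectively. They are only stand-ins for `CMD1`, chosen for this illustration:

```sh
# ';' simply runs the commands one after the other
echo "first" ; echo "second"

# '&&' runs the second command only if the first one succeeded
true  && echo "true succeeded, so this is printed"
false && echo "false failed, so this is never printed"

# '||' runs the second command only if the first one failed
false || echo "false failed, so this is printed"
true  || echo "true succeeded, so this is never printed"
```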
Executing a list of commands
The easiest option to execute a list of commands is to use `xargs`. `xargs` reads arguments from stdin and uses them as arguments for a command. In UNIX systems, the command `echo` sends a string of characters to stdout. We are going to use this command to learn more about `xargs`.
echo "hello world"
In general, text placed between quotes is treated as a plain string of characters rather than as a command.
The two following commands are equivalent. Why?
echo "file1 file2 file3" | xargs touch
touch file1 file2 file3
You can display the command executed by `xargs` with the switch `-t`.
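As a small illustration of `-t`, the sketch below swaps `touch` for `echo` so that nothing is created on disk. `xargs` writes the command it has built to stderr, then runs it:

```sh
# -t prints the constructed command on stderr, then runs it
echo "a b c" | xargs -t echo
# stderr: echo a b c
# stdout: a b c
```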
By default, the number of arguments sent by `xargs` is defined by the system. You can change it with the option `-n N`, where `N` is the maximum number of arguments sent per invocation. Use the options `-t` and `-n` to run the previous command as 3 separate `touch` commands.
Solution
```sh
echo "file1 file2 file3" | xargs -t -n 1 touch
```
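To see how `-n` groups arguments without creating any file, here is a sketch that uses `echo` as the command. Each output line corresponds to one invocation:

```sh
# five arguments, at most two per invocation: echo runs three times
echo "a b c d e" | xargs -n 2 echo
# a b
# c d
# e
```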
Sometimes, the arguments are not separated by spaces but by other characters. You can use the `-d` option to specify the separator. Execute `touch` once, on all three files, from the following command:
echo "file1;file2;file3"
Solution
```sh
# note: echo's trailing newline ends up in the last file name (echo -n or printf avoids it)
echo "file1;file2;file3" | xargs -t -d \; touch
```
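The following sketch shows how `-d` changes the way the input is split. It assumes GNU `xargs` (the `-d` option is not available in every implementation), and the comma-separated input is made up for the example:

```sh
# printf '%s' emits no trailing newline, so the last item stays clean
printf '%s' "a,b c,d" | xargs -d ',' -n 1 echo
# a
# b c
# d
```

With `-d ','`, the space in `b c` is no longer a separator, so `b c` is passed as a single argument.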
To reuse the arguments sent to `xargs`, you can use the option `-I`, which defines a placeholder string that is replaced by the argument. Try the following command. What does the manual say about the `-c` option of the command `cut`?
ls -l file* | cut -c 44- | xargs -t -I % ln -s % link_%
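A minimal sketch of the `-I` placeholder, using `echo` so that the resulting commands are only printed, not executed. The file names are hypothetical:

```sh
# each input line replaces every occurrence of % in the command
printf 'file1\nfile2\n' | xargs -I % echo "cp % %.bak"
# cp file1 file1.bak
# cp file2 file2.bak
```

With `-I`, `xargs` runs the command once per input line instead of packing several arguments into one invocation.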
Instead of using `ls`, the command `xargs` is often used with the command `find`. The command `find` is a powerful command to search for files.
Modify the following command to make a non-hidden copy of all the files with a name starting with `.bash` in your home folder:
find . -name ".bash*" | sed 's|./.||g'
Solution
```sh
find . -name ".bash*" | sed 's|./.||g' | xargs -t -I % cp .% %
```
You can try to remove all the files in the `/tmp` folder with the following command:
find /tmp/ -type f | xargs -t rm
Modify this command to remove every folder in the `/tmp` folder.
Solution
```sh
find /tmp/ -type d | xargs -t rm -R
```
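By default `xargs` splits its input on whitespace, so a file name containing a space would be cut into two arguments in the `find | xargs` pipelines above. A common workaround, sketched here in a throwaway directory with made-up file names, is to separate names with a NUL byte using `find -print0` and `xargs -0`:

```sh
# work in a throwaway directory (hypothetical file names)
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with space.txt"

# -print0 and -0 use the NUL byte as separator, so "with space.txt"
# is passed to rm as a single argument
find "$tmpdir" -type f -name "*.txt" -print0 | xargs -0 -t rm

ls -A "$tmpdir"   # prints nothing: both files were removed
rmdir "$tmpdir"
```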
awk commands
`xargs` is a simple solution for writing batch commands, but if you want to write more complex commands you are going to need to learn `awk`. `awk` is a programming language in itself, but you don't need to know everything about `awk` to use it.
You can think of `awk` as an `xargs -I $N` command where `$1` corresponds to the first column, `$2` to the second column, etc.
There are also some predefined variables that you can use, like:
- `$0`: corresponds to all the columns
- `FS`: the field separator used
- `NF`: the number of fields separated by `FS`
- `NR`: the number of records already read
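These variables can be seen in action with a short sketch. The colon-separated input is made up for the example, and `-F ':'` sets `FS` to `:`:

```sh
# two records, fields separated by ':' (sample data made up for this example)
printf 'a:b:c\nd:e\n' | awk -F ':' '{ print NR, NF, $1, $0 }'
# 1 3 a a:b:c
# 2 2 d d:e
```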
An `awk` program is a chain of commands of the form `motif { action }`:
- the `motif` (the pattern) defines where the `action` is executed
- the `action` is what you want to do
The `motif` can be:
- a regexp
- the keyword `BEGIN` or `END` (before reading the first line, and after reading the last line)
- a comparison like `<`, `<=`, `==`, `>=`, `>` or `!=`
- a combination of the three separated by `&&` (AND), `||` (OR) and `!` (negation)
- a range of lines `motif_1,motif_2`
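The kinds of `motif` listed above can be sketched in a single program. The input (the numbers 1 to 6, one per line) and the labels printed by each action are made up for this illustration:

```sh
# sample input: one number per line, 1 to 6
printf '%s\n' 1 2 3 4 5 6 | awk '
  BEGIN        { print "start" }         # before the first line
  /3/          { print "regexp:", $0 }   # lines matching the regexp 3
  $1 >= 5      { print "compare:", $0 }  # lines whose first field is >= 5
  $1==2, $1==4 { print "range:", $0 }    # from the line equal to 2 to the line equal to 4
  END          { print NR, "lines" }     # after the last line
'
# start
# range: 2
# regexp: 3
# range: 3
# range: 4
# compare: 5
# compare: 6
# 6 lines
```

Every rule is tested against every line, in the order the rules are written, which is why line 3 triggers both the regexp and the range actions.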
With `awk` you can:
Count the number of lines in a file
awk '{ print NR " : " $0 }' file
Modify this command to only display the total number of lines with `awk` (like `wc -l`).
Solution
```sh
awk 'END{ print NR }' file
```
Convert a tabulated sequences file into fasta format
awk -vOFS='' '{print ">",$1,"\n",$2,"\n";}' two_column_sample_tab.txt > sample1.fa
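To see what this conversion produces, here is the same `awk` program run on a made-up two-line input read from stdin instead of `two_column_sample_tab.txt`:

```sh
# tab-separated input: sequence name, then sequence (made-up data)
printf 'seq1\tACGT\nseq2\tTTGA\n' |
  awk -vOFS='' '{print ">",$1,"\n",$2,"\n";}'
# >seq1
# ACGT
#
# >seq2
# TTGA
#
```

Setting `OFS` to the empty string glues the printed pieces together, and the final `"\n"` adds an empty line after each sequence.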
Modify this command to only get a list of sequence names in a fasta file