Commit 9fd8ff32 authored by Gilquin

fix: harmonize filenames in awk exercises

parent b5bd764b
No related branches found
No related tags found
1 merge request: !2 fix: correct some errors
@@ -184,13 +184,13 @@ awk -vFS='\t' -vOFS='' '{print $1 "\n";}' two_column_sample_tab.txt > seq_name.t
 Convert a multiline fasta file into a single line fasta file
 ```sh
-awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' sample.fa > sample1_singleline.fa
+awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' sample.fa > sample_singleline.fa
 ```
 Convert fasta sequences to uppercase
 ```sh
-awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' file.fasta > file_upper.fasta
+awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' sample.fa > sample_upper.fa
 ```
 Modify this command to only get a list of sequence names in a fasta file un lowercase
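To check that the renamed outputs are produced as expected, the two commands in this hunk can be run on a throwaway input. This is a minimal sketch: the two-record `sample.fa` it writes is made-up test data, not the exercise's own file.

```sh
# Hypothetical test data: a tiny multiline FASTA (not the exercise's sample.fa)
printf '>seq1 first\nacgt\nACG\n>seq2\nttga\n' > sample.fa

# Join each record's sequence lines onto a single line (command from this hunk)
awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' sample.fa > sample_singleline.fa

# Uppercase sequence lines only; header lines are printed unchanged
awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' sample.fa > sample_upper.fa

cat sample_singleline.fa
cat sample_upper.fa
```

With this input, `sample_singleline.fa` holds one header line and one joined sequence line per record (`acgtACG` for `seq1`), and `sample_upper.fa` keeps `>seq1 first` as is while uppercasing `acgt` to `ACGT`.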
@@ -198,7 +198,7 @@ Modify this command to only get a list of sequence names in a fasta file un lowe
 <details><summary>Solution</summary>
 <p>
 ```sh
-awk '/[^>]/ {print(tolower($0))}' file.fasta > seq_name_lower.txt
+awk '/[^>]/ {print(tolower($0))}' sample.fa > seq_name_lower.txt
 ```
 </p>
 </details>
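The solution's pattern `/[^>]/` matches any line containing at least one character other than `>`, so on a typical FASTA file it also lowercases the sequence lines. The sketch below runs it on made-up data and compares it with a pattern anchored on `/^>/`, which keeps only header lines. The anchored variant and its output name `seq_name_lower_headers.txt` are hypothetical and are not part of the commit.

```sh
# Hypothetical test data (made up for this sketch, not the exercise's sample.fa)
printf '>Seq1 First\nACGT\n>SEQ2\nTTGA\n' > sample.fa

# Pattern from the solution: every non-empty line that is not only '>' matches,
# so both headers and sequences end up in the output
awk '/[^>]/ {print(tolower($0))}' sample.fa > seq_name_lower.txt

# Hypothetical anchored variant: only lines starting with '>' (the headers)
awk '/^>/ {print(tolower($0))}' sample.fa > seq_name_lower_headers.txt

cat seq_name_lower.txt
cat seq_name_lower_headers.txt
```

Here `seq_name_lower.txt` has four lines (`>seq1 first`, `acgt`, `>seq2`, `ttga`), while the anchored variant keeps only `>seq1 first` and `>seq2`.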
@@ -206,19 +206,19 @@ awk '/[^>]/ {print(tolower($0))}' file.fasta > seq_name_lower.txt
 Return a list of sequence_id sequence_length from a fasta file
 ```sh
-awk 'BEGIN {OFS = "\n"}; /^>/ {print(substr(sequence_id, 2)" "sequence_length); sequence_length = 0; sequence_id = $0}; /^[^>]/ {sequence_length += length($0)}; END {print(substr(sequence_id, 2)" "sequence_length)}' file.fasta
+awk 'BEGIN {OFS = "\n"}; /^>/ {print(substr(sequence_id, 2)" "sequence_length); sequence_length = 0; sequence_id = $0}; /^[^>]/ {sequence_length += length($0)}; END {print(substr(sequence_id, 2)" "sequence_length)}' sample.fa
 ```
 Count the number of bases in a fastq.gz file
 ```sh
-(gzip -dc $0) | awk 'NR%4 == 2 {basenumber += length($0)} END {print basenumber}'
+sample.fq.gz | (gzip -dc $0) | awk 'NR%4 == 2 {basenumber += length($0)} END {print basenumber}'
 ```
-Only read with more than 20bp from a fastq
+Extract the reads with more than 20bp from a fastq
 ```sh
-awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) >= 20){print header, seq, qheader, qseq}}' < input.fastq > output.fastq
+awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) >= 20){print header, seq, qheader, qseq}}' < input.fq > output.fq
 ```
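The three commands in this hunk can be exercised together on small made-up files. This is a sketch under stated assumptions: the contents of `sample.fa`, `input.fq` and `sample.fq.gz` are invented test data, and `lengths.txt` and `bases.txt` are hypothetical output names. For the base count, the sketch pipes `gzip -dc sample.fq.gz` into the awk program. That invocation is an assumption about the intent, since the line shown in this diff combines `sample.fq.gz` with a `(gzip -dc $0)` subshell, and there `$0` expands to the shell or script name rather than to a FASTQ file.

```sh
# Hypothetical test data (made up for this sketch, not the exercise files)
printf '>r1\nACGT\nAC\n>r2\nGGG\n' > sample.fa
# Two 4-line FASTQ records: a 24 bp read and a 10 bp read
printf '@read1\n%s\n+\n%s\n@read2\n%s\n+\n%s\n' \
    ACGTACGTACGTACGTACGTACGT IIIIIIIIIIIIIIIIIIIIIIII \
    ACGTACGTAC IIIIIIIIII > input.fq
gzip -c input.fq > sample.fq.gz

# sequence_id sequence_length for each record (command from this hunk)
awk 'BEGIN {OFS = "\n"}; /^>/ {print(substr(sequence_id, 2)" "sequence_length); sequence_length = 0; sequence_id = $0}; /^[^>]/ {sequence_length += length($0)}; END {print(substr(sequence_id, 2)" "sequence_length)}' sample.fa > lengths.txt

# Total bases: line 2 of every 4-line FASTQ record is the sequence
# (assumed invocation: decompress to stdout and pipe into awk)
gzip -dc sample.fq.gz | awk 'NR%4 == 2 {basenumber += length($0)} END {print basenumber}' > bases.txt

# Keep reads whose sequence is at least 20 bp (the test is >= 20, so 20 bp passes)
awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) >= 20){print header, seq, qheader, qseq}}' < input.fq > output.fq

cat lengths.txt bases.txt output.fq
```

With this data, `lengths.txt` starts with a line holding a single space. This happens because the header rule prints the previous record, which is still empty, before the first sequence is read. The following lines are `r1 6` and `r2 3`. `bases.txt` holds `34` (24 + 10 bases), and `output.fq` keeps only the `read1` record.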