This post compiles examples for 7 Linux commands that are useful in bioinformatics: cat, wc, du, grep, awk, sed, and find. I will not review pwd, mkdir, cd, ls, mv, rm, head, tail, and wget, although they are just as essential!
Cat
Create an empty file (or empty an existing one)
cat /dev/null > myfile.txt
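A minimal sketch of what this redirection does, assuming a scratch directory made with mktemp so that no real file is touched. The variable names are mine, for illustration only. It shows that the command creates an empty file and also wipes a file that already has content.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create an empty file.
cat /dev/null > myfile.txt
size_new=$(wc -c < myfile.txt)

# The same redirection truncates a file that already has content.
printf 'chr1\t100\n' > myfile.txt
cat /dev/null > myfile.txt
size_truncated=$(wc -c < myfile.txt)

echo "new: $size_new bytes, after truncation: $size_truncated bytes"
```

Because the existing content is lost without any warning, it is worth checking the file name before pressing Enter.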
Wc
1. Print the number of lines in a file
wc -l myfile.txt
2. Print the number of entries in the folder (files and subdirectories, hidden ones included, without "." and "..")
ls -A | wc -l
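A minimal sketch comparing a few ways of counting, assuming GNU ls and find and a scratch directory with made-up file names. ls -f behaves like ls -a, so "." and ".." are part of its count, whereas find with -type f counts regular files only.

```shell
# Scratch directory with 3 files and 1 subdirectory (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\nchr2\nchrX\n' > myfile.txt
touch sample_A.fastq.gz sample_B.fastq.gz
mkdir results

# Reading from standard input makes wc print the number alone, without the file name.
n_lines=$(wc -l < myfile.txt)

# ls -f lists everything, including "." and "..".
n_ls_f=$(ls -f | wc -l)

# ls -A leaves out "." and ".." but still counts the results directory.
n_entries=$(ls -A | wc -l)

# find limited to depth 1 and to regular files counts files only.
n_files=$(find . -maxdepth 1 -type f | wc -l)

echo "lines=$n_lines ls_f=$n_ls_f entries=$n_entries files=$n_files"
```

Which count is the right one depends on whether subdirectories and hidden files should be included.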
Du
Print the size of a file in bytes (with -b, GNU du reports the apparent size of the file rather than the disk space allocated to it)
du -sb myfile.txt
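A minimal sketch of the difference between apparent size and allocated space, assuming GNU du (the -b option is a GNU extension) and a small test file of my own. The allocated value depends on the filesystem, so it is only printed here.

```shell
# Scratch directory with one small tab-separated file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\n' > myfile.txt   # 13 bytes

# -b is short for --apparent-size --block-size=1: the byte count of the content.
apparent_bytes=$(du -sb myfile.txt | cut -f1)

# Without -b, GNU du reports allocated space, in 1024-byte units by default.
allocated_units=$(du -s myfile.txt | cut -f1)

# wc -c gives the same byte count as du -sb for a regular file.
byte_count=$(wc -c < myfile.txt)

echo "apparent=$apparent_bytes allocated_units=$allocated_units wc=$byte_count"
```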
Grep
1. Print all the lines that contain “chr” in a file
grep chr myfile.txt
2. Print the first line that contains “chr” in a file and stop reading the file
grep -m 1 chr myfile.txt
3. Print all the lines that contain “chr” in a gzipped file
zgrep chr myfile.txt.gz
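A minimal sketch of the three commands above on a made-up file, assuming grep, gzip and zgrep are installed. The pattern chr matches anywhere on a line, so the sketch also shows an anchored pattern, ^chr, for the case where only lines starting with “chr” are wanted.

```shell
# Scratch directory with a small test file and its gzipped copy.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\nscaffold_chrUn\t50\nchr2\t300\n' > myfile.txt
gzip -c myfile.txt > myfile.txt.gz

# -c counts matching lines instead of printing them.
n_anywhere=$(grep -c chr myfile.txt)

# Anchoring with ^ keeps only lines that start with chr.
n_anchored=$(grep -c '^chr' myfile.txt)

# -m 1 stops after the first matching line.
first_match=$(grep -m 1 chr myfile.txt)

# zgrep accepts the same options on a gzipped file.
n_gz=$(zgrep -c chr myfile.txt.gz)

echo "anywhere=$n_anywhere anchored=$n_anchored gz=$n_gz first=$first_match"
```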
Awk
1. Remove “chr” whenever it is found at the beginning of a line in a file
awk '{sub(/^chr/,"",$0)} 1' myfile.txt
2. Append the content of a variable as a new tab-separated column to every line of a file
awk -v awkvar=8 '{print $0"\t"awkvar}' myfile.txt
3. Print the nth column of a file depending on a variable
awk -v awkvar=8 '{print $awkvar}' myfile.txt
4. Print the second column of the lines whose first column equals a variable
awk -v awkvar="${cell_type_1}" '{if ($1 == awkvar) print $2}' myfile.txt
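A minimal sketch of the awk examples on a made-up three-column file. The variable names col and chrom and the file content are mine, for illustration only. I also set the field separator to a tab with -F, which is an assumption about the input; by default awk splits on any run of spaces or tabs.

```shell
# Scratch directory with a small tab-separated file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\tTcell\nchr2\t200\tBcell\nchr1\t300\tBcell\n' > myfile.txt

# Strip a leading "chr", then keep the first column; paste joins the lines with commas.
stripped=$(awk '{sub(/^chr/, "", $0)} 1' myfile.txt | cut -f1 | paste -sd, -)

# Pick a column by number through a shell variable passed with -v.
col=3
third_col=$(awk -F'\t' -v col="$col" '{print $col}' myfile.txt | paste -sd, -)

# Print column 2 only where column 1 equals the variable.
chrom="chr1"
positions=$(awk -F'\t' -v chrom="$chrom" '$1 == chrom {print $2}' myfile.txt | paste -sd, -)

echo "stripped=$stripped third_col=$third_col positions=$positions"
```

Quoting the shell variable after -v keeps values that contain spaces in one piece.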
Sed
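1. Replace the first occurrence of a word on each line of a file (add the g flag, as in s/old_name/new_name/g, to replace every occurrence). The result is printed and the file itself is left unchanged: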
sed "s/old_name/new_name/" myfile.txt
2. Replace the content of one variable with the content of another in a file (using | as the delimiter allows the values to contain /):
sed "s|${old_name}|${new_name}|" myfile.txt
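A minimal sketch of both substitutions on a made-up file, showing the effect of the g flag and of the | delimiter. The value sample/A is hypothetical and chosen because it contains a slash. The sketch writes the result to a new file rather than editing in place.

```shell
# Scratch directory with a small test file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'old_name old_name\nother\n' > myfile.txt

# Without g, only the first occurrence on each line is replaced.
first_only=$(sed 's/old_name/new_name/' myfile.txt | head -n 1)

# With g, every occurrence on the line is replaced.
all_hits=$(sed 's/old_name/new_name/g' myfile.txt | head -n 1)

# With | as the delimiter, a replacement containing / needs no escaping.
old_name="old_name"
new_name="sample/A"
with_vars=$(sed "s|${old_name}|${new_name}|g" myfile.txt | head -n 1)

# sed prints to standard output; redirect to keep the result.
sed 's/old_name/new_name/g' myfile.txt > renamed.txt
original_line=$(head -n 1 myfile.txt)

echo "first_only=$first_only | all=$all_hits | vars=$with_vars"
```

Keep in mind that sed reads the content of ${old_name} as a regular expression, so characters such as . or * in the value have a special meaning.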
Find
1. Find all the fastq files above 10,000,000 bytes
find *.fastq.gz -type f -size +10000000c
2. Find all the fastq files above 10,000 MiB (for find, M means units of 1,048,576 bytes)
find *.fastq.gz -type f -size +10000M
3. Find all the fastq files above 10 GiB (for find, G means units of 1,073,741,824 bytes)
find *.fastq.gz -type f -size +10G
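4. Find all the fastq files above 10 GiB but below 40 GiB (GNU find rounds sizes up to whole GiB before comparing, so -size -40G keeps files of at most 39 GiB)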
find *.fastq.gz -type f -size +10G -size -40G
5. Delete all the fastq files of 10 GiB once rounded up to whole GiB (with GNU find, that is above 9 GiB and up to 10 GiB)
find *.fastq.gz -type f -size 10G -delete
6. Delete all the fastq files without descending into subdirectories (-maxdepth has to come before the tests and actions)
find *.fastq.gz -maxdepth 0 -type f -delete
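A minimal sketch of how the size tests round, assuming GNU find. To avoid creating files of several GiB, it uses the k unit (1,024 bytes) as a stand-in for G; the rounding rule is the same. The file names are made up. It also uses the form find . -name '*.fastq.gz', an alternative to passing the shell glob directly, which lets find do the matching and still works when the folder holds a very large number of files.

```shell
# Scratch directory with files of known sizes, one of them in a subdirectory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir nested
head -c 1    /dev/zero > tiny.fastq.gz
head -c 1024 /dev/zero > one_kib.fastq.gz
head -c 1025 /dev/zero > just_over.fastq.gz
head -c 3000 /dev/zero > nested/deep.fastq.gz

# -size 1k: sizes are rounded up to whole KiB, so 1 byte and 1024 bytes both count as 1k.
equal_1k=$(find . -maxdepth 1 -type f -name '*.fastq.gz' -size 1k | sort | paste -sd, -)

# -size +1k: more than 1 KiB after rounding up, so 1025 bytes (2k) matches.
over_1k=$(find . -maxdepth 1 -type f -name '*.fastq.gz' -size +1k | paste -sd, -)

# The c unit compares exact byte counts; without -maxdepth, subdirectories are searched too.
over_1024c=$(find . -type f -name '*.fastq.gz' -size +1024c | sort | paste -sd, -)

# -print before -delete lists each file as it is removed.
find . -maxdepth 1 -type f -name '*.fastq.gz' -size +1k -print -delete
remaining=$(find . -type f -name '*.fastq.gz' | wc -l)

echo "equal_1k=$equal_1k over_1k=$over_1k over_1024c=$over_1024c remaining=$remaining"
```

Running the command with -print alone first, and adding -delete only once the list looks right, is a cheap safeguard against removing the wrong files.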
Conclusion
To sum up, I have presented some examples of applications for 7 useful Linux commands in bioinformatics. Did this post help you?