7 useful Linux commands in bioinformatics

This post collects examples of 7 Linux commands that are useful in bioinformatics: cat, wc, du, grep, awk, sed, and find. I will not review pwd, mkdir, cd, ls, mv, rm, head, tail, and wget, although they are just as essential!

Cat

Create an empty file (this also empties the file if it already exists)

cat /dev/null > myfile.txt
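Because the redirection truncates its target, it is worth seeing how this differs from touch. Here is a minimal sketch, assuming a throwaway directory and a hypothetical copy.txt file that are not part of the original example:

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two files with the same one-line content (9 bytes each).
printf 'chr1\t100\n' > myfile.txt
cp myfile.txt copy.txt

# touch only updates the timestamp of an existing file; the content stays.
touch copy.txt

# Redirecting /dev/null into the file truncates it to zero bytes.
cat /dev/null > myfile.txt

touched_size=$(wc -c < copy.txt)
size_after=$(wc -c < myfile.txt)
echo "after touch: $touched_size bytes, after cat /dev/null: $size_after bytes"
```

So touch is the safer choice when the file may already hold data, while the cat form is handy for resetting a log or an output file.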

Wc

1. Print the number of lines in a file

wc -l myfile.txt

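2. Print the number of entries in the current folder (ls -f also lists hidden files as well as “.” and “..”, so subtract 2 to get the real count)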

ls -f | wc -l
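The two counts can be checked on a small test folder. This is a sketch under the assumption of a temporary directory with made-up file names; ls -A is an alternative I am adding for comparison, not part of the original command:

```shell
# Work in a throwaway directory so the counts are predictable.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\nchr2\t200\nchr3\t300\n' > myfile.txt
touch a.fastq.gz b.fastq.gz

# Reading from standard input makes wc print only the number, without the file name.
n_lines=$(wc -l < myfile.txt)

# ls -f lists every entry unsorted, including "." and "..".
n_with_dots=$(ls -f | wc -l)

# ls -A skips "." and ".." but still lists other hidden files.
n_entries=$(ls -A | wc -l)

echo "lines: $n_lines, ls -f: $n_with_dots, ls -A: $n_entries"
```

With three files in the folder, ls -f reports two more entries than ls -A. The advantage of ls -f is that it does not sort, which helps in folders with a very large number of files.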

Du

Print the size of a file in bytes (with GNU du, -b reports the apparent size rather than the space allocated on disk)

du -sb myfile.txt
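The difference between apparent size and allocated space shows up on a tiny file. A minimal sketch, assuming GNU coreutils (the -b option is not available in every du) and a hypothetical 10-byte file:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A file of exactly 10 bytes (no trailing newline).
printf 'ACGTACGTAC' > myfile.txt

# Apparent size in bytes; du separates size and name with a tab, so cut -f1 keeps the size.
apparent=$(du -sb myfile.txt | cut -f1)

# Space allocated on disk in KiB; this depends on the filesystem block size.
on_disk_kib=$(du -sk myfile.txt | cut -f1)

# wc -c gives the same byte count as du -sb for a regular file.
bytes_wc=$(wc -c < myfile.txt)

echo "apparent: $apparent bytes, allocated: $on_disk_kib KiB, wc -c: $bytes_wc bytes"
```

The allocated value is usually larger than the apparent one for small files, because the filesystem hands out space in whole blocks.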

Grep

1. Print all the lines that contain “chr” in a file

grep chr myfile.txt

2. Print the first line that contains “chr” in a file, then stop reading the file

grep -m 1 chr myfile.txt

3. Print all the lines that contain “chr” in a gzipped file

zgrep chr myfile.txt.gz
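The three commands above can be tried on a small test file. This is a sketch with made-up content; the -c option and the ^chr pattern are additions of mine to show that a plain “chr” also matches in the middle of a line:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical table: two lines start with "chr", one mentions it in the second column.
printf 'chr1\t100\nscaffold_7\tnear_chr1\nchr2\t200\nscaffold_8\t50\n' > myfile.txt
gzip -c myfile.txt > myfile.txt.gz

# -c counts the matching lines instead of printing them.
n_hits=$(grep -c chr myfile.txt)

# -m 1 stops after the first matching line.
first_hit=$(grep -m 1 chr myfile.txt)

# zgrep searches the compressed file without creating an uncompressed copy.
n_hits_gz=$(zgrep chr myfile.txt.gz | wc -l)

# Anchoring with ^ keeps only the lines that start with "chr".
n_anchored=$(grep -c '^chr' myfile.txt)

echo "hits: $n_hits, gz hits: $n_hits_gz, anchored: $n_anchored, first: $first_hit"
```

On large annotation or alignment files, -m 1 can save a lot of time when you only need to know whether a pattern is present.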

Awk

1. Remove “chr” whenever it is found at the beginning of a line in a file

awk '{sub(/^chr/,"",$0)} 1' myfile.txt

2. Append the content of a variable as an extra tab-separated column to every line of a file

awk -v awkvar=8 '{print $0"\t"awkvar}' myfile.txt

3. Print the nth column of a file depending on a variable

awk -v awkvar=8 '{print $awkvar}' myfile.txt

4. Print the second column of the lines whose first column equals a variable

awk -v awkvar="${cell_type_1}" '{if ($1 == awkvar) print $2}' myfile.txt
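To see the variable-based commands in action, here is a minimal sketch on a made-up two-column table. The file name counts.txt and the values are assumptions for illustration only; the shell variables are quoted so that empty values or values with spaces do not break the awk call:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical table: cell type in column 1, a count in column 2.
printf 'T_cell\t12\nB_cell\t7\nT_cell\t30\n' > counts.txt

cell_type_1="T_cell"   # assumed shell variable holding the value to match
column=2               # assumed shell variable holding a column number

# Print column 2 only for the lines whose column 1 equals the variable.
selected=$(awk -v awkvar="$cell_type_1" '$1 == awkvar {print $2}' counts.txt)

# Print the column whose number is stored in the variable.
second_col=$(awk -v awkvar="$column" '{print $awkvar}' counts.txt)

# Append the variable as an extra tab-separated column (first line shown only).
tagged=$(awk -v awkvar="$cell_type_1" '{print $0 "\t" awkvar}' counts.txt | head -n 1)

echo "$selected"
echo "$second_col"
echo "$tagged"
```

The pattern form `$1 == awkvar {print $2}` is equivalent to the `if` inside braces; which one to use is a matter of taste.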

Sed

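1. Replace the first occurrence of a word on each line of a file (add the g flag, s/old_name/new_name/g, to replace every occurrence; the result is printed and the file itself is left unchanged):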

sed "s/old_name/new_name/" myfile.txt

2. Replace the content of one variable with the content of another in a file (with | as the delimiter the values may contain /, but they must not contain |):

sed "s|${old_name}|${new_name}|" myfile.txt
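Two details of these substitutions are easy to check on a single line of text: without the g flag only the first match per line is replaced, and the | delimiter is what makes paths usable as values. A minimal sketch, with made-up paths as assumptions:

```shell
# A line in which the word appears twice.
line='old_name old_name'

# Without the g flag, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed "s/old_name/new_name/")

# With the g flag, every occurrence on the line is replaced.
every=$(printf '%s\n' "$line" | sed "s/old_name/new_name/g")

# Hypothetical paths: they contain "/", which would clash with the usual s/…/…/ delimiter.
old_name="/data/run1"
new_name="/data/run2"
moved=$(printf '%s\n' "/data/run1/sample.fastq.gz" | sed "s|${old_name}|${new_name}|")

echo "$first_only"
echo "$every"
echo "$moved"
```

Keep in mind that sed reads the value of old_name as a regular expression, so characters such as . or * in it are not matched literally.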

Find

1. Find all the fastq files above 10,000,000 bytes

find *.fastq.gz -type f -size +10000000c

2. Find all the fastq files above 10,000 MiB

find *.fastq.gz -type f -size +10000M

3. Find all the fastq files above 10 GiB

find *.fastq.gz -type f -size +10G

4. Find all the fastq files above 10 GiB but below 40 GiB

find *.fastq.gz -type f -size +10G -size -40G

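5. Delete all the fastq files whose size rounds up to 10 GiB (find rounds sizes up to the next whole unit, so this matches files larger than 9 GiB and up to 10 GiB)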

find *.fastq.gz -type f -size 10G -delete

6. Delete all the fastq files without descending into subdirectories (-maxdepth is a global option and must come before the tests and actions)

find *.fastq.gz -maxdepth 0 -type f -delete
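Since -delete cannot be undone, it helps to rehearse on dummy files first. The sketch below assumes GNU find and GNU truncate, uses made-up file names, and searches from “.” with -name instead of a shell glob, which is a variation of mine rather than the form used above. It also shows how find rounds sizes up to the unit:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub

# truncate creates files of a given apparent size without writing real data.
truncate -s 1500K partial.fastq.gz      # about 1.46 MiB, counted as 2 MiB by find
truncate -s 2M exact.fastq.gz           # exactly 2 MiB
truncate -s 3M sub/nested.fastq.gz      # sits in a subdirectory

# -size 2M matches both top-level files because sizes are rounded up to whole MiB.
two_mib=$(find . -maxdepth 1 -type f -name '*.fastq.gz' -size 2M | sort)

# A size given in bytes (suffix c) is compared exactly.
exact_bytes=$(find . -maxdepth 1 -type f -name '*.fastq.gz' -size 2097152c)

# Dry run: print what would be deleted, then delete with the same expression.
find . -maxdepth 1 -type f -name '*.fastq.gz' -size +1M -print
find . -maxdepth 1 -type f -name '*.fastq.gz' -size +1M -delete

# -maxdepth 1 kept find out of sub/, so the nested file is still there.
remaining=$(find . -type f -name '*.fastq.gz')
echo "$remaining"
```

Running the expression once with -print and only then with -delete is a cheap safety net when working next to raw sequencing data.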

Conclusion

To sum up, I have presented some examples of applications for 7 useful Linux commands in bioinformatics. Did this post help you?
