bioinformatics-recipes

Useful recipes for bioinformatics research.

Subsampling reads from a FASTQ file with an awk one-liner

Recently I’ve been enjoying how powerful awk can be in bioinformatics. In this post, I’m going to explain, step by step, the principles behind a few awk-heavy Bash one-liners written for randomly sampling FASTQ reads.

1. Random-sampling single-read FASTQ

The one-liner is given below. Note that I fixed some parts of the code that could cause logical errors. It is pretty illegible, huh? Nevertheless, believe me, the code is