Another Japan in the World

Jun Aruga's blog.

Sequencing: Quality Check with FastQC

This article demonstrates how to run a quality check on data from an Illumina sequencer.

Download FASTQ file

I use publicly available sequencing data from DRA (DDBJ Sequence Read Archive) [1]. You can also download the same data from SRA (NCBI Sequence Read Archive) or ERA (EBI Sequence Read Archive) [1].

Go to the DRASearch page by clicking the Search button at the top of the page [1].

On the DRASearch page, I selected:

  • Organism: Homo sapiens
  • StudyType: Exome Sequencing
  • Platform: Illumina


This time I selected the submission with Submission ID SRA067162 [2] because it is small (1.3 G bases). Thank you for sharing the data!

You can see the FASTQ file [3][4] links on the right side of [2]. A FASTQ (.fastq) file is like a FASTA (.fasta) file plus a quality score for each base.

Click the FASTQ link to the right of SRR747784, which is the first Run.

Now you can see the URL [5].

Download the two .fastq.bz2 files.
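
To make the format concrete: a FASTQ record normally takes four lines (an @ header, the bases, a + separator, and one quality character per base). The following is a minimal sketch, assuming unwrapped four-line records and Phred+33 quality encoding; the function names count_reads and phred33 are my own, not part of any tool.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record is exactly 4 lines (no wrapped sequences).
count_reads() {
  lines=$(wc -l < "$1")
  echo $((lines / 4))
}

# Convert one quality character to its Phred score.
# Assumption: Phred+33 encoding (ASCII code minus 33).
phred33() {
  code=$(printf '%d' "'$1")
  echo $((code - 33))
}
```

For example, `count_reads SRR747784_1.fastq` prints the number of reads in the file, and `phred33 I` prints 40, because the ASCII code of "I" is 73.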

$ cd YOUR_PROJECT_DIR

$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA067/SRA067162/SRX242800/SRR747784_1.fastq.bz2
$ wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA067/SRA067162/SRX242800/SRR747784_2.fastq.bz2

Extract the files.

$ ls -lh
total 239632
-rw-r--r--  1 jun.aruga  staff    57M Apr 22 09:44 SRR747784_1.fastq.bz2
-rw-r--r--  1 jun.aruga  staff    58M Apr 22 09:46 SRR747784_2.fastq.bz2

$ bunzip2 SRR747784_1.fastq.bz2
$ bunzip2 SRR747784_2.fastq.bz2

$ ls -lh
total 919552
-rw-r--r--  1 jun.aruga  staff   224M Apr 22 09:44 SRR747784_1.fastq
-rw-r--r--  1 jun.aruga  staff   224M Apr 22 09:46 SRR747784_2.fastq
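
Note that bunzip2 removes the .bz2 file once it has been extracted, as the second listing shows. If you would rather keep the compressed original and check it for corruption first, something like the following could work. It is only a sketch; the function name extract_keep is my own.

```shell
# Verify a .bz2 archive, then decompress it while keeping the original.
# Usage: extract_keep FILE.bz2
extract_keep() {
  # -t tests archive integrity without writing any output
  bzip2 -t "$1" || { echo "corrupt archive: $1" >&2; return 1; }
  # -k keeps the input file instead of deleting it
  bunzip2 -k "$1"
}
```

For example, `extract_keep SRR747784_1.fastq.bz2` would leave both SRR747784_1.fastq and SRR747784_1.fastq.bz2 in the directory.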

Run fastqc

Download the FastQC application from the web site [6]. As I am using a Mac, I downloaded the Mac dmg file. Copy FastQC.app to /Applications to install it.

GUI

Double-click the installed FastQC.app to launch the GUI application.

Command line (CLI)

The command is installed at the path below.

$ /Applications/FastQC.app/Contents/MacOS/fastqc --version
FastQC v0.11.7
$ export PATH=$PATH:/Applications/FastQC.app/Contents/MacOS

$ which fastqc
/Applications/FastQC.app/Contents/MacOS/fastqc

$ fastqc --version
FastQC v0.11.7
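
The export above only lasts for the current shell session. To keep it, you can add the same line to your shell startup file. Below is a minimal sketch that appends the line only when it is not already there; the function name persist_path is my own, and which startup file applies (for example ~/.bash_profile or ~/.zshrc) depends on your shell, so treat the file name as an assumption.

```shell
# Append a PATH line to a startup file unless the exact line already exists.
# $1: startup file (depends on your shell), $2: directory to add
persist_path() {
  line="export PATH=\$PATH:$2"
  # -x: match whole line, -F: fixed string, -q: quiet
  grep -qxF "$line" "$1" 2>/dev/null || echo "$line" >> "$1"
}
```

For example, `persist_path ~/.bash_profile /Applications/FastQC.app/Contents/MacOS`. Running it twice does not duplicate the line.
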
$ cd YOUR_PROJECT_DIR

$ mkdir qc_out

$ fastqc -o qc_out SRR747784_1.fastq SRR747784_2.fastq
Started analysis of SRR747784_1.fastq
Approx 5% complete for SRR747784_1.fastq
Approx 10% complete for SRR747784_1.fastq
...
Approx 95% complete for SRR747784_1.fastq
Analysis complete for SRR747784_1.fastq
Started analysis of SRR747784_2.fastq
Approx 5% complete for SRR747784_2.fastq
Approx 10% complete for SRR747784_2.fastq
...
Approx 95% complete for SRR747784_2.fastq
Analysis complete for SRR747784_2.fastq

The output files are created in qc_out/.

$ ls -l qc_out/
total 1848
-rw-r--r--@ 1 jun.aruga  staff  220242 Apr 22 10:22 SRR747784_1_fastqc.html
-rw-r--r--  1 jun.aruga  staff  228325 Apr 22 10:22 SRR747784_1_fastqc.zip
-rw-r--r--  1 jun.aruga  staff  221286 Apr 22 10:22 SRR747784_2_fastqc.html
-rw-r--r--  1 jun.aruga  staff  230648 Apr 22 10:22 SRR747784_2_fastqc.zip

Open the .html file in a browser to view the report.
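
For a quick look without a browser, each .zip archive also contains a plain-text summary.txt with one line per analysis module. The sketch below lists the modules that did not pass. It assumes the usual layout of that file (tab-separated status, module name, and input file name, with status PASS, WARN or FAIL); the function name non_pass_modules is my own.

```shell
# Print the modules whose status is not PASS.
# Reads summary.txt content from standard input.
# Assumption: tab-separated columns: status, module name, file name.
non_pass_modules() {
  awk -F'\t' '$1 != "PASS" { print $1 ": " $2 }'
}

# Usage: read summary.txt straight from the zip without unpacking it.
# unzip -p qc_out/SRR747784_1_fastqc.zip SRR747784_1_fastqc/summary.txt | non_pass_modules
```

An empty output would mean that every module passed.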


Understanding the result report

The example reports "Good Illumina Data" and "Bad Illumina Data" on the official FastQC page [6] are helpful for understanding the report.

There is also a tutorial video by the FastQC project on YouTube (www.youtube.com).