Another Japan in the World

Jun Aruga's blog.

Sequencing: Run mapping

This is the next article from Sequencing: Install mapping tools - Another Japan in the World.

Try to do mapping by Hisat2 with previous fastq files trimmed by FastQC.

Go to Hisat2 official web [1], click H. sapiens GRCh38 "genome" link, and download index data file of reference genome data on the right side of the page.

$ wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz

$ tar xzf grch38.tar.gz

$ cd grch38

$ ls -l
total 8844024
-rw-r--r--  1 jun.aruga  staff   986172031 Mar 12  2016 genome.1.ht2
-rw-r--r--  1 jun.aruga  staff   736462272 Mar 12  2016 genome.2.ht2
-rw-r--r--  1 jun.aruga  staff       11294 Mar 12  2016 genome.3.ht2
-rw-r--r--  1 jun.aruga  staff   736462267 Mar 12  2016 genome.4.ht2
-rw-r--r--  1 jun.aruga  staff  1295322177 Mar 12  2016 genome.5.ht2
-rw-r--r--  1 jun.aruga  staff   749943562 Mar 12  2016 genome.6.ht2
-rw-r--r--  1 jun.aruga  staff           8 Mar 12  2016 genome.7.ht2
-rw-r--r--  1 jun.aruga  staff           8 Mar 12  2016 genome.8.ht2
-rwxr-xr-x  1 jun.aruga  staff        1504 Mar 17  2016 make_grch38.sh

Then a directory that I created trimmed fastq files.

$ cd /path/to/SRA067162/trimmed_by_quality_28

$ ls *.fq
SRR747784_1_val_1.fq  SRR747784_2_val_2.fq

$ mkdir hisat2_out

Run mapping. sam (Sequence Alignment Map) [2] file is created as a output. I needed to wait a few minutes.

-x option is to set a prefix of the index files. If you extracted the index files in /Users/jun.aruga/doc/dev/bio/reference_genome/grch38/genome.1.ht2, set /Users/jun.aruga/doc/dev/bio/reference_genome/grch38/genome without the string .1.ht2.

$ hisat2 -x /Users/jun.aruga/doc/dev/bio/reference_genome/grch38/genome -1 SRR747784_1_val_1.fq -2 SRR747784_2_val_2.fq -S hisat2_out/SRR747784.sam
847491 reads; of these:
  847491 (100.00%) were paired; of these:
    60665 (7.16%) aligned concordantly 0 times
    765182 (90.29%) aligned concordantly exactly 1 time
    21644 (2.55%) aligned concordantly >1 times
    ----
    60665 pairs aligned concordantly 0 times; of these:
      469 (0.77%) aligned discordantly 1 time
    ----
    60196 pairs aligned 0 times concordantly or discordantly; of these:
      120392 mates make up the pairs; of these:
        111327 (92.47%) aligned 0 times
        6086 (5.06%) aligned exactly 1 time
        2979 (2.47%) aligned >1 times
93.43% overall alignment rate

$ ls -hl hisat2_out/SRR747784.bam
-rw-r--r--  1 jun.aruga  staff   546M May 20 13:54 hisat2_out/SRR747784.bam

Install samtools [3] to convert sam file to bam (Binary Alignment Map) file [4]. sam format is good to read by human. bam format is good to read by machine. Later I am going to use the bam file by a program.

$ brew install samtools

$ samtools --version
samtools 1.8
Using htslib 1.8
Copyright (C) 2018 Genome Research Ltd.

To convert the sam file to the bam file.

$ samtools view -b SRR747784.sam -o SRR747784.bam

As a reference, I'd like to show you the command to convert from bam file to sam file, though I have not use this command in this case.

$ samtools view -h SRR747784.bam -o SRR747784_new.sam