This is the next article from Sequencing: Install mapping tools - Another Japan in the World.
Try to do mapping by Hisat2 with previous fastq files trimmed by FastQC.
Go to Hisat2 official web [1], click H. sapiens GRCh38 "genome" link, and download index data file of reference genome data on the right side of the page.
$ wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz $ tar xzf grch38.tar.gz $ cd grch38 $ ls -l total 8844024 -rw-r--r-- 1 jun.aruga staff 986172031 Mar 12 2016 genome.1.ht2 -rw-r--r-- 1 jun.aruga staff 736462272 Mar 12 2016 genome.2.ht2 -rw-r--r-- 1 jun.aruga staff 11294 Mar 12 2016 genome.3.ht2 -rw-r--r-- 1 jun.aruga staff 736462267 Mar 12 2016 genome.4.ht2 -rw-r--r-- 1 jun.aruga staff 1295322177 Mar 12 2016 genome.5.ht2 -rw-r--r-- 1 jun.aruga staff 749943562 Mar 12 2016 genome.6.ht2 -rw-r--r-- 1 jun.aruga staff 8 Mar 12 2016 genome.7.ht2 -rw-r--r-- 1 jun.aruga staff 8 Mar 12 2016 genome.8.ht2 -rwxr-xr-x 1 jun.aruga staff 1504 Mar 17 2016 make_grch38.sh
Then a directory that I created trimmed fastq files.
$ cd /path/to/SRA067162/trimmed_by_quality_28 $ ls *.fq SRR747784_1_val_1.fq SRR747784_2_val_2.fq $ mkdir hisat2_out
Run mapping. sam (Sequence Alignment Map) [2] file is created as a output. I needed to wait a few minutes.
-x
option is to set a prefix of the index files. If you extracted the index files in /Users/jun.aruga/doc/dev/bio/reference_genome/grch38/genome.1.ht2
, set /Users/jun.aruga/doc/dev/bio/reference_genome/grch38/genome
without the string .1.ht2
.
$ hisat2 -x /Users/jun.aruga/doc/dev/bio/reference_genome/grch38/genome -1 SRR747784_1_val_1.fq -2 SRR747784_2_val_2.fq -S hisat2_out/SRR747784.sam 847491 reads; of these: 847491 (100.00%) were paired; of these: 60665 (7.16%) aligned concordantly 0 times 765182 (90.29%) aligned concordantly exactly 1 time 21644 (2.55%) aligned concordantly >1 times ---- 60665 pairs aligned concordantly 0 times; of these: 469 (0.77%) aligned discordantly 1 time ---- 60196 pairs aligned 0 times concordantly or discordantly; of these: 120392 mates make up the pairs; of these: 111327 (92.47%) aligned 0 times 6086 (5.06%) aligned exactly 1 time 2979 (2.47%) aligned >1 times 93.43% overall alignment rate $ ls -hl hisat2_out/SRR747784.bam -rw-r--r-- 1 jun.aruga staff 546M May 20 13:54 hisat2_out/SRR747784.bam
Install samtools [3] to convert sam file to bam (Binary Alignment Map) file [4]. sam format is good to read by human. bam format is good to read by machine. Later I am going to use the bam file by a program.
$ brew install samtools $ samtools --version samtools 1.8 Using htslib 1.8 Copyright (C) 2018 Genome Research Ltd.
To convert the sam file to the bam file.
$ samtools view -b SRR747784.sam -o SRR747784.bam
As a reference, I'd like to show you the command to convert from bam file to sam file, though I have not use this command in this case.
$ samtools view -h SRR747784.bam -o SRR747784_new.sam
- [1] Hisat2: https://ccb.jhu.edu/software/hisat2/
- [2] SAM file format: https://en.wikipedia.org/wiki/SAM_(file_format)
- [3] samtools: https://github.com/samtools/samtools
- [4] BAM file format: https://en.wikipedia.org/wiki/Binary_Alignment_Map