Another Japan in the World

Jun Aruga's blog.

Sequencing: Install mapping tools

I tried to map the genome data I used before.

To do that, I tried to install a tool for mapping reads to the genome. I tried 2 tools: TopHat 2 [1] and HISAT2 [2].

The result is

  • TopHat 2: I could not install it.
  • HISAT2: I could install it.

This article is a log of what I did with these 2 tools.

By the way, "alignment" and "mapping" have been used as synonyms. [3] But nowadays "alignment" is not the only possible way to produce a "mapping" as the result.

TopHat2

TopHat seems to be a popular tool for mapping. We can see an article about it at EBI. There are TopHat and its next version, TopHat2.

There are 2 dependencies: "boost" [5] and "bowtie". I could install those by building from source. But on a Mac environment, installing them with brew install is easier.

$ brew install boost
$ brew install bowtie2
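
Before moving on to the build, it can help to confirm that the installed tools are actually reachable. The following is a minimal sketch, not part of the original log: check_commands is a hypothetical helper name, and it only checks executables on PATH. Boost is a library rather than a command, so it is not covered here; "brew list boost" would be one way to check it.

```shell
# Report whether each named command is on PATH.
# Prints one line per command and returns non-zero if any is missing.
# (check_commands is a hypothetical helper, introduced for illustration.)
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example: bowtie2 is the executable installed by the bowtie2 formula.
check_commands bowtie2 || echo "install the missing tools before building TopHat"
```

The function returns a non-zero status when something is missing, so it can also be used as a guard in a longer build script.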

I used GCC 5, as I saw that the installation had been tried with GCC 5.

$ git clone git@github.com:infphilo/tophat.git

$ cd tophat

$ brew install gcc@5

$ /usr/local/bin/gcc-5 --version
gcc-5 (Homebrew GCC 5.5.0_2) 5.5.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./autogen.sh

$ ./configure --prefix=/usr/local/tophat-2-dev CC=gcc-5

...
-- tophat 2.1.1 Configuration Results --
  C++ compiler:        g++ -Wall -Wno-strict-aliasing -g -gdwarf-2 -Wuninitialized  -O3  -DNDEBUG -I./samtools-0.1.18 -pthread -I/usr/local/include -I./SeqAn-1.4.2
  Linker flags:        -L./samtools-0.1.18 -L/usr/local/lib
  BOOST libraries:     -lboost_thread-mt -lboost_system
  GCC version:         gcc-5 (Homebrew GCC 5.5.0_2) 5.5.0
  Host System type:    x86_64-apple-darwin17.4.0
  Install prefix:      /usr/local/tophat-2-dev
  Install eprefix:     ${prefix}

  See config.h for further configuration information.
  Email bug reports to <tophat.cufflinks@gmail.com>.

$ make 2>&1 | tee -a make.log
...
gcc -c -g -Wall -O2  -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_USE_KNETFILE -D_CURSES_LIB=0 -I. bam2depth.c -o bam2depth.o
gcc -g -Wall -O2  -o samtools_0.1.18 bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o -Lbcftools  libbam.a -lbcf -lm -lz #-lcurses
Undefined symbols for architecture x86_64:
  "___ks_insertsort_heap", referenced from:
      _ks_combsort_heap in libbam.a(bam_sort.o)
      _ks_introsort_heap in libbam.a(bam_sort.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [samtools_0.1.18] Error 1
make[2]: *** [libbam.a] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
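
One detail in the log above: the bundled samtools build calls plain "gcc", and the final error is reported by "clang", even though configure was given CC=gcc-5. On macOS, plain "gcc" is usually a front-end to clang. Whether this is the cause of the undefined symbol is not confirmed here. The following minimal sketch only shows how to check which compiler each command really is; compiler_family is a hypothetical helper name.

```shell
# Classify a compiler from the text printed by "<compiler> --version".
# (compiler_family is a hypothetical helper, introduced for illustration.)
compiler_family() {
    case "$1" in
        *clang*) echo "clang" ;;
        *"Free Software Foundation"*|*GCC*) echo "gcc" ;;
        *) echo "unknown" ;;
    esac
}

# Compare the plain "gcc" with the Homebrew "gcc-5", when they are installed.
for cc in gcc gcc-5; do
    if command -v "$cc" >/dev/null 2>&1; then
        echo "$cc -> $(compiler_family "$("$cc" --version 2>&1)")"
    else
        echo "$cc -> not installed"
    fi
done
```

If plain "gcc" is reported as clang while "gcc-5" is reported as gcc, the sub-build is not using the compiler that was passed to configure.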

Then, according to this article [7], the author of TopHat 2 recommends using HISAT2 rather than TopHat2.

So I moved on to HISAT2, leaving the error unresolved.

HISAT2

The installation succeeded.

$ git clone git@github.com:infphilo/hisat2.git

$ cd hisat2

$ make

$ ./hisat2 --version
/Users/jun.aruga/git/hisat2/hisat2-align-s version 2.1.0
64-bit
Built on users-MacBook-Air.local
Fri May  4 00:29:47 CEST 2018
Compiler: InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

$ ls -l hisat2
-rwxr-xr-x  1 jun.aruga  staff  18181 May  4 00:25 hisat2*

$ ./hisat2 --help
...
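
With the build working, the usual next step would be to build an index from a reference genome and then map reads against it. The following is a minimal sketch that only prints the two commands instead of running them. The file names genome.fa, reads.fq and genome_index are hypothetical placeholders, not files from this log, and print_hisat2_steps is a hypothetical helper name. It assumes single-end reads.

```shell
# Print (not run) the two typical HISAT2 steps for single-end reads.
# All three arguments are hypothetical file names chosen for illustration.
print_hisat2_steps() {
    reference=$1   # reference genome in FASTA format
    reads=$2       # sequencing reads in FASTQ format
    index=$3       # base name for the index files

    # Step 1: build the index from the reference genome.
    echo "./hisat2-build $reference $index"
    # Step 2: map the reads against the index and write SAM output.
    echo "./hisat2 -x $index -U $reads -S $index.sam"
}

print_hisat2_steps genome.fa reads.fq genome_index
```

Printing the commands first makes it easy to review the file names before starting an index build, which can take a long time on a large genome.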

Consideration

I see today's bioinformatics tools as being like the programming tools of the IT industry, such as Python and Ruby, in their early days. The CI environment of bioinformatics tools could be improved. The reason is that the number of bioinformatics users is much smaller than the number of users of IT tools such as Python and Ruby. Bioinformatics is at the dawn of its age. This could be improved by people working for tomorrow.

References