Jun's Blog

Output, activities, memo and etc.

Install and test Trinity, the De novo assemble tool

This article is about how to install and test Trinity RNA seq de novo assemble tool on Mac OSX [1].

Install

Seeing Trinity .travis.yml [2] is good to know it.

First, we need to use a compiler supporting Open MP [3] (= Open Multi-Processing). Mac's default gcc (= actually = clang) does not support Open MP. Latest GCC 8 (gcc-8, g++-8) is supporting it.

In this time, use GCC 8.

$ brew install gcc-8
$ brew install g++-8

Then install htslib, Samtools, Bowtie2, Jellyfish, Salmon and numpy as a dependency packages.

Install Trinity from the source code.

$ git clone git@github.com:trinityrnaseq/trinityrnaseq.git

$ cd trinityrnaseq

$ CC=gcc-8 CXX=g++-8 make

$ CC=gcc-8 CXX=g++-8 make plugins

$ ./Trinity --help



###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.8.4


#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>      :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>         tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --include_supertranscripts      :yield supertranscripts fasta and gtf files as outputs.
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R.   (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                        see '--quality_trimming_params' under full usage info for tailored settings.
#
#
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 200.
#                                       see '--normalize_max_read_cov' under full usage info for tailored settings.
#                                       (note, as of Sept 21, 2016, normalization is on by default)
#
#  --no_distributed_trinity_exec   :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/Users/jun.aruga/git/trinityrnaseq/trinity_out_dir"
#                                    note: must include 'trinity' in the name as a safety precaution! )
#
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
#                                  (can set this to a node-local drive or RAM disk)
#
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.8.4) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
#
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
#
#            (if you have multiple samples, use --samples_file ... see above for details)
#
#    and for Genome-guided Trinity, provide a coordinate-sorted bam:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#     see: /Users/jun.aruga/git/trinityrnaseq/sample_data/test_Trinity_Assembly/
#          for sample data and 'runMe.sh' for example Trinity execution
#
#     For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################

Optionally if you want to install Trinity to somewhere, run "make install".

Files are installed to /usr/local/bin/trinityrnaseq/ in this case.

$ sudo make install
$ export PATH=/usr/local/bin/trinityrnaseq:$PATH

$ which Trinity
/usr/local/bin/trinityrnaseq/Trinity

Test

$ TRINITY_HOME=$(pwd) make test -C sample_data/test_Trinity_Assembly
...
##### Done Running Trinity #####
...

Usage

The document is here. [4][5]

Introduction. www.youtube.com

There are 2 cases Paired reads and Unpaired (Single) reads [6].

On my Mac, number of CPU is 4.

$ sysctl -n hw.ncpu
4

Paired reads

$ Trinity --seqType fq --max_memory 1G --left reads_1.fq --right reads_2.fq --CPU 4

Unpaired reads

$ Trinity --seqType fq --max_memory 1G --single reads.fq --CPU 4

References