Install and test Trinity, the de novo assembly tool
This article explains how to install and test Trinity, the de novo RNA-Seq assembly tool, on macOS [1].
Install
Trinity's .travis.yml [2] is a good reference for the build steps.
First, we need a compiler that supports OpenMP [3] (Open Multi-Processing). The default gcc on macOS is actually clang, which does not support OpenMP. GCC 8 (gcc-8, g++-8), the latest version at the time of writing, supports it.
This time, we use GCC 8.
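$ brew install gcc@8
With Homebrew, the versioned formula gcc@8 provides both the gcc-8 and g++-8 commands; there is no separate g++-8 formula.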
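To confirm which compiler can actually build OpenMP code, a small check like the following may help. It is only a sketch: the function name supports_openmp and the test program are made up for illustration, and the compiler names in the loop are assumptions about what is on your PATH.

```shell
# Return 0 if the given C compiler can build a tiny program with -fopenmp.
# "supports_openmp" is a hypothetical helper, not part of Trinity or Homebrew.
supports_openmp() {
  cc="$1"
  # A compiler that is not on PATH cannot support anything.
  command -v "$cc" >/dev/null 2>&1 || return 1
  tmpdir="$(mktemp -d)" || return 1
  cat > "$tmpdir/omp_check.c" <<'EOF'
#include <omp.h>
int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }
EOF
  if "$cc" -fopenmp "$tmpdir/omp_check.c" -o "$tmpdir/omp_check" >/dev/null 2>&1; then
    status=0
  else
    status=1
  fi
  rm -rf "$tmpdir"
  return "$status"
}

# Compare the default gcc with gcc-8 (names assumed).
for cc in gcc gcc-8; do
  if supports_openmp "$cc"; then
    echo "$cc: OpenMP OK"
  else
    echo "$cc: no OpenMP support (or compiler not found)"
  fi
done
```

On a Mac set up as described above, the expectation is that gcc-8 passes the check while the default gcc does not.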
Then install htslib, Samtools, Bowtie2, Jellyfish, Salmon and numpy as dependency packages.
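Before building, it can be useful to check that the dependencies are visible from the shell. The sketch below assumes the executables are named samtools, bowtie2, jellyfish and salmon, and that numpy should be importable from python3; missing_tools is a hypothetical helper. htslib is a library, so it is not checked here.

```shell
# Print each of the given command names that is not on PATH, one per line.
# "missing_tools" is a hypothetical helper for illustration.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Executable names assumed for the dependencies listed above.
missing="$(missing_tools samtools bowtie2 jellyfish salmon)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All command-line dependencies found."
fi

# numpy is a Python module rather than a command, so test the import instead
# (assumption: the interpreter is called python3).
if python3 -c 'import numpy' >/dev/null 2>&1; then
  echo "numpy: OK"
else
  echo "numpy: not importable from python3"
fi
```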
Install Trinity from the source code.
$ git clone git@github.com:trinityrnaseq/trinityrnaseq.git
$ cd trinityrnaseq
$ CC=gcc-8 CXX=g++-8 make
$ CC=gcc-8 CXX=g++-8 make plugins
$ ./Trinity --help
###############################################################################
#
     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.8.4
#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>   :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                           provided in Gb of RAM, ie. '--max_memory 10G'
#
#  If paired reads:
#      --left <string>     :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>   tab-delimited text file indicating biological replicate relationships.
#                                ex.
#                                     cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                     cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                     cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                     cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                                # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --include_supertranscripts      :yield supertranscripts fasta and gtf files as outputs.
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R. (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                   see '--quality_trimming_params' under full usage info for tailored settings.
#
#
#  --no_normalize_reads            :Do *not* run in silico normalization of reads. Defaults to max. read coverage of 200.
#                                   see '--normalize_max_read_cov' under full usage info for tailored settings.
#                                   (note, as of Sept 21, 2016, normalization is on by default)
#
#  --no_distributed_trinity_exec   :do not run Trinity phase 2 (assembly of partitioned reads), and stop after generating command list.
#
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/Users/jun.aruga/git/trinityrnaseq/trinity_out_dir"
#                                   note: must include 'trinity' in the name as a safety precaution! )
#
#  --workdir <string>              :where Trinity phase-2 assembly computation takes place (defaults to --output setting).
#                                   (can set this to a node-local drive or RAM disk)
#
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.8.4) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
#
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 6
#
#        (if you have multiple samples, use --samples_file ... see above for details)
#
#  and for Genome-guided Trinity, provide a coordinate-sorted bam:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#  see: /Users/jun.aruga/git/trinityrnaseq/sample_data/test_Trinity_Assembly/
#       for sample data and 'runMe.sh' for example Trinity execution
#
#  For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
Optionally, if you want to install Trinity system-wide, run "make install".
In this case, the files are installed to /usr/local/bin/trinityrnaseq/.
$ sudo make install
$ export PATH=/usr/local/bin/trinityrnaseq:$PATH
$ which Trinity
/usr/local/bin/trinityrnaseq/Trinity
Test
$ TRINITY_HOME=$(pwd) make test -C sample_data/test_Trinity_Assembly
...
##### Done Running Trinity #####
...
Usage
The documentation is here [4][5].
There are two cases: paired reads and unpaired (single) reads [6].
My Mac has 4 CPUs.
$ sysctl -n hw.ncpu
4
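The sysctl key above is specific to macOS (and BSD). If the same script should also run on Linux, one possible approach is a small wrapper that falls back to other commands. This is a sketch; cpu_count is a hypothetical helper name, and the fallbacks (nproc, getconf _NPROCESSORS_ONLN) are assumed to be available on the target system.

```shell
# Print the number of CPUs, trying the macOS command first.
# "cpu_count" is a hypothetical helper for illustration.
cpu_count() {
  if command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu            # macOS / BSD
  elif command -v nproc >/dev/null 2>&1; then
    nproc                        # Linux (GNU coreutils)
  else
    getconf _NPROCESSORS_ONLN    # fallback on many Unix-like systems
  fi
}

echo "CPUs: $(cpu_count)"
```

The result could then be passed to Trinity as the value of --CPU.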
Paired reads
$ Trinity --seqType fq --max_memory 1G --left reads_1.fq --right reads_2.fq --CPU 4
Unpaired reads
$ Trinity --seqType fq --max_memory 1G --single reads.fq --CPU 4
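The two cases differ only in how the read files are passed, so they can be wrapped in one small helper. The sketch below only prints the command instead of running it; trinity_cmd is a hypothetical function, the file names are placeholders, and the flags are the ones shown in the help output above.

```shell
# Print (do not run) a Trinity command for one read file (single)
# or two read files (paired). "trinity_cmd" is a hypothetical helper.
# Usage: trinity_cmd CPU MAX_MEMORY reads.fq [reads_2.fq]
trinity_cmd() {
  cpu="$1"
  mem="$2"
  shift 2
  case "$#" in
    1) echo "Trinity --seqType fq --max_memory $mem --single $1 --CPU $cpu" ;;
    2) echo "Trinity --seqType fq --max_memory $mem --left $1 --right $2 --CPU $cpu" ;;
    *) echo "usage: trinity_cmd CPU MAX_MEMORY reads.fq [reads_2.fq]" >&2
       return 1 ;;
  esac
}

trinity_cmd 4 1G reads_1.fq reads_2.fq   # paired reads
trinity_cmd 4 1G reads.fq                # unpaired reads
```

Printing the command first makes it easy to review before starting a long assembly run.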
References
- [1] Trinity GitHub: https://github.com/trinityrnaseq/trinityrnaseq
- [2] Trinity .travis.yml: https://github.com/trinityrnaseq/trinityrnaseq/blob/master/.travis.yml
- [3] OpenMP: https://www.openmp.org/
- [4] Trinity document: https://github.com/trinityrnaseq/trinityrnaseq/wiki
- [5] Trinity document - Running Trinity: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity
- [6] Paired-End vs. Single-Read Sequencing Technology