2.4.2a
alexdobin committed Jun 19, 2015
1 parent 6705029 commit 52f6f61
Showing 12 changed files with 66 additions and 10 deletions.
3 changes: 2 additions & 1 deletion CHANGES
@@ -1,5 +1,6 @@
STARlong: fixed --outFilterIntronMotifs and --outSAMstrandField options.
STAR 2.4.2a 2015/06/19
Implemented --quantMode GeneCounts option for counting number of reads per gene, similar to htseq-count.
STARlong: fixed --outFilterIntronMotifs and --outSAMstrandField options.
Yet another fix for --sjdbOverhang logic.
Error message when shared memory and on the fly junction insertion are used together.
Fixed a bug causing unnecessary 1 base soft-clipping in rare cases with sparse suffix array.
23 changes: 23 additions & 0 deletions RELEASEnotes
@@ -1,3 +1,26 @@
STAR 2.4.2a 2015/06/19

New features:

Counting reads per gene while mapping with --quantMode GeneCounts option.
A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps.
The counts coincide with those produced by htseq-count with default parameters.

Requires annotations (GTF or GFF with --sjdbGTFfile option) used at the genome generation step, or at the mapping step.

Outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:
column 1: gene ID
column 2: counts for unstranded RNA-seq
column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)
Select the output according to the strandedness of your data.
Note that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads.

Use --quantMode TranscriptomeSAM GeneCounts to get both the Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab outputs.
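
The column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of STAR: the function name, the strandedness labels and the example data are hypothetical, and it assumes the file is tab-separated and that any row whose ID starts with `N_` is a summary counter rather than a gene (check this against your own output).

```python
import csv
import io

# 0-based column index for each library type, following the column
# description in the release notes. The keys are hypothetical labels.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_gene_counts(handle, strandedness="unstranded"):
    """Return {gene_id: count} for the chosen strandedness column."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        gene_id = row[0]
        if gene_id.startswith("N_"):
            continue  # assumption: "N_" rows are summary lines, not genes
        counts[gene_id] = int(row[column])
    return counts


# Made-up example content, not real STAR output.
example = "geneA\t10\t9\t1\ngeneB\t4\t0\t4\n"
sense = read_gene_counts(io.StringIO(example), "forward")
antisense = read_gene_counts(io.StringIO(example), "reverse")
print(sense, antisense)
```

For stranded data, reading both stranded columns as shown gives the sense counts and the antisense counts side by side.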



################################################################################################################################################################
STAR 2.4.1a 2015/04/17

New features:
Binary file modified bin/MacOSX_x86_64/STAR
Binary file not shown.
Binary file added bin/MacOSX_x86_64/STARlong
Binary file not shown.
Binary file modified doc/STARmanual.pdf
Binary file not shown.
26 changes: 21 additions & 5 deletions extras/doc-latex/STARmanual.tex
@@ -34,7 +34,7 @@

\newcommand{\sechyperref}[1]{\hyperref[#1]{Section \ref{#1}. \nameref{#1}}}

\title{STAR manual 2.4.1a}
\title{STAR manual 2.4.2a}
\author{Alexander Dobin\\
[email protected]}
\maketitle
@@ -153,7 +153,7 @@ \subsubsection{Very small genome.}
For small genomes, the parameter \opt{genomeSAindexNbases} needs to be scaled down, with a typical value of \code{min(14, log2(GenomeLength)/2 - 1)}. For example, for 1~megaBase genome, this is equal to 9, for 100~kiloBase genome, this is equal to 7.

\subsubsection{Genome with a large number of references.}
If you are using a genome with a large (\textgreater 5,000) number of references (chrosomes/scaffolds), you may need to reduce the \opt{genomeChrBinNbits} to reduce RAM consumption. The following scaling is recomended: \opt{genomeChrBinNbits} = \code{min(18, log2(GenomeLength/NumberOfReferences))}. For example, for 3~gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
If you are using a genome with a large (\textgreater 5,000) number of references (chromosomes/scaffolds), you may need to reduce the \opt{genomeChrBinNbits} to reduce RAM consumption. The following scaling is recommended: \opt{genomeChrBinNbits} = \code{min(18, log2(GenomeLength/NumberOfReferences))}. For example, for a 3~gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
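
The two scaling rules above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the manual's three worked examples (9, 7 and 15).

```python
import math


def genome_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer (assumed)."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def genome_chr_bin_nbits(genome_length, n_references):
    """min(18, log2(GenomeLength/NumberOfReferences)), rounded (assumed)."""
    return min(18, round(math.log2(genome_length / n_references)))


print(genome_sa_index_nbases(1_000_000))               # 1 megaBase genome
print(genome_sa_index_nbases(100_000))                 # 100 kiloBase genome
print(genome_chr_bin_nbits(3_000_000_000, 100_000))    # 3 gigaBase, 100,000 references
```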

\section{Running mapping jobs.}\label{Running_mapping_jobs}
\subsection{Basic options.}
@@ -165,7 +165,7 @@ \subsection{Basic options.}

\begin{itemize}
\item[]
\opt{runThreadN} option defines the number of threads to be used for genome generation, it has to be set to the number of available cores on the server node.
%\opt{runThreadN} option defines the number of threads to be used for mapping, it has to be set to the number of available cores on the server node.

\opt{genomeDir} specifies path to the genome directory where genome indices were generated (see \sechyperref{Generating_genome_indexes}).

@@ -203,7 +203,7 @@ \subsubsection{ENCODE options}
\opt{outFilterMismatchNmax} 999\\
maximum number of mismatches per pair, large number switches off this filter
\item[]
\opt{outFilterMismatchNoverLmax} 0.04\\
%\opt{outFilterMismatchNoverReadLmax} 0.04\\
max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.04*200=8 for the paired read
\item[]
\opt{alignIntronMin} 20\\
@@ -367,11 +367,27 @@ \subsection{Chimeric alignments in \ofilen{Chimeric.out.junction}}


\section{Output in transcript coordinates.}
With \opt{quantMode} \optv{TranscriptomeSAM} option STAR will outputs alignments translated into transcript coordinates in the \ofilen{Aligned.toTranscriptome.out.bam} file (in addition to alignments in genomic coordinates in \ofilen{Aligned.*.sam/bam} files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress. For example, RSEM command line would look as follows: \codelines{rsem-calculate-expression ... --bam Aligned.toTranscriptome.out.bam /path/to/RSEM/reference RSEM}.
With \opt{quantMode} \optv{TranscriptomeSAM} option STAR will output alignments translated into transcript coordinates in the \ofilen{Aligned.toTranscriptome.out.bam} file (in addition to alignments in genomic coordinates in \ofilen{Aligned.*.sam/bam} files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress. For example, RSEM command line would look as follows: \codelines{rsem-calculate-expression ... --bam Aligned.toTranscriptome.out.bam /path/to/RSEM/reference RSEM}.
Note that STAR first aligns reads to the entire genome, and only then searches for concordance between alignments and transcripts. I believe this approach might offer certain advantages compared to the alignment to transcriptome only, by not forcing the alignments to annotated transcripts.

By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use \opt{quantTranscriptomeBan} \optv{Singleend} to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

\section{Counting number of reads per gene.}
With \opt{quantMode} \optv{GeneCounts} option STAR will count the number of reads per gene while mapping.
A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps.
The counts coincide with those produced by htseq-count with default parameters.
This option requires annotations (GTF or GFF with --sjdbGTFfile option) used at the genome generation step, or at the mapping step.
STAR outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:
\begin{itemize}[leftmargin=1in]
\item[column 1:] gene ID
\item[column 2:] counts for unstranded RNA-seq
\item[column 3:] counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
\item[column 4:] counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)
\end{itemize}
Select the output according to the strandedness of your data.
Note that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads.
With \opt{quantMode} \optv{TranscriptomeSAM} \optv{GeneCounts}, you get both the \ofilen{Aligned.toTranscriptome.out.bam} and \ofilen{ReadsPerGene.out.tab} outputs.
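
The counting rule stated in this section (a read counts only if it overlaps exactly one gene, with both mates checked) can be sketched as follows. This is a simplified illustration and not STAR's implementation: the function name, the data layout and the labels `no_feature` and `ambiguous` are hypothetical, and chromosome and strand handling are left out.

```python
def classify_read(read_blocks, genes):
    """Assign a read to a gene, or to "no_feature" / "ambiguous".

    read_blocks: list of (start, end) aligned blocks from both mates.
    genes: {gene_id: [(start, end), ...]} annotated intervals.
    All coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    hit = set()
    for gene_id, intervals in genes.items():
        for r_start, r_end in read_blocks:
            # Two closed intervals overlap by at least 1 nt when each one
            # starts no later than the other one ends.
            if any(r_start <= g_end and g_start <= r_end
                   for g_start, g_end in intervals):
                hit.add(gene_id)
                break
    if len(hit) == 1:
        return next(iter(hit))
    return "ambiguous" if hit else "no_feature"


genes = {"g1": [(100, 200)], "g2": [(180, 300)]}
print(classify_read([(50, 100)], genes))             # touches g1 by 1 nt
print(classify_read([(190, 195)], genes))            # inside both genes
print(classify_read([(10, 20), (250, 260)], genes))  # second mate hits g2 only
```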


\section{2-pass mapping.}

1 change: 1 addition & 0 deletions extras/doc-latex/convertParDefToLatexTable.awk
@@ -3,6 +3,7 @@ function substLatexSymbols() {
gsub(">","{\\textgreater}");
gsub("<","{\\textless}");
gsub("_","{\\textunderscore}");
gsub("&","{\\\\&}");
};

BEGIN {
9 changes: 8 additions & 1 deletion extras/doc-latex/parametersDefault.tex
@@ -26,6 +26,13 @@
\optName{runThreadN}
\optValue{1}
\optLine{int: number of threads to run STAR}
\optName{runDirPerm}
\optValue{User{\textunderscore}RWX}
\optLine{string: permissions for the directories created at the run-time. }
\begin{optOptTable}
\optOpt{User{\textunderscore}RWX} \optOptLine{user-read/write/execute}
\optOpt{All{\textunderscore}RWX} \optOptLine{all-read/write/execute (same as chmod 777)}
\end{optOptTable}
\end{optTable}
\optSection{Genome Parameters}\label{Genome_Parameters}
\begin{optTable}
@@ -258,7 +265,7 @@
\optLine{int: 0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise.}
\optName{outSAMflagAND}
\optValue{65535}
\optLine{int: 0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG \& outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise.}
\optLine{int: 0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG {\&} outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits.}
\optName{outSAMattrRGline}
\optValue{-}
\optLine{string(s): SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". }
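
The order of operations described for outSAMflagAND and outSAMflagOR (AND first, then OR) can be illustrated with a short Python sketch. The function name is hypothetical, and the default of 0 for the OR mask is an assumption not shown in this hunk.

```python
def apply_flag_masks(flag, flag_and=65535, flag_or=0):
    """Apply the AND mask first, then the OR mask, as the descriptions state."""
    return (flag & flag_and) | flag_or


# 99 = 0x1 + 0x2 + 0x20 + 0x40 in the SAM FLAG bit layout.
print(apply_flag_masks(99))                              # defaults leave the flag unchanged
print(apply_flag_masks(99, flag_and=65535 ^ 0x2))        # unset bit 0x2
print(apply_flag_masks(99, flag_or=0x400))               # set bit 0x400
```

Because the AND mask is applied first, a bit cleared by the AND mask can still be set again by the OR mask.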
4 changes: 4 additions & 0 deletions source/Makefile
@@ -112,6 +112,10 @@ STARforMacStatic : CCFLAGS=-D'COMPILE_FOR_MAC' -I ./Mac_Include/ $(CCFLAGS_main)
STARforMacStatic : parametersDefault.xxd $(OBJECTS)
$(CXX) -o STAR $(CCFLAGS) $(LDFLAGS_Mac_static) $(OBJECTS)

STARlongForMacStatic : CCFLAGS=-D'COMPILE_FOR_LONG_READS' -D'COMPILE_FOR_MAC' -I ./Mac_Include/ $(CCFLAGS_main)
STARlongForMacStatic : parametersDefault.xxd $(OBJECTS)
$(CXX) -o STARlong $(CCFLAGS) $(LDFLAGS_Mac_static) $(OBJECTS)

#
STARforMacGDB : CCFLAGS=-D'COMPILE_FOR_MAC' -I ./Mac_Include/ $(CCFLAGS_gdb)
STARforMacGDB : parametersDefault.xxd $(OBJECTS)
4 changes: 4 additions & 0 deletions source/Quantifications.cpp
@@ -2,6 +2,10 @@

Quantifications::Quantifications (uint32 nGeIn) {

geneCounts.nType=3;
geneCounts.cAmbig = new uintQ[geneCounts.nType];
geneCounts.cNone = new uintQ[geneCounts.nType];

geneCounts.nGe=nGeIn;
geneCounts.gCount = new uintQ* [geneCounts.nType];

4 changes: 2 additions & 2 deletions source/Quantifications.h
@@ -8,9 +8,9 @@ class Quantifications {
public:
struct {//counting reads per gene, similar to HTseq
uint32 nGe; //number of genes
static const int nType=3; //number of count types (columns)
int nType; //number of count types (columns)
uintQ cMulti; //count multimappers
uintQ cAmbig[nType], cNone[nType];//ambigouous, no-feature
uintQ *cAmbig, *cNone;//ambiguous, no-feature
uintQ **gCount; // array of read counts per gene for two strands
} geneCounts;

2 changes: 1 addition & 1 deletion source/VERSION
@@ -1 +1 @@
#define STAR_VERSION "STAR_2.4.1d_modified"
#define STAR_VERSION "STAR_2.4.2a"
