406 commits
933cd8e
Minor changes to EM algorithm.
alexdobin Nov 4, 2019
349c5a9
develop_2019-11-07
alexdobin Nov 8, 2019
1b0c196
Fixed a seg-fault in STARsolo for cases where no cell barcodes matche…
alexdobin Dec 28, 2019
bfffee4
report multi-mapping chimeric reads in BAM format
suhrig Jan 3, 2020
0881987
Update installation manual for Mac OS X
maxim-k Jan 10, 2020
e3e8554
Fixed a bug with solo SJ output for large genomes. Fixing the problem…
alexdobin Jan 23, 2020
aeef18c
Introduced --seedMapMin which was previously hard-coded, allowing use…
alexdobin Jan 23, 2020
705487a
Fixed the problem in Solo Q30 Bases in Summary.csv average. Now it is…
alexdobin Jan 23, 2020
828d0c1
2.7.3a_2020-01-23
alexdobin Jan 23, 2020
22393ed
Implementing SmartSeq: CB=RG output.
alexdobin Jan 23, 2020
ed9dd4e
Merged 2.7.3a_2020-01-23 master changes into develop.
alexdobin Jan 23, 2020
2ce3096
Merged from soloDevelop
alexdobin Jan 23, 2020
8890c14
Solo code cleanup.
alexdobin Jan 24, 2020
b6b8153
Implementing SmartSeq. Change in STARsolo SJ output behavior: junctio…
alexdobin Jan 24, 2020
26ecc51
Finished Solo SmartSeq implementation.
alexdobin Jan 28, 2020
90f14a3
Finished debugging SmartSeq implementation.
alexdobin Jan 31, 2020
3a5fe1a
alpha release develop_2020-01-31
alexdobin Jan 31, 2020
4ddcb25
Deallocated the cbFeatureUMImap in countSmartSeq to reduce RAM consum…
alexdobin Feb 3, 2020
9c5cc13
Reimplemented SmartSeq counting to reduce RAM consumption.
alexdobin Feb 4, 2020
b0d8cf7
Finished debugging the SmartSeq reimplementation.
alexdobin Feb 5, 2020
10e563b
make --peOverlapNbasesMin compatible with --chimOutType WithinBAM
suhrig Feb 10, 2020
bf39ba4
remove unused members of class ChimericDetection
suhrig Feb 10, 2020
c23aaaf
Fixed bugs in Solo SmartSeq
alexdobin Feb 10, 2020
73e437a
Implemented NoDedup option for Solo SmartSeq.
alexdobin Feb 12, 2020
49dc707
Minor corrections in parameters descriptions.
alexdobin Mar 13, 2020
a189596
Added GitHub links for 10X whitelists to STARsolo.md
alexdobin Apr 3, 2020
e33a665
Commit before redoing readManifest
alexdobin Apr 4, 2020
a9fd1d9
Alpha release develop_2020-01-31
alexdobin Apr 8, 2020
ff65f5f
Fixed seg-fault for STARsolo CB/UB SAM attributes output with --soloF…
alexdobin May 20, 2020
ab71d99
2.7.3a_2020-05-20
alexdobin May 21, 2020
c651f3b
Issue #907: Fixed the bug that prevented output of STARsolo GX/GN tag…
alexdobin May 22, 2020
f996cb7
Issue #864: Fxied seg-fault for STARsolo runs with very small number …
alexdobin May 22, 2020
7f5b2c9
Fixed the long-standing seg-fault problem for small genomes.
alexdobin May 24, 2020
c7a6337
Issue #881: Check if --genomeDir exists, create if necessary.
alexdobin May 24, 2020
dc9552b
Issue #843, #880: Throw an error if read file in --readFilesIn does n…
alexdobin May 27, 2020
2319051
Fixed description for readFilesManifest in parametersDefault.
alexdobin May 27, 2020
cd2f09c
For genome generation runs, the Log.out file is moved into the --geno…
alexdobin May 28, 2020
1cc6da8
Implemented creation of parent directories. Reverted genomeSAindex.cpp.
alexdobin May 28, 2020
8d8f689
The output directory in --outFileNamePrefix is checked and created if…
alexdobin May 28, 2020
6b98946
2.7.3a_2020-05-28 alpha release. Issue #882: Added 3rd column 'Gene E…
alexdobin May 28, 2020
b789ab1
Fix `readFilesPrefix` preifx mispell
illusional May 28, 2020
ff1fbde
Fix `readFilesPrefix` preifx mispell on manual
illusional May 28, 2020
1648c8a
Fixed the long-standing seg-fault problem for small genomes.
alexdobin May 29, 2020
97d35a5
Merge pull request #769 from alexey0308/master
alexdobin May 30, 2020
f80d8a1
Merge pull request #808 from maxim-k/patch-1
alexdobin May 30, 2020
6973246
Merge pull request #922 from illusional/fix-readFilesPrefix-docs
alexdobin May 30, 2020
ae5799c
Ready for 2.7.4a
alexdobin Jun 1, 2020
04a67a8
2.7.4a
alexdobin Jun 1, 2020
688258d
Merged master pre-2.7.4a.
alexdobin Jun 2, 2020
3d052ac
Merged 2.7.4a master.
alexdobin Jun 2, 2020
0dcd0fb
UnderDevelopment options in parametersDefault, skipped in the manual.
alexdobin Jun 3, 2020
047b49f
Fixed seg-fault for STARsolo CB/UB SAM attributes output with --soloF…
alexdobin Jun 10, 2020
89b94a3
Issue #934: Fixed a problem with annotated junctions that was casuing…
alexdobin Jun 15, 2020
3568cae
Issue #883: Patch for FreeBSD in SharedMemory and Makefile improvemen…
alexdobin Jun 15, 2020
502dd0c
N characters in --soloAdapterSequence are not counted as mismatches, …
alexdobin Jun 16, 2020
e67d668
Read for 2.7.5a
alexdobin Jun 16, 2020
ca4c213
2.7.5a
alexdobin Jun 16, 2020
e5654bf
Added math.h for compiler compatibility.
alexdobin Jun 17, 2020
63b96e7
Another compiler compatibility fix.
alexdobin Jun 17, 2020
5c68b0e
--soloType CB_samTagOut now allows output of (uncorrected) UMI sequen…
alexdobin Jun 19, 2020
b95267b
Docker build: switched to debian:stable-slim in the Dockerfile.
alexdobin Jun 21, 2020
1552aa0
alpha release 2.7.5a_2020-06-29
alexdobin Jun 30, 2020
3b9662a
Issue #965: output genome sizes with and without padding into Log.out.
alexdobin Jul 19, 2020
b368154
fix memory leak of chimeric BAM output
suhrig Jul 28, 2020
9a5bb6a
Issue #558: Fixed a bug that can cause a seg-fault in STARsolo run wi…
alexdobin Jul 28, 2020
46bfa67
For --soloType CB_samTagOut no need to perform alignment classification.
alexdobin Jul 28, 2020
184710d
Issue #952: Increased the maximum allowed length of the SAM tags in t…
alexdobin Jul 30, 2020
3418803
Check for errors when creating FIFO file. Cosmetic changes to paramet…
alexdobin Aug 1, 2020
a6076cd
2.7.5b
alexdobin Aug 1, 2020
123e9ff
Typo in CHANGES.md
alexdobin Aug 4, 2020
03204ac
Started implementing EmptyDrops_CR filtering policy.
alexdobin Aug 5, 2020
d5de6c1
Implemented --runMode soloCellFiltering.
alexdobin Aug 9, 2020
92bcf88
EmptyDrops_CR is working, perfect agreement with CR4 filtering.
alexdobin Aug 10, 2020
9cee102
Rearranging soloCellFiltering.
alexdobin Aug 12, 2020
d80d94e
Finalized EmptyDrops_CR filtering. Perfect agreement with CR when run…
alexdobin Aug 13, 2020
afc6e7e
Alpha develop release: 2.7.5b_develop_2020-08-13: EmptyDrops_CR filte…
alexdobin Aug 13, 2020
10f3a65
Issue #945: otuput GX/GN for --soloFeatures GeneFull
alexdobin Aug 15, 2020
10930be
Issue #978: fixed corrupted transcriptInfo.tab in genome generation f…
alexdobin Aug 15, 2020
1fe083b
Issue #988: proceed reading from GTF after a warning that exon end is…
alexdobin Aug 15, 2020
8537c16
Throw an error when --soloFeatures Velocyto is used with --soloType S…
alexdobin Aug 15, 2020
5941c5f
Implemented removal of control characters from the ends of input read…
alexdobin Aug 15, 2020
aa7bc30
Ready for 2.7.5c .
alexdobin Aug 17, 2020
7ed383e
2.7.5c
alexdobin Aug 17, 2020
ac3daec
Update SoloFeature_cellFiltering.cpp
rob-p Aug 26, 2020
3417ba0
Merge pull request #1012 from rob-p/patch-1
alexdobin Aug 27, 2020
ef89953
Merged PR #802, by @suhrig: output multimapping chimeras to BAM file.
alexdobin Aug 28, 2020
0a994d0
Fixed minor incompatibilities in the PR. All tests passed.
alexdobin Sep 10, 2020
7f18a70
2.7.5c_develop_2020-09-11: Fixed a problem with --runMode soloCellFil…
alexdobin Sep 11, 2020
59d1af7
Issue #786: fixed the bug causing the *Different SJ motifs problem* f…
alexdobin Sep 17, 2020
cf66335
PR # 1012: fixed the bug with --soloCellFiltering TopCells option.
alexdobin Sep 18, 2020
f42d879
Issue #945: GX/GN can be output for all --soloType, as well as for no…
alexdobin Sep 18, 2020
f64ab07
Added Parameters_samAttributes.cpp
alexdobin Sep 18, 2020
b11fe73
Merge branch 'suhrig-multimapping_chimeric_reads_in_BAM_format'
alexdobin Sep 18, 2020
fdfae27
Ready for 2.7.6a
alexdobin Sep 19, 2020
1a9a6fc
2.7.6a
alexdobin Sep 19, 2020
4794210
Merged master 2.7.6a into develop.
alexdobin Sep 19, 2020
c5a9c63
Reimplemented adapter clipping. Results agree with the old implement…
alexdobin Jul 28, 2020
78c032e
Fixed a problem with input from GTF files that might have been causin…
alexdobin Sep 22, 2020
371d41d
Merged all changes from adapter trimming branch, and some small chang…
alexdobin Sep 22, 2020
a98421e
Reverted back to float precision for CB multi-matching, to preserve t…
alexdobin Sep 23, 2020
9552e0a
2.7.6a_develop_2020-09-23, small changes from ExactMatchCR branch, st…
alexdobin Sep 23, 2020
3f841c1
Implemented --soloUMIdedup 1MM_CR option.
alexdobin Sep 25, 2020
d643ab0
Minor adjustments to match ExactMatchCR branch which perfectly matche…
alexdobin Sep 28, 2020
980183e
Trying for a better match with CR3. This works worse than the previous.
alexdobin Sep 30, 2020
5482e0d
Again: Trying for a better match with CR3. This works worse than the …
alexdobin Sep 30, 2020
368ffc0
Fixed a bug in the Transcript_transformGenome.cpp that may have cause…
alexdobin Oct 1, 2020
fda752f
Allowed multiple WL matches for CBs that contain one N.
alexdobin Oct 2, 2020
c9fe7d8
Another attempt to match CR3. Checking whether uncorrected UMI counts…
alexdobin Oct 3, 2020
41e4fe4
Only one read in disagreement with CR3 in pbmc5k-Lane1.
alexdobin Oct 11, 2020
2ac866f
And another attempt to match CR3 - makes things worse.
alexdobin Oct 17, 2020
2b54c27
Reverted back to 41e4fe4
alexdobin Oct 17, 2020
1924818
And another attempt to match CR3.
alexdobin Oct 17, 2020
e2e22e1
Another change to 1MM_CR algorithm trying to match CR3.1.0.
alexdobin Oct 18, 2020
358cab0
Reverted SoloFeature_collapseUMI_CR.cpp back to 41e4fe4, which still …
alexdobin Oct 23, 2020
c0b6559
Output CB/UB tags into BAM for the 1MM_CR option. Sort matrix.mtx out…
alexdobin Oct 24, 2020
4631bda
Issue #1071: fixed a bug that can cause a crash for STARsolo runs wit…
alexdobin Nov 5, 2020
22c6fee
Merged small patch from master.
alexdobin Nov 6, 2020
f9d4770
Implementing adapter clipping to match CR4.
alexdobin Nov 15, 2020
e7bbb45
Issue #1040: fixed a bug causing rare seg-faults for paired-end --sol…
alexdobin Nov 19, 2020
a315ce5
Merged master 2.7.6a_patch_2020-11-17
alexdobin Nov 19, 2020
12c969e
Tweaking adapter clipping.
alexdobin Nov 27, 2020
e5401de
Implemented CR4-like clipping of the 5' TSO adapter. Local alignment …
alexdobin Dec 1, 2020
4f89538
Fixed some problems with clipping.
alexdobin Dec 3, 2020
756565f
Overhauling ClipMate
alexdobin Dec 4, 2020
849446f
Finished implementing CR4-like adapter clipping.
alexdobin Dec 4, 2020
c4a418e
Fixed a few issues with clipping implementation. Implementing input f…
alexdobin Dec 8, 2020
9669403
Implemented --readFilesSAMattrKeep option.
alexdobin Dec 8, 2020
cd76951
Implemented input from SAM/BAM for STARsolo. Fixed an issue that was …
alexdobin Dec 10, 2020
1877778
Fixed a bug in SoloFeature_emptyDrops_CR causing seg-fault for runs w…
alexdobin Dec 11, 2020
15d49c4
Fixed a seg-fault in emptyDrops_CR for all empty cells containing no …
alexdobin Dec 12, 2020
a58874c
The UMI deduplication/correction specified in --soloUMIdedup is used …
alexdobin Dec 14, 2020
2d46da9
Different --soloUMIdedup counts, if requested, are recorded in separa…
alexdobin Dec 18, 2020
e8c74b9
Deprecated --genomeConsensusFile option. Please use --genomeTransform…
alexdobin Dec 18, 2020
ceb4a1c
Rearranged ReadAlign::outputAlignments.
alexdobin Dec 19, 2020
8038514
Fixed some issues with transforming alignments to the original genome.
alexdobin Dec 23, 2020
a21f3b1
Introduced --genomeTransformOutput SAM SJ option. Separated trMult fo…
alexdobin Dec 23, 2020
e6b1ff6
Implemented generation of original genome index for --genomeTransform…
alexdobin Dec 24, 2020
827228b
Merged master 2.7.7x (STARconsensus) into develop.
alexdobin Dec 24, 2020
238beb7
Reverting to match 2.7.6a counts.
alexdobin Dec 25, 2020
af7b2ec
Special UB tag for MultiGene UMI in 1MM_CR option.
alexdobin Dec 26, 2020
c64463b
Fixed some issues with UB tag with multigene UMI filtering and 1MM_CR…
alexdobin Dec 26, 2020
3ae9455
Simplified UB tag output in collapseUMI 1MM_All using unordered_map.
alexdobin Dec 26, 2020
50e6f15
Implemented --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts which al…
alexdobin Dec 28, 2020
b28904b
Ready for 2.7.7a
alexdobin Dec 28, 2020
054b0b8
2.7.7a
alexdobin Dec 28, 2020
97ee3f6
Merged master 2.7.7a into develop.
alexdobin Dec 28, 2020
7611506
SoloFeature_countVelocyto.cpp: UMI==-1 (from MultiGene) are not proce…
alexdobin Dec 29, 2020
e2b42af
Streamlining 1MM_CR and Directional UMI deduplcation.
alexdobin Jan 3, 2021
d287114
SoloReadBarcode_getCBandUMI.cpp: reverted to 2.7.6a behavior: reads w…
alexdobin Jan 3, 2021
87eda8b
Fixed a problen with 1MM_CR counting introduced in e2b42afe. Results …
alexdobin Jan 5, 2021
b7612bd
Merged all UMIdedup types into one function, except 1MM_All.
alexdobin Jan 6, 2021
db06a02
Finalized new UMIdedup calculations. Passed all tests with 10M reads.
alexdobin Jan 17, 2021
d3d0ae2
Fixed a problem with GX/GN tag output for --soloFeatures GeneFull opt…
alexdobin Jan 17, 2021
4a223bd
Fixed a problem with 1MM_Directional, now matrix.mtx is identical to …
alexdobin Jan 20, 2021
859a760
Cosmetic changes.
alexdobin Jan 22, 2021
7d41ac1
Rearranging loading of the barcode read sequence.
alexdobin Jan 22, 2021
4fd32a7
Fixed a bug that may cause seg-fault for STARconsensus runs.
alexdobin Jan 22, 2021
3b98a1c
Merged small STARconsensus bug-fix from master.
alexdobin Jan 22, 2021
d9d97fd
Moved barcode sequence loading from processChunks to readBarcodeLoad.
alexdobin Jan 22, 2021
5e31f43
Renamed readNmatesIn as readNends.
alexdobin Jan 22, 2021
4574007
Implemented barcode read loading.
alexdobin Jan 25, 2021
9f63bfb
Checked readNmates vs readNends throughout the code.
alexdobin Jan 26, 2021
362eaeb
Removed Qual1 since it's not used anymore. Moved 2nd mate joining wit…
alexdobin Jan 27, 2021
50ff486
Read clipping options --clip* now require specifying the values for a…
alexdobin Jan 27, 2021
1e0d1e5
Rearranged qualHist calculations.
alexdobin Jan 30, 2021
87418f0
Small fixes in qualHist calculation.
alexdobin Jan 30, 2021
85d94c3
2.7.7a_develop_2021-01-30
alexdobin Jan 30, 2021
49539d5
Attempted to re-write SoloFeature_countSmartSeq.cpp, but will return …
alexdobin Feb 2, 2021
7619b7c
Fixed the problems with SmartSeq introduced in the latest changes.
alexdobin Feb 2, 2021
596cfcb
Implemented --soloUMIdedup 1M_Directional_UMItools options to match U…
alexdobin Feb 3, 2021
76c2579
Issue #1129: fixed an issue with short barcode sequences and --soloBa…
alexdobin Feb 4, 2021
de47d3c
Prohibit CR/UR/CY/UY/CB/UB SAM attributes for --soloType SmartSeq.
alexdobin Feb 9, 2021
ff01def
Final fixes for 2.7.8a release.
alexdobin Feb 16, 2021
8e1ea7a
Preparing documentation for 2.7.8a
alexdobin Feb 19, 2021
8643202
Minor updates to manual for 2.7.8a
alexdobin Feb 20, 2021
b2763ed
Ready for 2.7.8a
alexdobin Feb 20, 2021
b8ab69a
Fixing compilation issues on Tavis-CI
alexdobin Feb 20, 2021
3ae0966
2.7.8a
alexdobin Feb 20, 2021
a1205fd
Also cleaning opal/opal.o
smoe Feb 23, 2021
5483e1c
State which STAR binary is executed
smoe Feb 24, 2021
0d5be74
SOURCE_DATE_EPOCH to make the build more reproducible
mr-c Mar 1, 2021
7707e77
Added extras/scripts/calcUMIperCell.awk: a script to calculate total …
alexdobin Mar 1, 2021
0d18b23
Merge branch 'patch-7' of https://github.com/smoe/STAR into smoe-patch-7
alexdobin Mar 6, 2021
d50175e
Merge branch 'patch-8' of https://github.com/smoe/STAR
alexdobin Mar 6, 2021
7f6f09b
Print STAR command line and version information to stdout.
alexdobin Mar 6, 2021
a9fd064
Merge branch 'reproducible' of https://github.com/mr-c/STAR
alexdobin Mar 6, 2021
c51d843
SOURCE_DATE_EPOCH to make the build more reproducible. Force update o…
alexdobin Mar 6, 2021
94debf8
Changed: ---limitIObufferSize now requires two numbers - separate siz…
alexdobin Mar 7, 2021
76e720e
Issue #1166: seg-fault for STARsolo --soloCBwhitelist None (no whitel…
alexdobin Mar 8, 2021
ffb66fb
2.7.8a_2021-03-08. Fixed Issue #1167: STARsolo CR/UR SAM tags are scr…
alexdobin Mar 8, 2021
92e923f
Merged master 2.7.8a_2021-03-08 into develop.
alexdobin Mar 9, 2021
2aa0b12
Starting to implement Solo multimappers.
alexdobin Mar 12, 2021
9e5c8c3
Implemented multi-genic UMI counting: Uniform and Rescue. Coding outp…
alexdobin Mar 13, 2021
59fbb26
simple, but inelegant support for non-x86 using SIMDe
mr-c Mar 1, 2021
6b85c81
New option: --soloUMIfiltering MultiGeneUMI_All to filter out all UMI…
alexdobin Mar 15, 2021
c9aaf02
Finished coding EM.
alexdobin Mar 19, 2021
a5a24fe
Fixed a bug causing problems for Solo without multimappers.
alexdobin Mar 21, 2021
c6ff64b
Implementing the GeneFull_Inside option.
alexdobin Mar 22, 2021
fca9575
Issue #1177: Added file checks for the --inputBAMfile
alexdobin Mar 31, 2021
ad42132
Issue #1190: Allow GX/GN output for non-STARsolo runs.
alexdobin Apr 1, 2021
442ff6f
Issue #1180: Output the actual number of alignments in NH attributes …
alexdobin Apr 1, 2021
e2a4e91
Simple script to convert BED spliced junctions (SJ.out.tab) to BED12 …
alexdobin Apr 1, 2021
3950f51
Merge branch 'SIMDe' of https://github.com/mr-c/STAR into mr-c-SIMDe_…
alexdobin Apr 1, 2021
0372be4
Reverted to Makefile from master.
alexdobin Apr 1, 2021
cc76bb6
Added the suffix to STAR targets - as in the original mr-c/SIMDe com…
alexdobin Apr 1, 2021
4e019d7
Changed opal/opal.cpp to always use AVX2, with SIMDe taking care of s…
alexdobin Apr 1, 2021
733b3f9
Merged master 2021-04-01 into develop.
alexdobin Apr 1, 2021
b90cbf6
Separate readInfoYes and readIndexYes, to allow recording of read IDs…
alexdobin Apr 2, 2021
f5aa4bf
Fixed BAM output when multimappers are requested. GX/GN tags are stil…
alexdobin Apr 2, 2021
adcee71
Fixed stats output for multimappers. Minor changes to stats output.
alexdobin Apr 3, 2021
12fa627
Fixed a problem with Solo GeneFull BAM.
alexdobin Apr 4, 2021
d6e091a
Minor changes in parametersDefault
alexdobin Apr 27, 2021
6110131
2.7.8a_2021-04-27 alpha-release with SIMDe incorporated.
alexdobin Apr 27, 2021
3d7b045
Merge branch 'develop' into develop_GeneFull_Inside
alexdobin Apr 28, 2021
88d770f
Starting coding GeneFull_CR option.
alexdobin Apr 29, 2021
9ed8eb3
Fixed issue 1220: corrupt SAM/BAM files for --outFilterType BySJout. …
alexdobin May 3, 2021
2cf500a
Fixed issue #1211: scrambled CB tags in BAM output for --soloCBwhitel…
alexdobin May 4, 2021
b83521f
PR #1210: fixed parametersDefault description.
alexdobin May 4, 2021
4b66c5e
Merged develop branch: multi-gene counting for STARsolo. Getting read…
alexdobin May 4, 2021
1ebfe0b
Ready for 2.7.9a
alexdobin May 4, 2021
ac39348
2.7.9a
alexdobin May 5, 2021
f68cf5f
PR #1234: Fixed typo in STARsolo.md
alexdobin May 16, 2021
d14a0a9
Added script extras/scripts/soloCountMatrixFromBAM.awk to re-create S…
alexdobin Jun 4, 2021
432d638
2.7.9a_2021-06-17: Issue #1230: fixed the bug that caused seg-faults …
alexdobin Jun 17, 2021
98468c6
Issue #1262: fixed the bug that prevented EM matrix output when only …
alexdobin Jun 18, 2021
8603be6
Issue #1177: throw an error in case the BAM file does not contain NH …
alexdobin Jun 18, 2021
2eb750b
2.7.9a_2021-06-25: Fixed a bug introduced in 2.7.9a for --quantMode T…
alexdobin Jun 25, 2021
452105f
Restarting coding GeneFull options: ExonOverIntron
alexdobin Jul 31, 2021
184fb90
Merged master 2.7.9a_2021-06-25
alexdobin Jul 31, 2021
6f7aeaf
Implemented GeneFull_ExonOverIntron. Compiles and runs.
alexdobin Jul 31, 2021
77d972d
Added Transcriptome_geneFullAlignOverlap_ExonOverIntron.cpp
alexdobin Aug 1, 2021
4ac891d
Fixed a bug that resulted in slightly different solo counts if --solo…
alexdobin Aug 23, 2021
bd3f212
Implementing CR-like pre-mRNA counting.
alexdobin Aug 28, 2021
51842fb
Implemented 50% exonic overlap and anti-sense alignment types for Gen…
alexdobin Sep 3, 2021
6783992
Renamed geneFull_CR into geneFull_Ex50pAS. Fixed warnings. Started to…
alexdobin Sep 9, 2021
d506db8
Fixed GX/GN tags for Ex50pAS.
alexdobin Sep 10, 2021
bf6bac8
Added git information output to Log.out. dev_EoI_2.7.9a_2021-09-10 re…
alexdobin Sep 10, 2021
8386ebd
Recompiled executables for dev_EoI_2.7.9a_2021-09-10 release.
alexdobin Sep 10, 2021
e4d8903
New way to select which feature determines solo SAM attribures: samAt…
alexdobin Sep 13, 2021
83e2567
Rearranging output of different features.
alexdobin Sep 14, 2021
f746bd5
Changed Solo BAM tags output for multiple --soloFeatures: now the fir…
alexdobin Sep 15, 2021
2949ada
Implemented Solo BAM tags gx gn: output ';'-separated gene IDs and na…
alexdobin Sep 15, 2021
a103047
Implemented simple insertions, but it seems we need insertion/deletio…
alexdobin Sep 20, 2021
86c197b
Implemented --soloCBmatchWLtype ParseBio_ED3 to allow multiple mismat…
alexdobin Sep 22, 2021
4408118
Changed Solo summary statistics outputs in Barcodes.stats and Feature…
alexdobin Sep 22, 2021
e49af00
Issue #1316: fixed the seg-fault which occurred if --soloType CB_samT…
alexdobin Sep 24, 2021
c89e7a3
Issue 1339: clarified the --soloCellFilter EmptyDrops_CR warning mess…
alexdobin Sep 24, 2021
63a50ce
Issue #1322: for incorrectly formatted read lines, output the offendi…
alexdobin Sep 27, 2021
2f34c8b
Issue #843: for comma separated list of input files in --readFilesIn,…
alexdobin Sep 27, 2021
12beb0c
Issues #535, #1350: fixed a long-standing problem that resulted in a …
alexdobin Sep 30, 2021
23 changes: 17 additions & 6 deletions .gitignore
@@ -1,13 +1,24 @@
*.o
Depend.list

.project
.kdev4/
source.kdev4


# Don't track intermediary files from building the manual
doc/*.aux
doc/*.fdb_latexmk
doc/*.fls
doc/*.log
doc/*.out
doc/*.toc
extras/doc-latex/*.aux
extras/doc-latex/*.fdb_latexmk
extras/doc-latex/*.fls
extras/doc-latex/*.log
extras/doc-latex/*.out
extras/doc-latex/*.gz
extras/doc-latex/*.toc

# Don't track the STAR binary once it has been built
source/STAR
.DS_Store
*.xcworkspacedata
*.pbxproj
*.plist
*.xcuserstate
220 changes: 212 additions & 8 deletions CHANGES.md

Large diffs are not rendered by default.

32 changes: 21 additions & 11 deletions README.md
@@ -1,18 +1,19 @@
STAR 2.7
========
STAR 2.7.9a
==========
Spliced Transcripts Alignment to a Reference
© Alexander Dobin, 2009-2019
© Alexander Dobin, 2009-2021
https://www.ncbi.nlm.nih.gov/pubmed/23104886

AUTHOR/SUPPORT
==============
Alex Dobin, dobin@cshl.edu
Alex Dobin, dobin@cshl.edu </br>
https://github.com/alexdobin/STAR/issues </br>
https://groups.google.com/d/forum/rna-star

HARDWARE/SOFTWARE REQUIREMENTS
==============================
* x86-64 compatible processors
* 64 bit Linux or Mac OS X
* 64 bit Linux or Mac OS X

MANUAL
======
@@ -35,9 +36,9 @@ Download the latest [release from](https://github.com/alexdobin/STAR/releases) a

```bash
# Get latest STAR source from releases
wget https://github.com/alexdobin/STAR/archive/2.7.2a.tar.gz
tar -xzf 2.7.2a.tar.gz
cd STAR-2.7.2a
wget https://github.com/alexdobin/STAR/archive/2.7.9a.tar.gz
tar -xzf 2.7.9a.tar.gz
cd STAR-2.7.9a

# Alternatively, get STAR source using git
git clone https://github.com/alexdobin/STAR.git
@@ -51,17 +52,26 @@ Compile under Linux
cd STAR/source
make STAR
```
For processors that do not support AVX extensions, specify the target SIMD architecture, e.g.
```
make STAR CXXFLAGS_SIMD=sse
```
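
As a rough sketch of how one might choose between the two `make` invocations above on Linux, the snippet below inspects the CPU flags in `/proc/cpuinfo`. The helper `choose_make_cmd` is hypothetical and not part of STAR, and testing for `avx2` specifically (rather than any AVX level) is a conservative assumption; on systems without `/proc/cpuinfo` the flag list is empty and the snippet falls back to the SSE build.

```shell
# Hypothetical helper (not part of STAR): print a make command for a given
# space-separated list of CPU flags.
choose_make_cmd() {
    case " $1 " in
        # Assumption: the default build is safe when avx2 is present.
        *" avx2 "*) echo "make STAR" ;;
        # Otherwise use the SSE target named in the README.
        *)          echo "make STAR CXXFLAGS_SIMD=sse" ;;
    esac
}

# Linux only: take the first "flags" line; empty if the file is missing.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
choose_make_cmd "$cpu_flags"
```

The printed command can then be run from `STAR/source`.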


Compile under Mac OS X
----------------------

```bash
# 1. Install brew (http://brew.sh/)
# 2. Install gcc with brew:
# 2. Install gcc with brew:
$ brew install gcc --without-multilib
# 3. Build STAR:
# run 'make' in the source directory
# note that the path to c++ executable has to be adjusted to its current version
$ cd source
$ make STARforMacStatic CXX=/usr/local/Cellar/gcc/8.2.0/bin/g++-8
# 4. Make it available through the terminal
$ cp STAR /usr/local/bin
```

All platforms - non-standard gcc
@@ -100,5 +110,5 @@ Please contact the author for a list of recommended parameters for much larger o
FUNDING
=======
The development of STAR is supported by the National Human Genome Research Institute of
the National Institutes of Health under Award Number R01HG009318.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
the National Institutes of Health under Award Number R01HG009318.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
140 changes: 118 additions & 22 deletions RELEASEnotes.md
@@ -1,3 +1,99 @@
STAR 2.7.9a --- 2021/05/05 ::: STARsolo updates
=====================================================

* [**Counting *multi-gene* (multimapping) reads**](#multi-gene-reads)
* STARsolo uses the [SIMDe](https://github.com/simd-everywhere/simde) package, which supports different types of SIMD extensions. For processors that do not support AVX extensions, specify the target SIMD architecture, e.g.
```
make STAR CXXFLAGS_SIMD=sse
```


STAR 2.7.8a --- 2021/02/20
===========================
**Major STARsolo updates and many bug fixes**

* [**Cell calling (filtering) similar to CellRanger:**](docs/STARsolo.md#emptydrop-like-filtering)
* ```--soloCellFilter EmptyDrops_CR``` option for cell filtering (calling) nearly identical to that of CellRanger 3 and 4
* ```--runMode soloCellFiltering``` option for cell filtering (calling) of the raw count matrix, without re-mapping
* [**Input from BAM files for STARsolo:**](docs/STARsolo.md#input-reads-from-bam-files)
* Input from unmapped or mapped SAM/BAM for STARsolo, with options ```--soloInputSAMattrBarcodeSeq``` and ```--soloInputSAMattrBarcodeQual``` to specify SAM tags for the barcode read sequence and qualities
* [**Read trimming similar to CellRanger4:**](docs/STARsolo.md#matching-cellranger-4xx-and-5xx-results)
* ```--clipAdapterType CellRanger4``` option for 5' TSO adapter and 3' polyA-tail clipping of the reads to better match CellRanger >= 4.0.0 mapping results
* [**Support for barcodes embedded in mates (such as 10X 5' protocol):**](docs/STARsolo.md#barcode-and-cdna-on-the-same-mate)
* ```--soloBarcodeMate``` to support scRNA-seq protocols in which one of the paired-end mates contains both barcode sequence and cDNA (e.g. 10X 5' protocol)

STAR 2.7.7a --- 2020/12/28
==========================
**Major new feature:
STARconsensus: mapping RNA-seq reads to consensus genome.**

* Provide the VCF file with consensus SNVs and InDels at the genome generation stage with ```--genomeTransformVCF Variants.vcf --genomeTransformType Haploid```.
The alternative alleles in this VCF will be inserted into the reference genome to create a "transformed" genome.
Both the genome sequence and transcript/gene annotations are transformed.

* At the mapping stage, the reads will be mapped to the transformed (consensus) genome.
The quantification in the transformed annotations can be performed with standard ```--quantMode TranscriptomeSAM and/or GeneCounts``` options.
If desired, alignments (SAM/BAM) and spliced junctions (SJ.out.tab) can be transformed back to the original (reference) coordinates with ```--genomeTransformOutput SAM and/or SJ```.
This is useful if downstream processing relies on reference coordinates.
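
The two stages described above might be scripted roughly as follows. This is a dry-run sketch: all file and directory names are hypothetical placeholders, the `run` wrapper is an illustrative helper rather than anything shipped with STAR, and options beyond the `--genomeTransform*` and `--quantMode` flags named in these notes are the usual STAR genome-generation and mapping parameters.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually invoke STAR.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Stage 1: generate the transformed (consensus) genome from the VCF.
run STAR --runMode genomeGenerate \
    --genomeDir consensus_genome \
    --genomeFastaFiles reference.fa \
    --sjdbGTFfile annotations.gtf \
    --genomeTransformVCF Variants.vcf \
    --genomeTransformType Haploid

# Stage 2: map to the consensus genome, quantify genes, and convert
# alignments and junctions back to reference coordinates.
run STAR --genomeDir consensus_genome \
    --readFilesIn reads_1.fastq reads_2.fastq \
    --quantMode GeneCounts \
    --genomeTransformOutput SAM SJ \
    --outFileNamePrefix sample_
```

Omitting `--genomeTransformOutput` leaves the output in the coordinates of the transformed genome.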

STAR 2.7.6a --- 2020/09/19
==========================
**Major new feature:**
Output multimapping chimeric alignments in BAM format using
```
--chimMultimapNmax N>1 --chimOutType WithinBAM --outSAMtype BAM Unsorted [and/or] SortedByCoordinate
```
Many thanks to Sebastian @suhrig who implemented this feature!
More detailed description from Sebastian in PR #802.

STAR 2.7.5a 2020/06/16
======================
**Major new features:
~ support for Plate-based (Smart-seq) scRNA-seq
~ manifest file to list the input reads FASTQ files**

* Typical STAR command for mapping and quantification of plate-based (Smart-seq) scRNA-seq will look like:
```
--soloType SmartSeq --readFilesManifest /path/to/manifest.tsv --soloUMIdedup Exact --soloStrand Unstranded
```
For detailed description, see [Plate-based (Smart-seq) scRNA-seq](docs/STARsolo.md#plate-based-Smart-seq-scRNA-seq)

* A convenient way to list a large number of FASTQ read files and their IDs is to create a manifest file and supply it with `--readFilesManifest /path/to/manifest.tsv`. The manifest file should contain 3 tab-separated columns. For paired-end reads:
```
Read1-file-name \t Read2-file-name \t File-id
```
For single-end reads, the 2nd column should contain the dash - :
```
Read1-file-name \t - \t File-id
```
File-id can be any string without spaces. File-id will be added as ReadGroup tag (*RG:Z:*) for each read in the SAM/BAM output. If File-id starts with *ID:*, it can contain several fields separated by tab, and all the fields will be copied verbatim into SAM *@RG* header line.
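
A manifest in the paired-end layout described above could be generated with a short loop like the one below. The cell IDs and FASTQ file names are made-up placeholders, and the final `awk` check is only a convenience for catching stray spaces or missing columns.

```shell
# Write a 3-column, tab-separated manifest for paired-end reads.
# File names and IDs are hypothetical; adjust the list to your data.
manifest=manifest.tsv
: > "$manifest"
for cell in cellA cellB; do
    # Columns: Read1 file, Read2 file, File-id.
    # For single-end data the 2nd column would be "-" instead.
    printf '%s\t%s\t%s\n' "${cell}_R1.fastq.gz" "${cell}_R2.fastq.gz" "$cell" >> "$manifest"
done

# Sanity check: every line must have exactly 3 tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad }' "$manifest" && echo "manifest OK"
```

The resulting file is what `--readFilesManifest` expects in place of `/path/to/manifest.tsv`.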


STAR 2.7.4a 2020/06/01
======================
This release fixes multiple bugs and issues.
The biggest issue fixed was a seg-fault for small genomes, which previously required scaling down `--genomeSAindexNbases`. Such scaling is still recommended but is no longer required.
**This release requires re-generation of the genome indexes**

STAR 2.7.3a 2019/10/08
======================
Major new features in STARsolo
------------------------------
* **Output enhancements:**
* Summary.csv statistics output for raw and filtered cells useful for quick run quality assessment.
* --soloCellFilter option for basic filtering of the cells, similar to the methods used by CellRanger 2.2.x.
* [**Better compatibility with CellRanger 3.x.x:**](docs/STARsolo.md#matching-cellranger-3xx-results)
* --soloUMIfiltering MultiGeneUMI option introduced in CellRanger 3.x.x for filtering UMI collisions between different genes.
* --soloCBmatchWLtype 1MM_multi_pseudocounts option, introduced in CellRanger 3.x.x, which slightly changes the posterior probability calculation for CB with 1 mismatch.
* [**Velocyto spliced/unspliced/ambiguous quantification:**](docs/STARsolo.md#velocyto-splicedunsplicedambiguous-quantification)
* --soloFeatures Velocyto option to produce Spliced, Unspliced, and Ambiguous counts similar to the [velocyto.py](http://velocyto.org/) tool developed by [LaManno et al](https://doi.org/10.1038/s41586-018-0414-6). This option is under active development and the results may change in the future versions.
* [**Support for complex barcodes, e.g. inDrop:**](docs/STARsolo.md#barcode-geometry)
* Complex barcodes in STARsolo with --soloType CB_UMI_Complex, --soloCBmatchWLtype, --soloAdapterSequence, --soloAdapterMismatchesNmax, --soloCBposition, --soloUMIposition
* [**BAM tags:**](#bam-tags)
* CB/UB for corrected CellBarcode/UMI
* GX/GN for gene ID/name
* STARsolo most up-to-date [documentation](docs/STARsolo.md).

STAR 2.7.2a 2019/08/13
======================

@@ -17,10 +113,10 @@ STAR 2.7.0c 2019/02/05
STARsolo: mapping, demultiplexing and gene quantification for single cell RNA-seq
---------------------------------------------------------------------------------
STARsolo is a turnkey solution for analyzing droplet single cell RNA sequencing data (e.g. 10X Genomics Chromium System) built directly into STAR code.
STARsolo inputs the raw FASTQ reads files, and performs the following operations
STARsolo inputs the raw FASTQ reads files, and performs the following operations
* error correction and demultiplexing of cell barcodes using user-input whitelist
* mapping the reads to the reference genome using the standard STAR spliced read alignment algorithm
* error correction and collapsing (deduplication) of Unique Molecular Identifiers (UMIs)
* error correction and collapsing (deduplication) of Unique Molecular Identifiers (UMIs)
* quantification of per-cell gene expression by counting the number of reads per gene

STARsolo output is designed to be a drop-in replacement for 10X CellRanger gene quantification output.
@@ -29,15 +125,15 @@ At the same time STARsolo is ~10 times faster than the CellRanger.

The STAR solo algorithm is turned on with:
```
--soloType Droplet
```

Presently, the cell barcode whitelist has to be provided with:
```
--soloCBwhitelist /path/to/cell/barcode/whitelist
```

The 10X Chromium whitelist file can be found inside the CellRanger distribution,
e.g. [10X-whitelist](https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist-).
Please make sure that the whitelist is compatible with the specific version of the 10X chemistry (V1,V2,V3 etc).

@@ -105,20 +201,20 @@ If the overlap is found, STAR will map merge the mates and attempt to map the re
If requested, the chimeric detection will be performed on the merged-mate sequence, thus allowing chimeric detection in the overlap region.
If the score of this alignment is higher than the original one, or if a chimeric alignment is found, STAR will report the merged-mate alignment instead of the original one.
In the output, the merged-mate alignment will be converted back to paired-end format.
The development of this algorithm was supported by Illumina, Inc.
Many thanks to June Snedecor, Xiao Chen, and Felix Schlesinger for their extensive help in developing this feature.


**2. Detection of personal variants overlapping alignments.**
Option --varVCFfile /path/to/vcf/file is used to input VCF file with personal variants. Only single nucleotide variants (SNVs) are supported at the moment.
Each variant is expected to have a genotype with two alleles.
To output variants that overlap alignments, vG and vA have to be added to --outSAMattributes list.
SAM attribute vG outputs the genomic coordinate of the variant, allowing for identification of the variant.
SAM attribute vA outputs which allele is detected in the read: 1 or 2 match one of the genotype alleles, 3 - no match to genotype.
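
As a rough illustration of how the two attributes can be read together, the Python sketch below pairs each variant coordinate in vG with the allele code in vA for one SAM record. It assumes the tags are written as SAM B-type integer arrays (for example `vG:B:i,12345` and `vA:B:c,1`); that encoding, and the helper names `parse_b_array` and `variants_in_record`, are assumptions made for this example, so check them against your own output.

```python
# Meaning of the vA codes as described above.
ALLELE_MEANING = {
    1: "matches genotype allele 1",
    2: "matches genotype allele 2",
    3: "no match to genotype",
}

def parse_b_array(field):
    """Parse a SAM B-type optional field such as 'vA:B:c,1,3' into (tag, [ints])."""
    tag, type_code, value = field.split(":", 2)
    if type_code != "B":
        raise ValueError("expected a B-type array field, got: " + field)
    _subtype, *items = value.split(",")  # first item is the array subtype, e.g. 'c' or 'i'
    return tag, [int(item) for item in items]

def variants_in_record(sam_line):
    """Return [(genomic_coordinate, allele_meaning), ...] for one SAM record.

    Assumption: vG and vA list the overlapped variants in the same order.
    """
    tags = {}
    # Optional fields start at the 12th tab-separated column of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(("vA:B:", "vG:B:")):
            tag, values = parse_b_array(field)
            tags[tag] = values
    return [
        (position, ALLELE_MEANING.get(allele, "unknown code"))
        for position, allele in zip(tags.get("vG", []), tags.get("vA", []))
    ]
```

Records that carry neither tag simply yield an empty list.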

**3. WASP filtering of allele specific alignments.**
This is a re-implementation of the original WASP algorithm by Bryce van de Geijn, Graham McVicker, Yoav Gilad & Jonathan K Pritchard. Please cite the original [WASP paper: Nature Methods 12, 1061–1063 (2015)](https://www.nature.com/articles/nmeth.3582).
WASP filtering is activated with --waspOutputMode SAMtag, which will add vW tag to the SAM output:
vW:i:1 means alignment passed WASP filtering, while all other values mean it did not pass.
Many thanks to Bryce van de Geijn for fruitful discussions.
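
A minimal Python sketch of how a downstream script might apply the vW rule to plain-text SAM records follows. The function names are hypothetical, and returning `None` for records without a vW tag is a design choice of this example, since the text above only defines the meaning of tagged alignments.

```python
def wasp_status(sam_line):
    """Return the integer value of the vW tag, or None if the record has no vW tag."""
    # Optional fields start at the 12th tab-separated column of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("vW:i:"):
            return int(field[len("vW:i:"):])
    return None

def passed_wasp(sam_line):
    """True only for vW:i:1; any other vW value means the alignment did not pass."""
    return wasp_status(sam_line) == 1
```

Keeping `wasp_status` separate from `passed_wasp` lets the caller decide how to treat untagged records instead of silently counting them as failures.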

@@ -147,17 +243,17 @@ STAR 2.5.0a 2015/11/06

Major new features:
-------------------
1. It is now possible to add extra sequences to the reference genome on the fly (without re-generating the genome) by specifying
_--genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2_ at the mapping stage.

2. By default, the order of the multi-mapping alignments for each read is not truly random.
The _--outMultimapperOrder Random_ option outputs multiple alignments for each read in random order,
and also randomizes the choice of the primary alignment from the highest scoring alignments.
Parameter _--runRNGseed_ can be used to set the random generator seed.
With this option, the ordering of multi-mapping alignments of each read,
and the choice of the primary alignment will vary from run to run, unless only one thread is used and the seed is kept constant.

3. The --outSAMmultNmax parameter limits the number of output alignments (SAM lines) for multimappers.
For instance, _--outSAMmultNmax 1_ will output exactly one SAM line for each mapped read.


@@ -173,13 +269,13 @@ The counts coincide with those produced by htseq-count with default parameters.
Requires annotations (GTF or GFF with --sjdbGTFfile option) used at the genome generation step, or at the mapping step.

Outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:
column 1: gene ID
column 2: counts for unstranded RNA-seq
column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)
Select the output according to the strandedness of your data.
Note that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads.

With --quantMode TranscriptomeSAM GeneCounts, you get both the Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab outputs.
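
The column choice described above can be sketched in Python as follows. The function name and the strandedness labels are hypothetical, and the sketch assumes a tab-separated file in which summary rows (names starting with `N_`, such as `N_unmapped`) precede the per-gene rows and should be skipped.

```python
import io

# 0-based index of the count column for each library type
# (column 1 of the file, index 0, holds the gene ID).
STRAND_COLUMN = {
    "unstranded": 1,  # column 2
    "forward": 2,     # column 3: 1st read strand aligned with RNA (htseq-count -s yes)
    "reverse": 3,     # column 4: 2nd read strand aligned with RNA (htseq-count -s reverse)
}

def read_gene_counts(handle, strandedness):
    """Return {gene_id: count} using the column that matches the library strandedness."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the summary rows (assumed to start with 'N_').
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[column])
    return counts

# Tiny made-up table, used only to show the call.
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "geneA\t7\t6\t1\n"
    "geneB\t3\t0\t3\n"
)
forward_counts = read_gene_counts(example, "forward")
```

For a real run, pass an open file handle for ReadsPerGene.out.tab instead of the in-memory example.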


@@ -196,7 +292,7 @@ New features:
The on the fly genome indices can be saved (for reuse) with "--sjdbInsertSave All", into _STARgenome directory inside the current run directory.
Default --sjdbOverhang is now set at 100, and does not have to be specified unless you need to change this value.

The "all-sample" 2-pass method can be simplified using this on the fly junction insertion option:
(i) run the 1st pass for all samples as usual, with or without annotations
(ii) run 2nd pass for all samples, listing SJ.out.tab files from all samples in --sjdbFileChrStartEnd /path/to/sj1.tab /path/to/sj2.tab ...
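
Step (ii) can be sketched as a small Python helper that assembles the 2nd-pass argument list for one sample. The helper name, the example paths, and the choice to set an output prefix with `--outFileNamePrefix` are illustrative assumptions; only the `--sjdbFileChrStartEnd` usage comes from the text above.

```python
def second_pass_command(genome_dir, read_files, sj_files, out_prefix):
    """Build the STAR argument list for a 2nd-pass run of one sample.

    sj_files should list the SJ.out.tab files produced by the 1st pass of ALL samples.
    """
    if not sj_files:
        raise ValueError("the 2nd pass needs at least one SJ.out.tab file")
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *read_files,
        "--sjdbFileChrStartEnd", *sj_files,
        "--outFileNamePrefix", out_prefix,
    ]

# Hypothetical layout: two samples, each with its own 1st-pass output directory.
command = second_pass_command(
    "/path/to/genome",
    ["sample1_R1.fastq", "sample1_R2.fastq"],
    ["pass1/sample1/SJ.out.tab", "pass1/sample2/SJ.out.tab"],
    "pass2/sample1/",
)
```

The resulting list can be handed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths.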

@@ -208,12 +304,12 @@ New features:
3. Included link (submodule) to Brian Haas' STAR-Fusion code for detecting fusion transcripts from STAR chimeric output:
https://github.com/STAR-Fusion/STAR-Fusion

4. Included Gery Vessere's shared memory implementation for POSIX and SysV.
To compile STAR with POSIX shared memory, use `make POSIXSHARED`

5. New option "--chimOutType WithinBAM" to include chimeric alignments together with normal alignments in the main (sorted or unsorted) BAM file(s).
Formatting of chimeric alignments follows the latest SAM/BAM specifications. Thanks to Felix Schlesinger for thorough testing of this option.

6. New option "--quantTranscriptomeBan Singleend" allows insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).

7. New option "--alignEndsTypeExtension Extend5pOfRead1" to enforce full extension of the 5p of the read1, while all other ends undergo local alignment and may be soft-clipped.
Binary file modified bin/Linux_x86_64/STAR
Binary file not shown.
Binary file modified bin/Linux_x86_64/STARlong
Binary file not shown.
Binary file modified bin/Linux_x86_64_static/STAR
Binary file not shown.
Binary file modified bin/Linux_x86_64_static/STARlong
Binary file not shown.
Binary file modified bin/MacOSX_x86_64/STAR
100755 → 100644
Binary file not shown.
Binary file modified bin/MacOSX_x86_64/STARlong
100755 → 100644
Binary file not shown.
Binary file modified doc/STARmanual.pdf
Binary file not shown.
13 changes: 13 additions & 0 deletions docs/STARconsensus.md
@@ -0,0 +1,13 @@
STARconsensus: mapping RNA-seq reads to consensus genome.
=========================================================

* Introduced in STAR 2.7.7a (2020/12/28)

* Provide the VCF file with consensus SNVs and InDels at the genome generation stage with ```--genomeTransformVCF Variants.vcf --genomeTransformType Haploid```.
The alternative alleles in this VCF will be inserted into the reference genome to create a "transformed" genome.
Both the genome sequence and transcript/gene annotations are transformed.

* At the mapping stage, the reads will be mapped to the transformed (consensus) genome.
The quantification in the transformed annotations can be performed with standard ```--quantMode TranscriptomeSAM and/or GeneCounts``` options.
If desired, alignments (SAM/BAM) and spliced junctions (SJ.out.tab) can be transformed back to the original (reference) coordinates with ```--genomeTransformOutput SAM and/or SJ```.
This is useful if downstream processing relies on reference coordinates.