Summary
`mmseqs easy-cluster` on a nucleotide FASTA crashes with a segmentation fault during the prefilter step. The crash occurs after the linclust redundancy removal and alignment steps complete successfully, while building/using the index for prefiltering the reduced database.
Command
/usr/bin/time -v mmseqs easy-cluster INPUT.fasta OUTPUT \
/home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11
Notes:
- Nucleotide database
- `--cov-mode 1`, `--min-seq-id 0.98`, `--cluster-mode 2` (greedy), `-k 11`
- `--threads 32`, `--split-memory-limit 500G`
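To reproduce outside `easy-cluster`, the same parameters can be passed to `mmseqs createdb` and `mmseqs cluster` separately. The sketch below only builds and prints the two command lines; the flag values are copied from this report, while the database and tmp paths (`inputDB`, `cluDB`, `tmp`) and the helper name `build_commands` are placeholders chosen for illustration.

```python
import shlex

# Flag values copied from the command in this report.
CLUSTER_FLAGS = ["--cluster-mode", "2", "--cov-mode", "1",
                 "--min-seq-id", "0.98", "-k", "11"]


def build_commands(fasta, db, clu, tmp, threads=110):
    """Return the createdb and cluster invocations as argument lists."""
    createdb = ["mmseqs", "createdb", fasta, db]
    cluster = (["mmseqs", "cluster", db, clu, tmp]
               + CLUSTER_FLAGS + ["--threads", str(threads)])
    return [createdb, cluster]


# Placeholder paths; print the commands instead of running them.
for cmd in build_commands("INPUT.fasta", "inputDB", "cluDB", "tmp"):
    print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` to execute it, which makes it easy to see which of the two steps fails.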
Expected behavior
Clustering completes without a crash and writes final cluster outputs.
Actual behavior
The pipeline reaches the `prefilter` phase of the nucleotide clustering workflow and crashes with:
Segmentation fault (core dumped)
Error: Prefilter step died
Error: Search died
Full (sanitized) log
$ /usr/bin/time -v mmseqs easy-cluster INPUT.fasta OUTPUT /home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11
Create directory /home/USER/nvme/mmseqs2_tmp
easy-cluster INPUT.fasta OUTPUT /home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11
MMseqs Version: 18.8cc5c
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4
k-mer length 11
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max sequence length 65535
Max results per query 20
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0.8
Coverage mode 1
Compositional bias 1
Compositional bias scale 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 15
Selected taxa
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads 110
Compressed 0
Verbosity 3
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.98
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max reject 2147483647
Max accept 2147483647
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Cluster mode 2
Max connected component depth 1000
Similarity type 2
Weight file name
Cluster Weight threshold 0.9
Single step clustering false
Cascaded clustering steps 3
Cluster reassign false
Remove temporary files true
Force restart with latest tmp false
MPI runner
k-mers per sequence 21
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Shift hash 67
Include only extendable false
Skip repeating k-mers false
Database type 0
Shuffle input database true
Createdb mode 1
Write lookup file 0
Offset of numeric ids 0
Use GPU 0
createdb INPUT.fasta /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input --createdb-mode 1 --write-lookup 0 --threads 110
Shuffle database cannot be combined with --createdb-mode 1
We recompute with --shuffle 0
Converting sequences
Multiline fasta can not be combined with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to input_h: 0h 0m 0s 14ms
Time for merging to input: 0h 0m 0s 0ms
[4725488] 16s 916ms
Time for merging to input_h: 0h 0m 0s 0ms
Time for merging to input: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 16s 953ms
Create directory /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp
cluster /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp -k 11 --max-seqs 20 -c 0.8 --cov-mode 1 --spaced-kmer-mode 1 --threads 110 --alignment-mode 3 -e 0.001 --min-seq-id 0.98 --cluster-mode 2 --remove-tmp-files 1
Set cluster sensitivity to -s 1.000000
Set cluster iterations to 1
linclust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/clu_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 1 --max-seq-len 10000 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --alph-size aa:21,nucl:5 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 -k 0 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --rescore-mode 0 --filter-hits 0 --sort-results 0 --remove-tmp-files 1 --force-reuse 0
kmermatcher /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:21,nucl:5 --min-seq-id 0.98 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 10000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9
kmermatcher /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:21,nucl:5 --min-seq-id 0.98 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 10000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9
Database size: 4725489 type: Nucleotide
Generate k-mers list for 1 split
[=================================================================] 100.00% 4.73M 26s 261ms
Adjusted k-mer length 17
Sort kmer 0h 0m 15s 563ms
Sort by rep. sequence 0h 0m 9s 316ms
Time for fill: 0h 0m 31s 139ms
Time for merging to pref: 0h 0m 0s 0ms
Time for processing: 0h 2m 0s 766ms
rescorediagonal /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 110 --compressed 0 -v 3
[=================================================================] 100.00% 4.73M 36s 604ms
Time for merging to pref_rescore1: 0h 0m 1s 742ms
Time for processing: 0h 0m 44s 736ms
clust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_rescore1 /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pre_clust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9
Clustering mode: Greedy
Total time: 0h 0m 1s 404ms
Size of the sequence database: 4725489
Size of the alignment database: 4725489
Number of clusters: 1468102
Writing results 0h 0m 0s 466ms
Time for merging to pre_clust: 0h 0m 0s 0ms
Time for processing: 0h 0m 2s 457ms
createsubdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/order_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy -v 3 --subdb-mode 1
Time for merging to input_step_redundancy: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 580ms
createsubdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/order_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter1 -v 3 --subdb-mode 1
Time for merging to pref_filter1: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 999ms
filterdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter1 /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter2 --filter-file /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/order_redundancy --threads 110 --compressed 0 -v 3
Filtering using file(s)
[=================================================================] 100.00% 1.47M 1s 555ms
Time for merging to pref_filter2: 0h 0m 0s 651ms
Time for processing: 0h 0m 4s 427ms
align /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter2 /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 1 --max-seq-len 10000 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 110 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 1468102 type: Nucleotide
Target database size: 1468102 type: Nucleotide
Calculation of alignments
[=================================================================] 100.00% 1.47M 31m 39s 77ms
Time for merging to aln: 0h 0m 0s 567ms
135012489 alignments calculated
1849202 sequence pairs passed the thresholds (0.013697 of overall calculated)
1.259587 hits per query sequence
Time for processing: 0h 31m 41s 417ms
clust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/aln /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/clust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9
Clustering mode: Greedy
Total time: 0h 0m 0s 171ms
Size of the sequence database: 1468102
Size of the alignment database: 1468102
Number of clusters: 1307095
Writing results 0h 0m 0s 326ms
Time for merging to clust: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 819ms
mergeclusters /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/clu_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pre_clust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/clust --threads 110 --compressed 0 -v 3
Clustering step 1
[=================================================================] 100.00% 1.47M 0s 564ms
Clustering step 2
[=================================================================] 100.00% 1.31M 1s 308ms
Write merged clustering
[=================================================================] 100.00% 4.73M 2s 280ms
Time for merging to clu_redundancy: 0h 0m 0s 546ms
Time for processing: 0h 0m 3s 374ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter1 -v 3
Time for processing: 0h 0m 0s 11ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref -v 3
Time for processing: 0h 0m 0s 647ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_rescore1 -v 3
Time for processing: 0h 0m 0s 73ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pre_clust -v 3
Time for processing: 0h 0m 0s 20ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy -v 3
Time for processing: 0h 0m 0s 11ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy_h -v 3
Time for processing: 0h 0m 0s 4ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter2 -v 3
Time for processing: 0h 0m 0s 301ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/aln -v 3
Time for processing: 0h 0m 0s 44ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/clust -v 3
Time for processing: 0h 0m 0s 16ms
createsubdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/clu_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy -v 3 --subdb-mode 1
Time for merging to input_step_redundancy: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 608ms
extractframes /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 110 --compressed 0 -v 3
[=================================================================] 100.00% 1.31M 1s 618ms
Time for merging to query_seqs_h: 0h 0m 0s 789ms
Time for merging to query_seqs: 0h 0m 27s 41ms
Time for processing: 0h 0m 32s 762ms
prefilter /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 1 -k 11 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 20 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 1 --comp-bias-corr 0 --comp-bias-corr-scale 1 --diag-score 0 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 60 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 110 --compressed 0 -v 3
Query database size: 2614190 type: Nucleotide
Estimated memory consumption: 29G
Target database size: 1307095 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 11
Index table: counting k-mers
[=================================================================] 100.00% 1.31M 12s 708ms
Index table: Masked residues: 28023227
Index table: fill
[=================================================================] 100.00% 1.31M 22s 66ms
Index statistics
Entries: 3203420912
DB size: 18362 MB
Avg k-mer size: 763.755062
Top 10 k-mers
GGTGAGGTCCC 396918
TAATGCTCCCA 386885
AAATGACCCGG 381619
AACATAAACCC 381280
TAATGTTACAT 381124
ACGATACATCC 380273
GGGTGTATTTC 379717
GGCTGAACCTA 379621
CAGATAAGGTC 375979
CTCACGGTGAA 370318
Time for index table init: 0h 0m 39s 802ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2614190
Target db start 1 to 1307095
[=================================================================] 100.00% 2.61M 9h 50m 32s 792ms
0.967954 k-mers per position
26265772 DB matches per sequence
1089329 overflows
15 sequences passed prefiltering per query sequence
20 median result list length
305540 sequences with 0 size result lists
Time for merging to pref: 0h 0m 1s 164ms
Time for processing: 9h 51m 37s 323ms
rescorediagonal /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_ungapped --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 2 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 110 --compressed 0 -v 3
[=================================================================] 100.00% 2.61M 5s 191ms
Time for merging to aln_ungapped: 0h 0m 0s 977ms
Time for processing: 0h 0m 8s 191ms
subtractdbs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_ungapped /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref_subtract --threads 110 --compressed 0 -v 3
[=================================================================] 100.00% 2.61M 0s 792ms
Time for merging to pref_subtract: 0h 0m 1s 12ms
Time for processing: 0h 0m 2s 750ms
align /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref_subtract /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_gapped --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 1 --max-seq-len 10000 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 110 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2614190 type: Nucleotide
Target database size: 1307095 type: Nucleotide
Calculation of alignments
[=================================================================] 100.00% 2.61M 1m 28s 861ms
Time for merging to aln_gapped: 0h 0m 0s 932ms
15439105 alignments calculated
48374 sequence pairs passed the thresholds (0.003133 of overall calculated)
0.018504 hits per query sequence
Time for processing: 0h 1m 31s 722ms
concatdbs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_ungapped /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_gapped /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln --preserve-keys --take-larger-entry --threads 110 --compressed 0 -v 3
[=================================================================] 100.00% 2.61M 0s 896ms
[=================================================================] 100.00% 2.61M 1s 720ms
Time for merging to aln: 0h 0m 0s 967ms
Time for processing: 0h 0m 3s 241ms
offsetalignment /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_off --chain-alignments 0 --merge-query 1 --search-type 3 --threads 110 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 414ms
Writing results to: /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_off
Invalid database read for id=4294967295, database index=/home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy.index
getSeqLen: local id (4294967295) >= db size (1307095)
Error: Offset step died ] 6.00% 283.74K eta 0s
Error: Search died
Command exited with non-zero status 1
Command being timed: "mmseqs easy-cluster INPUT.fasta OUTPUT /home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11"
User time (seconds): 4070647.01
System time (seconds): 3998.65
Percent of CPU this job got: 10796%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:29:01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 61698568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 7828714
Minor (reclaiming a frame) page faults: 402295918
Voluntary context switches: 8461717
Involuntary context switches: 14515385
Swaps: 0
File system inputs: 14099428
File system outputs: 74391361
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
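Two numbers in the log above can be checked with simple arithmetic. The ID in the `Invalid database read` message, 4294967295, equals 2^32 - 1, which is also what -1 becomes when stored in an unsigned 32-bit integer. Whether MMseqs2 uses that value as a "not found" marker at this point is an assumption, not something the log confirms. The index statistics are also self-consistent: the entry count divided by 4^11 reproduces the reported average, which is consistent with a 4-letter alphabet at k = 11.

```python
UINT32_MAX = 2**32 - 1

bad_id = 4294967295      # from "Invalid database read for id=..."
db_size = 1307095        # from "getSeqLen: local id (...) >= db size (...)"

print(bad_id == UINT32_MAX)           # largest unsigned 32-bit value
print((-1) & 0xFFFFFFFF == bad_id)    # -1 reinterpreted as unsigned 32-bit
print(bad_id >= db_size)              # the bounds violation that getSeqLen reports

entries = 3203420912     # "Entries" in the index statistics
k = 11                   # "-k 11"
avg_per_kmer = entries / 4**k         # 4**11 possible k-mers over A, C, G, T
print(round(avg_per_kmer, 6))         # compare with "Avg k-mer size: 763.755062"
```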
Environment
- MMseqs2 version: `17.b804f`
- OS: Ubuntu 22.04.5 LTS
- Kernel: 6.8.0-65-generic
- CPU / RAM / NUMA: Dual Intel(R) Xeon(R) Platinum 8180M CPU @ 2.50GHz/512GB/NUMA ON
- Storage & FS: ~170 TB RAID5, with an 11 TB NVMe SSD RAID as tmp
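As a rough check against memory exhaustion, the peak resident set size from the `time -v` output can be compared with the installed RAM. This is only a sketch: it assumes the `kbytes` figure is in units of 1024 bytes and that "512GB" means 512 GiB.

```python
max_rss_kib = 61698568   # "Maximum resident set size (kbytes)" from time -v
ram_gib = 512            # from the Environment list; GiB is an assumption

peak_gib = max_rss_kib / 1024**2
print(f"peak RSS: {peak_gib:.1f} GiB ({peak_gib / ram_gib:.0%} of RAM)")
```

Under those assumptions the run peaked well below the installed memory, which would make a plain out-of-memory condition an unlikely explanation.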
Input data characteristics
- Nucleotide FASTA (`U-RVDBv31.0.fasta`)
- Multi-line FASTA (tool auto-recomputed with `--createdb-mode 1`)
- ~4,725,489 sequences initial DB; reduced to ~1,468,102 after redundancy step
Reproducibility notes
- Crash occurs specifically in `prefilter` after index table build, with `--exact-kmer-matching 1`, `-k 11`, `--cov-mode 1`, `--min-seq-id 0.98`.
- The same error occurs on two other Ubuntu workstations with similar configurations.
- The segmentation fault occurs with version `17.b804f` even after `ulimit -s unlimited` was issued.
- The same error occurs when using the "step-by-step" commands: mmseqs createdb, followed by mmseqs cluster (the error occurs here).
- Reducing `--threads 110` to `--threads 32` did not help.
- I also tried to limit RAM use with `--split-memory-limit 480G`, but it did not help.
- `time -v` was used to check the system load. The error still occurs without `time -v`.
- The input FASTA file passed a sanity check with the Biopython code below and is valid.
from Bio import SeqIO

def is_fasta_valid(filename):
    # True if at least one record can be parsed from the file
    return any(SeqIO.parse(filename, "fasta"))

print("Valid FASTA?", is_fasta_valid("my_sequences.fasta"))
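The check above only confirms that at least one record parses. A stricter scan, sketched here with the standard library only, could also flag empty records, duplicate IDs, and characters outside the IUPAC nucleotide codes. The function name `check_fasta` and the `VALID` set are illustrative choices, and it is not known whether any of these conditions is related to the crash.

```python
import io

VALID = set("ACGTUNRYKMSWBDHV")  # IUPAC nucleotide codes; extend if gaps are expected


def check_fasta(handle):
    """Scan a FASTA stream and return (record_count, list_of_problems)."""
    problems, seen = [], set()
    count, header, length = 0, None, 0
    for lineno, line in enumerate(handle, 1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None and length == 0:
                problems.append(f"empty sequence: {header}")
            parts = line[1:].split()
            header = parts[0] if parts else ""
            length = 0
            count += 1
            if header in seen:
                problems.append(f"duplicate ID: {header}")
            seen.add(header)
        elif header is None:
            problems.append(f"line {lineno}: sequence data before first header")
        else:
            bad = set(line.upper()) - VALID
            if bad:
                problems.append(f"line {lineno}: unexpected characters {sorted(bad)}")
            length += len(line)
    if header is not None and length == 0:
        problems.append(f"empty sequence: {header}")
    return count, problems


# Small in-memory demo with one empty record, one duplicate ID and one gap character.
demo = io.StringIO(">a\nACGT\n>b\n>a\nAC-T\n")
print(check_fasta(demo))
```

For the real input, the same function can be called on an open file handle, for example `with open("my_sequences.fasta") as fh: print(check_fasta(fh))`.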
I tried to share my FASTA input, but the fasta.gz file is too big (1.7 GB) for GitHub to accept.
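One way to get a file small enough to attach is to copy only the first N records into a new FASTA and test whether that subset still fails. The sketch below does the copying; `take_records` is a hypothetical helper name, and whether a truncated input reproduces the crash is untested.

```python
import io


def take_records(src, dst, n):
    """Copy the first n FASTA records from src to dst; return records written."""
    written = 0
    for line in src:
        if line.startswith(">"):
            if written == n:
                break
            written += 1
        if written:  # skip anything before the first header
            dst.write(line)
    return written


# In-memory demo: keep the first two of three records.
src = io.StringIO(">r1\nACGT\nACGT\n>r2\nGG\n>r3\nTT\n")
dst = io.StringIO()
print(take_records(src, dst, 2))
print(dst.getvalue(), end="")
```

With real files, `src` would be `open("INPUT.fasta")` and `dst` could be `gzip.open("subset.fasta.gz", "wt")` so the result is already compressed for upload.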
Thank you!