
Segmentation fault in prefilter during mmseqs easy-cluster (nucl, k=11, cov-mode 1, min-seq-id 0.98) #1031

@a7032018

Summary

mmseqs easy-cluster on a nucleotide FASTA crashes with a segmentation fault during the prefilter step. The crash occurs after the linclust redundancy-removal and alignment steps complete successfully, while the index for prefiltering the reduced database is being built or used.

Command

 /usr/bin/time -v mmseqs easy-cluster INPUT.fasta OUTPUT \
/home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11

Notes:

  • Nucleotide database
  • --cov-mode 1, --min-seq-id 0.98, --cluster-mode 2 (greedy), -k 11
  • --threads 32, --split-memory-limit 500G

Expected behavior

Clustering completes without a crash and writes final cluster outputs.

Actual behavior

Pipeline reaches the prefilter phase of the nucleotide clustering workflow and crashes with:

Segmentation fault (core dumped)
Error: Prefilter step died
Error: Search died

Full (sanitized) log

$ /usr/bin/time -v mmseqs easy-cluster INPUT.fasta OUTPUT /home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11
Create directory /home/USER/nvme/mmseqs2_tmp
easy-cluster INPUT.fasta OUTPUT /home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11

MMseqs Version:                         18.8cc5c
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
Sensitivity                             4
k-mer length                            11
Target search mode                      0
k-score                                 seq:2147483647,prof:2147483647
Alphabet size                           aa:21,nucl:5
Max sequence length                     65535
Max results per query                   20
Split database                          0
Split mode                              2
Split memory limit                      0
Coverage threshold                      0.8
Coverage mode                           1
Compositional bias                      1
Compositional bias scale                1
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Mask lower letter repeating N times     0
Minimum diagonal score                  15
Selected taxa
Include identical seq. id.              false
Spaced k-mers                           1
Preload mode                            0
Pseudo count a                          substitution:1.100,context:1.400
Pseudo count b                          substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads                                 110
Compressed                              0
Verbosity                               3
Add backtrace                           false
Alignment mode                          3
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0.98
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Max reject                              2147483647
Max accept                              2147483647
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Correlation score weight                0
Gap open cost                           aa:11,nucl:5
Gap extension cost                      aa:1,nucl:2
Zdrop                                   40
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Cluster mode                            2
Max connected component depth           1000
Similarity type                         2
Weight file name
Cluster Weight threshold                0.9
Single step clustering                  false
Cascaded clustering steps               3
Cluster reassign                        false
Remove temporary files                  true
Force restart with latest tmp           false
MPI runner
k-mers per sequence                     21
Scale k-mers per sequence               aa:0.000,nucl:0.200
Adjust k-mer length                     false
Shift hash                              67
Include only extendable                 false
Skip repeating k-mers                   false
Database type                           0
Shuffle input database                  true
Createdb mode                           1
Write lookup file                       0
Offset of numeric ids                   0
Use GPU                                 0

createdb INPUT.fasta /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input --createdb-mode 1 --write-lookup 0 --threads 110

Shuffle database cannot be combined with --createdb-mode 1
We recompute with --shuffle 0
Converting sequences
Multiline fasta can not be combined with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to input_h: 0h 0m 0s 14ms
Time for merging to input: 0h 0m 0s 0ms
[4725488] 16s 916ms
Time for merging to input_h: 0h 0m 0s 0ms
Time for merging to input: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 16s 953ms
Create directory /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp
cluster /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp -k 11 --max-seqs 20 -c 0.8 --cov-mode 1 --spaced-kmer-mode 1 --threads 110 --alignment-mode 3 -e 0.001 --min-seq-id 0.98 --cluster-mode 2 --remove-tmp-files 1

Set cluster sensitivity to -s 1.000000
Set cluster iterations to 1
linclust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/clu_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 1 --max-seq-len 10000 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --alph-size aa:21,nucl:5 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 -k 0 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --rescore-mode 0 --filter-hits 0 --sort-results 0 --remove-tmp-files 1 --force-reuse 0

kmermatcher /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:21,nucl:5 --min-seq-id 0.98 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 10000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9

kmermatcher /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:21,nucl:5 --min-seq-id 0.98 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 10000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Database size: 4725489 type: Nucleotide

Generate k-mers list for 1 split
[=================================================================] 100.00% 4.73M 26s 261ms

Adjusted k-mer length 17
Sort kmer 0h 0m 15s 563ms
Sort by rep. sequence 0h 0m 9s 316ms
Time for fill: 0h 0m 31s 139ms
Time for merging to pref: 0h 0m 0s 0ms
Time for processing: 0h 2m 0s 766ms
rescorediagonal /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 110 --compressed 0 -v 3

[=================================================================] 100.00% 4.73M 36s 604ms
Time for merging to pref_rescore1: 0h 0m 1s 742ms
Time for processing: 0h 0m 44s 736ms
clust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_rescore1 /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pre_clust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Clustering mode: Greedy
Total time: 0h 0m 1s 404ms

Size of the sequence database: 4725489
Size of the alignment database: 4725489
Number of clusters: 1468102

Writing results 0h 0m 0s 466ms
Time for merging to pre_clust: 0h 0m 0s 0ms
Time for processing: 0h 0m 2s 457ms
createsubdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/order_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy -v 3 --subdb-mode 1

Time for merging to input_step_redundancy: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 580ms
createsubdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/order_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter1 -v 3 --subdb-mode 1

Time for merging to pref_filter1: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 999ms
filterdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter1 /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter2 --filter-file /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/order_redundancy --threads 110 --compressed 0 -v 3

Filtering using file(s)
[=================================================================] 100.00% 1.47M 1s 555ms
Time for merging to pref_filter2: 0h 0m 0s 651ms
Time for processing: 0h 0m 4s 427ms
align /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter2 /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 1 --max-seq-len 10000 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 110 --compressed 0 -v 3

Compute score, coverage and sequence identity
Query database size: 1468102 type: Nucleotide
Target database size: 1468102 type: Nucleotide
Calculation of alignments
[=================================================================] 100.00% 1.47M 31m 39s 77ms
Time for merging to aln: 0h 0m 0s 567ms
135012489 alignments calculated
1849202 sequence pairs passed the thresholds (0.013697 of overall calculated)
1.259587 hits per query sequence
Time for processing: 0h 31m 41s 417ms
clust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/aln /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/clust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 110 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Clustering mode: Greedy
Total time: 0h 0m 0s 171ms

Size of the sequence database: 1468102
Size of the alignment database: 1468102
Number of clusters: 1307095

Writing results 0h 0m 0s 326ms
Time for merging to clust: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 819ms
mergeclusters /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/clu_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pre_clust /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/clust --threads 110 --compressed 0 -v 3

Clustering step 1
[=================================================================] 100.00% 1.47M 0s 564ms
Clustering step 2
[=================================================================] 100.00% 1.31M 1s 308ms
Write merged clustering
[=================================================================] 100.00% 4.73M 2s 280ms
Time for merging to clu_redundancy: 0h 0m 0s 546ms
Time for processing: 0h 0m 3s 374ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter1 -v 3

Time for processing: 0h 0m 0s 11ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref -v 3

Time for processing: 0h 0m 0s 647ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_rescore1 -v 3

Time for processing: 0h 0m 0s 73ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pre_clust -v 3

Time for processing: 0h 0m 0s 20ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy -v 3

Time for processing: 0h 0m 0s 11ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/input_step_redundancy_h -v 3

Time for processing: 0h 0m 0s 4ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/pref_filter2 -v 3

Time for processing: 0h 0m 0s 301ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/aln -v 3

Time for processing: 0h 0m 0s 44ms
rmdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/linclust/12029077883332159825/clust -v 3

Time for processing: 0h 0m 0s 16ms
createsubdb /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/clu_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/input /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy -v 3 --subdb-mode 1

Time for merging to input_step_redundancy: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 608ms
extractframes /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 110 --compressed 0 -v 3

[=================================================================] 100.00% 1.31M 1s 618ms
Time for merging to query_seqs_h: 0h 0m 0s 789ms
Time for merging to query_seqs: 0h 0m 27s 41ms
Time for processing: 0h 0m 32s 762ms
prefilter /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 1 -k 11 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 20 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 1 --comp-bias-corr 0 --comp-bias-corr-scale 1 --diag-score 0 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 60 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 110 --compressed 0 -v 3

Query database size: 2614190 type: Nucleotide
Estimated memory consumption: 29G
Target database size: 1307095 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 11
Index table: counting k-mers
[=================================================================] 100.00% 1.31M 12s 708ms
Index table: Masked residues: 28023227
Index table: fill
[=================================================================] 100.00% 1.31M 22s 66ms
Index statistics
Entries:          3203420912
DB size:          18362 MB
Avg k-mer size:   763.755062
Top 10 k-mers
    GGTGAGGTCCC 396918
    TAATGCTCCCA 386885
    AAATGACCCGG 381619
    AACATAAACCC 381280
    TAATGTTACAT 381124
    ACGATACATCC 380273
    GGGTGTATTTC 379717
    GGCTGAACCTA 379621
    CAGATAAGGTC 375979
    CTCACGGTGAA 370318
Time for index table init: 0h 0m 39s 802ms
Process prefiltering step 1 of 1

k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2614190
Target db start 1 to 1307095
[=================================================================] 100.00% 2.61M 9h 50m 32s 792ms

0.967954 k-mers per position
26265772 DB matches per sequence
1089329 overflows
15 sequences passed prefiltering per query sequence
20 median result list length
305540 sequences with 0 size result lists
Time for merging to pref: 0h 0m 1s 164ms
Time for processing: 9h 51m 37s 323ms
rescorediagonal /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_ungapped --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 2 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 110 --compressed 0 -v 3

[=================================================================] 100.00% 2.61M 5s 191ms
Time for merging to aln_ungapped: 0h 0m 0s 977ms
Time for processing: 0h 0m 8s 191ms
subtractdbs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_ungapped /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref_subtract --threads 110 --compressed 0 -v 3

[=================================================================] 100.00% 2.61M 0s 792ms
Time for merging to pref_subtract: 0h 0m 1s 12ms
Time for processing: 0h 0m 2s 750ms
align /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/pref_subtract /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_gapped --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.98 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 1 --max-seq-len 10000 --comp-bias-corr 0 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 110 --compressed 0 -v 3

Compute score, coverage and sequence identity
Query database size: 2614190 type: Nucleotide
Target database size: 1307095 type: Nucleotide
Calculation of alignments
[=================================================================] 100.00% 2.61M 1m 28s 861ms
Time for merging to aln_gapped: 0h 0m 0s 932ms
15439105 alignments calculated
48374 sequence pairs passed the thresholds (0.003133 of overall calculated)
0.018504 hits per query sequence
Time for processing: 0h 1m 31s 722ms
concatdbs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_ungapped /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_gapped /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln --preserve-keys --take-larger-entry --threads 110 --compressed 0 -v 3

[=================================================================] 100.00% 2.61M 0s 896ms
[=================================================================] 100.00% 2.61M 1s 720ms
Time for merging to aln: 0h 0m 0s 967ms
Time for processing: 0h 0m 3s 241ms
offsetalignment /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/query_seqs /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_off --chain-alignments 0 --merge-query 1 --search-type 3 --threads 110 --compressed 0 --db-load-mode 0 -v 3

Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 414ms
Writing results to: /home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/aln_off
Invalid database read for id=4294967295, database index=/home/USER/nvme/mmseqs2_tmp/7212019490085045169/clu_tmp/11346147783894810752/input_step_redundancy.index
getSeqLen: local id (4294967295) >= db size (1307095)
Error: Offset step died                                           ] 6.00% 283.74K eta 0s
Error: Search died
Command exited with non-zero status 1
        Command being timed: "mmseqs easy-cluster INPUT.fasta OUTPUT /home/USER/nvme/mmseqs2_tmp --cluster-mode 2 --cov-mode 1 --min-seq-id 0.98 --threads 110 -k 11"
        User time (seconds): 4070647.01
        System time (seconds): 3998.65
        Percent of CPU this job got: 10796%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 10:29:01
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 61698568
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 7828714
        Minor (reclaiming a frame) page faults: 402295918
        Voluntary context switches: 8461717
        Involuntary context switches: 14515385
        Swaps: 0
        File system inputs: 14099428
        File system outputs: 74391361
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1
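
The ID reported in the "Invalid database read" and getSeqLen messages above, 4294967295, equals 2^32 - 1, the largest unsigned 32-bit value. One possible reading is that a lookup returned a "not found" sentinel instead of a real sequence ID; this is an assumption and has not been checked against the MMseqs2 source. A minimal sketch for pulling such lines out of a saved log (the function name find_invalid_ids is hypothetical):

```python
import re

UINT32_MAX = 2**32 - 1  # 4294967295, the largest unsigned 32-bit integer

# Matches the getSeqLen message exactly as it appears in the log above.
GETSEQLEN_RE = re.compile(r"getSeqLen: local id \((\d+)\) >= db size \((\d+)\)")


def find_invalid_ids(log_text):
    """Return (local_id, db_size, is_uint32_max) for each getSeqLen error in log_text."""
    return [
        (int(local_id), int(db_size), int(local_id) == UINT32_MAX)
        for local_id, db_size in GETSEQLEN_RE.findall(log_text)
    ]


# The error line copied from the log above.
sample = "getSeqLen: local id (4294967295) >= db size (1307095)"
hits = find_invalid_ids(sample)
print(hits)
```

A True in the third field only says the ID is the 32-bit maximum; it does not by itself identify which step produced it.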

Environment

  • MMseqs2 version: 17.b804f
  • OS: Ubuntu 22.04.5 LTS
  • Kernel: 6.8.0-65-generic
  • CPU / RAM / NUMA: Dual Intel(R) Xeon(R) Platinum 8180M CPU @ 2.50GHz/512GB/NUMA ON
  • Storage & FS: ~170 TB RAID5, with an 11 TB NVMe SSD RAID as tmp

Input data characteristics

  • Nucleotide FASTA (U-RVDBv31.0.fasta)
  • Multi-line FASTA (tool auto-recomputed with --createdb-mode 1)
  • ~4,725,489 sequences initial DB; reduced to ~1,468,102 after redundancy step
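
The "Avg k-mer size" figure in the index statistics of the log can be reproduced from the other reported numbers. The sketch below assumes the index counts k-mers over the four letters A, C, G, T (the log reports an alphabet size of nucl:5, so the fifth letter is assumed to be excluded); under that assumption the result agrees with the logged value of 763.755062.

```python
K = 11                    # -k 11 from the command line
NUCLEOTIDES = 4           # assumption: A, C, G, T only, fifth alphabet letter excluded
ENTRIES = 3_203_420_912   # "Entries" from the index statistics in the log

# Number of distinct k-mers of length K over a 4-letter alphabet.
distinct_kmers = NUCLEOTIDES ** K

# Average number of index entries per distinct k-mer.
avg_entries_per_kmer = ENTRIES / distinct_kmers

print(distinct_kmers)                  # 4194304
print(round(avg_entries_per_kmer, 6))  # 763.755062
```

With only about 4.2 million distinct 11-mers for 3.2 billion index entries, each k-mer maps to several hundred target positions on average, which is consistent with the "26265772 DB matches per sequence" and the 9h 50m prefilter run time shown in the log.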

Reproducibility notes

  • Crash occurs specifically in prefilter after index table build, with --exact-kmer-matching 1, -k 11, --cov-mode 1, --min-seq-id 0.98.
  • The same error occurs on two other Ubuntu workstations with similar configurations.
  • The segmentation fault also occurs with version 17.b804f, even after running ulimit -s unlimited.
  • The same error occurs when running the steps individually: mmseqs createdb, followed by mmseqs cluster (the error occurs in the cluster step).
  • Reducing --threads from 110 to 32 did not help.
  • Limiting RAM use with "--split-memory-limit 480G" did not help either.
  • /usr/bin/time -v was used only to monitor system load; the error still occurs without it.
  • The input FASTA file passed the Biopython sanity check below.
from Bio import SeqIO

def is_fasta_valid(filename):
    # True if at least one FASTA record can be parsed from the file.
    return any(SeqIO.parse(filename, "fasta"))

print("Valid FASTA?", is_fasta_valid("my_sequences.fasta"))
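
The check above only confirms that at least one record parses. A somewhat stricter pass, sketched here with the standard library only, counts empty records, repeated IDs, and records containing letters outside the IUPAC nucleotide codes. The function name fasta_sanity_report, the alphabet string, and the tiny demo file are all hypothetical, and whether any of these conditions is related to the crash is not known.

```python
import os
import tempfile


def fasta_sanity_report(path, alphabet="ACGTUNRYKMSWBDHV"):
    """Count records, empty records, duplicate IDs, and records with unexpected letters."""
    allowed = set(alphabet + alphabet.lower())
    report = {"records": 0, "empty": 0, "duplicate_ids": 0, "bad_chars": 0}
    seen_ids = set()

    def finish(seq_id, length, has_bad):
        # Fold one completed record into the running counts.
        report["records"] += 1
        report["empty"] += int(length == 0)
        report["bad_chars"] += int(has_bad)
        report["duplicate_ids"] += int(seq_id in seen_ids)
        seen_ids.add(seq_id)

    seq_id, length, has_bad = None, 0, False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    finish(seq_id, length, has_bad)
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                length, has_bad = 0, False
            elif line and seq_id is not None:
                length += len(line)
                has_bad = has_bad or not set(line) <= allowed
    if seq_id is not None:
        finish(seq_id, length, has_bad)
    return report


# Tiny made-up example file: one good record, one empty record,
# and one record that reuses an ID and contains a non-IUPAC letter.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1 first\nACGT\nACGT\n>seq2\n>seq1\nACXT\n")
demo_report = fasta_sanity_report(tmp.name)
os.remove(tmp.name)
print(demo_report)
```

For the real input, the call would be fasta_sanity_report("U-RVDBv31.0.fasta"); the set of seen IDs is held in memory, which should be manageable for a few million records.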

I tried to share my FASTA input, but the fasta.gz file (1.7 GB) is too large for GitHub to accept as an attachment.

Thank you!

