Prefilter process is Killed during nucleotide search (--search-type 3)

Dear MMseqs2 team,

I am unable to run nucleotide-vs-nucleotide searches for taxonomic annotation successfully.

**Environment**

- MMseqs2 Version: 18.8cc5c
- OS: Linux (HPC environment)
- Installation method: Conda

**Bug Description**

When performing a nucleotide-vs-nucleotide search (--search-type 3) using a set of assembled contigs against the ref_prok_rep_genomes database, the prefilter subprocess is terminated and reported as Killed.

I have observed this exact behavior with two different approaches, both following the official documentation:

- Using the mmseqs taxonomy easy-workflow.
- Using an explicit, modular workflow (mmseqs createdb -> mmseqs search -> mmseqs lca).

Could this behavior be related to the problems discussed in Issue #932?

**Database Preparation**

For full context, the target MMseqs2 database was created from a local BLAST database (ref_prok_rep_genomes) and the NCBI taxdump, following the standard procedure outlined in the MMseqs2 User Guide.

```
# 1. Download NCBI taxdump
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir taxonomy && tar -xzf taxdump.tar.gz -C taxonomy

# 2. Extract FASTA and mapping file from BLAST DB
blastdbcmd \
    -db ref_prok_rep_genomes \
    -entry all > ref_prok_rep_genomes.fna
blastdbcmd \
    -db ref_prok_rep_genomes \
    -entry all \
    -outfmt "%a %T" > ref_prok_rep_genomes.taxidmapping

# 3. Create the MMseqs2 sequence database
mmseqs createdb \
    ref_prok_rep_genomes.fna \
    ref_prok_rep_genomes_db \
    --dbtype 2

# 4. Create the final taxonomically-annotated database
mmseqs createtaxdb \
    ref_prok_rep_genomes_db \
    tmp_taxdb \
    --ncbi-tax-dump taxonomy/ \
    --tax-mapping-file ref_prok_rep_genomes.taxidmapping
```
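
Before running createtaxdb, the mapping file can be sanity-checked. The following is a minimal sketch, assuming each line should contain exactly two whitespace-separated fields (an accession and a numeric taxid, as produced by -outfmt "%a %T"). The helper name check_taxidmapping is hypothetical and not part of MMseqs2 or BLAST+.

```shell
# Sketch: count malformed lines in a taxid mapping file.
# Assumption: a well-formed line is "<accession> <numeric taxid>".
check_taxidmapping() {
    # $1: path to the mapping file (hypothetical helper)
    awk '
        NF != 2 || $2 !~ /^[0-9]+$/ { bad++ }
        END { printf "lines=%d malformed=%d\n", NR, bad + 0 }
    ' "$1"
}

# Usage:
# check_taxidmapping ref_prok_rep_genomes.taxidmapping
```

A non-zero malformed count would suggest inspecting the file before building the taxonomy database; it does not by itself explain the prefilter failure.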

**Steps to Reproduce**

The query is a standard set of metagenomic contigs.

```
# Step 1: Create query database
mmseqs createdb \
    path/to/contigs.fna \
    path/to/queryDB \
    --compressed 1 \
    --dbtype 2

# Step 2: Perform nucleotide search
mmseqs search \
    path/to/queryDB \
    path/to/ref_prok_rep_genomes_db \
    path/to/search_results.db \
    path/to/tmp_dir \
    --split-memory-limit 250G \
    --max-seq-len 300000000 \
    --search-type 3 \
    -s 4.0 \
    --compress 1

# Step 3: LCA
mmseqs lca \
    path/to/ref_prok_rep_genomes_db \
    path/to/search_results.db \
    path/to/lca.db \
    --tax-lineage 1

# Step 4: Create TSV report
mmseqs createtsv \
    path/to/queryDB \
    path/to/lca.db \
    path/to/tax.tsv \
    --compressed 1

# Step 5: Generate Kraken-style report
mmseqs taxonomyreport \
    path/to/ref_prok_rep_genomes_db \
    path/to/lca.db \
    path/to/tax.report \
    --report-mode 0

# Step 6: Generate Krona report
mmseqs taxonomyreport \
    path/to/ref_prok_rep_genomes_db \
    path/to/lca.db \
    path/to/tax.html \
    --report-mode 1
```

**Observed Behavior**

The workflow fails during the prefilter step. The log output shows that the process is Killed after estimating memory consumption and starting the first of three prefiltering steps.

```
Query database size: 19348 type: Nucleotide
Target split mode. Searching through 3 splits
Estimated memory consumption: 222G
Target database size: 1102829 type: Nucleotide
The output of the prefilter cannot be compressed during target split mode. Prefilter result will not be compressed.
Process prefiltering step 1 of 3

Index table k-mer threshold: 0 at k-mer size 15 
Index table: counting k-mers
[=================================================================]
/path/to/blastp.sh: line 144: 1652760 Killed                  $RUNNER "$MMSEQS" prefilter "$INPUT" "$TARGET" "$TMP_PATH/pref_$STEP" $PREFILTER_PAR -s "$SENS"
Error: Prefilter died
Error: Search step died
```
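
A process reported as Killed by the shell has received SIGKILL, which on Linux often (though not always) comes from the kernel out-of-memory killer. The following is a minimal sketch for checking the kernel log for the PID shown in the log output (1652760). It assumes the common "Killed process <pid> (<name>)" wording, which can differ between kernel versions; find_oom_kill is a hypothetical helper.

```shell
# Sketch: look for kernel OOM-killer messages that mention a given PID.
# Assumption: the kernel log uses the "Killed process <pid> (<name>)" wording.
find_oom_kill() {
    # $1: PID to look for; kernel log text is read from stdin
    grep -E "Killed process $1 \(" || echo "no OOM kill record found for PID $1"
}

# Usage (reading the kernel ring buffer may require elevated privileges):
# dmesg -T | find_oom_kill 1652760
```

On a shared cluster the kernel log is often not readable by regular users; in that case the scheduler's job accounting may be the more accessible place to confirm whether the job exceeded its memory allocation.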

[mmseqs2_contigs.mg.log](https://github.com/user-attachments/files/21679058/mmseqs2_contigs.mg.log)
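
The estimated consumption of 222G is close to the 250G passed via --split-memory-limit. If the memory actually granted to the job is not much larger than that, one thing worth testing is a lower limit that leaves headroom. The following is a minimal sketch; the 70% default is an illustrative assumption, not a documented MMseqs2 recommendation, and suggest_split_limit is a hypothetical helper.

```shell
# Sketch: derive a --split-memory-limit value that leaves headroom below the
# memory granted to the job.
# Assumption: 70% is an illustrative fraction, not an official recommendation.
suggest_split_limit() {
    # $1: memory available to the job, in G; $2: percentage to hand to MMseqs2
    total_gb=$1
    percent=${2:-70}
    echo "$(( total_gb * percent / 100 ))G"
}

# Usage:
# mmseqs search ... --split-memory-limit "$(suggest_split_limit 250)"
```

A lower limit should increase the number of target splits, trading runtime for a smaller peak memory footprint per prefiltering step.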

**Further Questions**

Could you clarify the behavior of the --compress 1 flag? Is it safe to use this flag at every possible step (createdb, search, etc.)?

What are the best practices for nucleotide-vs-nucleotide searches?

Thank you for your help!

Prefilter process is Killed during nucleotide search (--search-type 3) #1024

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Prefilter process is Killed during nucleotide search (--search-type 3) #1024

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions