Description of the bug
Hello,
I encountered an error in the `KMERFINDER_DOWNLOAD_REFERENCE` step, specifically during the NCBI API-based retrieval of reference files (e.g., `.gff`, `.fna`). The failure appears to stem from the downloaded content not being validated before it is processed.
Below are my observations:
The `download_reference.py` script fails with `BadZipFile: File is not a zip file` when the NCBI Datasets API returns an HTTP 200 status with a non-ZIP body (such as a JSON error response or an HTML page). The script assumes that every 200 response contains a valid ZIP archive and does not check the response's content type or format (`download_reference.py`, lines 74-76).
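A minimal sketch of the kind of check that could go before extraction, assuming the response body is available as `bytes` and is wrapped in a `BytesIO` before being passed to `zipfile.ZipFile` (the name `zip_bytes` in the traceback suggests something like this, but the actual code at lines 74-76 is not shown here). `ensure_zip_payload` and `NotAZipResponse` are hypothetical names, not part of bacass.

```python
import io
import zipfile
from typing import Optional


class NotAZipResponse(RuntimeError):
    """Hypothetical error: an HTTP 200 body that is not a ZIP archive."""


def ensure_zip_payload(body: bytes, content_type: Optional[str] = None) -> io.BytesIO:
    """Return a seekable buffer over `body` if it is a ZIP archive.

    Otherwise raise NotAZipResponse with a short preview of the body, so a
    JSON or HTML error page produces a readable message instead of
    zipfile.BadZipFile deep inside the extraction code.
    """
    buffer = io.BytesIO(body)
    # is_zipfile inspects the bytes themselves (end-of-central-directory
    # record), so it does not depend on the server's Content-Type header.
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)  # is_zipfile moves the position; rewind for ZipFile
        return buffer
    preview = body[:200].decode("utf-8", errors="replace")
    raise NotAZipResponse(
        f"Expected a ZIP archive but got Content-Type={content_type!r}; "
        f"first bytes: {preview!r}"
    )
```

Checking the bytes rather than trusting the header is a deliberate choice here: the header (assumed to be available from whatever HTTP client the script uses) is only reported in the error message, since a server could in principle label an error page with a ZIP content type.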
nf-core/bacass version: v2.5.0, run on an HPC system with a Lustre file system.
This bug blocks the entire kmerfinder workflow, preventing:
- Reference-based QUAST analysis
- LIFTOFF annotation transfer
- Pipeline completion with proper reference genome assessment
Here is the relevant part of the error message:
```
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/bacass] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_BACASS:BACASS:KMERFINDER_SUMMARY_DOWNLOAD:KMERFINDER_DOWNLOAD_REFERENCE (NFCORE_BACASS:BACASS:KMERFINDER_SUMMARY_DOWNLOAD:KMERFINDER_DOWNLOAD_REFERENCE)'

Caused by:
  Process `NFCORE_BACASS:BACASS:KMERFINDER_SUMMARY_DOWNLOAD:KMERFINDER_DOWNLOAD_REFERENCE (NFCORE_BACASS:BACASS:KMERFINDER_SUMMARY_DOWNLOAD:KMERFINDER_DOWNLOAD_REFERENCE)` terminated with an error exit status (1)

Command error:
  Traceback (most recent call last):
    File "/home/user/.nextflow/assets/nf-core/bacass/bin/download_reference.py", line 157, in <module>
      sys.exit(main())
    File "/home/user/.nextflow/assets/nf-core/bacass/bin/download_reference.py", line 148, in main
      _extract_files(zip_bytes, acc, out_dir)
    File "/home/user/.nextflow/assets/nf-core/bacass/bin/download_reference.py", line 97, in _extract_files
      with zipfile.ZipFile(zip_bytes) as zf:
    File "/usr/local/lib/python3.10/zipfile.py", line 1258, in __init__
      self._RealGetContents()
    File "/usr/local/lib/python3.10/zipfile.py", line 1325, in _RealGetContents
      raise BadZipFile("File is not a zip file")
  zipfile.BadZipFile: File is not a zip file
```
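The same exception can be reproduced outside the pipeline by handing `zipfile.ZipFile` a non-ZIP body, which supports the diagnosis that the API returned something other than an archive. The JSON payload below is a made-up stand-in; the actual body NCBI returned in this run was not captured. `reproduce_bad_zip` is a hypothetical helper for illustration only.

```python
import io
import json
import zipfile


def reproduce_bad_zip(body: bytes) -> str:
    """Try to open `body` as a ZIP archive and report what happens."""
    try:
        with zipfile.ZipFile(io.BytesIO(body)):
            return "opened as ZIP"
    except zipfile.BadZipFile as exc:
        return f"{type(exc).__name__}: {exc}"


# Hypothetical error payload standing in for a JSON response served with HTTP 200.
fake_body = json.dumps({"error": "example error message"}).encode("utf-8")
print(reproduce_bad_zip(fake_body))  # prints "BadZipFile: File is not a zip file"
```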
Command used and terminal output
```
nextflow run nf-core/bacass \
    -profile singularity \
    -r 2.5.0 \
    --input /lustre/user/project_file/nanopore_run/fastq_pass/merged_fastq/samplesheet_bacass_exp2.tsv \
    --outdir /lustre/user/project_file/nanopore_run/bacass_op_exp2_run5/ \
    --skip_toulligqc \
    --skip_fastqc \
    --skip_nanoplot \
    --assembly_type long \
    --assembler dragonflye \
    --kmerfinderdb /lustre/user/project_file/databases/kmerfinder/20190108_stable_dirs/bacteria \
    --dragonflye_args "--gsize 5m" \
    --polish_method medaka \
    --annotation_tool prokka \
    --kraken2db /lustre/user/project_file/databases/kraken_tar/k2_standard_8gb_20210517.tar \
    --busco_mode genome \
    --busco_db_path /lustre/user/project_file/databases/kmerfinder/20190108_stable_dirs/bacteria/databases/busco/bacteria_odb10 \
```
Relevant files
No response
System information
No response