RudraMantri123 commented Nov 17, 2025

Summary

  • Reproduced the MS MARCO passage BM25 baseline using Anserini’s prebuilt msmarco-v1-passage index (JDK 21 on macOS).
  • Evaluated the run with trec_eval and confirmed the documented MRR@10 of 0.1875.
  • Downloaded the 26 GB msmarco-v1-passage.bge-base-en-v1.5.hnsw index and ran SearchHnswDenseVectors, producing run.msmarco-passage.dev.bge.txt.
  • Verified the dense run with trec_eval (MRR@10 0.3521).
  • Added the 2025-11-19 reproduction log entry to both docs/experiments-msmarco-passage.md and docs/start-here.md.
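
For readers who want to sanity-check the MRR@10 figures reported above without trec_eval, the metric can be sketched in a few lines of Python. This is a minimal illustration, not the trec_eval implementation: it assumes a six-column TREC-format run file (`qid Q0 docid rank score tag`) and a four-column qrels file (`qid 0 docid rel`), the function names are hypothetical, and ties in score are kept in file order, which may differ from trec_eval's tie-breaking.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each query id to the set of doc ids judged relevant (rel > 0)."""
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _, docid, rel = parts
        if int(rel) > 0:
            relevant[qid].add(docid)
    return relevant


def parse_run(lines):
    """Map each query id to its doc ids, ordered by descending score."""
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, docid, _rank, score, _tag = parts
        scored[qid].append((float(score), docid))
    return {
        qid: [docid for _, docid in sorted(hits, key=lambda h: -h[0])]
        for qid, hits in scored.items()
    }


def mrr_at_k(relevant, ranked, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k.

    Averages over every query that has at least one relevant judgment;
    queries with no relevant hit in the top k contribute 0.
    """
    total = 0.0
    for qid, rel_docs in relevant.items():
        for rank, docid in enumerate(ranked.get(qid, [])[:k], start=1):
            if docid in rel_docs:
                total += 1.0 / rank
                break
    return total / len(relevant) if relevant else 0.0
```

In practice the two parsers would be fed open file handles, for example `parse_run(open("run.msmarco-passage.dev.bge.txt"))`, together with the dev-set qrels file. Small differences from the trec_eval output are possible because of the tie-breaking simplification noted above.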

Setup

  • macOS (Apple Silicon)
  • Homebrew OpenJDK 21 (/opt/homebrew/opt/openjdk@21)
  • Anserini 1.3.0 fatjar (prebuilt)
  • Prebuilt indexes cached under ~/.cache/pyserini/indexes

Notes

  • Dense-index download (26 GB) succeeded after running via bin/run.sh with the prebuilt fatjar path fix.
  • No code changes beyond the run script adjustment and documentation update were needed.

RudraMantri123 changed the title from "Add RudraMantri123 reproduction log entry" to the following on Nov 17, 2025:

Results reproduced by @RudraMantri123 on 2025-11-17 (commit 9406dd8)

Setup:

  • Operating System: macOS 14 (Darwin 24.6.0)
  • Python: 3.13.6
  • Commit: 9406dd893e02922cf2b690a8fabf181a14d36bf4

Status: Everything worked successfully

  • Successfully downloaded and verified MS MARCO passage collection (MD5: 31644046b18952c1386cd4564ba2ae69)
  • Converted collection from TSV to JSONL format (9 files, 8,841,823 documents)
  • Filtered queries successfully (6,980 queries in dev.small)
  • All verification steps passed

Changes:

  • Added reproduction log entry to docs/start-here.md
  • Added reproduction log entry to docs/experiments-msmarco-passage.md

RudraMantri123 then changed the title to "Results reproduced by @RudraMantri123" on Nov 17, 2025.

lintool (Member) commented Nov 22, 2025

Please fix conflicts.
