
Unsupervised Bootstrapping of G Function with LLM Based Rerankers #9

@Yikai-Liao

Description


Hi MFAR authors,

Thank you for sharing the "Multi-Field Adaptive Retrieval" (ICLR 2025) paper and code! I stopped by your poster the other day and we had a brief chat.

The adaptive field weighting via the G function is a compelling approach for semi-structured retrieval.

Training G typically requires labeled query-document pairs, which may be scarce for new datasets. Could G be trained unsupervised using specialized rerankers, possibly iteratively?

Proposed Approach:

  1. Initial Retrieval: Retrieve candidates using BM25 or a pre-trained dense retriever.
  2. Reranking: Rerank candidates with a fine-tuned reranker (e.g., a strong cross-encoder or 7B-scale LLM reranker from the MTEB leaderboard).
  3. Pseudo-Labels: Use top-ranked documents as pseudo-positives and others as pseudo-negatives.
  4. Training:
    • Initial: Train MFAR (including G) with contrastive loss on pseudo-labels.
    • Iterative: Use the trained MFAR for better retrieval, rerank again, and refine G (a rough sketch of the loop follows this list).
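
To make the proposal concrete, here is a minimal sketch of the bootstrapping loop. It uses `rank_bm25` and a small `sentence-transformers` cross-encoder purely as stand-ins for the initial retriever and reranker; `mfar_train` / `mfar_retrieve`, and the `queries` / `corpus` / `doc_ids` / `num_rounds` variables, are hypothetical placeholders rather than your actual API:

```python
# Sketch only: rank_bm25 and sentence-transformers are real libraries;
# everything prefixed mfar_ is a hypothetical stand-in for the MFAR code.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

# Assumed to exist: queries (list of str), corpus (dict doc_id -> text),
# doc_ids (list of doc ids), num_rounds (int).

def build_pseudo_labels(queries, corpus, retrieve_fn, reranker, k=100, pos_k=3):
    """Retrieve k candidates per query, rerank them, and treat the top
    pos_k as pseudo-positives and the remainder as pseudo-negatives."""
    labels = {}
    for q in queries:
        candidates = retrieve_fn(q, k)  # list of doc ids
        scores = reranker.predict([(q, corpus[d]) for d in candidates])
        ranked = [d for _, d in sorted(zip(scores, candidates), reverse=True)]
        labels[q] = {"pos": ranked[:pos_k], "neg": ranked[pos_k:]}
    return labels

# Round 0: BM25 over the concatenated fields of each document.
tokenized = [corpus[d].split() for d in doc_ids]
bm25 = BM25Okapi(tokenized)

def bm25_retrieve(query, k):
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [doc_ids[i] for i in top]

# Placeholder reranker; a 7B-scale LLM reranker would slot in the same way.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pseudo = build_pseudo_labels(queries, corpus, bm25_retrieve, reranker)

# Iterative refinement: retrain MFAR (including G) on the pseudo-labels,
# then re-retrieve with the trained model and rebuild the labels.
for round_idx in range(num_rounds):
    model = mfar_train(pseudo)  # hypothetical: contrastive training incl. G
    pseudo = build_pseudo_labels(
        queries, corpus,
        lambda q, k: mfar_retrieve(model, q, k),  # hypothetical retrieval call
        reranker,
    )
```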

Have you considered a similar approach? What challenges might arise, e.g., pseudo-label noise and diversity, or convergence of the iterative loop, particularly for semi-structured data?

Thanks for your thoughts!
