
Unsupervised Bootstrapping of G Function with LLM Based Rerankers #9

@Yikai-Liao

Description


Hi MFAR authors,

Thank you for sharing the "Multi-Field Adaptive Retrieval" (ICLR 2025) paper and code! I stopped by your poster the other day and we had a brief chat.

The adaptive field weighting via the G function is a compelling approach for semi-structured retrieval.

Training G typically requires labeled query-document pairs, which may be scarce for new datasets. Could G be trained unsupervised using specialized rerankers, possibly iteratively?

Proposed Approach:

  1. Initial Retrieval: Retrieve candidates using BM25 or a pre-trained dense retriever.
  2. Reranking: Rerank candidates with a fine-tuned reranker (e.g., a strong cross-encoder or 7B-scale LLM reranker from the MTEB leaderboard).
  3. Pseudo-Labels: Use top-ranked documents as pseudo-positives and others as pseudo-negatives.
  4. Training:
    • Initial: Train MFAR (including G) with contrastive loss on pseudo-labels.
    • Iterative: Use the trained MFAR for better retrieval, rerank again, and refine G (a rough sketch of the loop follows this list).
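
To make the proposal concrete, here is a minimal sketch of the bootstrapping loop. It uses `rank_bm25` and a small `sentence-transformers` cross-encoder purely as stand-ins for the initial retriever and reranker; `mfar_train` / `mfar_retrieve`, and the `queries` / `corpus` / `doc_ids` / `num_rounds` variables, are hypothetical placeholders rather than your actual API:

```python
# Sketch only: rank_bm25 and sentence-transformers are real libraries;
# everything prefixed mfar_ is a hypothetical stand-in for the MFAR code.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

# Assumed to exist: queries (list of str), corpus (dict doc_id -> text),
# doc_ids (list of doc ids), num_rounds (int).

def build_pseudo_labels(queries, corpus, retrieve_fn, reranker, k=100, pos_k=3):
    """Retrieve k candidates per query, rerank them, and treat the top
    pos_k as pseudo-positives and the remainder as pseudo-negatives."""
    labels = {}
    for q in queries:
        candidates = retrieve_fn(q, k)  # list of doc ids
        scores = reranker.predict([(q, corpus[d]) for d in candidates])
        ranked = [d for _, d in sorted(zip(scores, candidates), reverse=True)]
        labels[q] = {"pos": ranked[:pos_k], "neg": ranked[pos_k:]}
    return labels

# Round 0: BM25 over the concatenated fields of each document.
tokenized = [corpus[d].split() for d in doc_ids]
bm25 = BM25Okapi(tokenized)

def bm25_retrieve(query, k):
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [doc_ids[i] for i in top]

# Placeholder reranker; a 7B-scale LLM reranker would slot in the same way.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pseudo = build_pseudo_labels(queries, corpus, bm25_retrieve, reranker)

# Iterative refinement: retrain MFAR (including G) on the pseudo-labels,
# then re-retrieve with the trained model and rebuild the labels.
for round_idx in range(num_rounds):
    model = mfar_train(pseudo)  # hypothetical: contrastive training incl. G
    pseudo = build_pseudo_labels(
        queries, corpus,
        lambda q, k: mfar_retrieve(model, q, k),  # hypothetical retrieval call
        reranker,
    )
```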

Have you considered a similar approach? What challenges might arise, e.g., pseudo-label noise and diversity, or convergence of the iterative loop, particularly for semi-structured data?

Thanks for your thoughts!
