Breaking the Top-k Assumption in Pseudo-Relevance Feedback: An Empirical Analysis of Relevance Distribution in BM25 Rankings

Khaled Albishre

doi:10.48084/etasr.18857

Authors

Khaled Albishre Department of Computer Science, University College of Al Jamoum, Umm Al-Qura University, Al Jumum, Saudi Arabia

Volume: 16 | Issue: 4 | Pages: 37231-37238 | August 2026 | https://doi.org/10.48084/etasr.18857

Received: 21 March 2026 | Revised: 27 April 2026 and 16 May 2026 | Accepted: 17 May 2026 | Online: 17 June 2026

Corresponding author: Khaled Albishre

Abstract

Pseudo-Relevance Feedback (PRF) has remained a cornerstone of unsupervised retrieval since Rocchio (1971), yet the foundational assumption that the top-k retrieved documents are the best available feedback has received limited direct empirical scrutiny, despite its widespread adoption in both classical and neural approaches. This study analyzed BM25 retrieval on the TREC Deep Learning 2019 and 2020 test collections (97 queries, 20,646 graded relevance judgements) and found that 79% of relevant documents fall outside the top-10, with a mean rank of 37.3. An oracle selection strategy achieved 0.324 higher feedback precision at k, defined as the proportion of graded-relevant documents within the k documents selected for expansion, with a large effect size that is consistent across all tested values of k and both test collections. LLM-based analysis of 61 extreme cases identified vocabulary gap as the dominant failure mode in 96.7% of cases, driven primarily by implicit relevance (35.6%) and hypernym-hyponym mismatch (27.1%). These findings establish that document selection, rather than term weighting, is the primary lever for PRF improvement and identify the vocabulary gap as the principal target for next-generation methods. The results demonstrate that improving feedback-document selection represents a largely unexplored avenue for PRF advancement.
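
As a rough illustration of the feedback precision measure defined in the abstract, the following Python sketch compares plain top-k selection with an oracle that prefers judged-relevant documents from the retrieved pool. The function names, the relevance threshold (`min_grade=1`), the padding rule used by the oracle, and the toy ranking are all illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List


def feedback_precision(selected: List[str], qrels: Dict[str, int],
                       min_grade: int = 1) -> float:
    """Fraction of the selected feedback documents judged relevant.

    min_grade is an assumed threshold on the graded judgement.
    """
    if not selected:
        return 0.0
    hits = sum(1 for doc_id in selected if qrels.get(doc_id, 0) >= min_grade)
    return hits / len(selected)


def top_k_selection(ranking: List[str], k: int) -> List[str]:
    """Standard PRF assumption: take the first k retrieved documents."""
    return ranking[:k]


def oracle_selection(ranking: List[str], qrels: Dict[str, int], k: int,
                     min_grade: int = 1) -> List[str]:
    """Hypothetical oracle: relevant documents first (in rank order),
    padded with the highest-ranked non-relevant ones if fewer than k exist."""
    relevant = [d for d in ranking if qrels.get(d, 0) >= min_grade]
    rest = [d for d in ranking if qrels.get(d, 0) < min_grade]
    return (relevant + rest)[:k]


# Toy data: a six-document ranking in which most relevant documents sit low.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
qrels = {"d2": 2, "d5": 3, "d6": 1}
k = 3

p_topk = feedback_precision(top_k_selection(ranking, k), qrels)
p_oracle = feedback_precision(oracle_selection(ranking, qrels, k), qrels)
print(f"top-k: {p_topk:.3f}  oracle: {p_oracle:.3f}  gap: {p_oracle - p_topk:.3f}")
```

In this toy case only one of the top three documents is relevant, so the oracle gains precision simply by reaching further down the ranking; the gap reported in the abstract is the analogous difference measured on the real collections.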

Keywords:

pseudo-relevance feedback, BM25, TREC deep learning, vocabulary gap, oracle experiment, query expansion

References

[1] S. Khalid, S. Khusro, I. Ullah, and G. Dawson-Amoah, "On The Current State of Scholarly Retrieval Systems," Engineering, Technology & Applied Science Research, vol. 9, no. 1, pp. 3863–3870, Feb. 2019.

[2] J. J. Rocchio Jr, "Relevance feedback in information retrieval," The SMART retrieval system: experiments in automatic document processing, pp. 313–323, 1971.

[3] H. Li, A. Mourad, S. Zhuang, B. Koopman, and G. Zuccon, "Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls," ACM Transactions on Information Systems, vol. 41, no. 3, pp. 1–40, July 2023.

[4] N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, and I. Gurevych, "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models." arXiv, Oct. 21, 2021.

[5] N. Abdul-Jaleel et al., "UMass at TREC 2004: Novelty and HARD," Defense Technical Information Center, Jan. 2004.

[6] V. Lavrenko and W. B. Croft, "Relevance-Based Language Models," ACM SIGIR Forum, vol. 51, no. 2, pp. 260–267, Aug. 2017.

[7] G. Amati, "Probability models for information retrieval based on divergence from randomness," Ph.D. dissertation, University of Glasgow, UK, 2003.

[8] C. Zhai and J. Lafferty, "Model-based feedback in the language modeling approach to information retrieval," in Proceedings of the tenth international conference on Information and knowledge management, July 2001, pp. 403–410.

[9] Y. Lv and C. Zhai, "A comparative study of methods for estimating query language models with pseudo feedback," in Proceedings of the 18th ACM conference on Information and knowledge management, Aug. 2009, pp. 1895–1898.

[10] H. Yu, C. Xiong, and J. Callan, "Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback," in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Oct. 2021, pp. 3592–3596.

[11] X. Wang, C. Macdonald, N. Tonellotto, and I. Ounis, "ColBERT-PRF: Semantic Pseudo-Relevance Feedback for Dense Passage and Document Retrieval," ACM Transactions on the Web, vol. 17, no. 1, pp. 1–39, Feb. 2023.

[12] X. Wang, C. Macdonald, N. Tonellotto, and I. Ounis, "Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval," in Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, July 2021, pp. 297–306.

[13] Y. Tu et al., "Generalized Pseudo-Relevance Feedback." arXiv, Oct. 29, 2025.

[14] H. Li, S. Zhuang, B. Koopman, and G. Zuccon, "LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback." arXiv, Apr. 02, 2025.

[15] H. Li, X. Wang, B. Koopman, and G. Zuccon, "Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models." arXiv, June 06, 2025.

[16] K. Albishre, "Anchored dense-chain pseudo-relevance feedback: sequential state refinement for neural retrieval," Journal of King Saud University Computer and Information Sciences, Mar. 2026.

[17] S. Datta, D. Ganguly, S. MacAvaney, and D. Greene, "A Deep Learning Approach for Selective Relevance Feedback," in Advances in Information Retrieval, vol. 14609, N. Goharian, N. Tonellotto, Y. He, A. Lipani, G. McDonald, C. Macdonald, and I. Ounis, Eds. Cham: Springer Nature Switzerland, 2024, pp. 189–204.

[18] K. Albishre, Y. Li, and Y. Xu, "Query-Based Automatic Training Set Selection for Microblog Retrieval," in Advances in Knowledge Discovery and Data Mining, 2018, pp. 325–336.

[19] N. Jedidi and J. Lin, "A Systematic Study of Pseudo-Relevance Feedback with LLMs." arXiv, 2026.

[20] S. Robertson and H. Zaragoza, The Probabilistic Relevance Framework: BM25 and Beyond. Now Publishers Inc, 2009.

[21] L. Gao, X. Ma, J. Lin, and J. Callan, "Precise Zero-Shot Dense Retrieval without Relevance Labels," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Apr. 2023, pp. 1762–1777.

[22] L. Wang, N. Yang, and F. Wei, "Query2doc: Query Expansion with Large Language Models," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Sept. 2023, pp. 9414–9423.

[23] Y. Lei, T. Shen, and A. Yates, "ThinkQE: Query Expansion via an Evolving Thinking Process." arXiv, Mar. 09, 2026.

[24] S. Khalid and S. Wu, "Supporting Scholarly Search by Query Expansion and Citation Analysis," Engineering, Technology & Applied Science Research, vol. 10, no. 4, pp. 6102–6108, Aug. 2020.

[25] K. Albishre, Y. Li, Y. Xu, and W. Huang, "Query-based unsupervised learning for improving social media search," World Wide Web, vol. 23, no. 3, pp. 1791–1809, May 2020.

[26] E. Voorhees, N. Craswell, B. Mitra, D. Campos, and E. Yilmaz, "Overview of the TREC 2019 Deep Learning Track," National Institute of Standards and Technology, SP1250, 2020.

[27] N. Craswell, B. Mitra, E. Yilmaz, and D. Campos, "Overview of the TREC 2020 deep learning track." arXiv, 2021.

[28] P. Bajaj et al., "MS MARCO: A Human Generated MAchine Reading COmprehension Dataset." arXiv, Oct. 31, 2018.

[29] J. Lin, X. Ma, S. C. Lin, J. H. Yang, R. Pradeep, and R. Nogueira, "Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations," in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2021, pp. 2356–2362.

[30] A. Q. Jiang et al., "Mistral 7B." arXiv, Oct. 10, 2023.

[31] R. Nogueira, Z. Jiang, R. Pradeep, and J. Lin, "Document Ranking with a Pretrained Sequence-to-Sequence Model," in Findings of the Association for Computational Linguistics: EMNLP 2020, Aug. 2020, pp. 708–718.

[32] S. Khalid, S. Wu, and F. Zhang, "A multi-objective approach to determining the usefulness of papers in academic search," Data Technologies and Applications, vol. 55, no. 5, pp. 734–748, Oct. 2021.