Topic Model and Deep Reinforcement Learning Applied to the Extractive Query-Based Summarization Task

Authors

  • Abeer Hussien, Software Department, College of Information Technology, University of Babylon, Iraq
  • Wafaa Al Hameed, Software Department, College of Information Technology, University of Babylon, Iraq
Volume: 15 | Issue: 6 | Pages: 30087-30096 | December 2025 | https://doi.org/10.48084/etasr.14632

Abstract

The rapid expansion of digital information has created a growing demand for intelligent systems that can produce concise, relevant summaries from large document corpora. Conventional summarization methods often fail to handle multiple documents effectively, particularly when the summary must satisfy a specific user query. Despite significant advances, many extractive summarization methods struggle to preserve non-redundancy, coherence, and relevance, especially when faced with diverse queries and multi-document inputs. Moreover, traditional methods lack mechanisms for balancing diversity against semantic similarity when creating summaries aligned with the query's intent. To address these challenges, this study proposes an extractive query-based summarization model that combines BERT embeddings, semantic clustering (K-means), topic modeling (LDA), and Deep Reinforcement Learning (DRL): the agent examines each sentence and selects or skips it according to a reward function designed as a multi-objective integration of BERT-based coherence scores with Maximal Marginal Relevance (MMR). The proposed system was trained on the QuerySum dataset and tested on the CNN/Daily Mail dataset. The experimental results show that the proposed system outperforms traditional approaches across multiple measures. Combining the BERT-based coherence score with MMR in the reward function improves ROUGE scores [ROUGE-1 (50.03%), ROUGE-2 (27.30%), and ROUGE-L (39.86%)] and increases the BERTScore (88.70%). Moreover, the generated summaries were relevant, coherent, concise, and less redundant than those of existing approaches.
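The reward design described above balances query relevance against redundancy via Maximal Marginal Relevance. As an illustration of the MMR component only (not the authors' implementation — the greedy loop, the λ trade-off parameter, and the toy vectors below are assumptions; in the paper the vectors would be BERT sentence embeddings), a minimal sketch:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, sent_vecs, k=2, lam=0.7):
    """Greedy MMR selection of k sentence indices.

    At each step the score of a candidate sentence is
        lam * sim(sentence, query) - (1 - lam) * max sim(sentence, already selected),
    so high lam favors query relevance and low lam favors diversity.
    """
    selected = []
    candidates = list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, sent_vecs[i])
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy 2-D "embeddings": sentence 1 nearly duplicates sentence 0.
query = [1.0, 0.0]
sents = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]
print(mmr_select(query, sents, k=2, lam=0.3))  # diversity-heavy: picks 0, then 2
print(mmr_select(query, sents, k=2, lam=1.0))  # pure relevance: picks 0, then 1
```

With a low λ the near-duplicate sentence is skipped in favor of the more diverse one, which is the behavior the reward function exploits to keep summaries non-redundant.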

Keywords:

Maximal Marginal Relevance (MMR), Deep Reinforcement Learning (DRL), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), topic modeling (LDA)


References

D. H. Nguyen et al., "Robust Deep Reinforcement Learning for Extractive Legal Summarization," in Neural Information Processing, vol. 1517, T. Mantoro, M. Lee, M. A. Ayu, K. W. Wong, and A. N. Hidayanto, Eds. Springer International Publishing, 2021, pp. 597–604. DOI: https://doi.org/10.1007/978-3-030-92310-5_69

M. Zhong, P. Liu, Y. Chen, D. Wang, X. Qiu, and X. Huang, "Extractive Summarization as Text Matching." arXiv, Apr. 19, 2020. DOI: https://doi.org/10.18653/v1/2020.acl-main.552

L. Abualigah, M. Q. Bashabsheh, H. Alabool, and M. Shehab, "Text Summarization: A Brief Review," in Recent Advances in NLP: The Case of Arabic Language, vol. 874, M. Abd Elaziz, M. A. A. Al-qaness, A. A. Ewees, and A. Dahou, Eds. Springer International Publishing, 2020, pp. 1–15. DOI: https://doi.org/10.1007/978-3-030-34614-0_1

Z. H. Ali, A. K. Hussein, H. K. Abass, and E. Fadel, "Extractive multi document summarization using harmony search algorithm," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 1, Feb. 2021, Art. no. 89. DOI: https://doi.org/10.12928/telkomnika.v19i1.15766

M. S. Bewoor and S. H. Patil, "Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms," Engineering, Technology & Applied Science Research, vol. 8, no. 1, pp. 2562–2567, Feb. 2018. DOI: https://doi.org/10.48084/etasr.1775

S. R. Basha, J. K. Rani, and J. J. C. P. Yadav, "A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy," Engineering, Technology & Applied Science Research, vol. 9, no. 6, pp. 5001–5005, Dec. 2019. DOI: https://doi.org/10.48084/etasr.3173

N. Dutta, "Topic Modelling With LDA - A Hands-on Introduction," Analytics Vidhya, Jul. 08, 2021. https://www.analyticsvidhya.com/blog/2021/07/topic-modelling-with-lda-a-hands-on-introduction/

F. Tan, P. Yan, and X. Guan, "Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning," in Neural Information Processing, vol. 10637, D. Liu, S. Xie, Y. Li, D. Zhao, and E. S. M. El-Alfy, Eds. Springer International Publishing, 2017, pp. 475–483. DOI: https://doi.org/10.1007/978-3-319-70093-9_50

A. Mahmud, "Query-Based Summarization using Reinforcement Learning and Transformer Model," M.S. thesis, University of Lethbridge, Canada, 2020.

Y. Mao, Y. Qu, Y. Xie, X. Ren, and J. Han, "Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning." arXiv, Sep. 30, 2020. DOI: https://doi.org/10.18653/v1/2020.emnlp-main.136

A. Srikanth, A. S. Umasankar, S. Thanu, and S. J. Nirmala, "Extractive Text Summarization using Dynamic Clustering and Co-Reference on BERT," in 2020 5th International Conference on Computing, Communication and Security (ICCCS), Patna, India, Oct. 2020, pp. 1–5. DOI: https://doi.org/10.1109/ICCCS49678.2020.9277220

N. Rahman and B. Borah, "Improvement of query-based text summarization using word sense disambiguation," Complex & Intelligent Systems, vol. 6, no. 1, pp. 75–85, Apr. 2020. DOI: https://doi.org/10.1007/s40747-019-0115-2

F. Bayatmakou, A. Mohebi, and A. Ahmadi, "An interactive query-based approach for summarizing scientific documents," Information Discovery and Delivery, vol. 50, no. 2, pp. 176–191, Apr. 2022. DOI: https://doi.org/10.1108/IDD-10-2020-0124

S. Lamsiyah, A. El Mahdaouy, B. Espinasse, and S. El Alaoui Ouatik, "An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings," Expert Systems with Applications, vol. 167, Apr. 2021, Art. no. 114152. DOI: https://doi.org/10.1016/j.eswa.2020.114152

O. Shapira, R. Pasunuru, M. Bansal, I. Dagan, and Y. Amsterdamer, "Interactive Query-Assisted Summarization via Deep Reinforcement Learning," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, Apr. 2022, pp. 2551–2568. DOI: https://doi.org/10.18653/v1/2022.naacl-main.184

R. C. Belwal, S. Rai, and A. Gupta, "Extractive text summarization using clustering-based topic modeling," Soft Computing, vol. 27, no. 7, pp. 3965–3982, Apr. 2023. DOI: https://doi.org/10.1007/s00500-022-07534-6

C. Zhou et al., "A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT." arXiv, May 01, 2023. DOI: https://doi.org/10.1007/s13042-024-02443-6

M. Ramezani, M. S. Shahryari, A. R. Feizi-Derakhshi, and M. R. Feizi-Derakhshi, "Unsupervised Broadcast News Summarization; a Comparative Study on Maximal Marginal Relevance (MMR) and Latent Semantic Analysis (LSA)," in 2023 28th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, Jan. 2023, pp. 1–7. DOI: https://doi.org/10.1109/CSICC58665.2023.10105403

R. Padaki, Z. Dai, and J. Callan, "Rethinking Query Expansion for BERT Reranking," in Advances in Information Retrieval, vol. 12036, J. M. Jose, E. Yilmaz, J. Magalhães, P. Castells, N. Ferro, M. J. Silva, and F. Martins, Eds. Springer International Publishing, 2020, pp. 297–304. DOI: https://doi.org/10.1007/978-3-030-45442-5_37

Z. Zheng, K. Hui, B. He, X. Han, L. Sun, and A. Yates, "BERT-QE: Contextualized Query Expansion for Document Re-ranking." arXiv, Nov. 03, 2020. DOI: https://doi.org/10.18653/v1/2020.findings-emnlp.424

V. H. Nguyen, S. T. Mai, and M. T. Nguyen, "Learning to summarize multi-documents with local and global information," Progress in Artificial Intelligence, vol. 12, no. 3, pp. 275–286, Sep. 2023. DOI: https://doi.org/10.1007/s13748-023-00302-z

J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, Mar. 2019, pp. 4171–4186. DOI: https://doi.org/10.18653/v1/N19-1423

W. S. El-Kassas, C. R. Salama, A. A. Rafea, and H. K. Mohamed, "Automatic text summarization: A comprehensive survey," Expert Systems with Applications, vol. 165, Mar. 2021, Art. no. 113679. DOI: https://doi.org/10.1016/j.eswa.2020.113679

M. Akter, N. Bansal, and S. K. Karmaker, "Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE?," in Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, Feb. 2022, pp. 1547–1560. DOI: https://doi.org/10.18653/v1/2022.findings-acl.122

P. Gupta, "Evaluating the BERTScore of synthetic text and its sentiment analysis." In Review, Aug. 16, 2023. DOI: https://doi.org/10.21203/rs.3.rs-3248507/v1

Y. Liu, Z. Wang, and R. Yuan, "QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, pp. 18725–18732, Mar. 2024. DOI: https://doi.org/10.1609/aaai.v38i17.29836

K. Yao, L. Zhang, T. Luo, and Y. Wu, "Deep reinforcement learning for extractive document summarization," Neurocomputing, vol. 284, pp. 52–62, Apr. 2018. DOI: https://doi.org/10.1016/j.neucom.2018.01.020


How to Cite

[1]
A. Hussien and W. Al Hameed, “Topic Model and Deep Reinforcement Learning Applied to the Extractive Query-Based Summarization Task”, Eng. Technol. Appl. Sci. Res., vol. 15, no. 6, pp. 30087–30096, Dec. 2025.
