Supporting Scholarly Search by Query Expansion and Citation Analysis

  • S. Khalid School of Computer Science and Communication Engineering, Jiangsu University, China | National University of Sciences and Technology (NUST), Islamabad, Pakistan
  • S. Wu School of Computer Science and Communication Engineering, Jiangsu University, China
Keywords: academic search, query expansion, citation analysis, pseudo relevance feedback, user relevance feedback

Abstract

Published scholarly articles have increased exponentially in recent years. This growth has made it challenging for academic researchers to locate the most relevant papers in their fields of interest. The reasons vary: there is the fundamental problem of synonymy and polysemy; query terms may be too short to discriminate between papers; and a new researcher, with limited domain knowledge, is often unsure of what she is looking for until the results are displayed. These issues hinder scholarly retrieval systems in locating highly relevant publications for a given search query. Researchers have sought to tackle them, but the user's intent cannot be addressed entirely by any single direct information retrieval technique. In this paper, a novel approach is proposed that combines query expansion and citation analysis to support scholarly search. It is a two-stage academic search process. In the first stage, upon receiving the initial search query, the retrieval system provides a ranked list of results. In the second stage, the highest-scoring Term Frequency–Inverse Document Frequency (TF-IDF) terms are extracted from a few top-ranked papers and used for query expansion behind the scenes. In both stages, citation analysis further refines the quality of the academic search. The originality of the approach lies in the combined exploitation of query expansion by pseudo relevance feedback and citation network analysis, which together bring the most relevant papers to the top of the search results list. The approach is evaluated on the ACL dataset. The experimental results reveal that the technique is effective and robust for locating relevant papers in terms of normalized Discounted Cumulative Gain (nDCG), precision, and recall.
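The two-stage process described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the toy corpus, the `citation_counts` dictionary, and the logarithmic citation boost are all hypothetical stand-ins for the ACL collection and the paper's citation-network analysis; only the overall flow (initial TF-IDF ranking, then pseudo relevance feedback using top TF-IDF terms from a few top-ranked documents) mirrors the described approach.

```python
import math
from collections import Counter

# Toy corpus standing in for the ACL collection; citation_counts is a
# hypothetical stand-in for the citation-network signal.
docs = {
    "d1": "query expansion improves scholarly search retrieval",
    "d2": "citation analysis ranks scholarly papers by influence",
    "d3": "pseudo relevance feedback expands the query with top terms",
    "d4": "deep learning for image classification",
}
citation_counts = {"d1": 12, "d2": 30, "d3": 8, "d4": 2}

def tfidf(doc_tokens, all_docs):
    """TF-IDF weight for each term of one tokenized document."""
    n = len(all_docs)
    tf = Counter(doc_tokens)
    weights = {}
    for term, f in tf.items():
        df = sum(1 for text in all_docs.values() if term in text.split())
        weights[term] = (f / len(doc_tokens)) * math.log(n / df)
    return weights

def score(query_terms, doc_id):
    """Sum TF-IDF weights of matching terms, mildly boosted by citations."""
    w = tfidf(docs[doc_id].split(), docs)
    base = sum(w.get(t, 0.0) for t in query_terms)
    return base * (1 + math.log1p(citation_counts[doc_id]))

def search(query, k_docs=2, k_terms=2):
    terms = query.split()
    # Stage 1: initial citation-refined ranking for the raw query.
    ranked = sorted(docs, key=lambda d: score(terms, d), reverse=True)
    # Stage 2: pseudo relevance feedback -- collect the highest-scoring
    # TF-IDF terms from the top-ranked documents and expand the query.
    feedback = Counter()
    for d in ranked[:k_docs]:
        feedback.update(tfidf(docs[d].split(), docs))
    expansion = [t for t, _ in feedback.most_common() if t not in terms][:k_terms]
    # Re-rank with the expanded query, again applying the citation boost.
    return sorted(docs, key=lambda d: score(terms + expansion, d), reverse=True)
```

Calling `search("scholarly search")` ranks the documents whose vocabulary overlaps the expanded query (and that are well cited) above the off-topic ones; in a real system the expansion would run behind the scenes before the final result list is shown.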
