Supporting Scholarly Search by Query Expansion and Citation Analysis
The number of published scholarly articles has grown exponentially in recent years. This growth makes it harder for academic researchers to locate the most relevant papers in their fields of interest. The reasons vary. There is the fundamental problem of synonymy and polysemy, and queries are often too short to distinguish between papers. In addition, a new researcher has limited knowledge of the field and is often unsure what she is looking for until the results are displayed. These issues prevent scholarly retrieval systems from locating highly relevant publications for a given search query. Researchers have sought to tackle these issues, but the user's intent cannot be addressed entirely by a direct information retrieval technique. This paper proposes a novel approach that combines query expansion and citation analysis to support scholarly search. It is a two-stage academic search process. In the first stage, upon receiving the initial search query, the retrieval system provides a ranked list of results. In the second stage, the highest-scoring Term Frequency–Inverse Document Frequency (TF-IDF) terms are extracted from a few top-ranked papers and used for query expansion behind the scenes. In both stages, citation analysis is used to further refine the quality of the academic search. The originality of the approach lies in the combined use of query expansion by pseudo relevance feedback and citation network analysis, which may bring the most relevant papers to the top of the search results list. The approach is evaluated on the ACL dataset. The experimental results indicate that the technique is effective and robust for locating relevant papers in terms of normalized Discounted Cumulative Gain (nDCG), precision, and recall.
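The two-stage process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the citation links, the function names, the smoothed IDF formula, and the `alpha` citation boost are all assumptions made for illustration. Citation analysis is approximated here by a simple incoming-citation count, which may differ from the scoring used in the paper, and no stop-word removal is applied.

```python
import math
from collections import Counter

# Hypothetical toy corpus and citation links (not taken from the ACL dataset).
PAPERS = {
    "p1": "statistical machine translation with phrase alignment",
    "p2": "neural machine translation using attention",
    "p3": "parsing with dependency grammar",
    "p4": "phrase alignment models for translation of low resource languages",
}
CITES = {"p2": ["p1"], "p4": ["p1", "p2"], "p3": []}  # citing paper -> cited papers


def tokenize(text):
    return text.lower().split()


def build_index(papers):
    """Return per-paper term frequencies and a smoothed IDF table."""
    docs = {pid: Counter(tokenize(text)) for pid, text in papers.items()}
    df = Counter(term for tf in docs.values() for term in tf)
    n = len(docs)
    idf = {term: math.log((n + 1) / (d + 1)) + 1.0 for term, d in df.items()}
    return docs, idf


def citation_counts(cites):
    """Count incoming citations per paper (a stand-in for citation analysis)."""
    counts = Counter()
    for cited in cites.values():
        counts.update(cited)
    return counts


def search(query_terms, docs, idf, cited_by, alpha=0.2):
    """Rank papers by TF-IDF match, boosted by log-scaled citation count."""
    ranked = []
    for pid, tf in docs.items():
        text_score = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        if text_score > 0:
            boost = 1.0 + alpha * math.log1p(cited_by[pid])
            ranked.append((text_score * boost, pid))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in ranked]


def expand_query(query_terms, top_ids, docs, idf, n_terms=3):
    """Pseudo relevance feedback: add the best TF-IDF terms of the top papers."""
    weights = Counter()
    for pid in top_ids:
        for term, freq in docs[pid].items():
            if term not in query_terms:
                weights[term] += freq * idf[term]
    best = sorted(weights.items(), key=lambda item: (-item[1], item[0]))[:n_terms]
    return list(query_terms) + [term for term, _ in best]


def two_stage_search(query, papers, cites, k=2):
    docs, idf = build_index(papers)
    cited_by = citation_counts(cites)
    terms = tokenize(query)
    first = search(terms, docs, idf, cited_by)          # stage 1: initial ranking
    expanded = expand_query(terms, first[:k], docs, idf)
    second = search(expanded, docs, idf, cited_by)      # stage 2: expanded query
    return first, expanded, second


if __name__ == "__main__":
    first, expanded, second = two_stage_search("machine translation", PAPERS, CITES)
    print("stage 1:", first)
    print("expanded query:", expanded)
    print("stage 2:", second)
```

The multiplicative boost keeps text relevance as the primary signal, so a highly cited paper that does not match the query is never returned. A real system would likely use a search engine's own scoring and a richer citation measure in place of these stand-ins.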
Copyright (c) 2020 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.