Keyword Detection Techniques: A Comprehensive Study
Abstract
Automatic identification of influential segments in large amounts of data is an important part of topic detection and tracking (TDT). It can be carried out through keyword identification using collocation techniques, word co-occurrence networks, topic modeling, and other machine learning techniques. This paper reviews the literature on traditional keyword extraction techniques and TDT approaches to the keyword detection task, and analyzes them to draw useful insights and to suggest future directions for research on more automatic, unsupervised, and language-independent methods. The keyword detection techniques currently used by researchers are discussed, inferences are drawn about their advantages and disadvantages relative to previous studies, and the results of the analysis are presented in tabular form. Although keyword detection has been widely explored, there remains considerable scope and need for identifying topics in uncertain user-generated data.
Keywords:
keyword detection, information retrieval, topic detection, machine learning, comprehensive study
License
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.