Keyword Detection Techniques: A Comprehensive Study

Z. A. Shaikh

Abstract


Automatic identification of influential segments in large volumes of data is an important part of topic detection and tracking (TDT). It can be done through keyword identification using collocation techniques, word co-occurrence networks, topic modeling, and other machine learning techniques. This paper reviews the literature on traditional keyword extraction techniques and TDT approaches to the keyword detection task, and analyzes them to draw insights and suggest future directions for research on more automatic, unsupervised, and language-independent approaches. The keyword detection techniques currently used by researchers are discussed, along with their advantages and disadvantages relative to earlier studies, and the results of the analysis are presented in tabular form. Although keyword detection has been widely explored, there remains considerable scope, and need, for identifying topics in uncertain user-generated data.


Keywords


keyword detection; information retrieval; topic detection; machine learning; comprehensive study





eISSN: 1792-8036     pISSN: 2241-4487