HCLPars: A New Hierarchical Clustering Log Parsing Method
Received: 5 May 2023 | Revised: 17 May 2023 | Accepted: 18 May 2023 | Online: 9 August 2023
Corresponding author: Arwa Bin Lashram
Abstract
Event logs are essential to the maintenance and development of many software systems: they record detailed runtime information that allows support engineers and developers to monitor systems, understand their behavior, and identify errors. As modern software systems grow in size and complexity, parsing their logs by the traditional (manual) method becomes cumbersome and impractical. For this reason, recent studies have focused on parsing log files automatically. This paper presents the Hierarchical Clustering Log Parsing method (HCLPars), which parses log files automatically in three steps: removing parameters according to acquired knowledge in order to avoid errors, grouping similar raw log messages, and extracting the set of keys that make up the log. Experiments were run on 16 real-world system log datasets, and the performance of the proposed algorithm was compared with that of 14 other algorithms. HCLPars outperformed the other log parsers in terms of accuracy, efficiency, and robustness.
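The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' HCLPars implementation; it only shows the general shape of such a pipeline under stated assumptions. The masking regexes (standing in for "acquired knowledge"), the positional token-mismatch distance, the 0.4 threshold, and all function names (mask_parameters, token_distance, single_linkage_groups, extract_template, parse_logs) are hypothetical choices made for illustration.

```python
import re
from itertools import combinations

# Hypothetical "acquired knowledge" for step 1: regexes for common parameter
# shapes. These are illustrative assumptions, not the paper's rules.
PARAM_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                   # hexadecimal values
    re.compile(r"\b\d+\b"),                               # plain integers
]


def mask_parameters(message: str) -> list[str]:
    """Step 1: replace known parameter shapes with <*>, then tokenize on whitespace."""
    for pattern in PARAM_PATTERNS:
        message = pattern.sub("<*>", message)
    return message.split()


def token_distance(a: list[str], b: list[str]) -> float:
    """Assumed distance: fraction of positions whose tokens differ.
    Messages of different length are treated as maximally distant (1.0)."""
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_groups(tokens: list[list[str]], threshold: float) -> list[list[int]]:
    """Step 2: single-linkage clustering cut at `threshold`.
    Two messages share a group iff a chain of pairwise distances <= threshold
    links them, so the groups are the connected components found by union-find."""
    parent = list(range(len(tokens)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(tokens)), 2):
        if token_distance(tokens[i], tokens[j]) <= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(tokens)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def extract_template(group: list[list[str]]) -> str:
    """Step 3: keep tokens shared by every member; mask positions that vary."""
    return " ".join(
        column[0] if all(t == column[0] for t in column) else "<*>"
        for column in zip(*group)
    )


def parse_logs(messages: list[str], threshold: float = 0.4) -> dict[str, list[int]]:
    """Run the three steps and map each template to the indices of its messages."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")  # keeps different lengths apart
    tokens = [mask_parameters(m) for m in messages]
    result: dict[str, list[int]] = {}
    for group in single_linkage_groups(tokens, threshold):
        template = extract_template([tokens[i] for i in group])
        result.setdefault(template, []).extend(group)
    return result


logs = [
    "Receiving block from 10.0.0.1:50010 size 512",
    "Receiving block from 10.0.0.2:50010 size 1024",
    "Deleting block 0x1f3a on node A",
    "Deleting block 0x2b4c on node B",
]
for template, ids in parse_logs(logs).items():
    print(template, ids)
# Receiving block from <*> size <*> [0, 1]
# Deleting block <*> on node <*> [2, 3]
```

Cutting a single-linkage hierarchy at a fixed distance yields exactly the connected components of the graph whose edges join pairs within that distance, which is why a union-find pass over all pairs suffices in this sketch. The pairwise loop is quadratic in the number of messages, so a practical parser would need a cheaper way to avoid comparing every pair.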
Keywords:
event log mining, system logs, log parsing, log analysis, log management, execution trace, HCLPars, agent
License
Copyright (c) 2023 Arwa Bin Lashram, Lobna Hsairi, Haneen Al Ahmadi
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.