HCLPars: A New Hierarchical Clustering Log Parsing Method


  • Arwa Bin Lashram University of Jeddah, Saudi Arabia
  • Lobna Hsairi University of Jeddah, Saudi Arabia
  • Haneen Al Ahmadi University of Jeddah, Saudi Arabia
Volume: 13 | Issue: 4 | Pages: 11130-11138 | August 2023 | https://doi.org/10.48084/etasr.6013


Event logs are essential to the maintenance and development of many software systems: they record detailed runtime information that allows support engineers and developers to monitor systems, understand behaviors, and identify errors. With the increasing size and complexity of modern software systems, parsing their logs manually is cumbersome and impractical. For this reason, recent studies have focused on automatically parsing log files. This paper presents the Hierarchical Clustering Log Parsing method, called HCLPars, for automatically parsing log files. It consists of three steps: removing parameters according to acquired knowledge in order to avoid errors, grouping similar raw log messages, and extracting the set of keys that make up the log. Experiments were run on 16 real system log datasets, and the performance of the proposed algorithm was compared with that of 14 other algorithms. HCLPars outperformed the other log parsers in terms of accuracy, efficiency, and robustness.
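The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the masking regexes, the position-wise token distance, the 0.4 threshold, and all function names are assumptions chosen for illustration, and the grouping step uses single-linkage clustering cut at a fixed threshold (computed with union-find) as a stand-in for whatever hierarchical scheme the paper specifies.

```python
import re
from collections import defaultdict

WILDCARD = "<*>"

# Hypothetical masking rules; a real parser would tune these per log source.
PARAM_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hex literals
    re.compile(r"\b\d+\b"),                                # plain integers
]


def mask_parameters(message):
    """Step 1: replace obvious parameters with a wildcard, then tokenize."""
    for pattern in PARAM_PATTERNS:
        message = pattern.sub(WILDCARD, message)
    return message.split()


def token_distance(a, b):
    """Fraction of positions whose tokens differ; 1.0 if lengths differ."""
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(token_lists, threshold=0.4):
    """Step 2: single-linkage clustering cut at `threshold`.

    Linking every pair closer than the threshold and taking connected
    components yields the same groups as cutting a single-linkage
    dendrogram at that height.
    """
    parent = list(range(len(token_lists)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(token_lists)):
        for j in range(i + 1, len(token_lists)):
            if token_distance(token_lists[i], token_lists[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(token_lists)):
        groups[find(i)].append(i)
    return list(groups.values())


def extract_template(token_lists):
    """Step 3: keep tokens shared by the whole group, wildcard the rest."""
    return " ".join(
        column[0] if len(set(column)) == 1 else WILDCARD
        for column in zip(*token_lists)
    )


def parse(messages, threshold=0.4):
    """Map each extracted log key to the indices of its messages."""
    tokens = [mask_parameters(m) for m in messages]
    return {
        extract_template([tokens[i] for i in members]): members
        for members in cluster(tokens, threshold)
    }
```

Calling `parse` on a list of raw messages returns a dictionary from each extracted key to the indices of the messages that produced it. The pairwise comparison is quadratic in the number of messages, so this sketch only suits small samples.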


event log mining, system logs, log parsing, log analysis, log management, execution trace, HCLPars, agent





How to Cite

A. Bin Lashram, L. Hsairi, and H. Al Ahmadi, “HCLPars: A New Hierarchical Clustering Log Parsing Method”, Eng. Technol. Appl. Sci. Res., vol. 13, no. 4, pp. 11130–11138, Aug. 2023.

