Model-Driven Engineering and Machine Learning for Legacy System Modernization

Authors

  • Hamza Abdelmalek, GLISI Team, Faculty of Sciences and Techniques of Errachidia, Moulay Ismail University, Morocco
  • Zakaria Babaalla, GLISI Team, Faculty of Sciences and Techniques of Errachidia, Moulay Ismail University, Morocco
  • Charaf Ouaddi, GLISI Team, Faculty of Sciences and Techniques of Errachidia, Moulay Ismail University, Morocco
  • Lamya Benaddi, GLISI Team, Faculty of Sciences and Techniques of Errachidia, Moulay Ismail University, Morocco
  • Abdeslam Jakimi, GLISI Team, Faculty of Sciences and Techniques of Errachidia, Moulay Ismail University, Morocco
Volume: 16 | Issue: 1 | Pages: 32285-32291 | February 2026 | https://doi.org/10.48084/etasr.16444

Abstract

Model-driven engineering plays a significant role in modern software development because it uses models to guide software design and implementation. Legacy systems, however, are characterized by outdated architectures and poor documentation, which make them difficult to maintain. Model-Driven Reverse Engineering (MDRE) addresses these issues by extracting models from legacy systems, making those systems easier to comprehend. This study explores clustering techniques for extracting high-level concepts from source code. By grouping similar code entities, clustering reveals system concepts that can then be transformed into models. These models provide a foundation for system modernization and further development. Unlike existing MDRE approaches, this study systematically evaluates multiple clustering techniques and preprocessing scenarios to assess their ability to recover high-level system concepts from legacy source code. The results highlight the impact of preprocessing and noise on concept extraction and provide insights for applying clustering within model-driven modernization workflows.
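
The idea of grouping similar code entities to surface candidate concepts can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the techniques evaluated, so TF-IDF weighting and k-means are used here only as familiar stand-ins, and the entity names, identifiers, and helper functions (`tokenize`, `tfidf_vectors`, `kmeans`) are hypothetical. The sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(identifier):
    """Split a camelCase or snake_case identifier into lowercase word tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def tfidf_vectors(docs):
    """Turn token lists into L2-normalised TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF (+1) so tokens present in every document keep some weight.
        vec = [counts[t] * (math.log(n_docs / doc_freq[t]) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [vectors[0]]
    while len(centers) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centers),
        )
        centers.append(farthest)
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centers[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not move
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical code entities: class name -> member identifiers (assumed data).
entities = {
    "OrderService": ["createOrder", "cancelOrder", "orderTotal"],
    "OrderRepository": ["saveOrder", "findOrder", "deleteOrder"],
    "HttpClient": ["sendRequest", "readResponse", "httpTimeout"],
    "HttpRequestParser": ["parseRequest", "parseHeader", "httpVersion"],
}

names = list(entities)
docs = [
    tokenize(name) + [tok for ident in entities[name] for tok in tokenize(ident)]
    for name in names
]
labels = kmeans(tfidf_vectors(docs), k=2)

# Each cluster is a candidate high-level concept for a later model element.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

In this toy input the two groups share no vocabulary, so the split is unambiguous; real legacy code is noisier, which is where the choice of preprocessing discussed in the abstract would be expected to matter.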

Keywords:

clustering, machine learning, model-driven architecture, model-driven engineering, source code

How to Cite

[1] H. Abdelmalek, Z. Babaalla, C. Ouaddi, L. Benaddi, and A. Jakimi, “Model-Driven Engineering and Machine Learning for Legacy System Modernization,” Eng. Technol. Appl. Sci. Res., vol. 16, no. 1, pp. 32285–32291, Feb. 2026.
