Model-Driven Engineering and Machine Learning for Legacy System Modernization
Received: 22 November 2025 | Revised: 13 December 2025 | Accepted: 26 December 2025 | Online: 8 January 2026
Corresponding author: Hamza Abdelmalek
Abstract
Model-driven engineering plays a significant role in modern software development, as it leverages models to guide software design and implementation. However, legacy systems are characterized by outdated architectures and poor documentation, which pose challenges for maintenance. Model-Driven Reverse Engineering (MDRE) addresses these issues by extracting models from legacy systems, enabling better comprehension. This study explores clustering techniques to extract high-level concepts from source code. By grouping similar code entities, clustering reveals system concepts that can be transformed into models. These models provide a foundation for system modernization and further development. Unlike existing MDRE approaches, this study systematically evaluates multiple clustering techniques and preprocessing scenarios to assess their ability to recover high-level system concepts from legacy source code. The results highlight the impact of preprocessing and noise on concept extraction, providing insights for applying clustering within model-driven modernization workflows.
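As a rough illustration of what grouping similar code entities can look like, the sketch below splits identifiers into words, weights them with TF-IDF, and merges entities whose cosine similarity exceeds a threshold. It is a minimal sketch under stated assumptions, not the pipeline evaluated in the study: the class names, member names, the threshold of 0.3, and the single-link grouping rule are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(identifier_text):
    """Split camelCase / snake_case identifiers into lowercase word tokens."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier_text)
    return [w.lower() for w in words]


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors (term -> weight) for each entity."""
    n = len(docs)
    # Document frequency: in how many entities each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[name] = {t: w / norm for t, w in vec.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(vectors, threshold=0.3):
    """Single-link grouping: join entities whose similarity >= threshold."""
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical code entities: class name followed by its member names.
entities = {
    "OrderRepository": "OrderRepository saveOrder findOrderById deleteOrder",
    "OrderService": "OrderService createOrder cancelOrder orderTotal",
    "InvoicePrinter": "InvoicePrinter printInvoice formatInvoice invoiceHeader",
    "InvoiceMailer": "InvoiceMailer sendInvoice invoiceAttachment",
}

docs = {name: tokenize(text) for name, text in entities.items()}
groups = cluster(tfidf_vectors(docs))
for group in groups:
    print(group)
```

In this toy input the two order-related classes and the two invoice-related classes share no vocabulary across groups, so each pair forms its own cluster. Real legacy code is noisier, which is where the choice of preprocessing and clustering algorithm starts to matter.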
Keywords:
clustering, machine learning, model-driven architecture, model-driven engineering, source code
License
Copyright (c) 2026 Hamza Abdelmalek, Zakaria Babaalla, Charaf Ouaddi, Lamya Benaddi, Abdeslam Jakimi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
