Ensemble Machine Learning for Reliable Water Potability Prediction with Optimized Physicochemical Feature Engineering
Received: 25 November 2025 | Revised: 28 December 2025, 3 February 2026, and 12 February 2026 | Accepted: 13 February 2026 | Online: 27 February 2026
Corresponding author: Reham Alsabet
Abstract
Access to safe drinking water is a global challenge for both public health and sustainability. This study presents a comprehensive evaluation of supervised machine learning models for water potability classification based on physicochemical water quality characteristics, using a leakage-safe preprocessing pipeline that incorporates stratified data splitting, mean imputation, and feature standardization. Class imbalance was handled with the Synthetic Minority Over-Sampling Technique (SMOTE), and feature selection and normalization were applied to preprocess the dataset. Seven models, namely Random Forest (RFC), Logistic Regression (LR), Decision Tree (DTC), AdaBoost, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), and Gradient Boosting (GBC), were trained and tuned using grid search with 5-fold cross-validation. Performance was assessed on the held-out test set using accuracy, precision, recall, and F1 score. The results show that the ensemble approaches, namely RFC and GBC, outperform the other models, with high stability and prediction accuracy. The preprocessing steps increase model sensitivity to the minority class, and the trade-offs between interpretability and model complexity are discussed. These findings contribute to the development of reliable, data-driven water quality monitoring systems aligned with sustainable development and public health goals. Since the limited dataset size may restrict generalizability to other regions, future studies should validate these findings on larger and more diverse datasets.
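The abstract describes the pipeline only at a high level. The following is a minimal sketch, not the authors' code, of one way to keep it leakage-safe with scikit-learn: the split is stratified, and imputation, standardization, and SMOTE all sit inside the pipeline, so during the 5-fold grid search they are refit on each training fold only, while validation folds and the test set are never resampled. The data are synthetic (nine features, roughly 10% missing values, and a 60/40 class ratio are all assumptions), and `smote` and `SMOTEClassifier` are hypothetical, simplified helpers; in practice, `imblearn.over_sampling.SMOTE` inside an `imblearn.pipeline.Pipeline` from the imbalanced-learn package serves the same purpose. Only Random Forest with an arbitrary small grid is shown.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def smote(X, y, k=5, random_state=0):
    """Simplified binary SMOTE: add synthetic minority samples until both classes are equal."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each point's nearest neighbour is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    # Interpolate at a random point on the segment between a sample and its neighbour.
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


class SMOTEClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: oversamples only the data passed to fit, never at predict time."""

    def __init__(self, estimator, k_neighbors=5, random_state=0):
        self.estimator = estimator
        self.k_neighbors = k_neighbors
        self.random_state = random_state

    def fit(self, X, y):
        X_res, y_res = smote(np.asarray(X), np.asarray(y), self.k_neighbors, self.random_state)
        self.estimator_ = clone(self.estimator).fit(X_res, y_res)
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        return self.estimator_.predict(X)


# Synthetic stand-in data (assumption): 9 features, ~60/40 classes, ~10% missing values.
X, y = make_classification(n_samples=600, n_features=9, n_informative=5,
                           weights=[0.6, 0.4], random_state=42)
X[np.random.default_rng(0).random(X.shape) < 0.1] = np.nan

# Stratified split first; the test rows are never used for fitting anything below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Imputation, scaling and SMOTE are refit on each CV training fold inside the pipeline.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", SMOTEClassifier(RandomForestClassifier(random_state=0))),
])
param_grid = {"clf__estimator__n_estimators": [50, 100],
              "clf__estimator__max_depth": [None, 10]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1").fit(X_train, y_train)

y_pred = grid.predict(X_test)
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(grid.best_params_, scores)
```

Swapping `RandomForestClassifier` for `GradientBoostingClassifier` with a matching grid repeats the comparison for the other ensemble model. Scoring the grid search on F1 rather than accuracy is a design choice of this sketch that favours minority-class sensitivity; the abstract does not state which metric the authors optimized.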
Keywords:
water potability, water quality, machine learning, feature selection, class imbalance, SMOTE, ensemble methods, mutual information, variance inflation factor, recursive feature elimination
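The keywords list mutual information, the variance inflation factor (VIF), and recursive feature elimination (RFE) as feature-selection criteria, but the abstract does not say how they were applied. The sketch below, on synthetic data with a deliberately near-duplicated column (an assumption for illustration), shows what each criterion computes with scikit-learn: VIF flags collinear features via VIF_j = 1 / (1 - R_j^2), mutual information ranks features by their dependence on the label, and RFE repeatedly drops the weakest feature of a fitted model. The helper `variance_inflation_factors`, the logistic-regression base model for RFE, and the choice of five retained features are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LinearRegression, LogisticRegression


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on all other columns."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic data (assumption): 4 informative and 5 pure-noise features.
X, y = make_classification(n_samples=500, n_features=9, n_informative=4,
                           n_redundant=0, random_state=0)
# Append a hypothetical near-duplicate of column 0 so that VIF has collinearity to flag.
rng = np.random.default_rng(0)
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=len(X))])

vif = variance_inflation_factors(X)             # large values flag collinear columns
mi = mutual_info_classif(X, y, random_state=0)  # estimated dependence of y on each column
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

print("VIF:", np.round(vif, 1))
print("MI ranking, best first:", np.argsort(mi)[::-1])
print("Columns kept by RFE:", np.flatnonzero(rfe.support_))
```

A common rule of thumb treats a VIF above 5 or 10 as problematic collinearity; that threshold, like the number of features RFE keeps, is a tuning choice that would have to match whatever the authors actually used.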
License
Copyright (c) 2026 Rohit Srivastava, Keshav Sinha, Dheeraj Samanta, Reham Alsabet, Walid El-Shafai, Ahmad Taher Azar

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
