Ensemble Machine Learning for Reliable Water Potability Prediction with Optimized Physicochemical Feature Engineering

Authors

  • Rohit Srivastava, CSE Department, NIIT University, Neemrana, Rajasthan, India
  • Keshav Sinha, School of Computer Science, UPES, Dehradun, India
  • Dheeraj Samanta, School of Computer Science, UPES, Dehradun, India
  • Reham Alsabet, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia | Automated Systems and Computing Lab (ASCL), Prince Sultan University, Riyadh, Saudi Arabia
  • Walid El-Shafai, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia | Automated Systems and Computing Lab (ASCL), Prince Sultan University, Riyadh, Saudi Arabia | Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
  • Ahmad Taher Azar, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia | Automated Systems and Computing Lab (ASCL), Prince Sultan University, Riyadh, Saudi Arabia
Volume: 16 | Issue: 2 | Pages: 33647-33659 | April 2026 | https://doi.org/10.48084/etasr.16495

Abstract

Access to safe drinking water is a global challenge for both health and sustainability. This study presents a comprehensive examination of supervised machine learning models for water potability classification using physicochemical water quality characteristics, together with a leakage-safe preprocessing pipeline that incorporates stratified data splitting, mean imputation, and feature standardization. Class imbalance is handled with the Synthetic Minority Over-Sampling Technique (SMOTE), and feature selection and normalization are applied to the dataset. Seven models, namely Random Forest (RFC), Logistic Regression (LR), Decision Tree (DTC), AdaBoost, K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), and Gradient Boosting (GBC), were trained and optimized using grid search with 5-fold cross-validation. Performance was assessed on the test set using accuracy, precision, recall, and F1 score. The ensemble approaches, namely RFC and GBC, performed better than the other models, showing high stability and prediction accuracy. Preprocessing increases model sensitivity to the minority class, and the trade-offs between interpretability and model complexity are discussed. The findings contribute to the development of reliable, data-driven water quality monitoring systems that align with sustainable development and public health goals. Because the limited dataset size may affect generalizability to other regions, future studies should validate these findings on more diverse and comprehensive datasets.
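
As an illustration of what a leakage-safe ordering of the preprocessing steps named in the abstract could look like, the following minimal Python sketch splits the data with stratification first, learns the imputation means and scaling statistics on the training rows only, and oversamples the minority class in the training split only. It is not the authors' implementation. The toy data, the function names (`stratified_split`, `fit_stats`, `transform`, `smote`), the 80/20 split, and the simplified SMOTE-style interpolation are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def fit_stats(rows):
    """Per-feature (mean, std) computed from the observed (non-None) values."""
    stats = []
    for col in zip(*rows):
        vals = [v for v in col if v is not None]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats.append((mean, std))
    return stats


def transform(rows, stats):
    """Mean-impute missing values, then standardize, using the supplied statistics."""
    return [[((m if v is None else v) - m) / s for v, (m, s) in zip(row, stats)]
            for row in rows]


def smote(X, y, minority, k=3, seed=0):
    """Simplified SMOTE-style oversampling for a binary problem (illustrative only)."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(pts)) - len(pts)  # samples needed to balance the classes
    X_out, y_out = list(X), list(y)
    for _ in range(max(0, n_new)):
        base = rng.choice(pts)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted((p for p in pts if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        # the synthetic point lies on the segment between base and its neighbour
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Synthetic toy data (an assumption, not the study's dataset): 30 samples, 2 features.
data_rng = random.Random(42)
X = [[data_rng.gauss(7.0, 1.0), data_rng.gauss(200.0, 30.0)] for _ in range(30)]
X[3][0] = None   # simulate missing measurements
X[17][1] = None
y = [0] * 20 + [1] * 10  # imbalanced labels

# 1) Split first, so that no test-set information reaches the fitted statistics.
train_idx, test_idx = stratified_split(y)
y_train = [y[i] for i in train_idx]
y_test = [y[i] for i in test_idx]

# 2) Learn imputation means and scaling statistics on the training rows only.
stats = fit_stats([X[i] for i in train_idx])
X_train = transform([X[i] for i in train_idx], stats)
X_test = transform([X[i] for i in test_idx], stats)

# 3) Oversample the minority class in the training split only.
X_bal, y_bal = smote(X_train, y_train, minority=1)
```

The test rows are transformed with statistics learned from the training rows and are never resampled, which is the property that "leakage-safe" usually refers to. If cross-validation is layered on top, the same reasoning suggests repeating steps 2 and 3 inside each fold rather than once before the folds are formed.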

Keywords:

water potability, water quality, machine learning, feature selection, class imbalance, SMOTE, ensemble methods, mutual information, variance inflation factor, recursive feature elimination

How to Cite

[1]
R. Srivastava, K. Sinha, D. Samanta, R. Alsabet, W. El-Shafai, and A. T. Azar, “Ensemble Machine Learning for Reliable Water Potability Prediction with Optimized Physicochemical Feature Engineering”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 33647–33659, Apr. 2026.