IRFCA: A Hybrid Fuzzy-Possibilistic Clustering Algorithm for High-Dimensional Data

Sumrana Siddiqui; Nandita Bhanja Chaudhuri

doi:10.48084/etasr.16837

Authors

Sumrana Siddiqui, Computer Science Engineering Department, GITAM School of Technology, India
Nandita Bhanja Chaudhuri, Computer Science Engineering Department, GITAM School of Technology, India

Volume: 16 | Issue: 2 | Pages: 33065-33071 | April 2026 | https://doi.org/10.48084/etasr.16837

Received: 11 December 2025 | Revised: 29 December 2025, 13 January 2026, and 25 January 2026 | Accepted: 27 January 2026 | Online: 1 March 2026

Corresponding author: Nandita Bhanja Chaudhuri

Abstract

Clustering high-dimensional datasets is difficult because of the "curse of dimensionality," noise, and outliers, all of which degrade traditional algorithms such as Fuzzy C-Means (FCM) and K-Means. The objective of this study was to develop a robust framework that balances noise resilience with the scalability required for large-scale data. The proposed Intelligent Robust Fuzzy Clustering Algorithm (IRFCA) is a hybrid pipeline that integrates Truncated Singular Value Decomposition (SVD) for dimensionality reduction, K-Means++ for stable initialization, and a fused FCM and Possibilistic C-Means (PCM) mechanism with dynamic outlier trimming. Tested on large-scale datasets, including Network Attack Data (10.9 GB) and CORD-19, IRFCA achieved Silhouette scores of up to 0.63 and Davies-Bouldin Indices as low as 0.34, outperforming the FCM, FCMedian, and PCM baselines. These results indicate that IRFCA produces compact, well-separated clusters while maintaining competitive scalability. IRFCA offers a resilient approach to high-dimensional analysis, with potential applications in cybersecurity, business analytics, and scientific research.
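
The three stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the fusion rule, the typicality scale, or the trimming criterion, so the convex weight `alpha`, the per-iteration estimate `eta`, and the fixed trimming fraction `trim` below are all assumptions chosen for illustration, as are the function names and the synthetic data.

```python
import numpy as np

def truncated_svd(X, k):
    """Project rows of X onto the top-k right singular vectors (no centering)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

def kmeanspp_init(Z, c, rng):
    """K-Means++ seeding: each new center is drawn with probability ~ D^2."""
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(c - 1):
        diff = Z[:, None, :] - np.array(centers)[None]
        d2 = (diff ** 2).sum(-1).min(axis=1)
        centers.append(Z[rng.choice(len(Z), p=d2 / d2.sum())])
    return np.array(centers)

def fused_fcm_pcm(Z, c, m=2.0, alpha=0.5, trim=0.05, iters=50, tol=1e-6, seed=0):
    """Hypothetical FCM/PCM fusion with quantile-based trimming (assumed rules)."""
    rng = np.random.default_rng(seed)
    V = kmeanspp_init(Z, c, rng)
    for _ in range(iters):
        # Squared distances from every point to every center, shape (n, c).
        d2 = ((Z[:, None, :] - V[None]) ** 2).sum(-1) + 1e-12
        # FCM memberships: rows sum to 1.
        u = d2 ** (-1.0 / (m - 1))
        u /= u.sum(axis=1, keepdims=True)
        # PCM typicalities; eta is re-estimated each iteration (a simplification).
        eta = (u ** m * d2).sum(0) / (u ** m).sum(0)
        t = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1)))
        # Assumed fusion: convex combination of membership and typicality.
        w = alpha * u ** m + (1 - alpha) * t ** m
        # Assumed trimming: ignore the least typical fraction of points.
        best_t = t.max(axis=1)
        keep = best_t >= np.quantile(best_t, trim)
        V_new = (w[keep].T @ Z[keep]) / w[keep].sum(0)[:, None]
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < tol:
            break
    # Labels and trimming mask come from the last computed memberships.
    return V, u.argmax(axis=1), keep

# Synthetic data: three 20-dimensional blobs plus ten scattered outliers.
data_rng = np.random.default_rng(42)
blobs = [data_rng.normal(loc=mu, scale=0.3, size=(100, 20)) for mu in (-4.0, 0.0, 4.0)]
outliers = data_rng.uniform(-10, 10, size=(10, 20))
X = np.vstack(blobs + [outliers])

Z = truncated_svd(X, 2)              # reduce 20 dimensions to 2
V, labels, keep = fused_fcm_pcm(Z, c=3)
print("centers:\n", V)
print("points kept after trimming:", int(keep.sum()), "of", len(Z))
```

In this sketch, `alpha=1.0` reduces the center update to plain FCM weights and `alpha=0.0` to purely possibilistic weights, which is one simple way to expose the trade-off between the two memberships.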

Keywords

hybrid clustering, high-dimensional data, Principal Component Analysis (PCA), fuzzy clustering, Possibilistic C-Means (PCM), outlier trimming
