IRFCA: A Hybrid Fuzzy-Possibilistic Clustering Algorithm for High-Dimensional Data
Received: 11 December 2025 | Revised: 29 December 2025, 13 January 2026, and 25 January 2026 | Accepted: 27 January 2026 | Online: 1 March 2026
Corresponding author: Nandita Bhanja Chaudhuri
Abstract
Clustering high-dimensional datasets is challenging because of the "curse of dimensionality," noise, and outliers, which render traditional algorithms such as FCM and K-Means ineffective. This study aims to develop a robust framework that balances noise resilience with the scalability required for large-scale data. The proposed Intelligent Robust Fuzzy Clustering Algorithm (IRFCA) is a hybrid pipeline that integrates Truncated SVD for dimensionality reduction, K-Means++ for stable initialization, and a fused FCM-PCM mechanism with dynamic outlier trimming. Tested on large-scale datasets, including Network Attack Data (10.9 GB) and CORD-19, IRFCA achieves Silhouette scores of up to 0.63 and Davies-Bouldin Indices as low as 0.34, significantly outperforming the FCM, FCMedian, and PCM baselines. These results indicate that IRFCA produces compact, well-separated clusters while maintaining competitive scalability. IRFCA offers a resilient solution for high-dimensional analysis, with potential applications in cybersecurity, business analytics, and scientific research.
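The abstract names the pipeline stages but not their formulas, so the following is only a minimal sketch of how K-Means++ seeding, FCM memberships, PCM typicalities, and outlier trimming could be combined. It is not the authors' implementation. Several choices are assumptions made for illustration: the function names (`fit_hybrid` and its helpers), the fusion weight `a*u^m + b*t^m` (borrowed from the general possibilistic-fuzzy style of hybrid), the per-cluster scale `eta`, and the fixed-fraction trimming rule controlled by `trim_frac`. The Truncated SVD step is omitted; the input points are assumed to be already reduced.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_init(points, k, rng):
    """K-Means++ seeding: later centers are sampled proportionally to the
    squared distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM memberships; each row sums to 1 across clusters."""
    power = 1.0 / (m - 1.0)
    rows = []
    for p in points:
        inv = [(1.0 / (sq_dist(p, c) + eps)) ** power for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows

def pcm_typicalities(points, centers, eta, m=2.0):
    """Krishnapuram-Keller style typicalities in (0, 1]; rows need not sum to 1."""
    power = 1.0 / (m - 1.0)
    return [[1.0 / (1.0 + (sq_dist(p, c) / e) ** power)
             for c, e in zip(centers, eta)] for p in points]

def fit_hybrid(points, k, m=2.0, a=1.0, b=1.0, trim_frac=0.1, iters=30, seed=0):
    """Assumed fusion loop: weights a*u^m + b*t^m with fixed-fraction trimming."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        # Per-cluster scale eta_j: fuzzy-weighted mean squared distance (assumption).
        eta = []
        for j, c in enumerate(centers):
            num = sum(u[i][j] ** m * sq_dist(points[i], c) for i in range(n))
            den = sum(u[i][j] ** m for i in range(n))
            eta.append(max(num / den, 1e-12))
        t = pcm_typicalities(points, centers, eta, m)
        # Trim the points that look least typical of any cluster (assumption).
        ranked = sorted(range(n), key=lambda i: max(t[i]), reverse=True)
        keep = ranked[: n - int(trim_frac * n)]
        new_centers = []
        for j in range(k):
            w = [a * u[i][j] ** m + b * t[i][j] ** m for i in keep]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * points[i][d] for wi, i in zip(w, keep)) / total
                for d in range(dim)))
        centers = new_centers
    # Labels come from the memberships computed in the final iteration.
    labels = [max(range(k), key=lambda j: u[i][j]) for i in range(n)]
    trimmed = sorted(set(range(n)) - set(keep))
    return centers, labels, trimmed
```

Calling `fit_hybrid(points, k=2)` on a list of coordinate tuples returns the final centers, a hard label per point, and the indices excluded from the last center update. Because K-Means++ samples in proportion to squared distance, an extreme outlier can itself be chosen as a seed, which is one reason a trimming or typicality step is useful after initialization.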
Keywords:
hybrid clustering, high-dimensional data, Principal Component Analysis (PCA), fuzzy clustering, Possibilistic C-Means (PCM), outlier trimming
License
Copyright (c) 2026 Sumrana Siddiqui, Nandita Bhanja Chaudhuri

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
