IRFCA: A Hybrid Fuzzy-Possibilistic Clustering Algorithm for High-Dimensional Data

Authors

  • Sumrana Siddiqui, Computer Science Engineering Department, GITAM School of Technology, India
  • Nandita Bhanja Chaudhuri, Computer Science Engineering Department, GITAM School of Technology, India
Volume: 16 | Issue: 2 | Pages: 33065-33071 | April 2026 | https://doi.org/10.48084/etasr.16837

Abstract

Clustering high-dimensional datasets is challenging because of the "curse of dimensionality," noise, and outliers, which undermine traditional algorithms such as Fuzzy C-Means (FCM) and K-Means. This study aimed to develop a robust framework that balances noise resilience with the scalability required for large-scale data. The proposed Intelligent Robust Fuzzy Clustering Algorithm (IRFCA) is a hybrid pipeline that integrates Truncated Singular Value Decomposition (SVD) for dimensionality reduction, K-Means++ for stable initialization, and a fused FCM and Possibilistic C-Means (PCM) mechanism with dynamic outlier trimming. Tested on large-scale datasets, including Network Attack Data (10.9 GB) and CORD-19, IRFCA achieved Silhouette scores of up to 0.63 and Davies-Bouldin Indices as low as 0.34, significantly outperforming the baselines (FCM, FCMedian, and PCM). These results indicate that IRFCA produces compact, well-separated clusters while maintaining competitive scalability. IRFCA offers a resilient solution for high-dimensional analysis, with potential applications in cybersecurity, business analytics, and scientific research.
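
The abstract does not spell out how the fuzzy and possibilistic updates are fused or how trimming is performed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) a fusion in which each cluster center is a weighted mean with weights a·u^m + b·t^m, where u is the standard FCM membership and t is the standard PCM typicality, and (2) trimming that excludes a fixed fraction of the points farthest from their nearest center at every iteration. The function name `fused_fcm_pcm` and the parameters `a`, `b`, and `trim_fraction` are hypothetical. The SVD reduction and K-Means++ initialization stages are omitted; initial centers are passed in directly.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def fused_fcm_pcm(points, init_centers, m=2.0, a=1.0, b=1.0,
                  trim_fraction=0.1, max_iter=50, tol=1e-9, eps=1e-12):
    """Illustrative fused FCM/PCM center update with outlier trimming.

    Assumed (not taken from the paper): the a*u^m + b*t^m fusion rule and
    the fixed-fraction, distance-based trimming rule.
    """
    centers = [list(c) for c in init_centers]
    n = len(points)
    n_trim = math.floor(trim_fraction * n)
    power = 1.0 / (m - 1.0)
    trimmed = []
    for _ in range(max_iter):
        # Squared distance from every point to every center (eps avoids 0).
        dist = [[sq_dist(p, c) + eps for c in centers] for p in points]
        # Trim the points that lie farthest from their nearest center.
        far_first = sorted(range(n), key=lambda k: min(dist[k]), reverse=True)
        trimmed = sorted(far_first[:n_trim])
        kept = [k for k in range(n) if k not in trimmed]
        new_centers = []
        for i in range(len(centers)):
            # FCM membership: relative closeness, sums to 1 across clusters.
            u = {k: dist[k][i] ** -power / sum(d ** -power for d in dist[k])
                 for k in kept}
            # PCM scale parameter: fuzzy-weighted mean squared distance.
            eta = (sum(u[k] ** m * dist[k][i] for k in kept)
                   / sum(u[k] ** m for k in kept))
            # PCM typicality: absolute closeness, independent of other clusters.
            t = {k: 1.0 / (1.0 + (dist[k][i] / (eta + eps)) ** power)
                 for k in kept}
            # Fused weight and weighted-mean center update.
            w = {k: a * u[k] ** m + b * t[k] ** m for k in kept}
            total = sum(w.values())
            new_centers.append([sum(w[k] * points[k][j] for k in kept) / total
                                for j in range(len(centers[i]))])
        shift = max(sq_dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, trimmed

# Two small square-shaped clusters plus one distant outlier (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, -50)]
centers, trimmed = fused_fcm_pcm(points, [(0.0, 0.0), (11.0, 11.0)],
                                 trim_fraction=0.12)
print(centers, trimmed)
```

In this toy run the distant point is excluded from the center updates, so the centers settle near the middle of each square rather than being dragged toward the outlier. A full pipeline would apply this step to SVD-reduced features and seed it with K-Means++ centers.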

Keywords:

hybrid clustering, high-dimensional data, Principal Component Analysis (PCA), fuzzy clustering, Possibilistic C-Means (PCM), outlier trimming

How to Cite

[1]
S. Siddiqui and N. B. Chaudhuri, “IRFCA: A Hybrid Fuzzy-Possibilistic Clustering Algorithm for High-Dimensional Data”, Eng. Technol. Appl. Sci. Res., vol. 16, no. 2, pp. 33065–33071, Apr. 2026.
