An Algorithm to Optimize Frequent Pattern Mining in Parallel and Distributed Environment
Received: 4 December 2024 | Revised: 25 December 2024, 14 January 2025, 1 February 2025, 3 February 2025 | Accepted: 5 February 2025 | Online: 24 March 2025
Corresponding author: Anshu Singla
Abstract
Frequent Pattern Mining (FPM) is an important data mining task that identifies recurrent patterns or correlations in datasets. The main purpose of FPM algorithms is to find sets of items that appear frequently in transactional or relational databases. This study presents Parallel and Distributed Recursive Elimination (PDReLim), a novel FPM algorithm designed for parallel computing to improve efficiency over existing parallel FPM algorithms. PDReLim recursively deletes infrequent items on each node while exploiting the capabilities of parallel and distributed systems or clusters. Its performance was evaluated on the well-known Chess, Mushroom, and Connect datasets from the UCI repository, with a focus on the lowest support threshold, which causes computational bottlenecks for many FPM algorithms. PDReLim is implemented in PySpark, which outperforms standard MapReduce for iterative algorithms. The implementation is optimized for large databases through Spark capabilities such as the RDD data structure, in-memory processing, and shared variables. The results show that PDReLim was significantly faster than PApriori, PFP-Growth, and PFP-Max.
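The idea of recursively eliminating infrequent items can be illustrated with a minimal single-machine sketch in plain Python. This is not the authors' PDReLim implementation and does not use Spark. The function name `mine_frequent_itemsets`, the toy transactions, and the projection scheme are assumptions made for illustration only. In a distributed setting, a step like this would presumably run on each node's partition of the data.

```python
from collections import Counter


def mine_frequent_itemsets(transactions, min_support, prefix=()):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets.

    Hypothetical helper: a simplified recursive-elimination miner, not PDReLim.
    """
    # Count the number of transactions in which each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    # Eliminate infrequent items; they cannot be part of any frequent itemset.
    frequent = sorted(i for i, c in counts.items() if c >= min_support)

    results = {}
    for idx, item in enumerate(frequent):
        itemset = prefix + (item,)
        results[itemset] = counts[item]
        # Project the database onto transactions that contain `item`,
        # keeping only frequent items that come later in the ordering.
        later = set(frequent[idx + 1:])
        projected = [[i for i in t if i in later] for t in transactions if item in t]
        projected = [t for t in projected if t]
        # Recurse on the smaller projected database.
        results.update(mine_frequent_itemsets(projected, min_support, itemset))
    return results


# Toy data (assumed for illustration): item "d" is infrequent and is eliminated.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
for itemset, support in sorted(mine_frequent_itemsets(transactions, 3).items()):
    print(itemset, support)
```

Each recursive call works on a smaller projected database, so infrequent items are discarded before they can contribute to longer candidate itemsets. This is why low support thresholds, which leave more items frequent, are the costly case.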
Keywords:
PySpark, Frequent Pattern Mining (FPM), parallel FPM, Spark, association rule mining, Apriori, Eclat
License
Copyright (c) 2025 Anshu Singla, Parul Gandhi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.