Performance Enhancement of Distributed Processing Systems Using Novel Hybrid Shard Selection Algorithm
Received: 22 February 2022 | Revised: 7 March 2024 | Accepted: 12 March 2024 | Online: 2 April 2024
Corresponding author: Praveen M. Dhulavvagol
Abstract
Distributed processing systems play a crucial role in query search operations, where large-scale data are partitioned across multiple nodes and shard selection algorithms determine which partitions are searched for a given query. However, existing shard selection algorithms face significant challenges in shard ranking and shard cut-off estimation, and they suffer from high latency, low throughput, and high processing costs. These limitations become more pronounced as the data size increases, reducing the efficiency and effectiveness of search operations. To address these challenges, this paper proposes a novel Hybrid Shard Selection Algorithm (HSSA), designed specifically to improve the effectiveness and efficiency of search operations in distributed processing systems. HSSA uses a sharding approach that identifies and targets the shards relevant to a specific query, which reduces search overhead and improves operational efficiency. HSSA was evaluated on the Gov2 dataset. Compared with the well-established algorithms CORI, Rank-S, and SHiRE, HSSA improved average throughput by 21%, 16%, and 12% and reduced latency by 14.2%, 9.4%, and 8.2%, respectively. These results indicate that HSSA addresses gaps in traditional shard selection strategies. Its performance on datasets of varied sizes also supports its practical integration into distributed processing systems.
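The two steps named in the abstract, shard ranking and shard cut-off estimation, can be illustrated with a minimal sketch. This is not HSSA, CORI, Rank-S, or SHiRE: the scoring formula, the function names `rank_shards` and `select_shards`, the parameters `ratio` and `max_shards`, and the toy shard statistics are all assumptions made for illustration only.

```python
import math


def rank_shards(query_terms, shard_term_df, shard_sizes):
    """Rank shards by a simple TF-IDF-like score (illustrative, not HSSA).

    shard_term_df: {shard_id: {term: number of documents containing term}}
    shard_sizes:   {shard_id: number of documents in the shard}
    """
    n_shards = len(shard_term_df)
    scores = {}
    for shard_id, term_df in shard_term_df.items():
        score = 0.0
        for term in query_terms:
            # Count shards that contain the term; rarer terms weigh more.
            shards_with_term = sum(
                1 for df in shard_term_df.values() if df.get(term, 0) > 0
            )
            if shards_with_term == 0:
                continue
            idf = math.log(1 + n_shards / shards_with_term)
            tf = term_df.get(term, 0) / shard_sizes[shard_id]
            score += tf * idf
        scores[shard_id] = score
    # Highest-scoring shard first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_shards(ranking, ratio=0.5, max_shards=3):
    """Cut-off step: keep at most max_shards shards whose score is
    at least `ratio` times the best score (hypothetical rule)."""
    if not ranking or ranking[0][1] <= 0:
        return []
    top_score = ranking[0][1]
    return [sid for sid, s in ranking[:max_shards] if s >= ratio * top_score]


# Toy statistics for three shards of 100 documents each (made-up numbers).
shard_term_df = {
    "s0": {"tax": 80, "form": 40},
    "s1": {"tax": 5, "park": 90},
    "s2": {"park": 60, "trail": 70},
}
shard_sizes = {"s0": 100, "s1": 100, "s2": 100}

ranking = rank_shards(["tax", "form"], shard_term_df, shard_sizes)
selected = select_shards(ranking)
print(selected)
```

In this toy setup only the selected shards would receive the query, which is the general mechanism by which selective search reduces per-query work compared with searching every shard.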
Keywords:
sharding, cluster, indexing, partitioning, allocation
License
Copyright (c) 2024 Praveen M. Dhulavvagol, Sashikumar G. Totad
This work is licensed under a Creative Commons Attribution 4.0 International License.