Optimization of ETL Process in Data Warehouse Through a Combination of Parallelization and Shared Cache Memory
Extraction, Transformation and Loading (ETL) is a key subject in the optimization, management, improvement and acceleration of processes and operations in databases and data warehouses. Building ETL processes is potentially one of the largest tasks in a data warehouse project, and it is a time-consuming and complicated procedure. Without optimization of these processes, implementing data warehouse projects is costly, complicated and time-consuming. This paper combines parallelization methods with shared cache memory in distributed systems built around a data warehouse. According to the conducted assessment, the proposed method achieved a 7.1% speed improvement over the Kettle tool and 7.9% over the Talend tool in terms of ETL execution time. Parallelization can therefore notably improve the ETL process, allowing the management and integration of big data to be carried out simply and at an acceptable speed.
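As a rough illustration of the general idea of combining parallelization with a shared cache, the following minimal Python sketch splits the extracted rows into partitions, transforms them in parallel workers, and lets all workers resolve dimension keys through one shared, lock-protected lookup cache. It is not the method evaluated in the paper: the names `SharedLookupCache`, `DIMENSION`, `extract`, `partition`, `transform` and `run_etl`, as well as the sample data, are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# Hypothetical dimension table: natural key -> surrogate key.
DIMENSION = {"DE": 1, "FR": 2, "IR": 3}


class SharedLookupCache:
    """Lookup cache shared by all transform workers (assumed design)."""

    def __init__(self, source):
        self._source = source
        self._cache = {}
        self._lock = Lock()
        self.misses = 0  # number of lookups that had to go to the source

    def get(self, key):
        # The lock covers both the check and the fill, so each key is
        # fetched from the source at most once across all workers.
        with self._lock:
            if key not in self._cache:
                self.misses += 1
                self._cache[key] = self._source.get(key)  # stands in for a slow lookup
            return self._cache[key]


def extract():
    """Return sample source rows (made-up data for the sketch)."""
    pairs = [("DE", 10), ("FR", 20), ("DE", 5), ("IR", 7), ("FR", 1), ("DE", 2)]
    return [{"country": c, "amount": a} for c, a in pairs]


def partition(rows, n):
    """Split rows round-robin into n partitions."""
    return [rows[i::n] for i in range(n)]


def transform(rows, cache):
    """Replace the natural key with its surrogate key via the shared cache."""
    return [{"country_id": cache.get(r["country"]), "amount": r["amount"]} for r in rows]


def run_etl(workers=3):
    cache = SharedLookupCache(DIMENSION)
    parts = partition(extract(), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: transform(p, cache), parts))
    # "Load" step: here the transformed partitions are simply merged into a list.
    loaded = [row for part in results for row in part]
    return loaded, cache.misses


if __name__ == "__main__":
    rows, misses = run_etl()
    print(len(rows), "rows loaded,", misses, "cache misses")
```

Threads are used here only to show the structure. In CPython, a CPU-bound transformation would likely need processes or a distributed engine to gain real speed, and the shared cache would then have to live in shared memory or an external store.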