Advantages of Giraph over Hadoop in Graph Processing

C. L. Vidal-Silva; E. Madariaga; T. Pham; J. M. Rubio; L. A. Urzua; L. Carter; F. Johnson

doi:10.48084/etasr.2715

Authors

C. L. Vidal-Silva Faculty of Economics and Administration, Catholic University of the North, Chile
E. Madariaga Faculty of Engineering, Bernardo O’Higgins University, Chile
T. Pham Information Technology Research Center, Faculty of Economics and Business, University of Talca, Chile
J. M. Rubio Technological University of Chile INACAP, Santiago, Chile
L. A. Urzua School of Kinesiology, Faculty of Health, Santo Tomas University, Chile
L. Carter Industrial Civil Engineering Department, Autonomous University of Chile, Chile
F. Johnson Computing and Information Department, University of Playa Ancha, Chile

Volume: 9 | Issue: 3 | Pages: 4112-4115 | June 2019 | https://doi.org/10.48084/etasr.2715

Corresponding author: C. L. Vidal-Silva

Abstract

This article presents a comparison of the computing performance of the MapReduce tool Hadoop and Giraph on large-scale graphs. The main ideas of MapReduce and bulk synchronous parallel (BSP) are reviewed as big data computing approaches to highlight their applicability in large-scale graph processing. This paper reviews the execution performance of Hadoop and Giraph on the PageRank algorithm to classify web pages according to their relevance, and on a few other algorithms to find the minimum spanning tree in a graph with the primary goal of finding the most efficient computing approach to work on large-scale graphs. Experimental results show that the use of Giraph for processing large-size graphs reduces the execution time by 25% in comparison with the results obtained using the Hadoop for the same experiments. Giraph represents the optimal option thanks to its in-memory computing approach that avoids secondary memory direct interaction.

Keywords:

Giraph, Hadoop, Graph, Big Data, Big Graph

References

I. Yaqoob, I. A. T. Hashem, A. Gani, S. Mokhtar, E. Ahmed, N. B. Anuar, A. V. Vasilakos, “Big data”, International Journal of Information Management, Vol. 36, No. 6, pp. 1231–1247, 2016 DOI: https://doi.org/10.1016/j.ijinfomgt.2016.07.009

A. K. Wahi, V. Ahuja, “The internet of things-new value streams for customers”, International Journal of Information Technology and Management, Vol. 16, No. 4, pp. 360–375, 2017 DOI: https://doi.org/10.1504/IJITM.2017.086863

P. Zikopoulos, C. Eaton, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGrawHill Osborne Media, 2011

T. White, Hadoop: The Definitive Guide, O’Reilly Media, Inc., 2015

J. Dean, S. Ghemawat, “Mapreduce: Simplified data processing on large clusters”, 6th Conference on Symposium on Operating Systems Design & Implementation, San Francisco, December 6-8, 2004

J. Dean, S. Ghemawat, “Mapreduce: Simplified data processing on large clusters”, Communications of the ACM, Vol. 51, No. 1, pp. 107–113, 2008 DOI: https://doi.org/10.1145/1327452.1327492

M. Aurelio, B. Fagnani, G. Lotz, Dynamic Graph Computations using Parallel Distributed Computing Solutions, Science without Borders, 3-Months Project Report, Queen Mary, University of London, 2013

R. Shaposhnik, C. Martella, D. Logothetis, Practical Graph Analytics with Apache Giraph, Apress, 2015 DOI: https://doi.org/10.1007/978-1-4842-1251-6

S. Sakr, F. M. Orakzai, I. Abdelaziz, Z. Khayyat, Large-Scale Graph Processing Using Apache Giraph, Springer, 2017 DOI: https://doi.org/10.1007/978-3-319-47431-1

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, G. Czajkowski, “Pregel: A system for large-scale graph processing”, 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, USA, June 6-11, 2010 DOI: https://doi.org/10.1145/1807167.1807184

S. Valenzuela, C. Vidal, “Evaluacion de hadoop y giraph para el procesamiento de grafos”, Jornadas Chilenas de la Computacin, XXVII Encuentro Chileno de Computacin. Santiago, Chile, 2015 (in Spanish)

S. Karanth, Mastering Hadoop. Packt Publishing, 2015

S. Ghemawat, H. Gobioff, S. T. Leung, “The Google file system”, Nineteenth ACM Symposium on Operating Systems Principles, Bolton Landing, USA, October 19-22, 2003 DOI: https://doi.org/10.1145/945445.945450

S. Brin, L. Page, “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107–117, 1998. DOI: https://doi.org/10.1016/S0169-7552(98)00110-X

L. G. Valiant, “A bridging model for parallel computation”, Communications of the ACM, Vol. 33, No. 8, pp. 103–111, 1990 DOI: https://doi.org/10.1145/79173.79181

M. Han, K. Daudjee, K. Ammar, M. T. Ozsu, X. Wang, T. Jin, “An experimental comparison of pregel-like graph processing systems”, Proceedings of the VLDB Endowment, Vol. 7, No. 12, pp. 1047–1058, 2014 DOI: https://doi.org/10.14778/2732977.2732980

S. Salihoglu, J. Shin, V. Khanna, B. Q. Truong, J. Widom, “Graft: A debugging tool for apache giraph”, 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, May 31-June 4, 2015 DOI: https://doi.org/10.1145/2723372.2735353

M. Han, K. Daudjee, “Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems”, Proceedings of the VLDB Endowment, Vol. 8, No. 9, pp. 950–961, 2015 DOI: https://doi.org/10.14778/2777598.2777604

J. Lin, C. Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool, 2010 DOI: https://doi.org/10.3115/1620950.1620951

M. Held, R. M. Karp, “The traveling-salesman problem and minimum spanning trees: Part ii”, Mathematical Programming, Vol. 1, No. 1, pp. 6–25, 1971 DOI: https://doi.org/10.1007/BF01584070

R. C. Prim, “Shortest connection networks and some generalizations”, Bell System Technical Journal, Vol. 36, No. 6, pp. 1389–1401, 1957 DOI: https://doi.org/10.1002/j.1538-7305.1957.tb01515.x

M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, I. Stoica, “Apache spark: A unified engine for big data processing”, Communications of the ACM, Vol. 59, No. 11, pp. 56–65, 2016 DOI: https://doi.org/10.1145/2934664

E. Friedman, K. Tzoumas, Introduction to Apache Flink: Stream Processing for Real Time and Beyond, O’Reilly Media, 2016

S. Papp, The Definitive Guide to Apache Flink: Next Generation Data Processing, Apress, 2016