A synchronized process based scheduling to improve map/reduce execution strategy
Keywords:
MapReduce, Hadoop virtual setup
Abstract
MapReduce is a widely used parallel computing framework for large-scale data processing. Its two major performance metrics are job execution time and cluster throughput, and both can be seriously degraded by straggler machines. Speculative execution is a common approach to the straggler problem: slow-running tasks are simply backed up on alternative machines. Among improved speculative execution strategies, MCP (Maximum Cost Performance) identifies slow tasks and uses an EWMA (Exponentially Weighted Moving Average) to predict process speed and calculate task completion time. Multiple speculative execution strategies have been proposed, but they share a pitfall: incoming jobs are allocated to the nodes present in the server, and the strategies fail to schedule by the type of process allocated to each node for processing. To overcome this problem, we propose a new scheduling-based speculative execution strategy. For scheduling, we first calculate the number of name nodes residing in the server and the minimum threshold of resources allocated to each name node. We use the minimum to avoid heavy interaction among the name nodes when competition for resources arises. To choose a proper work node for computing a task, we take into account both the time scheduling and the ability of the work node to compute the task.
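As an illustration of the EWMA step mentioned in the abstract, the following is a minimal sketch of how a task's process speed could be smoothed and then used to estimate its remaining time. The function names, the smoothing factor `alpha = 0.2`, and the representation of progress as a fraction between 0 and 1 are assumptions made for this example; they are not taken from the paper or from Hadoop.

```python
def ewma_speed(samples, alpha=0.2):
    """Smooth observed progress rates (progress fraction per second)
    with an exponentially weighted moving average.

    `alpha` is a hypothetical smoothing factor: larger values weight
    recent samples more heavily.
    """
    if not samples:
        raise ValueError("at least one speed sample is required")
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def remaining_time(progress, speed):
    """Estimate seconds left for a task, given its progress in [0, 1]
    and its predicted speed (progress fraction per second)."""
    if speed <= 0:
        return float("inf")  # a stalled task never finishes at this rate
    return (1.0 - progress) / speed


def slowest_task(tasks, alpha=0.2):
    """Return the id of the task with the longest estimated remaining time.

    `tasks` maps a task id to a (progress, speed_samples) pair; this
    layout is an assumption for the sketch.
    """
    return max(
        tasks,
        key=lambda tid: remaining_time(tasks[tid][0],
                                       ewma_speed(tasks[tid][1], alpha)),
    )
```

Under these assumptions, the task returned by `slowest_task` would be the natural candidate for a backup copy; a full strategy would also weigh the cost of the backup against the expected gain before launching it.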
License
Copyright (c) 2014 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.