A synchronized process based scheduling to improve map/reduce execution strategy

N. Barathi; R. Dinesh Kumar

Authors

Barathi N, Department of Computer Science and Engineering, Bharathiyar College of Engineering and Technology, Karaikal / Pondicherry University, India
Kumar RD, Department of Computer Science and Engineering, Bharathiyar College of Engineering and Technology, Karaikal / Pondicherry University, India

Keywords:

MapReduce, Hadoop virtual setup

Abstract

MapReduce is a widely used parallel computing framework for large-scale data processing. Its two major performance metrics are job execution time and cluster throughput, and both can be seriously degraded by straggler machines. Speculative execution is a common approach to the straggler problem: slow-running tasks are simply backed up on alternative machines. Among existing strategies, MCP (Maximum Cost Performance) identifies slow tasks and uses an EWMA (Exponentially Weighted Moving Average) to predict processing speed and calculate task completion time. Multiple speculative execution strategies have been proposed, but they share a pitfall: incoming jobs are allocated to whichever nodes are present in the server, without scheduling by the type of process assigned to each node. To overcome this problem, we propose a new scheduling-based speculative execution strategy. For scheduling, we first calculate the number of name nodes residing in the server and the minimum threshold of resources allocated to each name node. We use the minimum to avoid heavy interaction among the name nodes when competition for resources arises. To choose a suitable work node for a task, we consider both the scheduling time and the ability of the work node to compute the task.
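
The role of the EWMA in predicting task completion time can be illustrated with a minimal sketch. This is not the authors' implementation or the MCP algorithm itself: the function names, the smoothing factor `alpha`, the representation of progress as a fraction between 0 and 1, and the rule of picking the task with the longest predicted remaining time are all assumptions made for illustration.

```python
def ewma(samples, alpha=0.2):
    """Exponentially weighted moving average of a non-empty sequence.

    alpha is a hypothetical smoothing factor: higher values weight
    recent samples more heavily.
    """
    it = iter(samples)
    estimate = next(it)  # seed the estimate with the first observation
    for sample in it:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def estimated_remaining_time(progress, speed_samples, alpha=0.2):
    """Predict remaining time as (work left) / (smoothed speed).

    progress is the completed fraction of the task (0.0 to 1.0);
    speed_samples are observed progress rates per unit time.
    """
    speed = ewma(speed_samples, alpha)
    if speed <= 0:
        return float("inf")  # a stalled task never finishes at this rate
    return (1.0 - progress) / speed


def pick_straggler(tasks, alpha=0.2):
    """Return the id of the task with the longest predicted remaining time.

    tasks maps a task id to a (progress, speed_samples) pair.
    """
    return max(tasks, key=lambda t: estimated_remaining_time(*tasks[t], alpha))


if __name__ == "__main__":
    running = {
        "map_1": (0.9, [0.10, 0.11, 0.10]),
        "map_2": (0.2, [0.05, 0.02, 0.01]),  # slowing down over time
    }
    print(pick_straggler(running))
```

A task selected this way would only be a candidate for backup; whether launching a speculative copy is worthwhile also depends on the resources available on the alternative node, which this sketch does not model.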

