Optimization of DBSCAN algorithm using MapReduce method on network traffic data
Keywords:
DBSCAN algorithm, MapReduce method, Network traffic data
Abstract
This paper proposes a new method that addresses weaknesses of previous algorithms. The proposed density-based clustering method is implemented in the MapReduce programming model. First, misleading data sets are used to demonstrate how the density-based clustering algorithm behaves and to expose the weakness of the base method on them. Local clustering is then compared with competing methods on standard clustering data sets, and its superiority over these methods is established. In moving from local to distributed clustering, the misleading data sets are used again to assess clustering quality. The quality of distributed clustering is lower than that of local clustering, but it remains superior to the base method. The advantage of the proposed method over competing methods is clearly shown by distributed clustering of network data. Finally, the effect of the flexible parameter on different types of data is evaluated using the homogeneity and completeness criteria, and a method for choosing this parameter is described.
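The homogeneity and completeness criteria named in the abstract can be illustrated with a minimal sketch. It assumes the standard conditional-entropy definitions used for the V-measure, which the abstract does not spell out; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) in nats, estimated from two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    h = 0.0
    for (g, _t), n_gt in joint.items():
        h -= (n_gt / n) * math.log(n_gt / given_counts[g])
    return h


def homogeneity_completeness(classes, clusters):
    """Return (homogeneity, completeness) for true classes vs. predicted clusters.

    Assumed definitions (not stated in the abstract):
      homogeneity  = 1 - H(classes | clusters) / H(classes)
      completeness = 1 - H(clusters | classes) / H(clusters)
    Each score is defined as 1.0 when its denominator is zero.
    """
    h_c = entropy(classes)
    h_k = entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    return homogeneity, completeness


# Hypothetical toy labels: the second class is split across two clusters.
hom, com = homogeneity_completeness([0, 0, 1, 1], [0, 0, 1, 2])
print(round(hom, 3), round(com, 3))
```

On these toy labels every cluster contains a single class, so homogeneity is 1.0, while one class is split across two clusters, so completeness drops to about 0.667. Comparing the two scores over a range of parameter values is one plausible way to choose a clustering parameter, though the paper's actual selection procedure is not given in the abstract.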
Copyright (c) 2019 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.