Clustering Techniques for Streaming Dynamic Nature of Data
Keywords:
Streaming data, Data Stream mining, Dynamic data, Clustering
Abstract
Many applications now generate streaming data, for example real-time surveillance, internet traffic, sensor data, health monitoring systems, communication networks, and online transactions in financial markets. A data stream is a temporally ordered, fast-changing, massive, and potentially infinite sequence of data. Data stream mining is very challenging because streams are of tremendous volume and flow at very high speed, which makes it impossible to store the data and scan it multiple times. Concept evolution in streaming data further magnifies the challenge.
Clustering is a data stream mining task that is very useful for gaining insight into data and its characteristics. It is also used as a pre-processing step in the overall mining process, for example for outlier detection and for building classification models. In this paper we focus on the challenges and necessary features of clustering techniques for streaming data of a dynamic nature. The behaviour of streaming data keeps changing over time, so a clustering model built on a partial data stream must be updated as new data arrives.
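To make the single-pass and incremental-update requirements concrete, the following is a minimal sketch of sequential (online) k-means with a forgetting factor. It is an illustration only and is not taken from the paper: the class name OnlineKMeans, the decay parameter, and the synthetic drifting stream are all assumptions chosen for the example. Each point is seen once, used to nudge the nearest centre, and then discarded, while the decay lets the model follow a cluster whose position changes over time.

```python
import math
import random


class OnlineKMeans:
    """Sequential k-means with optional forgetting (hypothetical helper, not from the paper)."""

    def __init__(self, k, decay=1.0):
        self.k = k
        self.decay = decay   # 1.0 = never forget; < 1.0 down-weights older points
        self.centers = []    # current cluster centres
        self.weights = []    # decayed count of points absorbed by each centre

    def update(self, x):
        """Absorb one point in O(k) time and return the index of its cluster."""
        x = list(x)
        # Bootstrap: the first k points become the initial centres.
        if len(self.centers) < self.k:
            self.centers.append(x)
            self.weights.append(1.0)
            return len(self.centers) - 1
        # Fade the influence of everything seen so far.
        self.weights = [w * self.decay for w in self.weights]
        # Assign the point to its nearest centre.
        j = min(range(self.k), key=lambda i: math.dist(x, self.centers[i]))
        # Move that centre towards the point; the step shrinks as weight grows.
        self.weights[j] += 1.0
        eta = 1.0 / self.weights[j]
        self.centers[j] = [c + eta * (v - c) for c, v in zip(self.centers[j], x)]
        return j


def drifting_stream(n, seed=0):
    """Synthetic stream (assumption): one fixed cluster, one that drifts along x."""
    rng = random.Random(seed)
    for t in range(n):
        shift = 5.0 * t / n  # second cluster moves from x=5 towards x=10
        if rng.random() < 0.5:
            yield (rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3))
        else:
            yield (rng.gauss(5.0 + shift, 0.3), rng.gauss(5.0, 0.3))


model = OnlineKMeans(k=2, decay=0.99)
for point in drifting_stream(2000):
    model.update(point)  # the point is not stored after this call
print(model.centers)
```

With decay set to 1.0 the update reduces to a plain running mean per cluster, which would lag behind the drifting cluster; a value below 1.0 trades some stability for faster adaptation. Practical stream clustering methods typically maintain richer summaries than a single centre per cluster, so this sketch should be read as the simplest instance of the idea rather than a recommended design.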
License
Copyright (c) 2015 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.