Clustering Techniques for Streaming Dynamic Nature of Data

Swapna Vanguru; Anusha Merugu; Y. Geetha Reddy

Authors

Vanguru S, Assistant Professor, Sri Venkateswara Engineering College, Suryapet
Merugu A, Dept. of Computer Science and Engineering, Sri Venkateswara Engineering College, Suryapet
Reddy YG, Dept. of Computer Science and Engineering, Sri Venkateswara Engineering College, Suryapet

Keywords:

Streaming data, Data Stream mining, Dynamic data, Clustering

Abstract

Many applications now generate streaming data, for example real-time surveillance, internet traffic, sensor data, health monitoring systems, communication networks, and online transactions in financial markets. A data stream is a temporally ordered, fast-changing, massive, and potentially infinite sequence of data. Data stream mining is a very challenging problem because streams arrive in tremendous volume and at very high speed, which makes it impossible to store the data and scan it multiple times. Concept evolution in streaming data further magnifies the challenge.

Clustering is a data stream mining task that is very useful for gaining insight into the data and its characteristics. It is also used as a pre-processing step in the overall mining process, for example in outlier detection and in building classification models. This paper focuses on the challenges and necessary features of clustering techniques for streaming data of a dynamic nature. Because the behaviour of streaming data keeps changing over time, a clustering model built on part of a data stream must be updated as new data arrives.
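The two constraints named in the abstract (each point can be scanned only once, and the model must be updated as new data arrives) can be illustrated with a minimal sketch of sequential k-means. This is not the method of the paper, only one simple way to meet those constraints. The class name OnlineKMeans, the method update, and the decay parameter are hypothetical names chosen for this illustration; decay is an assumed fading factor that lets old points count for less when the stream's behaviour changes.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class OnlineKMeans:
    """Single-pass k-means sketch: each point is seen once, then discarded."""

    k: int
    decay: float = 1.0  # assumed fading factor; 1.0 means no forgetting
    centroids: List[List[float]] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def update(self, point: Sequence[float]) -> int:
        """Absorb one point and return the index of the cluster it joined."""
        # Seed the model with the first k points of the stream.
        if len(self.centroids) < self.k:
            self.centroids.append([float(x) for x in point])
            self.weights.append(1.0)
            return len(self.centroids) - 1

        # Fade old evidence so the model can follow changing behaviour.
        self.weights = [w * self.decay for w in self.weights]

        # Find the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(self.k),
            key=lambda i: sum(
                (c - x) ** 2 for c, x in zip(self.centroids[i], point)
            ),
        )

        # Move that centroid toward the point (running weighted mean).
        self.weights[nearest] += 1.0
        step = 1.0 / self.weights[nearest]
        self.centroids[nearest] = [
            c + step * (x - c) for c, x in zip(self.centroids[nearest], point)
        ]
        return nearest


# Usage: feed the stream one point at a time; nothing is stored or rescanned.
model = OnlineKMeans(k=2)
for p in [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 9.0)]:
    model.update(p)
print(model.centroids)  # [[0.5, 0.5], [9.5, 9.5]]
```

With decay below 1.0, the stored weights shrink before every update, so each new point moves its centroid further and the model tracks recent data more closely. Memory use stays fixed at k centroids and k weights regardless of stream length.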

