Map probabilistic density based subspace clustering for dimensionality reduction of big data analytics

Chitra K; Maheswari D

Authors

K Chitra Research Scholar, School of Computer Studies, Rathnavel Subramaniam College of Arts and Science, Coimbatore, India
Maheswari D Head and Research Coordinator, School of Computer Studies, Rathnavel Subramaniam College of Arts and Science, Coimbatore, India

Keywords:

Big data, Bit values, Fusion Tree Data Storage Structure, Maximum a posteriori (MAP) and Sketch Operation, subspace clustering

Abstract

Density-based subspace clustering algorithms focus on finding dense clusters of arbitrary shape and size. Most existing density-based subspace clustering algorithms in the literature become less effective and less accurate when given a big dataset as input. To overcome these limitations, a MAP Probabilistic Density based Subspace Clustering (MAPPD-SC) technique is introduced. The MAPPD-SC technique is designed for high-dimensional data to improve clustering accuracy and dimensionality reduction. First, the MAPPD-SC technique designs a Map Probabilistic Density Based Subspace Clustering (MPDSC) algorithm with the aim of grouping similar data with higher accuracy and minimum time. During big data clustering, the MAPPD-SC technique applies the maximum a posteriori (MAP) calculation to cluster more closely related data together, thereby forming an optimal number of clusters with high accuracy. After the clustering process is complete, the MAPPD-SC technique designs a Fusion Tree Data Storage Structure (FTDSS) with the objective of storing the clustered big data with reduced space complexity. Using fusion tree concepts, the FTDSS stores only the bit values of the clustered data in memory. These generated bit values take a minimal amount of memory space. In this way, the proposed MAPPD-SC technique reduces the dimensionality of big data for effective big data analytics. Experimental evaluation of the MAPPD-SC technique is carried out on clustering accuracy, clustering time, false positive rate and space complexity with respect to the number of climate data records, using the El Nino Data Set.
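
As an illustration of the MAP step mentioned in the abstract, the following is a minimal sketch of assigning a point to the cluster with the highest posterior probability. It is not the authors' MPDSC algorithm: the axis-aligned Gaussian density, the shared subspace `dims`, and the names `log_gaussian` and `map_assign` are all assumptions made for the example. Comparing clusters that live in subspaces of different dimensionality would need additional normalization, which this sketch does not attempt.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of an axis-aligned Gaussian evaluated at point x."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def map_assign(point, clusters, dims):
    """Return the index of the cluster with the highest log posterior.

    Hypothetical cluster layout (an assumption, not from the paper):
      'prior' - prior probability of the cluster
      'mean'  - per-dimension mean inside the subspace
      'var'   - per-dimension variance inside the subspace
    `dims` lists the indices of the subspace shared by all clusters.
    """
    # Project the point onto the subspace before scoring it.
    sub = [point[d] for d in dims]
    best, best_score = None, -math.inf
    for k, c in enumerate(clusters):
        # log posterior up to a constant: log prior + log likelihood
        score = math.log(c["prior"]) + log_gaussian(sub, c["mean"], c["var"])
        if score > best_score:
            best, best_score = k, score
    return best


# Two illustrative clusters in the subspace spanned by dimensions 0 and 2.
clusters = [
    {"prior": 0.5, "mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"prior": 0.5, "mean": [5.0, 5.0], "var": [1.0, 1.0]},
]
label_a = map_assign([0.2, 9.0, -0.1], clusters, dims=[0, 2])
label_b = map_assign([4.8, 0.0, 5.3], clusters, dims=[0, 2])
```

Here dimension 1 is ignored because it lies outside the chosen subspace, so the large value 9.0 in the first point does not affect its assignment.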

