Map probabilistic density based subspace clustering for dimensionality reduction of big data analytics
Keywords:
Big data, Bit values, Fusion Tree Data Storage Structure, Maximum a posteriori (MAP), Sketch Operation, Subspace clustering
Abstract
Density-based subspace clustering algorithms aim to find dense clusters of arbitrary shape and size. Most existing density-based subspace clustering algorithms in the literature lose effectiveness and accuracy when given a big dataset as input. To overcome these limitations, a MAP Probabilistic Density based Subspace Clustering (MAPPD-SC) technique is introduced. The MAPPD-SC technique is designed for high-dimensional data and aims to improve clustering accuracy and dimensionality reduction. First, the MAPPD-SC technique designs a Map Probabilistic Density Based Subspace Clustering (MPDSC) algorithm with the aim of grouping similar data with higher accuracy and in less time. During big data clustering, the MAPPD-SC technique applies the maximum a posteriori (MAP) calculation so that more closely related data are clustered together, thereby forming an optimal number of clusters with high accuracy. After clustering is complete, the MAPPD-SC technique designs a Fusion Tree Data Storage Structure (FTDSS) with the objective of storing the clustered big data with reduced space complexity. Using fusion tree concepts, the FTDSS stores only the bit values of the clustered data in memory. These generated bit values occupy a minimal amount of memory. In this way, the proposed MAPPD-SC technique reduces the dimensionality of big data for effective big data analytics. Experimental evaluation of the MAPPD-SC technique is carried out on clustering accuracy, clustering time, false positive rate, and space complexity with respect to the number of climate data records, using the El Nino Data Set.
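The MAP step described in the abstract can be illustrated with a minimal sketch. It is not the MPDSC algorithm itself, whose details the abstract does not give. It assumes each cluster is modelled by a prior weight and an independent Gaussian per dimension, and assigns a point to the cluster with the highest posterior. The function names and the dictionary keys `prior`, `means`, and `vars` are hypothetical, as are the sample values.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def map_assign(point, clusters):
    """Return the index of the cluster with the highest posterior for point.

    Assumption: each cluster is a dict with hypothetical keys 'prior',
    'means', and 'vars' (one mean and one variance per dimension).
    The posterior is proportional to prior * likelihood, so comparing
    log(prior) + log(likelihood) is enough; the evidence term cancels.
    """
    best_index, best_score = -1, -math.inf
    for index, cluster in enumerate(clusters):
        score = math.log(cluster["prior"])
        for x, mean, var in zip(point, cluster["means"], cluster["vars"]):
            score += log_gaussian(x, mean, var)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Illustrative values only; they are not taken from the El Nino Data Set.
clusters = [
    {"prior": 0.5, "means": [0.0, 0.0], "vars": [1.0, 1.0]},
    {"prior": 0.5, "means": [5.0, 5.0], "vars": [1.0, 1.0]},
]
points = [(0.2, -0.1), (4.8, 5.3), (1.0, 1.5)]
labels = [map_assign(p, clusters) for p in points]
print(labels)
```

Working in log space avoids numerical underflow when the number of dimensions is large, which is the setting the abstract targets.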
License
Copyright (c) 2020 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.