Outlier Detection via Online Oversampling in High Dimensional space

Authors

  • Menon KM, PG Scholar, Hindustan Institute of Technology
  • Sakthi G, Assistant Professor, Hindustan Institute of Technology

Keywords:

Anomaly detection, online updating, oversampling, principal component analysis

Abstract

Anomaly detection is an important topic in data mining. Many applications, such as intrusion detection and credit card fraud detection, require efficient methods to identify deviated data instances. However, most anomaly detection methods are implemented in batch mode and thus cannot be easily extended to large-scale problems without incurring heavy computation and memory costs. This paper proposes an online oversampling principal component analysis (osPCA) algorithm that aims to detect the presence of outliers in a large amount of data via an online updating technique. Prior principal component analysis (PCA)-based approaches store the entire data matrix or covariance matrix; the proposed method does not, which makes it especially useful for online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed osPCA determines how anomalous the target instance is according to the variation of the resulting dominant eigenvector. Because osPCA need not perform eigen analysis, it is applicable to online applications. Compared with the well-known power method for PCA and other popular anomaly detection algorithms, our experimental results verify that the proposed method is feasible in terms of both efficiency and accuracy.
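
The oversampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' online algorithm: the abstract does not give the update rule, so this sketch recomputes the dominant eigenvector with an explicit eigendecomposition (numpy.linalg.eigh), which is exactly the step osPCA is said to avoid. The function names, the oversampling ratio of 0.1, and the score 1 - |cos| between the two principal directions are assumptions chosen for illustration. Duplicating a target instance is simulated by re-weighting the mean and second-moment matrix rather than by copying rows.

```python
import numpy as np


def dominant_direction(cov):
    """Return the unit eigenvector of the largest eigenvalue of a symmetric matrix."""
    # eigh returns eigenvalues in ascending order, so the last column is dominant.
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]


def oversampling_scores(X, ratio=0.1):
    """Score each row of X by how much oversampling it rotates the principal direction.

    `ratio` (hypothetical parameter) is the number of duplicated copies of the
    target divided by the number of instances n.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)
    second_moment = X.T @ X / n
    # Principal direction of the data as given.
    u = dominant_direction(second_moment - np.outer(mean, mean))

    scores = np.empty(n)
    for t, x in enumerate(X):
        # Mean and second moment after adding ratio * n copies of x.
        mean_t = (mean + ratio * x) / (1.0 + ratio)
        second_t = (second_moment + ratio * np.outer(x, x)) / (1.0 + ratio)
        u_t = dominant_direction(second_t - np.outer(mean_t, mean_t))
        # Eigenvectors have arbitrary sign, so compare via |cosine|.
        scores[t] = 1.0 - abs(u @ u_t)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 200 points spread mainly along the first axis,
    # plus one injected point far from that direction.
    normal = rng.normal(0.0, [3.0, 0.3], size=(200, 2))
    X = np.vstack([normal, [4.0, 8.0]])
    scores = oversampling_scores(X)
    # The injected point (index 200) should receive the largest score.
    print(int(np.argmax(scores)), float(scores.max()))
```

In this sketch a normal instance barely changes the principal direction when oversampled, while an instance lying off the main direction rotates it noticeably, which is the variation of the dominant eigenvector that the abstract uses as the outlier signal.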

Published

2024-02-26

How to Cite

Menon, K. M., & Sakthi, G. (2024). Outlier Detection via Online Oversampling in High Dimensional space. COMPUSOFT: An International Journal of Advanced Computer Technology, 3(01), 487–490. Retrieved from https://ijact.in/index.php/j/article/view/86

Section

Original Research Article