Logistic regression techniques based on different sample sizes in landslide susceptibility assessment: which performs better?

Authors

  • Gao H, School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
  • Fam PS, School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
  • Tay LT, School of Electrical and Electronic Engineering, Universiti Sains Malaysia, 14300 Nibong Tebal, Penang, Malaysia
  • Low HC, Research and Innovation Unit, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia

Keywords:

landslide susceptibility, penalty term, machine learning, ridge logistic regression, lasso logistic regression, receiver operating characteristic

Abstract

The main objective of this paper is to compare the landslide spatial prediction performance of logistic regression (LR) under different regularization methods, namely Lasso LR and Ridge LR. Three training datasets with sample sizes of 40,000, 4,000 and 400 are used to train and validate the models, and ROC curves are used to evaluate the models’ performance. The results show that, based on the AUC values, the Lasso and Ridge LR models perform comparably to the ordinary LR models, which indicates, to some degree, that there are no redundant input features to remove from the models for the data available in this work. The penalty terms play a negligible role in the LR models trained with the three datasets. Lasso LR performs better than Ridge LR, which may be because the L1 penalty can shrink coefficients to exactly zero. According to the AUC values, the group of models trained and validated using the dataset of 40,000 samples outperforms the other two groups.
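
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the landslide data from the paper: it fits ordinary, L1-penalized (lasso) and L2-penalized (ridge) logistic regression on synthetic data at two training sizes and scores each model by AUC. The function names, the synthetic data generator, the penalty weight `lam`, the learning rate and the training sizes are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks w toward zero by t,
    # and returns exactly 0.0 when |w| <= t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_lr(X, y, penalty="none", lam=0.0, lr=0.1, epochs=200):
    # Logistic regression fitted by (proximal) gradient descent.
    # penalty is "none", "l1" (lasso) or "l2" (ridge); the intercept b
    # is never penalized. Hyperparameters are illustrative assumptions.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        for j in range(d):
            g = grad_w[j] / n
            if penalty == "l2":
                g += lam * w[j]          # gradient of (lam / 2) * w^2
            w[j] -= lr * g
            if penalty == "l1":
                w[j] = soft_threshold(w[j], lr * lam)
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auc(y_true, scores):
    # AUC as the probability that a random positive is scored above a
    # random negative, with ties counted as one half.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_data(n, rng):
    # Synthetic stand-in for a susceptibility dataset (an assumption):
    # two informative features plus three pure-noise features.
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(5)]
        p = sigmoid(1.5 * x[0] - 1.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

def run_experiment(train_sizes=(400, 100), n_test=300, lam=0.05, seed=0):
    # Train each model variant at each training size and score it on a
    # shared held-out set. Returns {(size, penalty): (auc, weights)}.
    rng = random.Random(seed)
    X_test, y_test = make_data(n_test, rng)
    results = {}
    for n in train_sizes:
        X_tr, y_tr = make_data(n, rng)
        for penalty in ("none", "l1", "l2"):
            w, b = fit_lr(X_tr, y_tr, penalty=penalty, lam=lam)
            score = auc(y_test, predict_proba(X_test, w, b))
            results[(n, penalty)] = (score, w)
    return results

if __name__ == "__main__":
    for (n, penalty), (score, w) in run_experiment().items():
        print(n, penalty, round(score, 3), [round(v, 2) for v in w])
```

Printing the fitted weights alongside the AUC shows which coefficients each penalty shrinks and whether the lasso variant sets any of the noise features to exactly zero. When the inputs carry little redundancy, the three variants would be expected to score similarly, which is the pattern the abstract reports.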

Published

2024-02-26

How to Cite

Gao, H., Fam, P. S., Tay, L. T., & Low, H. C. (2024). Logistic regression techniques based on different sample sizes in landslide susceptibility assessment: which performs better? COMPUSOFT: An International Journal of Advanced Computer Technology, 9(04), 3624–3628. Retrieved from https://ijact.in/index.php/j/article/view/562

Section

Review Article