model selection
-
The spatial distribution of sponge species richness (SSR) and its relationship with the environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data by addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) effects of model averaging are method-dependent. This study depicted the non-linear relationships of SSR and predictors, generated a spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features. <b>Citation:</b> Jin Li, Belinda Alvarez, Justy Siwabessy, Maggie Tran, Zhi Huang, Rachel Przeslawski, Lynda Radke, Floyd Howard, Scott Nichol, Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness, <i>Environmental Modelling & Software</i>, Volume 97, 2017, Pages 112-129, https://doi.org/10.1016/j.envsoft.2017.07.016
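For readers unfamiliar with AIC, which the abstract above mentions as one of the model selection criteria for GLMs, the following is a minimal sketch of how it compares two Poisson models of count data. It illustrates the criterion only, not the study's joint RF and AIC procedure. The counts, the hard/soft seabed labels, and all function names are hypothetical and chosen purely for illustration.

```python
import math

def poisson_log_likelihood(counts, means):
    """Sum of Poisson log-probabilities log P(y | mu) over all observations."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(counts, means))

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood

# Toy species counts (hypothetical, NOT data from the study) with a seabed label.
counts = [12, 15, 9, 14, 2, 3, 1, 4]
seabed = ["hard", "hard", "hard", "hard", "soft", "soft", "soft", "soft"]

# Model A: intercept only. The Poisson maximum-likelihood mean is the overall mean.
overall_mean = sum(counts) / len(counts)
fitted_a = [overall_mean] * len(counts)

# Model B: one categorical predictor. The maximum-likelihood mean is the group mean.
group_means = {}
for label in set(seabed):
    values = [y for y, s in zip(counts, seabed) if s == label]
    group_means[label] = sum(values) / len(values)
fitted_b = [group_means[s] for s in seabed]

aic_a = aic(poisson_log_likelihood(counts, fitted_a), n_params=1)
aic_b = aic(poisson_log_likelihood(counts, fitted_b), n_params=2)
best = "B" if aic_b < aic_a else "A"
print(f"AIC A = {aic_a:.2f}, AIC B = {aic_b:.2f}, preferred by AIC: {best}")
```

AIC rewards goodness of fit on the training data with a penalty for parameter count; it does not measure predictive accuracy on held-out data directly, which is consistent with the abstract's caution about relying on traditional model selection alone for predictive models.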
-
Seabed sediment predictions at regional and national scales in Australia are mainly based on bathymetry-related variables due to the lack of backscatter-derived data. In this study, we applied random forests (RFs), hybrid methods of RF and geostatistics, and generalized boosted regression modelling (GBM), to seabed sand content point data and acoustic multibeam data and their derived variables, to develop an accurate model to predict seabed sand content at a local scale. We also addressed relevant issues with variable selection. It was found that: (1) backscatter-related variables are more important than bathymetry-related variables for sand predictive modelling; (2) the inclusion of highly correlated predictors can improve predictive accuracy; (3) the rank orders of averaged variable importance (AVI) and accuracy contribution change with input predictors for RF and are not necessarily matched; (4) a knowledge-informed AVI method (KIAVI2) is recommended for RF; (5) the hybrid methods and their averaging can significantly improve predictive accuracy and are recommended; (6) relationships between sand and predictors are non-linear; and (7) variable selection methods for GBM need further study. Accuracy-improved predictions of sand content are generated at high resolution, which provide important baseline information for environmental management and conservation. <b>Citation:</b> Li, J.; Siwabessy, J.; Huang, Z.; Nichol, S. Developing an Optimal Spatial Predictive Model for Seabed Sand Content Using Machine Learning, Geostatistics, and Their Hybrid Methods. <i>Geosciences</i> 2019, 9, 180. https://doi.org/10.3390/geosciences9040180
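The abstract above does not spell out how the hybrid methods combine RF with geostatistics. One common construction, assumed here, is to predict a trend with the machine learning model, interpolate the model's residuals spatially, and add the two; averaging then combines the predictions of several such hybrids. The sketch below uses inverse distance weighting (IDW) for the residuals and a simple linear function as a stand-in for a fitted RF. All names, coordinates, and values are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted average of `values` at location `target`."""
    weight_sum, weighted_sum = 0.0, 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a sample location
        w = 1.0 / d ** power
        weight_sum += w
        weighted_sum += w * v
    return weighted_sum / weight_sum

def hybrid_predict(trend, points, covariates, observed,
                   target, target_cov, power=2.0):
    """Trend prediction plus the IDW-interpolated residual at the target."""
    residuals = [obs - trend(c) for obs, c in zip(observed, covariates)]
    return trend(target_cov) + idw(points, residuals, target, power)

# Hypothetical stand-in for a fitted RF: any callable covariate -> prediction.
def trend(backscatter):
    return 50.0 + 2.0 * backscatter

# Toy samples (NOT data from the study): locations, one covariate, sand content (%).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
covariates = [-10.0, -12.0, -8.0, -11.0]
observed = [33.0, 27.0, 36.0, 30.0]

target, target_cov = (0.25, 0.5), -10.0
hybrid_p1 = hybrid_predict(trend, points, covariates, observed,
                           target, target_cov, power=1.0)
hybrid_p2 = hybrid_predict(trend, points, covariates, observed,
                           target, target_cov, power=2.0)

# Averaging two hybrid variants; here they differ only in the IDW power.
averaged = (hybrid_p1 + hybrid_p2) / 2.0
print(hybrid_p1, hybrid_p2, averaged)
```

In this sketch the residual correction pulls the trend prediction toward nearby observations, so at a sampled location the hybrid reproduces the observed value exactly. A kriging step could replace IDW for the residuals without changing the overall structure.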