feature selection
-
Spatial predictive methods are increasingly used to generate predictions across various disciplines in the environmental sciences. The accuracy of these predictions is critical because they form the basis for environmental management and conservation. Therefore, improving accuracy by selecting an appropriate method and then developing the most accurate predictive model(s) is essential. However, selecting an appropriate method and finding the most accurate predictive model for a given dataset is challenging because of the many aspects and factors involved in the modeling process. Many previous studies considered only some of these aspects and factors, often leading to sub-optimal or even misleading predictive models. This study evaluates the spatial predictive modeling process and identifies its nine major components. Each of these nine components is then reviewed, and guidelines are provided for selecting and applying the relevant components and for developing accurate predictive models. Finally, reproducible examples using spm, an R package, demonstrate how to select and develop predictive models with machine learning, geostatistical, and hybrid methods according to their predictive accuracy, and how to generate and visualize spatial predictions in the environmental sciences. <b>Citation:</b> Li, J. A Critical Review of Spatial Predictive Modeling Process in Environmental Sciences with Reproducible Examples in R. Appl. Sci. 2019, 9, 2048. https://doi.org/10.3390/app9102048
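The central idea above, choosing among methods according to predictive accuracy, can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the spm package's API: it scores three simple spatial predictors (a global mean, nearest neighbour and inverse distance weighting) by k-fold cross-validation and keeps the one with the highest VEcv-style score, computed here as (1 - SSE/SST) x 100 over the pooled out-of-fold predictions. The function names, the synthetic data and the choice of k = 10 are all hypothetical.

```python
import math
import random

def idw_predict(train, x, y, power=2.0):
    """Inverse distance weighted prediction at (x, y) from (x, y, z) training tuples."""
    num = den = 0.0
    for tx, ty, tz in train:
        d = math.hypot(tx - x, ty - y)
        if d == 0.0:
            return tz  # prediction location coincides with a sample
        w = 1.0 / d ** power
        num += w * tz
        den += w
    return num / den

def nn_predict(train, x, y):
    """Nearest-neighbour prediction: value of the closest training point."""
    return min(train, key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]

def mean_predict(train, x, y):
    """Baseline that ignores location and predicts the training mean."""
    return sum(p[2] for p in train) / len(train)

def vecv(obs, pred):
    """Variance explained by cross-validation, in percent (assumed definition)."""
    m = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - m) ** 2 for o in obs)
    return (1.0 - sse / sst) * 100.0

def cv_vecv(data, method, k=10, seed=1):
    """k-fold cross-validation; returns the VEcv of pooled out-of-fold predictions."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    obs, pred = [], []
    for fold in folds:
        held = set(fold)
        train = [data[i] for i in idx if i not in held]
        for i in fold:
            x, y, z = data[i]
            obs.append(z)
            pred.append(method(train, x, y))
    return vecv(obs, pred)

# Hypothetical synthetic data: a smooth spatial trend plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x, y = rng.uniform(0, 10), rng.uniform(0, 10)
    data.append((x, y, math.sin(x) + 0.5 * y + rng.gauss(0, 0.2)))

methods = {"mean": mean_predict, "nearest": nn_predict, "idw": idw_predict}
scores = {name: cv_vecv(data, m) for name, m in methods.items()}
best = max(scores, key=scores.get)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} VEcv = {s:6.2f}%")
print("selected:", best)
```

The same loop extends to any candidate that maps training data and a location to a prediction, which is what makes accuracy-based selection method-agnostic; in practice one would also repeat the cross-validation with different seeds to see how stable the ranking is.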
-
The spatial distribution of sponge species richness (SSR) and its relationship with the environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence, we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data, addressing relevant issues in variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models, and the joint application of RF and AIC can select models with improved accuracy; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) the effects of model averaging are method-dependent. This study depicted the non-linear relationships between SSR and the predictors, generated the spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features. <b>Citation:</b> Jin Li, Belinda Alvarez, Justy Siwabessy, Maggie Tran, Zhi Huang, Rachel Przeslawski, Lynda Radke, Floyd Howard, Scott Nichol, Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness, <i>Environmental Modelling & Software</i>, Volume 97, 2017, Pages 112-129, https://doi.org/10.1016/j.envsoft.2017.07.016
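The hybrid idea in this abstract, a trend model whose residuals are interpolated spatially and added back to its predictions, can be sketched briefly. This is a minimal Python illustration with explicit substitutions, not the authors' implementation: ordinary least squares on a single covariate stands in for RF or GLM, and k-nearest-neighbour inverse distance weighting stands in for the geostatistical (kriging) step. The data, the function names and the parameters (k = 8, power = 2) are hypothetical, and the response is treated as continuous rather than as counts.

```python
import math
import random

def fit_linear(cs, zs):
    """Ordinary least squares for z = a + b * c (a stand-in for RF or GLM)."""
    n = len(cs)
    mc, mz = sum(cs) / n, sum(zs) / n
    scc = sum((c - mc) ** 2 for c in cs)
    scz = sum((c - mc) * (z - mz) for c, z in zip(cs, zs))
    b = scz / scc
    return mz - b * mc, b

def idw(points, x, y, k=8, power=2.0):
    """IDW over the k nearest (x, y, value) points (a stand-in for kriging)."""
    near = sorted(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))[:k]
    num = den = 0.0
    for px, py, v in near:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return v  # prediction location coincides with a sample
        w = d ** -power
        num += w * v
        den += w
    return num / den

def fit_hybrid(samples):
    """samples: (x, y, covariate, response) tuples; returns (trend, hybrid) predictors."""
    a, b = fit_linear([s[2] for s in samples], [s[3] for s in samples])
    # Residuals hold the spatial structure that the trend model missed.
    resid = [(x, y, z - (a + b * c)) for x, y, c, z in samples]

    def trend(x, y, c):
        return a + b * c

    def hybrid(x, y, c):
        # Hybrid prediction = trend prediction + interpolated residual at (x, y).
        return trend(x, y, c) + idw(resid, x, y)

    return trend, hybrid

def simulate(rng, n):
    """Hypothetical data: linear covariate effect + smooth spatial field + noise."""
    out = []
    for _ in range(n):
        x, y, c = rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 1)
        z = 2.0 + 3.0 * c + math.sin(x) * math.cos(y) + rng.gauss(0, 0.3)
        out.append((x, y, c, z))
    return out

def mse(model, data):
    """Mean squared error of a predictor on (x, y, covariate, response) tuples."""
    return sum((z - model(x, y, c)) ** 2 for x, y, c, z in data) / len(data)

rng = random.Random(7)
train, test = simulate(rng, 300), simulate(rng, 100)
trend, hybrid = fit_hybrid(train)
mse_trend, mse_hybrid = mse(trend, test), mse(hybrid, test)
print(f"trend only MSE: {mse_trend:.3f}")
print(f"hybrid     MSE: {mse_hybrid:.3f}")
```

When the response has spatial structure that the covariate cannot explain, adding the interpolated residuals should lower the held-out error relative to the trend alone; on data without such structure the gain would be expected to disappear.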