machine learning - ML
-
The accuracy of spatially continuous environmental data, usually generated from point samples using spatial prediction methods, is crucial for evidence-informed environmental management and conservation. Improving accuracy by identifying the most accurate methods is essential but challenging, because accuracy is often data-specific and affected by multiple factors. Recently developed hybrid methods that combine machine learning and geostatistics have shown advantages in spatial predictive modelling in environmental sciences and have significantly improved predictive accuracy. An R package, ‘spm: Spatial Predictive Modelling’, has been developed to introduce these methods and was recently released for R users. It not only introduces hybrid methods for improving predictive accuracy but can also be used to improve modelling efficiency. This presentation will briefly introduce the developmental history of the novel hybrid geostatistical and machine learning methods in spm. It will introduce spm by covering: 1) spatial predictive methods, 2) new hybrid methods of geostatistical and machine learning methods, 3) assessment of predictive accuracy, 4) applications of spatial predictive models, and 5) relevant functions in spm. It will then demonstrate how to apply some functions in spm to relevant datasets and show the resultant improvement in predictive accuracy and modelling efficiency. Although spm is applied to environmental science data in this presentation, it can be applied to data in other relevant disciplines. Presentation at the 2018 useR! conference.
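The general idea behind a hybrid of a predictive model and a geostatistical or spatial interpolation method can be illustrated with a minimal Python sketch. This is not the spm API, and it is not the method as implemented in the package: the trend model here is a plain least-squares line standing in for a machine learning model such as random forest, the interpolator is inverse distance weighting (IDW), and all function names and data values (`fit_line`, `idw`, `hybrid_predict`, the sample coordinates) are hypothetical and chosen only for illustration. The sketch assumes the hybrid prediction is the trend prediction plus the spatially interpolated residuals.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares line y = a + b*x.

    Stand-in for a machine learning model (an assumption for illustration).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def idw(coords, values, target, power=2.0):
    """Inverse distance weighted average of `values` at `target`."""
    num = 0.0
    den = 0.0
    for (cx, cy), v in zip(coords, values):
        d = math.hypot(cx - target[0], cy - target[1])
        if d == 0.0:
            return v  # target coincides with a sample location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def hybrid_predict(coords, predictor, response, target, target_predictor, power=2.0):
    """Trend prediction plus IDW-interpolated residuals of the trend model."""
    a, b = fit_line(predictor, response)
    residuals = [yi - (a + b * xi) for xi, yi in zip(predictor, response)]
    trend = a + b * target_predictor
    return trend + idw(coords, residuals, target, power)

# Hypothetical point samples: locations, one predictor, one response.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
predictor = [10.0, 20.0, 30.0, 40.0]
response = [3.0, 5.0, 4.0, 8.0]

prediction = hybrid_predict(coords, predictor, response, (0.25, 0.75), 28.0)
print(prediction)
```

At a sampled location the IDW step returns that sample's residual exactly, so the hybrid prediction reproduces the observed value; away from the samples, the residual correction pulls the trend prediction towards nearby observations.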
-
The spatial distribution of sponge species richness (SSR) and its relationship with the environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence, we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data, addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models, and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) the effects of model averaging are method-dependent. This study depicted the non-linear relationships between SSR and predictors, generated the spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features. <b>Citation:</b> Jin Li, Belinda Alvarez, Justy Siwabessy, Maggie Tran, Zhi Huang, Rachel Przeslawski, Lynda Radke, Floyd Howard, Scott Nichol, Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness, <i>Environmental Modelling & Software</i>, Volume 97, 2017, Pages 112-129, https://doi.org/10.1016/j.envsoft.2017.07.016