Authors / CoAuthors
Li, J. | Alvarez de Glasby, B. | Siwabessy, J. | Tran, M. | Huang, Z.
Abstract
The spatial distribution of sponge species richness and its relationship with environmental variables are important for the informed monitoring of ecosystem health and for marine environmental management and conservation within the Oceanic Shoals Commonwealth Marine Reserve in the Timor Sea region, northern Australia. However, spatially continuous data on sponge species richness are not readily available, and the relationship is largely unknown. In this study, we modelled sponge species richness data from 77 samples using random forest (RF), generalised linear model (glm) and their hybrid methods with geostatistical techniques (i.e. ordinary kriging (OK) and inverse distance weighting (IDW)), based on seabed biophysical variables. These methods are RF, RFOK, RFIDW, glm, glmok and glmidw, which is a new hybrid method. We also examined the effects of model averaging using four averaged methods (RFOKRFIDW, RFRFOKRFIDW, glmokglmidw and glmglmokglmidw) and the effects of various predictor sets on the accuracy of predictive models. Four feature selection methods were used for RF: 1) averaged variable importance (AVI), 2) Boruta, 3) knowledge informed AVI (KIAVI) and 4) recursive feature selection (rfe); and four variable selection methods were employed to select glm predictive models: 1) stepAIC, 2) dropterm, 3) anova and 4) RF. Predictive models were validated using 10-fold cross validation. Finally, the spatial distribution of sponge richness was predicted using the most accurate model and examined.
The main findings are: 1) the initial input predictors affect the status of important and unimportant variables; 2) AVI is not always reliable, and KIAVI is recommended for selecting RF predictive models; 3) Boruta can improve accuracy in comparison with the full model but may lead to sub-optimal models, and features selected using rfe are not optimal and can even be misleading; 4) the accuracy of glm predictive models did not align with AIC, deviance explained (%) and deviance explained adjusted (%), suggesting that conventional model selection approaches for glm are unable to identify reliable predictive models; 5) the joint application of RF and AIC is a useful model selection approach for developing glm predictive models; 6) goodness of fit should not be used to assess glm predictive models; 7) the hybrid methods significantly improved the predictive accuracy of both RF and glm, and the hybrid methods of RF and geostatistical methods are considerably more accurate and able to model count data effectively; and 8) the relationships of sponge species richness with the predictors are non-linear, and high sponge species richness is usually associated with hard seabed features. This study further confirms that: 1) the initial input predictors affect model selection for RF; 2) the inclusion of highly correlated predictors could improve predictive accuracy, providing an important guideline for pre-selecting predictors for RF; and 3) the effects of model averaging are method dependent or even data dependent. This study also provides important information for future monitoring design, particularly regarding the areas where the management and conservation of sponge gardens should be focused.
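The hybrid and averaged methods named in the abstract can be illustrated with a minimal sketch. It assumes that a hybrid such as RFIDW or glmidw adds IDW-interpolated residuals to the predictions of the trend model (RF or glm), and that an averaged method takes the plain mean of its component predictions. The record does not give the implementation, so the function names, the distance power of 2 and the toy numbers below are all hypothetical, and the trend predictions are passed in as plain numbers rather than produced by a fitted model.

```python
import math


def idw_interpolate(known_xy, known_values, target_xy, power=2.0):
    """Inverse distance weighted estimate at a single target location."""
    weighted = []
    for (x, y), value in zip(known_xy, known_values):
        dist = math.hypot(target_xy[0] - x, target_xy[1] - y)
        if dist == 0.0:
            # Target coincides with a sample: return its value directly.
            return value
        weighted.append((1.0 / dist ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight


def hybrid_predict(trend_at_samples, observed, sample_xy,
                   trend_at_targets, target_xy, power=2.0):
    """Trend prediction plus IDW-interpolated residuals (assumed hybrid form)."""
    # Residuals of the trend model (e.g. RF or glm) at the sampled sites.
    residuals = [obs - trend for obs, trend in zip(observed, trend_at_samples)]
    return [
        trend + idw_interpolate(sample_xy, residuals, xy, power)
        for trend, xy in zip(trend_at_targets, target_xy)
    ]


def average_predictions(*prediction_lists):
    """Unweighted mean of several methods' predictions at each location."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]


# Toy illustration with made-up numbers (not data from the study).
sample_xy = [(0.0, 0.0), (2.0, 0.0)]
observed = [10.0, 20.0]
trend_at_samples = [8.0, 21.0]   # hypothetical trend-model fits at the samples
trend_at_targets = [15.0]        # hypothetical trend-model prediction at the target
target_xy = [(1.0, 0.0)]

hybrid = hybrid_predict(trend_at_samples, observed, sample_xy,
                        trend_at_targets, target_xy)
averaged = average_predictions(trend_at_targets, hybrid)
```

In this sketch the residual step corrects the trend model locally using nearby samples. Because the correction is additive, predictions for count data may need to be checked for negative values in practice.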
Product Type
nonGeographicDataset
eCat Id
89938
Contact for the resource
Custodian
Point of contact
Cnr Jerrabomberra Ave and Hindmarsh Dr GPO Box 378
Canberra
ACT
2601
Australia
Keywords
- Educational Product
- Australian and New Zealand Standard Research Classification (ANZSRC)
- Earth Sciences
- Published_Internal
Publication Date
2016-01-01T00:00:00
Creation Date
Security Constraints
Legal Constraints
Status
Purpose
Maintenance Information
unknown
Topic Category
geoscientificInformation
Series Information
Lineage
Unknown
Parent Information
Extents
Reference System
Spatial Resolution
Service Information
Associations
Downloads and Links
Source Information
Source data not available.