machine learning
-
Prediction of true classes of surficial and deep earth materials using multivariate geospatial data is a common challenge for geoscience modellers. Most geological processes leave a footprint that can be explored by geochemical data analysis. These footprints are normally complex statistical and spatial patterns buried deep in the high-dimensional compositional space. This paper proposes a spatial predictive model for classification of surficial and deep earth materials derived from the geochemical composition of surface regolith. The model is based on a combination of geostatistical simulation and machine learning approaches. A random forest predictive model is trained and features are ranked based on their contribution to the predictive model. To generate potential and uncertainty maps, compositional data are simulated at unsampled locations via a chain of transformations (isometric log-ratio transformation followed by flow anamorphosis) and geostatistical simulation. The simulated results are subsequently back-transformed to the original compositional space. The trained predictive model is used to estimate the probability of classes for simulated compositions. The proposed approach is illustrated through two case studies. In the first case study the major crustal blocks of the Australian continent are predicted from the surface regolith geochemistry of the National Geochemical Survey of Australia project. The aim of the second case study is to discover the superficial deposits (peat) from the regional-scale soil geochemical data of the Tellus project. The accuracy of the results in these two case studies confirms the usefulness of the proposed method for geological class prediction and geological process discovery.
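The isometric log-ratio (ilr) step named in this abstract can be illustrated with a minimal sketch. It assumes a Helmert-type orthonormal basis, which is only one of several valid choices; the abstract does not say which basis the authors used. The function names are hypothetical, and flow anamorphosis and the geostatistical simulation are not shown.

```python
import math

def closure(parts):
    """Rescale a composition so that its parts sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def ilr(parts):
    """Map a D-part composition to D-1 unconstrained ilr coordinates.

    Assumption: a Helmert-type orthonormal basis is used.
    """
    logs = [math.log(v) for v in closure(parts)]
    coords = []
    for i in range(1, len(logs)):
        mean_log_first_i = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1.0)) * (mean_log_first_i - logs[i]))
    return coords

def ilr_inverse(coords):
    """Back-transform ilr coordinates to a closed composition."""
    d = len(coords) + 1
    clr = [0.0] * d
    for i in range(1, d):
        scale = math.sqrt(i / (i + 1.0)) * coords[i - 1]
        for k in range(i):
            clr[k] += scale / i   # first i parts share the positive weight
        clr[i] -= scale           # part i carries the negative weight
    return closure([math.exp(v) for v in clr])

composition = [0.6, 0.3, 0.1]          # illustrative three-part composition
coordinates = ilr(composition)          # two real-valued coordinates
recovered = ilr_inverse(coordinates)    # back in compositional space
```

Simulated values in ilr space can take any real value, and the back-transform always returns positive parts that sum to 1, which is why the simulation is carried out in the transformed space.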
-
<div>A national compilation of airborne electromagnetic (AEM) conductivity–depth models from AusAEM (Ley-Cooper et al. 2020) survey line data and other surveys (see reference list in the attachments) has been used to train a conductivity prediction model for the 0-4 m and 30 m depth intervals. Over 460,000 training points/measurements were used in a 5-fold (K-fold) training and validation split. A further 28,626 points/measurements were used to assess the out-of-sample (OOS) performance (i.e. points not used in the model validation). Modelling of the conductivity values (i.e. measurements along the AEM survey lines) was performed using the gradient boosted (GB) tree algorithm. The GB model is a machine learning (ML) ensemble technique used for both regression and classification tasks (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html). Samples along the flight lines were thinned to approximately one sample per 300 m. This avoided having more than one sample per pixel (the features or covariates used in the model prediction have a cell or pixel size of 80 m), which could otherwise lead to overfitting. In addition, the out-of-sample set was defined using labelled clusters or groups to minimise overfitting. Here we use the median of the models as the conductivity prediction and the upper and lower percentiles (95th and 5th, respectively) to measure the model uncertainty. Grids show conductivity (S/m) in log10 units. The methodology used to generate these conductivity grids is broadly similar to that described by Wilford et al. (2022).</div><div> </div><div>Reported out-of-sample R-squared values for the 0-4 m and 30 m depths are 0.76 and 0.74, respectively. The ML approach allows estimation of conductivity in areas where we do not have airborne electromagnetic survey coverage. Hence these models have a national extent. 
Where we do not have AEM survey coverage the model is finding relationships with the covariates and making informed estimates of conductivity in those areas. Where those relationships are not well understood (i.e. where we see a departure in the feature space characteristics from what the model can ‘see’) the model prediction is likely to be less certain. Differences in the features and their corresponding values ‘seen’ and used in the model versus the full feature space covering the entire continent are captured in the covariate shift map. High values in the shift model can indicate higher potential uncertainty or unreliability of the model prediction. Users therefore need to be mindful, when interpreting this dataset, of the uncertainties shown by the 5th-95th percentiles, and of high values in the covariate shift map.</div><div> </div><div>Datasets in this data package include:</div><div> </div><div>1. 0_4m_conductivity_prediction_median.tif</div><div>2. 0_4m_conductivity_lower_percentile_5th.tif</div><div>3. 0_4m_conductivity_upper_percentile_95th.tif</div><div>4. 30m_conductivity_prediction_median.tif</div><div>5. 30m_conductivity_lower_percentile_5th.tif</div><div>6. 30m_conductivity_upper_percentile_95th.tif</div><div>7. National_conductivity_model_shift.tif</div><div>8. Full list of referenced AEM survey datasets used to train the model (Word document)</div><div>9. Map showing the distribution of training and out-of-sample sites</div><div><br></div><div>All the GeoTIFFs (1-6) give electrical conductivity in log10 siemens per metre (S/m).</div><div> </div><div>This work is part of Geoscience Australia’s Exploring for the Future program which provides precompetitive information to inform decision-making by government, community and industry on the sustainable development of Australia's mineral, energy and groundwater resources. 
By gathering, analysing and interpreting new and existing precompetitive geoscience data and knowledge, we are building a national picture of Australia’s geology and resource potential. This leads to a strong economy, resilient society and sustainable environment for the benefit of all Australians. This includes supporting Australia’s transition to net zero emissions, strong, sustainable resources and agriculture sectors, and economic opportunities and social benefits for Australia’s regional and remote communities. The Exploring for the Future program, which commenced in 2016, is an eight-year, $225m investment by the Australian Government.</div><div><br></div><div><br></div><div><strong>References:</strong></div><div><br></div><div>Ley-Cooper, A. Y., Brodie, R.C., and Richardson, M. 2020. AusAEM: Australia’s airborne electromagnetic continental-scale acquisition program, Exploration Geophysics, 51:1, 193-202, DOI: 10.1080/08123985.2019.1694393</div><div><br></div><div>Wilford, J., Ley-Cooper, Y., Basak, S., Czarnota, K. 2022. High resolution conductivity mapping using regional AEM survey and machine learning. Geoscience Australia, Canberra. https://dx.doi.org/10.26186/146380</div>
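The flight-line thinning described in this record (roughly one sample per 300 m, so that an 80 m pixel does not receive several training samples) can be sketched as follows. This is a minimal illustration, not the procedure used for the dataset: the function name, the tuple layout, and the 12.5 m sample spacing in the example are assumptions, and the record does not say how the thinning was implemented.

```python
import math

def thin_flight_line(points, min_spacing_m=300.0):
    """Keep a sample only if it lies at least min_spacing_m from the last kept one.

    points: list of (easting_m, northing_m, value) tuples ordered along the line
    (hypothetical layout, projected coordinates in metres).
    """
    kept = []
    for point in points:
        if not kept:
            kept.append(point)
            continue
        last = kept[-1]
        distance = math.hypot(point[0] - last[0], point[1] - last[1])
        if distance >= min_spacing_m:
            kept.append(point)
    return kept

# Illustrative line: one sample every 12.5 m over 1 km, heading east.
line = [(i * 12.5, 0.0, -2.0) for i in range(81)]
thinned = thin_flight_line(line)
```

With a 300 m spacing and 80 m covariate pixels, consecutive retained samples fall in different pixels.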
-
Major oxides provide valuable information about the composition, origin, and properties of rocks and regolith. Analysing major oxides contributes significantly to understanding the nature of geological materials and processes (e.g. physical and chemical weathering) – with potential applications in resource exploration, engineering, environmental assessments, agriculture, and other fields. Traditionally most measurements of oxide concentrations are obtained by laboratory assay, often using X-ray fluorescence, on rock or regolith samples. To expand beyond the point measurements of the geochemical data, we have used a machine learning approach to produce seamless national scale grids for each of the major oxides. This approach builds predictive models by learning relationships between the site measurements of an oxide concentration (sourced from Geoscience Australia’s OZCHEM database and selected sites from state survey databases) and a comprehensive library of covariates (features). These covariates include: terrain derivatives; climate surfaces; geological maps; gamma-ray radiometric, magnetic, and gravity grids; and satellite imagery. This approach is used to derive national predictions for 10 major oxide concentrations at the resolution of the covariates (nominally 80 m). The models include the oxides of silicon (SiO2), aluminium (Al2O3), iron (Fe2O3tot), calcium (CaO), magnesium (MgO), manganese (MnO), potassium (K2O), sodium (Na2O), titanium (TiO2), and phosphorus (P2O5). The grids of oxide concentrations provided include the median of multiple model runs as the prediction, and lower and upper (5th and 95th) percentiles as measures of the prediction’s uncertainty. Higher uncertainties correlate with greater spreads of model values. Differences in the features used in the model compared with the full feature space covering the entire continent are captured in the ‘covariate shift’ map. 
High values in the shift model can indicate higher potential uncertainty or unreliability of the model prediction. Users therefore need to be mindful, when interpreting this dataset, of the uncertainties shown by the 5th-95th percentiles, and of high values in the covariate shift map. Details of the modelling approach, model uncertainties and datasets are described in an attached Word document “Model approach uncertainties”. This work is part of Geoscience Australia’s Exploring for the Future program that provides precompetitive information to inform decision-making by government, community and industry on the sustainable development of Australia's mineral, energy and groundwater resources. By gathering, analysing and interpreting new and existing precompetitive geoscience data and knowledge, we are building a national picture of Australia’s geology and resource potential. This leads to a strong economy, resilient society and sustainable environment for the benefit of all Australians. This includes supporting Australia’s transition to net zero emissions, strong, sustainable resources and agriculture sectors, and economic opportunities and social benefits for Australia’s regional and remote communities. The Exploring for the Future program, which commenced in 2016, is an eight-year, $225m investment by the Australian Government. These data are published with the permission of the CEO, Geoscience Australia.
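The median and 5th/95th percentile grids described in this record summarise, for each pixel, the spread of predictions from multiple model runs. A minimal per-pixel sketch is given below. The function names and the linear-interpolation percentile rule are assumptions, since the record does not state which percentile definition was used.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (assumed rule)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction

def summarise_pixel(model_predictions):
    """Reduce the predictions of many model runs at one pixel to three grid values."""
    return {
        "lower_5th": percentile(model_predictions, 5),
        "median": percentile(model_predictions, 50),
        "upper_95th": percentile(model_predictions, 95),
    }

# Illustrative values only: 101 hypothetical model runs at a single pixel.
runs = [float(v) for v in range(101)]
summary = summarise_pixel(runs)
spread = summary["upper_95th"] - summary["lower_5th"]
```

A wide spread between the two percentile grids at a pixel corresponds to the higher uncertainty the record asks users to keep in mind.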
-
<div>Disruptions to the global supply chains of critical raw materials (CRM) have the potential to delay or increase the cost of the renewable energy transition. However, for some CRM, the primary drivers of these supply chain disruptions are likely to be issues related to environmental, social, and governance (ESG) rather than geological scarcity. Herein we combine public geospatial data as mappable proxies for key ESG indicators (e.g., conservation, biodiversity, freshwater, energy, waste, land use, human development, health and safety, and governance) and a global dataset of news events to train and validate three models for predicting “conflict” events (e.g., disputes, protests, violence) that can negatively impact CRM supply chains: (1) a knowledge-driven fuzzy logic model that yields an area under the curve (AUC) for the receiver operating characteristics plot of 0.72 for the entire model; (2) a naïve Bayes model that yields an AUC of 0.81 for the test set; and (3) a deep learning model comprising stacked autoencoders and a feed-forward artificial neural network that yields an AUC of 0.91 for the test set. The high AUC of the deep learning model demonstrates that public geospatial data can accurately predict natural resources conflicts, but we show that machine learning results are biased by proxies for population density and likely underestimate the potential for conflict in remote areas. Knowledge-driven methods are the least impacted by population bias and are used to calculate an ESG rating that is then applied to a global dataset of lithium occurrences as a case study. We demonstrate that giant lithium brine deposits (i.e., >10 Mt Li2O) are restricted to regions with higher spatially situated risks relative to a subset of smaller pegmatite-hosted deposits that yield higher ESG ratings (i.e., lower risk). Our results reveal trade-offs between the sources of lithium, resource size, and spatially situated risks. 
We suggest that this type of geospatial ESG rating is broadly applicable to other CRM and that mapping spatially situated risks prior to mineral exploration has the potential to improve ESG outcomes and government policies that strengthen supply chains. <b>Citation:</b> Haynes M, Chudasama B, Goodenough K, Eerola T, Golev A, Zhang SE, Park J and Lèbre E (2024) Geospatial Data and Deep Learning Expose ESG Risks to Critical Raw Materials Supply: The Case of Lithium. <i>Earth Sci. Syst. Soc. </i>4:10109. doi: 10.3389/esss.2024.10109
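The three models in this abstract are compared by the area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. The function name and the toy data are illustrative and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = conflict event recorded, 0 = no event; scores are model outputs.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.72, 0.81 and 0.91 should be read.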
-
The Proterozoic succession in the NDI Carrara 1 drill hole, Northern Territory, consists predominantly of tight shales, siltstones, and calcareous clastic rocks. As part of Geoscience Australia’s Exploring for the Future program, this study aims to derive porosity, permeability and gas content from both laboratory testing and well log interpretation using machine learning approaches, to improve the Proterozoic shale gas reservoir characterisation. The Proterozoic Lawn Hill Formation was divided into four chemostratigraphic packages. The middle two packages were further divided into seven internal units according to principal component analysis and self-organising map clustering on well logs and inorganic geochemical properties. Artificial neural networks were then applied to interpret the mineral compositions, porosity and permeability from well logs, density and neutron-density crossplot interpretations. Gas content was estimated from the interpreted porosity, gas saturation, total organic carbon and clay contents. Petrophysical interpretation results are summarised for all chemostratigraphic packages and units. Package 2 (1116–1430.1 m) has the highest potential among the four chemostratigraphic packages. The P2U1 (1116–1271 m) and P2U3 (1335.5–1430.1 m) units have the most favourable petrophysical properties for organic-rich shales, with average total gas contents of 1.25 cm3/g and 1.30 cm3/g, geometric mean permeabilities of 4.79 µD and 17.56 µD, and net shale thicknesses of 54.4 m and 85.3 m, respectively. The P3U4 unit (687.9–697.9 m) has high gas content and permeability, with a net shale thickness of 29.1 m. Besides the organic-rich shales, the tight non-organic-rich siltstone and shale reservoirs in package 1 (below 1430.1 m) have an average gas saturation of 14% and a geometric mean permeability of 1.31 µD. Published in The APPEA Journal 2023. <b>Citation:</b> Wang Liuqi, Bailey Adam H. 
E., Grosjean Emmanuelle, Carson Chris, Carr Lidena K., Butcher Grace, Boreham Christopher J., Dewhurst Dave, Esteban Lionel, Southby Chris, Henson Paul A. (2023) Petrophysical interpretation and reservoir characterisation on Proterozoic shales in National Drilling Initiative Carrara 1, Northern Territory. <i>The APPEA Journal</i><b> 63</b>, 230-246. https://doi.org/10.1071/AJ22049
-
In the first half of 2019, a collaborative mineral potential mapping project was undertaken between the Geological Survey of New South Wales (GSNSW) and Kenex to examine the mineral potential in the eastern Lachlan Orogen (ELO; Ford et al., 2019b). This project was part of a broader state-wide study that utilised the high quality publicly available geoscience data provided by the GSNSW to generate data-driven mineral potential maps using the weights of evidence (WofE) technique for different mineral systems in key metallogenic districts within NSW (Ford et al., 2019a). The aim of this collaborative project was to deliver a product that could be used to provide justifiable land use planning advice to key government stakeholders, as well as to highlight the exploration potential for key mineral systems at a regional scale. One key mineral system that was included in the 2019 ELO study was the porphyry Cu-Au mineral system, which was constrained to the Macquarie Arc. The results of the WofE mineral potential mapping for this porphyry model were broadly successful in terms of predicting the location of both the training data used in the WofE model and a separate set of validation porphyry Cu-Au occurrences. However, the model failed to predict the location of one of the training points, Kaiser, in the prospective area. This failure to predict Kaiser led to a re-evaluation of the data using a variety of different machine learning techniques, in particular random forests (RF; Ford, 2020) and neural networks (NN). No additional or updated data were incorporated, and the maps used in the machine learning were the same maps made as part of the initial WofE study in 2019. The results show that input maps that have been pre-classified to determine optimal thresholds outperform input maps that have had no favourability criteria applied when typical benchmarks for exploration targeting are considered. 
In addition, the NN analysis shows strong evidence of overfitting to the training data when a large number of input maps are used. A moderate degree of success for targeting under cover was achieved when only geophysical maps were included in the models. Abstract presented at the 8th Mines & Wines Conference 2022 (https://www.aig.org.au/events/8th-mines-wines-conference-2022/)
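For readers unfamiliar with the weights of evidence (WofE) technique named in this abstract, the sketch below computes the standard positive weight, negative weight and contrast for one binary evidence map from cell counts. The function name, argument names and the counts are hypothetical; they are not taken from the GSNSW/Kenex study, and the sketch omits the variance estimates and map combination used in a full WofE workflow.

```python
import math

def weights_of_evidence(n_cells, n_deposits, n_pattern, n_pattern_deposits):
    """Return (W+, W-, contrast) for one binary evidence pattern.

    n_cells: unit cells in the study area
    n_deposits: cells containing a known occurrence
    n_pattern: cells where the evidence pattern is present
    n_pattern_deposits: occurrence cells that fall inside the pattern
    """
    # P(pattern present | deposit) and P(pattern present | no deposit)
    p_pattern_given_deposit = n_pattern_deposits / n_deposits
    p_pattern_given_barren = (n_pattern - n_pattern_deposits) / (n_cells - n_deposits)
    w_plus = math.log(p_pattern_given_deposit / p_pattern_given_barren)
    w_minus = math.log((1.0 - p_pattern_given_deposit) / (1.0 - p_pattern_given_barren))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts: 15 of 20 occurrences fall within a pattern covering 10% of cells.
w_plus, w_minus, contrast = weights_of_evidence(10000, 20, 1000, 15)
```

A positive W+ with a negative W- (large contrast) indicates that the pattern is spatially associated with the known occurrences. Choosing the threshold that maximises such an association is one common way of pre-classifying an input map.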
-
<div>With a higher demand for lithium (Li), a better understanding of its concentration and spatial distribution is important to delineate potential anomalous areas. This study uses a digital soil mapping framework to combine data from recent geochemical surveys and environmental covariates to predict and map Li content across the 7.6 million km2 area of Australia. Soil samples were collected by the National Geochemical Survey of Australia at a total of 1315 sites, with both top (0–10 cm depth) and bottom (on average 60–80 cm depth) catchment outlet sediments sampled. We developed 50 bootstrap models using a Cubist regression tree algorithm for both depths. The spatial prediction models were validated on an independent Northern Australia Geochemical Survey dataset, showing a good prediction with an RMSE of 3.82 mg kg-1 for the top depth. The model for the bottom depth has yet to be validated. The variables of importance for the models indicated that the first three Landsat bands and gamma radiometric dose have a strong impact on Li prediction. The bootstrapped models were then used to generate digital soil Li prediction maps for both depths, which can be used to identify and delineate areas with anomalously high Li concentrations in the regolith. The maps show high Li concentrations around existing mines and in other potentially anomalous Li areas. The same mapping principles can potentially be applied to other elements. </div> <b>Citation:</b> Ng, W., Minasny, B., McBratney, A., de Caritat, P., and Wilford, J.: Digital soil mapping of lithium in Australia, <i>Earth Syst. Sci. Data</i>, 15, 2465–2482, https://doi.org/10.5194/essd-15-2465-2023, <b>2023</b>.
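The bootstrap-and-validate workflow in this abstract (50 resampled models, averaged, then scored by RMSE on independent data) can be sketched in a few lines. Note the substitution: Cubist is not part of the Python standard library, so a one-variable least-squares line stands in for the regression tree model here. All function names and data are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line; a stand-in for the Cubist model (assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x

def bootstrap_models(xs, ys, n_models=50, seed=0):
    """Fit one model per bootstrap resample (sampling sites with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_mean(models, x):
    """Average the predictions of all bootstrap models at covariate value x."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Synthetic training data: one covariate and a noise-free linear response.
train_x = [float(i) for i in range(20)]
train_y = [2.0 * x + 1.0 for x in train_x]
models = bootstrap_models(train_x, train_y)

# Synthetic "independent" validation set, standing in for a separate survey.
valid_x = [2.5, 7.5, 12.5]
valid_y = [2.0 * x + 1.0 for x in valid_x]
error = rmse(valid_y, [predict_mean(models, x) for x in valid_x])
```

The spread of the individual model predictions at a location, not shown here, is what a bootstrap ensemble offers as an uncertainty estimate.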
-
This web service contains map layers and coverages for machine learning models, using raster datasets which include radiometric grid infill, cover depths and conductivity. All grids have been converted to cloud-optimised GeoTIFF (COG) format for use and delivery from a cloud-based object store (AWS S3). For potassium (K), thorium (Th) and uranium (U) radiometric infill grids, an equalised histogram was applied to each grid. The radiometric ternary image has no style applied, with transparency for no-data values. A tile service (WMTS) is also integrated into the WMS to provide a high-performing service for integration into web maps and online mapping portals.
-
The geosciences are a data-rich domain where Earth materials and processes are analysed from local to global scales. However, often we only have discrete measurements at specific locations, and a limited understanding of how these features vary across the landscape. Earth system processes are inherently complex, and trans-disciplinary science will likely become increasingly important in finding solutions to future challenges associated with the environment, mineral/petroleum resources and food security. Machine learning is an important approach to synthesise the increasing complexity and sheer volume of Earth science data, and is now widely used in prediction across many scientific disciplines. In this context, we have built a machine learning pipeline, called Uncover-ML, for both supervised and unsupervised learning, prediction and classification. The Uncover-ML pipeline was developed through a partnership between CSIRO and Geoscience Australia, and is largely built around the Python scikit-learn machine learning libraries. In this paper, we briefly describe the architecture and components of Uncover-ML for feature extraction, data scaling, sample selection, predictive mapping, estimating model performance, model optimisation and estimating model uncertainties. Links to download the source code and information on how to implement the algorithms are also provided. <b>Citation:</b> Wilford, J., Basak, S., Hassan, R., Moushall, B., McCalman, L., Steinberg, D. and Zhang, F., 2020. Uncover-ML: a machine learning pipeline for geoscience data analysis. In: Czarnota, K., Roach, I., Abbott, S., Haynes, M., Kositcin, N., Ray, A. and Slatter, E. (eds.) Exploring for the Future: Extended Abstracts, Geoscience Australia, Canberra, 1–4.
-
Improvements in discovery and management of minerals, energy and groundwater resources are spurred along by advancements in surface and subsurface imaging of the Earth. Over the last half decade Australia has led the world in the collection of regionally extensive airborne electromagnetic (AEM) data coverage, which provides new constraints on subsurface conductivity structure. Inferring geology and hydrology from conductivity is non-trivial as the conductivity response of earth materials is non-unique, but careful calibration and interpretation do provide significant insights into the subsurface. To date, the utility of these new data is limited by their spatial extent. The AusAEM survey provides conductivity constraints every 12.5 m along flight lines, with no constraints across the vast areas between flight lines spaced 20 km apart. Here we provide a means to infer the conductivity between flight lines as an interim measure before infill surveys can be undertaken. We use a gradient boosted tree machine learning algorithm to discover relationships between AEM conductivity models across northern Australia and other national data coverages for three depth ranges: 0–0.5 m, 9–11 m and 22–27 m. The predictive power of our models decreases with depth, but they are nevertheless consistent with our knowledge of geological, landscape evolution and climatic processes and are an improvement on standard interpolation methods such as kriging. Our models provide a novel complementary methodology to gridding/interpolating from AEM conductivity alone for use by the mining, energy and natural resource management sectors. <b>Citation: </b>Wilford J., Ley-Cooper Y., Basak S., & Czarnota K., 2022. High resolution conductivity mapping using regional AEM survey and machine learning. In: Czarnota, K. (ed.) Exploring for the Future: Extended Abstracts, Geoscience Australia, Canberra, https://dx.doi.org/10.26186/146380.