Uncover-ML: a machine learning pipeline for geoscience data analysis.
The geosciences are a data-rich domain where Earth materials and processes are analysed from local to global scales. However, we often have only discrete measurements at specific locations and a limited understanding of how these features vary across the landscape. Earth system processes are inherently complex, and trans-disciplinary science will likely become increasingly important in finding solutions to future challenges associated with the environment, mineral/petroleum resources and food security. Machine learning is an important approach for synthesising the increasing complexity and sheer volume of Earth science data, and is now widely used for prediction across many scientific disciplines. In this context, we have built a machine learning pipeline, called Uncover-ML, for both supervised and unsupervised learning, prediction and classification. The Uncover-ML pipeline was developed through a partnership between CSIRO and Geoscience Australia, and is largely built around the Python scikit-learn machine learning libraries. In this paper, we briefly describe the architecture and components of Uncover-ML for feature extraction, data scaling, sample selection, predictive mapping, estimating model performance, model optimisation and estimating model uncertainties. Links to download the source code and information on how to implement the algorithms are also provided.
Citation: Wilford, J., Basak, S., Hassan, R., Moushall, B., McCalman, L., Steinberg, D. and Zhang, F., 2020. Uncover-ML: a machine learning pipeline for geoscience data analysis. In: Czarnota, K., Roach, I., Abbott, S., Haynes, M., Kositcin, N., Ray, A. and Slatter, E. (eds.) Exploring for the Future: Extended Abstracts, Geoscience Australia, Canberra, 1–4.
Identification info
- Date (Creation)
- 2020-06-22
- Date (Publication)
- 2020-06-22T08:08:04
- Citation identifier
- Geoscience Australia Persistent Identifier/https://pid.geoscience.gov.au/dataset/ga/134466
- Citation identifier
- Digital Object Identifier/http://dx.doi.org/10.11636/134466
- Cited responsible party
-
Role      Organisation / Individual Name    Details
Author    Wilford, J.
Author    Basak, S.
Author    Hassan, R.
Author    Moushall, B.
Author    McCalman, L.
Author    Steinberg, D.
Author    Zhang, F.
- Purpose
-
EFTF Extended Abstract
- Status
- Completed
- Point of contact
-
Role                 Organisation / Individual Name                      Details
Resource provider    Minerals, Energy and Groundwater Division
Point of contact     Commonwealth of Australia (Geoscience Australia)    Voice
Point of contact     Du, Z.                                              MEG Internal Contact
- Spatial representation type
- Topic category
-
- Geoscientific information
Extent
- Maintenance and update frequency
- As needed
Resource format
- Title
-
Product data repository: Various Formats
- Website
-
Data Store directory containing the digital product files
Data Store directory containing one or more files, possibly in a variety of formats, accessible to Geoscience Australia staff only for internal purposes
- theme.ANZRC Fields of Research.rdf
-
-
EARTH SCIENCES
-
INFORMATION AND COMPUTING SCIENCES
-
- Theme
-
-
machine learning
-
- Project
-
-
EFTF
-
- Keywords
-
-
information and computer sciences
-
- Keywords
-
-
data analytics
-
- Keywords
-
-
Exploring for the Future
-
- Keywords
-
-
Toolbox
-
- Keywords
-
-
Published_External
-
Resource constraints
- Title
-
Creative Commons Attribution 4.0 International Licence
- Alternate title
-
CC-BY
- Edition
-
4.0
- Access constraints
- License
- Use constraints
- License
Resource constraints
- Title
-
Australian Government Security Classification System
- Edition date
- 2018-11-01T00:00:00
- Classification
- Unclassified
- Language
- English
- Character encoding
- UTF8
Distribution Information
- Distributor contact
-
Role           Organisation / Individual Name                      Details
Distributor    Commonwealth of Australia (Geoscience Australia)    Voice
- OnLine resource
-
Extended Abstract for download (pdf) [2.5MB]
- Distribution format
-
-
pdf
-
Resource lineage
- Statement
-
The Uncover-ML code was developed through a partnership between Data61 (CSIRO) and Geoscience Australia. A large proportion of the code draws on scikit-learn, the machine learning library for Python (https://scikit-learn.org/stable/).
Metadata constraints
- Title
-
Australian Government Security Classification System
- Edition date
- 2018-11-01T00:00:00
- Classification
- Unclassified
Metadata
- Metadata identifier
-
urn:uuid/d184c3e8-8bc5-4889-94c6-cc598ff3f951
- Title
-
GeoNetwork UUID
- Language
- English
- Character encoding
- UTF8
- Contact
-
Role                Organisation / Individual Name                      Details
Point of contact    Commonwealth of Australia (Geoscience Australia)    Voice
Point of contact    Du, Z.                                              MEG Internal Contact
Type of resource
- Resource scope
- Document
- Name
-
GA publication: Extended Abstract
Alternative metadata reference
- Title
-
Geoscience Australia - short identifier for metadata record with uuid
- Citation identifier
- eCatId/134466
- Date info (Creation)
- 2019-04-08T01:55:29
- Date info (Revision)
- 2019-04-08T01:55:29
Metadata standard
- Title
-
AU/NZS ISO 19115-1:2014
Metadata standard
- Title
-
ISO 19115-1:2014
Metadata standard
- Title
-
ISO 19115-3
- Title
-
Geoscience Australia Community Metadata Profile of ISO 19115-1:2014
- Edition
-
Version 2.0, September 2018
- Citation identifier
- https://pid.geoscience.gov.au/dataset/ga/122551