Applications of natural language processing to geoscience text data and prospectivity modelling
Geological maps are powerful models for visualizing the complex distribution of rock types through space and time. However, the descriptive information that forms the basis for a preferred map interpretation is typically stored in geological map databases as unstructured text data that are difficult to use in practice. Herein we apply natural language processing (NLP) to geoscientific text data from Canada, the U.S., and Australia to address that knowledge gap. First, rock descriptions, geological ages, lithostratigraphic and lithodemic information, and other long-form text data are translated to numerical vectors, i.e., a word embedding, using a geoscience language model. Network analysis of word associations, nearest neighbors, and principal component analysis are then used to extract meaningful semantic relationships between rock types. We further demonstrate using simple Naive Bayes classifiers and the area under receiver operating characteristics plots (AUC) how word vectors can be used to: (1) predict the locations of “pegmatitic” (AUC = 0.962) and “alkalic” (AUC = 0.938) rocks; (2) predict mineral potential for Mississippi-Valley-type (AUC = 0.868) and clastic-dominated (AUC = 0.809) Zn-Pb deposits; and (3) search geoscientific text data for analogues of the giant Mount Isa clastic-dominated Zn-Pb deposit using the cosine similarities between word vectors. This form of semantic search is a promising NLP approach for assessing mineral potential with limited training data. Overall, the results highlight how geoscience language models and NLP can be used to extract new knowledge from unstructured text data and reduce the mineral exploration search space for critical raw materials.

Citation: Lawley, C. J. M., Gadd, M. G., Parsa, M., Lederer, G. W., Graham, G. E., and Ford, A., 2023, Applications of Natural Language Processing to Geoscience Text Data and Prospectivity Modeling: Natural Resources Research. https://doi.org/10.1007/s11053-023-10216-1
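The semantic search mentioned in the abstract ranks text descriptions by the cosine similarity between their vectors and a query vector. The following is a minimal sketch of that idea only, not the authors' workflow: the three-dimensional vectors, the unit labels, and the function names `cosine_similarity` and `rank_analogues` are all hypothetical, and a real application would use vectors produced by a geoscience language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_analogues(query, vectors):
    """Return (label, similarity) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy vectors standing in for embedded rock descriptions (assumption).
unit_vectors = {
    "unit_a": [1.0, 0.0, 0.0],
    "unit_b": [0.8, 0.6, 0.0],
    "unit_c": [0.0, 0.0, 1.0],
}

# Made-up query vector standing in for the description of a known deposit.
query_vector = [1.0, 0.0, 0.0]

ranking = rank_analogues(query_vector, unit_vectors)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Units whose vectors point in nearly the same direction as the query score close to 1, while unrelated units score close to 0, so the top of the ranking gives candidate analogues without requiring labelled training data.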
Identification info
- Date (Creation)
- 2023-01-12T16:00:00
- Date (Publication)
- 2023-06-13T01:20:35
- Citation identifier
- Geoscience Australia Persistent Identifier/https://pid.geoscience.gov.au/dataset/ga/147637
- Cited responsible party
-
Role | Organisation / Individual Name | Details
Author | Lawley, C.J.M. | External Contact
Author | Gadd, M.G. | External Contact
Author | Parsa, M. | External Contact
Author | Lederer, G.W. | External Contact
Author | Graham, G.E. | External Contact
Author | Ford, A. | Internal Contact
Publisher | Springer Nature | External Contact
- Name
-
Natural Resources Research
- Purpose
-
Manuscript examining the use of natural language processing for improving understanding of mineral systems and mineral prospectivity. Case studies are presented for evaluating prospectivity for critical minerals in Canada, United States, and Australia.
- Status
- Completed
- Point of contact
-
Role | Organisation / Individual Name | Details
Resource provider | Minerals, Energy and Groundwater Division | External Contact
Point of contact | Commonwealth of Australia (Geoscience Australia) | Voice
Point of contact | Ford, A. | Internal Contact
- Spatial representation type
- Topic category
-
- Geoscientific information
Extent
- Maintenance and update frequency
- Not planned
Resource format
- Title
-
Product data repository: Various Formats
- Website
-
Data Store directory containing the digital product files
Data Store directory containing one or more files, possibly in a variety of formats, accessible to Geoscience Australia staff only for internal purposes
- Project
-
-
critical minerals mapping initiative
-
- Keywords
-
-
critical minerals
-
- Keywords
-
-
mineral systems
-
- Keywords
-
-
natural language processing
-
- theme.ANZRC Fields of Research.rdf
-
-
Geology
-
Data mining and knowledge discovery
-
- Keywords
-
-
Published_External
-
Resource constraints
- Title
-
Creative Commons Attribution 4.0 International Licence
- Alternate title
-
CC-BY
- Edition
-
4.0
- Addressee
-
Role | Organisation / Individual Name | Details
User | Any |
- Use constraints
- License
- Use constraints
- Other restrictions
- Other constraints
-
© 2023, Crown
Resource constraints
- Title
-
Australian Government Security Classification System
- Edition date
- 2018-11-01T00:00:00
- Classification
- Unclassified
- Classification system
-
Australian Government Security Classification System
- Language
- English
- Character encoding
- UTF8
Distribution Information
- Distributor contact
-
Role | Organisation / Individual Name | Details
Distributor | Commonwealth of Australia (Geoscience Australia) | Voice facsimile
- OnLine resource
-
Link to Journal
- Distribution format
-
Resource lineage
- Statement
-
Multiple geological databases were used as the basis for generating natural language processing workflows to evaluate mineral prospectivity for critical minerals in Canada, the United States, and Australia.
Metadata constraints
- Title
-
Australian Government Security Classification System
- Edition date
- 2018-11-01T00:00:00
- Classification
- Unclassified
Metadata
- Metadata identifier
-
urn:uuid/db66820f-d78c-469e-a5fd-598b498dbb1e
- Title
-
GeoNetwork UUID
- Language
- English
- Character encoding
- UTF8
- Contact
-
Role | Organisation / Individual Name | Details
Point of contact | Commonwealth of Australia (Geoscience Australia) | Voice
Point of contact | Ford, A. | Internal Contact
Type of resource
- Resource scope
- Document
- Name
-
Journal Article / Conference Paper
Alternative metadata reference
- Title
-
Geoscience Australia - short identifier for metadata record with uuid
- Citation identifier
- eCatId/147637
- Date info (Creation)
- 2023-06-13T01:08:34
- Date info (Revision)
- 2023-06-13T01:08:34
Metadata standard
- Title
-
AU/NZS ISO 19115-1:2014
Metadata standard
- Title
-
ISO 19115-1:2014
Metadata standard
- Title
-
ISO 19115-3
- Title
-
Geoscience Australia Community Metadata Profile of ISO 19115-1:2014
- Edition
-
Version 2.0, September 2018
- Citation identifier
- http://pid.geoscience.gov.au/dataset/ga/122551