Data-intensive analytics and machine learning
-
The rapid growth in global sensor networks is leading to an explosion in the volume, velocity and variety of geospatial and geoscientific data. This, coupled with the increasing integration of geospatial data into our everyday lives, is driving a growing expectation of spatial information on demand and with minimal delay. However, there remains a gap between these expectations and the present reality of accessible information products and tools that help us understand the Living Planet. Closing this gap requires the development and implementation of common analytical frameworks that enable large-scale interoperability across multiple data infrastructures. Discrete global grid systems (DGGS) provide such a framework.

A DGGS is a form of Earth reference system that represents the Earth as a tessellation of discrete nested cells. It is designed to ensure a repeatable representation of measurements, making it better suited to today's requirements and technologies than conventional reference systems designed primarily for navigation and manual charting. A DGGS thus presents a common framework capable of linking very large multi-resolution and multi-domain datasets, enabling the next generation of analytic processes to be applied. There are many possible DGGS, each with its own advantages and disadvantages; however, a common set of key characteristics makes standardization of DGGS achievable. Through the Open Geospatial Consortium (OGC), a new standard has been developed that defines these essential characteristics of DGGS infrastructures and the core functional algorithms necessary to support the operation of, and interoperability between, DGGS.

This paper describes the key elements of the new OGC DGGS Core Standard and how DGGS relate to the Living Planet. Examples from a number of conformant DGGS implementations demonstrate the immense value that can be derived from adopting a DGGS approach to issues of serious concern for the planet we live on.

Presented at the 2016 Living Planet Symposium (LPS16), Prague, Czech Republic.
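To make the nested-cell idea above concrete, the Python sketch below quantizes a point to a hierarchical cell address using a toy, axis-aligned quadtree over plain latitude/longitude. This is purely illustrative: a conformant DGGS would use an equal-area tessellation (e.g. rHEALPix or ISEA3H), and the function name cell_id is invented here, not part of the OGC standard.

# Minimal sketch of hierarchical cell indexing, in the spirit of a DGGS.
# NOTE: an illustrative quadtree over a plain lat/lon rectangle, not a
# conformant (equal-area) DGGS such as rHEALPix or ISEA3H.

def cell_id(lat, lon, resolution):
    """Quantize a point to a nested quadtree cell address.

    Each character of the returned string names one of four child
    cells (0-3), so a cell's parent is its address minus the last
    character -- the nesting property a DGGS relies on.
    """
    lat0, lat1 = -90.0, 90.0
    lon0, lon1 = -180.0, 180.0
    digits = []
    for _ in range(resolution):
        mid_lat = (lat0 + lat1) / 2
        mid_lon = (lon0 + lon1) / 2
        quad = (2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)
        digits.append(str(quad))
        lat0, lat1 = (mid_lat, lat1) if lat >= mid_lat else (lat0, mid_lat)
        lon0, lon1 = (mid_lon, lon1) if lon >= mid_lon else (lon0, mid_lon)
    return "".join(digits)

# Two observations of the same place at different resolutions share a
# common address prefix, which is what lets multi-resolution datasets
# be joined in a common framework.
coarse = cell_id(-35.28, 149.13, 4)
fine   = cell_id(-35.28, 149.13, 8)
assert fine.startswith(coarse)

Because each resolution refines the previous one, two addresses for the same location share a prefix; that nesting is what allows datasets captured at different resolutions to be linked within a single grid hierarchy.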
-
The Australian Geoscience Data Cube (AGDC) Programme envisions a Digital Earth, composed of observations of the Earth's oceans, surface and subsurface taken through space and time and stored in a high-performance computing environment. The AGDC will allow governments, scientists and the public to monitor, analyse and project the state of the Earth. It will also realise the full value of large Earth observation datasets by allowing rapid and repeatable continental-scale analyses of Earth properties through time and space.

At its core, the AGDC is an indexing system which supports parallel processing on HPC. One of the key features of the AGDC approach is that all of the observations (pixels) in the input data are retained for analysis; the data are not mosaicked, binned or filtered in any way, and the source data for any pixel can be traced through the metadata. The AGDC provides a common analytical platform on which researchers can complete complex full-depth analyses of the processed archive (~500 TB) in a matter of hours. As with the European Space Agency's (ESA) GRID Processing on Demand (GPOD) system (https://gpod.eo.esa.int), the AGDC will allow analyses to be performed on a data store. By arranging EO data spatially and temporally, the AGDC enables efficient large-scale analysis using a 'dice and stack' method, which sub-divides the data into spatially regular, time-stamped, band-aggregated tiles that can be traversed as a dense temporal stack.

The AGDC application programming interface (API) allows users to develop custom processing tasks. The API provides access to the tiles by abstracting the low-level data access: users do not need to be aware of the underlying system and data-specific interactions to formulate and execute processing tasks.

The development of precision correction methodologies to enable production of comparable observations (spatially and spectrally), together with the attribution of quality information about the contents of those observations, is key to the success of the AGDC. Quality information for each observation is represented as a series of bitwise tests which, in the case of Landsat, include: contiguity of observations between layers in the dataset; cloud and cloud shadow obscuration; and a land/sea mask.

Work is currently underway to further develop the open source solution from the initial prototype deployment. Components of this evolution include advancing the system design and function to provide improved support for additional sensors; improved ingestion support; configurable storage units; high-performance data structures; a graphical user interface; and expanded collaboration and engagement. This paper reviews the history of the data cube and the application areas that will be addressed by the current plan of works.

This research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government.

Presented at the 2016 Living Planet Symposium (LPS16), Prague, Czech Republic.
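The following Python sketch illustrates the 'dice and stack' layout described in the abstract above: spatially regular tiles keyed by position, each holding time-stamped, band-aggregated slices that can be read back as a dense temporal stack. All names (tile_index, ingest, temporal_stack) and the 1-degree tile size are assumptions for illustration, not the actual AGDC API.

# Illustrative sketch of a "dice and stack" layout; not the AGDC code.
import numpy as np
from collections import defaultdict

TILE_DEG = 1.0  # tiles are spatially regular, e.g. 1x1 degree (assumed)

def tile_index(lat, lon):
    """Map a coordinate to the regular spatial tile that contains it."""
    return (int(lon // TILE_DEG), int(lat // TILE_DEG))

# The index maps each spatial tile to its time-stamped, band-aggregated
# slices; analysis then traverses one tile as a dense temporal stack.
index = defaultdict(list)

def ingest(lat, lon, timestamp, bands):
    """Record one observation: bands is a (band, y, x) array."""
    index[tile_index(lat, lon)].append((timestamp, bands))

def temporal_stack(tile):
    """Return a tile's observations as a dense (time, band, y, x) array."""
    slices = sorted(index[tile], key=lambda s: s[0])  # order by timestamp
    return np.stack([bands for _, bands in slices])

# Usage: three time steps over one tile become one dense stack.
rng = np.random.default_rng(0)
for t in range(3):
    ingest(-35.3, 149.1, t, rng.random((6, 32, 32)))  # 6 bands, 32x32 block
stack = temporal_stack(tile_index(-35.3, 149.1))
print(stack.shape)  # (3, 6, 32, 32)

Keying storage by tile rather than by scene is the design choice that makes per-pixel time-series analysis cheap: all observations of one location sit together, so a full-depth traversal never touches the rest of the archive.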
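The per-observation bitwise quality tests mentioned above can be pictured as a per-pixel bitmask, as in this hedged Python sketch. The bit positions and names are invented for illustration and do not match the real Landsat pixel-quality layout used by the AGDC.

# Sketch of per-pixel quality flags as bitwise tests (bit layout assumed).
import numpy as np

CONTIGUOUS  = 1 << 0  # all bands observed the pixel
CLOUD_FREE  = 1 << 1  # not obscured by cloud
SHADOW_FREE = 1 << 2  # not obscured by cloud shadow
LAND        = 1 << 3  # land/sea mask: set for land pixels

def clear_land(pq):
    """Boolean mask of pixels passing every test we care about."""
    want = CONTIGUOUS | CLOUD_FREE | SHADOW_FREE | LAND
    return (pq & want) == want

pq = np.array([0b1111, 0b1101, 0b0111], dtype=np.uint16)
print(clear_land(pq))  # [ True False False ]

Encoding each test as one bit keeps the quality layer compact and lets an analysis select exactly the combination of tests it needs with a single masked comparison, rather than filtering the source data in advance.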