Climate research often seeks to detect changes over time. With the growth of remote sensing data sets, however, Earth scientists are being overwhelmed with data. Much of the data is organized in directory structures that make it hard to find and difficult to analyze.
The Data Rods project is an effort to fundamentally reorganize the archival structure of remote sensing data sets at the National Snow and Ice Data Center (NSIDC). Data sets will be stored in a pure-object database structure that permits rapid spatial and temporal subsetting, filtering, and algorithmic post-processing. The database format and user interface will also facilitate intercomparisons between data sets, improving the ability to detect correlations and patterns across them.
David W. Gallaher, Glenn E. Grant, Julienne Stroeve, Dr. Qin Lv, and Iinstry Liang of the CU Computer Science Department
Satellite data rates are expected to grow exponentially for the foreseeable future. NSIDC stores large volumes of cryospheric remote sensing data, and the retrieval and processing of these data already present significant challenges. The Data Rods project proposes to create a new data structure for rapid retrieval, filtering, and analysis of massive multi-modality data sets.
Pre-processing data sets: gridding
Data archives at the NSIDC contain an extensive collection of multi-modality data sets at a wide variety of temporal and spatial resolutions. To enable meaningful intercomparisons between these data sets, they must first be reorganized into a common spatial grid. Typical grid cells are squares 25 km, 5 km, 1 km, or 250 m on a side.
For consistency, the NSIDC Equal-Area Scalable Earth Grids (EASE-Grids) tool is used to grid the data sets into an equal-area map projection at the desired spatial resolution.
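The gridding step can be sketched in Python. This minimal example assumes a spherical Earth of radius 6371.228 km and a global cylindrical equal-area projection with a 30° standard parallel, loosely modeled on the global EASE-Grid. It is not the exact EASE-Grid definition, and the function names `project_cea` and `grid_cell` are hypothetical. Because the projection preserves area, square cells of a fixed size in projected meters cover equal areas on the ground at every latitude.

```python
import math

# Assumptions (illustration only): spherical Earth with the original
# EASE-Grid sphere radius, cylindrical equal-area projection, and a
# 30-degree standard parallel. Not the exact EASE-Grid definition.
EARTH_RADIUS_M = 6371228.0
STANDARD_PARALLEL_DEG = 30.0


def project_cea(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project latitude/longitude in degrees to equal-area x/y in meters."""
    k = math.cos(math.radians(STANDARD_PARALLEL_DEG))
    x = EARTH_RADIUS_M * math.radians(lon_deg) * k
    y = EARTH_RADIUS_M * math.sin(math.radians(lat_deg)) / k
    return x, y


def grid_cell(lat_deg: float, lon_deg: float, cell_size_m: float) -> tuple[int, int]:
    """Return the (row, col) of the square cell containing the point.

    Rows count down from the north edge; columns count east from lon = -180.
    """
    x, y = project_cea(lat_deg, lon_deg)
    x_max, _ = project_cea(0.0, 180.0)  # east edge of the map
    _, y_max = project_cea(90.0, 0.0)   # north edge of the map
    n_cols = math.ceil(2 * x_max / cell_size_m)
    n_rows = math.ceil(2 * y_max / cell_size_m)
    # Clamp so points on the east and south edges stay inside the grid.
    col = min(int((x + x_max) // cell_size_m), n_cols - 1)
    row = min(int((y_max - y) // cell_size_m), n_rows - 1)
    return row, col
```

For example, `grid_cell(70.0, -45.0, 25000.0)` gives the row and column of the 25 km cell containing a point over central Greenland; setting `cell_size_m` to `5000.0`, `1000.0`, or `250.0` gives the finer resolutions.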
Constructing the databases
Once gridded, each image pixel is inserted into a pure-object database. In this schema, each grid point is a "pixel object" containing all the available information from the chosen data set at that location, including a time stamp and latitude/longitude coordinates. Although the object data is physically stored in a linear fashion inside the pure-object database, it can be visualized as if the data fields were layered, with time as the Z dimension. Data sets can be appended to existing databases or broken up across multiple databases. Additional databases and volumes can be created as needed. From the viewpoint of a user accessing the data, divisions between databases are transparent.
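The pixel-object schema and the transparent division across databases might look roughly like the sketch below. The class names `PixelObject` and `PixelStore`, the field names, and the `max_per_partition` limit are all hypothetical; Python lists stand in for database volumes, so this illustrates the idea rather than NSIDC's actual pure-object database.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterator


@dataclass
class PixelObject:
    """All values for one grid cell at one time stamp (hypothetical fields)."""
    row: int
    col: int
    timestamp: datetime
    lat: float
    lon: float
    values: dict[str, float] = field(default_factory=dict)


class PixelStore:
    """Appends pixel objects linearly across several partitions.

    Each partition stands in for one database or volume. Iteration hides
    the partition boundaries, so callers see a single collection.
    """

    def __init__(self, max_per_partition: int) -> None:
        self.max_per_partition = max_per_partition
        self.partitions: list[list[PixelObject]] = [[]]

    def append(self, pixel: PixelObject) -> None:
        # Start a new partition when the current one is full.
        if len(self.partitions[-1]) >= self.max_per_partition:
            self.partitions.append([])
        self.partitions[-1].append(pixel)

    def __iter__(self) -> Iterator[PixelObject]:
        for partition in self.partitions:
            yield from partition

    def __len__(self) -> int:
        return sum(len(p) for p in self.partitions)
```

Iterating over a `PixelStore` yields every pixel object in insertion order, so a caller cannot tell whether the data came from one partition or several.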
The Data Rods concept
If the database is viewed as a gridded, layered structure with time as the vertical dimension, then a "data rod" is a column of data that incorporates all the known imagery values at a single grid location through time. Database queries can then be constructed that select only the data rods of interest, thereby subsetting to particular spatial locations and time spans, and asking, "How has this data changed through time?"
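A data-rod query can be sketched as a filter on grid cell and time, followed by grouping by location. The `Pixel` record and the `query_rods` function are hypothetical illustrations, not the project's query interface. This version scans every record for clarity; a production store would presumably avoid a full scan.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Pixel:
    """Hypothetical minimal pixel record: grid cell, time stamp, one value."""
    row: int
    col: int
    timestamp: datetime
    value: float


def query_rods(
    pixels: Iterable[Pixel],
    rows: tuple[int, int],
    cols: tuple[int, int],
    start: datetime,
    end: datetime,
) -> dict[tuple[int, int], list[tuple[datetime, float]]]:
    """Collect the data rods for a block of cells within a time window.

    `rows`, `cols`, and the time window are inclusive bounds. Each rod is
    sorted by time, so it reads as the history of one location.
    """
    rods: dict[tuple[int, int], list[tuple[datetime, float]]] = defaultdict(list)
    for p in pixels:
        in_space = rows[0] <= p.row <= rows[1] and cols[0] <= p.col <= cols[1]
        in_time = start <= p.timestamp <= end
        if in_space and in_time:
            rods[(p.row, p.col)].append((p.timestamp, p.value))
    for rod in rods.values():
        rod.sort(key=lambda sample: sample[0])
    return dict(rods)
```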
Rods of data permit users to subset data through time.
A final component of the Data Rods project will be the development of a Web-based user interface (UI) to access the databases. The UI will include spatiotemporal search and filtering tools, with options for analysis and visualization. User-customizable algorithms will permit rapid intercomparisons between data sets and automated pattern discovery, including trend searching, anomaly detection, and identification of new spatial and temporal cycles. Output formats may include graphical displays as well as NetCDF, GeoTIFF, or KML files.
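Trend searching and anomaly detection on a single rod can be illustrated with two standard techniques: an ordinary least-squares trend and a z-score test on the residuals from that trend. These are generic stand-ins chosen for illustration, not the project's user-customizable algorithms; `linear_trend`, `trend_anomalies`, and the default threshold of 2.0 standard deviations are assumptions.

```python
from statistics import fmean, pstdev


def linear_trend(times: list[float], values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values against times; returns (slope, intercept)."""
    t_mean, v_mean = fmean(times), fmean(values)
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("need at least two distinct time values")
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = num / den
    return slope, v_mean - slope * t_mean


def trend_anomalies(times: list[float], values: list[float], threshold: float = 2.0) -> list[int]:
    """Indices whose residual from the trend exceeds `threshold` standard deviations."""
    slope, intercept = linear_trend(times, values)
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    sd = pstdev(residuals)
    if sd == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r) / sd > threshold]
```

Passing the times and values from one rod returns its rate of change per time unit and the indices of observations that depart sharply from that trend.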
The Services for the Analysis of the Greenland Environment (SAGE) is an example of a potential user interface for Data Rods.
Why use a pure-object database?
The goal is speed and efficiency.
Remote sensing data sets may span several decades and contain many terabytes of spatiotemporal information. Our test databases already contain over 2 billion entries while maintaining query and filtering efficiency. This performance would not be possible using a traditional relational database.
A "pure-object" database stores information using data structures that mirror how an object-oriented program handles its internal objects. Object-oriented programs may insert data into a pure-object database simply by requesting that their instantiated objects be made persistent.
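As a loose analogy only, the sketch below makes Python objects persistent by appending their fields to a file and later re-instantiating them. A JSON-lines file is not a pure-object database, and `PixelObject`, `persist`, `load_all`, and `round_trip` are hypothetical names. Only the data fields are written; the methods (here `scaled`) are available again after reloading because the reading program uses the same class definition.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class PixelObject:
    """Hypothetical pixel object with data fields and a method."""
    row: int
    col: int
    timestamp: str
    value: float

    def scaled(self, factor: float) -> float:
        return self.value * factor


def persist(objects: list[PixelObject], path: str) -> None:
    """Append each object's fields to the file as one JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        for obj in objects:
            f.write(json.dumps(asdict(obj)) + "\n")


def load_all(path: str) -> list[PixelObject]:
    """Re-instantiate every stored object, in the order it was written."""
    with open(path, encoding="utf-8") as f:
        return [PixelObject(**json.loads(line)) for line in f if line.strip()]


def round_trip(objects: list[PixelObject]) -> list[PixelObject]:
    """Demo: write objects to a temporary file, then read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pixels.jsonl")
        persist(objects, path)
        return load_all(path)
```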
Another program can then connect to the database and re-instantiate the same objects, gaining access to the stored object data structures and methods. This differs substantially from a relational database: instead of using logical references to search for data items (an approach that becomes inefficient as the database grows), a pure-object database stores data in a linear fashion, permitting rapid searches across massive data sets.
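The internal storage format of the database is not described here. As a hypothetical illustration of why linear storage can speed up searches, the sketch below packs fixed-size records so that each location's time series is contiguous. Cells are assumed to be numbered `0` to `n_cells - 1` (for example, `row * n_cols + col`), the byte offset of any (cell, time step) pair is computed arithmetically as `(cell * n_times + t) * record_size`, and reading a whole rod is a single slice. The record format and the functions `build_store`, `read_value`, and `read_rod` are assumptions made for this example.

```python
import struct
from typing import Callable

# Hypothetical fixed-size record: one little-endian float64 per (cell, time step).
RECORD = struct.Struct("<d")


def build_store(n_cells: int, n_times: int, value_fn: Callable[[int, int], float]) -> bytes:
    """Pack records cell by cell so each cell's time series is contiguous."""
    buf = bytearray(n_cells * n_times * RECORD.size)
    for cell in range(n_cells):
        for t in range(n_times):
            offset = (cell * n_times + t) * RECORD.size  # computed, not looked up
            RECORD.pack_into(buf, offset, value_fn(cell, t))
    return bytes(buf)


def read_value(store: bytes, cell: int, t: int, n_times: int) -> float:
    """Read a single (cell, time step) value directly from its offset."""
    (value,) = RECORD.unpack_from(store, (cell * n_times + t) * RECORD.size)
    return value


def read_rod(store: bytes, cell: int, n_times: int) -> list[float]:
    """Read one location's full time series with a single contiguous slice."""
    start = cell * n_times * RECORD.size
    chunk = store[start:start + n_times * RECORD.size]
    return [value for (value,) in RECORD.iter_unpack(chunk)]
```

Laying records out cell by cell favors rod reads; a time-major layout would instead favor reading a whole image for one time step, so the choice depends on which access pattern dominates.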
Contact NSIDC User Services for more information.