By Michon Scott
To make observations of our planet, including satellite measurements, useful for science, we need to connect them to specific locations and times. On top of that, these Earth-observation data are often complex and organized in hierarchical layers. For example, ice sheet elevation data might include not just the height measurements, but also details like the satellite’s path, reference points used to measure height, and the spread of the elevation data.
Organizing data hierarchically makes data sets easier to use. Providing a way to grab all the hierarchical data at once makes the data sets even more useful. Expertise from NASA's NSIDC Distributed Active Archive Center (DAAC) has helped make that kind of data access a reality.
For experienced users, here is a succinct summary: DataTree has been integrated into Xarray.
Xarray explained
Xarray is an open-source tool used for data analysis and visualization. It is frequently employed by users of NASA data, including data sets archived by the NSIDC DAAC. One of its strengths is its ability to handle arrays.
Arrays can be as simple as ordered lists, but they can also have two dimensions, like a spreadsheet with rows and columns of data. In Earth science, arrays often have even more dimensions. For example, satellite data from a particular region might include latitude, longitude, and time. Depending on the data collected, it might also include information on temperature, precipitation, freeze/thaw cycles, ice elevation, and more.
Xarray is such a powerful tool because it is optimized to not only handle data with multiple dimensions but also to track the data's location in space and time. “It lets you index data and access things intelligently and easily,” NSIDC software developer Matt Savoie says. “It's a larger level of abstraction that lets you think about things and hold information in one place.” Xarray makes complex data operations easier, and because it is open source, any knowledgeable user can fix bugs and add enhancements.
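To make the idea of labeled dimensions concrete, here is a minimal sketch in Python. It assumes Xarray, NumPy, and pandas are installed. The variable name, coordinates, and values are invented for illustration and do not come from any NSIDC DAAC data set.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic values only: 3 days x 2 latitudes x 3 longitudes.
times = pd.date_range("2024-01-01", periods=3, freq="D")
lats = [60.0, 65.0]
lons = [-50.0, -45.0, -40.0]
values = np.arange(18, dtype=float).reshape(3, 2, 3)

# Attach names and coordinate labels to each dimension of the array.
elevation = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="ice_elevation",  # hypothetical variable name
)

# Select by coordinate label rather than by integer position.
point = elevation.sel(time=pd.Timestamp("2024-01-02"), lat=65.0, lon=-45.0)

# Average over the time dimension by name; lat and lon remain.
time_mean = elevation.mean(dim="time")

print(point.item())
print(time_mean.dims)
```

Because the selection and the average refer to dimensions by name, the code does not depend on the order in which the axes happen to be stored.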
Xarray is powerful, but on its own, the tool has historically made it challenging for users to quickly access and work with hierarchical data.
How DataTree enhances Xarray
To better understand the challenge faced by Xarray users, imagine you are a cacao-harvest surveyor (that might not be a real occupation, but it would be a worthwhile job since people like chocolate). Your job entails counting, tree by tree, how many cacao pods are ready to be picked over the next few days. Ideally, you would want to survey each tree in its entirety in one go, looking at it from the trunk all the way out to the individual leaves.
For a long time, Xarray users could not easily access entire tree-like data hierarchies. They could only view a portion of a hierarchy at any one time—like a cacao-harvest surveyor trying to glean useful information about the number of cacao pods by looking only at the tree’s trunk. As early as 2016, Xarray users began asking for an enhancement that could work better with tree-like structures. The solution required a lot of collaboration from software developers at multiple organizations, including Savoie, who worked on behalf of the NSIDC DAAC to help integrate DataTree into Xarray. The result was a new data structure added to Xarray, the first such update in a decade.
Xarray enhanced with DataTree is well-suited for working with hierarchical data formats, such as NetCDF and Zarr, which are frequently used for NASA data. Taking advantage of Xarray does require some user skills, including a familiarity with Python. For users of Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) data, the NSIDC DAAC offers additional resources via GitHub.
A blog post authored by the software developers who contributed to the Xarray DataTree project details the integration process and what they learned along the way. Meanwhile, Savoie puts the integration of DataTree into Xarray in perspective. “Xarray does everything you need it to do. It allows complex operations on data sets,” he explains. “The DataTree addition doesn't add something totally new that couldn't be done before, but it definitely reduces the friction for working with a lot of NASA data sets. People are going to have an easier time understanding their data and working with it.”
Access data through the NSIDC DAAC
NASA's NSIDC DAAC manages, distributes, and supports a variety of cryospheric and climate-related data sets as one of the discipline-specific Earth Science Data and Information System (ESDIS) data centers within NASA's Earth Science Data Systems (ESDS) Program. User resources include data documentation, help articles, data tools, training, and on-demand user support. Learn more about NSIDC DAAC services.