By Agnieszka Gautier
When NASA’s Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) launched in late 2018, it brought high-resolution data to a new level. The Advanced Topographic Laser Altimeter System (ATLAS) on the satellite fires 10,000 laser pulses per second toward Earth. The time each pulse takes to echo back to the satellite’s receiver yields precise height measurements of sea and land ice, forest canopies, water bodies, and land and urban topography. ATLAS measures elevation every 70 centimeters (28 inches) along the satellite’s ground track. The result is a deluge of data points.
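A quick calculation shows how the pulse rate and the 70-centimeter spacing fit together, and why the data pile up so fast. It assumes a ground-track speed of roughly 7 kilometers per second, an approximate figure that the article does not state:

$$\Delta x = \frac{v_g}{f_p} \approx \frac{7000\ \text{m/s}}{10{,}000\ \text{s}^{-1}} = 0.7\ \text{m}$$

$$N_{\text{day}} = f_p \times 86{,}400\ \text{s} = 8.64 \times 10^{8}\ \text{pulses per day}$$

Here $f_p$ is the pulse rate and $v_g$ the assumed ground speed. Because each pulse is split across six beams and ATLAS records individual returning photons, the number of stored data points can be larger still.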
That amount of data was unprecedented in 2018, but as instruments become more sophisticated, future NASA missions expect even larger volumes of data. For this reason, the work that the National Snow and Ice Data Center (NSIDC) Distributed Active Archive Center (DAAC) initiated to optimize big data in the cloud has paved the way for current and future NASA missions.
On July 31, 2025, the NASA NSIDC DAAC released ATL03 global geolocated photon data in cloud-optimized HDF5, making ICESat-2 one of NASA’s first missions to publish data in this format. The new format improves speed and scalability and could save both time and money in cloud-based science and research. This collaboration between the NSIDC DAAC, the ICESat-2 mission, the nonprofit organization The HDF Group, and the user community is the first of its kind.
Learning to cut a steak with a spoon
Historically, data from NASA Earth Science missions were designed to be used in local or supercomputer environments. “Now that files exist in the cloud, you have all these problems that arise from not having files locally or not having a supercomputer,” said Luis Lopez, a software engineer with the NSIDC DAAC who, along with NSIDC senior scientist Andy Barrett, researched and tested how to make ICESat-2 data perform better in the cloud. As satellite instruments have become more sophisticated, their data files have grown, often becoming so unwieldy that they stymie large-scale analysis. Some ICESat-2 files, for instance, are up to seven gigabytes in size, and accessing even part of one can take up to half an hour, depending on the machine.
All these issues not only followed the data into the cloud, but being in the cloud created new challenges. “ATL03 is a huge data set. You can’t just copy and paste this data into the cloud,” said Amy Steiker, tool and service manager for the DAAC. “Users always struggled to download its many gigabytes of data per file. When it hit the cloud, the access time became even slower.” That is because the underlying data structure was not optimized to work with cloud-based object storage.
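A back-of-the-envelope model helps explain why cloud object storage makes the problem worse. Each read from an object store is typically a separate HTTP request that pays a round-trip delay before any bytes arrive, so many small reads add up quickly. The latency and bandwidth figures below are assumptions chosen for illustration, not measurements of ICESat-2 files or of NASA’s cloud.

```python
# Illustrative model only: every number here is an assumption, not a measurement.
# Each request pays a fixed latency, plus transfer time for its bytes.


def fetch_time_seconds(n_requests: int, latency_s: float,
                       bytes_per_request: int,
                       bandwidth_bytes_per_s: float) -> float:
    """Estimated time to fetch data with n sequential requests."""
    transfer = n_requests * bytes_per_request / bandwidth_bytes_per_s
    return n_requests * latency_s + transfer


LATENCY_LOCAL = 0.0001   # ~0.1 ms per read on a local SSD (assumed)
LATENCY_CLOUD = 0.05     # ~50 ms per HTTP range request (assumed)
BANDWIDTH = 100e6        # 100 MB/s, in bytes per second (assumed)
CHUNK = 4096             # 4 KiB per small metadata read (assumed)

local = fetch_time_seconds(1000, LATENCY_LOCAL, CHUNK, BANDWIDTH)
scattered = fetch_time_seconds(1000, LATENCY_CLOUD, CHUNK, BANDWIDTH)
# Same total bytes, but fetched in a single request.
consolidated = fetch_time_seconds(1, LATENCY_CLOUD, 1000 * CHUNK, BANDWIDTH)

print(f"local disk, 1000 small reads:     {local:.2f} s")
print(f"cloud, 1000 small requests:       {scattered:.2f} s")
print(f"cloud, one consolidated request:  {consolidated:.2f} s")
```

Under these assumptions, the latency of a thousand separate requests dominates, while the bytes themselves cost only a fraction of a second.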
One issue with the original ICESat-2 HDF5 file format was fragmented metadata, that is, the data about the data. HDF5 is a standard file format in the geosciences and remote sensing communities. It is self-describing, meaning that everything needed to read the data is contained within each file. The metadata within these files includes information about the data type, units, and structure, allowing different applications to access the files. This format existed well before the cloud, and it shows. “It suffers from all of these pathologies of a local file system,” Lopez said. Because the files sit in remote cloud storage rather than on a local disk, software must wait while it reads all of the file’s scattered characteristics before it can even recognize the file. What was needed was a sort of repacking of the metadata.
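A small h5py sketch makes “self-describing” concrete: a reader can list every dataset’s type, shape, and attributes such as units without loading any array values, because that information lives in the file’s own metadata. The file path, values, and attribute text are hypothetical; only the group path imitates ATL03’s layout. In a large granule whose metadata is scattered through a remote file, each of these lookups can turn into a separate read.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in file. The group path mimics ATL03's layout
# (gt1l/heights/h_ph), but the values and attribute text are made up.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "gt1l/heights/h_ph", data=np.array([12.5, 13.1, 12.9], dtype="f4")
    )
    dset.attrs["units"] = "meters"
    dset.attrs["description"] = "synthetic photon heights"


def describe(h5_path):
    """Return dtype, shape, and attributes of every dataset.

    Only metadata is read here; no array values are loaded.
    """
    info = {}
    with h5py.File(h5_path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                info[name] = {
                    "dtype": str(obj.dtype),
                    "shape": obj.shape,
                    "attrs": dict(obj.attrs),
                }
        f.visititems(visit)
    return info


metadata = describe(path)
for name, meta in metadata.items():
    print(name, meta)
```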
The ATLAS instrument uses six laser beams arranged in three pairs to measure surface heights around the globe. “So, there are a lot of repetitive data variables,” Steiker said. “That structure required having to read the files in multiple places because, for instance, there’s a height value, here, here, and here.” That structure resulted in fragmented metadata.
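The repetition Steiker describes is visible in ATL03’s group layout, where each of the six beams (gt1l through gt3r) has its own copy of variables such as heights/h_ph. The sketch below builds a synthetic file with that pattern; the values are random and real granules hold many more variables per beam. Reading one height variable for all beams means locating six separate dataset headers, and a file with three variables per beam already contains 18 of them.

```python
import os
import tempfile

import h5py
import numpy as np

# Three beam pairs, each with a left and right beam, as in ATL03.
BEAMS = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"]
VARIABLES = ["h_ph", "lat_ph", "lon_ph"]

# Synthetic file: real group names, made-up values.
path = os.path.join(tempfile.mkdtemp(), "six_beams.h5")
rng = np.random.default_rng(0)
with h5py.File(path, "w") as f:
    for beam in BEAMS:
        for var in VARIABLES:
            f.create_dataset(f"{beam}/heights/{var}",
                             data=rng.random(100, dtype=np.float32))


def count_datasets(h5_path):
    """Count dataset objects; each has its own header the reader must find."""
    names = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            names.append(name)

    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return len(names)


def heights_by_beam(h5_path):
    """Read the same variable, h_ph, from each of the six beam groups."""
    with h5py.File(h5_path, "r") as f:
        return {beam: f[f"{beam}/heights/h_ph"][:] for beam in BEAMS}


heights = heights_by_beam(path)
print(count_datasets(path), "datasets;", len(heights), "beams with h_ph")
```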
Cloud optimization required identifying the fragments and putting them together in a new file, so that the metadata could be contiguous. “All the data file addresses should be in the same location,” Lopez said. First, data were consolidated, and then they were reorganized. “You collect all the metadata and put it at the front of the file,” Lopez added. “Then when the software wants to open the requested file, it gets all it needs upfront, instead of doing a thousand different requests.”
It takes a mission
While the NSIDC DAAC led the cloud optimization effort, the end product resulted from collaboration with experts on the ICESat-2 mission team and in the HDF5 format. Aleksandar Jelenak, a senior informatics architect at The HDF Group who specializes in advancing HDF5 technology, was a key consultant on cloud optimization and on h5py, the Python library commonly used to access ICESat-2 data. Jeffrey Lee, a software engineer with the ICESat-2 mission team in charge of ATLAS data development, implemented the changes directly in the data production process.
Deciding which improvements to make came down to the user community. “That’s really what NSIDC and the DAAC are tasked to do as far as user support, management, and stewardship,” Steiker said. Going from user input to a cloud-optimized product took about two years, aligning with the ICESat-2 reprocessing schedule. “It takes some time and dedicated effort to get that feedback, but it can and absolutely should be done,” she added.
User feedback also informed the choice of cloud-optimized HDF5 over other cloud-optimized formats, as well as which data sets to prioritize based on existing data access challenges. “NSIDC has done a lot of the legwork,” Steiker said, “and I want to amplify this work because I think there’s a lot of value for other missions to do it.”
And this is already happening. Other missions have begun applying these lessons to their own data. For instance, while the NSIDC DAAC was working on cloud optimization of ICESat-2 data, the NASA-ISRO Synthetic Aperture Radar (NISAR) mission, which launched on July 30, 2025, was deciding how to produce its data. “They were like, ‘Why don’t we adopt the same things that you guys are doing, so scientists won’t have the issues you were having,’” Lopez said. Drawing on the user feedback the NSIDC DAAC had gathered, NISAR chose to publish its data in cloud-optimized HDF5. “If we can work with the missions to fix the problems at the start, then we’ll be in better shape,” Lopez said.
Closing the loop
The NISAR mission will provide a detailed view of Earth’s land and ice-covered surfaces, observing changes that reveal information about biomass, natural hazards, sea level rise, and groundwater. Its dual-band radar, the first of its kind in space, will measure changes in Earth’s surface as small as a centimeter. “Because of its many sensors and the resolution of its main instrument, I think all of NASA’s Earth data will double in five years only from NISAR,” Lopez said. “Imagine if it was not cloud optimized.” Harmonizing and optimizing files from NASA missions is not just better for users; it could also save NASA money and time if these missions do not have to reprocess data into new file formats down the road.
Projects that encourage the DAACs to work closely with scientists and user communities, as the NSIDC DAAC team has demonstrated here, benefit all levels of science from data collection and processing to data stewardship, and ultimately research to better understand our planet and its changes. “We really closed the loop,” Lopez said. “We offloaded some of the upfront work, so scientists could get to their research faster.” The NSIDC DAAC gathered user feedback and brought it back to the beginning, where data are processed in the first place.
“It’s a powerful example of our value,” Steiker added.
Access data through the NSIDC DAAC
NASA’s NSIDC DAAC manages, distributes, and supports a variety of cryospheric and weather-related data sets as one of the discipline-specific Earth Science Data and Information System (ESDIS) data centers within NASA's Earth Science Data Systems (ESDS) Program. User Resources include data documentation, help articles, data tools, training, and on-demand user support.
NASA NSIDC DAAC data highlighted in this article
Neumann, T. A., Hancock, D., Robbins, J., Gibbons, A., Lee, J., Brenner, A., Felikson, D., Harbeck, K., Saba, J., Luthcke, S. B., Rebold, T., Reese, A. & Sutterley, T. (2025). ATLAS/ICESat-2 L2A Global Geolocated Photon Data. (ATL03, Version 7). [Data Set]. Boulder, Colorado, USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. doi:10.5067/ATLAS/ATL03.007. Date Accessed 10-20-2025.