Creating Text Files of HTTPS and S3 URLs for Earthdata Cloud Data Access
NASA Earthdata provides options for accessing NSIDC DAAC data through both HTTPS (https://…) and S3 (s3://…) URLs. Understanding which URL type to use can significantly improve your data access workflow and efficiency. This guide walks you through obtaining these URLs and choosing the right option for your needs.
When to Use
HTTPS URLs
Use HTTPS URLs when you want to download data to your local machine or a local server.
HTTPS URLs are best suited for local work (e.g., on your laptop), for storing files to use offline later, and for workflows that don't require high-performance access to large files.
S3 URLs
Use S3 URLs when working in a cloud-native environment or when you want to stream data without downloading entire files.
S3 URLs are best used when you're working in AWS (e.g., EC2, Lambda), want to stream or read only specific parts of files (e.g., subset large HDF5 or netCDF files), or are developing a cloud pipeline. They provide efficient, scalable access without the need to manage downloads. Note that direct S3 access to Earthdata Cloud data is only available from within the AWS us-west-2 region.
Quick Guide
Here's a quick guide for common scenarios:
Use Case | HTTPS | S3 |
---|---|---|
Running a script on your laptop | ✔︎ | |
Downloading once and reusing offline | ✔︎ | |
Building a cloud-native workflow | | ✔︎ |
Working inside AWS or a cloud-hosted JupyterHub | | ✔︎ |
NASA Earthdata Search
NASA Earthdata Search is a web-based tool for discovering, filtering, visualizing, and accessing NASA Earth science data in Earthdata Cloud. Here's how to obtain both HTTPS and S3 URLs for NSIDC DAAC data collections:
1. Go to Earthdata Search at https://search.earthdata.nasa.gov and log in. If you already know which dataset or files you need, skip to Step 7. Steps 2 through 6 explain how to search and discover data.
2. Filter datasets by typing "NSIDC" in the search box to show only NSIDC-distributed collections (or NSIDC_CPRD to filter for cloud-hosted NSIDC-distributed collections).
3. You can further refine your search using spatial and temporal filters. For more guidance on setting filters, see: https://nsidc.org/data/user-resources/help-center/search-order-and-customize-nsidc-daac-data-nasa-earthdata-search
4. In the left panel under Filter Collections, select "Available in Earthdata Cloud" under Features to show only cloud-hosted datasets (if you did not use NSIDC_CPRD in Step 2).
5. Click your desired collection in the "Matching Collections" results (in this example, SMAP Enhanced L3 Radiometer Global and Polar Grid Daily 9 km EASE-Grid Soil Moisture V006).
6. Set your desired date range (for example, 2024-01-01 through 2024-03-31), then click Download All.
7. Click Download Data to access the Download Status page, which contains two important tabs:
Download Files: This tab lists HTTPS URLs for your selected granules. You can save this list directly to your computer as a text file by clicking Save, then use it with command-line tools such as curl and wget. For instructions on downloading multiple files from a text file of HTTPS links, see the relevant sections of this article: https://nsidc.org/data/user-resources/help-center/downloading-data-earthdata-cloud-your-local-computer-using-command-line
AWS S3 Access: This tab displays S3 URLs for your selected granules. You can also save this list as a text file for direct S3 access workflows.
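A saved list of HTTPS URLs can be downloaded with a short shell loop. The sketch below assumes your Earthdata Login credentials are stored in ~/.netrc for the host urs.earthdata.nasa.gov, and that ~/.urs_cookies is an acceptable place for a session cookie file (a hypothetical path you can change). The function name download_url_list is also hypothetical. It calls curl once per URL, following the login redirect and reusing cookies so you are not prompted for every file.

```shell
# Sketch: download every HTTPS URL listed in a text file (one URL per line).
# Assumes Earthdata Login credentials in ~/.netrc, for example:
#   machine urs.earthdata.nasa.gov login <username> password <password>
# download_url_list and the default cookie file path are hypothetical choices.

download_url_list() {
  local list_file=$1
  local cookie_file=${2:-"$HOME/.urs_cookies"}
  local url
  # Read line by line; "|| [ -n "$url" ]" also handles a last line with no newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Strip Windows carriage returns and skip blank lines.
    url=$(printf '%s' "$url" | tr -d '\r')
    [ -z "$url" ] && continue
    # -f: fail on HTTP errors, -L: follow the Earthdata Login redirect,
    # -n: read credentials from ~/.netrc, -b/-c: read and write session cookies,
    # -O: save under the remote file name in the current directory.
    if ! curl -f -L -n -b "$cookie_file" -c "$cookie_file" -O "$url"; then
      echo "Failed: $url" >&2
    fi
  done < "$list_file"
}
```

After sourcing the snippet, running `download_url_list URLs.txt` downloads each file in the list into the current directory and reports any URL that fails, without stopping the loop.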
CMR
CMR (Common Metadata Repository) is NASA's central metadata catalog for Earth science data. It indexes metadata records from NASA Earth science datasets and makes them searchable through APIs and tools, including NASA Earthdata Search, the earthaccess Python library, and various data interfaces used by data centers and DAACs for dataset and granule discovery. One of CMR's key features is that it provides HTTPS and S3 URLs for each data file, which we'll use to create a text file of access links.
This method of obtaining HTTPS and S3 URLs is more technical than the NASA Earthdata Search workflow described above. Though it requires knowledge of CMR and data parsing, learning to use CMR can streamline your data discovery process and make it more powerful.
As with Earthdata Search, this method allows for filtering based on the data provider, concept ID, spatial bounding box, or temporal range, among many others. The CMR Search API Documentation Page provides comprehensive guidance, and the examples below will help you get started.
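The filters mentioned above map to query parameters on the granule search URL. The sketch below builds such a URL from a collection concept ID plus optional temporal and bounding-box filters. The parameter names collection_concept_id, temporal, bounding_box, and page_size are CMR Search API parameters; the function name build_cmr_granule_url and its argument order are hypothetical, and you should confirm parameter details against the CMR Search API documentation.

```shell
# Sketch: build a CMR granule search URL with optional filters.
# build_cmr_granule_url and its argument order are hypothetical.
#   $1 collection concept ID (required)
#   $2 start time, ISO 8601 such as 2024-01-01T00:00:00Z (may be empty)
#   $3 end time, ISO 8601 (may be empty)
#   $4 bounding box "west,south,east,north" in degrees (may be empty)
#   $5 page size (defaults to 200)

build_cmr_granule_url() {
  local concept_id=$1
  local start=$2
  local end=$3
  local bbox=$4
  local page_size=${5:-200}
  local url="https://cmr.earthdata.nasa.gov/search/granules.umm_json"
  url="$url?collection_concept_id=$concept_id&page_size=$page_size"
  # The temporal filter takes "start,end".
  if [ -n "$start" ] || [ -n "$end" ]; then
    url="$url&temporal=$start,$end"
  fi
  # The bounding_box filter takes lower-left lon, lower-left lat,
  # upper-right lon, upper-right lat.
  if [ -n "$bbox" ]; then
    url="$url&bounding_box=$bbox"
  fi
  printf '%s\n' "$url"
}

# Example (requires network access):
#   url=$(build_cmr_granule_url C2769338020-NSIDC_CPRD 2024-01-01T00:00:00Z 2024-03-31T23:59:59Z)
#   curl -s "$url" | grep -o 'https://data[^"]*\.h5' > URLs.txt
```

Building the URL in one place keeps the filters readable and lets you quote the whole URL once when passing it to curl, which matters because the query string contains & characters that the shell would otherwise interpret.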
Using curl and grep
This command creates a text file named URLs.txt containing HTTPS download links for the .h5 files of up to 200 granules from the ATLAS/ICESat-2 L3B Monthly Gridded Atmosphere V005 (ATL17) data set.
curl 'https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C2769338020-NSIDC_CPRD&page_size=200' | grep -o 'https://data[^"]*\.h5' > URLs.txt
This command has three parts:
1. The curl part:
curl 'https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C2769338020-NSIDC_CPRD&page_size=200'
- Sends a request to the NASA CMR API
- Retrieves metadata for granules from a specific collection:
  - The example uses the collection concept ID C2769338020-NSIDC_CPRD, which uniquely identifies the ATL17 V5 dataset. You can substitute this with any concept ID from the NSIDC DAAC catalog.
- Returns results in UMM-JSON format
- Limits results to 200 granules (page_size=200). CMR accepts a page_size of up to 2000; larger result sets must be retrieved over several pages.
2. Piped to grep:
| grep -o 'https://data[^"]*\.h5'
- The pipe operator (|) passes the JSON response from the CMR API to the grep command, which filters it.
- grep searches the JSON response for any text that:
  - Starts with https://data
  - Ends with .h5 (HDF5 files). You can modify this extension based on your needs; for example, use .nc for netCDF files. You can also remove the file extension filter (\.h5) to get every link for a granule that starts with https://data, which can include browse imagery. For multiple extensions, such as extracting both .h5 and .tif links, add the -E flag and use the "or" operator |: grep -oE 'https://data[^"]*\.(h5|tif)'
- All other metadata is ignored, so only the HTTPS .h5 URLs are extracted.
- The -o flag outputs only the matched URLs, with one URL per line.
3. Redirects output to a file:
> URLs.txt
- Saves the filtered .h5 URLs into a text file called URLs.txt
To extract S3 URLs instead, modify the grep pattern in the command above to:
| grep -o 's3://[^"]*\.h5'
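To see what these grep patterns match without querying CMR, you can run them on a small hand-written JSON fragment. The fragment below is hypothetical: the URLs and bucket name are made-up placeholders, loosely shaped like the RelatedUrls entries in a UMM-JSON granule record.

```shell
# Sketch: apply the HTTPS, S3, and multi-extension grep patterns to a
# made-up JSON fragment. The URLs below are hypothetical placeholders.
sample_json='{"RelatedUrls":[
{"URL":"https://data.example.gov/ATL17/granule_01.h5","Type":"GET DATA"},
{"URL":"s3://example-bucket/ATL17/granule_01.h5","Type":"GET DATA VIA DIRECT ACCESS"},
{"URL":"https://data.example.gov/ATL17/granule_01_browse.tif","Type":"GET RELATED VISUALIZATION"}]}'

# HTTPS links ending in .h5
https_h5=$(printf '%s\n' "$sample_json" | grep -o 'https://data[^"]*\.h5')

# S3 links ending in .h5
s3_h5=$(printf '%s\n' "$sample_json" | grep -o 's3://[^"]*\.h5')

# HTTPS links ending in either .h5 or .tif (extended regex with alternation)
https_h5_tif=$(printf '%s\n' "$sample_json" | grep -oE 'https://data[^"]*\.(h5|tif)')

printf '%s\n' "$https_h5" "$s3_h5" "$https_h5_tif"
```

The [^"]* part stops each match at the closing quote of the JSON string, which is why each URL comes out cleanly even though the surrounding metadata is on the same line.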
Based on this breakdown, you can customize the options and filters to control which files are included in your output text file.
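Because one request returns only one page of results, collections with more granules than page_size need several requests. A minimal sketch, assuming CMR's search-after paging: each response carries a CMR-Search-After header, and sending that value back in a CMR-Search-After request header returns the next page. The helper name fetch_all_urls, the temporary header file, and the stop condition (no header, or a page with no matching URLs) are choices made for this sketch; check the CMR Search API documentation for current paging behavior.

```shell
# Sketch: collect matching URLs from every page of a CMR granule search.
# fetch_all_urls is a hypothetical helper name.
#   $1 CMR granule search URL (quoted)
#   $2 grep pattern, e.g. 'https://data[^"]*\.h5'

fetch_all_urls() {
  local search_url=$1 pattern=$2
  local header_file page search_after=""
  header_file=$(mktemp)
  while :; do
    # -D writes the response headers to a file so the paging header can be read.
    if [ -n "$search_after" ]; then
      page=$(curl -s -D "$header_file" -H "CMR-Search-After: $search_after" "$search_url")
    else
      page=$(curl -s -D "$header_file" "$search_url")
    fi
    # Print this page's matches; stop when a page has none.
    printf '%s\n' "$page" | grep -o "$pattern" || break
    # Header names are case-insensitive; strip the name and the trailing CR.
    search_after=$(grep -i '^CMR-Search-After:' "$header_file" | sed 's/^[^:]*: *//' | tr -d '\r')
    [ -z "$search_after" ] && break
  done
  rm -f "$header_file"
}

# Example (requires network access):
#   fetch_all_urls 'https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C2769338020-NSIDC_CPRD&page_size=2000' \
#     'https://data[^"]*\.h5' > URLs.txt
```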
Final Thoughts
Understanding how to choose between HTTPS and S3 URLs is key for efficient data access. Here's a quick reference:
- Use HTTPS URLs when downloading data to work with locally
- Use S3 URLs when working in cloud environments or streaming data
You can obtain these URLs through either NASA Earthdata Search, which provides a user-friendly interface, or CMR, which offers more programmatic control for advanced users.