Creating Text Files of HTTPS and S3 URLs for Earthdata Cloud Data Access

NASA Earthdata provides options for accessing NSIDC DAAC data through both HTTPS (https://…) and S3 (s3://…) URLs. Understanding which URL type to use can significantly improve your data access workflow and efficiency. This guide walks you through obtaining these URLs and choosing the right option for your needs.

When to Use

HTTPS URLs

Use HTTPS URLs when you want to download data to your local machine or a local server.

HTTPS URLs are best suited to local work (e.g., on your laptop), to cases where you want to store files for offline use later, and to workflows that don't require high-performance access to large files.

S3 URLs

Use S3 URLs when working in a cloud-native environment or when you want to stream data without downloading entire files.

S3 URLs are best used when you're working in AWS (e.g., EC2, Lambda), want to stream or read specific parts of files (e.g., subset large HDF5 or netCDF files), or are developing a cloud pipeline. They provide efficient, scalable access without download management.

Quick Guide

Here's a quick guide for common scenarios:

Use Case	HTTPS	S3
Running a script on your laptop	✔︎
Downloading once and reusing offline	✔︎
Building a cloud-native workflow		✔︎
Working inside AWS or a cloud-hosted JupyterHub		✔︎

NASA Earthdata Search

NASA Earthdata Search is a web-based tool for discovering, filtering, visualizing, and accessing NASA Earth science data in Earthdata Cloud. Here's how to obtain both HTTPS and S3 URLs for NSIDC DAAC data collections:

1. Go to Earthdata Search at https://search.earthdata.nasa.gov and log in. If you already know which dataset or files you need, skip to Step 7. Steps 2 through 6 explain how to search and discover data.

2. Filter datasets by typing "NSIDC" in the search box to show only NSIDC-distributed collections, or "NSIDC_CPRD" to show only cloud-hosted NSIDC-distributed collections.

3. You can further refine your search using spatial and temporal filters. For more guidance on setting filters, see: https://nsidc.org/data/user-resources/help-center/search-order-and-customize-nsidc-daac-data-nasa-earthdata-search

4. In the left panel under Filter Collections, select "Available in Earthdata Cloud" under Features to show only cloud-hosted datasets (if you did not use NSIDC_CPRD in Step 2).

Screenshot showing the NASA Earthdata interface with search results for NSIDC datasets and the "Available in Earthdata Cloud" filter selected

5. Click your desired collection in the "Matching Collections" results (in the example below, SMAP Enhanced L3 Radiometer Global and Polar Grid Daily 9 km EASE-Grid Soil Moisture V006).

6. Set your desired date range (for example, 2024-01-01 through 2024-03-31), then click Download All.

Screenshot of Earthdata Search showing SMAP Enhanced L3 results with temporal filter set for Jan 1-Mar 31, 2024

7. Click Download Data to access the Download Status page, which contains two important tabs:

NASA Earthdata Search interface showing SMAP Enhanced L3 Radiometer dataset with download options: "Download all data" selected and "Download Data" button at bottom. No customization options available.

Download Files: This tab provides a list of HTTPS URLs for your selected granules. You can save this list to your computer as a text file by clicking Save, then use the file with command-line tools like curl and wget. For details, see the sections on downloading multiple files from a text file of HTTPS links in this article: https://nsidc.org/data/user-resources/help-center/downloading-data-earthdata-cloud-your-local-computer-using-command-line

Screenshot of NASA Earthdata Search showing SMAP dataset download interface. "Download Files" tab selected, displaying HTTPS URLs to NSIDC Earthdata Cloud for SMAP data. Action buttons include "Download Files," "Copy," and "Save."
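
As a rough sketch of how the saved list might be used, the loop below passes each URL in the text file to curl. It assumes your Earthdata Login credentials are already stored in ~/.netrc. The function name download_from_list, the DRY_RUN switch, and the ~/.urs_cookies cookie file location are illustrative choices for this example, not anything Earthdata Search requires.

```shell
# Sketch: download every URL listed in a text file (one URL per line).
# Assumes Earthdata Login credentials are stored in ~/.netrc.
# download_from_list, DRY_RUN, and the cookie file path are hypothetical names.
download_from_list() {
  url_file="$1"
  cookies="${HOME}/.urs_cookies"
  # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue   # skip blank lines
    if [ -n "${DRY_RUN:-}" ]; then
      # Print the command instead of running it.
      echo "curl --remote-name --location --netrc --cookie $cookies --cookie-jar $cookies $url"
    else
      curl --remote-name --location --netrc \
           --cookie "$cookies" --cookie-jar "$cookies" "$url"
    fi
  done < "$url_file"
}

# Usage (requires network access and a valid ~/.netrc):
#   download_from_list URLs.txt
```

Setting DRY_RUN=1 before calling the function prints the commands without downloading anything, which can be a convenient way to check the list first.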

AWS S3 Access: This tab displays S3 URLs for your selected granules. You can also save this list as a text file for direct S3 access workflows.

NASA Earthdata products hosted in AWS S3 buckets are protected by temporary AWS credentials tied to your Earthdata Login. To access these products, you'll need three AWS S3 credentials: "accessKeyId", "secretAccessKey", and "sessionToken". You can obtain them by clicking Get AWS S3 Credentials (blue box in the screenshot below). These temporary credentials expire after one hour.

Screenshot of NASA Earthdata Search showing the "AWS S3 Access" tab interface
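
One way to put these credentials to use is to export them as the standard AWS environment variables, which the AWS CLI and SDKs read automatically. The sketch below assumes you have saved the credentials to a JSON file that uses the three field names above. The file name s3credentials.json, the list name S3_URLs.txt, and the helper functions json_field and load_s3_credentials are hypothetical, and the us-west-2 region is an assumption about where the data are hosted.

```shell
# Sketch: export temporary S3 credentials from a saved JSON file.
# json_field and load_s3_credentials are hypothetical helper names.

# Print the string value of one top-level field from a small JSON file.
json_field() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

load_s3_credentials() {
  cred_file="$1"
  AWS_ACCESS_KEY_ID=$(json_field "$cred_file" accessKeyId)
  AWS_SECRET_ACCESS_KEY=$(json_field "$cred_file" secretAccessKey)
  AWS_SESSION_TOKEN=$(json_field "$cred_file" sessionToken)
  # Assumption: the buckets are in us-west-2.
  AWS_DEFAULT_REGION=us-west-2
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_DEFAULT_REGION
}

# Usage (requires the AWS CLI and unexpired credentials):
#   load_s3_credentials s3credentials.json
#   aws s3 cp "$(head -n 1 S3_URLs.txt)" .
```

Because the credentials expire after one hour, a long-running job would need to fetch and load a fresh set before then.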

CMR

CMR (Common Metadata Repository) is NASA's central metadata catalog for Earth science data. It indexes metadata records from NASA Earth science datasets and makes them searchable through APIs and tools, including NASA Earthdata Search, the earthaccess Python library, and the various interfaces that data centers and DAACs use for dataset and granule discovery. One of CMR's key features is that it provides HTTPS and S3 URLs for each data file, which we'll use to create a text file of access links.

This method of obtaining HTTPS and S3 URLs is more technical than the NASA Earthdata Search workflow described above. Though it requires knowledge of CMR and data parsing, learning to use CMR can streamline your data discovery process and make it more powerful.

As with Earthdata Search, this method allows for filtering based on the data provider, concept ID, spatial bounding box, or temporal range, among many others. The CMR Search API Documentation Page provides comprehensive guidance, and the examples below will help you get started.
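
To illustrate how such filters might be combined, the sketch below builds a granule search URL that adds a temporal range and a spatial bounding box to a collection concept ID. The temporal and bounding_box parameter names come from the CMR Search API; the date range and bounding box values are arbitrary examples, and the variable names are hypothetical.

```shell
# Sketch: assemble a CMR granule query with temporal and spatial filters.
# All values below are illustrative; adjust them for your own search.
CMR_URL='https://cmr.earthdata.nasa.gov/search/granules.umm_json'
CONCEPT_ID='C2769338020-NSIDC_CPRD'                    # collection to search
TEMPORAL='2024-01-01T00:00:00Z,2024-03-31T23:59:59Z'   # start,end
BBOX='-180,60,180,90'                                  # west,south,east,north

QUERY_URL="${CMR_URL}?collection_concept_id=${CONCEPT_ID}&temporal=${TEMPORAL}&bounding_box=${BBOX}&page_size=200"

# Print the URL so it can be inspected before any request is sent.
echo "$QUERY_URL"

# To run the query (requires network access):
#   curl -s "$QUERY_URL" | grep -o 'https://data[^"]*\.h5' > URLs.txt
```

Printing the URL first makes it easy to paste into a browser and inspect the raw response before piping it anywhere.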

Using curl and grep

This command creates a text file named URLs.txt containing up to 200 HTTPS download links for .h5 files from the ATLAS/ICESat-2 L3B Monthly Gridded Atmosphere V005 data set (ATL17, V05).

curl 'https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C2769338020-NSIDC_CPRD&page_size=200' | grep -o 'https://data[^"]*\.h5' > URLs.txt

This command has three parts:

1. The curl part:

curl 'https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C2769338020-NSIDC_CPRD&page_size=200'

- Sends a request to the NASA CMR API
- Retrieves metadata for granules from a specific collection:
  - The example uses the collection concept ID C2769338020-NSIDC_CPRD, which uniquely identifies the ATL17 V5 dataset. You can substitute this with any concept ID from the NSIDC DAAC catalog.
- Returns results in UMM-JSON format
- Limits results to 200 granules (page_size=200)

2. Piped to grep:

| grep -o 'https://data[^"]*\.h5'

- The pipe operator | passes the CMR API JSON response to the grep command, which filters it.
- grep searches the JSON response for any text that:
  - Starts with https://data
  - Ends with .h5 (HDF5 files)
- All other metadata is ignored, so only the HTTPS .h5 URLs are extracted.
- The -o flag outputs only the matched URLs, with one URL per line.
- You can modify the extension based on your needs; for example, use .nc for netCDF files. You can also remove the file extension filter (\.h5) to get all data files for a granule, including browse imagery. To match multiple extensions, such as both .h5 and .tif links, add the -E flag and use the "or" operator |: grep -oE 'https://data[^"]*\.(h5|tif)'
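
To see the pattern at work without calling the API, the sketch below runs the same grep expression against a small, made-up JSON fragment. The SAMPLE text and its example.invalid URLs are invented for illustration and only loosely imitate a UMM-JSON response.

```shell
# Sketch: test the grep pattern offline against invented sample JSON.
SAMPLE='{"RelatedUrls":[{"URL":"https://data.example.invalid/path/granule_01.h5","Type":"GET DATA"},{"URL":"s3://example-bucket/path/granule_01.h5","Type":"GET DATA VIA DIRECT ACCESS"},{"URL":"https://data.example.invalid/path/granule_01.jpg","Type":"GET RELATED VISUALIZATION"}]}'

# Keep only text that starts with https://data and ends with .h5.
MATCHES=$(printf '%s\n' "$SAMPLE" | grep -o 'https://data[^"]*\.h5')

printf '%s\n' "$MATCHES"
# prints: https://data.example.invalid/path/granule_01.h5
```

The S3 link and the .jpg link are skipped because neither both starts with https://data and ends with .h5.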

3. Redirects output to a file:

> URLs.txt

Saves the filtered .h5 URLs into a text file called URLs.txt, overwriting any existing file with that name

To extract S3 URLs instead, modify the grep pattern in the command above to:

| grep -o 's3://[^"]*\.h5'

Based on this breakdown, you can customize the options and filters to control which files are included in your output text file.
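
If you want both lists, one possible approach is to save the CMR response once and filter it twice, as sketched below. The helper name extract_urls and the file names granules.json, HTTPS_URLs.txt, and S3_URLs.txt are hypothetical. The sort -u step, which removes duplicate links, is an addition for this example and not part of the original command.

```shell
# Sketch: extract HTTPS and S3 links from one saved CMR response.
# extract_urls is a hypothetical helper.
#   $1 = saved JSON file, $2 = URL prefix, $3 = file extension (no dot)
extract_urls() {
  grep -o "$2[^\"]*\\.$3" "$1" | sort -u
}

# Usage (the curl step requires network access):
#   curl -s 'https://cmr.earthdata.nasa.gov/search/granules.umm_json?collection_concept_id=C2769338020-NSIDC_CPRD&page_size=200' -o granules.json
#   extract_urls granules.json 'https://data' h5 > HTTPS_URLs.txt
#   extract_urls granules.json 's3://' h5 > S3_URLs.txt
```

Saving the response first avoids sending the same request twice and leaves a copy of the metadata to inspect if a list comes back empty.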

Final Thoughts

Understanding how to choose between HTTPS and S3 URLs is key for efficient data access. Here's a quick reference:

- Use HTTPS URLs when downloading data to work with locally
- Use S3 URLs when working in cloud environments or streaming data

You can obtain these URLs through either NASA Earthdata Search, which provides a user-friendly interface, or CMR, which offers more programmatic control for advanced users.