NASA Earthdata Cloud Data Access Guide

The NASA Earthdata Cloud is NASA's cloud-based archive of Earth observations, hosted on Amazon Web Services (AWS). All ICESat-2 and GLAS/ICESat data products are available in the Earthdata Cloud, and the NSIDC DAAC is in the process of transferring the rest of its data products there. For a list of NSIDC DAAC data products that are currently available in the cloud, see the NSIDC DAAC Cloud Collection page.

Key things to know

  • Downloading data from the Earthdata Cloud to your local computer or storage system is, and will continue to be, free for users. For further general information on the Earthdata Cloud, please see the FAQs about the NSIDC DAAC's Earthdata Cloud migration.
  • Currently, User Services supports one access method: accessing the public S3 buckets using Earthdata Login and AWS credentials and copying data to an EC2 instance. For guidance on other access methods and on working with data directly in the cloud, please see the 'Resources' section.
  • This guide will be updated with more information over time, as more data products are added to the cloud. 

S3 access to NSIDC DAAC data

What is S3?

S3 is Amazon Web Services' Simple Storage Service. Data in the Earthdata Cloud are stored in Amazon S3 buckets. S3 buckets are similar to file folders: they store objects, which consist of data and its descriptive metadata.

Prerequisites for S3 access from the Cloud

To access NSIDC DAAC data stored in S3 buckets in the Earthdata Cloud and copy the data to an EC2 instance, the following three things are required:

  1. An AWS account. You can find more details on how to set one up here.
  2. An EC2 instance set up in your AWS account. This article outlines how to set one up.
  3. A free Earthdata Login account to access the data. If you don't have one, you can register for one here.
You must have an EC2 instance in the us-west-2 region to access the NSIDC DAAC data in the S3 buckets.
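If you are not sure which region an instance is running in, you can ask the EC2 instance metadata service from inside the instance. This is a standard AWS endpoint with a fixed address; note that instances configured to require IMDSv2 need a session token first.

```shell
# Query the EC2 instance metadata service (run from inside the
# instance) for this instance's availability zone. An answer such
# as "us-west-2a" means the instance is in the us-west-2 region.
curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone
```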

 

Next, use one of the workflows below to search for and access data in the Earthdata Cloud S3 buckets. If you prefer to use Earthdata Search, follow the first workflow; if you prefer the command line, follow the second. Both workflows have been tested in an Amazon Linux EC2 instance.

Earthdata Search workflow

1. Searching for data and S3 links using Earthdata Search

1.1 Go to https://search.earthdata.nasa.gov. Make sure you are logged in with your Earthdata Login credentials. 

1.2 Select the ‘Available from AWS Cloud’ filter in the side bar (step 1 in screenshot below). To narrow down to NSIDC DAAC data products, type ‘NSIDC’ in the search box in the top left corner (step 2 in the screenshot below). This will list all the NSIDC DAAC data products that are currently available in the Earthdata Cloud. You can apply spatial, temporal or keyword filters to narrow down the list if you wish.

Earthdata Search web application with filter for NSIDC DAAC cloud data sets
Screenshot of Earthdata Search web application with a filter for NSIDC DAAC data sets that are available in Earthdata Cloud. — Credit: National Snow and Ice Data Center

1.3 Once you have selected a data product, again apply spatial and/or temporal filters until you have the required granules. Click the ‘Download All’ button (step 1 in screenshot) and in the sidebar that appears select ‘Direct Download’ (step 2 in screenshot). Then click the ‘Download Data’ button (step 3 in screenshot).

Earthdata Search web application workflow to obtain S3 links to data in the Earthdata Cloud
Three-step workflow in Earthdata Search to obtain the S3 links for the selected data in the Earthdata Cloud. — Credit: National Snow and Ice Data Center

1.4 In the Download Status window you will see an option for ‘AWS S3 Access’. Select this (step 1 in the screenshot), and a list of the S3 links for your data will appear. You need these links to access the data; you can copy them all or save them as a text file using the ‘Copy’ or ‘Save’ options (step 2 in the screenshot). You can also copy individual links by hovering over them and selecting the copy button (step 3 in the screenshot). Keep this window open as you will need it for the next step.

Download Status page in Earthdata search that lists S3 links and link to AWS S3 credentials
Download Status page in Earthdata Search that lists the S3 links and provides a link for getting AWS S3 credentials. — Credit: National Snow and Ice Data Center

2. Getting temporary AWS S3 credentials from Earthdata Search 

2.1 To access the data in the Earthdata Cloud S3 buckets you need temporary AWS S3 credentials: “accessKeyId”, “secretAccessKey”, and “sessionToken”. In the Download Status window from the previous section there is a ‘Get AWS S3 Credentials’ link (step 4 in the screenshot above). Click this link and a new page will open containing all three credentials. Keep this page open, as it is needed for the next step.

These credentials expire after 1 hour. To get new credentials, visit the following URL (you may need to log in with your Earthdata Login credentials): https://data.nsidc.earthdatacloud.nasa.gov/s3credentials

 

3. Copying data from the S3 buckets to an EC2 instance 

Linux/macOS/Windows command line option

3.1 If you haven’t already, make sure your AWS EC2 instance is launched. Open a terminal/command prompt and connect to it using the SSH protocol. See the ‘Prerequisites for S3 access from the Cloud’ section above for details on how to set up an AWS account and EC2 instance.

 

3.2 Set up your AWS S3 credentials as environment variables, using these three commands:

export AWS_ACCESS_KEY_ID=<your_accessKeyId>
export AWS_SECRET_ACCESS_KEY=<your_secretAccessKey>
export AWS_SESSION_TOKEN=<your_sessionToken>

where the phrases in brackets are the AWS S3 credentials acquired in the previous section.

 

3.3 Copy a file from the S3 bucket to the EC2 instance:

aws s3 cp <s3 link> .

where the <s3 link> is one of the links you acquired from Earthdata Search. For example, to copy a file from the ATLAS/ICESat-2 L2A Global Geolocated Photon Data, Version 5 (ATL03) data product:

aws s3 cp s3://nsidc-cumulus-prod-protected/ATLAS/ATL03/005/2018/10/14/ATL03_20181014003519_02350106_005_01.h5 .
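If you saved the S3 links from Earthdata Search as a text file in step 1.4, you can copy all of the listed files in a loop. The file name s3_links.txt below is an assumption; substitute the name of the file you saved.

```shell
# Copy every S3 object listed in s3_links.txt (one s3:// link per
# line, as saved from Earthdata Search) into the current directory.
while read -r link; do
  aws s3 cp "$link" .
done < s3_links.txt
```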

 

Due to Earthdata Cloud S3 bucket permissions, you have to be in the us-west-2 region in order to copy files. So make sure your instance is set up in this same region.
It costs money to store data on your EC2 instance, and the amount you can store will depend on the configuration of your EC2 instance. If you wish to work with large volumes of data, this access workflow is not the best option. Instead, consider direct S3 access, where the data are accessed in place without the need to copy or transfer them from their original S3 bucket location. See the 'Resources' section for links to additional guidance on direct S3 access.

Command line workflow for Linux/macOS/Windows

1. Searching for data and S3 links using curl

NASA’s Common Metadata Repository (CMR) is a high-performance, high-quality, continuously evolving metadata system that catalogs all data and service metadata records for NASA’s Earth Observing System Data and Information System (EOSDIS). You can use the CMR API to search for data and the associated S3 links.

1.1 Launch your AWS EC2 instance and connect to it from the terminal/command prompt using the SSH protocol. See the ‘Prerequisites for S3 access from the Cloud’ section above for details on how to set up an AWS account and EC2 instance.

 

1.2 Search CMR for a specific data product and the associated S3 links using curl:

curl 'https://cmr.earthdata.nasa.gov/search/granules.json?pretty=true&provider=NSIDC_CPRD&short_name=<dataproductid>&version=<versionnumber>'

where <dataproductid> is the data product ID, e.g. ATL03, ATL06, etc., and <versionnumber> is the three-digit version number of the data product, e.g. 005.

For example, to return the metadata, including S3 links, for the first 10 granules for ATLAS/ICESat-2 L2A Global Geolocated Photon Data, Version 5 (ATL03):

curl 'https://cmr.earthdata.nasa.gov/search/granules.json?pretty=true&provider=NSIDC_CPRD&short_name=ATL03&version=005'
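By default, CMR returns 10 results per page. The CMR API's page_size parameter raises that limit; for example, to request up to 100 ATL03 granules in one response:

```shell
# Ask CMR for up to 100 granules in a single response by adding
# the page_size parameter to the query.
curl 'https://cmr.earthdata.nasa.gov/search/granules.json?provider=NSIDC_CPRD&short_name=ATL03&version=005&page_size=100'
```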

If you just want the S3 links for the first 10 granules of ATLAS/ICESat-2 L2A Global Geolocated Photon Data, Version 5 (ATL03), you can try this command:

curl 'https://cmr.earthdata.nasa.gov/search/granules.json?pretty=true&provider=NSIDC_CPRD&short_name=ATL03&version=005' | grep 's3://' | awk -F '"' '{print $4}'

Note that this returns S3 links for all of the files: both the data files and the associated browse files.
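To keep only the HDF5 data files and save them for later use, you can extend the pipeline with an additional grep. The output file name s3_links.txt is just an assumption; use whatever name you prefer.

```shell
# Extract the S3 links from the CMR response, keep only the .h5
# data files (dropping the browse files), and save them to a file.
curl -s 'https://cmr.earthdata.nasa.gov/search/granules.json?provider=NSIDC_CPRD&short_name=ATL03&version=005' \
  | grep 's3://' | awk -F '"' '{print $4}' \
  | grep '\.h5$' > s3_links.txt
```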

For further guidance on the options available for the CMR API, please refer to this documentation: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html.

 

2. Getting temporary AWS S3 credentials using curl

2.1 To access the data in the Earthdata Cloud S3 buckets you need temporary AWS S3 credentials. To get these credentials, Earthdata Login credentials are needed. Set up a .netrc file with your Earthdata Login credentials using these two commands:

echo 'machine urs.earthdata.nasa.gov login <uid> password <password>' >> ~/.netrc
chmod 0600 ~/.netrc

Replace <uid> and <password> with your Earthdata Login username and password. 

 

2.2 Execute this command to get temporary AWS credentials “accessKeyId”, “secretAccessKey”, and “sessionToken”:

curl -b ~/.urs_cookies -c ~/.urs_cookies -L -n 'https://data.nsidc.earthdatacloud.nasa.gov/s3credentials'

The temporary AWS S3 credentials expire after 1 hour, so you will need to repeat step 2.2 to get new credentials.
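If jq is installed on the instance (an assumption; it is not part of every default install), the credential fetch and the environment variable exports in the next step can be combined into one scripted step:

```shell
# Fetch temporary S3 credentials (using the .netrc set up in step
# 2.1) and export them as environment variables. Requires jq.
creds=$(curl -s -b ~/.urs_cookies -c ~/.urs_cookies -L -n \
  'https://data.nsidc.earthdatacloud.nasa.gov/s3credentials')
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .accessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .secretAccessKey)
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .sessionToken)
```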

 

3. Copying data from the S3 bucket to the EC2 instance

3.1 Set up the AWS S3 credentials as environment variables in the terminal/command prompt, using these three commands:

export AWS_ACCESS_KEY_ID=<your_accessKeyId>
export AWS_SECRET_ACCESS_KEY=<your_secretAccessKey>
export AWS_SESSION_TOKEN=<your_sessionToken>

where the phrases in brackets are the AWS S3 credentials you acquired in the previous section.

 

3.2 Copy the file from the S3 bucket to the EC2 instance:

aws s3 cp <s3 link> .

where the <s3 link> is one of the links acquired from the CMR search in the first section. For example, to copy a file from the ATLAS/ICESat-2 L2A Global Geolocated Photon Data, Version 5 (ATL03) data product:

aws s3 cp s3://nsidc-cumulus-prod-protected/ATLAS/ATL03/005/2018/10/14/ATL03_20181014003519_02350106_005_01.h5 .

 

Due to Earthdata Cloud S3 bucket permissions, you have to be in the us-west-2 region in order to copy files. So, make sure your instance is set up in this same region.
It costs money to store data on your EC2 instance, and the amount you can store will depend on the configuration of your EC2 instance. If you wish to work with large volumes of data, this access workflow is not the best option. Instead, consider direct S3 access, where the data are accessed in place without the need to copy or transfer them from their original S3 bucket location. See the 'Resources' section for links to additional guidance on direct S3 access.

Resources

General Information about the Earthdata Cloud: Earthdata Cloud Cookbook - Cheatsheets, Guides and Slides

Tutorials for working with NASA DAAC data in the cloud, including tutorials on using the CMR API and direct S3 access: Earthdata Cloud Cookbook - Tutorials