Skip to content

earthaccess API

earthaccess is a Python library that simplifies data discovery and access to NASA Earth science data by providing a higher abstraction for NASA’s Search API (CMR) so that searching for data can be done using a simpler notation instead of low level HTTP queries.

This library handles authentication with NASA’s OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly allowing scientists get to their science in a simpler and faster way, reducing barriers to cloud-based data analysis.


collection_query()

Returns a query builder instance for NASA collections (datasets).

Returns:

Type Description
CollectionQuery

a query builder instance for data collections.

download(granules, local_path=None, provider=None, threads=8, *, show_progress=None, credentials_endpoint=None, pqdm_kwargs=None)

Retrieves data granules from a remote storage system. Provide the optional local_path argument to prevent repeated downloads.

  • If we run this in the cloud, we will be using S3 to move data to local_path.
  • If we run it outside AWS (us-west-2 region) and the dataset is cloud hosted, we'll use HTTP links.

Parameters:

Name Type Description Default
granules Union[DataGranule, List[DataGranule], str, List[str]]

a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP)

required
local_path Optional[Union[Path, str]]

Local directory to store the remote data granules. If not supplied, defaults to a subdirectory of the current working directory of the form data/YYYY-MM-DD-UUID, where YYYY-MM-DD is the year, month, and day of the current date, and UUID is the last 6 digits of a UUID4 value.

None
provider Optional[str]

if we download a list of URLs, we need to specify the provider.

None
credentials_endpoint Optional[str]

S3 credentials endpoint to be used for obtaining temporary S3 credentials. This is only required if the metadata doesn't include it, or we pass urls to the method instead of DataGranule instances.

None
threads int

parallel number of threads to use to download the files, adjust as necessary, default = 8

8
show_progress Optional[bool]

whether or not to display a progress bar. If not specified, defaults to True for interactive sessions (i.e., in a notebook or a python REPL session), otherwise False.

None
pqdm_kwargs Optional[Mapping[str, Any]]

Additional keyword arguments to pass to pqdm, a parallel processing library. See pqdm documentation for available options. Default is to use immediate exception behavior and the number of jobs specified by the threads parameter.

None

Returns:

Type Description
List[Path]

List of downloaded files

Raises:

Type Description
Exception

A file download failed.

get_edl_token()

Returns the current token used for EDL.

Returns:

Type Description
str

EDL token

get_fsspec_https_session()

Returns a fsspec session that can be used to access datafiles across many different DAACs.

Returns:

Type Description
AbstractFileSystem

An fsspec instance able to access data across DAACs.

Examples:

import earthaccess

earthaccess.login()
fs = earthaccess.get_fsspec_https_session()
with fs.open(DAAC_GRANULE) as f:
    f.read(10)

get_requests_https_session()

Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.

Returns:

Type Description
Session

An authenticated requests Session instance.

Examples:

import earthaccess

earthaccess.login()

req_session = earthaccess.get_requests_https_session()
data = req_session.get(granule_url, headers = {"Range": "bytes=0-100"})

get_s3_credentials(daac=None, provider=None, results=None)

Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can use the daac name, the provider, or a list of results from earthaccess.search_data(). If we use results, earthaccess will use the metadata on the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.

Parameters:

Name Type Description Default
daac Optional[str]

a DAAC short_name like NSIDC or PODAAC, etc.

None
provider Optional[str]

if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD etc.

None
results Optional[List[DataGranule]]

List of results from search_data()

None

Returns:

Type Description
Dict[str, Any]

a dictionary with S3 credentials for the DAAC or provider

get_s3_filesystem(daac=None, provider=None, results=None, endpoint=None)

Return an s3fs.S3FileSystem for direct access when running within the AWS us-west-2 region.

Parameters:

Name Type Description Default
daac Optional[str]

Any DAAC short name e.g. NSIDC, GES_DISC

None
provider Optional[str]

Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider.

None
results Optional[DataGranule]

A list of results from search_data(). earthaccess will use the metadata from CMR to obtain the S3 Endpoint.

None
endpoint Optional[str]

URL of a cloud provider credentials endpoint to be used for obtaining AWS S3 access credentials.

None

Returns:

Type Description
S3FileSystem

An authenticated s3fs session valid for 1 hour.

get_s3fs_session(daac=None, provider=None, results=None)

Returns a fsspec s3fs file session for direct access when we are in us-west-2.

Parameters:

Name Type Description Default
daac Optional[str]

Any DAAC short name e.g. NSIDC, GES_DISC

None
provider Optional[str]

Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider.

None
results Optional[DataGranule]

A list of results from search_data(). earthaccess will use the metadata from CMR to obtain the S3 Endpoint.

None

Returns:

Type Description
S3FileSystem

An s3fs.S3FileSystem authenticated for reading in-region in us-west-2 for 1 hour.

granule_query()

Returns a query builder instance for data granules.

Returns:

Type Description
GranuleQuery

a query builder instance for data granules.

login(strategy='all', persist=False, system=PROD)

Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).

Attempt to login via only the specified strategy, unless the "all" strategy is used, in which case each of the individual strategies is attempted in the following order, until one succeeds: "environment", "netrc", "interactive". In this case, only when all strategies fail does login fail.

Parameters:

Name Type Description Default
strategy str

An authentication method.

  • "all": try each of the following methods, in order, until one succeeds.
  • "environment": retrieve either an Earthdata login token from the EARTHDATA_TOKEN environment variable, or a username and password pair from the EARTHDATA_USERNAME and EARTHDATA_PASSWORD environment variables (specifying a token takes precedence).
  • "netrc": retrieve username and password from ~/.netrc (or ~/_netrc on Windows), or from the file specified by the NETRC environment variable.
  • "interactive": enter username and password via interactive prompts.
'all'
persist bool

if True, persist credentials to a .netrc file

False
system System

the Earthdata system to access

PROD

Returns:

Type Description
Auth

An instance of Auth.

Raises:

Type Description
LoginAttemptFailure

If the NASA Earthdata Login service rejects credentials.

open(granules, provider=None, *, credentials_endpoint=None, show_progress=None, pqdm_kwargs=None, open_kwargs=None)

Returns a list of file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.

Parameters:

Name Type Description Default
granules Union[List[str], List[DataGranule]]

a list of granule instances or list of URLs, e.g. s3://some-granule. If a list of URLs is passed, we need to specify the data provider.

required
provider Optional[str]

e.g. POCLOUD, NSIDC_CPRD, etc.

None
show_progress Optional[bool]

whether or not to display a progress bar. If not specified, defaults to True for interactive sessions (i.e., in a notebook or a python REPL session), otherwise False.

None
pqdm_kwargs Optional[Mapping[str, Any]]

Additional keyword arguments to pass to pqdm, a parallel processing library. See pqdm documentation for available options. Default is to use immediate exception behavior and the number of jobs specified by the threads parameter.

None
open_kwargs Optional[Dict[str, Any]]

Additional keyword arguments to pass to fsspec.open, such as cache_type and block_size. Defaults to using blockcache with a block size determined by the file size (4 to 16MB).

None

Returns:

Type Description
List[AbstractFileSystem]

A list of "file pointers" to remote (i.e. s3 or https) files.

search_data(count=-1, **kwargs)

Search for dataset files (granules) using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

The CMR does not permit queries across all granules in all collections in order to provide fast search responses. Granule queries must target a subset of the collections in the CMR using a condition like provider, provider_id, concept_id, collection_concept_id, short_name, version or entry_title.

Parameters:

Name Type Description Default
count int

Number of records to get, -1 = all

-1
kwargs Dict

arguments to CMR:

  • short_name: (str) Filter granules by product short name; e.g. ATL08
  • version: (str) Filter by dataset version
  • daac: (str) a provider code for any DAAC, e.g. NSIDC or PODAAC
  • data_center; (str) An alias for daac
  • provider: (str) Only match granules from a given provider. A DAAC can have more than one provider, e.g PODAAC and POCLOUD, NSIDC_ECS and NSIDC_CPRD.
  • cloud_hosted: (bool) If True, only match granules hosted in Earthdata Cloud
  • downloadable: (bool) If True, only match granules that are downloadable. A granule is downloadable when it contains at least one RelatedURL of type GETDATA.
  • online_only: (bool) Alias of downloadable
  • orbit_number; (float) Filter granule by the orbit number in which a granule was acquired
  • granule_name; (str) Filter by granule name. Granule name can contain wild cards, e.g MODGRNLD.*.daily.*.
  • instrument; (str) Filter by instrument name, e.g. "ATLAS"
  • platform; (str) Filter by platform, e.g. satellite or plane
  • cloud_cover: (tuple) Filter by cloud cover. Tuple is a range of cloud covers, e.g. (0, 20). Cloud cover values in metadata may be fractions (i.e. (0.,0.2)) or percentages. CMRS searches for cloud cover range based on values in metadata. Note collections without cloud_cover in metadata will return zero granules.
  • day_night_flag: (str) Filter for day- and night-time images, accepts 'day', 'night', 'unspecified'.
  • temporal: (tuple) A tuple representing temporal bounds in the form (date_from, date_to). Dates can be datetime objects or ISO 8601 formatted strings. Date strings can be full timestamps; e.g. YYYY-MM-DD HH:mm:ss or truncated YYYY-MM-DD
  • bounding_box: (tuple) Filter collection by those that intersect bounding box. A tuple representing spatial bounds in the form (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
  • polygon: (list[tuples]) Filter by polygon. Polygon must be a list of tuples containing longitude-latitude pairs representing polygon vertices. Vertices must be in counter-clockwise order and the final vertex must be the same as the first vertex; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3), (lon4,lat4),(lon1,lat1)]
  • point: (tuple(float,float)) Filter by collections intersecting a point, where the point is a longitude-latitude pair; e.g. (lon,lat)
  • line: (list[tuples]) Filter collections that overlap a series of connected points. Points are represented as tuples containing longitude-latitude pairs; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3)]
  • circle: (tuple(float, float, float)) Filter collections that intersect a circle defined as a point with a radius. Circle parameters are a tuple containing latitude, longitude and radius in meters; e.g. (lon, lat, radius_m). The circle center cannot be the north or south poles. The radius mst be between 10 and 6,000,000 m
{}

Returns:

Type Description
List[DataGranule]

a list of DataGranules that can be used to access the granule files by using download() or open().

Raises:

Type Description
RuntimeError

The CMR query failed.

Examples:

granules = earthaccess.search_data(
    short_name="ATL06",
    bounding_box=(-46.5, 61.0, -42.5, 63.0),
    )
granules = earthaccess.search_data(
    doi="10.5067/SLREF-CDRV2",
    cloud_hosted=True,
    temporal=("2002-01-01", "2002-12-31")
)

search_datasets(count=-1, **kwargs)

Search datasets (collections) using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

Parameters:

Name Type Description Default
count int

Number of records to get, -1 = all

-1
kwargs Dict

arguments to CMR:

  • keyword: (str) Filter collections by keywords. Case-insensitive and supports wildcards ? and *
  • short_name: (str) Filter collections by product short name; e.g. ATL08
  • doi: (str) Filter by DOI
  • daac: (str) Filter by DAAC; e.g. NSIDC or PODAAC
  • data_center: (str) An alias for daac
  • provider: (str) Filter by data provider; each DAAC can have more than one provider, e.g. POCLOUD, PODAAC, etc.
  • has_granules: (bool) If true, only return collections with granules. Default: True
  • temporal: (tuple) A tuple representing temporal bounds in the form (date_from, date_to). Dates can be datetime objects or ISO 8601 formatted strings. Date strings can be full timestamps; e.g. YYYY-MM-DD HH:mm:ss or truncated YYYY-MM-DD
  • bounding_box: (tuple) Filter collection by those that intersect bounding box. A tuple representing spatial bounds in the form (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
  • polygon: (List[tuples]) Filter by polygon. Polygon must be a list of tuples containing longitude-latitude pairs representing polygon vertices. Vertices must be in counter-clockwise order and the final vertex must be the same as the first vertex; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3),(lon4,lat4),(lon1,lat1)]
  • point: (Tuple[float,float]) Filter by collections intersecting a point, where the point is a longitude-latitude pair; e.g. (lon,lat)
  • line: (List[tuples]) Filter collections that overlap a series of connected points. Points are represented as tuples containing longitude-latitude pairs; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3)]
  • circle: (List[float, float, float]) Filter collections that intersect a circle defined as a point with a radius. Circle parameters are a list containing latitude, longitude and radius in meters; e.g. [lon, lat, radius_m]. The circle center cannot be the north or south poles. The radius mst be between 10 and 6,000,000 m
  • cloud_hosted: (bool) Return only collected hosted on Earthdata Cloud. Default: True
  • downloadable: (bool) If True, only return collections that can be downloaded from an online archive
  • concept_id: (str) Filter by Concept ID; e.g. C3151645377-NSIDC_CPRD
  • instrument: (str) Filter by Instrument name; e.g. ATLAS
  • project: (str) Filter by project or campaign name; e.g. ABOVE
  • fields: (List[str]) Return only the UMM fields listed in this parameter
  • revision_date: tuple(str,str) Filter by collections that have revision date within the range
  • debug: (bool) If True prints CMR request. Default: True
{}

Returns:

Type Description
List[DataCollection]

A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc.

Raises:

Type Description
RuntimeError

The CMR query failed.

Examples:

datasets = earthaccess.search_datasets(
    keyword="sea surface anomaly",
    cloud_hosted=True
)
results = earthaccess.search_datasets(
    daac="NSIDC",
    bounding_box=(-73., 58., -10., 84.),
)
results = earthaccess.search_datasets(
    instrument="ATLAS",
    bounding_box=(-73., 58., -10., 84.),
    temporal=("2024-09-01", "2025-04-30"),
)

search_services(count=-1, **kwargs)

Search the NASA CMR for Services matching criteria.

See https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service.

Parameters:

Name Type Description Default
count int

maximum number of services to fetch (if less than 1, all services matching specified criteria are fetched [default])

-1
kwargs Any

keyword arguments accepted by the CMR for searching services

{}

Returns:

Type Description
List[Any]

list of services (possibly empty) matching specified criteria, in UMM

List[Any]

JSON format

Examples:

services = search_services(provider="POCLOUD", keyword="COG")

status(system=PROD, raise_on_outage=False)

Get the statuses of NASA's Earthdata services.

Parameters:

Name Type Description Default
system System

The Earthdata system to access, defaults to PROD.

PROD
raise_on_outage bool

If True, raises exception on errors or outages.

False

Returns:

Type Description
dict[str, str]

A dictionary containing the statuses of Earthdata services.

Examples:

>>> earthaccess.status()
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}
>>> earthaccess.status(earthaccess.UAT)
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}

Raises:

Type Description
ServiceOutage

if at least one service status is not "OK"

get_granule_credentials_endpoint_and_region(granule)

Retrieve credentials endpoint for direct access granule link.

Parameters:

Name Type Description Default
granule DataGranule

The first granule being included in the virtual dataset.

required

Returns:

Name Type Description
credentials_endpoint str

The S3 credentials endpoint. If this information is in the UMM-G record, then it is used from there. If not, a query for the collection is performed and the information is taken from the UMM-C record.

region str

Region for the data. Defaults to us-west-2. If the credentials endpoint is retrieved from the UMM-C record for the collection, the Region information is also used from UMM-C.

open_virtual_dataset(granule, group=None, access='indirect')

Open a granule as a single virtual xarray Dataset.

Uses NASA DMR++ metadata files to create a virtual xarray dataset with ManifestArrays. This virtual dataset can be used to create zarr reference files. See https://virtualizarr.readthedocs.io for more information on virtual xarray datasets.

Warning

This feature is current experimental and may change in the future. This feature relies on DMR++ metadata files which may not always be present for your dataset and you may get a FileNotFoundError.

Parameters:

Name Type Description Default
granule DataGranule

The granule to open

required
group str | None

Path to the netCDF4 group in the given file to open. If None, the root group will be opened. If the DMR++ file does not have groups, this parameter is ignored.

None
access str

The access method to use. One of "direct" or "indirect". Use direct when running on AWS, use indirect when running on a local machine.

'indirect'

Returns:

Type Description
Dataset

xarray.Dataset

Examples:

>>> results = earthaccess.search_data(count=2, temporal=("2023"), short_name="SWOT_L2_LR_SSH_Expert_2.0")
>>> vds =  earthaccess.open_virtual_dataset(results[0], access="indirect")
>>> vds
<xarray.Dataset> Size: 149MB
Dimensions:                                (num_lines: 9866, num_pixels: 69,
                                            num_sides: 2)
Coordinates:
    longitude                              (num_lines, num_pixels) int32 3MB ...
    latitude                               (num_lines, num_pixels) int32 3MB ...
    latitude_nadir                         (num_lines) int32 39kB ManifestArr...
    longitude_nadir                        (num_lines) int32 39kB ManifestArr...
Dimensions without coordinates: num_lines, num_pixels, num_sides
Data variables: (12/98)
    height_cor_xover_qual                  (num_lines, num_pixels) uint8 681kB ManifestArray<shape=(9866, 69), dtype=uint8, chunks=(9866, 69...
>>> vds.virtualize.to_kerchunk("swot_2023_ref.json", format="json")

open_virtual_mfdataset(granules, group=None, access='indirect', preprocess=None, parallel='dask', load=True, reference_dir=None, reference_format='json', **xr_combine_nested_kwargs)

Open multiple granules as a single virtual xarray Dataset.

Uses NASA DMR++ metadata files to create a virtual xarray dataset with ManifestArrays. This virtual dataset can be used to create zarr reference files. See https://virtualizarr.readthedocs.io for more information on virtual xarray datasets.

Warning

This feature is current experimental and may change in the future. This feature relies on DMR++ metadata files which may not always be present for your dataset and you may get a FileNotFoundError.

Parameters:

Name Type Description Default
granules list[DataGranule]

The granules to open

required
group str | None

Path to the netCDF4 group in the given file to open. If None, the root group will be opened. If the DMR++ file does not have groups, this parameter is ignored.

None
access str

The access method to use. One of "direct" or "indirect". Use direct when running on AWS, use indirect when running on a local machine.

'indirect'
preprocess callable | None

A function to apply to each virtual dataset before combining

None
parallel Literal['dask', 'lithops', False]

Open the virtual datasets in parallel (using dask.delayed or lithops)

'dask'
load bool

If load is True, earthaccess will serialize the virtual references in order to use lazy indexing on the resulting xarray virtual ds.

True
reference_dir str | None

Directory to store kerchunk references. If None, a temporary directory will be created and deleted after use.

None
reference_format Literal['json', 'parquet']

When load is True, earthaccess will serialize the references using this format, json (default) or parquet.

'json'
xr_combine_nested_kwargs Any

Xarray arguments describing how to concatenate the datasets. Keyword arguments for xarray.combine_nested. See https://docs.xarray.dev/en/stable/generated/xarray.combine_nested.html

{}

Returns:

Type Description
Dataset

Concatenated xarray.Dataset

Examples:

>>> results = earthaccess.search_data(count=5, temporal=("2024"), short_name="MUR-JPL-L4-GLOB-v4.1")
>>> vds = earthaccess.open_virtual_mfdataset(results, access="indirect", load=False, concat_dim="time", coords="minimal", compat="override", combine_attrs="drop_conflicts")
>>> vds
<xarray.Dataset> Size: 29GB
Dimensions:           (time: 5, lat: 17999, lon: 36000)
Coordinates:
    time              (time) int32 20B ManifestArray<shape=(5,), dtype=int32,...
    lat               (lat) float32 72kB ManifestArray<shape=(17999,), dtype=...
    lon               (lon) float32 144kB ManifestArray<shape=(36000,), dtype...
Data variables:
    mask              (time, lat, lon) int8 3GB ManifestArray<shape=(5, 17999...
    sea_ice_fraction  (time, lat, lon) int8 3GB ManifestArray<shape=(5, 17999...
    dt_1km_data       (time, lat, lon) int8 3GB ManifestArray<shape=(5, 17999...
    analysed_sst      (time, lat, lon) int16 6GB ManifestArray<shape=(5, 1799...
    analysis_error    (time, lat, lon) int16 6GB ManifestArray<shape=(5, 1799...
    sst_anomaly       (time, lat, lon) int16 6GB ManifestArray<shape=(5, 1799...
Attributes: (12/42)
    Conventions:                CF-1.7
    title:                      Daily MUR SST, Final product

>>> vds.virtualize.to_kerchunk("mur_combined.json", format="json")
>>> vds = open_virtual_mfdataset(results, access="indirect", concat_dim="time", coords='minimal', compat='override', combine_attrs="drop_conflicts")
>>> vds
<xarray.Dataset> Size: 143GB
Dimensions:           (time: 5, lat: 17999, lon: 36000)
Coordinates:
* lat               (lat) float32 72kB -89.99 -89.98 -89.97 ... 89.98 89.99
* lon               (lon) float32 144kB -180.0 -180.0 -180.0 ... 180.0 180.0
* time              (time) datetime64[ns] 40B 2024-01-01T09:00:00 ... 2024-...
Data variables:
    analysed_sst      (time, lat, lon) float64 26GB dask.array<chunksize=(1, 3600, 7200), meta=np.ndarray>
    analysis_error    (time, lat, lon) float64 26GB dask.array<chunksize=(1, 3600, 7200), meta=np.ndarray>
    dt_1km_data       (time, lat, lon) timedelta64[ns] 26GB dask.array<chunksize=(1, 4500, 9000), meta=np.ndarray>
    mask              (time, lat, lon) float32 13GB dask.array<chunksize=(1, 4500, 9000), meta=np.ndarray>
    sea_ice_fraction  (time, lat, lon) float64 26GB dask.array<chunksize=(1, 4500, 9000), meta=np.ndarray>
    sst_anomaly       (time, lat, lon) float64 26GB dask.array<chunksize=(1, 3600, 7200), meta=np.ndarray>
Attributes: (12/42)
    Conventions:                CF-1.7
    title:                      Daily MUR SST, Final product