3.6 Climate Model Data with Intake-ESM

Note

The code for downloading climate model data in this section was adapted from this Pangeo tutorial and the Project Pythia CMIP6 Cookbook. Thanks to Brian Rose and Pascal Bourgault for suggesting the addition of this section.

In addition to the ESGF archive and direct access to the Google Cloud CMIP data archive with xr.open_zarr, the Python package intake-esm can be useful for accessing climate model output. intake-esm works by reading what’s called an ESM Collection Specification (also see this page), which describes a database of climate model data. One such database, which is maintained for the Pangeo project, is hosted on Google Cloud Services. We’ll work through how to access and search the data catalog and load the data to your local machine.
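To make the idea of a collection specification more concrete, here is a minimal sketch of what such a JSON document can look like and how its main fields can be read. The field names follow the ESM Collection Specification, but the values (including the `id`, `description`, and `catalog_file` entries) are hand-written placeholders for illustration and are not copied from the real Pangeo file.

```python
import json

# Abridged, hand-written example of an ESM Collection Specification.
# Field names follow the specification; all values are illustrative only.
spec_text = """
{
  "esmcat_version": "0.1.0",
  "id": "example-cmip6",
  "description": "Hypothetical catalog used only for illustration",
  "catalog_file": "example-catalog.csv",
  "attributes": [
    {"column_name": "source_id"},
    {"column_name": "experiment_id"},
    {"column_name": "member_id"},
    {"column_name": "variable_id"}
  ],
  "assets": {"column_name": "zstore", "format": "zarr"}
}
"""

spec = json.loads(spec_text)

# Columns of the catalog table that can be used as search facets
search_columns = [attr["column_name"] for attr in spec["attributes"]]

# Column holding the location of each data asset, and the asset format
asset_column = spec["assets"]["column_name"]
asset_format = spec["assets"]["format"]

print(search_columns)
print(asset_column, asset_format)
```

The specification itself holds no climate data. It points to a table (`catalog_file`) with one row per data asset, and tells the reader which columns describe each asset and which column says where the asset is stored.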

As usual, we’ll import the required packages.

import numpy as np
import xarray as xr
import gcsfs
import intake
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xclim.ensembles as xce

# URL for the Google Cloud CMIP6 ESM Collection Spec
url_google = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"

First let’s take a look at the data catalog for the Google CMIP6 archive. This shows a summary of all the CMIP6 data available from this database.

catalog = intake.open_esm_datastore(url_google)
catalog

pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 700
grid_label 10
zstore 514818
dcpp_init_year 60
version 736
derived_variable_id 0

Now let’s search for a particular output variable from a certain model. The interface is similar to what we saw in Section 3.5, but we don’t need to use a long string to query a dataframe. intake-esm has a function for that. To switch things up, let’s now look for daily precipitation output for the historical and SSP3-7.0 simulations from the model IPSL-CM6A-LR.

search_result = catalog.search(source_id="IPSL-CM6A-LR",
                               experiment_id=['historical', 'ssp370'],
                               table_id='day',
                               variable_id='pr')
# we can convert the search results to a pandas dataframe and print out the results
search_result.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP IPSL IPSL-CM6A-LR historical r8i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
1 CMIP IPSL IPSL-CM6A-LR historical r2i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
2 CMIP IPSL IPSL-CM6A-LR historical r7i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
3 CMIP IPSL IPSL-CM6A-LR historical r31i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
4 CMIP IPSL IPSL-CM6A-LR historical r5i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
5 CMIP IPSL IPSL-CM6A-LR historical r26i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
6 CMIP IPSL IPSL-CM6A-LR historical r29i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
7 CMIP IPSL IPSL-CM6A-LR historical r6i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
8 CMIP IPSL IPSL-CM6A-LR historical r25i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
9 CMIP IPSL IPSL-CM6A-LR historical r20i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
10 CMIP IPSL IPSL-CM6A-LR historical r24i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
11 CMIP IPSL IPSL-CM6A-LR historical r28i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
12 CMIP IPSL IPSL-CM6A-LR historical r23i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
13 CMIP IPSL IPSL-CM6A-LR historical r21i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
14 CMIP IPSL IPSL-CM6A-LR historical r30i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
15 CMIP IPSL IPSL-CM6A-LR historical r22i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
16 CMIP IPSL IPSL-CM6A-LR historical r18i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
17 CMIP IPSL IPSL-CM6A-LR historical r17i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
18 CMIP IPSL IPSL-CM6A-LR historical r11i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
19 CMIP IPSL IPSL-CM6A-LR historical r15i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
20 CMIP IPSL IPSL-CM6A-LR historical r16i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
21 CMIP IPSL IPSL-CM6A-LR historical r19i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
22 CMIP IPSL IPSL-CM6A-LR historical r1i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
23 CMIP IPSL IPSL-CM6A-LR historical r4i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
24 CMIP IPSL IPSL-CM6A-LR historical r27i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
25 CMIP IPSL IPSL-CM6A-LR historical r3i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
26 CMIP IPSL IPSL-CM6A-LR historical r9i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
27 CMIP IPSL IPSL-CM6A-LR historical r14i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
28 CMIP IPSL IPSL-CM6A-LR historical r10i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
29 CMIP IPSL IPSL-CM6A-LR historical r13i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
30 CMIP IPSL IPSL-CM6A-LR historical r12i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
31 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r2i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
32 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r1i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
33 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r10i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
34 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r3i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
35 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r4i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
36 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r5i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
37 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r6i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
38 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r7i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
39 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r9i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
40 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r8i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
41 CMIP IPSL IPSL-CM6A-LR historical r32i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20190802
42 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r14i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20191122
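The member_id values in the table above are CMIP6 variant labels, where the four integers index the realization (r), initialization method (i), physics version (p), and forcing (f). The rows are not listed in numerical order, so a small helper for parsing and sorting the labels can be handy. This is a sketch that assumes the plain `r<N>i<N>p<N>f<N>` form seen here; the helper name `parse_member_id` is made up for this example, and labels with extra prefixes (as used by some other CMIP6 activities) would need a more general pattern.

```python
import re

# Pattern for plain CMIP6 variant labels such as "r10i1p1f1"
VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_member_id(member_id):
    """Return (realization, initialization, physics, forcing) as integers."""
    match = VARIANT_PATTERN.match(member_id)
    if match is None:
        raise ValueError(f"not a plain CMIP6 variant label: {member_id!r}")
    return tuple(int(group) for group in match.groups())


# A few of the member_id values that appear in the table above
members = ["r8i1p1f1", "r2i1p1f1", "r31i1p1f1", "r1i1p1f1", "r10i1p1f1"]

# Sort by the integer indices rather than alphabetically,
# so that r10 comes after r8 instead of after r1
sorted_members = sorted(members, key=parse_member_id)
print(sorted_members)
```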

In principle, intake-esm can load all of these datasets in one function call using search_result.to_dataset_dict(zarr_kwargs={'consolidated': True}). We’ll do this, but to save time we’ll reduce the size of the data request by selecting only two ensemble members first.

# search again, specifying the ensemble members we want. Then print the dataframe again.
search_result_small = search_result.search(member_id = ['r1i1p1f1', 'r2i1p1f1'])
search_result_small.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP IPSL IPSL-CM6A-LR historical r2i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
1 CMIP IPSL IPSL-CM6A-LR historical r1i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
2 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r2i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
3 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r1i1p1f1 day pr gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190119
# now load the data into a dictionary and print the keys
ds_dict = search_result_small.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(ds_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [2/2 00:07<00:00]
['ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp370.day.gr',
 'CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr']
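The key template printed above suggests how the four catalog rows end up as two dictionary entries: rows that share the same values for the six key columns are grouped together. The following is a rough, standard-library sketch of that grouping idea, using the rows from search_result_small.df with the zstore paths left out. It is meant to illustrate the bookkeeping only and is not intake-esm's actual implementation; `make_row` and `members_by_key` are names invented for this example.

```python
from collections import defaultdict

# Columns that make up the dataset key, in the order printed by intake-esm
KEY_COLUMNS = ["activity_id", "institution_id", "source_id",
               "experiment_id", "table_id", "grid_label"]


def make_row(activity_id, experiment_id, member_id):
    """Build one catalog row for IPSL-CM6A-LR daily precipitation."""
    return {
        "activity_id": activity_id,
        "institution_id": "IPSL",
        "source_id": "IPSL-CM6A-LR",
        "experiment_id": experiment_id,
        "member_id": member_id,
        "table_id": "day",
        "variable_id": "pr",
        "grid_label": "gr",
    }


# The four rows of the reduced search result
rows = [
    make_row("CMIP", "historical", "r2i1p1f1"),
    make_row("CMIP", "historical", "r1i1p1f1"),
    make_row("ScenarioMIP", "ssp370", "r2i1p1f1"),
    make_row("ScenarioMIP", "ssp370", "r1i1p1f1"),
]

# Group the ensemble members by the dot-separated key
members_by_key = defaultdict(list)
for row in rows:
    key = ".".join(row[column] for column in KEY_COLUMNS)
    members_by_key[key].append(row["member_id"])

for key, members in members_by_key.items():
    print(key, sorted(members))
```

Because member_id is not one of the key columns, rows that differ only in their ensemble member fall under the same key.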

intake-esm organized the datasets into a dictionary with two entries: one for the SSP3-7.0 scenario and one for the historical-forcing simulation. Did it put both realizations into a single dataset for each period? Let’s see:

ds_dict['CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr']
<xarray.Dataset> Size: 10GB
Dimensions:         (member_id: 2, dcpp_init_year: 1, time: 60265, lat: 143,
                     lon: 144, axis_nbounds: 2)
Coordinates:
  * lat             (lat) float32 572B -90.0 -88.73 -87.46 ... 87.46 88.73 90.0
  * lon             (lon) float32 576B 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
  * time            (time) datetime64[ns] 482kB 1850-01-01T12:00:00 ... 2014-...
    time_bounds     (time, axis_nbounds) datetime64[ns] 964kB dask.array<chunksize=(30133, 1), meta=np.ndarray>
  * member_id       (member_id) object 16B 'r1i1p1f1' 'r2i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
Dimensions without coordinates: axis_nbounds
Data variables:
    pr              (member_id, dcpp_init_year, time, lat, lon) float32 10GB dask.array<chunksize=(1, 1, 843, 143, 144), meta=np.ndarray>
Attributes: (12/53)
    CMIP6_CV_version:                 cv=6.2.3.5-2-g63b123e
    Conventions:                      CF-1.7 CMIP-6.2
    EXPID:                            historical
    NCO:                              "4.6.0"
    activity_id:                      CMIP
    branch_method:                    standard
    ...                               ...
    intake_esm_attrs:variable_id:     pr
    intake_esm_attrs:grid_label:      gr
    intake_esm_attrs:version:         20180803
    intake_esm_attrs:_data_format_:   zarr
    variant_info:                     Restart from another point in piControl...
    intake_esm_dataset_key:           CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr

Yes it did, awesome! The dimension member_id in the dataset represents the different ensemble members. You can see how using intake-esm to load data is extremely convenient.
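With the ensemble members stacked along member_id, picking out one member or averaging across members becomes a one-line operation. The sketch below shows this on a tiny synthetic dataset (`ds_demo`, invented here) that only mimics the dimension names of the real one, so that it runs without downloading anything. It also drops the length-one dcpp_init_year dimension, which carries no information for these experiments.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic stand-in with the same dimension names as the dataset above
time = pd.date_range("1850-01-01 12:00", periods=3, freq="D")
values = np.arange(2 * 1 * 3 * 2 * 2, dtype="float32").reshape(2, 1, 3, 2, 2)
ds_demo = xr.Dataset(
    {"pr": (("member_id", "dcpp_init_year", "time", "lat", "lon"), values)},
    coords={
        "member_id": ["r1i1p1f1", "r2i1p1f1"],
        "dcpp_init_year": [np.nan],
        "time": time,
        "lat": [-45.0, 45.0],
        "lon": [0.0, 180.0],
    },
)

# Drop the length-1 dcpp_init_year dimension and its coordinate
ds_squeezed = ds_demo.squeeze("dcpp_init_year", drop=True)

# Select one ensemble member by its label
pr_r1 = ds_squeezed["pr"].sel(member_id="r1i1p1f1")

# Average over the ensemble members
pr_ens_mean = ds_squeezed["pr"].mean("member_id")

print(pr_r1.dims)
print(pr_ens_mean.dims)
```

The same calls should apply to the real dataset, where they stay lazy until the values are actually requested, since pr is stored as a dask array.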

Some post-processing will be necessary to turn the data from a dictionary of different xr.Datasets to a single xr.Dataset. For example, we may wish to concatenate the historical and SSP3-7.0 output into a single time series. This is fairly trivial, but we’ll demonstrate how to do it anyway:

# extract each of the two datasets from the dictionary
ds_historical = ds_dict['CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr']
ds_ssp370 = ds_dict['ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp370.day.gr']

# concatenate in time
ds_full_record = xr.concat([ds_historical, ds_ssp370], dim = 'time')
ds_full_record 
<xarray.Dataset> Size: 15GB
Dimensions:         (member_id: 2, dcpp_init_year: 1, time: 91676, lat: 143,
                     lon: 144, axis_nbounds: 2)
Coordinates:
  * lat             (lat) float32 572B -90.0 -88.73 -87.46 ... 87.46 88.73 90.0
  * lon             (lon) float32 576B 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
  * time            (time) datetime64[ns] 733kB 1850-01-01T12:00:00 ... 2100-...
    time_bounds     (time, axis_nbounds) datetime64[ns] 1MB dask.array<chunksize=(30133, 1), meta=np.ndarray>
  * member_id       (member_id) object 16B 'r1i1p1f1' 'r2i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
Dimensions without coordinates: axis_nbounds
Data variables:
    pr              (member_id, dcpp_init_year, time, lat, lon) float32 15GB dask.array<chunksize=(1, 1, 843, 143, 144), meta=np.ndarray>
Attributes: (12/53)
    CMIP6_CV_version:                 cv=6.2.3.5-2-g63b123e
    Conventions:                      CF-1.7 CMIP-6.2
    EXPID:                            historical
    NCO:                              "4.6.0"
    activity_id:                      CMIP
    branch_method:                    standard
    ...                               ...
    intake_esm_attrs:variable_id:     pr
    intake_esm_attrs:grid_label:      gr
    intake_esm_attrs:version:         20180803
    intake_esm_attrs:_data_format_:   zarr
    variant_info:                     Restart from another point in piControl...
    intake_esm_dataset_key:           CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr
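Two things are worth checking after a concatenation like this. First, the time axis should be continuous: here the length is 91676, the sum of 60265 historical days and 31411 scenario days. Second, as the printout shows, the global attributes (for example EXPID: historical) are those of the first dataset, even though the record now extends to 2100. The sketch below reproduces both points on two tiny synthetic datasets built by a made-up helper, `make_demo`; it passes combine_attrs="override" explicitly to make the attribute behaviour visible.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_demo(start, experiment_id):
    """Build a two-day synthetic dataset standing in for one experiment."""
    time = pd.date_range(start, periods=2, freq="D")
    return xr.Dataset(
        {"pr": ("time", np.zeros(2, dtype="float32"))},
        coords={"time": time},
        attrs={"experiment_id": experiment_id},
    )


demo_historical = make_demo("2014-12-30 12:00", "historical")
demo_ssp370 = make_demo("2015-01-01 12:00", "ssp370")

# combine_attrs="override" keeps the attributes of the first dataset only
demo_full = xr.concat([demo_historical, demo_ssp370], dim="time",
                      combine_attrs="override")

# The combined time axis should be strictly increasing with no duplicates
time_index = demo_full.indexes["time"]
print(time_index.is_monotonic_increasing, time_index.is_unique)

# The attributes still describe the historical experiment
print(demo_full.attrs["experiment_id"])

# Record that the combined dataset now spans both experiments
demo_full.attrs["experiment_id"] = "historical+ssp370"
```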

In order to handle data from multiple models, we can use xclim.ensembles to combine datasets for separate models into a single xr.Dataset.

# search for data from two models, one ensemble member.
search_result_multimodel = catalog.search(source_id=["IPSL-CM6A-LR", "CESM2"],
                                          experiment_id='historical',
                                          member_id = 'r1i1p1f1',
                                          table_id='day', 
                                          variable_id='pr')
# print the results
search_result_multimodel.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP IPSL IPSL-CM6A-LR historical r1i1p1f1 day pr gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
1 CMIP NCAR CESM2 historical r1i1p1f1 day pr gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r1... NaN 20190401
# access the data
ds_dict_multimodel = search_result_multimodel.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(ds_dict_multimodel.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [2/2 00:00<00:00]
['CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr',
 'CMIP.NCAR.CESM2.historical.day.gn']
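# put both models into a single dataset. The model names are taken from the
# dictionary keys (third field) so that each label matches its dataset
keys = list(ds_dict_multimodel.keys())
ds_multimodel = xce.create_ensemble([ds_dict_multimodel[k] for k in keys],
                                    realizations=[k.split('.')[2] for k in keys],
                                    calendar='noleap')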
ds_multimodel
<xarray.Dataset> Size: 92GB
Dimensions:         (realization: 2, member_id: 1, dcpp_init_year: 1,
                     time: 120451, lat: 333, lon: 288, axis_nbounds: 2, nbnd: 2)
Coordinates:
  * lat             (lat) float64 3kB -90.0 -89.06 -88.73 ... 88.73 89.06 90.0
  * lon             (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * time            (time) object 964kB 1850-01-01 00:00:00 ... 2015-01-01 00...
  * member_id       (member_id) object 8B 'r1i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
    time_bounds     (time, axis_nbounds) datetime64[ns] 2MB dask.array<chunksize=(60227, 1), meta=np.ndarray>
    lat_bnds        (lat, nbnd) float64 5kB dask.array<chunksize=(333, 2), meta=np.ndarray>
    lon_bnds        (lon, nbnd) float64 5kB dask.array<chunksize=(288, 2), meta=np.ndarray>
    time_bnds       (time, nbnd) object 2MB dask.array<chunksize=(60226, 1), meta=np.ndarray>
  * realization     (realization) <U12 96B 'IPSL-CM6A-LR' 'CESM2'
Dimensions without coordinates: axis_nbounds, nbnd
Data variables:
    pr              (realization, member_id, dcpp_init_year, time, lat, lon) float32 92GB dask.array<chunksize=(1, 1, 1, 1100, 333, 288), meta=np.ndarray>
Attributes: (12/67)
    CMIP6_CV_version:                 cv=6.2.3.5-2-g63b123e
    Conventions:                      CF-1.7 CMIP-6.2
    EXPID:                            historical
    NCO:                              "4.6.0"
    activity_id:                      CMIP
    branch_method:                    standard
    ...                               ...
    intake_esm_attrs:variable_id:     pr
    intake_esm_attrs:grid_label:      gr
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR...
    intake_esm_attrs:version:         20180803
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr
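The dimension sizes in this printout deserve a second look. The combined dataset has lat: 333 and time: 120451, while the IPSL-CM6A-LR historical dataset shown earlier has lat: 143 and time: 60265. One plausible reading is that the two models sit on different grids and use different time stamps, and that the coordinates were combined as a union (an outer join) rather than matched up. The sketch below shows that behaviour with xr.concat on two synthetic arrays (`coarse` and `fine`, invented for this example). It assumes the ensemble is built with an outer-join concatenation, which is consistent with the sizes above but is not verified here.

```python
import numpy as np
import xarray as xr

# Two synthetic "models" on different latitude grids
coarse = xr.DataArray(np.ones(3), coords={"lat": [-90.0, 0.0, 90.0]},
                      dims="lat", name="pr")
fine = xr.DataArray(np.ones(4), coords={"lat": [-90.0, -45.0, 45.0, 90.0]},
                    dims="lat", name="pr")

# Outer join: the result holds the union of both latitude axes,
# padded with NaN wherever a model has no data at that latitude
combined = xr.concat([coarse, fine], dim="realization", join="outer")

print(combined.sizes["lat"])
print(int(combined.isnull().sum()))
```

If the real ensemble behaves the same way, each model's slice of pr is NaN at the latitudes and time stamps that belong only to the other model. In that case, putting the models on a common grid and a common time axis before combining them would be needed for a meaningful multi-model comparison.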