3.6 Climate Model Data with Intake-ESM
Note
The code for downloading climate model data in this section was adapted from this Pangeo tutorial and the Project Pythia CMIP6 Cookbook. Thanks to Brian Rose and Pascal Bourgault for suggesting the addition of this section.
In addition to the ESGF archive and direct access to the Google Cloud CMIP data archive with xr.open_zarr, the Python package intake-esm can be useful for accessing climate model output. intake-esm works by reading what’s called an ESM Collection Specification (also see this page), which describes a database of climate model data. One such database, maintained for the Pangeo project, is hosted on Google Cloud Storage. We’ll work through how to access and search the data catalog and load the data on your local machine.
As usual, we’ll import the required packages.
import numpy as np
import xarray as xr
import gcsfs
import intake
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xclim.ensembles as xce
# URL of the Google Cloud CMIP6 ESM Collection Spec
url_google = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
First let’s take a look at the data catalog for the Google CMIP6 archive. This shows a summary of all the CMIP6 data available from this database.
catalog = intake.open_esm_datastore(url_google)
catalog
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):
| | unique |
---|---|
activity_id | 18 |
institution_id | 36 |
source_id | 88 |
experiment_id | 170 |
member_id | 657 |
table_id | 37 |
variable_id | 700 |
grid_label | 10 |
zstore | 514818 |
dcpp_init_year | 60 |
version | 736 |
derived_variable_id | 0 |
Now let’s search for a particular output variable from a certain model. The interface is similar to what we saw in Section 3.5, but we don’t need to use a long string to query a dataframe; intake-esm has a search method for that. To switch things up, let’s now look for daily precipitation output for the historical and SSP3-7.0 simulations from the model IPSL-CM6A-LR.
search_result = catalog.search(source_id = "IPSL-CM6A-LR",
experiment_id=['historical', 'ssp370'],
table_id='day',
variable_id='pr')
# we can convert the search results to a pandas dataframe and print out the results
search_result.df
| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP | IPSL | IPSL-CM6A-LR | historical | r8i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
1 | CMIP | IPSL | IPSL-CM6A-LR | historical | r2i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
2 | CMIP | IPSL | IPSL-CM6A-LR | historical | r7i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
3 | CMIP | IPSL | IPSL-CM6A-LR | historical | r31i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
4 | CMIP | IPSL | IPSL-CM6A-LR | historical | r5i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
5 | CMIP | IPSL | IPSL-CM6A-LR | historical | r26i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
6 | CMIP | IPSL | IPSL-CM6A-LR | historical | r29i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
7 | CMIP | IPSL | IPSL-CM6A-LR | historical | r6i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
8 | CMIP | IPSL | IPSL-CM6A-LR | historical | r25i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
9 | CMIP | IPSL | IPSL-CM6A-LR | historical | r20i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
10 | CMIP | IPSL | IPSL-CM6A-LR | historical | r24i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
11 | CMIP | IPSL | IPSL-CM6A-LR | historical | r28i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
12 | CMIP | IPSL | IPSL-CM6A-LR | historical | r23i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
13 | CMIP | IPSL | IPSL-CM6A-LR | historical | r21i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
14 | CMIP | IPSL | IPSL-CM6A-LR | historical | r30i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
15 | CMIP | IPSL | IPSL-CM6A-LR | historical | r22i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
16 | CMIP | IPSL | IPSL-CM6A-LR | historical | r18i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
17 | CMIP | IPSL | IPSL-CM6A-LR | historical | r17i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
18 | CMIP | IPSL | IPSL-CM6A-LR | historical | r11i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
19 | CMIP | IPSL | IPSL-CM6A-LR | historical | r15i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
20 | CMIP | IPSL | IPSL-CM6A-LR | historical | r16i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
21 | CMIP | IPSL | IPSL-CM6A-LR | historical | r19i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
22 | CMIP | IPSL | IPSL-CM6A-LR | historical | r1i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
23 | CMIP | IPSL | IPSL-CM6A-LR | historical | r4i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
24 | CMIP | IPSL | IPSL-CM6A-LR | historical | r27i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
25 | CMIP | IPSL | IPSL-CM6A-LR | historical | r3i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
26 | CMIP | IPSL | IPSL-CM6A-LR | historical | r9i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
27 | CMIP | IPSL | IPSL-CM6A-LR | historical | r14i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
28 | CMIP | IPSL | IPSL-CM6A-LR | historical | r10i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
29 | CMIP | IPSL | IPSL-CM6A-LR | historical | r13i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
30 | CMIP | IPSL | IPSL-CM6A-LR | historical | r12i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
31 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r2i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
32 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r1i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
33 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r10i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
34 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r3i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
35 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r4i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
36 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r5i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
37 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r6i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
38 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r7i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
39 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r9i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
40 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r8i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
41 | CMIP | IPSL | IPSL-CM6A-LR | historical | r32i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20190802 |
42 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r14i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20191122 |
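The search above combines its keyword arguments in a particular way: a row must satisfy every keyword, and when a keyword is given a list, matching any item in the list is enough. The following is a minimal sketch of that logic on a toy table, using only the standard library. The `search` helper and the `toy_catalog` rows are hypothetical illustrations, not part of intake-esm, which also supports features (such as wildcards) that are not reproduced here.

```python
def search(rows, **query):
    """Return the rows that match every keyword in `query`.

    A scalar value must match exactly; a list/tuple/set value matches
    if the row's entry equals any one of its items.
    """
    def matches(row):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row[column] not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# A hypothetical four-row catalog that reuses the real column names.
toy_catalog = [
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "historical", "table_id": "day", "variable_id": "pr"},
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "ssp370", "table_id": "day", "variable_id": "pr"},
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "ssp585", "table_id": "day", "variable_id": "pr"},
    {"source_id": "CESM2", "experiment_id": "historical", "table_id": "Amon", "variable_id": "tas"},
]

hits = search(
    toy_catalog,
    source_id="IPSL-CM6A-LR",
    experiment_id=["historical", "ssp370"],
    table_id="day",
    variable_id="pr",
)
print([row["experiment_id"] for row in hits])
```

Only the first two toy rows survive: the third fails the experiment_id filter and the fourth fails the source_id filter.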
In principle, intake-esm can load all of these datasets in one function call using search_result.to_dataset_dict(zarr_kwargs={'consolidated': True}). We’ll do this, but to save time we’ll reduce the size of the data request by selecting only two ensemble members first.
# search again, specifying the ensemble members we want, then print the dataframe again
search_result_small = search_result.search(member_id = ['r1i1p1f1', 'r2i1p1f1'])
search_result_small.df
| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP | IPSL | IPSL-CM6A-LR | historical | r2i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
1 | CMIP | IPSL | IPSL-CM6A-LR | historical | r1i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
2 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r2i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
3 | ScenarioMIP | IPSL | IPSL-CM6A-LR | ssp370 | r1i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... | NaN | 20190119 |
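The member_id strings follow the CMIP6 variant-label convention, where the four numbers are the realization (r), initialization (i), physics (p), and forcing (f) indices. Here is a small sketch for splitting a label into those parts, which can be handy for sorting members numerically. The helper name `parse_member_id` is our own, and the pattern deliberately rejects labels with a sub-experiment prefix, which some CMIP6 activities use.

```python
import re

# Matches plain variant labels such as "r1i1p1f1".
VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_member_id(member_id):
    """Split a variant label into its four integer indices."""
    match = VARIANT_PATTERN.match(member_id)
    if match is None:
        raise ValueError(f"not a plain variant label: {member_id!r}")
    r, i, p, f = (int(group) for group in match.groups())
    return {"realization": r, "initialization": i, "physics": p, "forcing": f}


# A few of the member IDs from the search results, in catalog order.
members = ["r8i1p1f1", "r2i1p1f1", "r31i1p1f1", "r1i1p1f1", "r10i1p1f1"]

# Sort by realization number rather than alphabetically.
ordered = sorted(members, key=lambda m: parse_member_id(m)["realization"])
print(ordered)
```

A plain alphabetical sort would place r10i1p1f1 before r2i1p1f1, which is why the numeric key is used.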
# now load the data into a dictionary and print the keys
ds_dict = search_result_small.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(ds_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
['ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp370.day.gr',
'CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr']
intake-esm organized the datasets into a dictionary with two entries: one for the SSP3-7.0 scenario and one for the historical-forcing simulation. Did it put both realizations into a single dataset for each period? Let’s see:
ds_dict['CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr']
<xarray.Dataset> Size: 10GB
Dimensions:         (member_id: 2, dcpp_init_year: 1, time: 60265, lat: 143, lon: 144, axis_nbounds: 2)
Coordinates:
  * lat             (lat) float32 572B -90.0 -88.73 -87.46 ... 87.46 88.73 90.0
  * lon             (lon) float32 576B 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
  * time            (time) datetime64[ns] 482kB 1850-01-01T12:00:00 ... 2014-...
    time_bounds     (time, axis_nbounds) datetime64[ns] 964kB dask.array<chunksize=(30133, 1), meta=np.ndarray>
  * member_id       (member_id) object 16B 'r1i1p1f1' 'r2i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
Dimensions without coordinates: axis_nbounds
Data variables:
    pr              (member_id, dcpp_init_year, time, lat, lon) float32 10GB dask.array<chunksize=(1, 1, 843, 143, 144), meta=np.ndarray>
Attributes: (12/53)
    CMIP6_CV_version:               cv=6.2.3.5-2-g63b123e
    Conventions:                    CF-1.7 CMIP-6.2
    EXPID:                          historical
    NCO:                            "4.6.0"
    activity_id:                    CMIP
    branch_method:                  standard
    ...                             ...
    intake_esm_attrs:variable_id:   pr
    intake_esm_attrs:grid_label:    gr
    intake_esm_attrs:version:       20180803
    intake_esm_attrs:_data_format_: zarr
    variant_info:                   Restart from another point in piControl...
    intake_esm_dataset_key:         CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr
Yes, it did. Awesome! The dimension member_id in the dataset represents the different ensemble members. You can see how using intake-esm to load data is extremely convenient.
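To see why the four catalog rows ended up as two datasets, it helps to look at the key template printed when the data were loaded: it lists six attributes, and member_id is not one of them. Rows that share all six attributes therefore land under the same key, and their members are stacked along a new dimension. Below is a rough standard-library sketch of that grouping step only. The rows are copied from the search results above; the `group_members` helper is hypothetical and does not reproduce how intake-esm actually opens or merges the data.

```python
from collections import defaultdict

# The attributes named in the key template printed by to_dataset_dict.
KEY_COLUMNS = ["activity_id", "institution_id", "source_id",
               "experiment_id", "table_id", "grid_label"]

# The four rows of search_result_small.df (columns not needed here are omitted).
rows = [
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r2i1p1f1", "table_id": "day", "grid_label": "gr"},
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r1i1p1f1", "table_id": "day", "grid_label": "gr"},
    {"activity_id": "ScenarioMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "ssp370", "member_id": "r2i1p1f1", "table_id": "day", "grid_label": "gr"},
    {"activity_id": "ScenarioMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "ssp370", "member_id": "r1i1p1f1", "table_id": "day", "grid_label": "gr"},
]


def group_members(rows):
    """Map each dataset key to the sorted list of member IDs that share it."""
    groups = defaultdict(list)
    for row in rows:
        key = ".".join(row[column] for column in KEY_COLUMNS)
        groups[key].append(row["member_id"])
    return {key: sorted(members) for key, members in groups.items()}


for key, members in group_members(rows).items():
    print(key, members)
```

Each of the two keys collects both realizations, which matches the member_id dimension of size 2 in the dataset shown above.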
Some post-processing will be necessary to turn the data from a dictionary of different xr.Dataset objects into a single xr.Dataset. For example, we may wish to concatenate the historical and SSP3-7.0 output into a single time series. This is fairly trivial, but we’ll demonstrate how to do it anyway:
# extract each of the two datasets from the dictionary
ds_historical = ds_dict['CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr']
ds_ssp370 = ds_dict['ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp370.day.gr']
# concatenate in time
ds_full_record = xr.concat([ds_historical, ds_ssp370], dim = 'time')
ds_full_record
<xarray.Dataset> Size: 15GB
Dimensions:         (member_id: 2, dcpp_init_year: 1, time: 91676, lat: 143, lon: 144, axis_nbounds: 2)
Coordinates:
  * lat             (lat) float32 572B -90.0 -88.73 -87.46 ... 87.46 88.73 90.0
  * lon             (lon) float32 576B 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
  * time            (time) datetime64[ns] 733kB 1850-01-01T12:00:00 ... 2100-...
    time_bounds     (time, axis_nbounds) datetime64[ns] 1MB dask.array<chunksize=(30133, 1), meta=np.ndarray>
  * member_id       (member_id) object 16B 'r1i1p1f1' 'r2i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
Dimensions without coordinates: axis_nbounds
Data variables:
    pr              (member_id, dcpp_init_year, time, lat, lon) float32 15GB dask.array<chunksize=(1, 1, 843, 143, 144), meta=np.ndarray>
Attributes: (12/53)
    CMIP6_CV_version:               cv=6.2.3.5-2-g63b123e
    Conventions:                    CF-1.7 CMIP-6.2
    EXPID:                          historical
    NCO:                            "4.6.0"
    activity_id:                    CMIP
    branch_method:                  standard
    ...                             ...
    intake_esm_attrs:variable_id:   pr
    intake_esm_attrs:grid_label:    gr
    intake_esm_attrs:version:       20180803
    intake_esm_attrs:_data_format_: zarr
    variant_info:                   Restart from another point in piControl...
    intake_esm_dataset_key:         CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr
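A quick way to check that the concatenation produced a gap-free daily record is to compare the length of the time dimension with the number of calendar days in each period. The sketch below uses only the standard library. It assumes a standard Gregorian calendar (consistent with the datetime64 time coordinate), a historical period of 1850 through 2014, and an SSP3-7.0 period of 2015 through 2100 (the end date is truncated in the output above, so this is an assumption). The helper name `n_days` is our own.

```python
from datetime import date


def n_days(first_year, last_year):
    """Count calendar days from 1 Jan of first_year through 31 Dec of last_year."""
    return (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days


n_historical = n_days(1850, 2014)   # historical simulation
n_ssp370 = n_days(2015, 2100)       # SSP3-7.0 projection (assumed to end in 2100)

print(n_historical, n_ssp370, n_historical + n_ssp370)
# prints: 60265 31411 91676
```

The first number matches the time dimension of the historical dataset (60265), and the total matches the concatenated dataset (91676), so the two records join without missing or duplicated days.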
To handle data from multiple models, we can use xclim.ensembles to combine datasets for separate models into a single xr.Dataset.
# search for data from two models, one ensemble member.
search_result_multimodel = catalog.search(source_id = ["IPSL-CM6A-LR" ,'CESM2'],
experiment_id='historical',
member_id = 'r1i1p1f1',
table_id='day',
variable_id='pr')
# print the results
search_result_multimodel.df
| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP | IPSL | IPSL-CM6A-LR | historical | r1i1p1f1 | day | pr | gr | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
1 | CMIP | NCAR | CESM2 | historical | r1i1p1f1 | day | pr | gn | gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r1... | NaN | 20190401 |
# access the data
ds_dict_multimodel = search_result_multimodel.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(ds_dict_multimodel.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
['CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr',
'CMIP.NCAR.CESM2.historical.day.gn']
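# put both models into a single dataset; the realization labels are taken from the
# dictionary keys (third field = source_id) so they stay matched to the datasets
# even if the dictionary order differs from the order of the search
model_keys = list(ds_dict_multimodel.keys())
ds_multimodel = xce.create_ensemble([ds_dict_multimodel[k] for k in model_keys],
realizations = [k.split('.')[2] for k in model_keys],
calendar = 'noleap')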
ds_multimodel
<xarray.Dataset> Size: 92GB
Dimensions:         (realization: 2, member_id: 1, dcpp_init_year: 1, time: 120451, lat: 333, lon: 288, axis_nbounds: 2, nbnd: 2)
Coordinates:
  * lat             (lat) float64 3kB -90.0 -89.06 -88.73 ... 88.73 89.06 90.0
  * lon             (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * time            (time) object 964kB 1850-01-01 00:00:00 ... 2015-01-01 00...
  * member_id       (member_id) object 8B 'r1i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
    time_bounds     (time, axis_nbounds) datetime64[ns] 2MB dask.array<chunksize=(60227, 1), meta=np.ndarray>
    lat_bnds        (lat, nbnd) float64 5kB dask.array<chunksize=(333, 2), meta=np.ndarray>
    lon_bnds        (lon, nbnd) float64 5kB dask.array<chunksize=(288, 2), meta=np.ndarray>
    time_bnds       (time, nbnd) object 2MB dask.array<chunksize=(60226, 1), meta=np.ndarray>
  * realization     (realization) <U12 96B 'IPSL-CM6A-LR' 'CESM2'
Dimensions without coordinates: axis_nbounds, nbnd
Data variables:
    pr              (realization, member_id, dcpp_init_year, time, lat, lon) float32 92GB dask.array<chunksize=(1, 1, 1, 1100, 333, 288), meta=np.ndarray>
Attributes: (12/67)
    CMIP6_CV_version:               cv=6.2.3.5-2-g63b123e
    Conventions:                    CF-1.7 CMIP-6.2
    EXPID:                          historical
    NCO:                            "4.6.0"
    activity_id:                    CMIP
    branch_method:                  standard
    ...                             ...
    intake_esm_attrs:variable_id:   pr
    intake_esm_attrs:grid_label:    gr
    intake_esm_attrs:zstore:        gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR...
    intake_esm_attrs:version:       20180803
    intake_esm_attrs:_data_format_: zarr
    intake_esm_dataset_key:         CMIP.IPSL.IPSL-CM6A-LR.historical.day.gr
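It is worth pausing on the dimension sizes in this combined dataset: 333 latitudes and 120451 time steps are more than either model provides on its own. These numbers are consistent with the two models being aligned on the union of their coordinate values, so that each model is filled with missing values wherever only the other model has a grid point or time stamp. The arithmetic can be checked with the standard-library sketch below. It rests on several assumptions that are inferred rather than confirmed by the output: idealized, evenly spaced grids of 143 x 144 points (IPSL-CM6A-LR) and 192 x 288 points (CESM2), IPSL time stamps at 12:00 on each of the 165 x 365 no-leap days, and CESM2 time stamps at 00:00 running one step longer, through 2015-01-01.

```python
from fractions import Fraction


def regular_lat(n_points):
    """Evenly spaced latitudes from -90 to 90 inclusive, as exact fractions."""
    step = Fraction(180, n_points - 1)
    return {Fraction(-90) + i * step for i in range(n_points)}


def regular_lon(n_points):
    """Evenly spaced longitudes starting at 0 and covering 360 degrees."""
    step = Fraction(360, n_points)
    return {i * step for i in range(n_points)}


# Assumed idealized grids for the two models.
union_lat = regular_lat(143) | regular_lat(192)
union_lon = regular_lon(144) | regular_lon(288)

# Assumed time stamps as (day index, hour) pairs on a no-leap calendar.
n_noleap_days = 165 * 365                                      # 1850 through 2014
ipsl_time = {(day, 12) for day in range(n_noleap_days)}        # noon stamps
cesm2_time = {(day, 0) for day in range(n_noleap_days + 1)}    # midnight stamps
union_time = ipsl_time | cesm2_time

print(len(union_lat), len(union_lon), len(union_time))
# prints: 333 288 120451
```

Under these assumptions the two latitude grids share only the poles, the coarser longitude grid is a subset of the finer one, and no time stamps coincide. If a common grid and time axis are needed for analysis, one option is to regrid the models to a shared grid and harmonize the time stamps before combining them.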