6.2 Edmonton AC Loads: Multi-Model Analysis#

This notebook continues the workflow example about AC loads in the city of Edmonton, using a single ECCC station site as the spatial domain. Here we expand that example by including multiple CMIP6 models in the analysis. This allows us to characterize the uncertainty in our future projections due to uncertainty in modeling choices. We’ll use one ensemble member each from the models listed below, chosen because they have data available on Google Cloud for the ensemble member r4i1p1f1 (chosen because this member_id was used for many ssp370 runs) and both the historical and ssp370 scenarios. From each modeling centre, only one model is chosen (e.g. we don’t include both EC-Earth3 and EC-Earth3-AerChem from the EC-Earth-Consortium institution).

Models:

  • AWI-CM-1-1-MR

  • CESM2

  • CanESM5

  • EC-Earth3

  • FGOALS-g3

  • INM-CM5-0

  • IPSL-CM6A-LR

  • MPI-ESM1-2-LR

  • MRI-ESM2-0

The guided survey on the main UTCDW website allows users to clone a survey, in order to change your response to one or more questions. To generate the flowchart below, I cloned the survey that generated the flowchart shown in 6.1, and changed the answer to the model data question from “CESM2” to “CMIP6 Multi-Model Ensemble”, along with the title.

../_images/survey_station_multimodel.png

6.2.1 Multi-Model Analysis#

In the hidden code cells below, we’ll re-access the same station data as we did in the previous section. Then we’ll search the data catalog to get the download URLs for these model simulations, access the data, sub-select the data to our desired location and time periods, and then combine it all together using the xclim.ensembles framework.

Hide code cell content
import xarray as xr
from xclim import sdba
from xclim.core.calendar import convert_calendar
import xclim.indicators as xci
import xclim.ensembles as xce
import numpy as np
import pandas as pd
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import ec3
import gcsfs
import zarr
import os

# lat and lon coordinates for Edmonton
lat_edm = 53.5
lon_edm = -113.5

# time periods for historical and future periods
years_hist = range(1980, 2011) # remember that range(start, end) is not inclusive of `end`
years_future = range(2070, 2101)

# url for the CSV file that contains the data catalog
url_gcsfs_catalog = 'https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv'
Hide code cell content
# get the same station data as from 6.1. 
# The data file will only exist if you've already run
# the code for 6.1
stn_ds = xr.open_dataset('data_files/station_data_edmonton.nc')
stn_ds_noleap = convert_calendar(stn_ds, 'noleap')
tas_obs_noleap = stn_ds_noleap.tas
stn_lat = float(stn_ds.lat.values)
stn_lon = float(stn_ds.lon.values)
Hide code cell content
# open the Google Cloud model data catalog with pandas
df_catalog = pd.read_csv(url_gcsfs_catalog)

# search for models which have daily tas data from the historical and SSP3-3.0 scenarios
search_string_mm = "table_id == 'day' & variable_id == 'tas' & member_id == 'r4i1p1f1'" 
# continue on the next line
search_string_mm += " & experiment_id == ['historical', 'ssp370']"
df_search_mm = df_catalog.query(search_string_mm).sort_values('source_id')
df_search_mm
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
447474 CMIP CSIRO ACCESS-ESM1-5 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... NaN 20200529
45526 CMIP AWI AWI-CM-1-1-MR historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/histor... NaN 20181218
203937 ScenarioMIP AWI AWI-CM-1-1-MR ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/AWI/AWI-CM-1-1-MR... NaN 20190529
61604 CMIP NCAR CESM2 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r4... NaN 20190308
446493 ScenarioMIP NCAR CESM2 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/NCAR/CESM2/ssp370... NaN 20200528
91510 CMIP CCCma CanESM5 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical... NaN 20190429
108616 ScenarioMIP CCCma CanESM5 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp... NaN 20190429
439125 CMIP EC-Earth-Consortium EC-Earth3 historical r4i1p1f1 day tas gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20200425
438347 ScenarioMIP EC-Earth-Consortium EC-Earth3 ssp370 r4i1p1f1 day tas gr gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... NaN 20200425
501363 CMIP EC-Earth-Consortium EC-Earth3-AerChem historical r4i1p1f1 day tas gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20201214
513242 ScenarioMIP EC-Earth-Consortium EC-Earth3-AerChem ssp370 r4i1p1f1 day tas gr gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... NaN 20210120
439284 CMIP EC-Earth-Consortium EC-Earth3-Veg historical r4i1p1f1 day tas gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20200425
438424 ScenarioMIP EC-Earth-Consortium EC-Earth3-Veg ssp370 r4i1p1f1 day tas gr gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... NaN 20200425
395178 ScenarioMIP CAS FGOALS-g3 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/CAS/FGOALS-g3/ssp... NaN 20191212
376691 CMIP CAS FGOALS-g3 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/CAS/FGOALS-g3/historical... NaN 20191030
215875 CMIP INM INM-CM5-0 historical r4i1p1f1 day tas gr1 gs://cmip6/CMIP6/CMIP/INM/INM-CM5-0/historical... NaN 20190704
244040 ScenarioMIP INM INM-CM5-0 ssp370 r4i1p1f1 day tas gr1 gs://cmip6/CMIP6/ScenarioMIP/INM/INM-CM5-0/ssp... NaN 20190723
208258 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r4i1p1f1 day tas gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190614
206858 CMIP IPSL IPSL-CM6A-LR historical r4i1p1f1 day tas gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20190614
363884 CMIP MIROC MIROC6 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/MIROC/MIROC6/historical/... NaN 20191016
239961 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-H... NaN 20190710
238542 CMIP MPI-M MPI-ESM1-2-HR historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/hist... NaN 20190710
232957 CMIP MPI-M MPI-ESM1-2-LR historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/hist... NaN 20190710
224968 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/MPI-M/MPI-ESM1-2-... NaN 20190710
296317 ScenarioMIP MRI MRI-ESM2-0 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/MRI/MRI-ESM2-0/ss... NaN 20190927
205666 CMIP MRI MRI-ESM2-0 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/MRI/MRI-ESM2-0/historica... NaN 20190603
249084 CMIP NUIST NESM3 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/NUIST/NESM3/historical/r... NaN 20190812
462633 CMIP NCC NorCPM1 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/NCC/NorCPM1/historical/r... NaN 20200724
Hide code cell content
# refine the list to include only the chosen models
models = ['AWI-CM-1-1-MR', 'CESM2', 'CanESM5', 'EC-Earth3', 
          'FGOALS-g3','INM-CM5-0', 'IPSL-CM6A-LR', 
          'MPI-ESM1-2-LR', 'MRI-ESM2-0']
df_search_mm = df_search_mm.query(f"source_id == {models}")
df_search_mm
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
45526 CMIP AWI AWI-CM-1-1-MR historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/histor... NaN 20181218
203937 ScenarioMIP AWI AWI-CM-1-1-MR ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/AWI/AWI-CM-1-1-MR... NaN 20190529
61604 CMIP NCAR CESM2 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r4... NaN 20190308
446493 ScenarioMIP NCAR CESM2 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/NCAR/CESM2/ssp370... NaN 20200528
91510 CMIP CCCma CanESM5 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical... NaN 20190429
108616 ScenarioMIP CCCma CanESM5 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp... NaN 20190429
439125 CMIP EC-Earth-Consortium EC-Earth3 historical r4i1p1f1 day tas gr gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20200425
438347 ScenarioMIP EC-Earth-Consortium EC-Earth3 ssp370 r4i1p1f1 day tas gr gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... NaN 20200425
395178 ScenarioMIP CAS FGOALS-g3 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/CAS/FGOALS-g3/ssp... NaN 20191212
376691 CMIP CAS FGOALS-g3 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/CAS/FGOALS-g3/historical... NaN 20191030
215875 CMIP INM INM-CM5-0 historical r4i1p1f1 day tas gr1 gs://cmip6/CMIP6/CMIP/INM/INM-CM5-0/historical... NaN 20190704
244040 ScenarioMIP INM INM-CM5-0 ssp370 r4i1p1f1 day tas gr1 gs://cmip6/CMIP6/ScenarioMIP/INM/INM-CM5-0/ssp... NaN 20190723
208258 ScenarioMIP IPSL IPSL-CM6A-LR ssp370 r4i1p1f1 day tas gr gs://cmip6/CMIP6/ScenarioMIP/IPSL/IPSL-CM6A-LR... NaN 20190614
206858 CMIP IPSL IPSL-CM6A-LR historical r4i1p1f1 day tas gr gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20190614
232957 CMIP MPI-M MPI-ESM1-2-LR historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/hist... NaN 20190710
224968 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/MPI-M/MPI-ESM1-2-... NaN 20190710
296317 ScenarioMIP MRI MRI-ESM2-0 ssp370 r4i1p1f1 day tas gn gs://cmip6/CMIP6/ScenarioMIP/MRI/MRI-ESM2-0/ss... NaN 20190927
205666 CMIP MRI MRI-ESM2-0 historical r4i1p1f1 day tas gn gs://cmip6/CMIP6/CMIP/MRI/MRI-ESM2-0/historica... NaN 20190603
Hide code cell content
# function for pre-processing data
def get_and_process_data(catalog_df, model, scenario, gcs, lat, lon, years):
    # get the ztore url for this model and scenario
    df_scen = catalog_df.query(f"source_id == '{model}' & experiment_id == '{scenario}'")
    zstore_url = df_scen.zstore.values[0]
    
    # get the GCS mapper from the url
    mapper = gcs.get_mapper(zstore_url)
    
    # open the file with xarray
    ds = xr.open_zarr(mapper, consolidated = True)
    
    # get the tas data, select the time period, and interp to the desired location
    tas_loc = ds.tas.sel(time = ds.time.dt.year.isin(years)).interp(lat = lat, lon = lon)
    
    # drop 'height' coordinate, which is always 2m but isn't present on all datasets
    if 'height' in tas_loc.coords.keys():
        tas_loc = tas_loc.reset_coords('height', drop = True)
        
    # some datasets put the date at 12:00 whereas some put it at 00:00. To make all
    # of them consistent, simply change the time coordinate to the date only
    tas_loc = tas_loc.assign_coords(time = tas_loc.time.dt.floor('D'))
    # convert from Kelvin to Celsius and return
    
    tas_loc = tas_loc - 273.15
    return tas_loc.compute()
Hide code cell content
# function for downloading the data
def download_data_multimodel(catalog_df, gcs, 
                             models, scenario,
                             stn_lat, stn_lon,
                             years_hist, years_future):

    ds_list_hist = []
    ds_list_future = []
    for model in models:
        print(f"========================{model}=============================")
        print('historical')
        tas_model_hist = get_and_process_data(catalog_df,
                                              model, 
                                              'historical', 
                                              gcs, 
                                              stn_lat,
                                              stn_lon,
                                              years_hist)
        ds_list_hist.append(tas_model_hist)
    
        # get the future simulation data for this model and scenario
        print(scenario)
        tas_model_future = get_and_process_data(catalog_df, 
                                                model, 
                                                scenario, 
                                                gcs, 
                                                stn_lat, 
                                                stn_lon, 
                                                years_future)
        ds_list_future.append(tas_model_future)

    print('finished acquiring model data')
    
    # concatenate the ds_lists together
    ds_ens_hist_raw = xce.create_ensemble(ds_list_hist,   
    # names of each model, which is represented by a dimension called 'realization'
                                          realizations = models,
                                          # common calendar to place all models onto
                                          calendar = 'noleap') 

    ds_ens_future_raw = xce.create_ensemble(ds_list_future, 
                                           realizations = models,
                                           calendar = 'noleap')

    
    # return
    return ds_ens_hist_raw, ds_ens_future_raw
# authenticate access to Google Cloud
gcs = gcsfs.GCSFileSystem(token='anon')

# file names to save the downloaded data, to save time later when re-running this notebook
fout_hist = 'data_files/tas.cmip6.daily.historical.r4i1p1f1.1980-2010.edmonton.nc'
fout_future = 'data_files/tas.cmip6.daily.ssp3.r4i1p1f1.2070-2100.edmonton.nc'

# use the function to download the data, this may take a few minutes to run
if (not os.path.exists(fout_hist)) or (not os.path.exists(fout_future)):
    ds_ens_hist_raw, ds_ens_ssp3_raw = download_data_multimodel(df_search_mm, 
                                                                gcs, 
                                                                models, 
                                                                "ssp370",
                                                                stn_lat, 
                                                                stn_lon, 
                                                                years_hist, 
                                                                years_future)

    # write the data to output files 
    ds_ens_hist_raw.to_netcdf(fout_hist)
    ds_ens_ssp3_raw.to_netcdf(fout_future)
else:
    # open the files that already exist
    ds_ens_hist_raw = xr.open_dataset(fout_hist)
    ds_ens_ssp3_raw = xr.open_dataset(fout_future)

6.2.2 Exploratory Multi-Model Analysis#

Warning

Whether it’s calculating a climate indicator from model output, or doing downscaling/bias correction, calculating ensemble statistics (across different models or different realizations of the same model) like a mean or standard deviation should be done as the last step before plotting or interpreting the results.

Calculating an ensemble mean too early in an analysis attenuates variability, and can artificially alter the model projections. Since different realizations of the same date across different ensemble members are unrelated, a given date (e.g., June 4th 2037, as an example) might exceed the CDD temperature threshold in some ensemble members, but not the others. The ensemble mean for that date (which isn’t a meaningful quantity) might fall below the threshold, so that day would contribute zero to the total CDDs for that year. If one did the CDD calculation first and then calculated the ensemble mean, you’d get a larger number, which more faithfully reflects what the model is projecting.

With this multi-model ensemble, we can re-do all of the same steps as before with the single model. First, let’s plot the daily climatologies of tas for each model, and compare them to the observations. To make the plot more legible, we’ll plot shading for only the observed one-\(\sigma\) range. We can calculate the daily climatology for all models with a single line of code:

Hide code cell content
# calculate daily climatologies for the multi-model ensemble
tas_clim_mm_hist_raw = ds_ens_hist_raw.tas.groupby('time.dayofyear').mean('time').compute()

# same for observations
tas_dailyclim_obs = tas_obs_noleap.groupby('time.dayofyear').mean('time').compute()
tas_dailyclim_std_obs = tas_obs_noleap.groupby('time.dayofyear').std('time').compute()
Hide code cell source
# plot the daily climatologies for all models
fig, ax = plt.subplots(figsize = (8,6))

# daily climatologies as 1D curves
# obs
tas_dailyclim_obs.plot.line(ax = ax, label = "Station Obs", color = 'k', linewidth = 3)

# models
tas_clim_mm_hist_raw.plot.line(ax = ax, hue = 'realization')

# 1 sigma shading
# obs
ax.fill_between(tas_dailyclim_obs.dayofyear,
                tas_dailyclim_obs - tas_dailyclim_std_obs, 
                tas_dailyclim_obs + tas_dailyclim_std_obs,
                alpha = 0.2, color = 'k')

ax.set_title("Edmonton Daily Mean Temperature Daily Climatology")
plt.show()
../_images/369d5bc40e2147108b6de7538ec936b87a441c5925a6a17f78f0b269176b3a43.png

This plot is quite illustrative of the variations across models. Like the observed internal variability, there is also more variation across models in the winter season. There is also a range of biases across the different models, with some having means that fall near the low end of the normal observed range (like MPI-ESM1-2-LR) and others that peak at the high end of the observed range (like FGOALS-g3). These models will likely produce very different CDDs than the observations or from the first model we looked at, CESM2. Let’s do the CDD calculation for each model using the raw data.

Hide code cell content
# assign unit to temperature data
tas_ens_hist_raw = ds_ens_hist_raw.tas
tas_ens_hist_raw.attrs['units'] = 'degC'
tas_obs_noleap.attrs['units'] = 'degC'

# calculate CDDs
cdd_obs =  xci.atmos.cooling_degree_days(tas_obs_noleap).compute()
cdds_mm_hist_raw = xci.atmos.cooling_degree_days(tas_ens_hist_raw).compute()

# long-term means and stdevs
cdds_mm_hist_raw_ltm = cdds_mm_hist_raw.mean('time')
cdds_mm_hist_raw_stdev = cdds_mm_hist_raw.std('time')

cdd_obs_ltm = cdd_obs.mean('time')
cdd_obs_stdev = cdd_obs.std('time')
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:42: UserWarning: Variable does not have a `cell_methods` attribute.
  _check_cell_methods(
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:46: UserWarning: Variable does not have a `standard_name` attribute.
  check_valid(vardata, "standard_name", data["standard_name"])
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:42: UserWarning: Variable does not have a `cell_methods` attribute.
  _check_cell_methods(
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:46: UserWarning: Variable does not have a `standard_name` attribute.
  check_valid(vardata, "standard_name", data["standard_name"])
Hide code cell source
# plot timeseries of CDDs for multi-model historical ensemble and obs
fig, axes = plt.subplots(ncols = 2, figsize = (12, 4))
cdd_obs.plot.line(ax = axes[0], color = 'k', linewidth = 3)
lines = cdds_mm_hist_raw.plot.line(ax = axes[0], hue = 'realization')
# set legend location
sns.move_legend(axes[0], loc='center left', bbox_to_anchor=(-0.55, 0.5))
axes[0].set_title("Edmonton Annual Cooling Degree Days")

# plot long-term means
bars = axes[1].bar(['Obs'] + models, np.append(cdd_obs_ltm.values, cdds_mm_hist_raw_ltm.values),
                  yerr = np.append(cdd_obs_stdev.values, cdds_mm_hist_raw_stdev.values), 
                  capsize = 5, error_kw = {'ecolor': 'darkgrey', 'elinewidth': 2})

axes[1].set_title("Edmonton Cooling Degree Days - Long Term Mean")

# rotate the model labels so they're legible
axes[1].set_xticks(axes[1].get_xticks(), axes[1].get_xticklabels(), rotation = 45, ha = 'right')

# match bar colors to line colors
bars[0].set_color('k')
for i, b in enumerate(bars[1:]):
    b.set_color(lines[i].get_color())
     
for ax in axes:
    ax.set(ylabel = "CDD")

plt.show()
../_images/84330249f69680732dc7e57c8cc2bd680d37548e9d3a7dfc619693133928c416.png

The inter-model variance in temperature has resulted in major variations in CDDs. The model with the coldest summer temperatures (MPI-ESM1-2-LR) has single-digit CDDs on average, while FGOALS-g3 exceeds 160 per year. This is an enormous range of biases, so for many of these models the bias correction should have a larger effect than it did for CESM2.

There are multiple reasons why MPI-ESM1-2-LR has such a substantial low bias in historical CDDs. First, it may have too few days that exceed the temperature threshold and second, it days that do exceed the threshold may not exceed it by as much as in observations. Let’s investigate the first reason by simply calculating the mean annual number of days over \(18^{\circ}\)C in observations and in MPI-ESM1-2-LR.

days_over_thresh_obs = xr.where(tas_obs_noleap > 18, 
                                1, 0).groupby('time.year').sum('time').mean('year')
tas_mpi_hist_raw = tas_ens_hist_raw.sel(realization = "MPI-ESM1-2-LR")
days_over_thresh_mpi = xr.where(tas_mpi_hist_raw > 18, 
                                1, 0).groupby('time.year').sum('time').mean('year')

print(f"Mean Annual Days over Threshold (Obs): {days_over_thresh_obs.values}")
print(f"Mean Annual Days over Threshold (MPI-ESM1-2-LR): {days_over_thresh_mpi.values}")
Mean Annual Days over Threshold (Obs): 36.70967741935484
Mean Annual Days over Threshold (MPI-ESM1-2-LR): 4.064516129032258

Indeed, MPI-ESM1-2-LR has far too few days per year that contribute to the CDD calculation. Nearly 37 days per year exceed the temperature threshold in the observations, but only about 4 per year in this model. With bias correction, we should expect this number to go way up for the model, and as a result, the CDDs calculated from bias-adjusted MPI-ESM1-2-LR should be much greater than for the raw model.

Now let’s examine the future projections from the raw model data. First the change in daily climatologies, then the CDDs.

Hide code cell content
# calculate daily climatologies for the multi-model ensemble, SSP3-7.0 end-of-century
tas_clim_mm_ssp3_raw = ds_ens_ssp3_raw.tas.groupby('time.dayofyear').mean('time').compute()

# climate change signal for daily clim
tas_clim_mm_delta_raw = tas_clim_mm_ssp3_raw - tas_clim_mm_hist_raw
Hide code cell source
# plot the daily climatologies for all models
fig, ax = plt.subplots(figsize = (6,4))

# daily climatologies as 1D curves
# historical
tas_clim_mm_delta_raw.plot.line(ax = ax, hue = 'realization')

sns.move_legend(ax, loc='center left', bbox_to_anchor=(1.05, 0.5))

ax.set_title("Change in Edmonton Daily Mean Temperature Daily Climatology\nSSP3-7.0 minus Historical")
plt.show()
../_images/b2c95bb3f5471ec35dba996f5bc750d4c9cba1739497a8e1b9e4b4f1a59bf57a.png

There is also a notable amount of model spread in the projections of future daily mean temperatures in Edmonton, especially in the winter season. The mean changes of up to 10 degrees for certain changes might have a big impact on the number of CDDs since this can extend the warm season and thus the number of days that exceed the temperature threshold. Interestingly, CESM2 shows the greatest peak warming in the summer season, so the projections for that model may be on the high end of the multi-model range. To investigate, let’s plot the mean changes in the number of CDDs for each model.

Hide code cell content
# calculate CDDs for future period
tas_ens_ssp3_raw = ds_ens_ssp3_raw.tas
tas_ens_ssp3_raw.attrs['units'] = 'degC'

cdds_mm_ssp3_raw = xci.atmos.cooling_degree_days(tas_ens_ssp3_raw).compute()

# long-term means
cdds_mm_ssp3_raw_ltm = cdds_mm_ssp3_raw.mean('time')

# climate change deltas
cdds_mm_ssp3_raw_delta = cdds_mm_ssp3_raw_ltm - cdds_mm_hist_raw_ltm
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:42: UserWarning: Variable does not have a `cell_methods` attribute.
  _check_cell_methods(
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:46: UserWarning: Variable does not have a `standard_name` attribute.
  check_valid(vardata, "standard_name", data["standard_name"])
Hide code cell source
# plot deltas to CDDs
fig, ax = plt.subplots( figsize = (6, 4))
# plot long-term means
bars = ax.bar(models, cdds_mm_ssp3_raw_delta.values)
ax.set_title("Edmonton Cooling Degree Days - Change in Long Term Mean\nSSP3-7.0 minus Historical")

# rotate the model labels so they're legible
ax.set_xticks(ax.get_xticks(), ax.get_xticklabels(), 
              rotation = 45, ha = 'right')

# match bar colors to line colors
for i, b in enumerate(bars):
    b.set_color(lines[i].get_color())
     
ax.set_ylabel('CDD')

plt.show()
../_images/c8d544d298c53c01169a1da1b1e4b96a9f18c30d3210a0b97d7dacdb08ebc70a.png

Indeed CESM2 is the model that projects the largest increase to CDDs in its raw output, though it is not alone as an outlier. Interestingly the MPI model, which has the fewest historical CDDs, also projects the smallest increase. It’s possible that this model’s cold bias means that even after a similar amount of warming to other models, the baseline is so low that not many days exceed the temperature threshold by a large amount. Let’s calculate the mean number of days per year that contribute to the CDD total for the future period:

tas_mpi_ssp3_raw = tas_ens_ssp3_raw.sel(realization = "MPI-ESM1-2-LR")
days_over_thresh_mpi_ssp3 = xr.where(tas_mpi_ssp3_raw > 18, 
                                     1, 0).groupby('time.year').sum('time').mean('year')

print(f"Mean Annual Days over Threshold (MPI-ESM1-2-LR 2071-2100): {days_over_thresh_mpi_ssp3.values}")
Mean Annual Days over Threshold (MPI-ESM1-2-LR 2071-2100): 47.064516129032256

Even in the end-of-century period, the MPI model has only about 10 more days per year above 18\(^{\circ}\)C than the 1980-2010 observations. The bias adjustment has the potential to have a large effect on this number, which in turn could have a large effect on the changes to CDDs.

6.2.3 Multi-Model Bias-Correction#

Applying the bias correction is the next step in the multi-model workflow. Doing so is no more complicated than it was for the single model case because the xclim routines automatically apply the correction separately over each dimension in the input dataset. Using multiple models is no different from using data with multiple spatial dimensions, the realization is just another dimension to the data which has no effect on the calculations.

Hide code cell content
# fix the time axis chunking so xclim won't complain
tas_ens_hist_raw = tas_ens_hist_raw.chunk({'time': -1})
tas_ens_ssp3_raw = tas_ens_ssp3_raw.chunk({'time': -1})
# train and then apply the QDM bias correction
QDM_trained_mm = sdba.adjustment.QuantileDeltaMapping.train(tas_obs_noleap,     
                                                            tas_ens_hist_raw, 
                                                            nquantiles = 50, 
                                                            kind = "+",
                                                            group = 'time.month' 
                                                            )


tas_ens_hist_qdm = QDM_trained_mm.adjust(tas_ens_hist_raw,                              
                                         interp = 'linear')

tas_ens_ssp3_qdm = QDM_trained_mm.adjust(tas_ens_ssp3_raw,                                 
                                         interp = 'linear')

6.2.4 Validating the Multi-Model BC’d Data#

Like before, we’ll plot the historical cliamtologies using the downscaled data.

Hide code cell content
# calculate daily climatologies for the multi-model ensemble
tas_clim_mm_hist_qdm = tas_ens_hist_qdm.groupby('time.dayofyear').mean('time').compute()
Hide code cell source
# plot the daily climatologies for all models, QDM
fig, ax = plt.subplots(figsize = (8,6))

# daily climatologies as 1D curves
# obs
tas_dailyclim_obs.plot.line(ax = ax, label = "Station Obs", color = 'k', linewidth = 3)

# models
tas_clim_mm_hist_qdm.plot.line(ax = ax, hue = 'realization')

# 1 sigma shading
# obs
ax.fill_between(tas_dailyclim_obs.dayofyear,
                tas_dailyclim_obs - tas_dailyclim_std_obs, 
                tas_dailyclim_obs + tas_dailyclim_std_obs,
                alpha = 0.2, color = 'k')

ax.set_title("Edmonton Daily Mean Temperature Daily Climatology - QDM")
ax.set_ylabel('tas')
plt.show()
../_images/bda85fdcb3a9401e7969d9781a939f7202cf96c90a27f808e82933abe17852c1.png

As expected, the bias-adjusted historical data matches the observed seasonal cycle very well, with much less spread than the raw model simulations. Next, we’ll plot the historical CDDs.

Hide code cell content
# assign unit to temperature data
tas_ens_hist_qdm.attrs['units'] = 'degC'

# calculate CDDs
cdds_mm_hist_qdm = xci.atmos.cooling_degree_days(tas_ens_hist_qdm).compute()

# long-term means
cdds_mm_hist_qdm_ltm = cdds_mm_hist_qdm.mean('time')
cdds_mm_hist_qdm_stdev = cdds_mm_hist_qdm.std('time')
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:42: UserWarning: Variable does not have a `cell_methods` attribute.
  _check_cell_methods(
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:46: UserWarning: Variable does not have a `standard_name` attribute.
  check_valid(vardata, "standard_name", data["standard_name"])
Hide code cell source
# plot timeseries of CDDs for bias-corrected multi-model historical ensemble and obs
fig, axes = plt.subplots(ncols = 2, figsize = (12, 4))
cdd_obs.plot.line(ax = axes[0], color = 'k', linewidth = 3)
lines = cdds_mm_hist_qdm.plot.line(ax = axes[0], hue = 'realization')
# set legend location
sns.move_legend(axes[0], loc='center left', bbox_to_anchor=(-0.55, 0.5))
axes[0].set_title("Edmonton Annual Cooling Degree Days - QDM")

# plot long-term means
bars = axes[1].bar(['Obs'] + models, np.append(cdd_obs_ltm.values, cdds_mm_hist_qdm_ltm.values),
                   yerr = np.append(cdd_obs_stdev.values, cdds_mm_hist_qdm_stdev.values), 
                   capsize = 5, error_kw = {'ecolor': 'darkgrey', 'elinewidth': 2})
axes[1].set_title("QDM Long Term Mean")


# rotate the model labels so they're legible
axes[1].set_xticks(axes[1].get_xticks(), axes[1].get_xticklabels(), rotation = 45, ha = 'right')

# match bar colors to line colors
bars[0].set_color('k')
for i, b in enumerate(bars[1:]):
    b.set_color(lines[i].get_color())
     
for ax in axes:
    ax.set(ylabel = "CDD")

plt.show()
../_images/5de53d557ee0c40bb0c548672e834fef83a2ccb0b05dbce687bc543599926125.png

Also as expected, the long-term means of the bias-corrected historical CDDs are also basically indistinguishable from the observations, though the ranges of variability differ slightly.

6.2.5 Downscaled Climate Indicator#

What impact has the bias correction had on the future projections? Let’s take a look.

Hide code cell content
# calculate CDDs for future period
tas_ens_ssp3_qdm.attrs['units'] = 'degC'

cdds_mm_ssp3_qdm = xci.atmos.cooling_degree_days(tas_ens_ssp3_qdm).compute()

# long-term means
cdds_mm_ssp3_qdm_ltm = cdds_mm_ssp3_qdm.mean('time')

# climate change deltas
cdds_mm_ssp3_qdm_delta = cdds_mm_ssp3_qdm_ltm - cdds_mm_hist_qdm_ltm
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:42: UserWarning: Variable does not have a `cell_methods` attribute.
  _check_cell_methods(
/Users/mikemorris/opt/anaconda3/envs/UTCDW-env2/lib/python3.12/site-packages/xclim/core/cfchecks.py:46: UserWarning: Variable does not have a `standard_name` attribute.
  check_valid(vardata, "standard_name", data["standard_name"])
Hide code cell source
# plot deltas to CDDs - QDM
fig, ax = plt.subplots( figsize = (6, 4))
# plot long-term means
bars = ax.bar(models, cdds_mm_ssp3_qdm_delta.values)
ax.set_title("Edmonton Cooling Degree Days - Change in Long Term Mean\nQDM SSP3-7.0 minus Historical")

# rotate the model labels so they're legible
ax.set_xticks(ax.get_xticks(), ax.get_xticklabels(), rotation = 45, ha = 'right')

# match bar colors to line colors
for i, b in enumerate(bars):
    b.set_color(lines[i].get_color())
     
ax.set_ylabel('CDD')

plt.show()
../_images/0599f7afec0938174072dc309692793af00e9241ade9af6e152df4566191d22b.png

After bias correction, there is still a fairly sizeable range of projections across the different models, but the ordering has changed. Notably, the MPI model no longer projects the lowest increase, likely because the bias correction led to more days meeting the 18\(^{\circ}\)C threshold. Importantly, this demonstrates that bias correction can have a substantial impact on the climate change signal of your indicator. This holds especially for indicators based on a fixed threshold, calculated with a nonlinear formula, or using a complex model. For indicators like CDDs that use a fixed threshold, the values calculated using the raw model output are essentially meaningless. The 18\(^{\circ}\)C threshold was developed based on the observed climate, so if the raw model climate is substantially different, it doesn’t make sense to apply that same threshold.

CanESM5 now projects the largest increase instead of CESM2, though it had the third largest increase amongst the raw model data so the overall grouping hasn’t changed by much. Using this multi-model ensemble of projections, we can create a likely range of projected changes for our future period (2070-2100) under the SSP3-7.0 scenario. We’ll use the 10\(^{\text{th}}\)-90\(^{\text{th}}\) percentile range of the inter-model spread as our range of projections.

cdd_change_10p = cdds_mm_ssp3_qdm_delta.quantile(0.1, dim = 'realization')
cdd_change_90p = cdds_mm_ssp3_qdm_delta.quantile(0.9, dim = 'realization')

# round the values to the nearest tenth for printing
low_range = np.around(cdd_change_10p.values, 1)
high_range = np.around(cdd_change_90p.values, 1)

result_str = "Likely Range of Mean CDD Change from 1980-2010 to 2070-2100 under SSP3-7.0: "
result_str += f"{low_range} to {high_range} CDD/year"

print(result_str)
Likely Range of Mean CDD Change from 1980-2010 to 2070-2100 under SSP3-7.0: 208.7 to 564.6 CDD/year

Or, in terms of percent changes, which may be easier to interpret:

# multi-model mean historical CDDs
cdds_hist_qdm_ltm_mm_mean = cdds_mm_hist_qdm_ltm.mean('realization')

low_range_pct = int(np.around(100 * cdd_change_10p.values/cdds_hist_qdm_ltm_mm_mean.values, 0))
high_range_pct = int(np.around(100 * cdd_change_90p.values/cdds_hist_qdm_ltm_mm_mean.values, 0))

result_str = "Likely Range of Mean CDD Change from 1980-2010 to 2070-2100 under SSP3-7.0: "
result_str += f"{low_range_pct}% to "
result_str += f"{high_range_pct}%"

print(result_str)
Likely Range of Mean CDD Change from 1980-2010 to 2070-2100 under SSP3-7.0: 243% to 658%

This likely range of increases under this scenario is quite large, indicating that model structural uncertainty is very important to consider when studying changes to CDDs under climate change. Of course, there is also uncertainty relating to the future emissions scenario and internal climate variability. In the next section, we’ll quantify the effect of scenario uncertainty on future projections by considering multiple SSP scenarios.