6.1 Edmonton AC Loads: Single Simulation Analysis


This toy example is about the potential effects of climate change on air conditioning loads in the Canadian city of Edmonton. To start small, this example will focus on a single spatial location, and as such, will demonstrate only bias-correction of climate model data, not true spatial downscaling.

Note

Because of the small scope of this example, you should be able to run the code on a personal computer (like all of the previous notebooks). The later examples which involve spatial downscaling will likely require more memory and processing power than a laptop or basic desktop computer can handle.

6.1.1 Preliminary Study Design

Before starting any analysis, some preliminary decisions must be made. The guided survey provided by the UTCDW will help you work through each of the necessary decisions, explained in more detail in Chapter 5 of the UTCDW Guidebook. Herein we’ll specify the responses to each survey question, and generate a flowchart that will guide us through the steps of the analysis.

The first few questions of the survey have already been answered just by defining our study problem. The subject of our study is changes to AC loads in Edmonton, which also defines the spatial domain we will study.

Next, we need to decide on the time periods of analysis: one to establish the historical baseline from which changes will be measured, and the other for future projections. Each period should be at least 30 years long, which is the standard length for climate normals. For the historical baseline, we will choose 1980-2010. This period is entirely covered by the CMIP6 historical simulations, which end in 2014. It will also allow us to later use the NRCanMet gridded observational data from PCIC, which is only available up to 2010. For the future period, we’ll use the end-of-century period 2070-2100. For a practical study, you might be interested in a nearer-term future period, but using an end-of-century period for this pedagogical example will maximize the strength of the signal.

For our future period, we must also specify one or more future scenarios to study, since we cannot know exactly what future emissions of climate forcers like greenhouse gases will be. To continue with the theme of starting small, we’ll begin with data from only one model and one future scenario to develop our code. After going through the whole bias-correction workflow with this model, we’ll expand the analysis to include multiple models and scenarios.

The scenario we will start with is SSP3-7.0. This is a high emissions scenario (but not the highest) and represents a future of “Regional Rivalry” where countries focus on their own economic goals instead of international cooperation on environmental issues.

The model we’ll begin with is the NCAR Community Earth System Model version 2 (CESM2). CESM2 is a state-of-the-art Earth system model contributing to CMIP6, developed by a large team of scientists in the USA. Its results aren’t particularly unusual within the CMIP6 ensemble, though its equilibrium climate sensitivity is toward the higher end of the CMIP6 range. As mentioned, we’ll later expand this analysis to include multiple CMIP6 models.

Now we must specify our climate indicator, the way we’ll quantify AC loads as a function of climatic variables. A standard climate indicator developed as a proxy for AC loads is Cooling Degree Days (CDDs). CDDs approximate the cumulative annual AC load by summing the amount by which the daily mean temperature exceeds a certain threshold, on the days when that threshold is exceeded. The formula is as follows:

\[ CDD = \sum_{i=1}^{365}\left(T_{i} - T_{thresh}\right) I\left(T_{i} > T_{thresh} \right) \]

Where \(I(x)\) is the indicator function, taking the value \(1\) when \(x\) is true and \(0\) when \(x\) is false. The usual value for \(T_{thresh}\) is 18\(^{\circ}\)C. Remember that \(T_{i}\) is the daily mean temperature. You might not usually turn on your AC when it is 18\(^{\circ}\)C outside, but on a day whose mean temperature is 18\(^{\circ}\)C, the afternoon peak is likely to be much warmer because of the diurnal cycle.
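The CDD formula translates directly into array code. The sketch below is a plain NumPy/pandas illustration, not the xclim function used later in this example; it assumes daily mean temperatures already in °C, and the helper names `cooling_degree_days` and `annual_cdd` are hypothetical.

```python
import numpy as np
import pandas as pd


def cooling_degree_days(tmean_c, thresh_c=18.0):
    """CDD = sum over days of (T_i - T_thresh) * I(T_i > T_thresh).

    Assumes `tmean_c` holds daily mean temperatures in deg C. NaN values fail
    the comparison and so contribute zero; fill or flag missing days first.
    """
    tmean_c = np.asarray(tmean_c, dtype=float)
    excess = tmean_c - thresh_c
    # indicator function I(T_i > T_thresh): keep only strictly positive excesses
    return float(np.sum(np.where(excess > 0, excess, 0.0)))


def annual_cdd(daily_tmean_c, thresh_c=18.0):
    """Annual CDD totals for a pd.Series of daily means indexed by date."""
    by_year = daily_tmean_c.groupby(daily_tmean_c.index.year)
    return by_year.apply(lambda s: cooling_degree_days(s.to_numpy(), thresh_c))


# a day at exactly 18 deg C contributes nothing; 20 and 25 deg C contribute 2 and 7
print(cooling_degree_days([15.0, 18.0, 20.0, 25.0]))  # 9.0
```

Note the strict inequality: a day whose mean is exactly at the threshold adds nothing, matching \(I(T_{i} > T_{thresh})\).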

Knowing how to calculate CDDs has answered the next few survey questions for us: the input variable is surface air temperature, with the time sampling being daily averages.

Finally, we must specify the spatial sampling for our data. In this example, we’re interested in projections for a single discrete location: the site of an ECCC weather station from which we’ll draw observations of daily mean temperature. As mentioned, a later example will use gridded observations to produce downscaled results for an extended region, but this first example will start simple and demonstrate the workflow for a small case.

Having answered all of the survey questions, we can use the UTCDW website to generate a flowchart that explains the steps of our analysis:

The top half of the flowchart contains the answers to the survey questions, regarding the data we’ve chosen to use. These decisions feed into the analysis steps on the bottom half of the flowchart, which is standardized. Of course, there remain decisions to be made regarding the important aspects of the model data to validate, and your method of bias-correction/downscaling, but these can be dealt with after having decided on which datasets to use and may become more apparent after doing exploratory analysis with the data.

6.1.2 Selecting and Downloading the Datasets

Having decided on the scope of the project, and some details that constrain the datasets that will be used in the study, we can start acquiring the data. Let’s begin with the observational dataset: ECCC weather station observations for a single site in Edmonton. We will use the ec3 package to search for an appropriate station and download the data. Then we’ll do some data cleaning with pandas, and eventually convert the station data into xarray format so it can be processed in the same way as the model data, and be used with xclim for bias correction and calculating the climate indicator.

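import ec3

# search for stations near our desired location
# (lat_edm, lon_edm are the target coordinates in Edmonton, defined in a cell not shown here)
find_stn_results = ec3.find_station(target = (lat_edm, lon_edm),
                                    # stations with daily data covering 1980-2010
                                    period = range(1980, 2011),
                                    type = 'daily',
                                    # search radius in km
                                    dist = range(25),
                                    # also report station combinations that may cover the period
                                    detect_recodes = True)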


Note: In addition to the stations found, the following combinations may provide sufficient baseline data.


>> Combination 1 at coordinates 53.57 -113.52 

Station 1864 : EDMONTON CALDER (1975-1977)
Station 1867 : EDMONTON CITY CENTRE A (1937-2005)
Station 27214 : EDMONTON BLATCHFORD (1996-2023)
Station 31427 : EDMONTON CITY CENTRE AWOS (2005-2015)


find_stn_results

	Name	Province	Climate ID	Station ID	WMO ID	TC ID	Latitude (Decimal Degrees)	Longitude (Decimal Degrees)	Latitude	Longitude	Elevation (m)	First Year	Last Year	DLY First Year	DLY Last Year	Dist
2429	EDMONTON WOODBEND	ALBERTA	3012230	1872	NaN	NaN	53.42	-113.75	532500000.0	-1.134500e+09	670.6	1973	2015	1973.0	2015.0	18.84175793351712 km
2416	EDMONTON INT'L A	ALBERTA	3012205	1865	71123.0	YEG	53.32	-113.58	531900000.0	-1.133500e+09	723.3	1959	2012	1959.0	2012.0	20.72726507003484 km

To focus on an urban area, where any change in CDDs is likely to have the most impact on people and building energy use, we’ll use the Edmonton City Centre station. This means combining data from Station IDs 1867 (1937-2005) and 31427 (2005-2015) to cover our whole historical reference period.
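Combining the two records amounts to splicing two daily series. Below is a minimal pandas sketch, assuming each station's daily mean temperature has already been read into a `pd.Series` indexed by date; the helper `splice_station_records` and the synthetic inputs are hypothetical. It gives precedence to the older record wherever it has data and fills the rest from the newer one, which is one reasonable choice for the 2005 overlap rather than necessarily the one used here. The result is then converted to an xarray `DataArray` with a `time` dimension using pandas' `Series.to_xarray()`.

```python
import pandas as pd


def splice_station_records(primary, secondary, start="1980-01-01", end="2010-12-31"):
    """Merge two daily station series into one record covering [start, end].

    Values from `primary` take precedence; dates where `primary` is missing
    (including dates after it closes) are filled from `secondary`.
    """
    combined = primary.combine_first(secondary)
    # reindex onto a complete daily index so any remaining gaps show up as NaN
    full_index = pd.date_range(start, end, freq="D")
    return combined.reindex(full_index)


# synthetic stand-ins: an older station ending mid-2005 and a newer one starting in 2005
stn_old = pd.Series(1.0, index=pd.date_range("1975-01-01", "2005-06-30", freq="D"))
stn_new = pd.Series(2.0, index=pd.date_range("2005-01-01", "2015-12-31", freq="D"))

tas_stn = splice_station_records(stn_old, stn_new)
print(tas_stn.isna().sum())  # 0: the whole 1980-2010 period is covered

# convert to xarray so the station data can be handled like the model data
tas_da = tas_stn.rename("tas").rename_axis("time").to_xarray()
```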

Having downloaded the station observational data, the next step is to download the raw model data. Since it tends to be more reliable than the ESGF, we’ll search the CMIP6 archive hosted on Google Cloud for data from our chosen model and scenarios.

import pandas as pd

# open the Google Cloud model data catalog with pandas
# (url_gcsfs_catalog, defined in a cell not shown here, points to the catalog CSV,
#  e.g. https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv)
df_catalog = pd.read_csv(url_gcsfs_catalog)

# search for our selected model: daily tas from the historical and SSP3-7.0 experiments
search_string = "table_id == 'day' & source_id == 'CESM2' & variable_id == 'tas'"
# in DataFrame.query, comparing a column to a list with == behaves like .isin()
search_string += " & experiment_id == ['historical', 'ssp370']"
df_search = df_catalog.query(search_string)

# display the resulting dataframe
df_search


	activity_id	institution_id	source_id	experiment_id	member_id	table_id	variable_id	grid_label	zstore	dcpp_init_year	version
58919	CMIP	NCAR	CESM2	historical	r1i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r1...	NaN	20190308
61527	CMIP	NCAR	CESM2	historical	r5i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r5...	NaN	20190308
61604	CMIP	NCAR	CESM2	historical	r4i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r4...	NaN	20190308
61627	CMIP	NCAR	CESM2	historical	r3i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r3...	NaN	20190308
62145	CMIP	NCAR	CESM2	historical	r6i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r6...	NaN	20190308
63061	CMIP	NCAR	CESM2	historical	r2i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r2...	NaN	20190308
64025	CMIP	NCAR	CESM2	historical	r7i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r7...	NaN	20190311
65331	CMIP	NCAR	CESM2	historical	r9i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r9...	NaN	20190311
65878	CMIP	NCAR	CESM2	historical	r8i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r8...	NaN	20190311
66385	CMIP	NCAR	CESM2	historical	r10i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r1...	NaN	20190313
200863	CMIP	NCAR	CESM2	historical	r11i1p1f1	day	tas	gn	gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r1...	NaN	20190514
445868	ScenarioMIP	NCAR	CESM2	ssp370	r10i1p1f1	day	tas	gn	gs://cmip6/CMIP6/ScenarioMIP/NCAR/CESM2/ssp370...	NaN	20200528
446463	ScenarioMIP	NCAR	CESM2	ssp370	r11i1p1f1	day	tas	gn	gs://cmip6/CMIP6/ScenarioMIP/NCAR/CESM2/ssp370...	NaN	20200528
446493	ScenarioMIP	NCAR	CESM2	ssp370	r4i1p1f1	day	tas	gn	gs://cmip6/CMIP6/ScenarioMIP/NCAR/CESM2/ssp370...	NaN	20200528

Because it has simulations available for both the historical and ssp370 experiments, we’ll start with ensemble member r10i1p1f1 (r4i1p1f1 and r11i1p1f1 would also qualify). For this ensemble member of CESM2, we’ll download the tas data from Google Cloud, interpolate it to the coordinates of our station, and select the time periods for the study. For the historical experiment, this will be 1980–2010, the same period we chose for the station observations. For the future period, we’ll use the end-of-century period 2070-2100.
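The interpolation and time selection can be sketched with xarray. This is a hedged example rather than the exact code used here: it builds a small synthetic grid in place of the CESM2 Zarr store, assumes dimension names `time`, `lat` and `lon`, and assumes longitudes run from 0 to 360 (as on CESM2's native grid), so the station longitude is wrapped with `% 360`. The site coordinates are the 53.57, -113.52 reported by ec3 above, and `extract_point_period` is a hypothetical helper. Note that `DataArray.interp` requires SciPy.

```python
import numpy as np
import pandas as pd
import xarray as xr


def extract_point_period(tas, lat, lon, start, end):
    """Bilinearly interpolate a (time, lat, lon) field to one site, then subset time."""
    # wrap a -180..180 longitude onto the model's assumed 0..360 grid
    lon_360 = lon % 360
    point = tas.interp(lat=lat, lon=lon_360, method="linear")
    return point.sel(time=slice(start, end))


# synthetic stand-in for CESM2 daily tas (K): a field that varies linearly in lat and lon,
# so linear interpolation reproduces it exactly at the site
time = pd.date_range("1975-01-01", "2014-12-31", freq="D")
lat = np.array([52.0, 53.0, 54.0])
lon = np.array([245.0, 246.25, 247.5])
values = 200.0 + lat[:, None] + 0.1 * lon[None, :]
tas_demo = xr.DataArray(
    np.broadcast_to(values, (time.size, lat.size, lon.size)).copy(),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

tas_site = extract_point_period(tas_demo, 53.57, -113.52, "1980", "2010")
print(tas_site.time.dt.year.min().item(), tas_site.time.dt.year.max().item())  # 1980 2010
```

String slices such as `slice("1980", "2010")` also work on the cftime "noleap" time axis that CESM2 data use.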

6.1.3 Assessing Model-Obs Consistency

Now that we’ve acquired the model data for our study location, and for the right time periods, we can take a peek at the data and compare the raw model simulations to the observations. This will help us characterize the bias in the raw historical simulations, plus the climate change signal in the raw model projections relative to the historical simulation. First let’s plot the daily climatologies for each dataset, all on the same axes.

[Figure: daily climatologies of daily mean temperature from the station observations and the raw CESM2 simulations]

Strange: the observed climatology drops to a very low value at the end of the year and extends past the model data on the x-axis. This is because the model calendar does not include leap days, so every model year has 365 days, while the station record has a day 366 in leap years. To account for this discrepancy, we will drop leap days from the station data so that both datasets use the same calendar, and then redo the calculation.

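from xclim.core.calendar import convert_calendar

# calculate station daily climatology again after converting its calendar to match the model
# ('noleap' drops February 29, so every year has 365 days like the model calendar;
#  recent xarray versions also offer stn_ds.convert_calendar('noleap'))
stn_ds_noleap = convert_calendar(stn_ds, 'noleap')
tas_obs_noleap = stn_ds_noleap.tas

# mean and standard deviation across years for each day of the year
tas_dailyclim_obs = tas_obs_noleap.groupby('time.dayofyear').mean('time').compute()
tas_dailyclim_std_obs = tas_obs_noleap.groupby('time.dayofyear').std('time').compute()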
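For readers working in pandas rather than xarray, the same idea can be sketched as follows. This is a hedged illustration, not the xclim/xarray code above, and the helper names are hypothetical. One subtlety: dropping February 29 alone is not enough in pandas, because `dayofyear` still numbers December 31 of a leap year as day 366, so dates after February in leap years are shifted back by one to mimic a 365-day "noleap" day of year.

```python
import numpy as np
import pandas as pd


def noleap_dayofyear(index):
    """Day of year on a 365-day calendar for a DatetimeIndex with no Feb 29 entries."""
    doy = index.dayofyear.to_numpy()
    # in leap years, every day from March 1 onward is one day later than in a 365-day year
    shift = index.is_leap_year & (index.month > 2)
    return np.where(shift, doy - 1, doy)


def daily_climatology_noleap(daily):
    """Mean and standard deviation across years for each noleap day of year."""
    is_feb29 = (daily.index.month == 2) & (daily.index.day == 29)
    daily = daily[~is_feb29]
    grouped = daily.groupby(noleap_dayofyear(daily.index))
    return grouped.mean(), grouped.std()


# synthetic daily series whose value is simply the month number
idx = pd.date_range("2000-01-01", "2003-12-31", freq="D")
demo = pd.Series(idx.month.astype(float), index=idx)
clim_mean, clim_std = daily_climatology_noleap(demo)
print(len(clim_mean), clim_mean.loc[59], clim_mean.loc[60])  # 365 2.0 3.0
```

Day 59 is February 28 and day 60 is March 1 in every year, leap or not, which is exactly the alignment the model calendar assumes.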

[Figure: daily climatologies with one-σ ranges for the station observations (leap days removed) and the raw CESM2 historical simulation]

This second plot is a fairer comparison of the daily climatologies. The model bias for this location (i.e. the difference between the historical simulation and the observations) isn’t especially large, which is encouraging. The daily climatologies do not agree perfectly, most notably in summer, when the model historical climatology peaks later in the year than the observed climatology, but the one-standard-deviation (one-\(\sigma\)) ranges overlap substantially. We’ll continue to characterize the model bias in the next step by plotting the PDFs and CDFs of daily mean temperature for both datasets.

[Figure: PDFs and CDFs of daily mean temperature for the station observations and the raw CESM2 historical simulation]

This situation is similar to Section 3.4.4, where we compared the temperature distributions for Toronto between CanESM5 and the Toronto City ECCC station. Here the mean bias is small and the overall variability is similar, but the model (CESM2) distribution is sharply bimodal, while the observed temperature distribution is less so. This feature of the model temperature bias cannot be detected by comparing the daily climatologies, but it’s important to account for. Thankfully, quantile-mapping-based bias-correction methods (like Quantile Delta Mapping) can help correct it. Later on, we’ll reproduce this plot with bias-corrected model data to see whether it corrects the spurious bimodality of the daily mean temperature distribution.

The next thing to check regarding model-obs agreement is our climate indicator, cooling degree days (CDDs). If your indicator requires high spatial resolution to calculate (e.g. if you need high-resolution inputs for a complex impacts model), then you won’t be able to calculate it from the raw model output, and must skip this step. Because CDDs are calculated with a simple formula from a 1D time series of daily mean temperature, there’s no reason we can’t compare the results from the raw model output to the observed values. We will use the standard temperature threshold of 18\(^{\circ}\)C in our calculation of CDDs (which is the default for xclim.indices.cooling_degree_days).

[Figure: annual CDDs from the station observations and the raw CESM2 historical simulation]

Despite good agreement in the daily climatology and in the overall mean and standard deviation, the model shows a fairly sizable bias in annual CDDs: it underestimates them by about 12 CDDs per year, which, compared to the observed long-term mean of about 85 CDDs per year, is a mean bias of about 14%. Calculating the climate indicator from the raw model data has revealed this important bias, which wouldn’t be easily deduced from the previous plots. The spurious low-temperature peak of the model PDF hints that there may be too few warm days, but we couldn’t draw this conclusion without calculating the CDDs.

6.1.4 Evaluating the Climate Change Signal

Having characterized the model bias using the historical simulation and the observations, it is time to evaluate the climate change projections of the raw model simulations. First, we will plot the daily climatology of the SSP3-7.0 end-of-century projections, and compare it to the historical period model daily climatology.

[Figure: daily climatologies of the raw CESM2 historical and SSP3-7.0 end-of-century simulations]

We see clear signs of warming in the end-of-century period under the SSP3-7.0 scenario, especially in the summer season. The low end of the SSP3-7.0 one-\(\sigma\) range lies completely above the high end of the model historical one-\(\sigma\) range. This is likely to result in large changes to our climate indicator, cooling degree days.

Before calculating the change in CDDs for the raw model output, let’s first test the statistical significance of the climate change signal using the methods from Section 3.4.5.1. We’ll use Student’s \(t\)-test (in Welch’s form, without assuming equal variances), with a correction for temporal autocorrelation via the effective sample size, to test for an increase in the mean temperature.

from scipy import stats

# perform a two-sample t-test to see if future temperatures are higher than past
tstat, pval_neff = stats.ttest_ind_from_stats(tas_hist_raw_mean,
                                              tas_hist_raw_stdev,
                                              # effective sample size (corrects for autocorrelation)
                                              neff_hist_raw,
                                              tas_ssp3_raw_mean,
                                              tas_ssp3_raw_stdev,
                                              neff_ssp3_raw,
                                              # Welch's version: variances not assumed equal
                                              equal_var = False,
                                              alternative = 'less')
# the alternative hypothesis is that the first dataset (historical)
# has a lower mean than the second dataset (future)

print("p-value for t-test: %.4f" % pval_neff)

p-value for t-test: 0.0001
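The t-test above depends on the effective sample sizes `neff_hist_raw` and `neff_ssp3_raw`, which are not computed in this excerpt (see Section 3.4.5.1). One widely used approximation, assumed here, treats the daily series as AR(1) with lag-1 autocorrelation \(r_{1}\) and sets \(n_{eff} = n\,(1 - r_{1})/(1 + r_{1})\). The minimal sketch below uses hypothetical helper names and synthetic AR(1) anomalies in place of the model temperatures. Since a strong seasonal cycle would inflate \(r_{1}\), anomalies from the daily climatology are usually the more sensible input.

```python
import numpy as np
from scipy import stats


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a 1D series."""
    x = np.asarray(x, dtype=float)
    anom = x - x.mean()
    return float(np.sum(anom[:-1] * anom[1:]) / np.sum(anom * anom))


def effective_sample_size(x):
    """AR(1) approximation n_eff = n (1 - r1) / (1 + r1), with r1 floored at 0."""
    n = len(x)
    # negative autocorrelation is not allowed to inflate the sample size
    r1 = max(lag1_autocorrelation(x), 0.0)
    return n * (1.0 - r1) / (1.0 + r1)


def ar1_series(n, phi, rng):
    """Synthetic AR(1) anomalies (hypothetical stand-in for daily temperature anomalies)."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


rng = np.random.default_rng(42)
hist = ar1_series(5000, 0.8, rng)
future = ar1_series(5000, 0.8, rng) + 1.0  # independent sample, warmer by 1 unit

neff_hist = effective_sample_size(hist)
neff_future = effective_sample_size(future)
tstat, pval = stats.ttest_ind_from_stats(hist.mean(), hist.std(ddof=1), neff_hist,
                                         future.mean(), future.std(ddof=1), neff_future,
                                         equal_var=False, alternative="less")
print(f"n = {hist.size}, n_eff = {neff_hist:.0f}, p = {pval:.2g}")
```

With \(r_{1} \approx 0.8\), the 5000 correlated values carry roughly the information of only about 550 independent ones, which is why skipping this correction would make the test far too eager to declare significance.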

The low \(p\)-value for our test means we can reject the null hypothesis of no change, and conclude that the warming signal in the SSP3-7.0 end-of-century period is statistically significant. Next let’s calculate the future projected CDDs and do a similar test, to assess the significance of changes to CDDs in the raw model data.

[Figure: annual CDDs from the raw CESM2 historical and SSP3-7.0 end-of-century simulations]

# perform a two-sample t-test to see if the future period has more CDDs
tstat, pval_cdds = stats.ttest_ind_from_stats(cdd_hist_raw_ltm,
                                              cdd_hist_raw_stdev,
                                              # annual totals from different years should be
                                              # approximately independent of each other, so the
                                              # sample size is simply the number of years
                                              len(cdd_hist_raw.time),
                                              cdd_ssp3_raw_ltm,
                                              cdd_ssp3_raw_stdev,
                                              len(cdd_ssp3_raw.time),
                                              equal_var = False,
                                              alternative = 'less')
# the alternative hypothesis is that the first dataset (historical)
# has a lower mean than the second dataset (future)

print("p-value for t-test: %.4f" % pval_cdds)

p-value for t-test: 0.0000

Wow! The raw model projects an enormous increase in annual CDDs, as we suspected based on the large increase in mean summertime temperatures. This establishes our baseline expectation for the bias-corrected projections. The exact change may differ, but because the univariate bias-correction method we will use preserves the model-projected changes in temperature, we should still see a large increase in CDDs for the future scenario.

6.1.5 Applying the Bias Correction

In this example, we will use Quantile Delta Mapping (QDM, Cannon et al. [2015] and Section 4.2.3.3) as the bias-correction method. Since we are using data for only a single location, there is no true “downscaling” involved: the spatial sampling of the results is the same as that of the inputs, a 1D time series. This bias-correction method is preferred because it corrects biases in all quantiles of the distribution of the variable of interest, and it also preserves the model-projected changes in every quantile (absolute changes for an additive adjustment, relative changes for a multiplicative one). The former is true of any quantile-mapping-based bias-correction method (such as EQM, Section 4.2.3.1), but the latter is the defining feature of QDM. This makes QDM particularly good at handling changes to extreme values (i.e. the high and low quantiles of the distribution), as opposed to methods that preserve only the mean change projected by the model.
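To make the additive QDM idea concrete, here is a minimal NumPy sketch of the procedure described by Cannon et al. (2015), for a single group of data. Each future value \(x_{p}(t)\) is assigned its non-exceedance probability \(\tau_{t}\) within the future sample; the model-projected change at that quantile, \(x_{p}(t) - F_{h}^{-1}(\tau_{t})\), is then added to the observed quantile, giving \(\hat{x}_{p}(t) = F_{o}^{-1}(\tau_{t}) + x_{p}(t) - F_{h}^{-1}(\tau_{t})\). This is an illustration only: unlike xclim, it uses empirical quantiles from `np.quantile` directly rather than a fixed number of quantiles, applies no monthly grouping, and `qdm_additive` plus the synthetic data are hypothetical.

```python
import numpy as np


def qdm_additive(obs, hist, fut):
    """Additive quantile delta mapping of `fut`, given `obs` and model `hist` (1D arrays)."""
    obs, hist, fut = (np.asarray(a, dtype=float) for a in (obs, hist, fut))
    # empirical non-exceedance probability tau of each future value within the future sample
    ranks = np.argsort(np.argsort(fut))
    tau = (ranks + 0.5) / fut.size
    # model-projected change at that quantile: future value minus historical-model quantile
    delta = fut - np.quantile(hist, tau)
    # bias-corrected value: observed quantile at tau plus the preserved change
    return np.quantile(obs, tau) + delta


rng = np.random.default_rng(1)
hist = rng.normal(10.0, 8.0, size=3000)     # raw model, historical
obs = 0.5 * (hist - 10.0) + 12.0            # "observations": warmer mean, half the spread
fut = rng.normal(14.0, 8.0, size=3000)      # raw model, future (about +4 warming)

fut_qdm = qdm_additive(obs, hist, fut)
print("mean change, raw model:     ", round(fut.mean() - hist.mean(), 2))
print("mean change, after QDM:     ", round(fut_qdm.mean() - obs.mean(), 2))
print("std of obs / QDM future:    ", round(obs.std(), 2), round(fut_qdm.std(), 2))
```

The corrected future inherits the observed spread while keeping the raw model's projected change, which is the trend-preserving property described above.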

xclim.sdba implements the QDM method in the class xclim.sdba.adjustment.QuantileDeltaMapping (see its documentation). Every adjustment class in xclim.sdba.adjustment has a .train method, which estimates the quantiles and calculates the adjustment factors, and an .adjust method, which applies the bias correction to the provided xr.DataArray. This section demonstrates how to use the package, but you should also take the time to review the documentation for the adjustment method, as well as xclim’s generic examples of how to use xclim.sdba, before proceeding. You should also review the tips for applying bias-correction methods provided in Section 5.4.2 of this Guidebook.

from xclim import sdba

# estimate the quantiles and calculate the adjustment factors
QDM_trained = sdba.adjustment.QuantileDeltaMapping.train(
    # observational (reference) data
    tas_obs_noleap,
    # raw model historical data
    tas_hist_raw,
    # number of quantiles to estimate (see documentation)
    nquantiles = 50,
    # additive adjustment, for an interval variable (see documentation & Cannon et al. 2015)
    kind = "+",
    # separate adjustment for each month,
    # to correct for bias in the seasonal cycle (see documentation)
    group = 'time.month'
)

# apply the bias correction to the historical and SSP3-7.0 data
tas_hist_qdm = QDM_trained.adjust(tas_hist_raw,
                                  # method for interpolating between the nquantiles discrete quantile estimates
                                  interp = 'linear')

tas_ssp3_qdm = QDM_trained.adjust(tas_ssp3_raw,
                                  interp = 'linear')

6.1.6 Validating the Bias-Corrected Data

As a first check, let’s compare the daily climatologies and PDFs of the observed and adjusted historical data. The adjusted historical PDF should match the observed PDF very closely. Since the data are grouped by month when calculating the adjustment factors, the monthly climatologies should match closely, but some small bias may remain in the daily climatology. We could have chosen to group by time.dayofyear instead, but with only 31 years in each dataset, each individual day would have only 31 values, probably too small a sample to robustly characterize the whole distribution.

# calculate the daily climatologies for the QDM data
tas_hist_qdm_dailyclim = tas_hist_qdm.groupby('time.dayofyear').mean('time').compute()
tas_hist_qdm_dailyclim_stdev = tas_hist_qdm.groupby('time.dayofyear').std('time').compute()

tas_ssp3_qdm_dailyclim = tas_ssp3_qdm.groupby('time.dayofyear').mean('time').compute()
tas_ssp3_qdm_dailyclim_stdev = tas_ssp3_qdm.groupby('time.dayofyear').std('time').compute()

[Figure: daily climatologies of the station observations and the QDM-adjusted CESM2 historical and SSP3-7.0 data]

The daily climatology of the QDM historical data matches the station observations much more closely than the raw model did. Most notably, the one-\(\sigma\) shading shows greater overlap, indicating a better match of the interannual variability for each day of the year. The timing of the peak summertime temperatures also matches better than it did for the raw model.

The bias-adjusted future projections look largely similar to those from the raw model. There is still a very large warming signal in the peak summertime temperatures, though it does not stand out as far from the historical range as it did in the raw model. This is likely because the raw model has a positive bias in summertime temperatures, so the relative increase in the bias-adjusted data has a smaller magnitude.

[Figure: PDFs of daily mean temperature for the observations and the raw and QDM-adjusted CESM2 historical and SSP3-7.0 data]

Comparing the PDFs of the raw and bias-corrected model output, we can see the impact of applying QDM. The bias-adjusted historical simulation matches the observed PDF much more closely, though its low-temperature peak is still slightly higher than in the observations. This may be an artefact of the monthly grouping: we required the distribution for each month of the year to match individually, not the overall distribution. We can test whether the bias-adjusted data follow a significantly different distribution from the observations using the Kolmogorov-Smirnov test, as in Section 4.4.1. The null hypothesis of this test is that both samples being compared are drawn from the same underlying probability distribution. The p-value of 0.51 is high enough that we fail to reject the null hypothesis, so there isn’t sufficient evidence to claim that the two samples follow different distributions.
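The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes it by hand and checks it against `scipy.stats.ks_2samp`, using synthetic normal samples in place of `tas_obs_noleap` and `tas_hist_qdm`; the helper `ks_statistic` and the sample parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def ks_statistic(a, b):
    """Maximum absolute difference between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # empirical CDF of each sample evaluated at every pooled data point
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(7)
obs_like = rng.normal(3.0, 12.0, size=2000)       # stand-in for observed daily means (deg C)
adjusted_like = rng.normal(3.0, 12.0, size=2000)  # drawn from the same distribution
shifted = rng.normal(9.0, 12.0, size=2000)        # warmer by 6 deg C

for label, other in (("same distribution", adjusted_like), ("shifted", shifted)):
    result = stats.ks_2samp(obs_like, other)
    print(f"{label}: D = {ks_statistic(obs_like, other):.4f} "
          f"(scipy {result.statistic:.4f}), p = {result.pvalue:.3g}")
```

A large p-value, like the 0.51 found above, means the gap between the two empirical CDFs is no bigger than sampling noise alone would plausibly produce.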

The effect of the bias adjustment on the SSP3-7.0 projections is fairly minor, though it does weaken the low-temperature peak of the PDF, much like it does for the historical distribution. Whether this leads to any substantial change in the projections of CDDs is difficult to discern from looking at the PDF. All in all, the bias was fairly minimal for this model, so it’s no surprise that the effect of the bias adjustment is marginal.

6.1.7 Downscaled Projections of our Climate Indicator

Next, we will investigate the effect of the bias adjustment on the calculation of CDDs, both to validate the downscaled historical data and to assess the changes in the downscaled future projections.

[Figure: annual CDDs from the observations and the raw and QDM-adjusted CESM2 historical and SSP3-7.0 data]

The bias adjustment improves the model historical CDDs: the mean number of annual CDDs is now very close to the observed long-term mean. The number of future CDDs also increases as a result of the bias correction, but the projected change from the historical to the future period is essentially the same. This behavior might lead you to question whether the bias correction was worth it at all, but remember that this model had a very small bias in daily mean temperature at our study site. Other models, which we’ll investigate in the next section, may not behave the same way.