Analyzing the historical and projected changes in the precipitation amounts using CMIP6 model data

As an atmospheric data scientist, I would like to process the precipitation variable from the historical and projected CMIP6 climate data to answer specific questions such as:

  • How can I get a subset of CMIP6 data?

  • How do the global monthly precipitation amounts change between the historical and projected (ssp585)?

  • Do we observe any increase or decrease in the yearly total precipitation amounts areally averaged for Turkey?

  • To what extent our analysis can be questionable?

Open Data

For today’s analyses, we will be using the a subset of CMIP6 historical and projection data. The use of intake package (see documentation) makes the data-retrieving process easy for us, therefore we are going to make use of it.

Let’s begin with importing the necessary tools, which we will exploit in our analyses.

import pandas as pd 
import intake
import matplotlib.pyplot as plt
import numpy as np

Note that in addition to the packages above you will need the s3fs and intake-esm packages installed in your environment beforehand.

The intake package facilitates easily viewing the ESM’s CMIP6 climate model simulations stored at AWS or Google Cloud by making use of the open_esm_datastore function.

cmip_aws_link = "https://cmip6-pds.s3.amazonaws.com/pangeo-cmip6.json"
cmip_col = intake.open_esm_datastore(cmip_aws_link) # esm cmip6 collection
ERROR! Session/line number was not unique in database. History logging moved to new session 1928

cmip_col contains a collection of cmip6 model simulations using which you can view the information for each set of CMIP6 simulations

We can view the cmip_col collection as a dataframe

cmip_col.df.head(3)
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon va gn s3://cmip6-pds/CMIP6/HighResMIP/CMCC/CMCC-CM2-... NaN 20170706
1 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 Amon zg gn s3://cmip6-pds/CMIP6/HighResMIP/CMCC/CMCC-CM2-... NaN 20170706
2 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1 6hrPlev psl gn s3://cmip6-pds/CMIP6/HighResMIP/CMCC/CMCC-CM2-... NaN 20170706

Here you’ll find a lot of different columns, each of which corresponds to a specific information of a particular CMIP6 model simulation. See this documentation to learn more about what each of these column names refer to.

We are going to use a specific monthly CMIP6 precipitation data for both the historical and projection (sssp585) periods. Let’s search for the data we are going to utilize!

col_subsetted = cmip_col.search(require_all_on=['source_id'],
                                experiment_id=['historical', 'ssp585'],
                                member_id = 'r1i1p1f1',
                                table_id='Amon',
                                variable_id=['pr'],   
                                grid_label='gr')
col_subsetted

pangeo-cmip6 catalog with 16 dataset(s) from 16 asset(s):

unique
activity_id 2
institution_id 5
source_id 8
experiment_id 2
member_id 1
table_id 1
variable_id 1
grid_label 1
zstore 16
dcpp_init_year 0
version 13

Now it’s time to view the available historical and projection datasets for the the search we have made.

ds_dict = col_subsetted.to_dataset_dict(zarr_kwargs={'consolidated': True},
                                        storage_options={'token': 'anon'})
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
C:\Users\User\anaconda3\envs\cmip6\lib\site-packages\xarray\coding\times.py:527: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
C:\Users\User\anaconda3\envs\cmip6\lib\site-packages\xarray\coding\times.py:527: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
100.00% [16/16 00:03<00:00]
C:\Users\User\anaconda3\envs\cmip6\lib\site-packages\xarray\core\indexing.py:419: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  return np.asarray(array[self.key], dtype=None)

Now that we have a dictionary of CMIP6 xarray data, we will make use of 1 historical and 1 projection model simulation. You can check out the keys of ds_dict variable to see other available CMIP6 model simulations for the search we’ve made.

# historical model simulation
ds_hist = ds_dict['CMIP.NIMS-KMA.KACE-1-0-G.historical.Amon.gr'].isel(member_id=0, bnds=0)

# projection model simulation
ds_proj = ds_dict['ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp585.Amon.gr'].isel(member_id=0, bnds=0)

Let’s briefly see the content of our datasets

ds_hist
<xarray.Dataset>
Dimensions:    (lat: 144, lon: 192, time: 1980)
Coordinates:
  * lat        (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38
    lat_bnds   (lat) float64 dask.array<chunksize=(144,), meta=np.ndarray>
  * lon        (lon) float64 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
    lon_bnds   (lon) float64 dask.array<chunksize=(192,), meta=np.ndarray>
  * time       (time) object 1850-01-16 00:00:00 ... 2014-12-16 00:00:00
    time_bnds  (time) object dask.array<chunksize=(1980,), meta=np.ndarray>
    member_id  <U8 'r1i1p1f1'
Data variables:
    pr         (time, lat, lon) float32 dask.array<chunksize=(563, 144, 192), meta=np.ndarray>
Attributes: (12/51)
    Conventions:             CF-1.7 CMIP-6.2
    activity_id:             CMIP
    branch_method:           standard
    branch_time_in_child:    0.0
    branch_time_in_parent:   0.0
    cmor_version:            3.4.0
    ...                      ...
    variable_id:             pr
    variant_label:           r1i1p1f1
    netcdf_tracking_ids:     hdl:21.14100/8fc44423-1f8a-4780-ac8c-4d6e7fbd8884
    version_id:              v20190910
    intake_esm_varname:      ['pr']
    intake_esm_dataset_key:  CMIP.NIMS-KMA.KACE-1-0-G.historical.Amon.gr
ds_proj
<xarray.Dataset>
Dimensions:    (lat: 144, lon: 192, time: 1032)
Coordinates:
  * lat        (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38
    lat_bnds   (lat) float64 dask.array<chunksize=(144,), meta=np.ndarray>
  * lon        (lon) float64 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
    lon_bnds   (lon) float64 dask.array<chunksize=(192,), meta=np.ndarray>
  * time       (time) object 2015-01-16 00:00:00 ... 2100-12-16 00:00:00
    time_bnds  (time) object dask.array<chunksize=(1032,), meta=np.ndarray>
    member_id  <U8 'r1i1p1f1'
Data variables:
    pr         (time, lat, lon) float32 dask.array<chunksize=(451, 144, 192), meta=np.ndarray>
Attributes: (12/51)
    Conventions:             CF-1.7 CMIP-6.2
    activity_id:             ScenarioMIP
    branch_method:           standard
    branch_time_in_child:    0.0
    branch_time_in_parent:   0.0
    cmor_version:            3.4.0
    ...                      ...
    variable_id:             pr
    variant_label:           r1i1p1f1
    netcdf_tracking_ids:     hdl:21.14100/8b57555a-95ed-47b0-8cad-049c403e745a
    version_id:              v20190920
    intake_esm_varname:      ['pr']
    intake_esm_dataset_key:  ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp585.Amon.gr

How does the mean monthly January precipitation change between the historical and projection (ssp585) periods?

We can answer this question simply by groupbying the months and getting the average of this grouped data. Since we are interested in analyzing the January precipitation amounts, we can select the first month, which is January, using the .isel() method.

hist_mean_jan = ds_hist.pr.groupby('time.month').mean().isel(month=0) * 86400
proj_mean_jan = ds_proj.pr.groupby('time.month').mean().isel(month=0) * 86400

So here’s the question most of you might ask:

  • Why did we multiply the data with 86400?

Quite simple. The unit of our precipitation data is kg m-2 s-1, which corresponds to 86400 mm/day. Given that it will be easier for us to analyze precipitation in mm/day, we will be converting the units to mm/day.

ds_hist.pr.units
'kg m-2 s-1'

Let’s plot our findings!

(proj_mean_jan - hist_mean_jan).plot()
<matplotlib.collections.QuadMesh at 0x29c162e0af0>
../_images/cmip6-basic-analysis_27_1.png

The change in the precipitation amount might have looked more dramatic if we had used more specific temporal periods (e.g., (1900-1949 ; 2050-2099 averages)). Also note that our precipitation now has a unit of mm/seconds. So, you could resample daily the values to yearly in order to have a broader picture of the projected changes in precipitation.

How does the area-averaged total precipitation in Turkey change during the historical and projection (ssp585) periods?

Our aim is to calculate the area-averaged total precipitation for Turkey.

So what should we do first?

Firstly, we will be subsetting our data to the coordinates of Turkey. Then we are going to implement the resample method to transform our monthly values to yearly summed values.

# Subset the data to the coordinates of Turkey
ds_hist_tr = ds_hist.pr.sel(lat=slice(36,42), lon=slice(26,45)) * 86400 * 30
ds_proj_tr = ds_proj.pr.sel(lat=slice(36,42), lon=slice(26,45)) * 86400 * 30

# Resample monthly to yearly summed values and get the areal average
ds_hist_tr_yearly_averaged = ds_hist_tr.resample(time='1Y').sum().mean(dim=['lat','lon'])
ds_proj_tr_yearly_averaged = ds_proj_tr.resample(time='1Y').sum().mean(dim=['lat','lon'])
# Historical change in precipitation
ds_hist_tr_yearly_averaged.plot()
[<matplotlib.lines.Line2D at 0x29c13c3b640>]
../_images/cmip6-basic-analysis_32_1.png
# Projected change in precipitation
ds_proj_tr_yearly_averaged.plot()
[<matplotlib.lines.Line2D at 0x29c07aee040>]
../_images/cmip6-basic-analysis_33_1.png

To what extent our analysis can be questionable?

Remember that so far we used 1 historical and 1 projection data. We could use a larger set of model simulations to perform ensemble analyses, employing various statistical and probabilistic approaches.

Elaborate on the topics below:

  • What is CMIP6 data?

  • What are the meanings for ssp585 and rcp85?

  • Where are these CMIP6 model simulations used and presented?

See you until the next time!

Blog by:
Berkay DÖNMEZ : Linkedin | Github | Twitter | Instagram
Kutay DÖNMEZ : LinkedIn | Github | Twitter | Instagram

ABOUT US