Toy weather data

Here is an example of how to easily manipulate a toy weather dataset using xarray and other recommended Python libraries:

Shared setup:

    import numpy as np
    import pandas as pd
    import seaborn as sns  # noqa, pandas aware plotting library

    import xarray as xr

    np.random.seed(123)

    times = pd.date_range('2000-01-01', '2001-12-31', name='time')
    annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

    base = 10 + 15 * annual_cycle.reshape(-1, 1)
    tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
    tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

    ds = xr.Dataset({'tmin': (('time', 'location'), tmin_values),
                     'tmax': (('time', 'location'), tmax_values)},
                    {'time': times, 'location': ['IA', 'IN', 'IL']})
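The constants in the setup decide where the synthetic seasons fall. The following is a minimal sketch, using only numpy and pandas, that checks which days of the first year are coldest and warmest in the noise-free `base` signal. The names `first_year`, `coldest`, and `warmest` are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same time axis and seasonal signal as in the shared setup, without noise.
times = pd.date_range('2000-01-01', '2001-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))
base = 10 + 15 * annual_cycle

# Restrict to the first year and locate the extremes of the seasonal signal.
first_year = times.year == 2000
coldest = times[first_year][np.argmin(base[first_year])]
warmest = times[first_year][np.argmax(base[first_year])]

print('coldest day:', coldest.date())
print('warmest day:', warmest.date())
print('range of base:', base.min(), base.max())
```

With the phase shift of 0.28, the minimum lands in mid-January and the maximum in mid-July, and `base` swings between roughly -5 and 25. Adding 10 to `base` for `tmax` is what separates the two variables by about ten degrees on average.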

Examine a dataset with pandas and seaborn

    In [1]: ds
    Out[1]:
    <xarray.Dataset>
    Dimensions:   (location: 3, time: 731)
    Coordinates:
      * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
      * location  (location) <U2 'IA' 'IN' 'IL'
    Data variables:
        tmin      (time, location) float64 -8.037 -1.788 -3.932 ... -1.346 -4.544
        tmax      (time, location) float64 12.98 3.31 6.779 ... 6.636 3.343 3.805

    In [2]: df = ds.to_dataframe()

    In [3]: df.head()
    Out[3]:
                              tmin       tmax
    location time
    IA       2000-01-01  -8.037369  12.980549
             2000-01-02  -9.341157   0.447856
             2000-01-03 -12.139719   5.322699
             2000-01-04  -7.492914   1.889425
             2000-01-05  -0.447129   0.791176

    In [4]: df.describe()
    Out[4]:
                  tmin         tmax
    count  2193.000000  2193.000000
    mean      9.975426    20.108232
    std      10.963228    11.010569
    min     -13.395763    -3.506234
    25%      -0.040347     9.853905
    50%      10.060403    19.967409
    75%      20.083590    30.045588
    max      33.456060    43.271148

    In [5]: ds.mean(dim='location').to_dataframe().plot()
    Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7f34147b3a20>

../_images/examples_tmin_tmax_plot.png

    In [6]: sns.pairplot(df.reset_index(), vars=ds.data_vars)
    Out[6]: <seaborn.axisgrid.PairGrid at 0x7f3424685b38>

../_images/examples_pairplot.png
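The conversion to pandas flattens every (time, location) pair into one row, which is why `describe()` above reports 2193 = 731 × 3 values per variable. The sketch below shows the same thing on a deliberately tiny dataset. The names `small`, `df_small`, and `wide` are illustrative, and the order of the index levels may differ between xarray versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A 4-day, 2-location dataset that is small enough to inspect by eye.
times = pd.date_range('2000-01-01', periods=4, name='time')
small = xr.Dataset(
    {'tmin': (('time', 'location'), np.arange(8.0).reshape(4, 2))},
    {'time': times, 'location': ['IA', 'IN']},
)

# to_dataframe() yields one row per (time, location) combination,
# indexed by a pandas MultiIndex built from the dataset dimensions.
df_small = small.to_dataframe()
print(df_small.index.names)
print(len(df_small))

# Unstacking one index level recovers a wide table: rows are times,
# columns are locations.
wide = df_small['tmin'].unstack('location')
print(wide)
```

The long form is what `sns.pairplot` expects, while the wide form is often more convenient for line plots with one column per station.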

Probability of freeze by calendar month

    In [7]: freeze = (ds['tmin'] <= 0).groupby('time.month').mean('time')

    In [8]: freeze
    Out[8]:
    <xarray.DataArray 'tmin' (month: 12, location: 3)>
    array([[0.951613, 0.887097, 0.935484],
           [0.842105, 0.719298, 0.77193 ],
           [0.241935, 0.129032, 0.16129 ],
           [0.      , 0.      , 0.      ],
           [0.      , 0.      , 0.      ],
           [0.      , 0.      , 0.      ],
           [0.      , 0.      , 0.      ],
           [0.      , 0.      , 0.      ],
           [0.      , 0.      , 0.      ],
           [0.      , 0.016129, 0.      ],
           [0.333333, 0.35    , 0.233333],
           [0.935484, 0.854839, 0.822581]])
    Coordinates:
      * location  (location) <U2 'IA' 'IN' 'IL'
      * month     (month) int64 1 2 3 4 5 6 7 8 9 10 11 12

    In [9]: freeze.to_pandas().plot()
    Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x7f34259762e8>

../_images/examples_freeze_prob.png
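The freeze calculation works because the mean of a boolean array is the fraction of `True` values. A minimal sketch with hand-picked numbers (not taken from the dataset above) makes the arithmetic easy to follow. The names `is_freeze` and `freeze_fraction` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Four January days and two February days with hand-picked minimum temperatures.
times = pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03',
                        '2000-01-04', '2000-02-01', '2000-02-02'])
tmin = xr.DataArray([-1.0, 2.0, -3.0, 0.0, 1.0, 4.0],
                    coords={'time': times}, dims='time', name='tmin')

# The comparison gives True on freezing days (0 counts as a freeze here).
is_freeze = tmin <= 0

# Averaging booleans within each calendar month gives the fraction of
# freezing days: 3 of 4 in January, 0 of 2 in February.
freeze_fraction = is_freeze.groupby('time.month').mean('time')
print(freeze_fraction.values)
```

Applied to the two-year toy dataset, the same expression pools both Januaries, both Februaries, and so on, so each entry is an empirical frequency over all days sharing a calendar month.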

Monthly averaging

    In [10]: monthly_avg = ds.resample(time='1MS').mean()

    In [11]: monthly_avg.sel(location='IA').to_dataframe().plot(style='s-')
    Out[11]: <matplotlib.axes._subplots.AxesSubplot at 0x7f34258bae80>

../_images/examples_tmin_tmax_plot_mean.png

Note that MS here refers to Month-Start; M labels Month-End (the last day of the month).
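The difference between the two labelling conventions can be seen with plain pandas. This is a sketch under the assumption that an explicit `pd.offsets.MonthEnd()` object is an acceptable stand-in for the month-end alias; it is used here because some recent pandas releases prefer the alias `ME` over `M`. The series `s` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Three months of daily values; the values themselves do not matter here.
times = pd.date_range('2000-01-01', '2000-03-31', name='time')
s = pd.Series(np.arange(times.size, dtype=float), index=times)

# Month-start labelling: each monthly bin is stamped with its first day.
month_start = s.resample('MS').mean()

# Month-end labelling: the same bins, stamped with their last day.
month_end = s.resample(pd.offsets.MonthEnd()).mean()

print(month_start.index[0].date(), month_end.index[0].date())
print(np.allclose(month_start.values, month_end.values))
```

Both calls average over the same calendar months; only the timestamp attached to each bin changes, which matters when the result is later aligned with other time series.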

Calculate monthly anomalies

In climatology, “anomalies” refer to the difference between observations and typical weather for a particular season. Unlike observations, anomalies should not show any seasonal cycle.

    In [12]: climatology = ds.groupby('time.month').mean('time')

    In [13]: anomalies = ds.groupby('time.month') - climatology

    In [14]: anomalies.mean('location').to_dataframe()[['tmin', 'tmax']].plot()
    Out[14]: <matplotlib.axes._subplots.AxesSubplot at 0x7f342581f748>

../_images/examples_anomalies_plot.png
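Subtracting a per-month climatology from a grouped object matches each observation with the mean of its own calendar month. The sketch below checks the defining property on a noise-free seasonal signal: after the subtraction, the average anomaly within every calendar month is zero. The array `temps` and the helper variables are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of a purely seasonal temperature signal (no noise).
times = pd.date_range('2000-01-01', '2001-12-31', name='time')
seasonal = 10 + 15 * np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))
temps = xr.DataArray(seasonal, coords={'time': times}, dims='time', name='temp')

# Mean for each calendar month, then remove it group by group.
climatology = temps.groupby('time.month').mean('time')
anomalies = temps.groupby('time.month') - climatology

# Check with plain numpy: the anomalies average to zero within each month.
months = anomalies['time'].dt.month.values
monthly_means = np.array([anomalies.values[months == m].mean()
                          for m in range(1, 13)])
print(np.abs(monthly_means).max())

# The spread shrinks because the month-to-month cycle has been removed.
print(float(temps.std()), float(anomalies.std()))
```

A small residual trend remains inside each month, since a monthly climatology is a step function while the underlying cycle is smooth.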

Calculate standardized monthly anomalies

You can create standardized anomalies where the difference between the observations and the climatological monthly mean is divided by the climatological standard deviation.

    In [15]: climatology_mean = ds.groupby('time.month').mean('time')

    In [16]: climatology_std = ds.groupby('time.month').std('time')

    In [17]: stand_anomalies = xr.apply_ufunc(
       ....:     lambda x, m, s: (x - m) / s,
       ....:     ds.groupby('time.month'),
       ....:     climatology_mean, climatology_std)
       ....:

    In [18]: stand_anomalies.mean('location').to_dataframe()[['tmin', 'tmax']].plot()
    Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3425767cc0>

../_images/examples_standardized_anomalies_plot.png
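An alternative way to express the same standardization, shown here only as a sketch, is to apply a function to each monthly group with `map`. This assumes a reasonably recent xarray in which grouped objects provide `map`; the function `standardize` and the noisy array `temps` are hypothetical names introduced for this illustration and are not the approach used above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Seasonal signal plus noise, so each month has a non-zero spread.
rng = np.random.default_rng(0)
times = pd.date_range('2000-01-01', '2001-12-31', name='time')
seasonal = 10 + 15 * np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))
temps = xr.DataArray(seasonal + 3 * rng.standard_normal(times.size),
                     coords={'time': times}, dims='time', name='temp')


def standardize(group):
    """Subtract the group mean and divide by the group standard deviation."""
    return (group - group.mean('time')) / group.std('time')


# Apply the function to each calendar-month group and stitch the results
# back together along the time dimension.
stand_anomalies = temps.groupby('time.month').map(standardize)

# Check with numpy: each month now has mean 0 and standard deviation 1.
months = stand_anomalies['time'].dt.month.values
for m in (1, 7):
    values = stand_anomalies.values[months == m]
    print(m, values.mean(), values.std())
```

Both routes divide by the population standard deviation of each calendar month, so the result is unitless and comparable across seasons with different variability.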

Fill missing values with climatology

The fillna() method on grouped objects lets you easily fill missing values by group:

    # throw away the first half of every month
    In [19]: some_missing = ds.tmin.sel(time=ds['time.day'] > 15).reindex_like(ds)

    In [20]: filled = some_missing.groupby('time.month').fillna(climatology.tmin)

    In [21]: both = xr.Dataset({'some_missing': some_missing, 'filled': filled})

    In [22]: both
    Out[22]:
    <xarray.Dataset>
    Dimensions:       (location: 3, time: 731)
    Coordinates:
      * time          (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
      * location      (location) object 'IA' 'IN' 'IL'
        month         (time) int64 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
    Data variables:
        some_missing  (time, location) float64 nan nan nan ... 2.063 -1.346 -4.544
        filled        (time, location) float64 -5.163 -4.216 ... -1.346 -4.544

    In [23]: df = both.sel(time='2000').mean('location').reset_coords(drop=True).to_dataframe()

    In [24]: df[['filled', 'some_missing']].plot()
    Out[24]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3425752c18>

../_images/examples_filled.png
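To see exactly which value each gap receives, the sketch below repeats the grouped fill on two months of made-up data where the monthly means are easy to compute by hand. The array `full` and its values are assumptions for illustration; masking with `where` is used here instead of `sel(...).reindex_like(...)` and should produce the same pattern of missing values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# January and February 2000 with values 0, 1, 2, ... so that the monthly
# means are 15 (days 0..30) and 45 (days 31..59).
times = pd.date_range('2000-01-01', '2000-02-29', name='time')
full = xr.DataArray(np.arange(times.size, dtype=float),
                    coords={'time': times}, dims='time', name='tmin')

# Blank out the first 15 days of every month.
some_missing = full.where(full['time'].dt.day > 15)

# Climatology from the complete record, then fill each gap with the mean
# of its own calendar month.
climatology = full.groupby('time.month').mean('time')
filled = some_missing.groupby('time.month').fillna(climatology)

print(int(some_missing.isnull().sum()), int(filled.isnull().sum()))
print(filled.sel(time=pd.Timestamp('2000-01-01')).item())
print(filled.sel(time=pd.Timestamp('2000-02-01')).item())
```

Days that were never missing keep their original values; only the gaps take the monthly mean, which explains the flat segments at the start of each month in the plot above.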