Time series data
A major use case for xarray is multi-dimensional time-series data.Accordingly, we’ve copied many of features that make working with time-seriesdata in pandas such a joy to xarray. In most cases, we rely on pandas for thecore functionality.
Creating datetime64 data
xarray uses the numpy dtypes datetime64[ns]
and timedelta64[ns]
torepresent datetime data, which offer vectorized (if sometimes buggy) operationswith numpy and smooth integration with pandas.
To convert to or create regular arrays of datetime64
data, we recommendusing pandas.to_datetime()
and pandas.date_range()
:
- In [1]: pd.to_datetime(['2000-01-01', '2000-02-02'])
- Out[1]: DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)
- In [2]: pd.date_range('2000-01-01', periods=365)
- Out[2]:
- DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
- '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
- '2000-01-09', '2000-01-10',
- ...
- '2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
- '2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
- '2000-12-29', '2000-12-30'],
- dtype='datetime64[ns]', length=365, freq='D')
Alternatively, you can supply arrays of Python datetime
objects. These getconverted automatically when used as arguments in xarray objects:
- In [3]: import datetime
- In [4]: xr.Dataset({'time': datetime.datetime(2000, 1, 1)})
- Out[4]:
- <xarray.Dataset>
- Dimensions: ()
- Data variables:
- time datetime64[ns] 2000-01-01
When reading or writing netCDF files, xarray automatically decodes datetime andtimedelta arrays using CF conventions (that is, by using a units
attribute like 'days since 2000-01-01'
).
Note
When decoding/encoding datetimes for non-standard calendars or for datesbefore year 1678 or after year 2262, xarray uses the cftime library.It was previously packaged with the netcdf4-python
package under thename netcdftime
but is now distributed separately. cftime
is anoptional dependency of xarray.
You can manual decode arrays in this form by passing a dataset todecode_cf()
:
- In [5]: attrs = {'units': 'hours since 2000-01-01'}
- In [6]: ds = xr.Dataset({'time': ('time', [0, 1, 2, 3], attrs)})
- In [7]: xr.decode_cf(ds)
- Out[7]:
- <xarray.Dataset>
- Dimensions: (time: 4)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-01-01T03:00:00
- Data variables:
- *empty*
One unfortunate limitation of using datetime64[ns]
is that it limits thenative representation of dates to those that fall between the years 1678 and2262. When a netCDF file contains dates outside of these bounds, dates will bereturned as arrays of cftime.datetime
objects and a CFTimeIndex
will be used for indexing. CFTimeIndex
enables a subset ofthe indexing functionality of a pandas.DatetimeIndex
and is onlyfully compatible with the standalone version of cftime
(not the versionpackaged with earlier versions netCDF4
). See Non-standard calendars and dates outside the Timestamp-valid range for moreinformation.
Datetime indexing
xarray borrows powerful indexing machinery from pandas (see Indexing and selecting data).
This allows for several useful and succinct forms of indexing, particularly fordatetime64 data. For example, we support indexing with strings for singleitems and with the slice object:
- In [8]: time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)
- In [9]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
- In [10]: ds.sel(time='2000-01')
- Out[10]:
- <xarray.Dataset>
- Dimensions: (time: 744)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-01-31T23:00:00
- Data variables:
- foo (time) int64 0 1 2 3 4 5 6 7 8 ... 736 737 738 739 740 741 742 743
- In [11]: ds.sel(time=slice('2000-06-01', '2000-06-10'))
- Out[11]:
- <xarray.Dataset>
- Dimensions: (time: 240)
- Coordinates:
- * time (time) datetime64[ns] 2000-06-01 ... 2000-06-10T23:00:00
- Data variables:
- foo (time) int64 3648 3649 3650 3651 3652 ... 3883 3884 3885 3886 3887
You can also select a particular time by indexing with adatetime.time
object:
- In [12]: ds.sel(time=datetime.time(12))
- Out[12]:
- <xarray.Dataset>
- Dimensions: (time: 365)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01T12:00:00 ... 2000-12-30T12:00:00
- Data variables:
- foo (time) int64 12 36 60 84 108 132 ... 8628 8652 8676 8700 8724 8748
For more details, read the pandas documentation.
Datetime components
Similar to pandas, the components of datetime objects contained in agiven DataArray
can be quickly computed using a special .dt
accessor.
- In [13]: time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
- In [14]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 4)), 'time': time})
- In [15]: ds.time.dt.hour
- Out[15]:
- <xarray.DataArray 'hour' (time: 1460)>
- array([ 0, 6, 12, ..., 6, 12, 18])
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
- In [16]: ds.time.dt.dayofweek
- Out[16]:
- <xarray.DataArray 'dayofweek' (time: 1460)>
- array([5, 5, 5, ..., 5, 5, 5])
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
The .dt
accessor works on both coordinate dimensions as well asmulti-dimensional data.
xarray also supports a notion of “virtual” or “derived” coordinates fordatetime components implemented by pandas, including “year”, “month”,“day”, “hour”, “minute”, “second”, “dayofyear”, “week”, “dayofweek”, “weekday”and “quarter”:
- In [17]: ds['time.month']
- Out[17]:
- <xarray.DataArray 'month' (time: 1460)>
- array([ 1, 1, 1, ..., 12, 12, 12])
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
- In [18]: ds['time.dayofyear']
- Out[18]:
- <xarray.DataArray 'dayofyear' (time: 1460)>
- array([ 1, 1, 1, ..., 365, 365, 365])
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
For use as a derived coordinate, xarray adds 'season'
to the list ofdatetime components supported by pandas:
- In [19]: ds['time.season']
- Out[19]:
- <xarray.DataArray 'season' (time: 1460)>
- array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='<U3')
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
- In [20]: ds['time'].dt.season
- Out[20]:
- <xarray.DataArray 'season' (time: 1460)>
- array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='<U3')
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
The set of valid seasons consists of ‘DJF’, ‘MAM’, ‘JJA’ and ‘SON’, labeled bythe first letters of the corresponding months.
You can use these shortcuts with both Datasets and DataArray coordinates.
In addition, xarray supports rounding operations floor
, ceil
, and round
. These operations require that you supply a rounding frequency as a string argument.
- In [21]: ds['time'].dt.floor('D')
- Out[21]:
- <xarray.DataArray 'floor' (time: 1460)>
- array(['2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
- '2000-01-01T00:00:00.000000000', ..., '2000-12-30T00:00:00.000000000',
- '2000-12-30T00:00:00.000000000', '2000-12-30T00:00:00.000000000'],
- dtype='datetime64[ns]')
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
The .dt
accessor can also be used to generate formatted datetime stringsfor arrays utilising the same formatting as the standard datetime.strftime.
- In [22]: ds['time'].dt.strftime('%a, %b %d %H:%M')
- Out[22]:
- <xarray.DataArray 'strftime' (time: 1460)>
- array(['Sat, Jan 01 00:00', 'Sat, Jan 01 06:00', 'Sat, Jan 01 12:00', ...,
- 'Sat, Dec 30 06:00', 'Sat, Dec 30 12:00', 'Sat, Dec 30 18:00'],
- dtype=object)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
Resampling and grouped operations
Datetime components couple particularly well with grouped operations (seeGroupBy: split-apply-combine) for analyzing features that repeat over time. Here’s how tocalculate the mean by time of day:
- In [23]: ds.groupby('time.hour').mean()
- Out[23]:
- <xarray.Dataset>
- Dimensions: (hour: 4)
- Coordinates:
- * hour (hour) int64 0 6 12 18
- Data variables:
- foo (hour) float64 728.0 729.0 730.0 731.0
For upsampling or downsampling temporal resolutions, xarray offers aresample()
method building on the core functionalityoffered by the pandas method of the same name. Resample uses essentially thesame api as resample
in pandas.
For example, we can downsample our dataset from hourly to 6-hourly:
- In [24]: ds.resample(time='6H')
- Out[24]: <xarray.core.resample.DatasetResample at 0x7f33f2404668>
This will create a specialized Resample
object which saves informationnecessary for resampling. All of the reduction methods which work withResample
objects can also be used for resampling:
- In [25]: ds.resample(time='6H').mean()
- Out[25]:
- <xarray.Dataset>
- Dimensions: (time: 1460)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
- Data variables:
- foo (time) float64 0.0 1.0 2.0 3.0 ... 1.457e+03 1.458e+03 1.459e+03
You can also supply an arbitrary reduction function to aggregate over eachresampling group:
- In [26]: ds.resample(time='6H').reduce(np.mean)
- Out[26]:
- <xarray.Dataset>
- Dimensions: (time: 1460)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
- Data variables:
- foo (time) float64 0.0 1.0 2.0 3.0 ... 1.457e+03 1.458e+03 1.459e+03
For upsampling, xarray provides six methods: asfreq
, ffill
, bfill
, pad
,nearest
and interpolate
. interpolate
extends scipy.interpolate.interp1d
and supports all of its schemes. All of these resampling operations work on bothDataset and DataArray objects with an arbitrary number of dimensions.
In order to limit the scope of the methods ffill
, bfill
, pad
andnearest
the tolerance
argument can be set in coordinate units.Data that has indices outside of the given tolerance
are set to NaN
.
- In [27]: ds.resample(time='1H').nearest(tolerance='1H')
- Out[27]:
- <xarray.Dataset>
- Dimensions: (time: 8755)
- Coordinates:
- * time (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
- Data variables:
- foo (time) float64 0.0 0.0 nan nan nan ... nan nan 1.459e+03 1.459e+03
For more examples of using grouped operations on a time dimension, seeToy weather data.