- What’s New
- v0.12.3 (10 July 2019)
- v0.12.2 (29 June 2019)
- v0.12.1 (4 April 2019)
- v0.12.0 (15 March 2019)
- v0.11.3 (26 January 2019)
- v0.11.2 (2 January 2019)
- v0.11.1 (29 December 2018)
- v0.11.0 (7 November 2018)
- v0.10.9 (21 September 2018)
- v0.10.8 (18 July 2018)
- v0.10.7 (7 June 2018)
- v0.10.6 (31 May 2018)
- v0.10.4 (16 May 2018)
- v0.10.3 (13 April 2018)
- v0.10.2 (13 March 2018)
- v0.10.1 (25 February 2018)
- v0.10.0 (20 November 2017)
- v0.9.6 (8 June 2017)
- v0.9.5 (17 April, 2017)
- v0.9.3 (16 April, 2017)
- v0.9.2 (2 April 2017)
- v0.9.1 (30 January 2017)
- v0.9.0 (25 January 2017)
- v0.8.2 (18 August 2016)
- v0.8.1 (5 August 2016)
- v0.8.0 (2 August 2016)
- v0.7.2 (13 March 2016)
- v0.7.1 (16 February 2016)
- v0.7.0 (21 January 2016)
- v0.6.1 (21 October 2015)
- v0.6.0 (21 August 2015)
- v0.5.2 (16 July 2015)
- v0.5.1 (15 June 2015)
- v0.5 (1 June 2015)
- v0.4.1 (18 March 2015)
- v0.4 (2 March, 2015)
- v0.3.2 (23 December, 2014)
- v0.3.1 (22 October, 2014)
- v0.3 (21 September 2014)
- v0.2 (14 August 2014)
- v0.1.1 (20 May 2014)
- v0.1 (2 May 2014)
What’s New
v0.12.3 (10 July 2019)
New functions/methods
- New methods Dataset.to_stacked_array() and DataArray.to_unstacked_dataset() for reshaping Datasets of variables with different dimensions (GH1317). This is useful for feeding data from xarray into machine learning models, as described in Stacking different variables together. By Noah Brenowitz.
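As a rough illustration of the stacking round trip described above, the following sketch builds a small Dataset whose variables have different dimensions. The variable names, the new dimension name "z" and the sample dimension "x" are chosen here for illustration only.

```python
import numpy as np
import xarray as xr

# Two variables that share "x" but differ in their other dimensions
# (names are illustrative, not taken from the release notes).
arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords=[("x", ["a", "b"]), ("y", [0, 1, 2])],
)
ds = xr.Dataset({"a": arr, "b": arr.isel(y=0, drop=True)})

# Stack everything except the sample dimension "x" into a new dimension "z".
stacked = ds.to_stacked_array("z", sample_dims=["x"])

# Recover the original variables from the stacked array.
roundtripped = stacked.to_unstacked_dataset(dim="z")
```

The stacked array is two-dimensional (samples by features), which is the layout most machine learning libraries expect.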
Enhancements
- Support for renaming Dataset variables and dimensions independently with rename_vars() and rename_dims() (GH3026). By Julia Kent.
- Add scales, offsets, units and descriptions attributes to DataArray returned by open_rasterio() (GH3013). By Erle Carrara.
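A minimal sketch of the independent renaming described above, assuming a toy Dataset with one variable ("temp") on one dimension ("x"); both names are made up for the example.

```python
import xarray as xr

# Toy dataset; the names "temp" and "x" are illustrative.
ds = xr.Dataset({"temp": ("x", [1.0, 2.0, 3.0])})

# Rename only the data variable; the dimension "x" is untouched.
by_var = ds.rename_vars({"temp": "temperature"})

# Rename only the dimension; the variable name "temp" is untouched.
by_dim = ds.rename_dims({"x": "lon"})
```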
Bug fixes
- Resolved deprecation warnings from newer versions of matplotlib and dask.
- Compatibility fixes for the upcoming pandas 0.25 and NumPy 1.17 releases. By Stephan Hoyer.
- Fix summaries for multiindex coordinates (GH3079). By Jonas Hörsch.
- Fix HDF5 error that could arise when reading multiple groups from a file at once (GH2954). By Stephan Hoyer.
v0.12.2 (29 June 2019)
New functions/methods
- Two new functions, combine_nested() and combine_by_coords(), allow for combining datasets along any number of dimensions, instead of the one-dimensional list of datasets supported by concat().
  The new combine_nested will accept the datasets as a nested list-of-lists, and combine by applying a series of concat and merge operations. The new combine_by_coords instead uses the dimension coordinates of datasets to order them.
  open_mfdataset() can use either combine_nested or combine_by_coords to combine datasets along multiple dimensions, by specifying the argument combine='nested' or combine='by_coords'.
  The older function auto_combine() has been deprecated, because its functionality has been subsumed by the new functions. To avoid FutureWarnings, switch to using combine_nested or combine_by_coords (or set the combine argument in open_mfdataset). (GH2159) By Tom Nicholas.
- DataArray.rolling_exp() and Dataset.rolling_exp() added, similar to pandas’ pd.DataFrame.ewm method. Calling .mean on the resulting object will return an exponentially weighted moving average. By Maximilian Roos.
- New DataArray.str for string related manipulations, based on pandas.Series.str. By 0x0L.
- Added strftime method to .dt accessor, making it simpler to hand a datetime DataArray to other code expecting formatted dates and times. (GH2090). strftime() is also now available on CFTimeIndex. By Alan Brammer and Ryan May.
- quantile() is now a method of GroupBy objects (GH3018). By David Huard.
- Argument and return types are added to most methods on DataArray and Dataset, allowing static type checking both within xarray and external libraries. Type checking with mypy is enabled in CI (though not required yet). By Guido Imperiale and Maximilian Roos.
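The difference between the two new combine functions can be sketched on two small in-memory datasets. The dataset names and values below are invented for the example; only combine_by_coords, combine_nested and the concat_dim argument come from xarray itself.

```python
import xarray as xr

# Two tiles that sit next to each other along "x" (illustrative data).
west = xr.Dataset({"t": ("x", [1, 2])}, coords={"x": [0, 1]})
east = xr.Dataset({"t": ("x", [3, 4])}, coords={"x": [2, 3]})

# combine_by_coords orders the pieces using the "x" coordinate values,
# so the order of the input list does not matter.
by_coords = xr.combine_by_coords([east, west])

# combine_nested trusts the order of the (possibly nested) list and has
# to be told which dimension to concatenate along.
nested = xr.combine_nested([west, east], concat_dim="x")
```

Both calls produce the same four-element result here; they differ in whether the ordering information comes from the coordinates or from the list structure.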
Enhancements to existing functionality
- Add keepdims argument for reduce operations (GH2170) By Scott Wales.
- Enable @ operator for DataArray. This is equivalent to DataArray.dot() By Maximilian Roos.
- Add fill_value argument for reindex, align, and merge operations to enable custom fill values. (GH2876) By Zach Griffith.
- DataArray.transpose() now accepts a keyword argument transpose_coords which enables transposition of coordinates in the same way as Dataset.transpose(). DataArray.groupby(), DataArray.groupby_bins(), and DataArray.resample() now accept a keyword argument restore_coord_dims which keeps the order of the dimensions of multi-dimensional coordinates intact (GH1856). By Peter Hausamann.
- Clean up Python 2 compatibility in code (GH2950) By Guido Imperiale.
- Better warning message when supplying invalid objects to xr.merge (GH2948). By Mathias Hauser.
- Add errors keyword argument to Dataset.drop() and Dataset.drop_dims() that allows ignoring errors if a passed label or dimension is not in the dataset (GH2994). By Andrew Ross.
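A short sketch of three of the enhancements above (keepdims, the @ operator and fill_value), using an invented 2x3 array. The fill value of -1.0 is an arbitrary choice for the example.

```python
import numpy as np
import xarray as xr

# Illustrative 2x3 array with a labelled "x" dimension.
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [0, 1]},
)

# keepdims=True retains the reduced dimension with length 1.
kept = da.mean("y", keepdims=True)

# "@" contracts over the shared dimensions, like DataArray.dot().
product = da @ da

# fill_value replaces the default NaN for labels missing from the original.
padded = da.reindex(x=[0, 1, 2], fill_value=-1.0)
```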
IO related enhancements
- Implement load_dataset() and load_dataarray() as alternatives to open_dataset() and open_dataarray() to open, load into memory, and close files, returning the Dataset or DataArray. These functions are helpful for avoiding file-lock errors when trying to write to files opened using open_dataset() or open_dataarray(). (GH2887) By Dan Nowacki.
- It is now possible to extend existing Zarr datasets, by using mode='a' and the new append_dim argument in to_zarr(). By Jendrik Jördening, David Brochart, Ryan Abernathey and Shikhar Goenka.
- xr.open_zarr now accepts manually specified chunks with the chunks= parameter. auto_chunk=True is equivalent to chunks='auto' for backwards compatibility. The overwrite_encoded_chunks parameter is added to remove the original zarr chunk encoding. By Lily Wang.
- netCDF chunksizes are now only dropped when original_shape is different, not when it isn’t found. (GH2207) By Karel van de Plassche.
- Character arrays’ character dimension name decoding and encoding handled by var.encoding['char_dim_name'] (GH2895) By James McCreight.
- open_rasterio() now supports rasterio.vrt.WarpedVRT with custom transform, width and height (GH2864). By Julien Michel.
Bug fixes
- Rolling operations on xarray objects containing dask arrays could silently compute the incorrect result or use large amounts of memory (GH2940). By Stephan Hoyer.
- Don’t set encoding attributes on bounds variables when writing to netCDF. (GH2921) By Deepak Cherian.
- NetCDF4 output: variables with unlimited dimensions must be chunked (not contiguous) on output. (GH1849) By James McCreight.
- Indexing with an empty list creates an object with zero-length axis (GH2882) By Mayeul d’Avezac.
- Return correct count for scalar datetime64 arrays (GH2770) By Dan Nowacki.
- Fixed max, min exception when applied to a multiIndex (GH2923) By Ian Castleden
- A deep copy deep-copies the coords (GH1463) By Martin Pletcher.
- Increased support for missing_value (GH2871) By Deepak Cherian.
- Removed usages of pytest.config, which is deprecated (GH2988) By Maximilian Roos.
- Fixed performance issues with cftime installed (GH3000) By 0x0L.
- Replace incorrect usages of message in pytest assertions with match (GH3011) By Maximilian Roos.
- Add explicit pytest markers, now required by pytest (GH3032). By Maximilian Roos.
- Test suite fixes for newer versions of pytest (GH3011, GH3032). By Maximilian Roos and Stephan Hoyer.
v0.12.1 (4 April 2019)
Enhancements
- Allow expand_dims method to support inserting/broadcasting dimensions with size > 1. (GH2710) By Martin Pletcher.
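The two ways of inserting a dimension longer than one can be sketched as follows. The dimension names "time" and "member" are hypothetical and serve only to show the integer and sequence forms.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0], dims="x")

# An integer inserts a new dimension of that length (values are broadcast).
repeated = da.expand_dims(time=3)

# A sequence inserts a new dimension and uses the values as its coordinate.
labelled = da.expand_dims(member=["a", "b", "c"])
```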
Bug fixes
- Dataset.copy(deep=True) now creates a deep copy of the attrs (GH2835). By Andras Gefferth.
- Fix incorrect indexes resulting from various Dataset operations (e.g., swap_dims, isel, reindex, []) (GH2842, GH2856). By Stephan Hoyer.
v0.12.0 (15 March 2019)
Highlights include:
- Removed support for Python 2. This is the first version of xarray that is Python 3 only!
- New coarsen() and integrate() methods. See Coarsen large arrays and Computation using Coordinates for details.
- Many improvements to cftime support. See below for details.
Deprecations
- The compat argument to Dataset and the encoding argument to DataArray are deprecated and will be removed in a future release. (GH1188) By Maximilian Roos.
cftime related enhancements
- Resampling of standard and non-standard calendars indexed by CFTimeIndex is now possible. (GH2191). By Jwen Fai Low and Spencer Clark.
- Taking the mean of arrays of cftime.datetime objects, and by extension, use of coarsen() with cftime.datetime coordinates is now possible. By Spencer Clark.
- Internal plotting now supports cftime.datetime objects as time series. (GH2164) By Julius Busecke and Spencer Clark.
- cftime_range() now supports QuarterBegin and QuarterEnd offsets (GH2663). By Jwen Fai Low
- open_dataset() now accepts a use_cftime argument, which can be used to require that cftime.datetime objects are always used, or never used when decoding dates encoded with a standard calendar. This can be used to ensure consistent date types are returned when using open_mfdataset() (GH1263) and/or to silence serialization warnings raised if dates from a standard calendar are found to be outside the pandas.Timestamp-valid range (GH2754). By Spencer Clark.
- pandas.Series.dropna() is now supported for a pandas.Series indexed by a CFTimeIndex (GH2688). By Spencer Clark.
Other enhancements
- Added ability to open netcdf4/hdf5 file-like objects with open_dataset. Requires (h5netcdf>0.7 and h5py>2.9.0). (GH2781) By Scott Henderson
- Add data=False option to to_dict() methods. (GH2656) By Ryan Abernathey
- DataArray.coarsen() and Dataset.coarsen() are newly added. See Coarsen large arrays for details. (GH2525) By Keisuke Fujii.
- Upsampling an array via interpolation with resample is now dask-compatible, as long as the array is not chunked along the resampling dimension. By Spencer Clark.
- xarray.testing.assert_equal() and xarray.testing.assert_identical() now provide a more detailed report showing what exactly differs between the two objects (dimensions / coordinates / variables / attributes) (GH1507). By Benoit Bovy.
- Add tolerance option to resample() methods bfill, pad, nearest. (GH2695) By Hauke Schulz.
- DataArray.integrate() and Dataset.integrate() are newly added. See Computation using Coordinates for details. (GH1332) By Keisuke Fujii.
- Added drop_dims() (GH1949). By Kevin Squire.
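A small sketch of the two new computation methods, using the line y = x sampled at six points so the results are easy to check by hand. The data and the window size of 2 are chosen for illustration.

```python
import numpy as np
import xarray as xr

# y = x sampled at x = 0, 1, ..., 5 (illustrative data).
x = np.arange(6.0)
da = xr.DataArray(x, dims="x", coords={"x": x})

# Average consecutive pairs of points along "x".
coarse = da.coarsen(x=2).mean()

# Trapezoidal integration of y = x over [0, 5], using the "x" coordinate.
area = da.integrate("x")
```

coarsen() reduces fixed-size blocks of points, while integrate() uses the coordinate values, so unevenly spaced coordinates are handled by the latter.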
Bug fixes
- Silenced warnings that appear when using pandas 0.24. By Stephan Hoyer
- Interpolating via resample now internally specifies bounds_error=False as an argument to scipy.interpolate.interp1d, allowing for interpolation from higher frequencies to lower frequencies. Datapoints outside the bounds of the original time coordinate are now filled with NaN (GH2197). By Spencer Clark.
- Line plots with the x argument set to a non-dimensional coord now plot the correct data for 1D DataArrays. (GH27251). By Tom Nicholas.
- Subtracting a scalar cftime.datetime object from a CFTimeIndex now results in a pandas.TimedeltaIndex instead of raising a TypeError (GH2671). By Spencer Clark.
- backend_kwargs are no longer ignored when using open_dataset with pynio engine (GH2380) By Jonathan Joyce.
- Fix open_rasterio creating a WKT CRS instead of PROJ.4 with rasterio 1.0.14+ (GH2715). By David Hoese.
- Masking data arrays with xarray.DataArray.where() now returns an array with the name of the original masked array (GH2748 and GH2457). By Yohai Bar-Sinai.
- Fixed error when trying to reduce a DataArray using a function which does not require an axis argument. (GH2768) By Tom Nicholas.
- Concatenating a sequence of DataArray with varying names sets the name of the output array to None, instead of the name of the first input array. If the names are the same it sets the name to that, instead of to the name of the first DataArray in the list as it did before. (GH2775). By Tom Nicholas.
- Per CF conventions, specifying 'standard' as the calendar type in cftime_range() now correctly refers to the 'gregorian' calendar instead of the 'proleptic_gregorian' calendar (GH2761).
v0.11.3 (26 January 2019)
Bug fixes
- Saving files with times encoded with reference dates with timezones (e.g. ‘2000-01-01T00:00:00-05:00’) no longer raises an error (GH2649). By Spencer Clark.
- Fixed performance regression with open_mfdataset (GH2662). By Tom Nicholas.
- Fixed supplying an explicit dimension in the concat_dim argument to open_mfdataset (GH2647). By Ben Root.
v0.11.2 (2 January 2019)
Removes inadvertently introduced setup dependency on pytest-runner (GH2641). Otherwise, this release is exactly equivalent to 0.11.1.
Warning
This is the last xarray release that will support Python 2.7. Future releases will be Python 3 only, but older versions of xarray will always be available for Python 2.7 users. For more details, see:
v0.11.1 (29 December 2018)
This minor release includes a number of enhancements and bug fixes, and two (slightly) breaking changes.
Breaking changes
- Minimum rasterio version increased from 0.36 to 1.0 (for open_rasterio)
- Time bounds variables are now also decoded according to CF conventions (GH2565). The previous behavior was to decode them only if they had specific time attributes, now these attributes are copied automatically from the corresponding time coordinate. This might break downstream code that was relying on these variables not being decoded. By Fabien Maussion.
Enhancements
- Ability to read and write consolidated metadata in zarr stores (GH2558). By Ryan Abernathey.
- CFTimeIndex uses slicing for string indexing when possible (like pandas.DatetimeIndex), which avoids unnecessary copies. By Stephan Hoyer
- Enable passing rasterio.io.DatasetReader or rasterio.vrt.WarpedVRT to open_rasterio instead of file path string. Allows for in-memory reprojection, see (GH2588). By Scott Henderson.
- Like pandas.DatetimeIndex, CFTimeIndex now supports “dayofyear” and “dayofweek” accessors (GH2597). Note this requires a version of cftime greater than 1.0.2. By Spencer Clark.
- The option 'warn_for_unclosed_files' (False by default) has been added to allow users to enable a warning when files opened by xarray are deallocated but were not explicitly closed. This is mostly useful for debugging; we recommend enabling it in your test suites if you use xarray for IO. By Stephan Hoyer
- Support Dask HighLevelGraphs by Matthew Rocklin.
- DataArray.resample() and Dataset.resample() now support the loffset kwarg just like Pandas. By Deepak Cherian
- Datasets are now guaranteed to have a 'source' encoding, so the source file name is always stored (GH2550). By Tom Nicholas.
- The apply methods for DatasetGroupBy, DataArrayGroupBy, DatasetResample and DataArrayResample now support passing positional arguments to the applied function as a tuple to the args argument. By Matti Eskelinen.
- 0d slices of ndarrays are now obtained directly through indexing, rather than extracting and wrapping a scalar, avoiding unnecessary copying. By Daniel Wennberg.
- Added support for fill_value with DataArray.shift() and Dataset.shift() By Maximilian Roos
Bug fixes
- Ensure files are automatically closed, if possible, when no longer referenced by a Python variable (GH2560). By Stephan Hoyer
- Fixed possible race conditions when reading/writing to disk in parallel (GH2595). By Stephan Hoyer
- Fix h5netcdf saving scalars with filters or chunks (GH2563). By Martin Raspaud.
- Fix parsing of _Unsigned attribute set by OPENDAP servers. (GH2583). By Deepak Cherian
- Fix failure in time encoding when exporting to netCDF with versions of pandas less than 0.21.1 (GH2623). By Spencer Clark.
- Fix MultiIndex selection to update label and level (GH2619). By Keisuke Fujii.
v0.11.0 (7 November 2018)
Breaking changes
Finished deprecations (changed behavior with this release):
- Dataset.T has been removed as a shortcut for Dataset.transpose(). Call Dataset.transpose() directly instead.
- Iterating over a Dataset now includes only data variables, not coordinates. Similarly, calling len and bool on a Dataset now includes only data variables.
- DataArray.__contains__ (used by Python’s in operator) now checks array data, not coordinates.
- The old resample syntax from before xarray 0.10, e.g., data.resample('1D', dim='time', how='mean'), is no longer supported and will raise an error in most cases. You need to use the new resample syntax instead, e.g., data.resample(time='1D').mean() or data.resample({'time': '1D'}).mean().
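The changed iteration behaviour and the new resample syntax can be sketched on a four-step, twice-daily series. The variable name "temp" and the timestamps are invented for the example.

```python
import numpy as np
import xarray as xr

# Four timestamps, 12 hours apart (illustrative data).
times = np.array(
    ["2000-01-01T00", "2000-01-01T12", "2000-01-02T00", "2000-01-02T12"],
    dtype="datetime64[ns]",
)
ds = xr.Dataset({"temp": ("time", [0.0, 1.0, 2.0, 3.0])}, coords={"time": times})

# Iterating over (or taking len() of) a Dataset only sees data variables,
# so the "time" coordinate is not included.
names = list(ds)

# New-style resampling: keyword form and the equivalent dictionary form.
daily_kw = ds["temp"].resample(time="1D").mean()
daily_dict = ds["temp"].resample({"time": "1D"}).mean()
```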
New deprecations (behavior will be changed in xarray 0.12):
- Reduction of DataArray.groupby() and DataArray.resample() without dimension argument will change in the next release. For now, a FutureWarning is emitted. By Keisuke Fujii.
- The inplace kwarg of a number of DataArray and Dataset methods is being deprecated and will be removed in the next release. By Deepak Cherian.
Refactored storage backends:
- Xarray’s storage backends now automatically open and close files when necessary, rather than requiring opening a file with autoclose=True. A global least-recently-used cache is used to store open files; the default limit of 128 open files should suffice in most cases, but can be adjusted if necessary with xarray.set_options(file_cache_maxsize=…). The autoclose argument to open_dataset and related functions has been deprecated and is now a no-op.
  This change, along with an internal refactor of xarray’s storage backends, should significantly improve performance when reading and writing netCDF files with Dask, especially when working with many files or using Dask Distributed. By Stephan Hoyer
Support for non-standard calendars used in climate science:
- Xarray will now always use cftime.datetime objects, rather than by default trying to coerce them into np.datetime64[ns] objects. A CFTimeIndex will be used for indexing along time coordinates in these cases.
- A new method to_datetimeindex() has been added to aid in converting from a CFTimeIndex to a pandas.DatetimeIndex for the remaining use-cases where using a CFTimeIndex is still a limitation (e.g. for resample or plotting).
- Setting the enable_cftimeindex option is now a no-op and emits a FutureWarning.
Enhancements
- xarray.DataArray.plot.line() can now accept multidimensional coordinate variables as input. hue must be a dimension name in this case. (GH2407) By Deepak Cherian.
- Added support for Python 3.7. (GH2271). By Joe Hamman.
- Added support for plotting data with pandas.Interval coordinates, such as those created by groupby_bins() By Maximilian Maahn.
- Added CFTimeIndex.shift() for shifting the values of a CFTimeIndex by a specified frequency. (GH2244). By Spencer Clark.
- Added support for using cftime.datetime coordinates with DataArray.differentiate(), Dataset.differentiate(), DataArray.interp(), and Dataset.interp(). By Spencer Clark
- There is now a global option to either always keep or always discard dataset and dataarray attrs upon operations. The option is set with xarray.set_options(keep_attrs=True), and the default is to use the old behaviour. By Tom Nicholas.
- Added a new backend for the GRIB file format based on ECMWF cfgrib python driver and ecCodes C-library. (GH2475) By Alessandro Amici, sponsored by ECMWF.
- Resample now supports a dictionary mapping from dimension to frequency as its first argument, e.g., data.resample({'time': '1D'}).mean(). This is consistent with other xarray functions that accept either dictionaries or keyword arguments. By Stephan Hoyer.
- The preferred way to access tutorial data is now to load it lazily with xarray.tutorial.open_dataset(). xarray.tutorial.load_dataset() calls Dataset.load() prior to returning (and is now deprecated). This was changed in order to facilitate using tutorial datasets with dask. By Joe Hamman.
- DataArray can now use xr.set_options(keep_attrs=True) and retain attributes in binary operations, such as (+, -, *, /). Default behaviour is unchanged (attributes will be dismissed). By Michael Blaschek
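A minimal sketch of the global keep_attrs option, used here as a context manager. The "units" attribute and its value are invented for the example.

```python
import xarray as xr

# Illustrative array carrying a single attribute.
da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "K"})

# Inside the context manager, attrs are retained by reductions and by
# binary operations.
with xr.set_options(keep_attrs=True):
    mean_kept = da.mean()
    shifted_kept = da + 1.0
```

Using set_options as a context manager limits the change to one block; calling it as a plain function changes the option globally.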
Bug fixes
- FacetGrid now properly uses the cbar_kwargs keyword argument. (GH1504, GH1717) By Deepak Cherian.
- Addition and subtraction operators used with a CFTimeIndex now preserve the index’s type. (GH2244). By Spencer Clark.
- We now properly handle arrays of datetime.datetime and datetime.timedelta provided as coordinates. (GH2512) By Deepak Cherian.
- xarray.DataArray.roll correctly handles multidimensional arrays. (GH2445) By Keisuke Fujii.
- xarray.plot() now properly accepts a norm argument and does not override the norm’s vmin and vmax. (GH2381) By Deepak Cherian.
- xarray.DataArray.std() now correctly accepts ddof keyword argument. (GH2240) By Keisuke Fujii.
- Restore matplotlib’s default of plotting dashed negative contours when a single color is passed to DataArray.contour() e.g. colors='k'. By Deepak Cherian.
- Fix a bug that caused some indexing operations on arrays opened with open_rasterio to error (GH2454). By Stephan Hoyer.
- Subtracting one CFTimeIndex from another now returns a pandas.TimedeltaIndex, analogous to the behavior for DatetimeIndexes (GH2484). By Spencer Clark.
- Adding a TimedeltaIndex to, or subtracting a TimedeltaIndex from a CFTimeIndex is now allowed (GH2484). By Spencer Clark.
- Avoid use of Dask’s deprecated get= parameter in tests by Matthew Rocklin.
- An OverflowError is now accurately raised and caught during the encoding process if a reference date is used that is so distant that the dates must be encoded using cftime rather than NumPy (GH2272). By Spencer Clark.
- Chunked datasets can now roundtrip to Zarr storage continually with to_zarr and open_zarr (GH2300). By Lily Wang.
v0.10.9 (21 September 2018)
This minor release contains a number of backwards compatible enhancements.
Announcements of note:
- Xarray is now a NumFOCUS fiscally sponsored project! Read the announcement for more details.
- We have a new Development roadmap that outlines our future development plans.
- Dataset.apply now properly documents the way func is called. By Matti Eskelinen.
Enhancements
- DataArray.differentiate() and Dataset.differentiate() are newly added. (GH1332) By Keisuke Fujii.
- Default colormap for sequential and divergent data can now be set via set_options() (GH2394) By Julius Busecke.
- min_count option is newly supported in DataArray.sum(), DataArray.prod(), Dataset.sum() and Dataset.prod(). (GH2230) By Keisuke Fujii.
- plot() now accepts the kwargs xscale, yscale, xlim, ylim, xticks, yticks just like Pandas. Also xincrease=False, yincrease=False now use matplotlib’s axis inverting methods instead of setting limits. By Deepak Cherian. (GH2224)
- DataArray coordinates and Dataset coordinates and data variables are now displayed as a b … y z rather than a b c d …. (GH1186) By Seth P.
- A new CFTimeIndex-enabled cftime_range() function for use in generating dates from standard or non-standard calendars. By Spencer Clark.
- When interpolating over a datetime64 axis, you can now provide a datetime string instead of a datetime64 object. E.g. da.interp(time='1991-02-01') (GH2284) By Deepak Cherian.
- A clear error message is now displayed if a set or dict is passed in place of an array (GH2331) By Maximilian Roos.
- Applying unstack to a large DataArray or Dataset is now much faster if the MultiIndex has not been modified after stacking the indices. (GH1560) By Maximilian Maahn.
- You can now control whether or not to offset the coordinates when using the roll method and the current behavior, coordinates rolled by default, raises a deprecation warning unless explicitly setting the keyword argument. (GH1875) By Andrew Huang.
- You can now call unstack without arguments to unstack every MultiIndex in a DataArray or Dataset. By Julia Signell.
- Added the ability to pass a data kwarg to copy to create a new object with the same metadata as the original object but using new values. By Julia Signell.
Bug fixes
- xarray.plot.imshow() correctly uses the origin argument. (GH2379) By Deepak Cherian.
- Fixed DataArray.to_iris() failure while creating DimCoord by falling back to creating AuxCoord. Fixed dependency on var_name attribute being set. (GH2201) By Thomas Voigt.
- Fixed a bug in zarr backend which prevented use with datasets with invalid chunk size encoding after reading from an existing store (GH2278). By Joe Hamman.
- Tests can be run in parallel with pytest-xdist By Tony Tung.
- Follow up the renamings in dask; from dask.ghost to dask.overlap By Keisuke Fujii.
- Now raises a ValueError when there is a conflict between dimension names and level names of MultiIndex. (GH2299) By Keisuke Fujii.
- Now xr.apply_ufunc() raises a ValueError when the size of input_core_dims is inconsistent with the number of arguments. (GH2341) By Keisuke Fujii.
- Fixed Dataset.filter_by_attrs() behavior not matching netCDF4.Dataset.get_variables_by_attributes(). When more than one key=value is passed into Dataset.filter_by_attrs() it will now return a Dataset with variables which pass all the filters. (GH2315) By Andrew Barna.
v0.10.8 (18 July 2018)
Breaking changes
Xarray no longer supports Python 3.4. Additionally, the minimum supported versions of the following dependencies have been updated and/or clarified:
Pandas: 0.18 -> 0.19
NumPy: 1.11 -> 1.12
Dask: 0.9 -> 0.16
Matplotlib: unspecified -> 1.5
(GH2204). By Joe Hamman.
Enhancements
- DataArray.interp_like() and Dataset.interp_like() methods are newly added. (GH2218) By Keisuke Fujii.
- Added support for curvilinear and unstructured generic grids to to_cdms2() and from_cdms2() (GH2262). By Stephane Raynaud.
Bug fixes
- Fixed a bug in zarr backend which prevented use with datasets with incomplete chunks in multiple dimensions (GH2225). By Joe Hamman.
- Fixed a bug in to_netcdf() which prevented writing datasets when the arrays had different chunk sizes (GH2254). By Mike Neish.
- Fixed masking during the conversion to cdms2 objects by to_cdms2() (GH2262). By Stephane Raynaud.
- Fixed a bug in 2D plots which incorrectly raised an error when 2D coordinates weren’t monotonic (GH2250). By Fabien Maussion.
- Fixed warning raised in to_netcdf() due to deprecation of effective_get in dask (GH2238). By Joe Hamman.
v0.10.7 (7 June 2018)
Enhancements
- Plot labels now make use of metadata that follow CF conventions (GH2135). By Deepak Cherian and Ryan Abernathey.
- Line plots now support faceting with row and col arguments (GH2107). By Yohai Bar Sinai.
- DataArray.interp() and Dataset.interp() methods are newly added. See interpolating values with interp for details. (GH2079) By Keisuke Fujii.
Bug fixes
- Fixed a bug in rasterio backend which prevented use with distributed. The rasterio backend now returns pickleable objects (GH2021). By Joe Hamman.
v0.10.6 (31 May 2018)
The minor release includes a number of bug-fixes and backwards compatible enhancements.
Enhancements
- New PseudoNetCDF backend for many Atmospheric data formats including GEOS-Chem, CAMx, NOAA arlpacked bit and many others. See Formats supported by PseudoNetCDF for more details. By Barron Henderson.
- The Dataset constructor now aligns DataArray arguments in data_vars to indexes set explicitly in coords, where previously an error would be raised. (GH674) By Maximilian Roos.
- sel(), isel() & reindex(), (and their Dataset counterparts) now support supplying a dict as a first argument, as an alternative to the existing approach of supplying kwargs. This allows for more robust behavior of dimension names which conflict with other keyword names, or are not strings. By Maximilian Roos.
- rename() now supports supplying **kwargs, as an alternative to the existing approach of supplying a dict as the first argument. By Maximilian Roos.
- cumsum() and cumprod() now support aggregation over multiple dimensions at the same time. This is the default behavior when dimensions are not specified (previously this raised an error). By Stephan Hoyer
- DataArray.dot() and dot() are partly supported with older dask<0.17.4. (related to GH2203) By Keisuke Fujii.
- Xarray now uses Versioneer to manage its version strings. (GH1300). By Joe Hamman.
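The dictionary and keyword calling conventions described above can be compared side by side. The array, its coordinate values and the new name "lon" are made up for the example.

```python
import numpy as np
import xarray as xr

# Illustrative 2x3 array with labelled dimensions.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [1, 2, 3]},
)

# Dictionary and keyword forms select the same data.
by_dict = da.sel({"x": 10})
by_kwarg = da.sel(x=10)

# isel accepts a dictionary too.
first_column = da.isel({"y": 0})

# rename accepts keyword arguments as an alternative to a dict.
renamed = da.rename(x="lon")
```

The dictionary form is the one to reach for when a dimension name is not a valid Python identifier or clashes with another keyword argument of the method.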
Bug fixes
- Fixed a regression in 0.10.4, where explicitly specifying dtype='S1' or dtype=str in encoding with to_netcdf() raised an error (GH2149). By Stephan Hoyer
- apply_ufunc() now directly validates output variables (GH1931). By Stephan Hoyer.
- Fixed a bug where to_netcdf(…, unlimited_dims='bar') yielded NetCDF files with spurious 0-length dimensions (i.e. b, a, and r) (GH2134). By Joe Hamman.
- Removed spurious warnings with Dataset.update(Dataset) (GH2161) and array.equals(array) when array contains NaT (GH2162). By Stephan Hoyer.
- Aggregations with Dataset.reduce() (including mean, sum, etc) no longer drop unrelated coordinates (GH1470). Also fixed a bug where non-scalar data-variables that did not include the aggregation dimension were improperly skipped. By Stephan Hoyer
- Fix stack() with non-unique coordinates on pandas 0.23 (GH2160). By Stephan Hoyer
- Selecting data indexed by a length-1 CFTimeIndex with a slice of strings now behaves as it does when using a length-1 DatetimeIndex (i.e. it no longer falsely returns an empty array when the slice includes the value in the index) (GH2165). By Spencer Clark.
- Fix DataArray.groupby().reduce() mutating coordinates on the input array when grouping over dimension coordinates with duplicated entries (GH2153). By Stephan Hoyer
- Fix Dataset.to_netcdf() being unable to create a group with engine="h5netcdf" (GH2177). By Stephan Hoyer
v0.10.4 (16 May 2018)
The minor release includes a number of bug-fixes and backwards compatible enhancements. A highlight is CFTimeIndex, which offers support for non-standard calendars used in climate modeling.
Documentation
- New FAQ entry, faq.other_projects. By Deepak Cherian.
- Assigning values with indexing now includes examples on how to select and assign values to a DataArray with .loc. By Chiara Lepore.
Enhancements
- Add an option for using a CFTimeIndex for indexing times with non-standard calendars and/or outside the Timestamp-valid range; this index enables a subset of the functionality of a standard pandas.DatetimeIndex. See Non-standard calendars and dates outside the Timestamp-valid range for full details. (GH789, GH1084, GH1252) By Spencer Clark with help from Stephan Hoyer.
- Allow for serialization of cftime.datetime objects (GH789, GH1084, GH2008, GH1252) using the standalone cftime library. By Spencer Clark.
- Support writing lists of strings as netCDF attributes (GH2044). By Dan Nowacki.
- to_netcdf() with engine='h5netcdf' now accepts h5py encoding settings compression and compression_opts, along with the NetCDF4-Python style settings gzip=True and complevel. This allows using any compression plugin installed in hdf5, e.g. LZF (GH1536). By Guido Imperiale.
- dot() on dask-backed data will now call dask.array.einsum(). This greatly boosts speed and allows chunking on the core dims. The function now requires dask >= 0.17.3 to work on dask-backed data (GH2074). By Guido Imperiale.
- plot.line() learned new kwargs: xincrease, yincrease that change the direction of the respective axes. By Deepak Cherian.
- Added the parallel option to open_mfdataset(). This option uses dask.delayed to parallelize the open and preprocessing steps within open_mfdataset. This is expected to provide performance improvements when opening many files, particularly when used in conjunction with dask’s multiprocessing or distributed schedulers (GH1981). By Joe Hamman.
- New compute option in to_netcdf(), to_zarr(), and save_mfdataset() to allow for the lazy computation of netCDF and zarr stores. This feature is currently only supported by the netCDF4 and zarr backends. (GH1784). By Joe Hamman.
Bug fixes
- ValueError is raised when coordinates with the wrong size are assigned to a DataArray. (GH2112) By Keisuke Fujii.
- Fixed a bug in rolling() with bottleneck. Also, fixed a bug in rolling an integer dask array. (GH2113) By Keisuke Fujii.
- Fixed a bug where keep_attrs=True flag was neglected if apply_ufunc() was used with Variable. (GH2114) By Keisuke Fujii.
- When assigning a DataArray to Dataset, any conflicted non-dimensional coordinates of the DataArray are now dropped. (GH2068) By Keisuke Fujii.
- Better error handling in open_mfdataset (GH2077). By Stephan Hoyer.
- plot.line() does not call autofmt_xdate() anymore. Instead it changes the rotation and horizontal alignment of labels without removing the x-axes of any other subplots in the figure (if any). By Deepak Cherian.
- Colorbar limits are now determined by excluding ±Infs too. By Deepak Cherian. By Joe Hamman.
- Fixed to_iris to maintain lazy dask array after conversion (GH2046). By Alex Hilson and Stephan Hoyer.
v0.10.3 (13 April 2018)
The minor release includes a number of bug-fixes and backwards compatible enhancements.
Enhancements
isin()
andisin()
methods,which test each value in the array for whether it is contained in thesupplied list, returning a bool array. See Selecting values with isinfor full details. Similar to thenp.isin
function.By Maximilian Roos.Some speed improvement to construct
DataArrayRolling
object (GH1993)By Keisuke Fujii.Handle variables with different values for
missing_value
and_FillValue
by masking values for both attributes; previously thisresulted in aValueError
. (GH2016)By Ryan May.
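As a minimal sketch of the isin() method described above (the array contents and variable names here are illustrative assumptions, not taken from the release notes):

```python
import xarray as xr

# Illustrative data; any 1-D or N-D array works the same way.
da = xr.DataArray([1, 2, 3, 4], dims="x")

# Element-wise membership test, analogous to np.isin.
mask = da.isin([2, 4])

# The boolean result can be passed to where() to mask non-members with NaN.
masked = da.where(mask)
```

The result keeps the dimensions of the original array, so it can be used directly as a condition in other xarray operations.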
Bug fixes
Fixed
decode_cf
function to operate lazily on dask arrays(GH1372). By Ryan Abernathey.Fixed labeled indexing with slice bounds given by xarray objects withdatetime64 or timedelta64 dtypes (GH1240).By Stephan Hoyer.
Attempting to convert an xarray.Dataset into a numpy array now raises aninformative error message.By Stephan Hoyer.
Fixed a bug in decode_cf_datetime where
int32
arrays weren’t parsedcorrectly (GH2002).By Fabien Maussion.When calling xr.auto_combine() or xr.open_mfdataset() with a concat_dim,the resulting dataset will have that one-element dimension (it wassilently dropped, previously) (GH1988).By Ben Root.
v0.10.2 (13 March 2018)
This minor release includes a number of bug fixes and enhancements, along with one possibly backwards-incompatible change.
Backwards incompatible changes
- The addition of
__array_ufunc__
for xarray objects (see below) means that NumPy ufunc methods (e.g., np.add.reduce
) that previously worked on xarray.DataArray
objects by converting them into NumPy arrays will now raise NotImplementedError
instead. In all cases, the work-around is simple: convert your objects explicitly into NumPy arrays before calling the ufunc (e.g., with .values
).
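The work-around mentioned above might look like the following sketch (the array and variable names are illustrative assumptions):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Convert explicitly to a NumPy array before calling a ufunc *method*.
total = np.add.reduce(da.values)

# The label-aware xarray reduction is usually the more idiomatic choice.
total_labeled = da.sum()
```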
Enhancements
Added
dot()
, equivalent tonp.einsum()
.Also,dot()
now supportsdims
option,which specifies the dimensions to sum over.(GH1951)By Keisuke Fujii.Support for writing xarray datasets to netCDF files (netcdf4 backend only)when using the dask.distributedscheduler (GH1464).By Joe Hamman.
Support lazy vectorized-indexing. After this change, flexible indexing suchas orthogonal/vectorized indexing, becomes possible for all the backendarrays. Also, lazy
transpose
is now also supported. (GH1897)By Keisuke Fujii.Implemented NumPy’s
__array_ufunc__
protocol for all xarray objects (GH1617). This enables using NumPy ufuncs directly on xarray.Dataset
objects with recent versions of NumPy (v1.13 and newer):
- In [1]: ds = xr.Dataset({'a': 1})
- In [2]: np.sin(ds)
- Out[2]:
- <xarray.Dataset>
- Dimensions: ()
- Data variables:
- a float64 0.8415
This obviates the need for the xarray.ufuncs
module, which will be deprecated in the future when xarray drops support for older versions of NumPy. By Stephan Hoyer.
Improve
rolling()
logic.DataArrayRolling()
object now supportsconstruct()
method that returns a viewof the DataArray / Dataset object with the rolling-window dimension addedto the last axis. This enables more flexible operation, such as stridedrolling, windowed rolling, ND-rolling, short-time FFT and convolution.(GH1831, GH1142, GH819)By Keisuke Fujii.line()
learned to make plots with data on x-axis if so specified. (GH575)By Deepak Cherian.
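A small sketch of the construct() method on rolling objects described above. The dimension name "window" and the sample data are assumptions chosen for illustration:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="x")

# Add the rolling window as a new trailing dimension.
# Positions before the first full window are padded with NaN.
windows = da.rolling(x=3).construct("window")

# Strided rolling: keep every second window.
strided = windows.isel(x=slice(None, None, 2))

# Any reduction over the window dimension now acts as a rolling operation.
rolling_mean = windows.mean("window", skipna=False)
```

Because the window is an ordinary dimension, arbitrary functions (FFT, convolution with a kernel, and so on) can be applied along it.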
Bug fixes
Raise an informative error message when using
apply_ufunc
with numpyv1.11 (GH1956).By Stephan Hoyer.Fix the precision drop after indexing datetime64 arrays (GH1932).By Keisuke Fujii.
Silenced irrelevant warnings issued by
open_rasterio
(GH1964).By Stephan Hoyer.Fix kwarg colors clashing with auto-inferred cmap (GH1461)By Deepak Cherian.
Fix
imshow()
error when passed an RGB array withsize one in a spatial dimension.By Zac Hatfield-Dodds.
v0.10.1 (25 February 2018)
This minor release includes a number of bug fixes and backwards-compatible enhancements.
Documentation
Added a new guide on Contributing to xarray (GH640)By Joe Hamman.
Added apply_ufunc example to Toy weather data (GH1844).By Liam Brannigan.
New entry Why don’t aggregations return Python scalars? in theFrequently Asked Questions (GH1726).By 0x0L.
Enhancements
New functions and methods:
Added
DataArray.to_iris()
andDataArray.from_iris()
forconverting data arrays to and from Iris Cubes with the same data and coordinates(GH621 and GH37).By Neil Parley and Duncan Watson-Parris.Experimental support for using Zarr as storage layer for xarray(GH1223).By Ryan Abernathey andJoe Hamman.
New
rank()
on arrays and datasets. Requiresbottleneck (GH1731).By 0x0L..dt
accessor can now ceil, floor and round timestamps to specified frequency.By Deepak Cherian.
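For instance, rounding timestamps with the .dt accessor could be sketched as follows (the timestamps and the daily frequency "D" are illustrative assumptions):

```python
import pandas as pd
import xarray as xr

times = pd.to_datetime(["2018-01-01 10:30", "2018-01-02 23:59"])
da = xr.DataArray(times, dims="time")

# Round each timestamp down or up to a whole day.
floored = da.dt.floor("D")
ceiled = da.dt.ceil("D")
```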
Plotting enhancements:
xarray.plot.imshow()
now handles RGB and RGBA images.Saturation can be adjusted withvmin
andvmax
, or withrobust=True
.By Zac Hatfield-Dodds.contourf()
learned to contour 2D variables that have both a1D coordinate (e.g. time) and a 2D coordinate (e.g. depth as a function oftime) (GH1737).By Deepak Cherian.plot()
rotates x-axis ticks if x-axis is time.By Deepak Cherian.line()
can draw multiple lines if provided with a2D variable.By Deepak Cherian.
Other enhancements:
- Reduce methods such as
DataArray.sum()
now handle object-type arrays.
- In [3]: da = xr.DataArray(np.array([True, False, np.nan], dtype=object), dims='x')
- In [4]: da.sum()
- Out[4]:
- <xarray.DataArray ()>
- array(1)
(GH1866)By Keisuke Fujii.
Reduce methods such as
DataArray.sum()
now accept dtype
arguments. (GH1838)By Keisuke Fujii.Added nodatavals attribute to DataArray when using
open_rasterio()
. (GH1736).By Alan Snow.Use
pandas.Grouper
class in xarray resample methods rather than thedeprecatedpandas.TimeGrouper
class (GH1766).By Joe Hamman.Experimental support for parsing ENVI metadata to coordinates and attributesin
xarray.open_rasterio()
.By Matti Eskelinen.Reduce memory usage when decoding a variable with a scale_factor, byconverting 8-bit and 16-bit integers to float32 instead of float64(PR1840), and keeping float16 and float32 as float32 (GH1842).Correspondingly, encoded variables may also be saved with a smaller dtype.By Zac Hatfield-Dodds.
Speed of reindexing/alignment with dask array is orders of magnitude fasterwhen inserting missing values (GH1847).By Stephan Hoyer.
Fix
axis
keyword ignored when applyingnp.squeeze
toDataArray
(GH1487).By Florian Pinault.netcdf4-python
has moved its time handling in the netcdftime
module toa standalone package (netcdftime). As such, xarray now considers netcdftimean optional dependency. One benefit of this change is that it allows forencoding/decoding of datetimes with non-standard calendars without thenetcdf4-python
dependency (GH1084).By Joe Hamman.
Bug fixes
Rolling aggregation with
center=True
option now gives the same resultwith pandas including the last element (GH1046).By Keisuke Fujii.Support indexing with a 0d-np.ndarray (GH1921).By Keisuke Fujii.
Added warning in api.py of a netCDF4 bug that occurs whenthe filepath has 88 characters (GH1745).By Liam Brannigan.
Fixed encoding of multi-dimensional coordinates in
to_netcdf()
(GH1763).By Mike Neish.Fixed chunking with non-file-based rasterio datasets (GH1816) andrefactored rasterio test suite.By Ryan Abernathey
Bug fix in open_dataset(engine=’pydap’) (GH1775)By Keisuke Fujii.
Bug fix in vectorized assignment (GH1743, GH1744). Now item assignment to
DataArray.__setitem__()
checks coordinates of target, destination and keys. If there are any conflicts among these coordinates, IndexError
will be raised.By Keisuke Fujii.Properly point
DataArray.__dask_scheduler__()
todask.threaded.get
. By Matthew Rocklin.Bug fixes in
DataArray.plot.imshow()
: all-NaN arrays and arrayswith size one in some dimension can now be plotted, which is good forexploring satellite imagery (GH1780).By Zac Hatfield-Dodds.Fixed
UnboundLocalError
when opening netCDF file (GH1781).By Stephan Hoyer.The
variables
,attrs
, anddimensions
properties have beendeprecated as part of a bug fix addressing an issue where backends wereunintentionally loading the datastores data and attributes repeatedly duringwrites (GH1798).By Joe Hamman.Compatibility fixes to plotting module for Numpy 1.14 and Pandas 0.22(GH1813).By Joe Hamman.
Bug fix in encoding coordinates with
{'_FillValue': None}
in netCDFmetadata (GH1865).By Chris Roth.Fix indexing with lists for arrays loaded from netCDF files with
engine='h5netcdf'
(GH1864).By Stephan Hoyer.Corrected a bug with incorrect coordinates for non-georeferenced geotifffiles (GH1686). Internally, we now use the rasterio coordinatetransform tool instead of doing the computations ourselves. A
parse_coordinates
kwarg has been added to open_rasterio()
(set to True
by default). By Fabien Maussion. The colors of discrete colormaps are now the same regardless of whether seaborn is installed (GH1896). By Fabien Maussion.
Fixed dtype promotion rules in
where()
andconcat()
tomatch pandas (GH1847). A combination of strings/numbers orunicode/bytes now promote to object dtype, instead of strings or unicode.By Stephan Hoyer.Fixed bug where
isnull()
was loading datastored as dask arrays (GH1937).By Joe Hamman.
v0.10.0 (20 November 2017)
This is a major release that includes bug fixes, new features and a fewbackwards incompatible changes. Highlights include:
- Indexing now supports broadcasting over dimensions, similar to NumPy’s vectorized indexing (but better!).
- resample() has a new groupby-like API like pandas.
- apply_ufunc() facilitates wrapping and parallelizing functions written for NumPy arrays.
- Performance improvements, particularly for dask and open_mfdataset().
Breaking changes
- xarray now supports a form of vectorized indexing with broadcasting, wherethe result of indexing depends on dimensions of indexers,e.g.,
array.sel(x=ind)
withind.dims == ('y',)
. Alignment betweencoordinates on indexed and indexing objects is also now enforced.Due to these changes, existing uses of xarray objects to index other xarrayobjects will break in some cases.
The new indexing API is much more powerful, supporting outer, diagonal andvectorized indexing in a single interface.The isel_points
and sel_points
methods are deprecated, since they arenow redundant with the isel
/ sel
methods.See Vectorized Indexing for the details (GH1444,GH1436).By Keisuke Fujii andStephan Hoyer.
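The difference between vectorized (pointwise) and outer indexing described above can be sketched as follows. The dimension name "points" and the sample array are assumptions made for this illustration:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# Indexers that share a new dimension are broadcast against each other,
# so this picks the points (0, 0), (1, 1) and (2, 3).
ix = xr.DataArray([0, 1, 2], dims="points")
iy = xr.DataArray([0, 1, 3], dims="points")
pointwise = da.isel(x=ix, y=iy)

# Plain lists are still treated as outer (orthogonal) indexers.
outer = da.isel(x=[0, 1, 2], y=[0, 1, 3])
```

The pointwise form covers what isel_points used to do, which is why the dedicated methods became redundant.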
- A new resampling interface to match pandas’ groupby-like API was added to
Dataset.resample()
andDataArray.resample()
(GH1272). Timeseries resampling isfully supported for data with arbitrary dimensions as is both downsamplingand upsampling (including linear, quadratic, cubic, and spline interpolation).
Old syntax:
- In [5]: ds.resample('24H', dim='time', how='max')
- Out[5]:
- <xarray.Dataset>
- [...]
New syntax:
- In [6]: ds.resample(time='24H').max()
- Out[6]:
- <xarray.Dataset>
- [...]
Note that both versions are currently supported, but using the old syntax willproduce a warning encouraging users to adopt the new syntax.By Daniel Rothenberg.
Calling
repr()
or printing xarray objects at the command line or in aJupyter Notebook will not longer automatically compute dask variables orload data on arrays lazily loaded from disk (GH1522).By Guido Imperiale.Supplying
coords
as a dictionary to theDataArray
constructor withoutalso supplying an explicitdims
argument is no longer supported. Thisbehavior was deprecated in version 0.9 but will now raise an error(GH727).Several existing features have been deprecated and will change to newbehavior in xarray v0.11. If you use any of them with xarray v0.10, youshould see a
FutureWarning
that describes how to update your code:Dataset.T
has been deprecated as an alias for Dataset.transpose()
(GH1232). In the next major version of xarray, it will provide short-cut lookup for variables or attributes with name'T'
. DataArray.__contains__
(e.g.,key in data_array
) currently checksfor membership inDataArray.coords
. In the next major version ofxarray, it will check membership in the array data found inDataArray.values
instead (GH1267).Direct iteration over and counting a
Dataset
(e.g.,[k for k in ds]
,ds.keys()
,ds.values()
,len(ds)
andif ds
) currentlyincludes all variables, both data and coordinates. For improved usabilityand consistency with pandas, in the next major version of xarray these willchange to only include data variables (GH884). Useds.variables
,ds.data_vars
ords.coords
as alternatives.
Changes to minimum versions of dependencies:
Old numpy < 1.11 and pandas < 0.18 are no longer supported (GH1512).By Keisuke Fujii.
The minimum supported version of bottleneck has increased to 1.1 (GH1279). By Joe Hamman.
Enhancements
New functions/methods
New helper function
apply_ufunc()
for wrapping functionswritten to work on NumPy arrays to support labels on xarray objects(GH770).apply_ufunc
also supports automatic parallelization for many functions with dask. See Wrapping custom computation and Automatic parallelization for details. By Stephan Hoyer. Added new method
Dataset.to_dask_dataframe()
, convert a dataset intoa dask dataframe.This allows lazy loading of data from a dataset containing dask arrays (GH1462).By James Munroe.New function
where()
for conditionally switching betweenvalues in xarray objects, likenumpy.where()
:
- In [7]: import xarray as xr
- In [8]: arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=('x', 'y'))
- In [9]: xr.where(arr % 2, 'odd', 'even')
- Out[9]:
- <xarray.DataArray (x: 2, y: 3)>
- array([['odd', 'even', 'odd'],
- ['even', 'odd', 'even']],
- dtype='<U4')
- Dimensions without coordinates: x, y
Equivalently, the where()
method also now supportsthe other
argument, for filling with a value other than NaN
(GH576).By Stephan Hoyer.
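A brief sketch of the other argument on the where() method mentioned above (the threshold and fill value are arbitrary choices for illustration):

```python
import xarray as xr

arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=("x", "y"))

# Default behaviour: values failing the condition become NaN (result is float).
masked = arr.where(arr > 3)

# With `other`, failing values are replaced by the given fill value instead.
filled = arr.where(arr > 3, other=0)
```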
- Added
show_versions()
function to aid in debugging(GH1485).By Joe Hamman.
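To illustrate the apply_ufunc() helper listed above, here is a minimal sketch that wraps a plain NumPy function so that it keeps xarray labels. The function name normalize and the sample data are hypothetical, introduced only for this example:

```python
import xarray as xr


def normalize(values):
    """Hypothetical NumPy-only function: receives a bare ndarray."""
    return (values - values.mean()) / values.std()


da = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [10, 20, 30]})

# apply_ufunc unwraps the DataArray, calls the function on the raw array,
# and re-attaches dimensions and coordinates to the result.
result = xr.apply_ufunc(normalize, da)
```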
Performance improvements
concat()
was computing variables that aren’t in memory(e.g. dask-based) multiple times;open_mfdataset()
was loading them multiple times from disk. Now, both functions will insteadload them at most once and, if they do, store them in memory in theconcatenated array/dataset (GH1521).By Guido Imperiale.Speed-up (x 100) of
decode_cf_datetime()
.By Christian Chwala.
IO related improvements
Unicode strings (
str
on Python 3) are now round-tripped successfully evenwhen written as character arrays (e.g., as netCDF3 files or when usingengine='scipy'
) (GH1638). This is controlled by the_Encoding
attribute convention, which is also understood directly by the netCDF4-Pythoninterface. See String encoding for full details.By Stephan Hoyer.Support for
data_vars
andcoords
keywords fromconcat()
added toopen_mfdataset()
(GH438). Using these keyword arguments can significantly reducememory usage and increase speed.By Oleksandr Huziy.Support for
pathlib.Path
objects added toopen_dataset()
,open_mfdataset()
,to_netcdf()
, andsave_mfdataset()
(GH799):
- In [10]: from pathlib import Path # In Python 2, use pathlib2!
- In [11]: data_dir = Path("data/")
- In [12]: one_file = data_dir / "dta_for_month_01.nc"
- In [13]: xr.open_dataset(one_file)
- Out[13]:
- <xarray.Dataset>
- [...]
By Willi Rath.
You can now explicitly disable any default
_FillValue
(NaN
for floating point values) by passing the encoding {'_FillValue': None}
(GH1598).By Stephan Hoyer.More attributes available in
attrs
dictionary whenraster files are opened withopen_rasterio()
.By Greg Brener.Support for NetCDF files using an
_Unsigned
attribute to indicate that a signed integer data type should be interpreted as unsigned bytes (GH1444). By Eric Bruning. Support using an existing, opened netCDF4
Dataset
withNetCDF4DataStore
. This permits creating a Dataset
from a netCDF4Dataset
that has been opened usingother means (GH1459).By Ryan May.Changed
PydapDataStore
to take a Pydap dataset.This permits opening Opendap datasets that require authentication, byinstantiating a Pydap dataset with a session object. Also addedxarray.backends.PydapDataStore.open()
which takes a url and sessionobject (GH1068).By Philip Graae.Support reading and writing unlimited dimensions with h5netcdf (GH1636).By Joe Hamman.
Other improvements
Added
_ipython_key_completions_
to xarray objects, to enableautocompletion for dictionary-like access in IPython, e.g.,ds['tem
+ tab ->ds['temperature']
(GH1628).By Keisuke Fujii.Support passing keyword arguments to
load
,compute
, andpersist
methods. Any keyword arguments supplied to these methods are passed on tothe corresponding dask function (GH1523).By Joe Hamman.Encoding attributes are now preserved when xarray objects are concatenated.The encoding is copied from the first object (GH1297).By Joe Hamman andGerrit Holl.
Support applying rolling window operations using bottleneck’s moving windowfunctions on data stored as dask arrays (GH1279).By Joe Hamman.
Experimental support for the Dask collection interface (GH1674).By Matthew Rocklin.
Bug fixes
Suppress
RuntimeWarning
issued bynumpy
for “invalid value comparisons”(e.g.NaN
). Xarray now behaves similarly to Pandas in its treatment ofbinary and unary operations on objects with NaNs (GH1657).By Joe Hamman.Unsigned int support for reduce methods with
skipna=True
(GH1562).By Keisuke Fujii.Fixes to ensure xarray works properly with pandas 0.21:
to_series()
andto_dataframe()
should not return apandas.MultiIndex
for 1D data (GH1548).Fix plotting with datetime64 axis labels (GH1661).
By Stephan Hoyer.
open_rasterio()
method now shifts the rasteriocoordinates so that they are centered in each pixel (GH1468).By Greg Brener.rename()
method now doesn’t throw errorsif someVariable
is renamed to the same name as anotherVariable
as long as that otherVariable
is also renamed (GH1477). Thismethod now does throw when twoVariables
would end up with the same nameafter the rename (since one of them would get overwritten in this case).By Prakhar Goel.Fix
xarray.testing.assert_allclose()
to actually useatol
andrtol
arguments when called onDataArray
objects (GH1488).By Stephan Hoyer.xarray
quantile
methods now properly raise aTypeError
when applied toobjects with data stored asdask
arrays (GH1529).By Joe Hamman.Fix positional indexing to allow the use of unsigned integers (GH1405).By Joe Hamman andGerrit Holl.
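The rename() behaviour described above (swapping names is allowed, collisions are rejected) might be exercised like this. The variable names a and b are illustrative, and the exception type caught here reflects current xarray behaviour as far as I know rather than anything stated in these notes:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4])})

# Swapping two names works because both variables are renamed at once.
swapped = ds.rename({"a": "b", "b": "a"})

# Renaming onto an existing name that is not itself renamed is an error,
# since one variable would overwrite the other.
collision_error = None
try:
    ds.rename({"a": "b"})
except ValueError as err:
    collision_error = err
```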
Creating a
Dataset
now raisesMergeError
if a coordinateshares a name with a dimension but is comprised of arbitrary dimensions(GH1120).By Joe Hamman.open_rasterio()
method now skips rasterio’scrs
attribute if its value isNone
(GH1520).By Leevi Annala.Fix
xarray.DataArray.to_netcdf()
to return bytes when no path isprovided (GH1410).By Joe Hamman.Fix
xarray.save_mfdataset()
to properly raise an informative errorwhen objects other thanDataset
are provided (GH1555).By Joe Hamman.xarray.Dataset.copy()
would not preserve the encoding property(GH1586).By Guido Imperiale.xarray.concat()
would eagerly load dask variables into memory ifthe first argument was a numpy variable (GH1588).By Guido Imperiale.Fix bug in
to_netcdf()
when writing in append mode(GH1215).By Joe Hamman.Fix
netCDF4
backend to properly roundtrip theshuffle
encoding option(GH1606).By Joe Hamman.Fix bug when using
pytest
class decorators to skip certain unit tests. The previous behavior unintentionally caused additional tests to be skipped (GH1531). By Joe Hamman. Fix pynio backend for upcoming release of pynio with Python 3 support (GH1611). By Ben Hillman.
Fix
seaborn
import warning for Seaborn versions 0.8 and newer when theapionly
module was deprecated.(GH1633). By Joe Hamman.Fix COMPAT: MultiIndex checking is fragile(GH1833). By Florian Pinault.
Fix
rasterio
backend for Rasterio versions 1.0alpha10 and newer.(GH1641). By Chris Holden.
Bug fixes after rc1
Suppress warning in IPython autocompletion, related to the deprecationof
.T
attributes (GH1675).By Keisuke Fujii.Fix a bug in lazily-indexing netCDF array. (GH1688)By Keisuke Fujii.
(Internal bug) MemoryCachedArray now supports the orthogonal indexing.Also made some internal cleanups around array wrappers (GH1429).By Keisuke Fujii.
(Internal bug) MemoryCachedArray now always wraps
np.ndarray
byNumpyIndexingAdapter
. (GH1694)By Keisuke Fujii.Fix importing xarray when running Python with
-OO
(GH1706). By Stephan Hoyer. Saving a netCDF file with a coordinate with spaces in its name now raises an appropriate warning (GH1689). By Stephan Hoyer.
Fix two bugs that were preventing dask arrays from being specified ascoordinates in the DataArray constructor (GH1684).By Joe Hamman.
Fixed
apply_ufunc
withdask='parallelized'
for scalar arguments(GH1697).By Stephan Hoyer.Fix “Chunksize cannot exceed dimension size” error when writing netCDF4 filesloaded from disk (GH1225).By Stephan Hoyer.
Validate the shape of coordinates with names matching dimensions in theDataArray constructor (GH1709).By Stephan Hoyer.
Raise
NotImplementedError
when attempting to save a MultiIndex to anetCDF file (GH1547).By Stephan Hoyer.Remove netCDF dependency from rasterio backend tests.By Matti Eskelinen
Bug fixes after rc2
Fixed unexpected behavior in
Dataset.set_index()
andDataArray.set_index()
introduced by Pandas 0.21.0. Setting a newindex with a single variable resulted in 1-levelpandas.MultiIndex
instead of a simplepandas.Index
(GH1722). By Benoit Bovy.Fixed unexpected memory loading of backend arrays after
print
.(GH1720). By Keisuke Fujii.
v0.9.6 (8 June 2017)
This release includes a number of backwards compatible enhancements and bugfixes.
Enhancements
New
sortby()
method toDataset
andDataArray
that enable sorting along dimensions (GH967).See the docs for examples.By Chun-Wei Yuan andKyle Heuton.Add
.dt
accessor to DataArrays for computing datetime-like propertiesfor the values they contain, similar topandas.Series
(GH358).By Daniel Rothenberg.Renamed internal dask arrays created by
open_dataset
to match new daskconventions (GH1343).By Ryan Abernathey.as_variable()
is now part of the public API (GH1303).By Benoit Bovy.align()
now supportsjoin='exact'
, which raisesan error instead of aligning when indexes to be aligned are not equal.By Stephan Hoyer.New function
open_rasterio()
for opening raster files withthe rasterio library.See the docs for details.By Joe Hamman,Nic Wayand andFabien Maussion
Bug fixes
Fix error from repeated indexing of datasets loaded from disk (GH1374).By Stephan Hoyer.
Fix a bug where
.isel_points
wrongly assigns unselected coordinate todata_vars
.By Keisuke Fujii.Tutorial datasets are now checked against a reference MD5 sum to confirmsuccessful download (GH1392). By Matthew Gidden.
DataArray.chunk()
now accepts dask specific kwargs likeDataset.chunk()
does. By Fabien Maussion.Support for
engine='pydap'
with recent releases of Pydap (3.2.2+),including on Python 3 (GH1174).
Documentation
- A new gallery allows adding interactive examples to the documentation. By Fabien Maussion.
Testing
Fix test suite failure caused by changes to
pandas.cut
function(GH1386).By Ryan Abernathey.Enhanced tests suite by use of
@network
decorator, which is controlled via the --run-network-tests
command line argument to py.test
(GH1393).By Matthew Gidden.
v0.9.5 (17 April, 2017)
Remove an inadvertently introduced print statement.
v0.9.3 (16 April, 2017)
This minor release includes bug-fixes and backwards compatible enhancements.
Enhancements
New
persist()
method to Datasets and DataArrays toenable persisting data in distributed memory when using Dask (GH1344).By Matthew Rocklin.New
expand_dims()
method forDataArray
andDataset
(GH1326).By Keisuke Fujii.
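A minimal sketch of expand_dims(), assuming a new length-one dimension named "time" (the name is an arbitrary choice for this example):

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Insert a new length-one dimension at the front of the array.
expanded = da.expand_dims("time")
```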
Bug fixes
Fix
.where()
withdrop=True
when arguments do not have indexes(GH1350). This bug, introduced in v0.9, resulted in xarray producingincorrect results in some cases.By Stephan Hoyer.Fixed writing to file-like objects with
to_netcdf()
(GH1320).Stephan Hoyer.Fixed explicitly setting
engine='scipy'
withto_netcdf
when notproviding a path (GH1321).Stephan Hoyer.Fixed open_dataarray does not pass properly its parameters to open_dataset(GH1359).Stephan Hoyer.
Ensure test suite works when run from an installed version of xarray (GH1336). Use
@pytest.mark.slow
instead of a custom flag to markslow tests.By Stephan Hoyer
v0.9.2 (2 April 2017)
This minor release includes bug fixes and backwards-compatible enhancements.
Enhancements
.rolling()
on Dataset is now supported (GH859). By Keisuke Fujii. When bottleneck version 1.1 or later is installed, use bottleneck for rolling
var
,argmin
,argmax
, andrank
computations. Also, rollingmedian now accepts amin_periods
argument (GH1276).By Joe Hamman.When
.plot()
is called on a 2D DataArray and only one dimension isspecified withx=
ory=
, the other dimension is now guessed(GH1291).By Vincent Noel.Added new method
assign_attrs()
toDataArray
andDataset
, a chained-method compatible implementation of thedict.update
method on attrs (GH1281).By Henry S. Harrison.Added new
autoclose=True
argument toopen_mfdataset()
to explicitly close opened files when not inuse to prevent occurrence of an OS Error related to too many open files(GH1198).Note, the default isautoclose=False
, which is consistent withprevious xarray behavior.By Phillip J. Wolfram.The
repr()
ofDataset
andDataArray
attributes uses a similarformat to coordinates and variables, with vertically aligned entriestruncated to fit on a single line (GH1319). Hopefully this will stoppeople writingdata.attrs = {}
and discarding metadata in notebooks forthe sake of cleaner output. The full metadata is still available asdata.attrs
.By Zac Hatfield-Dodds.Enhanced tests suite by use of
@slow
and@flaky
decorators, which are controlled via the --run-flaky
and --skip-slow
command line arguments to py.test
(GH1336).By Stephan Hoyer andPhillip J. Wolfram.New aggregation on rolling objects
DataArray.rolling(…).count()
which provides a rolling count of valid values (GH1138).
Bug fixes
Rolling operations now preserve original dimension order (GH1125). By Keisuke Fujii.
Fixed
sel
withmethod='nearest'
on Python 2.7 and 64-bit Windows(GH1140).Stephan Hoyer.Fixed
where
with drop=True
for empty masks (GH1341).By Stephan Hoyer andPhillip J. Wolfram.
v0.9.1 (30 January 2017)
Renamed the “Unindexed dimensions” section in the Dataset
andDataArray
repr (added in v0.9.0) to “Dimensions without coordinates”(GH1199).
v0.9.0 (25 January 2017)
This major release includes five months worth of enhancements and bug fixes from24 contributors, including some significant changes that are not fully backwardscompatible. Highlights include:
Coordinates are now optional in the xarray data model, even for dimensions.
Changes to caching, lazy loading and pickling to improve xarray’s experiencefor parallel computing.
Improvements for accessing and manipulating
pandas.MultiIndex
levels.Many new methods and functions, including
quantile()
,cumsum()
,cumprod()
combine_first
set_index()
,reset_index()
,reorder_levels()
,full_like()
,zeros_like()
,ones_like()
open_dataarray()
,compute()
,Dataset.info()
,testing.assert_equal()
,testing.assert_identical()
, andtesting.assert_allclose()
.
Breaking changes
- Index coordinates for each dimension are now optional, and no longer created by default (GH1017). You can identify such dimensions without coordinates by their appearance in the list of “Dimensions without coordinates” in the
Dataset
orDataArray
repr:
- In [14]: xr.Dataset({'foo': (('x', 'y'), [[1, 2]])})
- Out[14]:
- <xarray.Dataset>
- Dimensions: (x: 1, y: 2)
- Dimensions without coordinates: x, y
- Data variables:
- foo (x, y) int64 1 2
This has a number of implications:
align()
andreindex()
can now error if dimension labels are missing and dimensions have different sizes. Because pandas does not support missing indexes, methods such as
to_dataframe
/from_dataframe
andstack
/unstack
no longerroundtrip faithfully on all inputs. Usereset_index()
to remove undesired indexes. Dataset.__delitem__
and drop()
no longer delete/drop variables that have dimensions matching a deleted/dropped variable. DataArray.coords.__delitem__
is now allowed on variables matchingdimension names..sel
and.loc
now handle indexing along a dimension withoutcoordinate labels by doing integer based indexing. SeeMissing coordinate labels for an example.indexes
is no longer guaranteed to include alldimensions names as keys. The new methodget_index()
hasbeen added to get an index for a dimension guaranteed, falling back toproduce a defaultRangeIndex
if necessary.
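The implications listed above can be seen on a small dataset without coordinates. This is a sketch under the assumption that the dataset from the example above is used; the variable names are illustrative:

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset({"foo": (("x", "y"), [[1, 2]])})

# No index is created for dimensions that lack coordinates.
has_x_index = "x" in ds.indexes

# get_index() falls back to a default RangeIndex when none exists.
idx = ds.get_index("x")

# .sel along such a dimension behaves like integer-based indexing.
first = ds.sel(x=0)
```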
The default behavior of
merge
is nowcompat='no_conflicts'
, so somemerges will now succeed in cases that previously raisedxarray.MergeError
. Setcompat='broadcast_equals'
to restore theprevious default. See Merging with ‘no_conflicts’ for more details.Reading
values
no longer always caches values in a NumPyarray GH1128. Caching of.values
on variables read from netCDFfiles on disk is still the default whenopen_dataset()
is called withcache=True
.By Guido Imperiale andStephan Hoyer.Pickling a
Dataset
orDataArray
linked to a file on disk no longercaches its values into memory before pickling (GH1128). Instead, picklestores file paths and restores objects by reopening file references. Thisenables preliminary, experimental use of xarray for opening files withdask.distributed.By Stephan Hoyer.Coordinates used to index a dimension are now loaded eagerly into
pandas.Index
objects, instead of loading the values lazily.By Guido Imperiale.Automatic levels for 2d plots are now guaranteed to land on
vmin
andvmax
when these kwargs are explicitly provided (GH1191). Theautomated level selection logic also slightly changed.By Fabien Maussion.DataArray.rename()
behavior changed to strictly change theDataArray.name
if called with string argument, or strictly change coordinate names if called withdict-like argument.By Markus Gonser.By default
to_netcdf()
adds a _FillValue = NaN
attribute to float types. By Frederic Laliberte. repr
onDataArray
objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (GH1207). By Stephan Hoyer. xarray no longer supports Python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
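As a sketch of the compat='no_conflicts' merge behaviour described in the breaking changes above: two datasets with partially overlapping coordinates merge cleanly as long as the values they both define agree. The data below are invented for illustration, and join='outer' is passed explicitly only to make the alignment mode unambiguous:

```python
import numpy as np
import xarray as xr

a = xr.Dataset({"v": ("x", [1.0, np.nan])}, coords={"x": [0, 1]})
b = xr.Dataset({"v": ("x", [2.0, 3.0])}, coords={"x": [1, 2]})

# At x=1 one side is missing, so there is no conflict and the
# non-null value is used; x=0 and x=2 come from one input each.
merged = xr.merge([a, b], compat="no_conflicts", join="outer")
```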
Deprecations
Renamed the
Coordinate
class from xarray’s low level API toIndexVariable
.Variable.to_variable
andVariable.to_coord
have been renamed toto_base_variable()
andto_index_variable()
.Deprecated supplying
coords
as a dictionary to theDataArray
constructor without also supplying an explicitdims
argument. The oldbehavior encouraged relying on the iteration order of dictionaries, which isa bad practice (GH727).Removed a number of methods deprecated since v0.7.0 or earlier:
load_data
,vars
,drop_vars
,dump
,dumps
and thevariables
keyword argument toDataset
.Removed the dummy module that enabled
import xray
.
Enhancements
Added new method combine_first() to DataArray and Dataset, based on the pandas method of the same name (see Combine). By Chun-Wei Yuan.
Added the ability to change default automatic alignment (arithmetic_join="inner") for binary operations via set_options() (see Automatic alignment). By Chun-Wei Yuan.
Add checking of attr names and values when saving to netCDF, raising useful error messages if they are invalid (GH911). By Robin Wilson.
Added ability to save DataArray objects directly to netCDF files using to_netcdf(), and to load directly from netCDF files using open_dataarray() (GH915). These remove the need to convert a DataArray to a Dataset before saving as a netCDF file, and deal with names to ensure a perfect ‘roundtrip’ capability. By Robin Wilson.
Multi-index levels are now accessible as “virtual” coordinate variables, e.g., ds['time'] can pull out the 'time' level of a multi-index (see Coordinates). sel also accepts providing multi-index levels as keyword arguments, e.g., ds.sel(time='2000-01') (see Multi-level indexing). By Benoit Bovy.
Added set_index, reset_index and reorder_levels methods to easily create and manipulate (multi-)indexes (see Set and reset index). By Benoit Bovy.
Added the compat option 'no_conflicts' to merge, allowing the combination of xarray objects with disjoint (GH742) or overlapping (GH835) coordinates as long as all present data agrees. By Johnnie Gray. See Merging with ‘no_conflicts’ for more details.
It is now possible to set concat_dim=None explicitly in open_mfdataset() to disable inferring a dimension along which to concatenate. By Stephan Hoyer.
Added methods DataArray.compute(), Dataset.compute(), and Variable.compute() as a non-mutating alternative to load(). By Guido Imperiale.
Adds DataArray and Dataset methods cumsum() and cumprod(). By Phillip J. Wolfram.
New properties Dataset.sizes and DataArray.sizes for providing consistent access to dimension length on both Dataset and DataArray (GH921). By Stephan Hoyer.
New keyword argument drop=True for sel(), isel() and squeeze() for dropping scalar coordinates that arise from indexing (GH242). By Stephan Hoyer.
New top-level functions full_like(), zeros_like(), and ones_like(). By Guido Imperiale.
Overriding a preexisting attribute with register_dataset_accessor() or register_dataarray_accessor() now issues a warning instead of raising an error (GH1082). By Stephan Hoyer.
Options for axes sharing between subplots are exposed to FacetGrid and plot(), so axes sharing can be disabled for polar plots. By Bas Hoonhout.
New utility functions assert_equal(), assert_identical(), and assert_allclose() for asserting relationships between xarray objects, designed for use in a pytest test suite.
figsize, size and aspect plot arguments are now supported for all plots (GH897). See Controlling the figure size for more details. By Stephan Hoyer and Fabien Maussion.
New info() method to summarize Dataset variables and attributes. The method prints to a buffer (e.g. stdout) with output similar to what the command line utility ncdump -h produces (GH1150). By Joe Hamman.
Added the ability to write unlimited netCDF dimensions with the scipy and netcdf4 backends via the new encoding attribute or via the unlimited_dims argument to to_netcdf(). By Joe Hamman.
New quantile() method to calculate quantiles from DataArray objects (GH1187). By Joe Hamman.
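As a rough illustration of two of the enhancements listed above, the sketch below fills gaps with combine_first() and merges overlapping datasets with compat='no_conflicts'. The variable names and sample values are made up for this example, and join="outer" is passed explicitly as an assumption about the desired alignment rather than relying on the default.

```python
import numpy as np
import xarray as xr

# Two arrays with partially overlapping 'x' labels; `a` has a gap at x=1.
a = xr.DataArray([1.0, np.nan, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="x", coords={"x": [1, 2, 3]})

# combine_first keeps values from `a` where present and falls back to `b`
# for missing values and for labels that only exist in `b`.
combined = a.combine_first(b)
print(combined["x"].values.tolist())  # [0, 1, 2, 3]
print(combined.values.tolist())       # [1.0, 10.0, 3.0, 30.0]

# compat='no_conflicts' accepts overlapping variables as long as the values
# they share agree (both datasets hold v == 2.0 at x == 1).
ds1 = xr.Dataset({"v": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
ds2 = xr.Dataset({"v": ("x", [2.0, 3.0])}, coords={"x": [1, 2]})
merged = xr.merge([ds1, ds2], compat="no_conflicts", join="outer")
print(merged["v"].values.tolist())    # [1.0, 2.0, 3.0]
```

If the two datasets disagreed at x == 1, the merge would be expected to raise an error instead of silently picking one value.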
Bug fixes
groupby_bins now restores empty bins by default (GH1019). By Ryan Abernathey.
Fix issues for dates outside the valid range of pandas timestamps (GH975). By Mathias Hauser.
Unstacking produced flipped array after stacking decreasing coordinate values (GH980). By Stephan Hoyer.
Setting dtype via the encoding parameter of to_netcdf failed if the encoded dtype was the same as the dtype of the original array (GH873). By Stephan Hoyer.
Fix issues with variables where both attributes _FillValue and missing_value are set to NaN (GH997). By Marco Zühlke.
.where() and .fillna() now preserve attributes (GH1009). By Fabien Maussion.
Applying broadcast() to an xarray object based on the dask backend won’t accidentally convert the array from dask to numpy anymore (GH978). By Guido Imperiale.
Dataset.concat() now preserves variables order (GH1027). By Fabien Maussion.
Fixed an issue with pcolormesh (GH781). A new infer_intervals keyword gives control over whether the cell intervals should be computed or not. By Fabien Maussion.
Grouping over a dimension with non-unique values with groupby now gives correct groups. By Stephan Hoyer.
Fixed accessing coordinate variables with non-string names from .coords. By Stephan Hoyer.
rename() now simultaneously renames the array and any coordinate with the same name, when supplied via a dict (GH1116). By Yves Delley.
Fixed sub-optimal performance in certain operations with object arrays (GH1121). By Yves Delley.
Fix .groupby(group) when group has datetime dtype (GH1132). By Jonas Sølvsteen.
Fixed a bug with facetgrid (the norm keyword was ignored, GH1159). By Fabien Maussion.
Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (GH1172). By Stephan Hoyer.
Fix to make .copy() actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (GH1180). By Stephan Hoyer.
Fix opening NetCDF files with multi-dimensional time variables (GH1229). By Stephan Hoyer.
Performance improvements
isel_points() and sel_points() now use vectorised indexing in numpy and dask (GH1161), which can result in several orders of magnitude speedup. By Jonathan Chambers.
v0.8.2 (18 August 2016)
This release includes a number of bug fixes and minor enhancements.
Breaking changes
broadcast() and concat() now auto-align inputs, using join=outer. Previously, these functions raised ValueError for non-aligned inputs. By Guido Imperiale.
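A minimal sketch of the auto-alignment described above, using broadcast() on two arrays whose 'x' labels only partly overlap. The arrays are invented for illustration; the outer join fills labels missing from either input with NaN, which also promotes the integer data to floats.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1, 2], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([10, 20], dims="x", coords={"x": [1, 2]})

# Inputs are aligned with an outer join before broadcasting, so both
# results share the union of the 'x' labels.
a2, b2 = xr.broadcast(a, b)
print(a2["x"].values.tolist())  # [0, 1, 2]
print(a2.values.tolist())       # [1.0, 2.0, nan]
print(b2.values.tolist())       # [nan, 10.0, 20.0]
```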
Enhancements
New documentation on Transitioning from pandas.Panel to xarray. By Maximilian Roos.
New Dataset and DataArray methods to_dict() and from_dict() to allow easy conversion between dictionaries and xarray objects (GH432). See dictionary IO for more details. By Julia Signell.
Added exclude and indexes optional parameters to align(), and exclude optional parameter to broadcast(). By Guido Imperiale.
Better error message when assigning variables without dimensions (GH971). By Stephan Hoyer.
Better error message when reindex/align fails due to duplicate index values (GH956). By Stephan Hoyer.
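The to_dict() and from_dict() methods mentioned above can be sketched as a simple round trip. The array contents, name and attributes here are placeholders chosen for the example.

```python
import xarray as xr

da = xr.DataArray(
    [1, 2, 3],
    dims="x",
    coords={"x": [10, 20, 30]},
    name="foo",
    attrs={"units": "m"},
)

# Convert to a plain nested dictionary (data becomes a Python list) ...
d = da.to_dict()
print(d["data"])  # [1, 2, 3]

# ... and rebuild an equivalent DataArray from it.
restored = xr.DataArray.from_dict(d)
print(restored.identical(da))  # True
```

Because the dictionary holds only built-in Python types, it can be passed to serializers such as json without further conversion for simple numeric data like this.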
Bug fixes
Ensure xarray works with h5netcdf v0.3.0 for arrays with dtype=str (GH953). By Stephan Hoyer.
Dataset.dir() (i.e. the method Python calls to get autocomplete options) failed if one of the dataset’s keys was not a string (GH852). By Maximilian Roos.
Dataset constructor can now take arbitrary objects as values (GH647). By Maximilian Roos.
Clarified copy argument for reindex() and align(), which now consistently always return new xarray objects (GH927).
Fix open_mfdataset with engine='pynio' (GH936). By Stephan Hoyer.
groupby_bins sorted bin labels as strings (GH952). By Stephan Hoyer.
Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (GH956).
v0.8.1 (5 August 2016)
Bug fixes
- Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (GH943). By Stephan Hoyer.
v0.8.0 (2 August 2016)
This release includes four months of new features and bug fixes, including several breaking changes.
Breaking changes
Dropped support for Python 2.6 (GH855).
Indexing on a multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (GH802).
Contour plots no longer add a colorbar by default (GH866). Filled contour plots are unchanged.
DataArray.values and .data now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (GH867). Previously, .values returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the .item() method.
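The distinction between .values and .item() for scalar arrays can be sketched as follows. This example uses a 0-dimensional integer array for simplicity; the breaking change above specifically concerned object dtype, so treat this as an illustration of the general pattern rather than a reproduction of that case.

```python
import numpy as np
import xarray as xr

scalar = xr.DataArray(5)

# .values is a 0-dimensional NumPy array rather than a native Python object.
print(type(scalar.values).__name__, scalar.values.ndim)  # ndarray 0

# .item() converts the single element to a native Python object.
value = scalar.item()
print(type(value).__name__, value)  # int 5
```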
Enhancements
Groupby operations now support grouping over multidimensional variables. A new method called groupby_bins() has also been added to allow users to specify bins for grouping. The new features are described in Multidimensional Grouping and Working with Multidimensional Coordinates. By Ryan Abernathey.
DataArray and Dataset method where() now supports a drop=True option that clips coordinate elements that are fully masked. By Phillip J. Wolfram.
New top level merge() function allows for combining variables from any number of Dataset and/or DataArray variables. See Merge for more details. By Stephan Hoyer.
DataArray and Dataset method resample() now supports the keep_attrs=False option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon.
Better multi-index support in DataArray and Dataset sel() and loc() methods, which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (see Multi-level indexing). By Benoit Bovy.
New (experimental) decorators register_dataset_accessor() and register_dataarray_accessor() for registering custom xarray extensions without subclassing. They are described in the new documentation page on xarray Internals. By Stephan Hoyer.
Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control on the colorbar (GH872). By Fabien Maussion.
New Dataset method filter_by_attrs(), akin to netCDF4.Dataset.get_variables_by_attributes, to easily filter data variables using their attributes. By Filipe Fernandes.
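Two of the enhancements above, where() with drop=True and filter_by_attrs(), can be sketched like this. The variable names, units and thresholds are hypothetical values chosen for the example.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x", coords={"x": [0, 1, 2, 3]})

# Without drop=True the masked positions would remain as NaN; with it,
# coordinate labels that are fully masked are clipped from the result.
kept = da.where(da > 2, drop=True)
print(kept["x"].values.tolist())  # [2, 3]
print(kept.values.tolist())       # [3.0, 4.0]

ds = xr.Dataset(
    {
        "temp": ("x", [280.0, 285.0], {"units": "K"}),
        "precip": ("x", [0.0, 1.5], {"units": "mm"}),
    }
)

# Keep only the data variables whose 'units' attribute equals 'K'.
kelvin_only = ds.filter_by_attrs(units="K")
print(list(kelvin_only.data_vars))  # ['temp']
```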
Bug fixes
Attributes were being retained by default for some resampling operations when they should not have been. With the keep_attrs=False option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding the keep_attrs=True option. By Jeremy McGibbon.
Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (GH875). By Stephan Hoyer.
Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays (GH861, GH837). By Maciek Swat.
decode_cf_timedelta now accepts arrays with ndim > 1 (GH842). This fixes issue GH665. By Filipe Fernandes.
Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (GH876). By Stephan Hoyer.
Support for pickling functions from xarray.ufuncs (GH901). By Stephan Hoyer.
Variable.copy(deep=True) no longer converts MultiIndex into a base Index (GH769). By Benoit Bovy.
Fixes for groupby on dimensions with a multi-index (GH867). By Stephan Hoyer.
Fix printing datasets with unicode attributes on Python 2 (GH892). By Stephan Hoyer.
Fixed incorrect test for dask version (GH891). By Stephan Hoyer.
Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
contour() now plots the correct number of contours (GH866). By Fabien Maussion.
v0.7.2 (13 March 2016)
This release includes two new, entirely backwards compatible features and several bug fixes.
Enhancements
New DataArray method DataArray.dot() for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil.
Rolling window operations on DataArray objects are now supported via a new DataArray.rolling() method. For example:
- In [15]: import xarray as xr; import numpy as np
- In [16]: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5),
- dims=('x', 'y'))
- In [17]: arr
- Out[17]:
- <xarray.DataArray (x: 3, y: 5)>
- array([[ 0. , 0.5, 1. , 1.5, 2. ],
- [ 2.5, 3. , 3.5, 4. , 4.5],
- [ 5. , 5.5, 6. , 6.5, 7. ]])
- Coordinates:
- * x (x) int64 0 1 2
- * y (y) int64 0 1 2 3 4
- In [18]: arr.rolling(y=3, min_periods=2).mean()
- Out[18]:
- <xarray.DataArray (x: 3, y: 5)>
- array([[ nan, 0.25, 0.5 , 1. , 1.5 ],
- [ nan, 2.75, 3. , 3.5 , 4. ],
- [ nan, 5.25, 5.5 , 6. , 6.5 ]])
- Coordinates:
- * x (x) int64 0 1 2
- * y (y) int64 0 1 2 3 4
See Rolling window operations for more details. By Joe Hamman.
Bug fixes
Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute projection, allowing Cartopy to handle the extent of pcolormesh plots (GH781). By Joe Hamman.
2D plots now better handle additional coordinates which are not DataArray dimensions (GH788). By Fabien Maussion.
v0.7.1 (16 February 2016)
This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.
Enhancements
Numerical operations now return empty objects on no overlapping labels rather than raising ValueError (GH739).
Series is now supported as valid input to the Dataset constructor (GH740).
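The first enhancement above can be sketched with two arrays whose labels do not overlap at all. The arrays are illustrative; the example assumes the default inner-join alignment for arithmetic.

```python
import xarray as xr

a = xr.DataArray([1, 2], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([3, 4], dims="x", coords={"x": [2, 3]})

# The 'x' labels are disjoint, so aligning on their intersection leaves
# nothing to add: the result is an empty array instead of a ValueError.
total = a + b
print(total.shape)  # (0,)
```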
Bug fixes
Restore checks for shape consistency between data and coordinates in the DataArray constructor (GH758).
Single dimension variables no longer transpose as part of a broader .transpose. This behavior was causing pandas.PeriodIndex dimensions to lose their type (GH749).
Dataset labels remain as their native type on .to_dataset. Previously they were coerced to strings (GH745).
Fixed a bug where replacing a DataArray index coordinate would improperly align the coordinate (GH725).
DataArray.reindex_like now maintains the dtype of complex numbers when reindexing leads to NaN values (GH738).
Dataset.rename and DataArray.rename support the old and new names being the same (GH724).
Fix from_dataframe() for DataFrames with Categorical column and a MultiIndex index (GH737).
Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.
Acknowledgments
The following individuals contributed to this release:
Edward Richards
Maximilian Roos
Rafael Guedes
Spencer Hill
Stephan Hoyer
v0.7.0 (21 January 2016)
This major release includes a redesign of DataArray internals, as well as new methods for reshaping, rolling and shifting data. It includes preliminary support for pandas.MultiIndex, as well as a number of other features and bug fixes, several of which offer improved compatibility with pandas.
New name
The project formerly known as “xray” is now “xarray”, pronounced “x-array”! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:
To ease the transition, we have simultaneously released v0.7.0 of both xray and xarray on the Python Package Index. These packages are identical. For now, import xray still works, except it issues a deprecation warning. This will be the last xray release. Going forward, we recommend switching your import statements to import xarray as xr.
Breaking changes
- The internal data model used by DataArray has been rewritten to fix several outstanding issues (GH367, GH634, this stackoverflow report). Internally, DataArray is now implemented in terms of ._variable and ._coords attributes instead of holding variables in a Dataset object.
This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.
In practice, this means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here’s the old behavior:
- In [19]: xray.DataArray([4, 5, 6], dims='x', name='x')
- Out[19]:
- <xray.DataArray 'x' (x: 3)>
- array([4, 5, 6])
- Coordinates:
- * x (x) int64 4 5 6
and the new behavior (compare the values of the x
coordinate):
- In [20]: xray.DataArray([4, 5, 6], dims='x', name='x')
- Out[20]:
- <xray.DataArray 'x' (x: 3)>
- array([4, 5, 6])
- Coordinates:
- * x (x) int64 0 1 2
- It is no longer possible to convert a DataArray to a Dataset with xray.DataArray.to_dataset() if it is unnamed. This will now raise ValueError. If the array is unnamed, you need to supply the name argument.
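The to_dataset() change above can be sketched as follows. The example uses the current package name (xarray imported as xr), and the variable name 'foo' is a placeholder.

```python
import xarray as xr

unnamed = xr.DataArray([1, 2, 3], dims="x")

# Converting an unnamed array fails because the Dataset would have no key
# under which to store the variable.
try:
    unnamed.to_dataset()
    failed = False
except ValueError as err:
    failed = True
    print("conversion failed:", err)

# Supplying a name makes the conversion explicit.
ds = unnamed.to_dataset(name="foo")
print(list(ds.data_vars))  # ['foo']
```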
Enhancements
- Basic support for MultiIndex coordinates on xray objects, including indexing, stack() and unstack():
- In [21]: df = pd.DataFrame({'foo': range(3),
- ....: 'x': ['a', 'b', 'b'],
- ....: 'y': [0, 0, 1]})
- ....:
- In [22]: s = df.set_index(['x', 'y'])['foo']
- In [23]: arr = xray.DataArray(s, dims='z')
- In [24]: arr
- Out[24]:
- <xray.DataArray 'foo' (z: 3)>
- array([0, 1, 2])
- Coordinates:
- * z (z) object ('a', 0) ('b', 0) ('b', 1)
- In [25]: arr.indexes['z']
- Out[25]:
- MultiIndex(levels=[[u'a', u'b'], [0, 1]],
- labels=[[0, 1, 1], [0, 0, 1]],
- names=[u'x', u'y'])
- In [26]: arr.unstack('z')
- Out[26]:
- <xray.DataArray 'foo' (x: 2, y: 2)>
- array([[ 0., nan],
- [ 1., 2.]])
- Coordinates:
- * x (x) object 'a' 'b'
- * y (y) int64 0 1
- In [27]: arr.unstack('z').stack(z=('x', 'y'))
- Out[27]:
- <xray.DataArray 'foo' (z: 4)>
- array([ 0., nan, 1., 2.])
- Coordinates:
- * z (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)
See Stack and unstack for more details.
Warning
xray’s MultiIndex support is still experimental, and we have a long to-do list of desired additions (GH719), including better display of multi-index levels when printing a Dataset, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.
Support for reading GRIB, HDF4 and other file formats via PyNIO. See Formats supported by PyNIO for more details.
Better error message when a variable is supplied with the same name as one of its dimensions.
Plotting: more control on colormap parameters (GH642). vmin and vmax will not be silently ignored anymore. Setting center=False prevents automatic selection of a divergent colormap.
New shift() and roll() methods for shifting/rotating datasets or arrays along a dimension:
- In [28]: array = xray.DataArray([5, 6, 7, 8], dims='x')
- In [29]: array.shift(x=2)
- Out[29]:
- <xarray.DataArray (x: 4)>
- array([nan, nan, 5., 6.])
- Dimensions without coordinates: x
- In [30]: array.roll(x=2)
- Out[30]:
- <xarray.DataArray (x: 4)>
- array([7, 8, 5, 6])
- Dimensions without coordinates: x
Notice that shift moves data independently of coordinates, but roll moves both data and coordinates.
Assigning a pandas object directly as a Dataset variable is now permitted. Its index names correspond to the dims of the Dataset, and its data is aligned.
Passing a pandas.DataFrame or pandas.Panel to a Dataset constructor is now permitted.
New function broadcast() for explicitly broadcasting DataArray and Dataset objects against each other. For example:
- In [31]: a = xray.DataArray([1, 2, 3], dims='x')
- In [32]: b = xray.DataArray([5, 6], dims='y')
- In [33]: a
- Out[33]:
- <xarray.DataArray (x: 3)>
- array([1, 2, 3])
- Dimensions without coordinates: x
- In [34]: b
- Out[34]:
- <xarray.DataArray (y: 2)>
- array([5, 6])
- Dimensions without coordinates: y
- In [35]: a2, b2 = xray.broadcast(a, b)
- In [36]: a2
- Out[36]:
- <xarray.DataArray (x: 3, y: 2)>
- array([[1, 1],
- [2, 2],
- [3, 3]])
- Dimensions without coordinates: x, y
- In [37]: b2
- Out[37]:
- <xarray.DataArray (x: 3, y: 2)>
- array([[5, 6],
- [5, 6],
- [5, 6]])
- Dimensions without coordinates: x, y
Bug fixes
Fixes for several issues found on DataArray objects with the same name as one of their coordinates (see Breaking changes for more details).
DataArray.to_masked_array always returns masked array with mask being an array (not a scalar value) (GH684).
Allows for (imperfect) repr of Coords when underlying index is PeriodIndex (GH645).
Attempting to assign a
Dataset or DataArray variable/attribute using attribute-style syntax (e.g., ds.foo = 42) now raises an error rather than silently failing (GH656, GH714).
You can now pass pandas objects with non-numpy dtypes (e.g., categorical or datetime64 with a timezone) into xray without an error (GH716).
Acknowledgments
The following individuals contributed to this release:
Antony Lee
Fabien Maussion
Joe Hamman
Maximilian Roos
Stephan Hoyer
Takeshi Kanmae
femtotrader
v0.6.1 (21 October 2015)
This release contains a number of bug and compatibility fixes, as well as enhancements to plotting, indexing and writing files to disk.
Note that the minimum required version of dask for use with xray is now version 0.6.
API Changes
- The handling of colormaps and discrete color lists for 2D plots in plot() was changed to provide more compatibility with matplotlib’s contour and contourf functions (GH538). Now discrete lists of colors should be specified using the colors keyword, rather than cmap.
Enhancements
Faceted plotting through FacetGrid and the plot() method. See Faceting for more details and examples.
sel() and reindex() now support the tolerance argument for controlling nearest-neighbor selection (GH629):
- In [38]: array = xray.DataArray([1, 2, 3], dims='x')
- In [39]: array.reindex(x=[0.9, 1.5], method='nearest', tolerance=0.2)
- Out[39]:
- <xray.DataArray (x: 2)>
- array([ 2., nan])
- Coordinates:
- * x (x) float64 0.9 1.5
This feature requires pandas v0.17 or newer.
New encoding argument in to_netcdf() for writing netCDF files with compression, as described in the new documentation section on Writing encoded data.
Add real and imag attributes to Dataset and DataArray (GH553).
More informative error message with from_dataframe() if the frame has duplicate columns.
xray now uses deterministic names for dask arrays it creates or opens from disk. This allows xray users to take advantage of dask’s nascent support for caching intermediate computation results. See GH555 for an example.
Bug fixes
Forwards compatibility with the latest pandas release (v0.17.0). We were using some internal pandas routines for datetime conversion, which unfortunately have now changed upstream (GH569).
Aggregation functions now correctly skip NaN for data with complex128 dtype (GH554).
Fixed indexing 0d arrays with unicode dtype (GH568).
name() and Dataset keys must be a string or None to be written to netCDF (GH533).
where() now uses dask instead of numpy if either the array or other is a dask array. Previously, if other was a numpy array the method was evaluated eagerly.
Global attributes are now handled more consistently when loading remote datasets using engine='pydap' (GH574).
It is now possible to assign to the .data attribute of DataArray objects.
coordinates attribute is now kept in the encoding dictionary after decoding (GH610).
Compatibility with numpy 1.10 (GH617).
Acknowledgments
The following individuals contributed to this release:
Ryan Abernathey
Pete Cable
Clark Fitzgerald
Joe Hamman
Stephan Hoyer
Scott Sinclair
v0.6.0 (21 August 2015)
This release includes numerous bug fixes and enhancements. Highlights include the introduction of a plotting module and the new Dataset and DataArray methods isel_points(), sel_points(), where() and diff(). There are no breaking changes from v0.5.2.
Enhancements
Plotting methods have been implemented on DataArray objects plot() through integration with matplotlib (GH185). For an introduction, see Plotting.
Variables in netCDF files with multiple missing values are now decoded as NaN after issuing a warning if open_dataset is called with mask_and_scale=True.
We clarified our rules for when the result from an xray operation is a copy vs. a view (see copies vs views for more details).
Dataset variables are now written to netCDF files in order of appearance when using the netcdf4 backend (GH479).
Added isel_points() and sel_points() to support pointwise indexing of Datasets and DataArrays (GH475).
- In [40]: da = xray.DataArray(np.arange(56).reshape((7, 8)),
- ....: coords={'x': list('abcdefg'),
- ....: 'y': 10 * np.arange(8)},
- ....: dims=['x', 'y'])
- ....:
- In [41]: da
- Out[41]:
- <xray.DataArray (x: 7, y: 8)>
- array([[ 0, 1, 2, 3, 4, 5, 6, 7],
- [ 8, 9, 10, 11, 12, 13, 14, 15],
- [16, 17, 18, 19, 20, 21, 22, 23],
- [24, 25, 26, 27, 28, 29, 30, 31],
- [32, 33, 34, 35, 36, 37, 38, 39],
- [40, 41, 42, 43, 44, 45, 46, 47],
- [48, 49, 50, 51, 52, 53, 54, 55]])
- Coordinates:
- * y (y) int64 0 10 20 30 40 50 60 70
- * x (x) |S1 'a' 'b' 'c' 'd' 'e' 'f' 'g'
- # we can index by position along each dimension
- In [42]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
- Out[42]:
- <xray.DataArray (points: 3)>
- array([ 0, 9, 48])
- Coordinates:
- y (points) int64 0 10 0
- x (points) |S1 'a' 'b' 'g'
- * points (points) int64 0 1 2
- # or equivalently by label
- In [43]: da.sel_points(x=['a', 'b', 'g'], y=[0, 10, 0], dim='points')
- Out[43]:
- <xray.DataArray (points: 3)>
- array([ 0, 9, 48])
- Coordinates:
- y (points) int64 0 10 0
- x (points) |S1 'a' 'b' 'g'
- * points (points) int64 0 1 2
- New where() method for masking xray objects according to some criteria. This works particularly well with multi-dimensional data:
- In [44]: ds = xray.Dataset(coords={'x': range(100), 'y': range(100)})
- In [45]: ds['distance'] = np.sqrt(ds.x ** 2 + ds.y ** 2)
- In [46]: ds.distance.where(ds.distance < 100).plot()
- Out[46]: <matplotlib.collections.QuadMesh at 0x7f34256a3278>
Added new methods DataArray.diff and Dataset.diff for finite difference calculations along a given axis.
New to_masked_array() convenience method for returning a numpy.ma.MaskedArray.
- In [47]: da = xray.DataArray(np.random.random_sample(size=(5, 4)))
- In [48]: da.where(da < 0.5)
- Out[48]:
- <xarray.DataArray (dim_0: 5, dim_1: 4)>
- array([[0.12697 , nan, 0.260476, nan],
- [0.37675 , 0.336222, 0.451376, nan],
- [0.123102, nan, 0.373012, 0.447997],
- [0.129441, nan, nan, 0.352054],
- [0.228887, nan, nan, 0.137554]])
- Dimensions without coordinates: dim_0, dim_1
- In [49]: da.where(da < 0.5).to_masked_array(copy=True)
- Out[49]:
- masked_array(
- data=[[0.12696983303810094, --, 0.26047600586578334, --],
- [0.37674971618967135, 0.33622174433445307, 0.45137647047539964, --],
- [0.12310214428849964, --, 0.37301222522143085, 0.4479968246859435],
- [0.12944067971751294, --, --, 0.35205353914802473],
- [0.2288873043216132, --, --, 0.1375535565632705]],
- mask=[[False, True, False, True],
- [False, False, False, True],
- [False, True, False, False],
- [False, True, True, False],
- [False, True, True, False]],
- fill_value=1e+20)
- Added new flag “drop_variables” to open_dataset() for excluding variables from being parsed. This may be useful to drop variables with problems or inconsistent values.
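The diff methods listed among the enhancements above have no example in this section, so here is a minimal sketch of first and second finite differences. It uses the current package name (xarray imported as xr) and made-up data; the n argument selects the order of the difference.

```python
import xarray as xr

da = xr.DataArray([1, 4, 9, 16], dims="x", coords={"x": [0, 1, 2, 3]})

# First-order difference along 'x': each value minus its predecessor.
first = da.diff("x")
print(first.values.tolist())       # [3, 5, 7]
print(first["x"].values.tolist())  # [1, 2, 3]

# Second-order difference: the difference of the differences.
second = da.diff("x", n=2)
print(second.values.tolist())      # [2, 2]
```

By default the result is labelled with the upper coordinate of each pair, which is why the first 'x' label is dropped.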
Bug fixes
Fixed aggregation functions (e.g., sum and mean) on big-endian arrays when bottleneck is installed (GH489).
Dataset aggregation functions dropped variables with unsigned integer dtype (GH505).
.any() and .all() were not lazy when used on xray objects containing dask arrays.
Fixed an error when attempting to save datetime64 variables to netCDF files when the first element is NaT (GH528).
Fix pickle on DataArray objects (GH515).
Fixed unnecessary coercion of float64 to float32 when using netcdf3 and netcdf4_classic formats (GH526).
v0.5.2 (16 July 2015)
This release contains bug fixes, several additional options for opening and saving netCDF files, and a backwards incompatible rewrite of the advanced options for xray.concat.
Backwards incompatible changes
- The optional arguments concat_over and mode in concat() have been removed and replaced by data_vars and coords. The new arguments are both more easily understood and more robustly implemented, and allowed us to fix a bug where concat accidentally loaded data into memory. If you set values for these optional arguments manually, you will need to update your code. The default behavior should be unchanged.
Enhancements
open_mfdataset() now supports a preprocess argument for preprocessing datasets prior to concatenation. This is useful if datasets cannot be otherwise merged automatically, e.g., if the original datasets have conflicting index coordinates (GH443).
open_dataset() and open_mfdataset() now use a global thread lock by default for reading from netCDF files with dask. This avoids possible segmentation faults for reading from netCDF4 files when HDF5 is not configured properly for concurrent access (GH444).
Added support for serializing arrays of complex numbers with engine='h5netcdf'.
The new save_mfdataset() function allows for saving multiple datasets to disk simultaneously. This is useful when processing large datasets with dask.array. For example, to save a dataset too big to fit into memory to one file per year, we could write:
- In [50]: years, datasets = zip(*ds.groupby('time.year'))
- In [51]: paths = ['%s.nc' % y for y in years]
- In [52]: xray.save_mfdataset(datasets, paths)
Bug fixes
Fixed min, max, argmin and argmax for arrays with string or unicode types (GH453).
open_dataset() and open_mfdataset() support supplying chunks as a single integer.
Fixed a bug in serializing scalar datetime variable to netCDF.
Fixed a bug that could occur in serialization of 0-dimensional integer arrays.
Fixed a bug where concatenating DataArrays was not always lazy (GH464).
When reading datasets with h5netcdf, bytes attributes are decoded to strings. This allows conventions decoding to work properly on Python 3 (GH451).
v0.5.1 (15 June 2015)
This minor release fixes a few bugs and an inconsistency with pandas. It also adds the pipe method, copied from pandas.
Enhancements
Added pipe(), replicating the new pandas method in version 0.16.2. See Transforming datasets for more details.
assign() and assign_coords() now assign new variables in sorted (alphabetical) order, mirroring the behavior in pandas. Previously, the order was arbitrary.
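The pipe() method mentioned above lets user-defined functions be chained in reading order. The sketch below uses two hypothetical helper functions (add_offset and scale, which are not part of the library) and the current package name, xarray imported as xr.

```python
import xarray as xr


def add_offset(ds, offset):
    """Hypothetical helper: add a constant to every data variable."""
    return ds + offset


def scale(ds, factor):
    """Hypothetical helper: multiply every data variable by a constant."""
    return ds * factor


ds = xr.Dataset({"y": ("x", [1, 2, 3])})

# Equivalent to scale(add_offset(ds, offset=10), factor=2), but the steps
# read left to right.
result = ds.pipe(add_offset, offset=10).pipe(scale, factor=2)
print(result["y"].values.tolist())  # [22, 24, 26]
```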
Bug fixes
Fixed xray.concat failing in an edge case involving identical coordinate variables (GH425).
We now decode variables loaded from netCDF3 files with the scipy engine using native endianness (GH416). This resolves an issue when aggregating these arrays with bottleneck installed.
v0.5 (1 June 2015)
Highlights
The headline feature in this release is experimental support for out-of-core computing (data that doesn’t fit into memory) with dask. This includes a new top-level function open_mfdataset() that makes it easy to open a collection of netCDF files (using dask) as a single xray.Dataset object. For more on dask, read the blog post introducing xray + dask and the new documentation section Parallel computing with Dask.
Dask makes it possible to harness parallelism and manipulate gigantic datasets with xray. It is currently an optional dependency, but it may become required in the future.
Backwards incompatible changes
- The logic used for choosing which variables are concatenated with concat() has changed. Previously, by default any variables which were equal across a dimension were not concatenated. This led to some surprising behavior, where the behavior of groupby and concat operations could depend on runtime values (GH268). For example:
- In [53]: ds = xray.Dataset({'x': 0})
- In [54]: xray.concat([ds, ds], dim='y')
- Out[54]:
- <xray.Dataset>
- Dimensions: ()
- Coordinates:
- *empty*
- Data variables:
- x int64 0
Now, the default always concatenates data variables:
- In [55]: xray.concat([ds, ds], dim='y')
- Out[55]:
- <xarray.Dataset>
- Dimensions: (y: 2)
- Dimensions without coordinates: y
- Data variables:
- x (y) int64 0 0
To obtain the old behavior, supply the argument concat_over=[].
Enhancements
- New to_array() and enhanced to_dataset() methods make it easy to switch back and forth between arrays and datasets:
- In [56]: ds = xray.Dataset({'a': 1, 'b': ('x', [1, 2, 3])},
- ....: coords={'c': 42}, attrs={'Conventions': 'None'})
- ....:
- In [57]: ds.to_array()
- Out[57]:
- <xarray.DataArray (variable: 2, x: 3)>
- array([[1, 1, 1],
- [1, 2, 3]])
- Coordinates:
- c int64 42
- * variable (variable) <U1 'a' 'b'
- Dimensions without coordinates: x
- Attributes:
- Conventions: None
- In [58]: ds.to_array().to_dataset(dim='variable')
- Out[58]:
- <xarray.Dataset>
- Dimensions: (x: 3)
- Coordinates:
- c int64 42
- Dimensions without coordinates: x
- Data variables:
- a (x) int64 1 1 1
- b (x) int64 1 2 3
- Attributes:
- Conventions: None
- New fillna() method to fill missing values, modeled off the pandas method of the same name:
- In [59]: array = xray.DataArray([np.nan, 1, np.nan, 3], dims='x')
- In [60]: array.fillna(0)
- Out[60]:
- <xarray.DataArray (x: 4)>
- array([0., 1., 0., 3.])
- Dimensions without coordinates: x
fillna works on both Dataset and DataArray objects, and uses index based alignment and broadcasting like standard binary operations. It also can be applied by group, as illustrated in Fill missing values with climatology.
- New assign() and assign_coords() methods patterned off the new DataFrame.assign method in pandas:
- In [61]: ds = xray.Dataset({'y': ('x', [1, 2, 3])})
- In [62]: ds.assign(z = lambda ds: ds.y ** 2)
- Out[62]:
- <xarray.Dataset>
- Dimensions: (x: 3)
- Dimensions without coordinates: x
- Data variables:
- y (x) int64 1 2 3
- z (x) int64 1 4 9
- In [63]: ds.assign_coords(z = ('x', ['a', 'b', 'c']))
- Out[63]:
- <xarray.Dataset>
- Dimensions: (x: 3)
- Coordinates:
- z (x) <U1 'a' 'b' 'c'
- Dimensions without coordinates: x
- Data variables:
- y (x) int64 1 2 3
These methods return a new Dataset (or DataArray) with updated data or coordinate variables.
sel() now supports the method parameter, which works like the parameter of the same name on reindex(). It provides a simple interface for doing nearest-neighbor interpolation:
- In [64]: ds.sel(x=1.1, method='nearest')
- Out[64]:
- <xray.Dataset>
- Dimensions: ()
- Coordinates:
- x int64 1
- Data variables:
- y int64 2
- In [65]: ds.sel(x=[1.1, 2.1], method='pad')
- Out[65]:
- <xray.Dataset>
- Dimensions: (x: 2)
- Coordinates:
- * x (x) int64 1 2
- Data variables:
- y (x) int64 2 3
See Nearest neighbor lookups for more details.
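A minimal sketch of the same lookups, assuming a recent release imported as `xarray` and a dataset whose `x` dimension carries explicit index labels (the labels `0.0, 1.0, 2.0` are chosen here purely for illustration):

```python
import xarray as xr  # assumption: a recent release under its current name

# Label-based lookup needs index labels on 'x', so they are given explicitly.
ds = xr.Dataset({"y": ("x", [1, 2, 3])}, coords={"x": [0.0, 1.0, 2.0]})

# 'nearest' snaps the requested value to the closest existing label (x=1.0).
nearest = ds.sel(x=1.1, method="nearest")

# 'pad' takes the last existing label at or before each requested value.
padded = ds.sel(x=[1.1, 2.1], method="pad")

print(nearest["y"].item(), padded["y"].values.tolist())
```

Without `method`, both calls would fail, because neither 1.1 nor 2.1 is an exact label.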
You can now control the underlying backend used for accessing remote datasets (via OPeNDAP) by specifying
engine='netcdf4'
or engine='pydap'
.
xray now provides experimental support for reading and writing netCDF4 files directly via h5py with the h5netcdf package, avoiding the netCDF4-Python package. You will need to install h5netcdf and specify
engine='h5netcdf'
to try this feature.
Accessing data from remote datasets now has retrying logic (with exponential backoff) that should make it robust to occasional bad responses from DAP servers.
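The general idea of retrying with exponential backoff can be sketched in plain Python. This is not xray's actual implementation: `fetch_with_retries`, its parameters and the `flaky` stand-in for a remote read are all hypothetical names used only for illustration.

```python
import time


def fetch_with_retries(fetch, max_retries=5, initial_delay=1.0, sleep=time.sleep):
    """Call fetch(); on IOError wait, double the delay, and try again."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except IOError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= 2  # exponential backoff: 1, 2, 4, ... seconds


# Hypothetical remote read that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("bad response")
    return "data"


waits = []  # record the delays instead of really sleeping
result = fetch_with_retries(flaky, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real pauses.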
You can control the width of the Dataset repr with
xray.set_options
. It can be used either as a context manager, in which case the default is restored outside the context:
- In [66]: ds = xray.Dataset({'x': np.arange(1000)})
- In [67]: with xray.set_options(display_width=40):
- ....: print(ds)
- ....:
- <xarray.Dataset>
- Dimensions: (x: 1000)
- Coordinates:
- * x (x) int64 0 1 2 ... 998 999
- Data variables:
- *empty*
Or to set a global option:
- In [68]: xray.set_options(display_width=80)
The default value for the display_width
option is 80.
Deprecations
- The method
load_data()
has been renamed to the more succinct load()
.
v0.4.1 (18 March 2015)
The release contains bug fixes and several new features. All changes should be fully backwards compatible.
Enhancements
New documentation sections on Time series data and Combining multiple files.
resample()
lets you resample a dataset or data array to a new temporal resolution. The syntax is the same as pandas, except you need to supply the time dimension explicitly:
- In [69]: time = pd.date_range('2000-01-01', freq='6H', periods=10)
- In [70]: array = xray.DataArray(np.arange(10), [('time', time)])
- In [71]: array.resample('1D', dim='time')
You can specify how to do the resampling with the how
argument, and other options such as closed
and label
let you control labeling:
- In [72]: array.resample('1D', dim='time', how='sum', label='right')
If the desired temporal resolution is higher than the original data (upsampling), xray will insert missing values:
- In [73]: array.resample('3H', 'time')
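The calls above use this release's `how=` and `dim=` keywords. For comparison, here is a hedged sketch of the same daily downsampling, assuming a recent xarray release in which the dimension is passed as a keyword and the aggregation is a method on the result:

```python
import numpy as np
import pandas as pd
import xarray as xr  # assumption: a recent release under its current name

# Ten samples, six hours apart, starting at midnight on 2000-01-01.
time = pd.date_range("2000-01-01", periods=10, freq=pd.Timedelta(hours=6))
array = xr.DataArray(np.arange(10), coords=[("time", time)])

# Downsample to one value per day: four samples fall on each of the first
# two days and the remaining two on the third.
daily_sum = array.resample(time="1D").sum()
daily_mean = array.resample(time="1D").mean()

print(daily_sum.values.tolist(), daily_mean.values.tolist())
```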
first
and last
methods on groupby objects let you take the first or last examples from each group along the grouped axis:
- In [74]: array.groupby('time.day').first()
These methods combine well with resample
:
- In [75]: array.resample('1D', dim='time', how='first')
swap_dims()
allows for easily swapping one dimension out for another:
- In [76]: ds = xray.Dataset({'x': range(3), 'y': ('x', list('abc'))})
- In [77]: ds
- Out[77]:
- <xarray.Dataset>
- Dimensions: (x: 3)
- Coordinates:
- * x (x) int64 0 1 2
- Data variables:
- y (x) <U1 'a' 'b' 'c'
- In [78]: ds.swap_dims({'x': 'y'})
- Out[78]:
- <xarray.Dataset>
- Dimensions: (y: 3)
- Coordinates:
- x (y) int64 0 1 2
- * y (y) <U1 'a' 'b' 'c'
- Data variables:
- *empty*
This was possible in earlier versions of xray, but required some contortions.
open_dataset()
and to_netcdf()
now accept an engine
argument to explicitly select which underlying library (netcdf4 or scipy) is used for reading/writing a netCDF file.
Bug fixes
Fixed a bug where data netCDF variables read from disk with
engine='scipy'
could still be associated with the file on disk, even after closing the file (GH341). This manifested itself in warnings about mmapped arrays and segmentation faults (if the data was accessed).
Silenced spurious warnings about all-NaN slices when using nan-aware aggregation methods (GH344).
Dataset aggregations with
keep_attrs=True
now preserve attributes on data variables, not just the dataset itself.
Tests for xray now pass when run on Windows (GH360).
Fixed a regression in v0.4 where saving to netCDF could fail with the error
ValueError: could not automatically determine time units
.
v0.4 (2 March, 2015)
This is one of the biggest releases yet for xray: it includes some major changes that may break existing code, along with the usual collection of minor enhancements and bug fixes. On the plus side, this release includes all hitherto planned breaking changes, so the upgrade path for xray should be smoother going forward.
Breaking changes
- We now automatically align index labels in arithmetic, dataset construction, merging and updating. This means the need for manually invoking methods like
align()
and reindex_like()
should be vastly reduced.
For arithmetic, we align based on the intersection of labels:
- In [79]: lhs = xray.DataArray([1, 2, 3], [('x', [0, 1, 2])])
- In [80]: rhs = xray.DataArray([2, 3, 4], [('x', [1, 2, 3])])
- In [81]: lhs + rhs
- Out[81]:
- <xarray.DataArray (x: 2)>
- array([4, 6])
- Coordinates:
- * x (x) int64 1 2
For dataset construction and merging, we align based on the union of labels:
- In [82]: xray.Dataset({'foo': lhs, 'bar': rhs})
- Out[82]:
- <xarray.Dataset>
- Dimensions: (x: 4)
- Coordinates:
- * x (x) int64 0 1 2 3
- Data variables:
- foo (x) float64 1.0 2.0 3.0 nan
- bar (x) float64 nan 2.0 3.0 4.0
For update and setitem, we align based on the original object:
- In [83]: lhs.coords['rhs'] = rhs
- In [84]: lhs
- Out[84]:
- <xarray.DataArray (x: 3)>
- array([1, 2, 3])
- Coordinates:
- * x (x) int64 0 1 2
- rhs (x) float64 nan 2.0 3.0
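The three rules above correspond to the `inner`, `outer` and `left` joins that `align()` accepts. A short sketch, assuming a recent release imported as `xarray`, that makes each join explicit:

```python
import numpy as np
import xarray as xr  # assumption: a recent release under its current name

lhs = xr.DataArray([1, 2, 3], coords=[("x", [0, 1, 2])])
rhs = xr.DataArray([2, 3, 4], coords=[("x", [1, 2, 3])])

# Arithmetic: only labels present on both sides survive (intersection).
total = lhs + rhs

# Union of labels: missing positions are filled with NaN.
outer_l, outer_r = xr.align(lhs, rhs, join="outer")

# Labels of the first object win, as in update and setitem.
left_l, left_r = xr.align(lhs, rhs, join="left")

print(total["x"].values, outer_l["x"].values, left_r.values)
```

Automatic alignment removes the need to call `align()` in the common cases, but the explicit form is still handy when a different join is wanted.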
- Aggregations like
mean
or median
now skip missing values by default:
- In [85]: xray.DataArray([1, 2, np.nan, 3]).mean()
- Out[85]:
- <xarray.DataArray ()>
- array(2.)
You can turn this behavior off by supplying the keyword argument skipna=False
.
These operations are lightning fast thanks to integration with bottleneck, which is a new optional dependency for xray (numpy is used if bottleneck is not installed).
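In plain NumPy terms, the difference between skipping and propagating missing values is the difference between `nanmean` and `mean`. The sketch below only illustrates that distinction; it is not how xray dispatches internally.

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 3.0])

# Skipping NaN averages the three valid entries, like the new default.
skipping = np.nanmean(values)

# Propagating NaN poisons the result, like skipna=False.
propagating = np.mean(values)

print(skipping, propagating)
```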
- Scalar coordinates no longer conflict with constant arrays with the same value (e.g., in arithmetic, merging datasets and concat), even if they have different shape (GH243). For example, the coordinate
c
here persists through arithmetic, even though it has different shapes on each DataArray:
- In [86]: a = xray.DataArray([1, 2], coords={'c': 0}, dims='x')
- In [87]: b = xray.DataArray([1, 2], coords={'c': ('x', [0, 0])}, dims='x')
- In [88]: (a + b).coords
- Out[88]:
- Coordinates:
- c (x) int64 0 0
This functionality can be controlled through the compat
option, which has also been added to the Dataset
constructor.
- Datetime shortcuts such as
'time.month'
now return a DataArray
with the name 'month'
, not 'time.month'
(GH345). This makes it easier to index the resulting arrays when they are used with groupby
:
- In [89]: time = xray.DataArray(pd.date_range('2000-01-01', periods=365),
- ....: dims='time', name='time')
- ....:
- In [90]: counts = time.groupby('time.month').count()
- In [91]: counts.sel(month=2)
- Out[91]:
- <xarray.DataArray 'time' ()>
- array(29)
- Coordinates:
- month int64 2
Previously, you would need to use something like counts.sel(**{'time.month': 2})
, which is much more awkward.
- The
season
datetime shortcut now returns an array of string labels such as ‘DJF’:
- In [92]: ds = xray.Dataset({'t': pd.date_range('2000-01-01', periods=12, freq='M')})
- In [93]: ds['t.season']
- Out[93]:
- <xarray.DataArray 'season' (t: 12)>
- array(['DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA', 'SON', 'SON',
- 'SON', 'DJF'], dtype='<U3')
- Coordinates:
- * t (t) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-11-30 2000-12-31
Previously, it returned numbered seasons 1 through 4.
We have updated our use of the terms “coordinates” and “variables”. What were known in previous versions of xray as “coordinates” and “variables” are now referred to throughout the documentation as “coordinate variables” and “data variables”. This brings xray in closer alignment to CF Conventions. The only visible change besides the documentation is that
Dataset.vars
has been renamed
Dataset.data_vars
.
You will need to update your code if you have been ignoring deprecation warnings: methods and attributes that were deprecated in xray v0.3 or earlier (e.g.,
dimensions
,
attributes
) have gone away.
Enhancements
- Support for
reindex()
with a fill method. This provides a useful shortcut for upsampling:
- In [94]: data = xray.DataArray([1, 2, 3], [('x', range(3))])
- In [95]: data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
- Out[95]:
- <xarray.DataArray (x: 5)>
- array([1, 2, 2, 3, 3])
- Coordinates:
- * x (x) float64 0.5 1.0 1.5 2.0 2.5
This will be especially useful once pandas 0.16 is released, at which point xray will immediately support reindexing with method='nearest'.
Use functions that return generic ndarrays with DataArray.groupby.apply and Dataset.apply (GH327 and GH329). Thanks Jeff Gerard!
Consolidated the functionality of
dumps
(writing a dataset to a netCDF3 bytestring) into
to_netcdf()
(GH333).
to_netcdf()
now supports writing to groups in netCDF4 files (GH333). It also finally has a full docstring – you should read it!
open_dataset()
and to_netcdf()
now work on netCDF3 files when netcdf4-python is not installed as long as scipy is available (GH333).
The new
Dataset.drop
and DataArray.drop
methods make it easy to drop explicitly listed variables or index labels:
- # drop variables
- In [96]: ds = xray.Dataset({'x': 0, 'y': 1})
- In [97]: ds.drop('x')
- Out[97]:
- <xarray.Dataset>
- Dimensions: ()
- Data variables:
- y int64 1
- # drop index labels
- In [98]: arr = xray.DataArray([1, 2, 3], coords=[('x', list('abc'))])
- In [99]: arr.drop(['a', 'c'], dim='x')
- Out[99]:
- <xarray.DataArray (x: 1)>
- array([2])
- Coordinates:
- * x (x) <U1 'b'
broadcast_equals()
has been added to correspond to the new compat
option.
Long attributes are now truncated at 500 characters when printing a dataset (GH338). This should make things more convenient for working with datasets interactively.
Added a new documentation example, Calculating Seasonal Averages from Timeseries of Monthly Means. Thanks Joe Hamman!
Bug fixes
Several bug fixes related to decoding time units from netCDF files (GH316, GH330). Thanks Stefan Pfenninger!
xray no longer requires
decode_coords=False
when reading datasets with unparseable coordinate attributes (GH308).
Fixed
DataArray.loc
indexing with…
(GH318).
Fixed an edge case that resulted in an error when reindexing multi-dimensional variables (GH315).
Slicing with negative step sizes (GH312).
Invalid conversion of string arrays to numeric dtype (GH305).
Fixed
repr()
on dataset objects with non-standard dates (GH347).
Deprecations
dump
and dumps
have been deprecated in favor of to_netcdf()
.
drop_vars
has been deprecated in favor of drop()
.
Future plans
The biggest feature I’m excited about working toward in the immediate future is supporting out-of-core operations in xray using Dask, a part of the Blaze project. For a preview of using Dask with weather data, read this blog post by Matthew Rocklin. See GH328 for more details.
v0.3.2 (23 December, 2014)
This release focused on bug-fixes, speedups and resolving some niggling inconsistencies.
There are a few cases where the behavior of xray differs from the previous version. However, I expect that in almost all cases your code will continue to run unmodified.
Warning
xray now requires pandas v0.15.0 or later. This was necessary for supporting TimedeltaIndex without too many painful hacks.
Backwards incompatible changes
- Arrays of
datetime.datetime
objects are now automatically cast to datetime64[ns]
arrays when stored in an xray object, using machinery borrowed from pandas:
- In [100]: from datetime import datetime
- In [101]: xray.Dataset({'t': [datetime(2000, 1, 1)]})
- Out[101]:
- <xarray.Dataset>
- Dimensions: (t: 1)
- Coordinates:
- * t (t) datetime64[ns] 2000-01-01
- Data variables:
- *empty*
xray now has support (including serialization to netCDF) for
TimedeltaIndex
.
datetime.timedelta
objects are accordingly cast to timedelta64[ns]
objects when appropriate.
Masked arrays are now properly coerced to use
NaN
as a sentinel value (GH259).
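What coercing a masked array to a NaN sentinel amounts to can be sketched in plain NumPy. This is an illustration of the idea only, not xray's internal conversion code.

```python
import numpy as np

masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Integers cannot hold NaN, so the data is upcast to float before the
# masked positions are replaced with the NaN sentinel.
filled = masked.astype(float).filled(np.nan)

print(filled)
```

The upcast is the visible cost of using NaN as the sentinel: an integer masked array comes out as floating point.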
Enhancements
- Due to popular demand, we have added experimental attribute style access as a shortcut for dataset variables, coordinates and attributes:
- In [102]: ds = xray.Dataset({'tmin': ([], 25, {'units': 'celsius'})})
- In [103]: ds.tmin.units
- Out[103]: 'celsius'
Tab-completion for these variables should work in editors such as IPython. However, setting variables or attributes in this fashion is not yet supported because there are some unresolved ambiguities (GH300).
- You can now use a dictionary for indexing with labeled dimensions. This provides a safe way to do assignment with labeled dimensions:
- In [104]: array = xray.DataArray(np.zeros(5), dims=['x'])
- In [105]: array[dict(x=slice(3))] = 1
- In [106]: array
- Out[106]:
- <xarray.DataArray (x: 5)>
- array([1., 1., 1., 0., 0.])
- Dimensions without coordinates: x
Non-index coordinates can now be faithfully written to and restored from netCDF files. This is done according to CF conventions when possible by using the
coordinates
attribute on a data variable. When not possible, xray defines a global coordinates
attribute.
Preliminary support for converting
xray.DataArray
objects to and from CDAT cdms2
variables.
We sped up any operation that involves creating a new Dataset or DataArray (e.g., indexing, aggregation, arithmetic) by 30 to 50%. The full speed up requires cyordereddict to be installed.
Bug fixes
Fix for
to_dataframe()
with 0d string/object coordinates (GH287).
Fix for
to_netcdf
with 0d string variable (GH284).
Fix writing datetime64 arrays to netcdf if NaT is present (GH270).
Fix for align silently upcasting data arrays when NaNs are inserted (GH264).
Future plans
I am contemplating switching to the terms “coordinate variables” and “data variables” instead of the (currently used) “coordinates” and “variables”, following their use in CF Conventions (GH293). This would mostly have implications for the documentation, but I would also change the Dataset
attribute vars
to data
.
I am no longer certain that automatic label alignment for arithmetic would be a good idea for xray – it is a feature from pandas that I have not missed (GH186).
The main API breakage that I do anticipate in the next release is finally making all aggregation operations skip missing values by default (GH130). I’m pretty sick of writing
ds.reduce(np.nanmean, 'time')
.
The next version of xray (0.4) will remove deprecated features and aliases whose use currently raises a warning.
If you have opinions about any of these anticipated changes, I would love to hear them – please add a note to any of the referenced GitHub issues.
v0.3.1 (22 October, 2014)
This is mostly a bug-fix release to make xray compatible with the latest release of pandas (v0.15).
We added several features to better support working with missing values and exporting xray objects to pandas. We also reorganized the internal API for serializing and deserializing datasets, but this change should be almost entirely transparent to users.
Other than breaking the experimental DataStore API, there should be no backwards incompatible changes.
New features
Added
count()
and dropna()
methods, copied from pandas, for working with missing values (GH247, GH58).
Added
DataArray.to_pandas
for converting a data array into the pandas object with the same dimensionality (1D to Series, 2D to DataFrame, etc.) (GH255).
Support for reading gzipped netCDF3 files (GH239).
Reduced memory usage when writing netCDF files (GH251).
‘missing_value’ is now supported as an alias for the ‘_FillValue’ attribute on netCDF variables (GH245).
Trivial indexes, equivalent to
range(n)
where n
is the length of the dimension, are no longer written to disk (GH245).
Bug fixes
Compatibility fixes for pandas v0.15 (GH262).
Fixes for display and indexing of
NaT
(not-a-time) (GH238, GH240).
Fix slicing by label when an argument is a data array (GH250).
Test data is now shipped with the source distribution (GH253).
Ensure order does not matter when doing arithmetic with scalar data arrays (GH254).
Order of dimensions preserved with
DataArray.to_dataframe
(GH260).
v0.3 (21 September 2014)
New features
Revamped coordinates: “coordinates” now refer to all arrays that are not used to index a dimension. Coordinates are intended to allow for keeping track of arrays of metadata that describe the grid on which the points in “variable” arrays lie. They are preserved (when unambiguous) even through mathematical operations.
Dataset math:
Dataset
objects now support all arithmetic operations directly. Dataset-array operations map across all dataset variables; dataset-dataset operations act on each pair of variables with the same name.
GroupBy math: This provides a convenient shortcut for normalizing by the average value of a group.
The dataset
repr
method has been entirely overhauled; dataset objects now show their values when printed.
You can now index a dataset with a list of variables to return a new dataset:
ds[['foo', 'bar']]
.
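A short sketch of dataset math, GroupBy math and list indexing, assuming a recent release imported as `xarray`. The variable names and the `group` coordinate are made up for illustration.

```python
import xarray as xr  # assumption: a recent release under its current name

ds = xr.Dataset(
    {"foo": ("x", [1.0, 2.0, 3.0, 4.0]), "bar": ("x", [10.0, 20.0, 30.0, 40.0])},
    coords={"group": ("x", ["a", "a", "b", "b"])},
)

# Dataset-array arithmetic maps across every data variable.
shifted = ds - ds["foo"]

# Dataset-dataset arithmetic pairs up variables with the same name.
summed = ds + ds

# GroupBy math: subtract each group's mean from that group's members.
grouped = ds["foo"].groupby("group")
anomalies = grouped - grouped.mean()

# Indexing with a list of names returns a new, smaller dataset.
subset = ds[["foo"]]

print(shifted["bar"].values, anomalies.values, list(subset.data_vars))
```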
Backwards incompatible changes
Dataset.__eq__
and Dataset.__ne__
are now element-wise operations instead of comparing all values to obtain a single boolean. Use the method equals()
instead.
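The difference between the element-wise comparison and `equals()` can be seen in a small sketch, assuming a recent release imported as `xarray`:

```python
import xarray as xr  # assumption: a recent release under its current name

a = xr.Dataset({"foo": ("x", [1, 2, 3])})
b = xr.Dataset({"foo": ("x", [1, 0, 3])})

# == compares element by element and returns a Dataset of booleans.
elementwise = a == b

# equals() collapses the comparison into a single Python bool.
same = a.equals(b)

print(elementwise["foo"].values.tolist(), same)
```

Code that used `==` in an `if` statement to test whole-dataset equality should switch to `equals()`.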
Deprecations
Dataset.noncoords
is deprecated: use Dataset.vars
instead.
Dataset.select_vars
is deprecated: index a Dataset
with a list of variable names instead.
DataArray.select_vars
and DataArray.drop_vars
are deprecated: use reset_coords()
instead.
v0.2 (14 August 2014)
This is a major release that includes some new features and quite a few bug fixes. Here are the highlights:
There is now a direct constructor for
DataArray
objects, which makes it possible to create a DataArray without using a Dataset. This is highlighted in the refreshed tutorial.
You can perform aggregation operations like
mean
directly on Dataset
objects, thanks to Joe Hamman. These aggregation methods also work on grouped datasets.
xray now works on Python 2.6, thanks to Anna Kuznetsova.
A number of methods and attributes were given more sensible (usually shorter) names:
labeled -> sel, indexed -> isel, select -> select_vars, unselect -> drop_vars, dimensions -> dims, coordinates -> coords, attributes -> attrs.
New
load_data()
and close()
methods for datasets facilitate lower level of control of data loaded from disk.
v0.1.1 (20 May 2014)
xray 0.1.1 is a bug-fix release that includes changes that should be almost entirely backwards compatible with v0.1:
Python 3 support (GH53)
Required numpy version relaxed to 1.7 (GH129)
Return numpy.datetime64 arrays for non-standard calendars (GH126)
Support for opening datasets associated with NetCDF4 groups (GH127)
Bug-fixes for concatenating datetime arrays (GH134)
Special thanks to new contributors Thomas Kluyver, Joe Hamman and Alistair Miles.
v0.1 (2 May 2014)
Initial release.