* Sat Feb 02 2019 Arun Persaud <arun@gmx.de>
- update to version 1.1.1:
* Array
+ Add support for cupy.einsum (:pr:`4402`) Johnnie Gray
+ Provide byte size in chunks keyword (:pr:`4434`) Adam Beberg
+ Raise more informative error for histogram bins and range
(:pr:`4430`) James Bourbeau
* DataFrame
+ Lazily register more cudf functions and move to backends file
(:pr:`4396`) Matthew Rocklin
+ Fix ORC tests for pyarrow 0.12.0 (:pr:`4413`) Jim Crist
+ rearrange_by_column: ensure that shuffle arg defaults to 'disk'
if it's None in dask.config (:pr:`4414`) George Sakkis
+ Implement filters for _read_pyarrow (:pr:`4415`) George Sakkis
+ Avoid checking against types in is_dataframe_like (:pr:`4418`)
Matthew Rocklin
+ Pass username as 'user' when using pyarrow (:pr:`4438`) Roma
Sokolov
* Delayed
+ Fix DelayedAttr return value (:pr:`4440`) Matthew Rocklin
* Documentation
+ Use SVG for pipeline graphic (:pr:`4406`) John A Kirkham
+ Add doctest-modules to py.test documentation (:pr:`4427`) Daniel
Severo
* Core
+ Work around psutil 5.5.0 not allowing pickling Process objects
Dimplexion
* Sun Jan 20 2019 Arun Persaud <arun@gmx.de>
- specfile:
* update copyright year
- update to version 1.1.0:
* Array
+ Fix the average function when there is a masked array
(:pr:`4236`) Damien Garaud
+ Add allow_unknown_chunksizes to hstack and vstack (:pr:`4287`)
Paul Vecchio
+ Fix tensordot for 27+ dimensions (:pr:`4304`) Johnnie Gray
+ Fixed block_info with axes. (:pr:`4301`) Tom Augspurger
+ Use safe_wraps for matmul (:pr:`4346`) Mark Harfouche
+ Use chunks="auto" in array creation routines (:pr:`4354`)
Matthew Rocklin
+ Fix np.matmul in dask.array.Array.__array_ufunc__ (:pr:`4363`)
Stephan Hoyer
+ COMPAT: Re-enable multifield copy->view change (:pr:`4357`)
Diane Trout
+ Calling np.dtype on a delayed object works (:pr:`4387`) Jim
Crist
+ Rework normalize_array for numpy data (:pr:`4312`) Marco Neumann
* DataFrame
+ Add fill_value support for series comparisons (:pr:`4250`) James
Bourbeau
+ Add schema name in read_sql_table for empty tables (:pr:`4268`)
Mina Farid
+ Adjust check for bad chunks in map_blocks (:pr:`4308`) Tom
Augspurger
+ Add dask.dataframe.read_fwf (:pr:`4316`) @slnguyen
+ Use atop fusion in dask dataframe (:pr:`4229`) Matthew Rocklin
+ Use parallel_types() in from_pandas (:pr:`4331`) Matthew
Rocklin
+ Change DataFrame._repr_data to method (:pr:`4330`) Matthew
Rocklin
+ Install pyarrow fastparquet for Appveyor (:pr:`4338`) Gábor
Lipták
+ Remove explicit pandas checks and provide cudf lazy registration
(:pr:`4359`) Matthew Rocklin
+ Replace isinstance(..., pandas) with is_dataframe_like
(:pr:`4375`) Matthew Rocklin
+ ENH: Support 3rd-party ExtensionArrays (:pr:`4379`) Tom
Augspurger
+ Pandas 0.24.0 compat (:pr:`4374`) Tom Augspurger
* Documentation
+ Fix link to 'map_blocks' function in array api docs (:pr:`4258`)
David Hoese
+ Add a paragraph on Dask-Yarn in the cloud docs (:pr:`4260`) Jim
Crist
+ Copy edit documentation (:pr:`4267`), (:pr:`4263`), (:pr:`4262`),
(:pr:`4277`), (:pr:`4271`), (:pr:`4279`), (:pr:`4265`),
(:pr:`4295`), (:pr:`4293`), (:pr:`4296`), (:pr:`4302`),
(:pr:`4306`), (:pr:`4318`), (:pr:`4314`), (:pr:`4309`),
(:pr:`4317`), (:pr:`4326`), (:pr:`4325`), (:pr:`4322`),
(:pr:`4332`), (:pr:`4333`) Miguel Farrajota
+ Fix typo in code example (:pr:`4272`) Daniel Li
+ Doc: Update array-api.rst (:pr:`4259`) (:pr:`4282`) Prabakaran
Kumaresshan
+ Update hpc doc (:pr:`4266`) Guillaume Eynard-Bontemps
+ Doc: Replace from_avro with read_avro in documents (:pr:`4313`)
Prabakaran Kumaresshan
+ Remove reference to "get" scheduler functions in docs
(:pr:`4350`) Matthew Rocklin
+ Fix typo in docstring (:pr:`4376`) Daniel Saxton
+ Added documentation for dask.dataframe.merge (:pr:`4382`)
Jendrik Jördening
* Core
+ Avoid recursion in dask.core.get (:pr:`4219`) Matthew Rocklin
+ Remove verbose flag from pytest setup.cfg (:pr:`4281`) Matthew
Rocklin
+ Support Pytest 4.0 by specifying marks explicitly (:pr:`4280`)
Takahiro Kojima
+ Add High Level Graphs (:pr:`4092`) Matthew Rocklin
+ Fix SerializableLock locked and acquire methods (:pr:`4294`)
Stephan Hoyer
+ Pin boto3 to earlier version in tests to avoid moto conflict
(:pr:`4276`) Martin Durant
+ Treat None as missing in config when updating (:pr:`4324`)
Matthew Rocklin
+ Update Appveyor to Python 3.6 (:pr:`4337`) Gábor Lipták
+ Use parse_bytes more liberally in dask.dataframe/bytes/bag
(:pr:`4339`) Matthew Rocklin
+ Add a better error message when cloudpickle is missing
(:pr:`4342`) Mark Harfouche
+ Support pool= keyword argument in threaded/multiprocessing get
functions (:pr:`4351`) Matthew Rocklin
+ Allow updates from arbitrary Mappings in config.update, not only
dicts. (:pr:`4356`) Stuart Berg
+ Move dask/array/top.py code to dask/blockwise.py (:pr:`4348`)
Matthew Rocklin
+ Add has_parallel_type (:pr:`4395`) Matthew Rocklin
+ CI: Update Appveyor (:pr:`4381`) Tom Augspurger
+ Ignore non-readable config files (:pr:`4388`) Jim Crist
* Sat Dec 01 2018 Arun Persaud <arun@gmx.de>
- update to version 1.0.0:
* Array
+ Add nancumsum/nancumprod unit tests (:pr:`4215`) Guido Imperiale
* DataFrame
+ Add index to to_dask_dataframe docstring (:pr:`4232`) James
Bourbeau
+ Test and fix when appending categoricals with fastparquet
(:pr:`4245`) Martin Durant
+ Don't reread metadata when passing ParquetFile to read_parquet
(:pr:`4247`) Martin Durant
* Documentation
+ Copy edit documentation (:pr:`4222`) (:pr:`4224`) (:pr:`4228`)
(:pr:`4231`) (:pr:`4230`) (:pr:`4234`) (:pr:`4235`) (:pr:`4254`)
Miguel Farrajota
+ Updated doc for the new scheduler keyword (:pr:`4251`) @milesial
* Core
+ Avoid a few warnings (:pr:`4223`) Matthew Rocklin
+ Remove dask.store module (:pr:`4221`) Matthew Rocklin
+ Remove AUTHORS.md Jim Crist
* Thu Nov 22 2018 Arun Persaud <arun@gmx.de>
- update to version 0.20.2:
* Array
+ Avoid fusing dependencies of atop reductions (:pr:`4207`)
Matthew Rocklin
* Dataframe
+ Improve memory footprint for dataframe correlation (:pr:`4193`)
Damien Garaud
+ Add empty DataFrame check to boundary_slice (:pr:`4212`) James
Bourbeau
* Documentation
+ Copy edit documentation (:pr:`4197`) (:pr:`4204`) (:pr:`4198`)
(:pr:`4199`) (:pr:`4200`) (:pr:`4202`) (:pr:`4209`) Miguel
Farrajota
+ Add stats module namespace (:pr:`4206`) James Bourbeau
+ Fix link in dataframe documentation (:pr:`4208`) James Bourbeau
* Mon Nov 12 2018 Arun Persaud <arun@gmx.de>
- update to version 0.20.1:
* Array
+ Only allocate the result space in wrapped_pad_func (:pr:`4153`)
John A Kirkham
+ Generalize expand_pad_width to expand_pad_value (:pr:`4150`)
John A Kirkham
+ Test da.pad with 2D linear_ramp case (:pr:`4162`) John A Kirkham
+ Fix import for broadcast_to. (:pr:`4168`) samc0de
+ Rewrite Dask Array's pad to add only new chunks (:pr:`4152`)
John A Kirkham
+ Validate index inputs to atop (:pr:`4182`) Matthew Rocklin
* Core
+ Dask.config set and get normalize underscores and hyphens
(:pr:`4143`) James Bourbeau
+ Only subs on core collections, not subclasses (:pr:`4159`)
Matthew Rocklin
+ Add block_size=0 option to HTTPFileSystem. (:pr:`4171`) Martin
Durant
+ Add traverse support for dataclasses (:pr:`4165`) Armin Berres
+ Avoid optimization on sharedicts without dependencies
(:pr:`4181`) Matthew Rocklin
+ Update the pytest version for TravisCI (:pr:`4189`) Damien
Garaud
+ Use key_split rather than funcname in visualize names
(:pr:`4160`) Matthew Rocklin
* Dataframe
+ Add fix for DataFrame.__setitem__ for index (:pr:`4151`)
Anderson Banihirwe
+ Fix column choice when passing list of files to fastparquet
(:pr:`4174`) Martin Durant
+ Pass engine_kwargs from read_sql_table to sqlalchemy
(:pr:`4187`) Damien Garaud
* Documentation
+ Fix documentation in Delayed best practices example that
returned an empty list (:pr:`4147`) Jonathan Fraine
+ Copy edit documentation (:pr:`4164`) (:pr:`4175`) (:pr:`4185`)
(:pr:`4192`) (:pr:`4191`) (:pr:`4190`) (:pr:`4180`) Miguel
Farrajota
+ Fix typo in docstring (:pr:`4183`) Carlos Valiente
* Tue Oct 30 2018 Arun Persaud <arun@gmx.de>
- update to version 0.20.0:
* Array
+ Fuse Atop operations (:pr:`3998`), (:pr:`4081`) Matthew Rocklin
+ Support da.asanyarray on dask dataframes (:pr:`4080`) Matthew
Rocklin
+ Remove unnecessary endianness check in datetime test
(:pr:`4113`) Elliott Sales de Andrade
+ Set name=False in array foo_like functions (:pr:`4116`) Matthew
Rocklin
+ Remove dask.array.ghost module (:pr:`4121`) Matthew Rocklin
+ Fix use of getargspec in dask array (:pr:`4125`) Stephan Hoyer
+ Adds dask.array.invert (:pr:`4127`), (:pr:`4131`) Anderson
Banihirwe
+ Raise informative error on arg-reduction on unknown chunksize
(:pr:`4128`), (:pr:`4135`) Matthew Rocklin
+ Normalize reversed slices in dask array (:pr:`4126`) Matthew
Rocklin
* Bag
+ Add bag.to_avro (:pr:`4076`) Martin Durant
* Core
+ Pull num_workers from config.get (:pr:`4086`), (:pr:`4093`)
James Bourbeau
+ Fix invalid escape sequences with raw strings (:pr:`4112`)
Elliott Sales de Andrade
+ Raise an error on the use of the get= keyword and set_options
(:pr:`4077`) Matthew Rocklin
+ Add import for Azure DataLake storage, and add docs (:pr:`4132`)
Martin Durant
+ Avoid collections.Mapping/Sequence (:pr:`4138`) Matthew Rocklin
* Dataframe
+ Include index keyword in to_dask_dataframe (:pr:`4071`) Matthew
Rocklin
+ Add support for duplicate column names (:pr:`4087`) Jan Koch
+ Implement min_count for the DataFrame methods sum and prod
(:pr:`4090`) Bart Broere
+ Remove pandas warnings in concat (:pr:`4095`) Matthew Rocklin
+ DataFrame.to_csv header option to only output headers in the
first chunk (:pr:`3909`) Rahul Vaidya
+ Remove Series.to_parquet (:pr:`4104`) Justin Dennison
+ Avoid warnings and deprecated pandas methods (:pr:`4115`)
Matthew Rocklin
+ Swap 'old' and 'previous' when reporting append error
(:pr:`4130`) Martin Durant
* Documentation
+ Copy edit documentation (:pr:`4073`), (:pr:`4074`),
(:pr:`4094`), (:pr:`4097`), (:pr:`4107`), (:pr:`4124`),
(:pr:`4133`), (:pr:`4139`) Miguel Farrajota
+ Fix typo in code example (:pr:`4089`) Antonino Ingargiola
+ Add pycon 2018 presentation (:pr:`4102`) Javad
+ Quick description for gcsfs (:pr:`4109`) Martin Durant
+ Fixed typo in docstrings of read_sql_table method (:pr:`4114`)
TakaakiFuruse
+ Make target directories in redirects if they don't exist
(:pr:`4136`) Matthew Rocklin
* Wed Oct 10 2018 Arun Persaud <arun@gmx.de>
- update to version 0.19.4:
* Array
+ Implement apply_gufunc(..., axes=..., keepdims=...) (:pr:`3985`)
Markus Gonser
* Bag
+ Fix typo in datasets.make_people (:pr:`4069`) Matthew Rocklin
* Dataframe
+ Added percentiles options for dask.dataframe.describe method
(:pr:`4067`) Zhenqing Li
+ Add DataFrame.partitions accessor similar to Array.blocks
(:pr:`4066`) Matthew Rocklin
* Core
+ Pass get functions and Clients through scheduler keyword
(:pr:`4062`) Matthew Rocklin
* Documentation
+ Fix typo in HPC example (missing = in kwarg) (:pr:`4068`)
Matthias Bussonier
+ Extensive copy-editing: (:pr:`4065`), (:pr:`4064`), (:pr:`4063`)
Miguel Farrajota
* Mon Oct 08 2018 Arun Persaud <arun@gmx.de>
- update to version 0.19.3:
* Array
+ Make da.RandomState extensible to other modules (:pr:`4041`)
Matthew Rocklin
+ Support unknown dims in ravel no-op case (:pr:`4055`) Jim Crist
+ Add basic infrastructure for cupy (:pr:`4019`) Matthew Rocklin
+ Avoid asarray and lock arguments for from_array(getitem)
(:pr:`4044`) Matthew Rocklin
+ Move local imports in corrcoef to global imports (:pr:`4030`)
John A Kirkham
+ Move local indices import to global import (:pr:`4029`) John A
Kirkham
+ Fix-up Dask Array's fromfunction w.r.t. dtype and kwargs
(:pr:`4028`) John A Kirkham
+ Don't use dummy expansion for trim_internal in overlapped
(:pr:`3964`) Mark Harfouche
+ Add unravel_index (:pr:`3958`) John A Kirkham
* Bag
+ Sort result in Bag.frequencies (:pr:`4033`) Matthew Rocklin
+ Add support for npartitions=1 edge case in groupby (:pr:`4050`)
James Bourbeau
+ Add new random dataset for people (:pr:`4018`) Matthew Rocklin
+ Improve performance of bag.read_text on small files (:pr:`4013`)
Eric Wolak
+ Add bag.read_avro (:pr:`4000`) (:pr:`4007`) Martin Durant
* Dataframe
+ Added an index parameter to
:meth:`dask.dataframe.from_dask_array` for creating a dask
DataFrame from a dask Array with a given index. (:pr:`3991`) Tom
Augspurger
+ Improve sub-classability of dask dataframe (:pr:`4015`) Matthew
Rocklin
+ Fix failing hdfs test [test-hdfs] (:pr:`4046`) Jim Crist
+ fuse_subgraphs works without normal fuse (:pr:`4042`) Jim Crist
+ Make path for reading many parquet files without prescan
(:pr:`3978`) Martin Durant
+ Index in dd.from_dask_array (:pr:`3991`) Tom Augspurger
+ Make skiprows accept lists (:pr:`3975`) Julia Signell
+ Fail early in fastparquet read for nonexistent column
(:pr:`3989`) Martin Durant
* Core
+ Add support for npartitions=1 edge case in groupby (:pr:`4050`)
James Bourbeau
+ Automatically wrap large arguments with dask.delayed in
map_blocks/partitions (:pr:`4002`) Matthew Rocklin
+ Fuse linear chains of subgraphs (:pr:`3979`) Jim Crist
+ Make multiprocessing context configurable (:pr:`3763`) Itamar
Turner-Trauring
* Documentation
+ Extensive copy-editing (:pr:`4049`), (:pr:`4034`), (:pr:`4031`),
(:pr:`4020`), (:pr:`4021`), (:pr:`4022`), (:pr:`4023`),
(:pr:`4016`), (:pr:`4017`), (:pr:`4010`), (:pr:`3997`),
(:pr:`3996`), Miguel Farrajota
+ Update shuffle method selection docs [skip ci] (:pr:`4048`)
James Bourbeau
+ Remove docs/source/examples, point to examples.dask.org
(:pr:`4014`) Matthew Rocklin
+ Replace readthedocs links with dask.org (:pr:`4008`) Matthew
Rocklin
+ Updates DataFrame.to_hdf docstring for returned values [skip ci]
(:pr:`3992`) James Bourbeau
* Mon Sep 17 2018 Arun Persaud <arun@gmx.de>
- update to version 0.19.2:
* Array
+ apply_gufunc implements automatic inference of function output
dtypes (:pr:`3936`) Markus Gonser
+ Fix array histogram range error when array has nans (#3980)
James Bourbeau
+ Issue 3937 follow up, int type checks. (#3956) Yu Feng
+ from_array: add @martindurant's explanation of how hashing is
done for an array. (#3965) Mark Harfouche
+ Support gradient with coordinate (#3949) Keisuke Fujii
* Core
+ Fix use of has_keyword with partial in Python 2.7 (#3966) Mark
Harfouche
+ Set pyarrow as default for HDFS (#3957) Matthew Rocklin
* Documentation
+ Use dask_sphinx_theme (#3963) Matthew Rocklin
+ Use JupyterLab in Binder links from main page Matthew Rocklin
+ DOC: fixed sphinx syntax (#3960) Tom Augspurger
* Sat Sep 08 2018 Arun Persaud <arun@gmx.de>
- update to version 0.19.1:
* Array
+ Don't enforce dtype if result has no dtype (:pr:`3928`) Matthew
Rocklin
+ Fix NumPy issubtype deprecation warning (:pr:`3939`) Bruce Merry
+ Fix arg reduction tokens to be unique with different arguments
(:pr:`3955`) Tobias de Jong
+ Coerce numpy integers to ints in slicing code (:pr:`3944`) Yu
Feng
+ Linalg.norm ndim along axis partial fix (:pr:`3933`) Tobias de
Jong
* Dataframe
+ Deterministic DataFrame.set_index (:pr:`3867`) George Sakkis
+ Fix divisions in read_parquet when dealing with filters #3831
[#3930] (:pr:`3923`) (:pr:`3931`) @andrethrill
+ Fix return type in categorical.as_known (:pr:`3888`)
Sriharsha Hatwar
+ Fix DataFrame.assign for callables (:pr:`3919`) Tom Augspurger
+ Include partitions with no width in repartition (:pr:`3941`)
Matthew Rocklin
+ Don't constrict stage/k dtype in dataframe shuffle (:pr:`3942`)
Matthew Rocklin
* Documentation
+ DOC: Add hint on how to render task graphs horizontally
(:pr:`3922`) Uwe Korn
+ Add try-now button to main landing page (:pr:`3924`) Matthew
Rocklin
* Sun Sep 02 2018 arun@gmx.de
- specfile:
* remove devel from noarch
- update to version 0.19.0:
* Array
+ Fix argtopk split_every bug (:pr:`3810`) Guido Imperiale
+ Ensure that computing dask.array.isnull() always gives a
numpy array (:pr:`3825`) Stephan Hoyer
+ Support concatenate for scipy.sparse in dask array (:pr:`3836`)
Matthew Rocklin
+ Fix argtopk on 32-bit systems. (:pr:`3823`) Elliott Sales de
Andrade
+ Normalize keys in rechunk (:pr:`3820`) Matthew Rocklin
+ Allow shape of dask.array to be a numpy array (:pr:`3844`) Mark
Harfouche
+ Fix numpy deprecation warning on tuple indexing (:pr:`3851`)
Tobias de Jong
+ Rename ghost module to overlap (:pr:`3830`) Robert Sare
+ Re-add the ghost import to da __init__ (:pr:`3861`) Jim Crist
+ Ensure copy preserves masked arrays (:pr:`3852`) Tobias de Jong
* DataFrame
+ Added dtype and sparse keywords to
:func:`dask.dataframe.get_dummies` (:pr:`3792`) Tom Augspurger
+ Added :meth:`dask.dataframe.to_dask_array` for converting a Dask
Series or DataFrame to a Dask Array, possibly with known chunk
sizes (:pr:`3884`) Tom Augspurger
+ Changed the behavior for :meth:`dask.array.asarray` for dask
dataframe and series inputs. Previously, the series was eagerly
converted to an in-memory NumPy array before creating a dask
array with known chunk sizes. This caused unexpectedly high
memory usage. Now, no intermediate NumPy array is created, and a
Dask array with unknown chunk sizes is returned (:pr:`3884`) Tom
Augspurger
+ DataFrame.iloc (:pr:`3805`) Tom Augspurger
+ When reading multiple paths, expand globs. (:pr:`3828`) Irina
Truong
+ Added index column name after resample (:pr:`3833`) Eric
Bonfadini
+ Add (lazy) shape property to dataframe and series (:pr:`3212`)
Henrique Ribeiro
+ Fix failing hdfs test [test-hdfs] (:pr:`3858`) Jim Crist
+ Fixes for pyarrow 0.10.0 release (:pr:`3860`) Jim Crist
+ Rename to_csv keys for diagnostics (:pr:`3890`) Matthew Rocklin
+ Match pandas warnings for concat sort (:pr:`3897`) Tom
Augspurger
+ Include filename in read_csv (:pr:`3908`) Julia Signell
* Core
+ Better error message on import when missing common dependencies
(:pr:`3771`) Danilo Horta
+ Drop Python 3.4 support (:pr:`3840`) Jim Crist
+ Remove expired deprecation warnings (:pr:`3841`) Jim Crist
+ Add DASK_ROOT_CONFIG environment variable (:pr:`3849`) Joe
Hamman
+ Don't cull in local scheduler, do cull in delayed (:pr:`3856`)
Jim Crist
+ Increase conda download retries (:pr:`3857`) Jim Crist
+ Add python_requires and Trove classifiers (:pr:`3855`) @hugovk
+ Fix collections.abc deprecation warnings in Python 3.7.0
(:pr:`3876`) Jan Margeta
+ Allow dot jpeg to xfail in visualize tests (:pr:`3896`) Matthew
Rocklin
+ Add Python 3.7 to travis.yml (:pr:`3894`) Matthew Rocklin
+ Add expand_environment_variables to dask.config (:pr:`3893`)
Joe Hamman
* Docs
+ Fix typo in import statement of diagnostics (:pr:`3826`) John
Mrziglod
+ Add link to YARN docs (:pr:`3838`) Jim Crist
+ Fix minor typos in landing page index.html (:pr:`3746`)
Christoph Moehl
+ Update delayed-custom.rst (:pr:`3850`) Anderson Banihirwe
+ DOC: clarify delayed docstring (:pr:`3709`) Scott Sievert
+ Add new presentations (:pr:`3880`) @javad94
+ Add dask array normalize_chunks to documentation (:pr:`3878`)
Daniel Rothenberg
+ Docs: Fix link to snakeviz (:pr:`3900`) Hans Moritz Günther
+ Add missing ` to docstring (:pr:`3915`) @rtobar
- changes from version 0.18.2:
* Array
+ Reimplemented argtopk to make it release the GIL (:pr:`3610`)
Guido Imperiale
+ Don't overlap on non-overlapped dimensions in map_overlap
(:pr:`3653`) Matthew Rocklin
+ Fix linalg.tsqr for dimensions of uncertain length (:pr:`3662`)
Jeremy Chen
+ Break apart uneven array-of-int slicing to separate chunks
(:pr:`3648`) Matthew Rocklin
+ Align auto chunks to provided chunks, rather than shape
(:pr:`3679`) Matthew Rocklin
+ Adds endpoint and retstep support for linspace (:pr:`3675`)
James Bourbeau
+ Implement .blocks accessor (:pr:`3689`) Matthew Rocklin
+ Add block_info keyword to map_blocks functions (:pr:`3686`)
Matthew Rocklin
+ Slice by dask array of ints (:pr:`3407`) Guido Imperiale
+ Support dtype in arange (:pr:`3722`) Guido Imperiale
+ Fix argtopk with uneven chunks (:pr:`3720`) Guido Imperiale
+ Raise error when replace=False in da.choice (:pr:`3765`) James
Bourbeau
+ Update chunks in Array.__setitem__ (:pr:`3767`) Itamar
Turner-Trauring
+ Add a chunksize convenience property (:pr:`3777`) Jacob
Tomlinson
+ Fix and simplify array slicing behavior when step < 0
(:pr:`3702`) Ziyao Wei
+ Ensure to_zarr with return_stored True returns a Dask Array
(:pr:`3786`) John A Kirkham
* Bag
+ Add last_endline optional parameter in to_textfiles (:pr:`3745`)
George Sakkis
* Dataframe
+ Add aggregate function for rolling objects (:pr:`3772`) Gerome
Pistre
+ Properly tokenize cumulative groupby aggregations (:pr:`3799`)
Cloves Almeida
* Delayed
+ Add the @ operator to the delayed objects (:pr:`3691`) Mark
Harfouche
+ Add delayed best practices to documentation (:pr:`3737`) Matthew
Rocklin
+ Fix @delayed decorator for methods and add tests (:pr:`3757`)
Ziyao Wei
* Core
+ Fix extra progressbar (:pr:`3669`) Mike Neish
+ Allow tasks back onto ordering stack if they have one dependency
(:pr:`3652`) Matthew Rocklin
+ Prefer end-tasks with low numbers of dependencies when ordering
(:pr:`3588`) Tom Augspurger
+ Add assert_eq to top-level modules (:pr:`3726`) Matthew Rocklin
+ Test that dask collections can hold scipy.sparse arrays
(:pr:`3738`) Matthew Rocklin
+ Fix setup of lz4 decompression functions (:pr:`3782`) Elliott
Sales de Andrade
+ Add datasets module (:pr:`3780`) Matthew Rocklin
* Sun Jun 24 2018 arun@gmx.de
- update to version 0.18.1:
* Array
+ from_array now supports scalar types and nested lists/tuples in
input, just like all numpy functions do. It also produces a
simpler graph when the input is a plain ndarray (:pr:`3556`)
Guido Imperiale
+ Fix slicing of big arrays due to cumsum dtype bug (:pr:`3620`)
Marco Rossi
+ Add Dask Array implementation of pad (:pr:`3578`) John A Kirkham
+ Fix array random API examples (:pr:`3625`) James Bourbeau
+ Add average function to dask array (:pr:`3640`) James Bourbeau
+ Tokenize ghost_internal with axes (:pr:`3643`) Matthew Rocklin
+ from_array: special handling for ndarray, list, and scalar types
(:pr:`3568`) Guido Imperiale
+ Add outer for Dask Arrays (:pr:`3658`) John A Kirkham
* DataFrame
+ Add Index.to_series method (:pr:`3613`) Henrique Ribeiro
+ Fix missing partition columns in pyarrow-parquet (:pr:`3636`)
Martin Durant
* Core
+ Minor tweaks to CI (:pr:`3629`) Guido Imperiale
+ Add back dask.utils.effective_get (:pr:`3642`) Matthew Rocklin
+ DASK_CONFIG dictates config write location (:pr:`3621`) Jim
Crist
+ Replace 'collections' key in unpack_collections with unique key
(:pr:`3632`) Yu Feng
+ Avoid deepcopy in dask.config.set (:pr:`3649`) Matthew Rocklin
- changes from version 0.18.0:
* Array
+ Add to/read_zarr for Zarr-format datasets and arrays
(:pr:`3460`) Martin Durant
+ Experimental addition of generalized ufunc support,
apply_gufunc, gufunc, and as_gufunc (:pr:`3109`) (:pr:`3526`)
(:pr:`3539`) Markus Gonser
+ Avoid unnecessary rechunking tasks (:pr:`3529`) Matthew Rocklin
+ Compute dtypes at runtime for fft (:pr:`3511`) Matthew Rocklin
+ Generate UUIDs for all da.store operations (:pr:`3540`) Martin
Durant
+ Correct internal dimension of Dask's SVD (:pr:`3517`) John A
Kirkham
+ BUG: do not raise IndexError for identity slice in array.vindex
(:pr:`3559`) Scott Sievert
+ Adds isneginf and isposinf (:pr:`3581`) John A Kirkham
+ Drop Dask Array's learn module (:pr:`3580`) John A Kirkham
+ added sfqr (short-and-fat) as a counterpart to tsqr…
(:pr:`3575`) Jeremy Chen
+ Allow 0-width chunks in dask.array.rechunk (:pr:`3591`) Marc
Pfister
+ Document Dask Array's nan_to_num in public API (:pr:`3599`) John
A Kirkham
+ Show block example (:pr:`3601`) John A Kirkham
+ Replace token= keyword with name= in map_blocks (:pr:`3597`)
Matthew Rocklin
+ Disable locking in to_zarr (needed for using to_zarr in a
distributed context) (:pr:`3607`) John A Kirkham
+ Support Zarr Arrays in to_zarr/from_zarr (:pr:`3561`) John A
Kirkham
+ Added recursion to array/linalg/tsqr to better manage the single
core bottleneck (:pr:`3586`) Jeremy Chan
* Dataframe
+ Add to/read_json (:pr:`3494`) Martin Durant
+ Adds index to unsupported arguments for DataFrame.rename method
(:pr:`3522`) James Bourbeau
+ Adds support to subset Dask DataFrame columns using
numpy.ndarray, pandas.Series, and pandas.Index objects
(:pr:`3536`) James Bourbeau
+ Raise error if meta columns do not match dataframe (:pr:`3485`)
Christopher Ren
+ Add index to unsupported arguments for DataFrame.rename
(:pr:`3522`) James Bourbeau
+ Adds support for subsetting DataFrames with pandas Index/Series
and numpy ndarrays (:pr:`3536`) James Bourbeau
+ Dataframe sample method docstring fix (:pr:`3566`) James
Bourbeau
+ Fix dd.read_json to infer file compression (:pr:`3594`) Matt
Lee
+ Adds n to sample method (:pr:`3606`) James Bourbeau
+ Add fastparquet ParquetFile object support (:pr:`3573`)
@andrethrill
* Bag
+ Rename method= keyword to shuffle= in bag.groupby (:pr:`3470`)
Matthew Rocklin
* Core
+ Replace get= keyword with scheduler= keyword (:pr:`3448`)
Matthew Rocklin
+ Add centralized dask.config module to handle configuration for
all Dask subprojects (:pr:`3432`) (:pr:`3513`) (:pr:`3520`)
Matthew Rocklin
+ Add dask-ssh CLI Options and Description. (:pr:`3476`) @beomi
+ Read whole files fix regardless of header for HTTP (:pr:`3496`)
Martin Durant
+ Adds synchronous scheduler syntax to debugging docs (:pr:`3509`)
James Bourbeau
+ Replace dask.set_options with dask.config.set (:pr:`3502`)
Matthew Rocklin
+ Update sphinx readthedocs-theme (:pr:`3516`) Matthew Rocklin
+ Introduce "auto" value for normalize_chunks (:pr:`3507`) Matthew
Rocklin
+ Fix check in configuration with env=None (:pr:`3562`) Simon
Perkins
+ Update sizeof definitions (:pr:`3582`) Matthew Rocklin
+ Remove --verbose flag from travis-ci (:pr:`3477`) Matthew
Rocklin
+ Remove "da.random" from random array keys (:pr:`3604`) Matthew
Rocklin
* Mon May 21 2018 arun@gmx.de
- update to version 0.17.5:
* Compatibility with pandas 0.23.0 (:pr:`3499`) Tom Augspurger
* Sun May 06 2018 arun@gmx.de
- update to version 0.17.4:
* Dataframe
+ Add support for indexing Dask DataFrames with string subclasses
(:pr:`3461`) James Bourbeau
+ Allow using both sorted_index and chunksize in read_hdf
(:pr:`3463`) Pierre Bartet
+ Pass filesystem to arrow piece reader (:pr:`3466`) Martin Durant
+ Switches to using dask.compat string_types (#3462) James
Bourbeau
- changes from version 0.17.3:
* Array
+ Add einsum for Dask Arrays (:pr:`3412`) Simon Perkins
+ Add piecewise for Dask Arrays (:pr:`3350`) John A Kirkham
+ Fix handling of nan in broadcast_shapes (:pr:`3356`) John A
Kirkham
+ Add isin for dask arrays (:pr:`3363`) Stephan Hoyer
+ Overhauled topk for Dask Arrays: faster algorithm, particularly
for large k's; added support for multiple axes, recursive
aggregation, and an option to pick the bottom k elements
instead. (:pr:`3395`) Guido Imperiale
+ The topk API has changed from topk(k, array) to the more
conventional topk(array, k). The legacy API still works but is
now deprecated. (:pr:`2965`) Guido Imperiale
+ New function argtopk for Dask Arrays (:pr:`3396`) Guido
Imperiale
+ Fix handling partial depth and boundary in map_overlap
(:pr:`3445`) John A Kirkham
+ Add gradient for Dask Arrays (:pr:`3434`) John A Kirkham
* DataFrame
+ Allow t as shorthand for table in to_hdf for pandas
compatibility (:pr:`3330`) Jörg Dietrich
+ Added top level isna method for Dask DataFrames (:pr:`3294`)
Christopher Ren
+ Fix selection on partition column on read_parquet for
engine="pyarrow" (:pr:`3207`) Uwe Korn
+ Added DataFrame.squeeze method (:pr:`3366`) Christopher Ren
+ Added infer_divisions option to read_parquet to specify whether
read engines should compute divisions (:pr:`3387`) Jon Mease
+ Added support for inferring division for engine="pyarrow"
(:pr:`3387`) Jon Mease
+ Provide more informative error message for meta= errors
(:pr:`3343`) Matthew Rocklin
+ Add ORC reader (:pr:`3284`) Martin Durant
+ Default compression for parquet now always Snappy, in line with
pandas (:pr:`3373`) Martin Durant
+ Fixed bug in Dask DataFrame and Series comparisons with NumPy
scalars (:pr:`3436`) James Bourbeau
+ Remove outdated requirement from repartition docstring
(:pr:`3440`) Jörg Dietrich
+ Fixed bug in aggregation when only a Series is selected
(:pr:`3446`) Jörg Dietrich
+ Add default values to make_timeseries (:pr:`3421`) Matthew
Rocklin
* Core
+ Support traversing collections in persist, visualize, and
optimize (:pr:`3410`) Jim Crist
+ Add scheduler= keyword to compute and persist. This replaces
common use of the get= keyword (:pr:`3448`) Matthew Rocklin
* Sat Mar 24 2018 arun@gmx.de
- update to version 0.17.2:
* Array
+ Add broadcast_arrays for Dask Arrays (:pr:`3217`) John A Kirkham
+ Add bitwise_* ufuncs (:pr:`3219`) John A Kirkham
+ Add optional axis argument to squeeze (:pr:`3261`) John A
Kirkham
+ Validate inputs to atop (:pr:`3307`) Matthew Rocklin
+ Avoid calls to astype in concatenate if all parts have the same
dtype (:pr:`3301`) Martin Durant
* DataFrame
+ Fixed bug in shuffle due to aggressive truncation (:pr:`3201`)
Matthew Rocklin
+ Support specifying categorical columns on read_parquet with
categories=[…] for engine="pyarrow" (:pr:`3177`) Uwe Korn
+ Add dd.tseries.Resampler.agg (:pr:`3202`) Richard Postelnik
+ Support operations that mix dataframes and arrays (:pr:`3230`)
Matthew Rocklin
+ Support extra Scalar and Delayed args in
dd.groupby._Groupby.apply (:pr:`3256`) Gabriele Lanaro
* Bag
+ Support joining against single-partitioned bags and delayed
objects (:pr:`3254`) Matthew Rocklin
* Core
+ Fixed bug when using unexpected but hashable types for keys
(:pr:`3238`) Daniel Collins
+ Fix bug in task ordering so that we break ties consistently with
the key name (:pr:`3271`) Matthew Rocklin
+ Avoid sorting tasks in order when the number of tasks is very
large (:pr:`3298`) Matthew Rocklin
* Fri Mar 02 2018 sebix+novell.com@sebix.at
- correctly package bytecode
- use %license macro
* Fri Feb 23 2018 arun@gmx.de
- update to version 0.17.1:
* Array
+ Corrected dimension chunking in indices (:issue:`3166`,
:pr:`3167`) Simon Perkins
+ Inline store_chunk calls for store's return_stored option
(:pr:`3153`) John A Kirkham
+ Compatibility with struct dtypes for NumPy 1.14.1 release
(:pr:`3187`) Matthew Rocklin
* DataFrame
+ Bugfix to allow column assignment of pandas
datetimes (:pr:`3164`) Max Epstein
* Core
+ New file-system for HTTP(S), allowing direct loading from
specific URLs (:pr:`3160`) Martin Durant
+ Fix bug when tokenizing partials with no keywords (:pr:`3191`)
Matthew Rocklin
+ Use more recent LZ4 API (:pr:`3157`) Thrasibule
+ Introduce output stream parameter for progress bar (:pr:`3185`)
Dieter Weber
* Sat Feb 10 2018 arun@gmx.de
- update to version 0.17.0:
* Array
+ Added support for object-type arrays in nansum, nanmin, and
nanmax (:issue:`3133`) Keisuke Fujii
+ Update error handling when len is called with empty chunks
(:issue:`3058`) Xander Johnson
+ Fixes a metadata bug with store's return_stored option
(:pr:`3064`) John A Kirkham
+ Fix a bug in optimization.fuse_slice to properly handle when
first input is None (:pr:`3076`) James Bourbeau
+ Support arrays with unknown chunk sizes in percentile
(:pr:`3107`) Matthew Rocklin
+ Tokenize scipy.sparse arrays and np.matrix (:pr:`3060`) Roman
Yurchak
* DataFrame
+ Support month timedeltas in repartition(freq=...) (:pr:`3110`)
Matthew Rocklin
+ Avoid mutation in dataframe groupby tests (:pr:`3118`) Matthew
Rocklin
+ read_csv, read_table, and read_parquet accept iterables of paths
(:pr:`3124`) Jim Crist
+ Deprecates the dd.to_delayed function in favor of the existing
method (:pr:`3126`) Jim Crist
+ Return dask.arrays from df.map_partitions calls when the UDF
returns a numpy array (:pr:`3147`) Matthew Rocklin
+ Change handling of columns and index in dd.read_parquet to be
more consistent, especially in handling of multi-indices
(:pr:`3149`) Jim Crist
+ fastparquet append=True allowed to create new dataset
(:pr:`3097`) Martin Durant
+ dtype rationalization for sql queries (:pr:`3100`) Martin
Durant
* Bag
+ Document that the function passed to bag.map_partitions may
receive either a list or a generator (:pr:`3150`) Nir
* Core
+ Change default task ordering to prefer nodes with few dependents
and then many downstream dependencies (:pr:`3056`) Matthew
Rocklin
+ Add color= option to visualize to color by task order
(:pr:`3057`) (:pr:`3122`) Matthew Rocklin
+ Deprecate dask.bytes.open_text_files (:pr:`3077`) Jim Crist
+ Remove short-circuit hdfs reads handling due to maintenance
costs. May be re-added in a more robust manner later
(:pr:`3079`) Jim Crist
+ Add dask.base.optimize for optimizing multiple collections
without computing. (:pr:`3071`) Jim Crist
+ Rename dask.optimize module to dask.optimization (:pr:`3071`)
Jim Crist
+ Change task ordering to do a full traversal (:pr:`3066`) Matthew
Rocklin
+ Adds an optimize_graph keyword to all to_delayed methods to
allow controlling whether optimizations occur on
conversion. (:pr:`3126`) Jim Crist
+ Support using pyarrow for hdfs integration (:pr:`3123`) Jim
Crist
+ Move HDFS integration and tests into dask repo (:pr:`3083`) Jim
Crist
+ Remove write_bytes (:pr:`3116`) Jim Crist
* Thu Jan 11 2018 arun@gmx.de
- specfile:
* update copyright year
- update to version 0.16.1:
* Array
+ Fix handling of scalar percentile values in "percentile"
(:pr:`3021`) `James Bourbeau`_
+ Prevent "bool()" coercion from calling compute (:pr:`2958`)
`Albert DeFusco`_
+ Add "matmul" (:pr:`2904`) `John A Kirkham`_
+ Support N-D arrays with "matmul" (:pr:`2909`) `John A Kirkham`_
+ Add "vdot" (:pr:`2910`) `John A Kirkham`_
+ Explicit "chunks" argument for "broadcast_to" (:pr:`2943`)
`Stephan Hoyer`_
+ Add "meshgrid" (:pr:`2938`) `John A Kirkham`_ and (:pr:`3001`)
`Markus Gonser`_
+ Preserve singleton chunks in "fftshift"/"ifftshift" (:pr:`2733`)
`John A Kirkham`_
+ Fix handling of negative indexes in "vindex" and raise errors
for out of bounds indexes (:pr:`2967`) `Stephan Hoyer`_
+ Add "flip", "flipud", "fliplr" (:pr:`2954`) `John A Kirkham`_
+ Add "float_power" ufunc (:pr:`2962`) (:pr:`2969`) `John A
Kirkham`_
+ Compatibility for changes to structured arrays in the upcoming
NumPy 1.14 release (:pr:`2964`) `Tom Augspurger`_
+ Add "block" (:pr:`2650`) `John A Kirkham`_
+ Add "frompyfunc" (:pr:`3030`) `Jim Crist`_
* DataFrame
+ Fixed naming bug in cumulative aggregations (:issue:`3037`)
`Martijn Arts`_
+ Fixed "dd.read_csv" when "names" is given but "header" is not
set to "None" (:issue:`2976`) `Martijn Arts`_
+ Fixed "dd.read_csv" so that passing instances of
"CategoricalDtype" in "dtype" will result in known categoricals
(:pr:`2997`) `Tom Augspurger`_
+ Prevent "bool()" coercion from calling compute (:pr:`2958`)
`Albert DeFusco`_
+ "DataFrame.read_sql()" on an empty database table returns an
empty dask dataframe (:pr:`2928`) `Apostolos Vlachopoulos`_
+ Compatibility for reading Parquet files written by PyArrow 0.8.0
(:pr:`2973`) `Tom Augspurger`_
+ Correctly handle the column name (`df.columns.name`) when
reading in "dd.read_parquet" (:pr:`2973`) `Tom Augspurger`_
+ Fixed "dd.concat" losing the index dtype when the data contained
a categorical (:issue:`2932`) `Tom Augspurger`_
+ Add "dd.Series.rename" (:pr:`3027`) `Jim Crist`_
+ "DataFrame.merge()" (:pr:`2960`) now supports merging on a
combination of columns and the index `Jon Mease`_
+ Removed the deprecated "dd.rolling*" methods, in preparation for
their removal in the next pandas release (:pr:`2995`) `Tom
Augspurger`_
+ Fix metadata inference bug in which single-partition series were
mistakenly special cased (:pr:`3035`) `Jim Crist`_
+ Add support for "Series.str.cat" (:pr:`3028`) `Jim Crist`_
* Core
+ Improve 32-bit compatibility (:pr:`2937`) `Matthew Rocklin`_
+ Change task prioritization to avoid upwards branching
(:pr:`3017`) `Matthew Rocklin`_
* Sun Nov 19 2017 arun@gmx.de
- update to version 0.16.0:
* Fix install of fastparquet on travis (#2897)
* Fix port for bokeh dashboard (#2889)
* fix hdfs3 version
* Modify hdfs import to point to hdfs3 (#2894)
* Explicitly pass in pyarrow filesystem for parquet (#2881)
* COMPAT: Ensure lists for multiple groupby keys (#2892)
* Avoid list index error in repartition_freq (#2873)
* Finish moving `infer_storage_options` (#2886)
* Support arrow in `to_parquet`. Several other parquet
cleanups. (#2868)
* Bugfix: Filesystem object not passed to pyarrow reader (#2527)
* Fix py34 build
* Fixup s3 tests (#2875)
* Close resource profiler process on __exit__ (#2871)
* Add changelog for to_parquet changes. [ci skip]
* A few parquet cleanups (#2867)
* Fixed fillna with Series (#2810)
* Error nicely on parse dates failure in read_csv (#2863)
* Fix empty dataframe partitioning for numpy 1.10.4 (#2862)
* Test `unique`'s inverse mapping's shape (#2857)
* Move `thread_state` out of the top namespace (#2858)
* Explain unique's steps (#2856)
* fix and test for issue #2811 (#2818)
* Minor tweaks to `_unique_internal` optional result handling
(#2855)
* Update dask interface during XArray integration (#2847)
* Remove unnecessary map_partitions in aggregate (#2712)
* Simplify `_unique_internal` (#2850)
* Add more tests for read_parquet(engine='pyarrow') (#2822)
* Do not raise exception when calling set_index on empty dataframe
[#2819] (#2827)
* Test unique on more data (#2846)
* Do not raise an exception on set_index on a text column with
empty partitions [#2820] (#2831)
* Compat for bokeh 0.12.10 (#2844)
* Support `return_*` arguments with `unique` (#2779)
* Fix installing of pandas dev (#2838)
* Squash a few warnings in dask.array (#2833)
* Array optimizations don't elide some getter calls (#2826)
* test against pandas rc (#2814)
* df.astype(categorical_dtype) -> known categoricals (#2835)
* Fix cloudpickle test (#2836)
* BUG: Quantile with missing data (#2791)
* API: remove dask.async (#2828)
* Adds comma to flake8 section in setup.cfg (#2817)
* Adds asarray and asanyarray to the dask.array public API (#2787)
* flake8 now checks bare excepts (#2816)
* CI: Update for new flake8 / pycodestyle (#2808)
* Fix concat series bug (#2800)
* Typo in the docstring of read_parquet's filters param (#2806)
* Docs update (#2803)
* minor doc changes in bag.core (#2797)
* da.random.choice works with array args (#2781)
* Support broadcasting 0-length dimensions (#2784)
* ResourceProfiler plot works with single point (#2778)
* Implement Dask Array's unique to be lazy (#2775)
* Dask Collection Interface
* Reduce test memory usage (#2782)
* Deprecate vnorm (#2773)
* add auto-import of gcsfs (#2776)
* Add allclose (#2771)
* Remove `random.different_seeds` from API docs (#2772)
* Follow-up for atleast_nd (#2765)
* Use get_worker().client.get if available (#2762)
* Link PR for "Allow tuples as sharedict keys" (#2766)
* Allow tuples as sharedict keys (#2763)
* update docs to use flatten vs concat (#2764)
* Add atleast_nd functions (#2760)
* Consolidate changelog for 0.15.4 (#2759)
* Add changelog template for future date (#2758)