* Mon Apr 23 2018 toddrme2178@gmail.com
- Fix dependency versions
* Fri Mar 02 2018 arun@gmx.de
- specfile:
* update required llvmlite version
- update to version 0.37.0:
* Misc enhancements:
+ PR #2627: Remove hacks to make llvmlite threadsafe
+ PR #2672: Add ascontiguousarray
+ PR #2678: Add Gitter badge
+ PR #2691: Fix #2690: add intrinsic to convert array to tuple
+ PR #2703: Test runner feature: failed-first and last-failed
+ PR #2708: Patch for issue #1907
+ PR #2732: Add support for array.fill
* Misc Fixes:
+ PR #2610: Fix #2606 lowering of optional.setattr
+ PR #2650: Remove skip for win32 cosine test
+ PR #2668: Fix empty_like from readonly arrays.
+ PR #2682: Fixes #2210, remove _DisableJitWrapper
+ PR #2684: Fix #2340, generator error yielding bool
+ PR #2693: Add travis-ci testing of NumPy 1.14, and also check on
Python 2.7
+ PR #2694: Avoid type inference failure due to a typing template
rejection
+ PR #2695: Update llvmlite version dependency.
+ PR #2696: Fix tuple indexing code generation for empty tuple
+ PR #2698: Fix #2697 by deferring deletion in the simplify_CFG
loop.
+ PR #2701: Small fix to avoid tempfiles being created in the
current directory
+ PR #2725: Fix #2481, LLVM IR parsing error due to mutated IR
+ PR #2726: Fix #2673: incorrect fork error msg.
+ PR #2728: Alternative to #2620. Remove dead code
ByteCodeInst.get.
+ PR #2730: Add guard for test needing SciPy/BLAS
* Documentation updates:
+ PR #2670: Update communication channels
+ PR #2671: Add docs about diagnosing loop vectorizer
+ PR #2683: Add docs on const arg requirements and on const mem
alloc
+ PR #2722: Add docs on numpy support in cuda
+ PR #2724: Update doc: warning about unsupported arguments
* ParallelAccelerator enhancements/fixes:
+ Parallel support for `np.arange`, `np.linspace`, `np.mean`,
`np.std` and `np.var` is added. This was done as part of a general
refactor and cleanup of the core ParallelAccelerator code.
+ PR #2674: Core PA
+ PR #2704: Generate Dels after parfor sequential lowering
+ PR #2716: Handle matching directly supported functions
* CUDA enhancements:
+ PR #2665: CUDA DeviceNDArray: Support numpy transpose API
+ PR #2681: Allow Assigning to DeviceNDArrays
+ PR #2702: Make DummyArray do High Dimensional Reshapes
+ PR #2714: Use CFFI to Reuse Code
* CUDA fixes:
+ PR #2667: Fix CUDA DeviceNDArray slicing
+ PR #2686: Fix #2663: incorrect offset when indexing cuda array.
+ PR #2687: Ensure Constructed Stream Bound
+ PR #2706: Workaround for unexpected warp divergence due to
exception raising code
+ PR #2707: Fix regression: cuda test submodules not loading
properly in runtests
+ PR #2731: Use more challenging values in slice tests.
+ PR #2720: A quick testsuite fix to not run the new cuda testcase
in the multiprocess pool
* Thu Jan 11 2018 toddrme2178@gmail.com
- Bump minimum llvmlite version.
* Thu Dec 21 2017 arun@gmx.de
- update to version 0.36.2:
* PR #2645: Avoid CPython bug with "exec" in older 2.7.x.
* PR #2652: Add support for CUDA 9.
* Fri Dec 08 2017 arun@gmx.de
- update to version 0.36.1:
* ParallelAccelerator features:
+ PR #2457: Stencil Computations in ParallelAccelerator
+ PR #2548: Slice and range fusion, parallelizing bitarray and
slice assignment
+ PR #2516: Support general reductions in ParallelAccelerator
* ParallelAccelerator fixes:
+ PR #2540: Fix bug #2537
+ PR #2566: Fix issue #2564.
+ PR #2599: Fix nested multi-dimensional parfor type inference
issue
+ PR #2604: Fixes for stencil tests and cmath sin().
+ PR #2605: Fixes issue #2603.
* PR #2568: Update for LLVM 5
* PR #2607: Fixes abort when getting address to
"nrt_unresolved_abort"
* PR #2615: Working towards conda build 3
* Misc fixes/enhancements:
+ PR #2534: Add tuple support to np.take.
+ PR #2551: Rebranding fix
+ PR #2552: relative doc links
+ PR #2570: Fix issue #2561, handle missing successor on loop exit
+ PR #2588: Fix #2555. Disable libpython.so linking on linux
+ PR #2601: Update llvmlite version dependency.
+ PR #2608: Fix potential cache file collision
+ PR #2612: Fix NRT test failure due to increased overhead when
running in coverage
+ PR #2619: Fix dubious pthread_cond_signal not in lock
+ PR #2622: Fix `np.nanmedian` for all NaN case.
+ PR #2633: Fix markdown in CONTRIBUTING.md
+ PR #2635: Make the dependency on compilers for AOT optional.
* CUDA support fixes:
+ PR #2523: Fix invalid cuda context in memory transfer calls in
another thread
+ PR #2575: Use CPU to initialize xoroshiro states for GPU
RNG. Fixes #2573
+ PR #2581: Fix cuda gufunc mishandling of scalar arg as array and
out argument
* Tue Oct 03 2017 arun@gmx.de
- update to version 0.35.0:
* ParallelAccelerator:
+ PR #2400: Array comprehension
+ PR #2405: Support printing Numpy arrays
+ PR #2438: Support more np.random functions in
ParallelAccelerator
+ PR #2482: Support for sum with axis in nopython mode.
+ PR #2487: Adding developer documentation for ParallelAccelerator
technology.
+ PR #2492: Core PA refactor adds assertions for broadcast
semantics
* ParallelAccelerator fixes:
+ PR #2478: Rename cfg before parfor translation (#2477)
+ PR #2479: Fix broken array comprehension tests on unsupported
platforms
+ PR #2484: Fix array comprehension test on win64
+ PR #2506: Fix for 32-bit machines.
* Additional features of note:
+ PR #2490: Implement np.take and ndarray.take
+ PR #2493: Display a warning if parallel=True is set but not
possible.
+ PR #2513: Add np.MachAr, np.finfo, np.iinfo
+ PR #2515: Allow environ overriding of cpu target and cpu
features.
* Misc fixes/enhancements:
+ PR #2455: add contextual information to runtime errors
+ PR #2470: Fixes #2458, poor performance in np.median
+ PR #2471: Ensure LLVM threadsafety in {g,}ufunc building.
+ PR #2494: Update doc theme
+ PR #2503: Remove hacky code added in #2482 and feature
enhancement
+ PR #2505: Serialise env mutation tests during multithreaded
testing.
+ PR #2520: Fix failing cpu-target override tests
* CUDA support fixes:
+ PR #2504: Enable CUDA toolkit version testing
+ PR #2509: Disable tests generating code unavailable in lower CC
versions.
+ PR #2511: Fix Windows 64 bit CUDA tests.
- changes from version 0.34.0:
* ParallelAccelerator features:
+ PR #2318: Transfer ParallelAccelerator technology to Numba
+ PR #2379: ParallelAccelerator Core Improvements
+ PR #2367: Add support for len(range(...))
+ PR #2369: List comprehension
+ PR #2391: Explicit Parallel Loop Support (prange)
* CUDA support enhancements:
+ PR #2377: New GPU reduction algorithm
* CUDA support fixes:
+ PR #2397: Fix #2393, always set alignment of cuda static memory
regions
* Misc Fixes:
+ PR #2373, Issue #2372: 32-bit compatibility fix for parfor
related code
+ PR #2376: Fix #2375 missing stdint.h for py2.7 vc9
+ PR #2378: Fix deadlock in parallel gufunc when kernel acquires
the GIL.
+ PR #2382: Forbid unsafe casting in bitwise operation
+ PR #2385: docs: fix Sphinx errors
+ PR #2396: Use 64-bit RHS operand for shift
+ PR #2404: Fix threadsafety logic issue in ufunc compilation
cache.
+ PR #2424: Ensure consistent iteration order of blocks for type
inference.
+ PR #2425: Guard code to prevent the use of 'parallel' on win32 +
py27
+ PR #2426: Basic test for Enum member type recovery.
+ PR #2433: Fix up the parfors tests with respect to windows py2.7
+ PR #2442: Skip tests that need BLAS/LAPACK if scipy is not
available.
+ PR #2444: Add test for invalid array setitem
+ PR #2449: Make the runtime initialiser threadsafe
+ PR #2452: Skip CFG test on 64bit windows
* Misc Enhancements:
+ PR #2366: Improvements to IR utils
+ PR #2388: Update README.rst to indicate the proper version of
LLVM
+ PR #2394: Upgrade to llvmlite 0.19.*
+ PR #2395: Update llvmlite version to 0.19
+ PR #2406: Expose environment object to ufuncs
+ PR #2407: Expose environment object to target-context inside
lowerer
+ PR #2413: Add flags to pass through to conda build for buildbot
+ PR #2414: Add cross compile flags to local recipe
+ PR #2415: A few cleanups for rewrites
+ PR #2418: Add getitem support for Enum classes
+ PR #2419: Add support for returning enums in vectorize
+ PR #2421: Add copyright notice for Intel contributed files.
+ PR #2422: Patch code base to work with np 1.13 release
+ PR #2448: Adds in warning message when using 'parallel' if
cache=True
+ PR #2450: Add test for keyword arg on .sum-like and .cumsum-like
array methods
- changes from version 0.33.0:
* There are several enhancements to the CUDA GPU support:
+ A GPU random number generator based on the xoroshiro128+
algorithm is added. See details and examples in documentation.
+ @cuda.jit CUDA kernels can now call @jit and @njit CPU functions
and they will automatically be compiled as CUDA device
functions.
+ CUDA IPC memory API is exposed for sharing memory between
processes. See usage details in documentation.
* Reference counting enhancements:
+ PR #2346, Issue #2345, #2248: Add extra refcount pruning after
inlining
+ PR #2349: Fix refct pruning not removing refct op with tail
call.
+ PR #2352, Issue #2350: Add refcount pruning pass for function
that does not need refcount
* CUDA support enhancements:
+ PR #2023: Supports CUDA IPC for device array
+ PR #2343, Issue #2335: Allow CPU jit decorated function to be
used as cuda device function
+ PR #2347: Add random number generator support for CUDA device
code
+ PR #2361: Update autotune table for CC: 5.3, 6.0, 6.1, 6.2
* Misc fixes:
+ PR #2362: Avoid test failure due to typing to int32 on 32-bit
platforms
+ PR #2359: Fixed nogil example that threw a TypeError when
executed.
+ PR #2357, Issue #2356: Fix fragile test that depends on how the
script is executed.
+ PR #2355: Fix cpu dispatcher referenced as attribute of another
module
+ PR #2354: Fixes an issue with caching when function needs NRT
and refcount pruning
+ PR #2342, Issue #2339: Add warnings to inspection when it is
used on unserialized cached code
+ PR #2329, Issue #2250: Better handling of missing op codes
* Misc enhancements:
+ PR #2360: Adds missing values in error message interp.
+ PR #2353: Handle when get_host_cpu_features() raises
RuntimeError
+ PR #2351: Enable SVML for erf/erfc/gamma/lgamma/log2
+ PR #2344: Expose error_model setting in jit decorator
+ PR #2337: Align blocking terminate support for fork() with new
TBB version
+ PR #2336: Bump llvmlite version to 0.18
+ PR #2330: Core changes in PR #2318
* Wed May 03 2017 toddrme2178@gmail.com
- update to version 0.32.0:
+ Improvements:
* PR #2322: Suppress test error due to unknown but consistent error with tgamma
* PR #2320: Update llvmlite dependency to 0.17
* PR #2308: Add details to error message on why cuda support is disabled.
* PR #2302: Add OS X to travis
* PR #2294: Disable remove_module on MCJIT due to memory leak inside LLVM
* PR #2291: Split parallel tests and recycle workers to tame memory usage
* PR #2253: Remove the pointer-stuffing hack for storing meminfos in lists
+ Fixes:
* PR #2331: Fix a bug in the GPU array indexing
* PR #2326: Fix #2321 docs referring to non-existing function.
* PR #2316: Fixing more race-condition problems
* PR #2315: Fix #2314. Relax strict type check to allow optional type.
* PR #2310: Fix race condition due to concurrent compilation and cache loading
* PR #2304: Fix intrinsic 1st arg not a typing.Context as stated by the docs.
* PR #2287: Fix int64 atomic min-max
* PR #2286: Fix #2285 `@overload_method` not linking dependent libs
* PR #2303: Missing import statements to interval-example.rst
- Implement single-spec version
* Wed Feb 22 2017 arun@gmx.de
- update to version 0.31.0:
* Improvements:
+ PR #2281: Update for numpy1.12
+ PR #2278: Add CUDA atomic.{max, min, compare_and_swap}
+ PR #2277: Add about section to conda recipes to identify
license and other metadata in Anaconda Cloud
+ PR #2271: Adopt itanium C++-style mangling for CPU and CUDA
targets
+ PR #2267: Add fastmath flags
+ PR #2261: Support dtype.type
+ PR #2249: Changes for llvm3.9
+ PR #2234: Bump llvmlite requirement to 0.16 and add
install_name_tool_fixer to mviewbuf for OS X
+ PR #2230: Add python3.6 to TravisCi
+ PR #2227: Enable caching for gufunc wrapper
+ PR #2170: Add debugging support
+ PR #2037: inspect_cfg() for easier visualization of the function
operation
* Fixes:
+ PR #2274: Fix nvvm ir patch in mishandling 'load'
+ PR #2272: Fix breakage to cuda7.5
+ PR #2269: Fix caching of copy_strides kernel in cuda.reduce
+ PR #2265: Fix #2263: error when linking two modules with dynamic
globals
+ PR #2252: Fix path separator in test
+ PR #2246: Fix overuse of memory in some system with fork
+ PR #2241: Fix #2240: __module__ in dynamically created function
not a str
+ PR #2239: Fix fingerprint computation failure preventing
fallback
* Sun Jan 15 2017 arun@gmx.de
- update to version 0.30.1:
* Fixes:
+ PR #2232: Fix name clashes with _Py_hashtable_xxx in Python 3.6.
* Improvements:
+ PR #2217: Add Intel TBB threadpool implementation for parallel
ufunc.
* Tue Jan 10 2017 arun@gmx.de
- specfile:
* update copyright year
- update to version 0.30.0:
* Improvements:
+ PR #2209: Support Python 3.6.
+ PR #2175: Support np.trace(), np.outer() and np.kron().
+ PR #2197: Support np.nanprod().
+ PR #2190: Support caching for ufunc.
+ PR #2186: Add system reporting tool.
* Fixes:
+ PR #2214, Issue #2212: Fix memory error with ndenumerate and
flat iterators.
+ PR #2206, Issue #2163: Fix zip() consuming extra elements in
early exhaustion.
+ PR #2185, Issue #2159, #2169: Fix rewrite pass affecting objmode
fallback.
+ PR #2204, Issue #2178: Fix annotation for liftedloop.
+ PR #2203: Fix Appveyor segfault with Python 3.5.
+ PR #2202, Issue #2198: Fix target context not initialized when
loading from ufunc cache.
+ PR #2172, Issue #2171: Fix optional type unpacking.
+ PR #2189, Issue #2188: Disable freezing of big (>1MB) global
arrays.
+ PR #2180, Issue #2179: Fix invalid variable version in
looplifting.
+ PR #2156, Issue #2155: Fix divmod, floordiv segfault on CUDA.
* Fri Dec 02 2016 jengelh@inai.de
- remove subjective words from description
* Sat Nov 05 2016 arun@gmx.de
- update to version 0.29.0:
* Improvements:
+ PR #2130, #2137: Add type-inferred recursion with docs and
examples.
+ PR #2134: Add np.linalg.matrix_power.
+ PR #2125: Add np.roots.
+ PR #2129: Add np.linalg.{eigvals,eigh,eigvalsh}.
+ PR #2126: Add array-to-array broadcasting.
+ PR #2069: Add hstack and related functions.
+ PR #2128: Allow for vectorizing a jitted function. (thanks to
@dhirschfeld)
+ PR #2117: Update examples and make them test-able.
+ PR #2127: Refactor interpreter class and its results.
* Fixes:
+ PR #2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
+ PR #2145, Issue #2009: Fixes kwargs for jitclass __init__
method.
+ PR #2150: Fix slowdown in objmode fallback.
+ PR #2050, Issue #1258: Fix liveness problem with some generator
loops.
+ PR #2072, Issue #1995: Right shift of unsigned LHS should be
logical.
+ PR #2115, Issue #1466: Fix inspect_types() error due to mangled
variable name.
+ PR #2119, Issue #2118: Fix array type created from record-dtype.
+ PR #2122, Issue #1808: Fix returning a generator due to
datamodel error.
* Fri Sep 23 2016 toddrme2178@gmail.com
- Initial version