Package Release Info

openblas_pthreads-0.3.29-160000.2.2

Update Info: Base Release
Available in Package Hub : 16.0

platforms

AArch64
ppc64le
s390x
x86-64

subpackages

openblas_pthreads-tests

Change Logs

* Fri May 30 2025 rguenther@suse.com
- For SLES16, target POWER9 instead of POWER8, which fixes the
  reported sgemm testsuite failures.  [bsc#1239545]
* Mon Mar 17 2025 eich@suse.com
- Disable sgemmt and dgemmt tests in the test suite on power
  when gcc-13 is used. It is known (bsc#1239134) that some
  of these tests have failed on this architecture when OpenBLAS
  is built with this compiler version ever since these tests
  were introduced.
  This essentially restores the situation of the version prior
  to the addition of these tests (0.3.26), where one was unaware
  of the problem.
  This is only a temporary measure and will be removed once
  the issue with gcc-13 has been resolved.
- Remove: Link-library-with-z-noexecstack.patch
  since `-Wa,--noexecstack -Wl,-z,noexecstack` are now global
  options.
* Fri Mar 14 2025 eich@suse.com
- Use upstream patch for bsc#1239134 which is more friendly to the
  non-affected power9 and power10 sub-architectures:
  Replace:
  Revert-ba47c7f4f301aad100ed166de338b86e01da8465.patch
  by:
  Restore-the-non-vectorized-code-from-before-PR4880-for-POWER8.patch
* Sat Mar 08 2025 eich@suse.com
- Revert commit ba47c7f4f301aad100ed166de338b86e01da8465 to
  prevent failures on Power8 (bsc#1239134)
  * Add: Revert-ba47c7f4f301aad100ed166de338b86e01da8465.patch
- Add a script to run tests.
- Add bisect support.
* Wed Mar 05 2025 eich@suse.com
- Update to version 0.3.29 (jsc#PED-9676):
  General:
  * Fixed a potential NULL pointer dereference in multithreaded builds.
  * Added function aliases for `GEMMT` using its new name `GEMMTR`
    adopted by Reference-BLAS.
  * Fixed the behavior of the recently added `CBLAS_?GEMMT` functions
    with row-major data.
  * Improved thread scaling of multithreaded `SBGEMV`.
  * Improved thread scaling of multithreaded `TRTRI`.
  * Fixed compilation of the CBLAS testsuite with gcc14 (and no
    Fortran compiler).
  * Fixed placement of the `-fopenmp` flag and libsuffix in the
    generated pkgconfig file.
  * Improved the `CMakeConfig` file generated by the Makefile build.
  * Fixed const-correctness of `cblas_?geadd` in `cblas.h`.
  * Fixed a potential inaccuracy in multithreaded BLAS3 calls.
  * Fixed empty implementations of `get`/`set_affinity` that print a
    warning in OpenMP builds.
  * Fixed function signatures for TRTRS in the converted C version of
    LAPACK.
  * Fixed omission of several single-precision LAPACK symbols in the
    shared library.
  * Improved build instructions for the provided "pybench" benchmarks.
  * Improved documentation, including descriptions of environment
    variables that affect build and runtime behavior.
  * Added a separate "make install_tests" target for use with
    cross-compilations.
  * Integrated improvements and corrections from Reference-LAPACK:
    - Removed a comparison in LAPACKE `?tpmqrt` that is always false.
    - Fixed the leading dimension for B in tests for GGEV.
    - Replaced the `?LARFT` functions with a recursive implementation.
  arm64:
  * Fixed a long-standing bug in the (generic) `c`/`zgemm_beta` kernel
    that could lead to reads and writes outside the array bounds in some
    circumstances.
  * Rewrote cpu autodetection to scan all cores and return the highest
    performing type.
  * Improved the DGEMM performance for SVE targets and small matrix sizes.
  * Improved dimension criteria for forwarding from `GEMM` to `GEMV`
    kernels.
  * Added SVE kernels for `ROT` and `SWAP`.
  * Improved SVE kernels for `SGEMV` and `DGEMV` on `A64FX` and
    `NEOVERSEV1`.
  * Fixed NRM2 implementations for generic SVE targets and the Neoverse N2.
  x86_64:
  * Fixed a wrong storage size in the SBGEMV kernel for Cooper Lake.
  * Added cpu autodetection for Intel Granite Rapids.
  * Added cpu autodetection for AMD Ryzen 5 series.
  * Added optimized `SOMATCOPY_CT` for AVX-capable targets.
  * Fixed the fallback implementation of `GEMM3M` in GENERIC builds.
  Power:
  * Fixed multithreaded `SBGEMM`.
  * Fixed a CMake build problem on POWER10.
  * Improved the performance of SGEMV.
  * Added vectorized implementations of `SBGEMV` and support for
    forwarding 1xN `SBGEMM` to them.
  * Fixed illegal instructions and potential memory overflow in SGEMM
    on PPCG4.
  * Fixed handling of NaN and Inf arguments in `SSCAL` and `DSCAL` on
    PPC440, G4 and 970.
  * Added improved `CGEMM` and `ZGEMM` kernels for POWER10.
  Riscv64:
  * Removed thread yielding overhead caused by `sched_yield`.
  * Replaced some non-standard intrinsics with their official names.
  * Fixed and sped up the implementations of `CGEMM`/`ZGEMM` `TCOPY`
    for vector lengths 128 and 256.
  * Improved the performance of `SNRM2`/`DNRM2` for RVV1.0 targets.
  * Added optimized `?OMATCOPY_CN` kernels for RVV1.0 targets.
- Add test package.
- Add flags: `-Wa,--noexecstack -Wl,-z,noexecstack` to make sure
  the stack is not executable. This works around problems in the
  assembler code for z.
- Make stack of empty cpuid.S non-executable as well.
* Wed Mar 05 2025 eich@suse.com
- Set gcc versions for ppc64le (bsc#1239702)
  * on SLE-15-SP6: v13
  * on SLE-15-SP7: v14
* Mon Feb 03 2025 schwab@suse.de
- Disable LTO on riscv64 due to GCC#110812
* Thu Jan 02 2025 eich@suse.com
- Update to version 0.3.28 (jsc#PED-9676):
  * General:
    + Reworked the unfinished implementation of `HUGETLB` from GotoBLAS
    for allocating huge memory pages as buffers on suitable systems.
    + Changed the unfinished implementation of `GEMM3M` for the generic
    target on all architectures to at least forward to regular GEMM.
    + Improved multithreaded `GEMM` performance for large non-skinny
    matrices.
    + Improved BLAS3 performance on larger multicore systems through
    improved parallelism.
    + Improved performance of the initial memory allocation by reducing
    locking overhead.
    + Improved performance of `GBMV` at small problem sizes by introducing
    a size barrier for the switch to multithreading.
    + Added an implementation of the `CBLAS_GEMM_BATCH` extension.
    + Fixed corner cases involving the handling of NAN and INFINITY
    arguments in `?SCAL` on all architectures.
    + Fixed NAN handling and potential accuracy issues in compilations
    with Intel ICX by supplying a suitable fp-model option by default.
    + It is now possible to register a callback function that replaces
    the built-in support for multithreading with an external backend
    like TBB (`openblas_set_threads_callback_function`).
    + Fixed potential duplication of suffixes in shared library naming.
    + Improved C compiler detection by the build system to tolerate
    more naming variants for gcc builds.
    + Fixed an unnecessary dependency of the utest on CBLAS.
    + Fixed spurious error reports from the BLAS extensions `utest`.
    + Fixed unwanted invocation of the `GEMM3M` tests in cross-
    compilation.
    + Fixed a flaw in the makefile build that could lead to the
    pkgconfig file containing an entry of `UNKNOWN` for the target
    cpu after installing.
    + Integrated fixes from the Reference-LAPACK project:
      - Fixed uninitialized variables in the LAPACK tests for `?QP3RK`.
      - Fixed potential bounds error in `?UNHR_COL`/`?ORHR_COL`.
      - Fixed potential infinite loop in the LAPACK testsuite.
      - Made the variable type used for hidden length arguments
        configurable.
    + Fixed `SYTRD` workspace computation and various typos.
    + Prevent compiler use of FMA that could increase numerical
    error in `?GEEVX`.
  * x86-64:
    + Fixed a potential thread buffer overrun in `SBSTOBF16` on small
    systems.
    + Fixed an accuracy issue in `ZSCAL` introduced in 0.3.26.
    + Added support for Intel Emerald Rapids and Meteor Lake CPUs.
    + Added autodetection support for the Zhaoxin KX-7000 CPU.
    + Fixed autodetection of Intel Prescott (probably broken
    since 0.3.19).
    + Fixed compilation of the converter-generated C versions
    of the LAPACK sources with gcc-14.
    + Added support for supplying the L2 cache size via an
    environment variable (`OPENBLAS_L2_SIZE`) in case it is not
    correctly reported (as in some VM configurations).
    + Improved the error message shown when thread creation fails
    on startup.
  * arm64:
    + Added a fast path forwarding `SGEMM` and `DGEMM` calls with a
    1xN or Mx1 matrix to the corresponding `GEMV` kernel.
    + Added optimized `SGEMV` and `DGEMV` kernels for A64FX.
    + Added optimized SVE kernels for small-matrix `GEMM`.
    + Added A64FX to the CPU list for DYNAMIC_ARCH.
    + Fixed building with support for CPU affinity.
    + Worked around accuracy problems with `C/ZNRM2` on NeoverseN1
    targets.
    + Improved GEMM performance on Neoverse V1.
    + Fixed compilation for `NEOVERSEN2` with older compilers.
    + Fixed potential miscompilation of the SVE `SDOT` and `DDOT`
    kernels.
    + Fixed potential miscompilation of the non-SVE `CDOT` and
    `ZDOT` kernels.
    + Fixed a potential overflow when using very large user-defined
    `BUFFERSIZE`.
  * Power:
    + Added a fast path forwarding `SGEMM` and `DGEMM` calls with a 1xN
    or Mx1 matrix to the corresponding `GEMV` kernel.
    + Significantly improved performance of `SBGEMM` on POWER10.
    + Fixed compilation with OpenMP and the XLF compiler.
    + Fixed building of parts of the LAPACK testsuite with XLF.
    + Fixed CSWAP/ZSWAP on big-endian POWER10 targets.
    + Fixed a performance regression in SAXPY on POWER10 with OpenXL.
    + Fixed a potential overflow when using very large user-defined
    `BUFFERSIZE`.
    + Fixed an accuracy issue in the POWER6 kernels for `GEMM` and
    `GEMV`.
  * RISCV64:
    + Added a fast path forwarding `SGEMM` and `DGEMM` calls with a
    1xN or Mx1 matrix to the corresponding `GEMV` kernel.
    + Added `DYNAMIC_ARCH` support (comprising `GENERIC_RISCV64` and
    the two RVV 1.0 targets with vector length of 128 and 256).
    + Worked around the `ZVL128B` kernels for `AXPBY` mishandling the
    special case of zero Y increment.
- Obsoleted: no-static.patch.
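
  Note on the `GEMM` to `GEMV` fast path listed above for arm64,
  Power and RISCV64: the idea can be sketched in plain Python. This
  is only an illustration of the arithmetic, not the OpenBLAS
  implementation. The function names are made up for this note, and
  reading "1xN or Mx1 matrix" as "the result is a single row or a
  single column" is an assumption.

```python
def gemv(a, x):
    """Return the matrix-vector product a @ x (a is a list of rows)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemm(a, b):
    """Reference product of a (M x K) and b (K x N), both lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gemm_with_fast_path(a, b):
    """Hypothetical dispatcher: forward degenerate shapes to gemv."""
    m, n = len(a), len(b[0])
    if n == 1:
        # Single-column result: A @ b is just a matrix-vector product.
        col = [row[0] for row in b]
        return [[v] for v in gemv(a, col)]
    if m == 1:
        # Single-row result: a^T @ B equals (B^T @ a)^T.
        bt = [list(c) for c in zip(*b)]
        return [gemv(bt, a[0])]
    return gemm(a, b)


tall = [[1, 2], [3, 4], [5, 6]]
column = [[7], [8]]
row = [[1, 2, 3]]
print(gemm_with_fast_path(tall, column))
print(gemm_with_fast_path(row, tall))
```

  Both calls take the forwarding branch and give the same values as
  the general `gemm` above, which is why a library can skip the
  blocked matrix-matrix kernel for such shapes.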
* Mon Jul 29 2024 eich@suse.com
- Duplicate all options passed to `make` also to `make install`:
  The openblas build output suggests this: 'Note that any flags
  passed to make during build should also be passed to make install
  to circumvent any install errors'.
  This also makes sure that the minimum CPU requirement set in
  the pkgconfig file is the same as the one used for building.
  This helps to maintain a reproducible build (boo#1228177).
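
  A minimal sketch of the idea, assuming a build driver written in
  Python: keep the options in one list and reuse it for both
  invocations. The helper name, the option values and the install
  root are illustrative and are not taken from the package's spec
  file.

```python
import shlex


def build_and_install_commands(make_opts, destdir):
    """Return (build, install) argument lists sharing the same options."""
    build = ["make", *make_opts]
    # The install step repeats every build option, then adds its own.
    install = ["make", *make_opts, f"DESTDIR={destdir}", "install"]
    return build, install


# Illustrative options only; the real build may pass different ones.
opts = ["TARGET=POWER9", "USE_THREAD=1", "NUM_THREADS=64"]
build_cmd, install_cmd = build_and_install_commands(opts, "/tmp/buildroot")
print(shlex.join(build_cmd))
print(shlex.join(install_cmd))
```

  Deriving both command lines from one list means that the install
  step cannot silently fall back to a different default than the
  one used for the build.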
* Thu Jun 13 2024 schwab@suse.de
- no-static.patch: do not link statically