* Wed Mar 30 2022 Egbert Eich <eich@suse.com>
- Build PPC64LE libraries with the latest gcc available to
take advantage of instruction sets in later CPUs used in
the CPU specific kernels (jsc#SLE-18143, bsc#1197721).
For Fortran use the stock compiler to avoid compatibility
issues between different versions of libfortran.
This is relevant for Leap/SLE only. It may be dropped once
gcc < 10 is no longer supported.
* Sun Feb 13 2022 Egbert Eich <eich@suse.com>
- Fixed bsc#1195232 for good: found and removed offending entry.
This reintroduces part of:
Thu Jul 8 12:35:35 UTC 2021 - Dominique Leuenberger <dimstar@opensuse.org>
- Do not create dummy symlinks on $self in /etc/alternatives: those
files are packaged as %ghost and any real file existence only
confuses brp-checks, as it detects circular symlinks.
* Sun Jan 30 2022 Egbert Eich <eich@suse.com>
- Partly revert:
Thu Jul 8 12:35:35 UTC 2021 - Dominique Leuenberger <dimstar@opensuse.org>
- Do not create dummy symlinks on $self in /etc/alternatives: those
files are packaged as %ghost and any real file existence only
confuses brp-checks, as it detects circular symlinks.
for all suse_versions < current Factory in an attempt to fix bsc#1195232.
* Mon Jul 26 2021 Andreas Schwab <schwab@suse.de>
- Use RISCV64_GENERIC for riscv64
- Add -ffat-lto-objects to get proper static archives
* Thu Jul 22 2021 Ismail Dönmez <ismail@i10z.com>
- Update to version 0.3.17
- Fixes regressions introduced in 0.3.16
See https://github.com/xianyi/OpenBLAS/releases/tag/v0.3.17 for
the complete changelog.
* Tue Jul 13 2021 Ismail Dönmez <ismail@i10z.com>
- Update to version 0.3.16
Please see https://github.com/xianyi/OpenBLAS/releases/tag/v0.3.15
and https://github.com/xianyi/OpenBLAS/releases/tag/v0.3.16
for the complete list of changes. A complete changelog is
also available in the installed Changelog.txt.
* Thu Jul 08 2021 Dominique Leuenberger <dimstar@opensuse.org>
- Do not create dummy symlinks on $self in /etc/alternatives: those
files are packaged as %ghost and any real file existence only
confuses brp-checks, as it detects circular symlinks.
* Thu Mar 18 2021 Michel Normand <normand@linux.vnet.ibm.com>
- Update openblas-ppc64be_up2_p8.patch trimmed by previous SR
(still need changes in Makefile.system)
* Thu Mar 18 2021 Ismail Dönmez <idonmez@suse.com>
- Update to version 0.3.14
common:
* Fixed a race condition on thread shutdown in non-OpenMP builds
* Fixed custom BUFFERSIZE option getting ignored in gmake builds
* Fixed CMAKE compilation of the TRMM kernels for GENERIC platforms
* Added CBLAS interfaces for CROTG, ZROTG, CSROT and ZDROT
* Improved performance of OMATCOPY_RT across all platforms
* Changed perl scripts to use env instead of a hardcoded /usr/bin/perl
* Fixed potential misreading of the GCC compiler version in the build scripts
* Fixed convergence problems in LAPACK complex GGEV/GGES (Reference-LAPACK #477)
* Reduced the stacksize requirements for running the LAPACK testsuite (Reference-LAPACK #335)
RISC V:
* Fixed compilation on RISCV (missing entry in getarch)
POWER:
* Fixed compilation for DYNAMIC_ARCH with clang and with older gcc versions
* Added support for compilation on FreeBSD/ppc64le
* Added optimized POWER10 kernels for SSCAL, DSCAL, CSCAL, ZSCAL
* Added optimized POWER10 kernels for SROT, DROT, CDOT, SASUM, DASUM
* Improved SSWAP, DSWAP, CSWAP, ZSWAP performance on POWER10
* Improved SCOPY and CCOPY performance on POWER10
* Improved SGEMM and DGEMM performance on POWER10
* Added support for compilation with the NVIDIA HPC compiler
x86_64:
* Added an optimized bfloat16 GEMM kernel for Cooperlake
* Added CPUID autodetection for Intel Rocket Lake and Tiger Lake cpus
* Improved the performance of SASUM,DASUM,SROT,DROT on AMD Ryzen cpus
* Added support for compilation with the NAG Fortran compiler
* Fixed recognition of the AMD AOCC compiler
* Fixed compilation for DYNAMIC_ARCH with clang on Windows
* Added support for running the BLAS/CBLAS tests on Windows
* Fixed signatures of the tls callback functions for Windows x64
* Fixed various issues with fma intrinsics support handling
ARM:
* Support compilation for embedded Cortex M4 targets via a new option EMBEDDED
ARM64:
* Fixed the THUNDERX2T99 and NEOVERSEN1 DNRM2/ZNRM2 kernels for inputs with Inf
* Added support for the DYNAMIC_LIST option
* Added support for compilation with the NVIDIA HPC compiler
* Added support for compiling with the NAG Fortran compiler
- Remove 0001-Require-gcc-11-for-builtin_cpu_is-power10.patch
0002-patch-to-support-power10-in-builtin_cpu_is-was-backp.patch
Upstream fixed in a different way.
* Thu Feb 04 2021 Michel Normand <normand@linux.vnet.ibm.com>
- Disable lto for ppc64le to avoid build failure (bsc#1181733)
- Add openblas-ppc64be_up2_p8.patch to avoid ppc64 (BE) build failure
Do not set BUILD_BFLOAT16 for ppc64 (BE) (same bug number)
* Tue Feb 02 2021 Egbert Eich <eich@suse.com>
- BUILD_BFLOAT16=1 is not supported in s390(x) (bsc#1181522)
- Add:
* 0001-Require-gcc-11-for-builtin_cpu_is-power10.patch
* 0002-patch-to-support-power10-in-builtin_cpu_is-was-backp.patch:
Only gcc11 has builtin_cpu_is(power10) - fix build issue for ppc64
(bsc#1181522).
* Thu Dec 17 2020 Ismail Dönmez <idonmez@suse.com>
- Update to version 0.3.13
common:
* Added a generic bfloat16 SBGEMV kernel
* Fixed a potentially severe memory leak after fork in OpenMP builds
that was introduced in 0.3.12
* Added detection of the Fujitsu Fortran compiler
* Added detection of the (e)gfortran compiler on OpenBSD
* Added support for overriding the default name of the library independently
from symbol suffixing in the gmake builds (already supported in cmake)
RISC V:
* Added a RISC V port optimized for C910V
POWER:
* Added optimized POWER10 kernels for SAXPY, CAXPY, SDOT, DDOT and DGEMV_N
* Improved DGEMM performance on POWER10
* Improved STRSM and DTRSM performance on POWER9 and POWER10
* Fixed segmentation faults in DYNAMIC_ARCH builds
* Fixed compilation with the PGI compiler
x86:
* Fixed compilation of kernels that require SSE2 intrinsics since 0.3.12
x86_64:
* Added an optimized bfloat16 SBGEMV kernel for SkylakeX and Cooperlake
* Improved the performance of SASUM and DASUM kernels through parallelization
* Improved the performance of SROT and DROT kernels
* Improved the performance of multithreaded xSYRK
* Fixed OpenMP builds that use the LLVM Clang compiler together with GNU gfortran
(where linking of both the LLVM libomp and GNU libgomp could lead to lockups or
wrong results)
* Fixed miscompilations by old gcc 4.6
* Fixed misdetection of AVX2 capability in some Sandybridge cpus
* Fixed lockups in builds combining DYNAMIC_ARCH with TARGET=GENERIC on OpenBSD
ARM64:
* Fixed segmentation faults in DYNAMIC_ARCH builds
MIPS:
* Improved kernels for Loongson 3R3 ("3A") and 3R4 ("3B") models, including MSA
* Fixed bugs in the MSA kernels for CGEMM, CTRMM, CGEMV and ZGEMV
* Added handling of zero increments in the MSA kernels for SSWAP and DSWAP
* Added DYNAMIC_ARCH support for MIPS64 (currently Loongson3R3/3R4 only)
SPARC:
* Fixed building 32 and 64 bit SPARC kernels with the SolarisStudio compilers
* Wed Dec 16 2020 Dominique Leuenberger <dimstar@opensuse.org>
- Fix invalid symlinks (boo#1179764).
* Sat Oct 24 2020 Ismail Dönmez <idonmez@suse.com>
- Update to version 0.3.12
common:
* Fixed missing BLAS/LAPACK functions (inadvertently dropped during
the build system restructuring to support selective compilation)
* Fixed argument conversion macro in LAPACKE_zgesvdq (LAPACK #458)
power:
* Added optimized SCOPY/CCOPY kernels for POWER10
* Increased and unified the default size of the GEMM buffer
* Fixed building for POWER10 in DYNAMIC_ARCH mode
* POWER10 compatibility test now checks binutils version as well
* Cleaned up compiler warnings
x86_64:
* Corrected compiler version checks for AVX2 compatibility
* Added compiler option -mavx2 for building with flang
* Fixed direct SGEMM pathway for small matrix sizes (broken by
the code refactoring in 0.3.11)
* Fixed unhandled partial register clobbers in several kernels
for AXPY,DOT,GEMV_N and GEMV_T flagged by gcc10 tree-vectorizer
armv8:
* Improved Apple Vortex support to include cross-compiling
- Drop fix-build.patch, merged upstream.
* Wed Oct 21 2020 Ismail Dönmez <idonmez@suse.com>
- Update _constraints to use 12GB RAM on x86_64
* Wed Oct 21 2020 Ismail Dönmez <idonmez@suse.com>
- Update to version 0.3.11
common:
* Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
limit for placing temporary arrays on the stack) to be compatible
with a stack size of 1mb (as imposed by the JAVA runtime library)
* Added mixed-precision dot function SBDOT and utility functions
shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
single or double precision float arrays and bfloat16 arrays
* Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
in lapack.h
* Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
(causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)
* Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)
* Fixed several bugs in the LAPACK testsuite
* Improved performance of TRMM and TRSM for certain problem sizes
* Fixed infinite recursions and workspace miscalculations in ReLAPACK
* CMAKE builds no longer require pkg-config for creating the .pc file
* Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
enabling these options
* Fixed detection of gfortran when invoked through an mpi wrapper
* Improve thread reinitialization performance with OpenMP after a fork
* Added support for building only the subset of the library required
for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE
* Optional function name prefixes and suffixes are now correctly
reflected in the generated cblas.h
* Added CMAKE build support for the LAPACK and multithreading tests
power:
* Added optimized support for POWER10
* Added support for compiling for POWER8 in 32bit mode
* Added support for compilation with LLVM/clang
* Added support for compilation with NVIDIA/PGI compilers
* Fixed building on big-endian POWER8
* Fixed miscompilation of ZDOTC by gcc10
* Fixed alignment errors in the POWER8 SAXPY kernel
* Improved CPU detection on AIX
* Supported building with older compilers on POWER9
x86_64:
* Added support for Intel Cooperlake
* Added autodetection of AMD Renoir/Matisse/Zen3 cpus
* Added autodetection of Intel Comet Lake cpus
* Reimplemented ?sum, ?dot and daxpy using universal intrinsics
* Reset the fpu state before using the fpu on Windows as a workaround
for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
* Fixed potentially undefined behaviour in the dot and gemv_t kernels
* Fixed a potential segmentation fault in DYNAMIC_ARCH builds
* Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers
armv7:
* Fixed cpu detection on BSD-like systems
armv8:
* Added preliminary support for Apple Vortex cpus
* Added support for the Cavium ThunderX3T110 cpu
* Fixed cpu detection on BSD-like systems
* Fixed compilation in -std=C18 mode
IBM Z:
* Added support for compiling with the clang compiler
* Improved GEMM performance on Z14
- Enable bfloat16 support via BUILD_BFLOAT16=1
- Add fix-build.patch to fix build with -Werror=return-type
* Sat Oct 03 2020 Egbert Eich <eich@suse.com>
- Set DYNAMIC_ARCH everywhere, use a base CPU model for non-dynamic
bits to have a reproducible base line:
x86_64: CORE2
aarch64: ARMV8
ppc: POWER8
s390: ZARCH_GENERIC
- Remove workaround for build failure on aarch64 (boo#1128794).
* Thu Sep 24 2020 Egbert Eich <eich@suse.com>
- For s390/s390x add TARGET=ZARCH_GENERIC (jsc#SLE-13773).
* Wed Aug 12 2020 Bernhard Wiedemann <bwiedemann@suse.com>
- Avoid compile-time CPU-detection (boo#1100677)
* Thu Jul 23 2020 Egbert Eich <eich@suse.com>
- Add build support for gcc10 to HPC build (bsc#1174439).