* Thu Apr 09 2026 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2026.1.0
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported on CPUs & GPUs: Qwen3 VL
* New models supported on CPUs: GPT-OSS 120B
* Preview: Introducing the OpenVINO backend for llama.cpp,
which enables optimized inference on Intel CPUs, GPUs, and
NPUs. Validated on GGUF models such as
Llama-3.2-1B-Instruct-GGUF, Phi-3-mini-4k-instruct-gguf,
Qwen2.5-1.5B-Instruct-GGUF, and Mistral-7B-Instruct-v0.3.
* New notebook: Unified VLM chatbot with video file support and
interactive model switching across Qwen3-VL, Qwen2.5-VL, and
LLaVa-NeXT-Video.
- Broader LLM model support and more model compression
techniques
* OpenVINO™ GenAI adds TaylorSeer Lite caching for image and
video generation, accelerating diffusion-transformer inference
across Flux, SD3, and LTX-Video pipelines, aligned with
Hugging Face Diffusers.
* LTX-Video generation on GPU achieves end-to-end acceleration
through fusion of RMSNorm and RoPE operators, significantly
improving video generation performance.
* OpenVINO™ GenAI adds dynamic LoRA support for Qwen3-VL and
VL models with LLM, allowing developers to swap adapters at
runtime for efficient serving of multiple model variants in
production without reloading the base model.
* Preview: The release-weights API for ov::Model enables
memory reclamation during model compilation on NPUs,
delivering dramatically lower peak memory consumption for
edge and client deployments. Users must set this property
in ov::Model, and it will be applied during compilation.
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Introducing support for Intel® Core™ Series 3 processors
(formerly codenamed Wildcat Lake) and Intel® Arc™ Pro B70
Graphics with 32GB memory for single-GPU inference on 20-30B
parameter LLMs
* Prompt Lookup Decoding extended to vision-language pipelines,
delivering significantly faster token generation for
multimodal workloads on Intel CPUs and GPUs.
* OpenVINO™ GenAI now has a smaller runtime footprint after
eliminating ICU DLL dependencies from tokenization, leading
to reduced memory usage, faster startup, and easier
deployment.
* OpenVINO GenAI introduces WhisperPipeline for Node.js via
its NPM package, delivering production-ready speech
recognition with word-level audio-to-text transcription.
* OpenVINO™ Model Server enhances support for Qwen3-MOE and
GPT-OSS-20b models, delivering improved performance,
accuracy, and robust concurrent request handling with
continuous batching. These pre-optimized models are
available on Hugging Face for easy deployment.
Additionally, the Model Server introduces image
inpainting and outpainting capabilities via the
/image endpoint for AI image editing.
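The Prompt Lookup Decoding item above can be illustrated with a toy drafting step. This is a minimal sketch of the general technique, not OpenVINO GenAI code: the function name propose_draft and its parameters are hypothetical, and tokens are plain integers. The idea is to find the most recent earlier occurrence of the trailing n-gram in the context and propose the tokens that followed it as a draft, which the main model would then verify.

```python
from typing import List, Sequence


def propose_draft(tokens: Sequence[int], ngram_size: int = 2,
                  max_draft: int = 3) -> List[int]:
    """Return up to max_draft tokens that followed the most recent earlier
    occurrence of the trailing n-gram, or [] when there is no match."""
    if len(tokens) <= ngram_size:
        return []
    tail = list(tokens[-ngram_size:])
    # Scan right to left, skipping the trailing n-gram itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if list(tokens[start:start + ngram_size]) == tail:
            follow = start + ngram_size
            return list(tokens[follow:follow + max_draft])
    return []


context = [5, 6, 7, 8, 9, 5, 6]
draft = propose_draft(context)
print(draft)  # [7, 8, 9]
```

Drafts are only candidates; output quality is unchanged because the full model accepts or rejects each drafted token.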
* Wed Feb 25 2026 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2026.0.0
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported on CPUs & GPUs: GPT-OSS-20B,
MiniCPM-V-4_5-8B, and MiniCPM-o-2.6.
* New models supported on NPUs: MiniCPM-o-2.6. In addition, NPU
support is now available on Qwen2.5-1.5B-Instruct,
Qwen3-Embedding-0.6B, Qwen-2.5-coder-0.5B.
* OpenVINO™ GenAI now adds word-level timestamp functionality to
the Whisper Pipeline on CPUs, GPUs, and NPUs, enabling more
accurate transcriptions and subtitling in line with OpenAI and
FasterWhisper implementations.
* Phi-3-mini FastDraft model is now available on Hugging Face to
accelerate LLM inference on NPUs. FastDraft optimizes
speculative decoding for LLMs.
- Broader LLM model support and more model compression techniques
* With the new int4 data-aware weight compression for 3D
MatMuls, the Neural Network Compression Framework enables MoE
LLMs to run with reduced memory and bandwidth usage and improved
accuracy compared to data-free schemes, delivering faster,
more efficient deployment on resource-constrained devices.
* Preview: the Neural Network Compression Framework now
supports per-layer and per-group Look-Up Tables (LUT) for
FP8-4BLUT quantization. This enables fine-grained,
codebook-based compression that reduces model size and
bandwidth while improving inference speed and accuracy for
LLMs and transformer workloads.
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Preview: OpenVINO™ GenAI adds VLM pipeline support to enhance
Agentic AI framework integration.
* OpenVINO GenAI now supports speculative decoding for NPUs,
delivering improved performance and efficient text generation
through a small draft model that is periodically validated by
the full-size model.
* Preview: NPU compiler integration with the NPU plugin enables
ahead-of-time and on-device compilation without relying on
OEM driver updates. Developers can enable this feature for a
single, ready-to-ship package that reduces integration
friction and accelerates time-to-value.
* OpenVINO™ Model Server adds enhanced support for audio
endpoint plus agentic continuous batching and concurrent runs
for improved LLM performance in agentic workflows on Intel
CPUs and GPUs.
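The speculative decoding item above (a small draft model validated by the full-size model) can be sketched in plain Python. This is a toy greedy variant under stated assumptions: both "models" are hypothetical functions mapping a token list to the next token, and the verification loop stands in for what a real runtime would do in one batched forward pass. It is not the OpenVINO GenAI implementation.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_generate(target: NextToken, draft: NextToken,
                         prompt: List[int], new_tokens: int,
                         k: int = 4) -> List[int]:
    """Greedy speculative decoding: the result matches target-only decoding."""
    out = list(prompt)
    goal = len(prompt) + new_tokens
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target model checks every proposed position in order.
        for i in range(k):
            expected = target(out + proposal[:i])
            if expected != proposal[i]:
                # First mismatch: keep the target's token, drop the rest.
                out.extend(proposal[:i] + [expected])
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out[:goal]


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Deliberately wrong after token 4 to exercise the rejection path.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = speculative_generate(toy_target, toy_draft, [1], new_tokens=8)
print(result)
```

Each loop iteration adds at least one token chosen by the target model, so the draft model can only change speed, never the generated text.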
- Support Change and Deprecation Notices
* Discontinued in 2026.0:
+ The deprecated openvino.runtime namespace has been removed.
Please use the openvino namespace directly.
+ The deprecated openvino.Type.undefined has been removed.
Please use openvino.Type.dynamic instead.
+ The PostponedConstant constructor signature has been
updated for improved usability:
- Old (removed): Callable[[Tensor], None]
- New: Callable[[], Tensor]
+ The deprecated OpenVINO GenAI predefined generation
configs were removed.
+ The deprecated OpenVINO GenAI support for whisper
stateless decoder model has been removed. Please use a
stateful model.
+ The deprecated OpenVINO GenAI StreamerBase put method, bool
return type for callbacks, and ChunkStreamer class have been
removed.
+ NNCF create_compressed_model() method is now deprecated and
removed in 2026. Please use nncf.prune() method for
unstructured pruning and nncf.quantize() for INT8
quantization.
+ NNCF optimization methods for TensorFlow models and
TensorFlow backend in NNCF are deprecated and removed in
2026. It is recommended to use PyTorch analogous models for
training-aware optimization methods and OpenVINO™ IR,
PyTorch, and ONNX models for post-training optimization
methods from NNCF.
+ The following experimental NNCF methods are deprecated and
removed: NAS, Structural Pruning, AutoML, Knowledge
Distillation, Mixed-Precision Quantization, Movement
Sparsity.
+ CPU plugin now requires support for the AVX2 instruction
set as a minimum system requirement. The SSE instruction
set will no longer be supported.
+ OpenVINO migrated builds based on RHEL 8 to RHEL 9.
+ manylinux2014 upgraded to manylinux_2_28. This aligns with
modern toolchain requirements but also means that CentOS 7
will no longer be supported due to glibc incompatibility.
+ MacOS x86 is no longer supported.
+ APT & YUM Repositories Restructure: Starting with release
2025.1, users can switch to the new repository structure for
APT and YUM, which no longer uses year-based subdirectories
(like 2025). The old (legacy) structure is unavailable
starting 2026.0. Detailed instructions are available on the
relevant documentation pages:
- Installation guide - yum
- Installation guide - apt
+ OpenCV binaries removed from Docker images.
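Two of the requirements above (AVX2 on the CPU, and the glibc 2.28 baseline implied by manylinux_2_28) can be checked before installing. The following is a minimal sketch for Linux x86 hosts only; the helper names are hypothetical, and the AVX2 check assumes /proc/cpuinfo-style text with a "flags" line.

```python
import platform
from typing import Tuple


def cpu_has_avx2(cpuinfo_text: str) -> bool:
    """Look for the avx2 flag in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return "avx2" in line.split(":", 1)[1].split()
    return False


def version_at_least(version: str, required: Tuple[int, int] = (2, 28)) -> bool:
    """Compare a 'major.minor' version string against a required baseline."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= required


def glibc_ok() -> bool:
    """True when the running C library is glibc 2.28 or newer."""
    lib, version = platform.libc_ver()
    return lib == "glibc" and bool(version) and version_at_least(version)


sample = "processor\t: 0\nflags\t\t: fpu sse4_2 avx avx2 fma\n"
print("AVX2 in sample:", cpu_has_avx2(sample))
print("glibc >= 2.28 on this host:", glibc_ok())
```

On a real host, the text passed to cpu_has_avx2 would come from reading /proc/cpuinfo.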
* Deprecated and to be removed in the future:
+ Support for Ubuntu 20.04 has been discontinued due to the
end of its standard support.
+ auto shape and auto batch size (reshaping a model in
runtime) will be removed in the future. OpenVINO's dynamic
shape models are recommended instead.
+ With the release of Node.js v22, updated Node.js bindings
are now available and compatible with the latest LTS
version. These bindings do not support CentOS 7, as they
rely on newer system libraries unavailable on legacy
systems.
+ Starting with the 2026.0 release, a major internal refactoring
of the graph iteration mechanism has been implemented for
improved performance and maintainability. The legacy path
can be enabled by setting the ONNX_ITERATOR=0 environment
variable. This legacy path is deprecated and will be
removed in future releases.
+ OpenVINO Model Server:
- The dedicated OpenVINO operator for Kubernetes and
OpenShift is now deprecated in favor of the recommended
KServe operator. The OpenVINO operator will remain
functional in upcoming OpenVINO Model Server releases
but will no longer be actively developed. Since KServe
provides broader capabilities, no loss of functionality is
expected. On the contrary, more functionalities will be
accessible and migration between other serving solutions
and OpenVINO Model Server will be much easier.
- TensorFlow Serving (TFS) API support is planned for
deprecation. With increasing adoption of the KServe API
for classic models and the OpenAI API for generative
workloads, usage of the TFS API has significantly
declined. The removal date will be determined based on
feedback, with a tentative target of mid-2026.
- Support for Stateful models will be deprecated. These
capabilities were originally introduced for Kaldi audio
models, which are no longer relevant. Current audio model
support relies on the OpenAI API and pipelines
implemented via the OpenVINO GenAI library.
- Directed Acyclic Graph Scheduler will be deprecated in
favor of pipelines managed by MediaPipe scheduler and
will be removed in 2026.3. That approach gives more
flexibility, includes a wider range of calculators and has
support for using processing accelerators.
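The ONNX_ITERATOR=0 switch listed among the deprecations above can also be set from Python rather than from the shell. This is a sketch under assumptions: the changelog only names the variable, so setting it before OpenVINO is imported is a conservative guess about when it is read, and the model path in the comments is hypothetical.

```python
import os

# Select the legacy graph iteration path named in the changelog.
# Assumption: the variable is read when OpenVINO loads a model, so it is
# set here before any openvino import.
os.environ["ONNX_ITERATOR"] = "0"

# Where OpenVINO is installed, the import and model load would follow:
# import openvino as ov
# model = ov.Core().read_model("model.onnx")  # hypothetical path

print("ONNX_ITERATOR =", os.environ["ONNX_ITERATOR"])
```

Because the legacy path is itself deprecated, this is best treated as a temporary workaround.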
* Tue Feb 17 2026 malcolmlewis@opensuse.org
- Enable intel gpu option and add associated package.
* Sun Dec 28 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2025.4.0
* Preview: Mixture of Experts (MoE) models optimized for CPUs and
GPUs, validated for GPT-OSS 20B model. How to convert model:
optimum-cli export openvino -m "openai/gpt-oss-20b" out_dir --weight-format int4
* Fixed issue ID 174531: Accuracy regression of
Mistral-7b-instruct-v0.2 and Mistral-7b-instruct-v0.3
on all devices when executed with OpenVINO GenAI.
As a workaround, use the IR converted with OpenVINO 2025.3.
* Fixed issue ID 176777: Using the callback parameter with the
Python API call generate() in Text2ImagePipeline,
Image2ImagePipeline, InpaintingPipeline may cause the process
to hang. As a workaround, do not use the callback parameter.
C++ implementations were not affected.
* Resolved an issue in the NPU plugin where the Level Zero (L0)
context was implemented as a static global object and only
destroyed during DLL unload, even after unload_plugin()
was called. This behavior prevented the driver from
spawning threads required for certain optimizations and
features.
* Tue Dec 02 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2025.4.0
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported:
+ On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B,
Mistral-Small-24B-Instruct-2501.
+ On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct.
* Preview: Mixture of Experts (MoE) models optimized for CPUs
and GPUs, validated for Qwen3-30B-A3B.
* GenAI pipeline integrations: Qwen3-Embedding-0.6B and
Qwen3-Reranker-0.6B for enhanced retrieval/ranking, and
Qwen2.5VL-7B for video pipeline.
- Broader LLM model support and more model compression
techniques
* The Neural Network Compression Framework (NNCF) ONNX backend
now supports INT8 static post-training quantization (PTQ)
and INT8/INT4 weight-only compression to ensure accuracy
parity with OpenVINO IR format models. SmoothQuant algorithm
support added for INT8 quantization.
* Accelerated multi-token generation for GenAI, leveraging
optimized GPU kernels to deliver faster inference, smarter
KV-cache reuse, and scalable LLM performance.
* GPU plugin updates include improved performance with prefix
caching for chat history scenarios and enhanced LLM accuracy
with dynamic quantization support for INT8.
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Announcing support for Intel® Core Ultra Processor Series 3.
* Encrypted blob format support added for secure model
deployment with OpenVINO GenAI. Model weights and artifacts
are stored and transmitted in an encrypted format, reducing
risks of IP theft during deployment. Developers can deploy
with minimal code changes using OpenVINO GenAI pipelines.
* OpenVINO™ Model Server and OpenVINO™ GenAI now extend
support for Agentic AI scenarios with new features such as
output parsing and improved chat templates for reliable
multi-turn interactions, and preview functionality for the
Qwen3-30B-A3B model. OVMS also introduces a preview for
audio endpoints.
* NPU deployment is simplified with batch support, enabling
seamless model execution across Intel® Core Ultra
processors while eliminating driver dependencies. Models
are reshaped to batch_size=1 before compilation.
* The improved NVIDIA Triton Server* integration with
OpenVINO backend now enables developers to utilize Intel
GPUs or NPUs for deployment.
* Sun Sep 07 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2025.3.0
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported: Phi-4-mini-reasoning, AFM-4.5B,
Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B.
* NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.
* LLMs optimized for NPU now available on OpenVINO Hugging
Face collection.
- Broader LLM model support and more model compression techniques
* The NPU plug-in adds support for longer contexts of up to
8K tokens, dynamic prompts, and dynamic LoRA for improved
LLM performance.
* The NPU plug-in now supports dynamic batch sizes by reshaping
the model to a batch size of 1 and concurrently managing
multiple inference requests, enhancing performance and
optimizing memory utilization.
* Accuracy improvements for GenAI models on both built-in
and discrete graphics achieved through the implementation
of the key cache compression per channel technique, in
addition to the existing KV cache per-token compression
method.
* OpenVINO™ GenAI introduces TextRerankPipeline for improved
retrieval relevance and RAG pipeline accuracy, plus
Structured Output for enhanced response reliability and
function calling while ensuring adherence to predefined
formats.
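The difference between per-token and per-channel key cache compression mentioned above can be shown on a toy matrix. This sketch is an illustration of the general idea only, using symmetric INT8 round trips in plain Python; it does not reproduce OpenVINO's kernels, and the sample values are made up so that one channel carries large outliers.

```python
from typing import List

Matrix = List[List[float]]


def quantize_group(values: List[float]) -> List[float]:
    """Symmetric INT8 round trip for one group sharing a single scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) * scale for v in values]


def per_token(keys: Matrix) -> Matrix:
    """One scale per row (token)."""
    return [quantize_group(row) for row in keys]


def per_channel(keys: Matrix) -> Matrix:
    """One scale per column (channel)."""
    columns = [quantize_group(list(col)) for col in zip(*keys)]
    return [list(row) for row in zip(*columns)]


def mean_abs_error(a: Matrix, b: Matrix) -> float:
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Rows are tokens, columns are channels; the last channel has outliers.
keys = [[0.1, 0.2, 8.0], [0.3, -0.1, -7.5], [-0.2, 0.4, 9.0]]
err_token = mean_abs_error(keys, per_token(keys))
err_channel = mean_abs_error(keys, per_channel(keys))
print(f"per-token error:   {err_token:.5f}")
print(f"per-channel error: {err_channel:.5f}")
```

With a per-token scale, the outlier channel stretches the scale of every row and the small values lose precision; a per-channel scale confines that cost to the outlier column.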
- More portability and performance to run AI at the edge,
in the cloud, or locally.
* Announcing support for Intel® Arc™ Pro B-Series
(B50 and B60).
* Preview: Hugging Face models that are GGUF-enabled for
OpenVINO GenAI are now supported by the OpenVINO™ Model
Server for popular LLM model architectures such as
DeepSeek Distill, Qwen2, Qwen2.5, and Llama 3.
This functionality reduces memory footprint and
simplifies integration for GenAI workloads.
* With improved reliability and tool call accuracy,
the OpenVINO™ Model Server boosts support for
agentic AI use cases on AI PCs, while enhancing
performance on Intel CPUs, built-in GPUs, and NPUs.
* int4 data-aware weights compression, now supported in the
Neural Network Compression Framework (NNCF) for ONNX
models, reduces memory footprint while maintaining
accuracy and enables efficient deployment in
resource-constrained environments.
Version: 2025.2.0-bp160.1.4
* Wed Jun 25 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- openSUSE Leap 16.0 compatibility
* Tue Jun 24 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Remove openvino-gcc5-compatibility.patch file
* Tue Jun 24 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported on CPUs & GPUs: Phi-4,
Mistral-7B-Instruct-v0.3, SD-XL Inpainting 0.1, Stable
Diffusion 3.5 Large Turbo, Phi-4-reasoning, Qwen3, and
Qwen2.5-VL-3B-Instruct. Mistral 7B Instruct v0.3 is also
supported on NPUs.
* Preview: OpenVINO™ GenAI introduces a text-to-speech
pipeline for the SpeechT5 TTS model, while the new RAG
backend offers developers a simplified API that delivers
reduced memory usage and improved performance.
* Preview: OpenVINO™ GenAI offers a GGUF Reader for seamless
integration of llama.cpp based LLMs, with Python and C++
pipelines that load GGUF models, build OpenVINO graphs,
and run GPU inference on-the-fly. Validated for popular models:
DeepSeek-R1-Distill-Qwen (1.5B, 7B), Qwen2.5 Instruct
(1.5B, 3B, 7B) & llama-3.2 Instruct (1B, 3B, 8B).
- Broader LLM model support and more model compression
techniques
* Further optimization of LoRA adapters in OpenVINO GenAI
for improved LLM, VLM, and text-to-image model performance
on built-in GPUs. Developers can use LoRA adapters to
quickly customize models for specialized tasks.
* KV cache compression for CPUs is enabled by default for
INT8, providing a reduced memory footprint while maintaining
accuracy compared to FP16. Additionally, it delivers
substantial memory savings for LLMs with INT4 support compared
to INT8.
* Optimizations for Intel® Core™ Ultra Processor Series 2
built-in GPUs and Intel® Arc™ B Series Graphics with the
Intel® XMX systolic platform to enhance the performance of
VLM models and hybrid quantized image generation models, as
well as improve first-token latency for LLMs through dynamic
quantization.
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Enhanced Linux* support with the latest GPU driver for
built-in GPUs on Intel® Core™ Ultra Processor Series 2
(formerly codenamed Arrow Lake H).
* Support for INT4 data-free weights compression for ONNX
models implemented in the Neural Network Compression
Framework (NNCF).
* NPU support for FP16-NF4 precision on Intel® Core™ 200V
Series processors for models with up to 8B parameters is
enabled through symmetrical and channel-wise quantization,
improving accuracy while maintaining performance efficiency.
Support Change and Deprecation Notices
- Discontinued in 2025:
* Runtime components:
+ The OpenVINO property of Affinity API is no longer
available. It has been replaced with CPU binding
configurations (ov::hint::enable_cpu_pinning).
+ The openvino-nightly PyPI module has been discontinued.
End-users should proceed with the Simple PyPI nightly repo
instead. More information in Release Policy.
* Tools:
+ The OpenVINO™ Development Tools package (pip install
openvino-dev) is no longer available for OpenVINO releases
in 2025.
+ Model Optimizer is no longer available. Consider using the
new conversion methods instead. For more details, see the
model conversion transition guide.
+ Intel® Streaming SIMD Extensions (Intel® SSE) are currently
not enabled in the binary package by default. They are
still supported in the source code form.
+ Legacy prefixes: l_, w_, and m_ have been removed from
OpenVINO archive names.
* OpenVINO GenAI:
+ StreamerBase::put(int64_t token)
+ The Bool value for Callback streamer is no longer accepted.
It must now return one of three values of StreamingStatus
enum.
+ ChunkStreamerBase is deprecated. Use StreamerBase instead.
* NNCF create_compressed_model() method is now deprecated.
nncf.quantize() method is recommended for
Quantization-Aware Training of PyTorch and TensorFlow models.
* OpenVINO Model Server (OVMS) benchmark client in C++
using TensorFlow Serving API.
- Deprecated and to be removed in the future:
* Python 3.9 is now deprecated and will be unavailable after
OpenVINO version 2025.4.
* openvino.Type.undefined is now deprecated and will be removed
with version 2026.0. openvino.Type.dynamic should be used
instead.
* APT & YUM Repositories Restructure: Starting with release
2025.1, users can switch to the new repository structure
for APT and YUM, which no longer uses year-based
subdirectories (like “2025”). The old (legacy) structure
will still be available until 2026, when the change will
be finalized. Detailed instructions are available on the
relevant documentation pages:
+ Installation guide - yum
+ Installation guide - apt
* OpenCV binaries will be removed from Docker images in 2026.
* Ubuntu 20.04 support will be deprecated in future OpenVINO
releases due to the end of standard support.
* “auto shape” and “auto batch size” (reshaping a model in
runtime) will be removed in the future. OpenVINO’s dynamic
shape models are recommended instead.
* MacOS x86 is no longer recommended for use due to the
discontinuation of validation. Full support will be removed
later in 2025.
* The openvino namespace of the OpenVINO Python API has been
redesigned, removing the nested openvino.runtime module.
The old namespace is now considered deprecated and will be
discontinued in 2026.0.
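For the openvino.runtime deprecation above, a first pass over existing code can be automated. The helper below is a hypothetical sketch, not an official migration tool: it rewrites the module path textually and assumes each imported name is available at the same relative path under openvino, which should be verified for every import.

```python
import re

# Matches "openvino.runtime" as a module path, for example in
# "from openvino.runtime import Core" or "import openvino.runtime as ov".
_LEGACY = re.compile(r"\bopenvino\.runtime\b")


def migrate_imports(source: str) -> str:
    """Replace the legacy openvino.runtime path with openvino."""
    return _LEGACY.sub("openvino", source)


old_code = "from openvino.runtime import Core\ncore = Core()\n"
print(migrate_imports(old_code))
```

The word boundaries keep unrelated identifiers such as openvino.runtime_stats untouched.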
* Wed May 21 2025 Andreas Schwab <schwab@suse.de>
- Fix file list for riscv64
* Mon May 05 2025 Dominique Leuenberger <dimstar@opensuse.org>
- Do not force GCC15 on Tumbleweed just yet: follow the distro
default compiler, like any other package.
* Sat May 03 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- openvino-gcc5-compatibility.patch to resolve incompatibility
in gcc5
* Thu May 01 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Added gcc-14
* Mon Apr 14 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2025.1.0
- Downgrade from gcc13-c++ to 12 due to incompatibility in tbb
compilation. This was due to C++ libraries (using libstdc++)
resulting in the error: libtbb.so.12: undefined reference to
`__cxa_call_terminate@CXXABI_1.3.15'
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported: Phi-4 Mini, Jina CLIP v1, and Bce
Embedding Base v1.
* OpenVINO™ Model Server now supports VLM models, including
Qwen2-VL, Phi-3.5-Vision, and InternVL2.
* OpenVINO GenAI now includes image-to-image and inpainting
features for transformer-based pipelines, such as Flux.1 and
Stable Diffusion 3 models, enhancing their ability to generate
more realistic content.
* Preview: AI Playground now utilizes the OpenVINO GenAI backend
to enable highly optimized inferencing performance on AI PCs.
- Broader LLM model support and more model compression techniques
* Reduced binary size through optimization of the CPU plugin and
removal of the GEMM kernel.
* Optimization of new kernels for the GPU plugin significantly
boosts the performance of Long Short-Term Memory (LSTM) models,
used in many applications, including speech recognition,
language modeling, and time series forecasting.
* Preview: Token Eviction implemented in OpenVINO GenAI to reduce
the memory consumption of KV Cache by eliminating unimportant
tokens. This current Token Eviction implementation is
beneficial for tasks where a long sequence is generated, such
as chatbots and code generation.
* NPU acceleration for text generation is now enabled in
OpenVINO™ Runtime and OpenVINO™ Model Server to support the
power-efficient deployment of VLM models on NPUs for AI PC use
cases with low concurrency.
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Additional LLM performance optimizations on Intel® Core™ Ultra
200H series processors for improved 2nd token latency on
Windows and Linux.
* Enhanced performance and efficient resource utilization with
the implementation of Paged Attention and Continuous Batching
by default in the GPU plugin.
* Preview: The new OpenVINO backend for Executorch will enable
accelerated inference and improved performance on Intel
hardware, including CPUs, GPUs, and NPUs.
* Tue Mar 04 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Disabled JAX plugin beta.
* Sun Feb 09 2025 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to 2025.0.0
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported: Qwen 2.5, Deepseek-R1-Distill-Llama-8B,
DeepSeek-R1-Distill-Qwen-7B, and DeepSeek-R1-Distill-Qwen-1.5B,
FLUX.1 Schnell and FLUX.1 Dev
* Whisper Model: Improved performance on CPUs, built-in GPUs,
and discrete GPUs with GenAI API.
* Preview: Introducing NPU support for torch.compile, giving
developers the ability to use the OpenVINO backend to run the
PyTorch API on NPUs. 300+ deep learning models enabled from
the TorchVision, Timm, and TorchBench repositories.
- Broader Large Language Model (LLM) support and more model
compression techniques.
* Preview: Addition of Prompt Lookup to GenAI API improves 2nd
token latency for LLMs by effectively utilizing predefined
prompts that match the intended use case.
* Preview: The GenAI API now offers image-to-image inpainting
functionality. This feature enables models to generate
realistic content by inpainting specified modifications and
seamlessly integrating them with the original image.
* Asymmetric KV Cache compression is now enabled for INT8 on
CPUs, resulting in lower memory consumption and improved 2nd
token latency, especially when dealing with long prompts that
require significant memory. The option should be explicitly
specified by the user.
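The asymmetric INT8 compression item above relies on a scale plus a zero point instead of a scale alone. The sketch below shows the generic min/max affine scheme on made-up values and compares it with a symmetric round trip; it is an assumption-laden illustration, not the OpenVINO CPU plugin's implementation.

```python
from typing import List, Tuple


def quantize_asymmetric(values: List[float]) -> Tuple[List[int], float, int]:
    """Map [min, max] onto unsigned 8-bit codes 0..255 using a zero point.
    The zero point may fall outside 0..255 when the range excludes zero."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0
    zero_point = round(-lo / scale)
    codes = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    return [(c - zero_point) * scale for c in codes]


def roundtrip_symmetric(values: List[float]) -> List[float]:
    """Signed INT8 round trip with a single scale and no zero point."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) * scale for v in values]


def mean_abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A skewed, all-positive range is where the zero point pays off.
values = [0.1, 0.37, 1.2, 2.9, 3.3]
codes, scale, zero_point = quantize_asymmetric(values)
err_asym = mean_abs_error(values, dequantize(codes, scale, zero_point))
err_sym = mean_abs_error(values, roundtrip_symmetric(values))
print(f"asymmetric error: {err_asym:.5f}")
print(f"symmetric error:  {err_sym:.5f}")
```

The asymmetric scheme spends all 256 codes on the observed range, while the symmetric one reserves half of its codes for negative values that never occur in this sample.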
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Support for the latest Intel® Core™ Ultra 200H series
processors (formerly codenamed Arrow Lake-H)
* Integration of the OpenVINO™ backend with the Triton
Inference Server allows developers to utilize the Triton
server for enhanced model serving performance when deploying
on Intel CPUs.
* Preview: A new OpenVINO™ backend integration allows
developers to leverage OpenVINO performance optimizations
directly within Keras 3 workflows for faster AI inference on
CPUs, built-in GPUs, discrete GPUs, and NPUs. This feature is
available with the latest Keras 3.8 release.
* The OpenVINO Model Server now supports native Windows Server
deployments, allowing developers to leverage better
performance by eliminating container overhead and simplifying
GPU deployment.
- Support Change and Deprecation Notices
* Now deprecated:
+ Legacy prefixes l_, w_, and m_ have been removed from
OpenVINO archive names.
+ The runtime namespace for Python API has been marked as
deprecated and designated to be removed for 2026.0. The
new namespace structure has been delivered, and migration
is possible immediately. Details will be communicated
through warnings and via documentation.
+ NNCF create_compressed_model() method is deprecated.
nncf.quantize() method is now recommended for
Quantization-Aware Training of PyTorch and
TensorFlow models.