Package Release Info

ollama-0.7.0-bp157.2.1

Update Info: openSUSE-2025-181
Available in Package Hub: 15 SP7 Update

platforms

AArch64
ppc64le
s390x
x86-64

subpackages

ollama

Change Logs

* Fri May 23 2025 Wolfgang Engel <wolfgang.engel@suse.com>
- Clean up the part of the spec file where the build for
  SLE-15-SP6 and above is defined, to make the if condition more
  robust
* Wed May 21 2025 Wolfgang Engel <wolfgang.engel@suse.com>
- Allow building for Package Hub for SLE-15-SP7
  (openSUSE:Backports:SLE-15-SP7) with g++-12/gcc-12
  by checking for sle_version >= 150600 in spec file (bsc#1243438)
* Sat May 17 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.7.0:
  * Ollama now supports multimodal models via Ollama’s new engine,
    starting with new vision multimodal models:
    ~ Meta Llama 4
    ~ Google Gemma 3
    ~ Qwen 2.5 VL
  * Ollama now supports providing WebP images as input to
    multimodal models (see the sketch after this entry)
  * Improved performance of importing safetensors models via
    ollama create
  * Various bug fixes and performance enhancements
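  A minimal sketch of passing a WebP image to a vision model
  through the ollama Python package (model name and file path are
  illustrative assumptions):

      import ollama

      # "images" accepts local file paths or raw bytes; as of 0.7.0
      # WebP files are accepted alongside PNG and JPEG.
      response = ollama.chat(
          model="gemma3",                # assumed locally pulled model
          messages=[{
              "role": "user",
              "content": "Describe this image in one sentence.",
              "images": ["photo.webp"],  # hypothetical local file
          }],
      )
      print(response["message"]["content"])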
* Tue May 06 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.6.8:
  * Performance improvements for Qwen 3 MoE models on NVIDIA and
    AMD GPUs
  * Fixed a memory leak that occurred when providing images as
    input
  * ollama show will now correctly label older vision models such
    as llava
  * Reduced out-of-memory errors by improving worst-case memory
    estimations
  * Fixed an issue that resulted in a context canceled error
- Update to version 0.6.7:
  * New model: Qwen 3
  * New model: Phi 4 reasoning and Phi 4 mini reasoning
  * New model: llama 4
  * Increased default context window to 4096 tokens
  * Fixed issue where image paths would not be recognized with ~
    when being provided to ollama run
  * Improved output quality when using JSON mode in certain
    scenarios
  * Fixed issue where model would be stuck in the Stopping...
    state
- Use source url (https://en.opensuse.org/SourceUrls)
* Thu Apr 24 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.6.6:
  * New model: IBM Granite 3.3
  * New model: DeepCoder
  * New, faster model downloading: OLLAMA_EXPERIMENT=client2
    ollama serve will run Ollama using a new downloader with
    improved performance and reliability when running ollama pull
  * Fixed memory leak issues when running Gemma 3, Mistral Small
    3.1 and other models on Ollama
  * Improved performance of ollama create when importing models
    from Safetensors
  * Ollama will now allow tool function parameters with either a
    single type or an array of types (see the sketch after this
    entry)
  * Fixed certain out-of-memory issues caused by not reserving
    enough memory at startup
  * Fixed nondeterministic model unload order
  * Included the items and $defs fields to properly handle array
    types in the API
  * OpenAI-Beta headers are now included in the CORS safelist
  * Fixed issue where model tensor data would be corrupted when
    importing models from Safetensors
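  A sketch of a tool definition exercising both changes: a
  parameter typed with an array of types, and an array parameter
  carrying an items schema (the function and field names are
  hypothetical):

      tools = [{
          "type": "function",
          "function": {
              "name": "lookup_user",
              "description": "Look up a user record",
              "parameters": {
                  "type": "object",
                  "properties": {
                      # a single type or an array of types is accepted
                      "user_id": {"type": ["string", "integer"]},
                      # items is passed through for array-typed fields
                      "tags": {"type": "array",
                               "items": {"type": "string"}},
                  },
                  "required": ["user_id"],
              },
          },
      }]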
* Sat Apr 19 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Add ollama to the video group
- Update to version 0.6.5:
  * Add support for mistral-small
  * Fix issues with spm tokenizer for Gemma 3 models
  * Add checks for values falling out of sliding window cache
  * Improve file descriptor management for tensors and
    Pull operations
  * Add gfx1200 & gfx1201 GPU support on Linux
  * Optimize sliding window attention and KV cache implementations
  * Implement loading tensors in 32KiB chunks for better performance
  * Add autotemplate for gemma3 models
  * Add benchmarking for ollama server performance
  * Fix file handling in /proc/cpuinfo discovery
  * Support heterogeneous KV cache layer sizes in memory estimation
  * Fix debug logging for memory estimates
  * Improve error handling for empty logits and tensor data reading
  * Return model capabilities from the show endpoint
* Tue Mar 25 2025 me@levitati.ng
- Update to version 0.6.2:
  * Multiple images are now supported in Gemma 3
  * Fixed issue where running Gemma 3 would consume a large amount
    of system memory
  * ollama create --quantize now works when converting Gemma 3
    from safetensors
  * Fixed issue where /save would not work if running a model
    with / in the name
  * Add support for AMD Strix Halo GPUs
* Tue Mar 18 2025 Bernhard Wiedemann <bwiedemann@suse.com>
- Only require git-core
* Fri Mar 14 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update BuildRequires to go1.24
- Update to version 0.6.0:
  * New model: Gemma 3
  * Fixed error that would occur when running
    snowflake-arctic-embed and snowflake-arctic-embed2 models
  * Various performance improvements and bug fixes
* Wed Mar 12 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.5.13:
  * New models: Phi-4-Mini, Granite-3.2-Vision, Command R7B Arabic
  * The default context length can now be set with a new
    OLLAMA_CONTEXT_LENGTH environment variable. For example, to set
    the default context length to 8K, use:
    OLLAMA_CONTEXT_LENGTH=8192 ollama serve
  * Fixed issue where bf16 GGUF files could not be imported
  * Ollama is now able to accept requests from Visual Studio
    Code and Cursor, by allowing origins beginning with
    vscode-file://
  * Various performance improvements and bug fixes
* Thu Feb 27 2025 eyadlorenzo@gmail.com
- Update to version 0.5.12:
  * New model: Perplexity R1 1776
  * The OpenAI-compatible API will now return tool_calls if the
    model called a tool
  * Performance on certain Intel Xeon processors should now be
    restored
  * Fixed permission denied issues after installing Ollama on Linux
  * Fixed issue where additional CPU libraries were included in the
    arm64 Linux install
  * The progress bar will no longer flicker when running ollama
    pull
  * Fixed issue where running a model would fail on Linux if Ollama
    was installed in a path with UTF-8 characters
  * X-Stainless-Timeout will now be accepted as a header in the
    OpenAI API endpoints
* Sat Feb 15 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Use Ninja instead of Make and update the build script to
  match the new version
- Update to version 0.5.11:
  * No notable changes for Linux
- Update to version 0.5.10:
  * Fixed issue on multi-GPU Windows and Linux machines where
    memory estimations would be incorrect
- Update to version 0.5.9:
  * New model: DeepScaleR
  * New model: OpenThinker
- Update to version 0.5.8:
  * Ollama will now use AVX-512 instructions where available for
    additional CPU acceleration
  * Fixed indexing error that would occur when downloading a model
    with ollama run or ollama pull
  * Fixed cases where download progress would reverse
* Mon Jan 27 2025 Adrian Schröter <adrian@suse.de>
- Make ollama configurable by the admin via /etc/sysconfig/ollama
  (boo#1236008); a sample file is sketched after this entry
- cleanup reproducible.patch
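  A sketch of what such a sysconfig file could look like, using
  Ollama's documented environment variables (the exact set honored
  by the service file may differ):

      ## Path: Applications/Ollama
      # Address and port the server listens on
      OLLAMA_HOST="127.0.0.1:11434"
      # Where models are stored
      OLLAMA_MODELS="/var/lib/ollama/models"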
* Thu Jan 16 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Removed 01-build-verbose.patch: embedded GOFLAG into .spec file
- Disabled reproducible.patch: should not be needed, as .gz is
  not produced anymore
- Update to version 0.5.7:
  * Fixed issue that occurred when using two FROM commands in a
    Modelfile
  * Support importing Command R and Command R+ architectures
    from safetensors
- Update to version 0.5.6:
  * Fixed errors that would occur when running ollama create on
    Windows and when using absolute paths
- Update to version 0.5.5:
  * New models:
    ~ Phi-4
    ~ Command R7B
    ~ DeepSeek-V3
    ~ OLMo 2
    ~ Dolphin 3
    ~ SmallThinker
    ~ Granite 3.1 Dense
    ~ Granite 3.1 MoE
  * The /api/create API endpoint that powers ollama create has
    been changed to improve conversion time and also accept a JSON
    object.
  * Fixed runtime error that would occur when filling the model's
    context window
  * Fixed crash that would occur when quotes were used in /save
  * Fixed errors that would occur when sending x-stainless headers
    from OpenAI clients
- Update to version 0.5.4:
  * New model: Falcon3
  * Fixed issue where providing null to format would result in
    an error
- Update to version 0.5.3:
  * Fixed runtime errors on older Intel Macs
  * Fixed issue where setting the format field to "" would cause
    an error
- Update to version 0.5.2:
  * New model: EXAONE 3.5
  * Fixed issue where whitespace would get trimmed from prompt
    when images were provided
  * Improved memory estimation when scheduling models
  * OLLAMA_ORIGINS will now check hosts in a case insensitive
    manner
Version: 0.5.1-bp156.2.1
* Thu Dec 12 2024 Bernhard Wiedemann <bwiedemann@suse.com>
- Add reproducible.patch for deterministic .gz creation (boo#1047218)
* Sat Dec 07 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.5.1:
  * Fixed issue where Ollama's API would generate JSON output when
    specifying "format": null
  * Fixed issue where passing --format json to ollama run would
    cause an error
- Update to version 0.5.0:
  * New models:
    ~ Llama 3.3: a new state-of-the-art 70B model.
    ~ Snowflake Arctic Embed 2: Snowflake's frontier embedding
    model.
  * Ollama now supports structured outputs, making it possible to
    constrain a model's output to a specific format defined by a
    JSON schema (a sketch follows this entry). The Ollama Python
    and JavaScript libraries have been updated to support
    structured outputs, together with Ollama's OpenAI-compatible
    API endpoints.
  * Fixed error importing model vocabulary files
  * Experimental: new flag to set KV cache quantization to 4-bit
    (q4_0), 8-bit (q8_0) or 16-bit (f16). This reduces VRAM
    requirements for longer context windows.
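  A minimal sketch of structured outputs through the ollama Python
  package, assuming pydantic is available (the model name and
  schema are illustrative):

      import ollama
      from pydantic import BaseModel

      class Country(BaseModel):
          name: str
          capital: str

      response = ollama.chat(
          model="llama3.1",
          messages=[{"role": "user",
                     "content": "Tell me about Canada."}],
          # any JSON schema works; pydantic just generates one
          format=Country.model_json_schema(),
      )
      country = Country.model_validate_json(
          response["message"]["content"])

  The experimental KV cache quantization is, per upstream
  documentation, set via an environment variable
  (e.g. OLLAMA_KV_CACHE_TYPE=q8_0) before starting ollama serve.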
- Update to version 0.4.7:
  * Enable index tracking for tools - openai api support (#7888)
  * llama: fix typo and formatting in readme (#7876)
  * readme: add SpaceLlama, YouLama, and DualMind to community
    integrations (#7216)
* Sat Nov 30 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.4.6:
  * New model: QwQ: an experimental research model by the Qwen
    team, focused on advancing AI reasoning capabilities.
  * Tool calls will now be included in streaming responses
  * Ollama will now provide an error when submitting SVG images
  * Image tokens will no longer be counted in token counts when
    running a text-only model
- Update to version 0.4.5:
  * The Ollama Python Library has been updated
  * Fixed issue where HTTPS_PROXY and HTTP_PROXY environment
    variables would have no effect
  * Ollama will now accept X-Stainless-Retry-Count used by many
    OpenAI API clients
  * Fix issue where importing certain GGUF files would result in
    the incorrect quantization level
  * ollama push will now print the uploaded model URL on
    ollama.com
- Update to version 0.4.4:
  * Marco-o1: An open large reasoning model for real-world
    solutions by the Alibaba International Digital Commerce Group
    (AIDC-AI).
  * Fixed issue where Ollama would freeze when processing requests
    in parallel (e.g. when using code completion tools)
  * Redirecting output to a file no longer outputs progress bars
    or spinners
- Update to version 0.4.3:
  * New model: Tülu 3 is a leading instruction following model
    family, offering fully open-source data, code, and recipes by
    the Allen Institute for AI.
  * New model: Mistral Large: a new version of Mistral Large with
    improved Long Context, Function Calling and System Prompt
    support.
  * Improved performance issues that occurred in Ollama versions
    0.4.0-0.4.2
  * Fixed issue that would cause granite3-dense to generate empty
    responses
  * Fixed crashes and hanging caused by KV cache management
* Sat Nov 16 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.4.2:
  * runner.go: Propagate panics back to the user.
  * runner.go: Increase survivability of main processing loop
  * build: fix arm container image (#7674)
  * add line numbers for parser errors (#7326)
  * chore(deps): bump golang.org/x dependencies (#7655)
  * runner.go: Don't trim whitespace from inputs
  * runner.go: Enforce NUM_PARALLEL directly in the runner
  * cmd: preserve exact bytes when displaying template/system layers (#7586)
  * fix(mllama): sync backend between batches
  * runner.go: Fix off-by-one for num predicted
  * CI: give windows lint more time (#7635)
  * Jetpack support for Go server (#7217)
  * doc: capture numeric group requirement (#6941)
  * docs: Capture docker cgroup workaround (#7519)
  * runner.go: Make KV entry accounting more robust
  * readme: add aichat terminal app to community integrations (#7418)
  * api: fix typos in Go Doc comments (#7620)
  * readme: add GoLamify to community integrations (#7521)
  * readme: add browser extension that enables using Ollama for interacting with web pages (#5827)
  * docs: add mentions of Llama 3.2 (#7517)
  * api: fix typo in python ClientFromEnvironment docs (#7604)
  * readme: add llama3.2-vision to model list (#7580)
* Mon Nov 11 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Add patch 01-build-verbose.patch to add the -v option
  to go build
- Update to version 0.4.1:
  * runner.go: Check for zero length images
  * docs: update langchainpy.md with proper model name (#7527)
  * Set macos min version for all architectures (#7579)
  * win: remove preview title from installer (#7529)
  * Workaround buggy P2P ROCm copy on windows (#7466)
  * Debug logging for nvcuda init (#7532)
  * Align rocm compiler flags (#7467)
  * Be explicit for gpu library link dir (#7560)
  * docs: OLLAMA_NEW_RUNNERS no longer exists
  * runner.go: Remove unused arguments
  * sched: Lift parallel restriction for multimodal models except mllama
* Thu Nov 07 2024 adrian@suse.de
- Update to version 0.4.0:
  * Update README.md (#7516)
  * One corrupt manifest should not wedge model operations (#7515)
  * prompt: Use a single token when estimating mllama context size
  * readme: add Hexabot to the list of community integrations
  * Quiet down debug log of image payload (#7454)
* Wed Nov 06 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.4.0-rc8:
  * CI: Switch to v13 macos runner (#7498)
  * CI: matrix strategy fix (#7496)
  * Sign windows arm64 official binaries (#7493)
  * readme: add TextCraft to community integrations (#7377)
  * nvidia libs have inconsistent ordering (#7473)
  * CI: omit unused tools for faster release builds (#7432)
  * llama: Improve error handling
  * runner.go: Only allocate 1 element embedding batches for mllama
  * refactor kv estimation
  * mllama cross attention
  * Add basic mllama integration tests (#7455)
  * runner.go: Don't set cross attention before sending embeddings
  * Give unicode test more time to run (#7437)
* Fri Nov 01 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Remove enable-lto.patch
- Update to version 0.4.0-rc6:
  * Refine default thread selection for NUMA systems (#7322)
  * runner.go: Better abstract vision model integration
  * Soften windows clang requirement (#7428)
  * Remove submodule and shift to Go server - 0.4.0  (#7157)
  * Move windows app out of preview (#7347)
  * windows: Support alt install paths, fit and finish (#6967)
  * add more tests for getting the optimal tiled canvas (#7411)
  * Switch windows to clang (#7407)
  * tests: Add test for Unicode processing
  * runner.go: Better handle return NULL values from llama.cpp
  * add mllama image processing to the generate handler (#7384)
  * Bump to latest Go 1.22 patch (#7379)
  * Fix deepseek deseret regex (#7369)
  * Better support for AMD multi-GPU on linux (#7212)
  * Fix unicode output on windows with redirect to file (#7358)
  * Fix incremental build file deps (#7361)
  * Improve dependency gathering logic (#7345)
  * fix #7247 - invalid image input (#7249)
  * integration: harden embedding test (#7306)
  * default to "FROM ." if a Modelfile isn't present (#7250)
  * Fix rocm windows build and clean up dependency gathering (#7305)
  * runner.go: Merge partial unicode characters before sending
  * readme: add Ollama for Swift to the community integrations (#7295)
  * server: allow vscode-webview origin (#7273)
  * image processing for llama3.2 (#6963)
  * llama: Decouple patching script from submodule (#7139)
  * llama: add compiler tags for cpu features (#7137)
* Wed Oct 30 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to version 0.3.14:
  * New Models
    + Granite 3 MoE: The IBM Granite 1B and 3B models are the
    first mixture of experts (MoE) Granite models from IBM
    designed for low latency usage.
    + Granite 3 Dense: The IBM Granite 2B and 8B models are
    designed to support tool-based use cases and support for
    retrieval augmented generation (RAG), streamlining code
    generation, translation and bug fixing.
* Sat Oct 12 2024 eyadlorenzo@gmail.com
- Update to version 0.3.13:
  * New safety models:
    ~ Llama Guard 3: a series of models by Meta, fine-tuned for
    content safety classification of LLM inputs and responses.
    ~ ShieldGemma: a set of instruction-tuned models
    from Google DeepMind for evaluating the safety of text
    prompt input and text output responses against a set of
    defined safety policies.
  * Fixed issue where ollama pull would leave connections open
    when encountering an error
  * ollama rm will now stop a model if it is running prior to
    deleting it
* Sat Sep 28 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to version 0.3.12:
  * Llama 3.2: Meta's Llama 3.2 goes small with 1B and 3B
    models.
  * Qwen 2.5 Coder: The latest series of Code-Specific Qwen
    models, with significant improvements in code generation,
    code reasoning, and code fixing.
  * Ollama now supports ARM Windows machines
  * Fixed rare issue where Ollama would report a missing .dll
    file on Windows
  * Fixed performance issue for Windows without GPUs
* Fri Sep 20 2024 adrian@suse.de
- Update to version 0.3.11:
  * llm: add solar pro (preview) (#6846)
  * server: add tool parsing support for nemotron-mini (#6849)
  * make patches git am-able
  * CI: dist directories no longer present (#6834)
  * CI: clean up naming, fix tagging latest (#6832)
  * CI: set platform build build_linux script to keep buildx happy (#6829)
  * readme: add Agents-Flex to community integrations (#6788)
  * fix typo in import docs (#6828)
  * readme: add vim-intelligence-bridge to Terminal section (#6818)
  * readme: add Obsidian Quiz Generator plugin to community integrations (#6789)
  * Fix incremental builds on linux (#6780)
  * Use GOARCH for build dirs (#6779)
  * Optimize container images for startup (#6547)
  * examples: updated requirements.txt for privategpt example
  * examples: polish loganalyzer example (#6744)
  * readme: add ollama_moe to community integrations (#6752)
  * runner: Flush pending responses before returning
  * add "stop" command (#6739)
  * refactor show output
  * readme: add QodeAssist to community integrations (#6754)
  * Verify permissions for AMD GPU (#6736)
  * add *_proxy for debugging
  * docs: update examples to use llama3.1 (#6718)
  * Quiet down dockers new lint warnings (#6716)
  * catch when model vocab size is set correctly (#6714)
  * readme: add crewAI to community integrations (#6699)
  * readme: add crewAI with mesop to community integrations
* Tue Sep 17 2024 adrian@suse.de
- Update to version 0.3.10:
  * openai: align chat temperature and frequency_penalty options with completion (#6688)
  * docs: improve linux install documentation (#6683)
  * openai: don't scale temperature or frequency_penalty (#6514)
  * readme: add Archyve to community integrations (#6680)
  * readme: add Plasmoid Ollama Control to community integrations (#6681)
  * Improve logging on GPU too small (#6666)
  * openai: fix "presence_penalty" typo and add test (#6665)
  * Fix gemma2 2b conversion (#6645)
  * Document uninstall on windows (#6663)
  * Revert "Detect running in a container (#6495)" (#6662)
  * llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
  * Introduce GPU Overhead env var (#5922)
  * Detect running in a container (#6495)
  * readme: add AiLama to the list of community integrations (#4957)
  * Update gpu.md: Add RTX 3050 Ti and RTX 3050 Ti (#5888)
  * server: fix blob download when receiving a 200 response  (#6656)
  * readme: add Gentoo package manager entry to community integrations (#5714)
  * Update install.sh: Replace "command -v" with encapsulated functionality (#6035)
  * readme: include Enchanted for Apple Vision Pro (#4949)
  * readme: add lsp-ai to community integrations (#5063)
  * readme: add ollama-php library to community integrations (#6361)
  * readme: add vnc-lm discord bot community integration (#6644)
  * llm: use json.hpp from common (#6642)
  * readme: add confichat to community integrations (#6378)
  * docs: add group to manual Linux instructions and verify service is running (#6430)
  * readme: add gollm to the list of community libraries (#6099)
  * readme: add Cherry Studio to community integrations (#6633)
  * readme: add Go fun package (#6421)
  * docs: fix spelling error (#6391)
  * install.sh: update instructions to use WSL2 (#6450)
  * readme: add claude-dev to community integrations (#6630)
  * readme: add PyOllaMx project (#6624)
  * llm: update llama.cpp commit to 8962422 (#6618)
  * Use cuda v11 for driver 525 and older (#6620)
  * Log system memory at info (#6617)
  * readme: add Painting Droid community integration (#5514)
  * readme: update Ollama4j link and add link to Ollama4j Web UI (#6608)
  * Fix sprintf to snprintf (#5664)
  * readme: add PartCAD tool to readme for generating 3D CAD models using Ollama (#6605)
  * Reduce docker image size (#5847)
  * readme: add OllamaFarm project (#6508)
  * readme: add go-crew and Ollamaclient projects (#6583)
  * docs: update faq.md for OLLAMA_MODELS env var permissions (#6587)
  * fix(cmd): show info may have nil ModelInfo (#6579)
  * docs: update GGUF examples and references (#6577)
  * Add findutils to base images (#6581)
  * remove any unneeded build artifacts
  * doc: Add Nix and Flox to package manager listing (#6074)
  * update the openai docs to explain how to set the context size (#6548)
  * fix(test): do not clobber models directory
  * add llama3.1 chat template (#6545)
  * update deprecated warnings
  * validate model path
  * throw an error when encountering unsupported tensor sizes (#6538)
  * Move ollama executable out of bin dir (#6535)
  * update templates to use messages
  * more tokenizer tests
  * add safetensors to the modelfile docs (#6532)
  * Fix import image width (#6528)
  * Update manual instructions with discrete ROCm bundle (#6445)
  * llm: fix typo in comment (#6530)
  * adjust image sizes
  * clean up convert tokenizer
  * detect chat template from configs that contain lists
  * update the import docs (#6104)
  * server: clean up route names for consistency (#6524)
  * Only enable numa on CPUs (#6484)
  * gpu: Group GPU Library sets by variant (#6483)
  * update faq
  * passthrough OLLAMA_HOST path to client
  * convert safetensor adapters into GGUF (#6327)
  * gpu: Ensure driver version set before variant (#6480)
  * llm: Align cmake define for cuda no peer copy (#6455)
  * Fix embeddings memory corruption (#6467)
  * llama3.1
  * convert gemma2
  * create bert models from cli
  * bert
  * Split rocm back out of bundle (#6432)
  * CI: remove directories from dist dir before upload step (#6429)
  * CI: handle directories during checksum (#6427)
  * Fix overlapping artifact name on CI
  * Review comments
  * Adjust layout to bin+lib/ollama
  * Remove Jetpack
  * Add windows cuda v12 + v11 support
  * Enable cuda v12 flags
  * Add cuda v12 variant and selection logic
  * Report GPU variant in log
  * Add Jetson cuda variants for arm
  * Wire up ccache and pigz in the docker based build
  * Refactor linux packaging
  * server: limit upload parts to 16 (#6411)
  * Fix white space.
  * Reset NumCtx.
  * Override numParallel only if unset.
  * fix: chmod new layer to 0o644 when creating it
  * fix: Add tooltip to system tray icon
  * only skip invalid json manifests
  * skip invalid manifest files
  * fix noprune
  * add `CONTRIBUTING.md` (#6349)
  * Fix typo and improve readability (#5964)
  * server: reduce max connections used in download (#6347)
  * update chatml template format to latest in docs (#6344)
  * lint
  * Update openai.md to remove extra checkbox (#6345)
  * llama3.1 memory
* Thu Aug 15 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.3.6:
  * Fixed issue where /api/embed would return an error instead of
    loading the model when the input field was not provided.
  * ollama create can now import Phi-3 models from Safetensors
  * Added progress information to ollama create when importing GGUF
    files
  * Ollama will now import GGUF files faster by minimizing file
    copies
- Update to version 0.3.5:
  * Fixed issue where temporary files would not be cleaned up
  * Fix rare error when Ollama would start up due to invalid model
    data
* Sun Aug 11 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to version 0.3.4:
  * New embedding models
  - BGE-M3: a large embedding model from BAAI distinguished for
    its versatility in Multi-Functionality, Multi-Linguality, and
    Multi-Granularity.
  - BGE-Large: a large embedding model trained in English.
  - Paraphrase-Multilingual: A multilingual embedding model
    trained on parallel data for 50+ languages.
  * New embedding API with batch support
  - Ollama now supports a new API endpoint /api/embed for
    embedding generation (a sketch follows this entry)
  * This API endpoint supports new features:
  - Batches: generate embeddings for several documents in
    one request
  - Normalized embeddings: embeddings are now normalized,
    improving similarity results
  - Truncation: a new truncate parameter; when set to false,
    inputs exceeding the context length return an error instead
    of being truncated
  - Metrics: responses include load_duration, total_duration and
    prompt_eval_count metrics
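  A minimal sketch of calling the batch endpoint with the standard
  requests package (model name and documents are illustrative):

      import requests

      resp = requests.post(
          "http://localhost:11434/api/embed",
          json={
              "model": "bge-m3",
              "input": ["first document", "second document"],  # batch
              "truncate": True,
          },
      )
      data = resp.json()
      print(len(data["embeddings"]))  # one normalized vector per input
      print(data["total_duration"], data["load_duration"],
            data["prompt_eval_count"])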
* Sat Aug 03 2024 eyadlorenzo@gmail.com
- Update to version 0.3.3:
  * The /api/embed endpoint now returns statistics: total_duration,
    load_duration, and prompt_eval_count
  * Added usage metrics to the /v1/embeddings OpenAI compatibility
    API
  * Fixed issue where /api/generate would respond with an empty
    string if provided a context
  * Fixed issue where /api/generate would return an incorrect
    value for context
  * /show modelfile will now render MESSAGE commands correctly
- Update to version 0.3.2:
  * Fixed issue where ollama pull would not resume download
    progress
  * Fixed issue where phi3 would report an error on older versions
* Tue Jul 30 2024 Adrian Schröter <adrian@suse.de>
- Update to version 0.3.1:
  * Added support for min_p sampling option
  * Lowered number of requests required when downloading models
    with ollama pull
  * ollama create will now autodetect required stop parameters
    when importing certain models
  * Fixed issue where /save would cause parameters to be saved
    incorrectly.
  * OpenAI-compatible API will now return a finish_reason of
    tool_calls if a tool call occurred.
* Mon Jul 29 2024 Adrian Schröter <adrian@suse.de>
- Fix build on Leap 15.6
- Exclude 32-bit builds due to build failures
* Sun Jul 28 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.3.0:
  * Ollama now supports tool calling with popular models such
    as Llama 3.1. This enables a model to answer a given prompt
    using tool(s) it knows about, making it possible for models
    to perform more complex tasks or interact with the outside
    world (a sketch follows this entry).
  * New models:
    ~ Llama 3.1
    ~ Mistral Large 2
    ~ Firefunction v2
    ~ Llama-3-Groq-Tool-Use
  * Fixed duplicate error message when running ollama create
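  A minimal sketch of tool calling through the ollama Python
  package, assuming the dict-style responses of the 0.3-era client
  (the tool, model name and city are illustrative assumptions):

      import ollama

      tools = [{
          "type": "function",
          "function": {
              "name": "get_weather",
              "description": "Get the current weather for a city",
              "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"],
              },
          },
      }]

      response = ollama.chat(
          model="llama3.1",
          messages=[{"role": "user",
                     "content": "What is the weather in Nuremberg?"}],
          tools=tools,
      )
      # instead of free text, the model may request tool invocations
      for call in response["message"].get("tool_calls", []):
          print(call["function"]["name"], call["function"]["arguments"])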
* Wed Jul 24 2024 adrian@suse.de
- Update to version 0.2.8:
  * api embed docs (#5282)
  * convert: capture `head_dim` for mistral (#5818)
  * Update llama.cpp submodule commit to `d94c6e0c` (#5805)
  * server: collect nested tool call objects when parsing (#5824)
  * Remove no longer supported max vram var
  * Refine error reporting for subprocess crash
  * Remove out of space test temporarily (#5825)
  * llm: consider `head_dim` in llama arch (#5817)
  * Adjust windows ROCm discovery
  * add patch for tekken (#5807)
  * preserve last assistant message (#5802)
  * Fix generate test flakyness (#5804)
  * server: validate template (#5734)
  * OpenAI: Function Based Testing (#5752)
  * adjust openai chat msg processing (#5729)
  * fix parsing tool calls
  * server: check for empty tools array too (#5779)
  * always provide content even if empty (#5778)
  * server: only parse tool calls if tools are provided (#5771)
  * Fix context exhaustion integration test for small gpus
  * Refine scheduler unit tests for reliability