* Thu Dec 12 2024 Bernhard Wiedemann <bwiedemann@suse.com>
- Add reproducible.patch for deterministic .gz creation (boo#1047218)
* Sat Dec 07 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.5.1:
* Fixed issue where Ollama's API would generate JSON output when
specifying "format": null
* Fixed issue where passing --format json to ollama run would
cause an error
- Update to version 0.5.0:
* New models:
~ Llama 3.3: a new state of the art 70B model.
~ Snowflake Arctic Embed 2: Snowflake's frontier embedding
model.
* Ollama now supports structured outputs, making it possible to
constrain a model's output to a specific format defined by a
JSON schema. The Ollama Python and JavaScript libraries have
been updated to support structured outputs, together with
Ollama's OpenAI-compatible API endpoints.
* Fixed error importing model vocabulary files
* Experimental: new flag to set KV cache quantization to 4-bit
(q4_0), 8-bit (q8_0) or 16-bit (f16). This reduces VRAM
requirements for longer context windows.
- Update to version 0.4.7:
* Enable index tracking for tools - openai api support (#7888)
* llama: fix typo and formatting in readme (#7876)
* readme: add SpaceLlama, YouLama, and DualMind to community
integrations (#7216)
* Sat Nov 30 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.4.6:
* New model: QwQ: an experimental research model by the Qwen
team, focused on advancing AI reasoning capabilities.
* Tool calls will now be included in streaming responses
* Ollama will now provide an error when submitting SVG images
* Image tokens will no longer be counted in token counts when
running a text-only model
- Update to version 0.4.5:
* The Ollama Python Library has been updated
* Fixed issue where HTTPS_PROXY and HTTP_PROXY environment
variables would have no effect
* Ollama will now accept X-Stainless-Retry-Count used by many
OpenAI API clients
* Fix issue where importing certain GGUF files would result in
the incorrect quantization level
* ollama push will now print the uploaded model URL on
ollama.com
- Update to version 0.4.4:
* Marco-o1: An open large reasoning model for real-world
solutions by the Alibaba International Digital Commerce Group
(AIDC-AI).
* Fixed issue where Ollama would freeze when processing requests
in parallel (e.g. when using code completion tools)
* Redirecting output to a file no longer outputs progress bars
or spinners
- Update to version 0.4.3:
* New model: Tülu 3 is a leading instruction following model
family, offering fully open-source data, code, and recipes by
the The Allen Institute for AI.
* New model: Mistral Large: a new version of Mistral Large with
improved Long Context, Function Calling and System Prompt
support.
* Improved performance issues that occurred in Ollama versions
0.4.0-0.4.2
* Fixed issue that would cause granite3-dense to generate empty
responses
* Fixed crashes and hanging caused by KV cache management
* Sat Nov 16 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.4.2:
* runner.go: Propagate panics back to the user.
* runner.go: Increase survivability of main processing loop
* build: fix arm container image (#7674)
* add line numbers for parser errors (#7326)
* chore(deps): bump golang.org/x dependencies (#7655)
* runner.go: Don't trim whitespace from inputs
* runner.go: Enforce NUM_PARALLEL directly in the runner
* cmd: preserve exact bytes when displaying template/system layers (#7586)
* fix(mllama): sync backend between batches
* runner.go: Fix off-by-one for num predicted
* CI: give windows lint more time (#7635)
* Jetpack support for Go server (#7217)
* doc: capture numeric group requirement (#6941)
* docs: Capture docker cgroup workaround (#7519)
* runner.go: Make KV entry accounting more robust
* readme: add aichat terminal app to community integrations (#7418)
* api: fix typos in Go Doc comments (#7620)
* readme: add GoLamify to community integrations (#7521)
* readme: add browser extension that enables using Ollama for interacting with web pages (#5827)
* docs: add mentions of Llama 3.2 (#7517)
* api: fix typo in python ClientFromEnvironment docs (#7604)
* readme: add llama3.2-vision to model list (#7580)
* Mon Nov 11 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Add patch 01-build-verbose.patch to add the -v option
to go build
- Update to version 0.4.1:
* runner.go: Check for zero length images
* docs: update langchainpy.md with proper model name (#7527)
* Set macos min version for all architectures (#7579)
* win: remove preview title from installer (#7529)
* Workaround buggy P2P ROCm copy on windows (#7466)
* Debug logging for nvcuda init (#7532)
* Align rocm compiler flags (#7467)
* Be explicit for gpu library link dir (#7560)
* docs: OLLAMA_NEW_RUNNERS no longer exists
* runner.go: Remove unused arguments
* sched: Lift parallel restriction for multimodal models except mllama
* Thu Nov 07 2024 adrian@suse.de
- Update to version 0.4.0:
* Update README.md (#7516)
* One corrupt manifest should not wedge model operations (#7515)
* prompt: Use a single token when estimating mllama context size
* readme: add Hexabot to the list of community integrations
* Quiet down debug log of image payload (#7454)
* Wed Nov 06 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.4.0-rc8:
* CI: Switch to v13 macos runner (#7498)
* CI: matrix strategy fix (#7496)
* Sign windows arm64 official binaries (#7493)
* readme: add TextCraft to community integrations (#7377)
* nvidia libs have inconsistent ordering (#7473)
* CI: omit unused tools for faster release builds (#7432)
* llama: Improve error handling
* runner.go: Only allocate 1 element embedding batches for mllama
* refactor kv estimation
* mllama cross attention
* Add basic mllama integration tests (#7455)
* runner.go: Don't set cross attention before sending embeddings
* Give unicode test more time to run (#7437)
* Fri Nov 01 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Remove enable-lto.patch
- Update to version 0.4.0-rc6:
* Refine default thread selection for NUMA systems (#7322)
* runner.go: Better abstract vision model integration
* Soften windows clang requirement (#7428)
* Remove submodule and shift to Go server - 0.4.0 (#7157)
* Move windows app out of preview (#7347)
* windows: Support alt install paths, fit and finish (#6967)
* add more tests for getting the optimal tiled canvas (#7411)
* Switch windows to clang (#7407)
* tests: Add test for Unicode processing
* runner.go: Better handle return NULL values from llama.cpp
* add mllama image processing to the generate handler (#7384)
* Bump to latest Go 1.22 patch (#7379)
* Fix deepseek deseret regex (#7369)
* Better support for AMD multi-GPU on linux (#7212)
* Fix unicode output on windows with redirect to file (#7358)
* Fix incremental build file deps (#7361)
* Improve dependency gathering logic (#7345)
* fix #7247 - invalid image input (#7249)
* integration: harden embedding test (#7306)
* default to "FROM ." if a Modelfile isn't present (#7250)
* Fix rocm windows build and clean up dependency gathering (#7305)
* runner.go: Merge partial unicode characters before sending
* readme: add Ollama for Swift to the community integrations (#7295)
* server: allow vscode-webview origin (#7273)
* image processing for llama3.2 (#6963)
* llama: Decouple patching script from submodule (#7139)
* llama: add compiler tags for cpu features (#7137)
* Wed Oct 30 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to version 0.3.14:
* New Models
+ Granite 3 MoE: The IBM Granite 1B and 3B models are the
first mixture of experts (MoE) Granite models from IBM
designed for low latency usage.
+ Granite 3 Dense: The IBM Granite 2B and 8B models are
designed to support tool-based use cases and support for
retrieval augmented generation (RAG), streamlining code
generation, translation and bug fixing.
* Sat Oct 12 2024 eyadlorenzo@gmail.com
- Update to version 0.3.13:
* New safety models:
~ Llama Guard 3: a series of models by Meta, fine-tuned for
content safety classification of LLM inputs and responses.
~ ShieldGemma: ShieldGemma is set of instruction tuned models
from Google DeepMind for evaluating the safety of text
prompt input and text output responses against a set of
defined safety policies.
* Fixed issue where ollama pull would leave connections when
encountering an error
* ollama rm will now stop a model if it is running prior to
deleting it
* Sat Sep 28 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to version 0.3.12:
* Llama 3.2: Meta's Llama 3.2 goes small with 1B and 3B
models.
* Qwen 2.5 Coder: The latest series of Code-Specific Qwen
models, with significant improvements in code generation,
code reasoning, and code fixing.
* Ollama now supports ARM Windows machines
* Fixed rare issue where Ollama would report a missing .dll
file on Windows
* Fixed performance issue for Windows without GPUs
* Fri Sep 20 2024 adrian@suse.de
- Update to version 0.3.11:
* llm: add solar pro (preview) (#6846)
* server: add tool parsing support for nemotron-mini (#6849)
* make patches git am-able
* CI: dist directories no longer present (#6834)
* CI: clean up naming, fix tagging latest (#6832)
* CI: set platform build build_linux script to keep buildx happy (#6829)
* readme: add Agents-Flex to community integrations (#6788)
* fix typo in import docs (#6828)
* readme: add vim-intelligence-bridge to Terminal section (#6818)
* readme: add Obsidian Quiz Generator plugin to community integrations (#6789)
* Fix incremental builds on linux (#6780)
* Use GOARCH for build dirs (#6779)
* Optimize container images for startup (#6547)
* examples: updated requirements.txt for privategpt example
* examples: polish loganalyzer example (#6744)
* readme: add ollama_moe to community integrations (#6752)
* runner: Flush pending responses before returning
* add "stop" command (#6739)
* refactor show ouput
* readme: add QodeAssist to community integrations (#6754)
* Verify permissions for AMD GPU (#6736)
* add *_proxy for debugging
* docs: update examples to use llama3.1 (#6718)
* Quiet down dockers new lint warnings (#6716)
* catch when model vocab size is set correctly (#6714)
* readme: add crewAI to community integrations (#6699)
* readme: add crewAI with mesop to community integrations
* Tue Sep 17 2024 adrian@suse.de
- Update to version 0.3.10:
* openai: align chat temperature and frequency_penalty options with completion (#6688)
* docs: improve linux install documentation (#6683)
* openai: don't scale temperature or frequency_penalty (#6514)
* readme: add Archyve to community integrations (#6680)
* readme: add Plasmoid Ollama Control to community integrations (#6681)
* Improve logging on GPU too small (#6666)
* openai: fix "presence_penalty" typo and add test (#6665)
* Fix gemma2 2b conversion (#6645)
* Document uninstall on windows (#6663)
* Revert "Detect running in a container (#6495)" (#6662)
* llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
* Introduce GPU Overhead env var (#5922)
* Detect running in a container (#6495)
* readme: add AiLama to the list of community integrations (#4957)
* Update gpu.md: Add RTX 3050 Ti and RTX 3050 Ti (#5888)
* server: fix blob download when receiving a 200 response (#6656)
* readme: add Gentoo package manager entry to community integrations (#5714)
* Update install.sh:Replace "command -v" with encapsulated functionality (#6035)
* readme: include Enchanted for Apple Vision Pro (#4949)
* readme: add lsp-ai to community integrations (#5063)
* readme: add ollama-php library to community integrations (#6361)
* readme: add vnc-lm discord bot community integration (#6644)
* llm: use json.hpp from common (#6642)
* readme: add confichat to community integrations (#6378)
* docs: add group to manual Linux isntructions and verify service is running (#6430)
* readme: add gollm to the list of community libraries (#6099)
* readme: add Cherry Studio to community integrations (#6633)
* readme: add Go fun package (#6421)
* docs: fix spelling error (#6391)
* install.sh: update instructions to use WSL2 (#6450)
* readme: add claude-dev to community integrations (#6630)
* readme: add PyOllaMx project (#6624)
* llm: update llama.cpp commit to 8962422 (#6618)
* Use cuda v11 for driver 525 and older (#6620)
* Log system memory at info (#6617)
* readme: add Painting Droid community integration (#5514)
* readme: update Ollama4j link and add link to Ollama4j Web UI (#6608)
* Fix sprintf to snprintf (#5664)
* readme: add PartCAD tool to readme for generating 3D CAD models using Ollama (#6605)
* Reduce docker image size (#5847)
* readme: add OllamaFarm project (#6508)
* readme: add go-crew and Ollamaclient projects (#6583)
* docs: update faq.md for OLLAMA_MODELS env var permissions (#6587)
* fix(cmd): show info may have nil ModelInfo (#6579)
* docs: update GGUF examples and references (#6577)
* Add findutils to base images (#6581)
* remove any unneeded build artifacts
* doc: Add Nix and Flox to package manager listing (#6074)
* update the openai docs to explain how to set the context size (#6548)
* fix(test): do not clobber models directory
* add llama3.1 chat template (#6545)
* update deprecated warnings
* validate model path
* throw an error when encountering unsupport tensor sizes (#6538)
* Move ollama executable out of bin dir (#6535)
* update templates to use messages
* more tokenizer tests
* add safetensors to the modelfile docs (#6532)
* Fix import image width (#6528)
* Update manual instructions with discrete ROCm bundle (#6445)
* llm: fix typo in comment (#6530)
* adjust image sizes
* clean up convert tokenizer
* detect chat template from configs that contain lists
* update the import docs (#6104)
* server: clean up route names for consistency (#6524)
* Only enable numa on CPUs (#6484)
* gpu: Group GPU Library sets by variant (#6483)
* update faq
* passthrough OLLAMA_HOST path to client
* convert safetensor adapters into GGUF (#6327)
* gpu: Ensure driver version set before variant (#6480)
* llm: Align cmake define for cuda no peer copy (#6455)
* Fix embeddings memory corruption (#6467)
* llama3.1
* convert gemma2
* create bert models from cli
* bert
* Split rocm back out of bundle (#6432)
* CI: remove directories from dist dir before upload step (#6429)
* CI: handle directories during checksum (#6427)
* Fix overlapping artifact name on CI
* Review comments
* Adjust layout to bin+lib/ollama
* Remove Jetpack
* Add windows cuda v12 + v11 support
* Enable cuda v12 flags
* Add cuda v12 variant and selection logic
* Report GPU variant in log
* Add Jetson cuda variants for arm
* Wire up ccache and pigz in the docker based build
* Refactor linux packaging
* server: limit upload parts to 16 (#6411)
* Fix white space.
* Reset NumCtx.
* Override numParallel only if unset.
* fix: chmod new layer to 0o644 when creating it
* fix: Add tooltip to system tray icon
* only skip invalid json manifests
* skip invalid manifest files
* fix noprune
* add `CONTRIBUTING.md` (#6349)
* Fix typo and improve readability (#5964)
* server: reduce max connections used in download (#6347)
* update chatml template format to latest in docs (#6344)
* lint
* Update openai.md to remove extra checkbox (#6345)
* llama3.1 memory
* Thu Aug 15 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.3.6:
* Fixed issue where /api/embed would return an error instead of
loading the model when the input field was not provided.
* ollama create can now import Phi-3 models from Safetensors
* Added progress information to ollama create when importing GGUF
files
* Ollama will now import GGUF files faster by minimizing file
copies
- Update to version 0.3.6:
* Fixed issue where temporary files would not be cleaned up
* Fix rare error when Ollama would start up due to invalid model
data
* Sun Aug 11 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
- Update to version 0.3.4:
* New embedding models
- BGE-M3: a large embedding model from BAAI distinguished for
its versatility in Multi-Functionality, Multi-Linguality, and
Multi-Granularity.
- BGE-Large: a large embedding model trained in english.
- Paraphrase-Multilingual: A multilingual embedding model
trained on parallel data for 50+ languages.
* New embedding API with batch support
- Ollama now supports a new API endpoint /api/embed for
embedding generation:
* This API endpoint supports new features:
- Batches: generate embeddings for several documents in
one request
- Normalized embeddings: embeddings are now normalized,
improving similarity results
- Truncation: a new truncate parameter that will error if
set to false
- Metrics: responses include load_duration, total_duration and
prompt_eval_count metrics
* Sat Aug 03 2024 eyadlorenzo@gmail.com
- Update to version 0.3.3:
* The /api/embed endpoint now returns statistics: total_duration,
load_duration, and prompt_eval_count
* Added usage metrics to the /v1/embeddings OpenAI compatibility
API
* Fixed issue where /api/generate would respond with an empty
string if provided a context
* Fixed issue where /api/generate would return an incorrect
value for context
* /show modefile will now render MESSAGE commands correctly
- Update to version 0.3.2:
* Fixed issue where ollama pull would not resume download
progress
* Fixed issue where phi3 would report an error on older versions
* Tue Jul 30 2024 Adrian Schröter <adrian@suse.de>
- Update to version 0.3.1:
* Added support for min_p sampling option
* Lowered number of requests required when downloading models
with ollama pull
* ollama create will now autodetect required stop parameters
when importing certain models
* Fixed issue where /save would cause parameters to be saved
incorrectly.
* OpenAI-compatible API will now return a finish_reason of
tool_calls if a tool call occured.
* Mon Jul 29 2024 Adrian Schröter <adrian@suse.de>
- fix build on leap 15.6
- exclude builds on 32bit due to build failures
* Sun Jul 28 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.3.0:
* Ollama now supports tool calling with popular models such
as Llama 3.1. This enables a model to answer a given prompt
using tool(s) it knows about, making it possible for models to
perform more complex tasks or interact with the outside world.
* New models:
~ Llama 3.1
~ Mistral Large 2
~ Firefunction v2
~ Llama-3-Groq-Tool-Use
* Fixed duplicate error message when running ollama create
* Wed Jul 24 2024 adrian@suse.de
- Update to version 0.2.8:
* api embed docs (#5282)
* convert: capture `head_dim` for mistral (#5818)
* Update llama.cpp submodule commit to `d94c6e0c` (#5805)
* server: collect nested tool call objects when parsing (#5824)
* Remove no longer supported max vram var
* Refine error reporting for subprocess crash
* Remove out of space test temporarily (#5825)
* llm: consider `head_dim` in llama arch (#5817)
* Adjust windows ROCm discovery
* add patch for tekken (#5807)
* preserve last assistant message (#5802)
* Fix generate test flakyness (#5804)
* server: validate template (#5734)
* OpenAI: Function Based Testing (#5752)
* adjust openai chat msg processing (#5729)
* fix parsing tool calls
* server: check for empty tools array too (#5779)
* always provide content even if empty (#5778)
* server: only parse tool calls if tools are provided (#5771)
* Fix context exhaustion integration test for small gpus
* Refine scheduler unit tests for reliability