Commit Graph

25 Commits

Author SHA1 Message Date
c56646e7e7 Switch GPU backend from Vulkan to SYCL for ~2x inference performance on Intel GPUs
Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's
vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.
Patch two ollama-specific API divergences via patch-sycl.py: add a batch_size
parameter to graph_compute, and remove the GGML_TENSOR_FLAG_COMPUTE skip-check
that caused all compute nodes to be bypassed.

Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 17:28:23 +00:00
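The multi-stage build this commit describes might be sketched as below. The base image tag, paths, and the patch-script invocation are illustrative assumptions, not the repository's actual Dockerfile; only the ggml commit pin (a5bb8ba4), oneAPI 2025.1.1, and patch-sycl.py come from the commit message.

```dockerfile
# Build stage: compile ggml-sycl with Intel oneAPI (image tag is an assumption)
FROM intel/oneapi-basekit:2025.1.1-0-devel-ubuntu24.04 AS build
RUN git clone https://github.com/ggml-org/llama.cpp /src && \
    cd /src && git checkout a5bb8ba4  # pin to the ggml commit matching ollama's vendored copy
WORKDIR /src
# Apply the two ollama-specific API patches described in the commit message
COPY patch-sycl.py /src/
RUN python3 patch-sycl.py
# GGML_SYCL=ON selects llama.cpp's SYCL backend; icx/icpx are oneAPI's compilers
RUN . /opt/intel/oneapi/setvars.sh && \
    cmake -B build -DGGML_SYCL=ON \
          -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx && \
    cmake --build build -j

# Runtime stage: ship only the built backend libraries alongside ollama
FROM ubuntu:24.04
COPY --from=build /src/build/bin/ /usr/local/lib/ollama/
```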
63c3b81292 Upgrade ollama from 0.9.3 (IPEX-LLM) to 0.15.6 (official) with Vulkan Intel GPU
Replace the IPEX-LLM portable zip (bundling a patched ollama 0.9.3 with SYCL)
with the official ollama 0.15.6 release using the Vulkan backend for Intel GPU
acceleration. The official ollama project does not ship a SYCL backend; Vulkan
is their supported path for Intel GPUs.

- Use official ollama binary with Vulkan runner (OLLAMA_VULKAN=1)
- Strip CUDA/MLX runners from image to save space
- Add mesa-vulkan-drivers for Intel ANV Vulkan ICD
- Remove all IPEX-LLM env vars and wrapper scripts
- Simplify entrypoint to /usr/bin/ollama serve directly
- Clean up docker-compose.yml: remove IPEX build args and env vars

Tested: Intel Arc Graphics (MTL) detected, 17/17 layers offloaded to Vulkan0
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 15:34:03 +00:00
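The compose wiring implied by the bullets above might look like the following sketch. The service name, image tag, and host port are assumptions; `OLLAMA_VULKAN=1`, the direct `ollama serve` entrypoint, and the Intel GPU device mapping are taken from the commit message.

```yaml
services:
  ollama:
    image: ollama-intel-gpu:vulkan    # hypothetical image with mesa-vulkan-drivers installed
    entrypoint: ["/usr/bin/ollama", "serve"]
    environment:
      - OLLAMA_VULKAN=1               # enable ollama's Vulkan runner
    devices:
      - /dev/dri:/dev/dri             # expose Intel GPU render nodes to the container
    ports:
      - "11434:11434"                 # ollama's default API port
    restart: unless-stopped
```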
96913a2a18 Update Intel GPU stack and ipex-llm to latest available versions
- level-zero v1.22.4 -> v1.28.0
- IGC v2.11.7 -> v2.28.4
- compute-runtime 25.18.33578.6 -> 26.05.37020.3
- libigdgmm 22.7.0 -> 22.9.0
- ipex-llm ollama nightly 2.3.0b20250612 -> 2.3.0b20250725
- Docker compose: disable webui auth, stateless webui volume
- README formatting and GPU model update

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 15:00:53 +00:00
1239010eec Clean up Dockerfile by adding autoremove and autoclean commands to reduce image size 2025-06-21 00:37:38 +01:00
0a7f974c04 Update Docker configurations and Intel GPU runtimes for improved performance 2025-06-21 00:35:20 +01:00
17592946fa Update Docker configurations for deployment improvements
Revised the `IPEXLLM_RELEASE_REPO` value and adjusted file and path references for consistency. Updated `docker-compose.yml` with refined environment variables, device mappings, restart policies, and the required port bindings for better functionality and maintainability.
2025-04-22 17:56:04 +01:00
Matt Curfman
8172339ca5 Merge pull request #54 from blebo/update-ipex-v2.2.0
Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs)
2025-04-19 17:00:06 -07:00
Charles Ng
f1bbedb599 Update Intel libraries 2025-04-18 01:51:10 +00:00
Adam Gibson
504a1d388f Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs) 2025-04-16 21:25:38 +08:00
Adam Gibson
1e92fbe888 Updates to allow latest ollama in compose file, with fallback to cached in Dockerfile (if no build args provided) 2025-03-16 16:47:45 +08:00
Adam Gibson
e1da4a4d16 Allow for user choice of ollama portable zip at build time 2025-03-16 15:37:06 +08:00
Matt Curfman
85e28fca19 Cache link 2025-02-22 20:27:34 -08:00
Matt Curfman
dd84c202a7 Minor fixes 2025-02-19 15:00:46 -08:00
Matt Curfman
2fc526511f Update to use new ipex portable .zip packages 2025-02-19 14:56:56 -08:00
Matt Curfman
6df0d8d3cc Update ipex-llm image from Intel to 2.2.0-SNAPSHOT 2025-01-16 22:00:31 -08:00
Matt Curfman
c74f6f2216 Update Dockerfile to use Intel public ipex container 2024-11-14 13:45:57 -08:00
Matt Curfman
db0efa7eaf Fix the ambiguous intel-basekit package with specific version of oneAPI linked to by ipex-llm 2024-10-25 21:21:31 -07:00
6a39438666 Fix garbled-output generation bug by updating libraries
* Level Zero 1.18.3
* IGC 1.0.17791
* compute-runtime 24.39.31294.12

https://github.com/mattcurf/ollama-intel-gpu
2024-10-24 15:59:10 +01:00
Matt Curfman
40313a7364 Revert to ipex-llm version of ollama for gpu acceleration 2024-08-16 22:41:15 -07:00
Matt Curfman
dadaf4000c Update compute-runtime, oneAPI, and now use upstream ollama 2024-07-31 21:45:50 -07:00
Matt Curfman
164881bc71 Cleanups 2024-05-23 20:59:10 -07:00
Matt Curfman
e2f4a81fef Install GPU runtime from git repo 2024-05-10 19:29:28 -07:00
mattcurf
e3c4f50b1f Workaround for Kernel 6.8 regression 2024-05-05 12:48:38 -07:00
mattcurf
025b1b0fc9 Misc. cleanups 2024-04-30 14:38:23 -07:00
mattcurf
2daa02e8f4 Initial version 2024-04-29 17:19:07 -07:00