Commit Graph

25 Commits

Author SHA1 Message Date
c56646e7e7 Switch GPU backend from Vulkan to SYCL for ~2x inference performance on Intel GPUs
Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's
vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.
Patch two ollama-specific API divergences via patch-sycl.py: add a batch_size
parameter to graph_compute, and remove the GGML_TENSOR_FLAG_COMPUTE skip-check
that caused all compute nodes to be bypassed.

Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 17:28:23 +00:00
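The multi-stage build this commit describes might be sketched as below. The base image tag, paths, and the patch-script invocation are illustrative assumptions, not the repository's actual Dockerfile; only the ggml commit pin (a5bb8ba4), oneAPI 2025.1.1, and patch-sycl.py come from the commit message.

```dockerfile
# Build stage: compile ggml-sycl with Intel oneAPI (image tag is an assumption)
FROM intel/oneapi-basekit:2025.1.1-0-devel-ubuntu24.04 AS build
RUN git clone https://github.com/ggml-org/llama.cpp /src && \
    cd /src && git checkout a5bb8ba4  # pin to the ggml commit matching ollama's vendored copy
WORKDIR /src
# Apply the two ollama-specific API patches described in the commit message
COPY patch-sycl.py /src/
RUN python3 patch-sycl.py
# GGML_SYCL=ON selects llama.cpp's SYCL backend; icx/icpx are oneAPI's compilers
RUN . /opt/intel/oneapi/setvars.sh && \
    cmake -B build -DGGML_SYCL=ON \
          -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx && \
    cmake --build build -j

# Runtime stage: ship only the built backend libraries alongside ollama
FROM ubuntu:24.04
COPY --from=build /src/build/bin/ /usr/local/lib/ollama/
```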
63c3b81292 Upgrade ollama from 0.9.3 (IPEX-LLM) to 0.15.6 (official) with Vulkan Intel GPU
Replace the IPEX-LLM portable zip (bundling a patched ollama 0.9.3 with SYCL)
with the official ollama 0.15.6 release using the Vulkan backend for Intel GPU
acceleration. The official ollama project does not ship a SYCL backend; Vulkan
is their supported path for Intel GPUs.

- Use official ollama binary with Vulkan runner (OLLAMA_VULKAN=1)
- Strip CUDA/MLX runners from image to save space
- Add mesa-vulkan-drivers for Intel ANV Vulkan ICD
- Remove all IPEX-LLM env vars and wrapper scripts
- Simplify entrypoint to /usr/bin/ollama serve directly
- Clean up docker-compose.yml: remove IPEX build args and env vars

Tested: Intel Arc Graphics (MTL) detected, 17/17 layers offloaded to Vulkan0
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 15:34:03 +00:00
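The compose wiring implied by the bullets above might look like the following sketch. The service name, image tag, and host port are assumptions; `OLLAMA_VULKAN=1`, the direct `ollama serve` entrypoint, and the Intel GPU device mapping are taken from the commit message.

```yaml
services:
  ollama:
    image: ollama-intel-gpu:vulkan    # hypothetical image with mesa-vulkan-drivers installed
    entrypoint: ["/usr/bin/ollama", "serve"]
    environment:
      - OLLAMA_VULKAN=1               # enable ollama's Vulkan runner
    devices:
      - /dev/dri:/dev/dri             # expose Intel GPU render nodes to the container
    ports:
      - "11434:11434"                 # ollama's default API port
    restart: unless-stopped
```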
96913a2a18 Update Intel GPU stack and ipex-llm to latest available versions
- level-zero v1.22.4 -> v1.28.0
- IGC v2.11.7 -> v2.28.4
- compute-runtime 25.18.33578.6 -> 26.05.37020.3
- libigdgmm 22.7.0 -> 22.9.0
- ipex-llm ollama nightly 2.3.0b20250612 -> 2.3.0b20250725
- Docker compose: disable webui auth, stateless webui volume
- README formatting and GPU model update

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 15:00:53 +00:00
1239010eec Clean up Dockerfile by adding autoremove and autoclean commands to reduce image size 2025-06-21 00:37:38 +01:00
0a7f974c04 Update Docker configurations and Intel GPU runtimes for improved performance 2025-06-21 00:35:20 +01:00
17592946fa Update Docker configurations for deployment improvements
Revised the `IPEXLLM_RELEASE_REPO` value and adjusted file and path references for consistency. Updated `docker-compose.yml` with refined environment variables, device mappings, restart policies, and the required port bindings for better functionality and maintainability.
2025-04-22 17:56:04 +01:00
Matt Curfman
8172339ca5 Merge pull request #54 from blebo/update-ipex-v2.2.0
Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs)
2025-04-19 17:00:06 -07:00
Charles Ng
f1bbedb599 Update Intel libraries 2025-04-18 01:51:10 +00:00
Adam Gibson
504a1d388f Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs) 2025-04-16 21:25:38 +08:00
Adam Gibson
1e92fbe888 Updates to allow latest ollama in compose file, with fallback to cached in Dockerfile (if no build args provided) 2025-03-16 16:47:45 +08:00
Adam Gibson
e1da4a4d16 Allow for user choice of ollama portable zip at build time 2025-03-16 15:37:06 +08:00
Matt Curfman
85e28fca19 Cache link 2025-02-22 20:27:34 -08:00
Matt Curfman
dd84c202a7 Minor fixes 2025-02-19 15:00:46 -08:00
Matt Curfman
2fc526511f Update to use new ipex portable .zip packages 2025-02-19 14:56:56 -08:00
Matt Curfman
6df0d8d3cc Update ipex-llm image from Intel to 2.2.0-SNAPSHOT 2025-01-16 22:00:31 -08:00
Matt Curfman
c74f6f2216 Update Dockerfile to use Intel public ipex container 2024-11-14 13:45:57 -08:00
Matt Curfman
db0efa7eaf Fix the ambiguous intel-basekit package with specific version of oneAPI linked to by ipex-llm 2024-10-25 21:21:31 -07:00
6a39438666 Fix garbled-output generation bug by updating libraries
* Level Zero 1.18.3
* IGC 1.0.17791
* compute-runtime 24.39.31294.12

https://github.com/mattcurf/ollama-intel-gpu
2024-10-24 15:59:10 +01:00
Matt Curfman
40313a7364 Revert to ipex-llm version of ollama for gpu acceleration 2024-08-16 22:41:15 -07:00
Matt Curfman
dadaf4000c Update compute-runtime, oneAPI, and now use upstream ollama 2024-07-31 21:45:50 -07:00
Matt Curfman
164881bc71 Cleanups 2024-05-23 20:59:10 -07:00
Matt Curfman
e2f4a81fef Install GPU runtime from git repo 2024-05-10 19:29:28 -07:00
mattcurf
e3c4f50b1f Workaround for Kernel 6.8 regression 2024-05-05 12:48:38 -07:00
mattcurf
025b1b0fc9 Misc. cleanups 2024-04-30 14:38:23 -07:00
mattcurf
2daa02e8f4 Initial version 2024-04-29 17:19:07 -07:00