ollama-intel-gpu

eSlider/ollama-intel-gpu

Fork 0

Commit Graph

Author	SHA1	Message	Date
Andriy Oblivantsev	c56646e7e7	Switch GPU backend from Vulkan to SYCL for ~2x inference performance on Intel GPUs Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build. Patch two ollama-specific API divergences via patch-sycl.py: added batch_size parameter to graph_compute, removed GGML_TENSOR_FLAG_COMPUTE skip-check that caused all compute nodes to be bypassed. Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-12 17:28:23 +00:00

Author

SHA1

Message

Date

Andriy Oblivantsev

c56646e7e7

Switch GPU backend from Vulkan to SYCL for ~2x inference performance on Intel GPUs

Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's
vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.
Patch two ollama-specific API divergences via patch-sycl.py: added batch_size
parameter to graph_compute, removed GGML_TENSOR_FLAG_COMPUTE skip-check that
caused all compute nodes to be bypassed.

Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval.
Co-authored-by: Cursor <cursoragent@cursor.com>

2026-02-12 17:28:23 +00:00

1 Commits