ollama-intel-gpu/patch-sycl.py at c56646e7e7192372c9e5b923c8d38c802a1bdd2f

Andriy Oblivantsev c56646e7e7 Switch GPU backend from Vulkan to SYCL for ~2x inference performance on Intel GPUs

Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's
vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.
Patch two ollama-specific API divergences via patch-sycl.py: the script adds
the batch_size parameter to graph_compute and removes the
GGML_TENSOR_FLAG_COMPUTE skip-check that caused all compute nodes to be
bypassed.
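
The two source rewrites described above can be sketched as plain regex substitutions. This is a minimal illustration, not the contents of patch-sycl.py: the sample C++ snippet, the exact function signature, the shape of the skip-check, and the helper name `patch_source` are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the upstream ggml-sycl source; the real file differs.
SAMPLE = """\
static ggml_status ggml_backend_sycl_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {
    for (int i = 0; i < cgraph->n_nodes; i++) {
        ggml_tensor * node = cgraph->nodes[i];
        if ((node->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
            continue;
        }
        compute(node);
    }
    return GGML_STATUS_SUCCESS;
}
"""

# Assumed signature shape: capture everything up to the closing parenthesis.
SIGNATURE = re.compile(
    r"(graph_compute\(ggml_backend_t backend, ggml_cgraph \* cgraph)\)"
)

# Assumed skip-check shape: a three-line `if (...) { continue; }` block.
SKIP_CHECK = re.compile(
    r"[ \t]*if \(\(node->flags & GGML_TENSOR_FLAG_COMPUTE\) == 0\) \{\n"
    r"[ \t]*continue;\n"
    r"[ \t]*\}\n"
)


def patch_source(text: str) -> str:
    """Apply both rewrites; fail loudly if either pattern is not found once."""
    # 1. Append the extra batch_size parameter to the graph_compute signature.
    text, n_sig = SIGNATURE.subn(r"\g<1>, int batch_size)", text)
    # 2. Delete the flag check so compute nodes are no longer skipped.
    text, n_skip = SKIP_CHECK.subn("", text)
    if n_sig != 1 or n_skip != 1:
        raise ValueError(
            f"expected 1 match each, got signature={n_sig}, skip_check={n_skip}"
        )
    return text


patched = patch_source(SAMPLE)
print(patched)
```

Raising on a missing match is a deliberate choice in this sketch: if upstream changes the code so a pattern no longer applies, the build stops instead of silently producing an unpatched backend.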

Tested with gemma3:1b: 27/27 layers on GPU, 10.2 tok/s generation, 65.3 tok/s prompt eval.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-02-12 17:28:23 +00:00

2.2 KiB
