ollama-intel-gpu/docker-compose.yml
Andriy Oblivantsev c56646e7e7 Switch GPU backend from Vulkan to SYCL for ~2x inference performance on Intel GPUs
Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's
vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.
Patch two ollama-specific API divergences via patch-sycl.py: add a
batch_size parameter to graph_compute, and remove the
GGML_TENSOR_FLAG_COMPUTE skip-check that caused all compute nodes to be
bypassed.
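
As a rough illustration, a multi-stage Dockerfile along these lines would
match the description above. This is a sketch, not the repo's actual file:
the base-image tag, repository URL, cmake flags, and install paths are
assumptions; only the oneAPI version, the pinned commit, and the SYCL
backend come from this commit.

    # Hypothetical sketch of the multi-stage build; tags and paths assumed.
    FROM intel/oneapi-basekit:2025.1.1-devel-ubuntu24.04 AS builder
    RUN git clone https://github.com/ggml-org/llama.cpp /src \
     && git -C /src checkout a5bb8ba4
    COPY patch-sycl.py /src/
    # Align upstream ggml-sycl with ollama's vendored ggml API (see below).
    RUN cd /src && python3 patch-sycl.py
    RUN . /opt/intel/oneapi/setvars.sh \
     && cmake -S /src -B /src/build -DGGML_SYCL=ON \
          -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
     && cmake --build /src/build -j

    # Runtime stage: ship the SYCL backend alongside ollama (path assumed).
    FROM ollama/ollama:latest
    COPY --from=builder /src/build/ggml/src/libggml-sycl.so /usr/lib/ollama/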

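And a hypothetical sketch of the two rewrites patch-sycl.py performs,
assuming they can be done as textual substitutions over the SYCL backend
source. The source path, the exact signature, and the shape of the
skip-check are assumptions; only the two divergences themselves are taken
from this commit.

    # Hypothetical sketch; the real patch-sycl.py may differ.
    import re
    from pathlib import Path

    # Assumed location of the upstream SYCL backend source.
    SRC = Path("llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp")

    def patch(text: str) -> str:
        # 1) ollama's vendored ggml passes an extra batch_size argument to
        #    graph_compute, so widen the upstream signature to accept it.
        text = text.replace(
            "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph)",
            "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph,"
            " int batch_size)",
        )
        # 2) Drop the GGML_TENSOR_FLAG_COMPUTE skip-check: under ollama the
        #    flag is never set, so the check bypassed every compute node.
        text = re.sub(
            r"if\s*\(!\(node->flags & GGML_TENSOR_FLAG_COMPUTE\)\)\s*"
            r"continue;\s*",
            "",
            text,
        )
        return text

    if __name__ == "__main__":
        SRC.write_text(patch(SRC.read_text()))
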
Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 17:28:23 +00:00
