Files
ollama-intel-gpu/CHANGELOG.md
Andriy Oblivantsev 971852d3af Rework README for better GitHub presentation
Rewrite README with clear value proposition, architecture diagram,
troubleshooting section, and streamlined structure. Update CHANGELOG
to reflect full history of Vulkan-to-SYCL migration.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 17:31:33 +00:00

1.3 KiB

Changelog

2026-02-12 — Switch to SYCL backend

GPU backend: Vulkan -> SYCL

  • Replaced Vulkan GPU backend with custom-built SYCL backend for ~2x inference speed on Intel GPUs
  • Multi-stage Dockerfile: builds libggml-sycl.so from upstream llama.cpp (commit a5bb8ba4) using Intel oneAPI 2025.1.1
  • Added patch-sycl.py to fix two ollama-specific API divergences:
    • graph_compute signature (int batch_size parameter)
    • GGML_TENSOR_FLAG_COMPUTE removal (critical — without this patch all compute nodes are skipped, producing garbage output)
  • Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero) into the runtime image

Ollama upgrade: 0.9.3 -> 0.15.6

  • Upgraded from IPEX-LLM bundled ollama 0.9.3 to official ollama v0.15.6
  • Switched from IPEX-LLM portable zip to official ollama binary
  • Removed CUDA/MLX/Vulkan runners from image to reduce size

Intel GPU runtime stack

  • level-zero: v1.22.4 -> v1.28.0
  • intel-graphics-compiler (IGC): v2.11.7 -> v2.28.4
  • compute-runtime: 25.18.33578.6 -> 26.05.37020.3
  • libigdgmm: 22.7.0 -> 22.9.0

Docker Compose

  • Device mapping changed to full /dev/dri access for SYCL/Level-Zero
  • Added ONEAPI_DEVICE_SELECTOR=level_zero:0 and ZES_ENABLE_SYSMAN=1
  • Removed OLLAMA_VULKAN=1
  • Disabled web UI authentication (WEBUI_AUTH=False)