ollama-intel-gpu/CHANGELOG.md

# Changelog

## 2026-02-12 — Switch to SYCL backend

### GPU backend: Vulkan -> SYCL

- Replaced Vulkan GPU backend with custom-built SYCL backend for ~2x inference
  speed on Intel GPUs
- Multi-stage Dockerfile: builds `libggml-sycl.so` from upstream llama.cpp
  (commit `a5bb8ba4`) using Intel oneAPI 2025.1.1
- Added `patch-sycl.py` to fix two ollama-specific API divergences:
  - `graph_compute` signature (`int batch_size` parameter)
  - `GGML_TENSOR_FLAG_COMPUTE` removal (critical — without this patch all
    compute nodes are skipped, producing garbage output)
- Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero)
  into the runtime image

### Ollama upgrade: 0.9.3 -> 0.15.6

- Upgraded from IPEX-LLM bundled ollama 0.9.3 to official ollama v0.15.6
- Switched from IPEX-LLM portable zip to official ollama binary
- Removed CUDA/MLX/Vulkan runners from image to reduce size

### Intel GPU runtime stack

- **level-zero**: v1.22.4 -> v1.28.0
- **intel-graphics-compiler (IGC)**: v2.11.7 -> v2.28.4
- **compute-runtime**: 25.18.33578.6 -> 26.05.37020.3
- **libigdgmm**: 22.7.0 -> 22.9.0

### Docker Compose

- Device mapping changed to full `/dev/dri` access for SYCL/Level-Zero
- Added `ONEAPI_DEVICE_SELECTOR=level_zero:0` and `ZES_ENABLE_SYSMAN=1`
- Removed `OLLAMA_VULKAN=1`
- Disabled web UI authentication (`WEBUI_AUTH=False`)