Andriy Oblivantsev 971852d3af Rework README for better GitHub presentation
Rewrite README with clear value proposition, architecture diagram,
troubleshooting section, and streamlined structure. Update CHANGELOG
to reflect full history of Vulkan-to-SYCL migration.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 17:31:33 +00:00

# Changelog
## 2026-02-12 — Switch to SYCL backend
### GPU backend: Vulkan -> SYCL
- Replaced Vulkan GPU backend with custom-built SYCL backend for ~2x inference
  speed on Intel GPUs
- Multi-stage Dockerfile: builds `libggml-sycl.so` from upstream llama.cpp
  (commit `a5bb8ba4`) using Intel oneAPI 2025.1.1
- Added `patch-sycl.py` to fix two ollama-specific API divergences:
  - `graph_compute` signature (`int batch_size` parameter)
  - `GGML_TENSOR_FLAG_COMPUTE` removal (critical: without this patch all
    compute nodes are skipped, producing garbage output)
- Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero)
  into the runtime image
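A patcher in the spirit of `patch-sycl.py` can be sketched as a small script that rewrites the two divergent spots before the build. The patterns and replacement strings below are illustrative assumptions, not the repository's actual patch contents:

```python
import re

# Illustrative sketch of a patch-sycl.py-style rewriter. The patterns and
# replacements are assumptions for demonstration; the real script targets
# the actual ggml-sycl sources.

PATCHES = [
    # 1) Align graph_compute with ollama's extra `int batch_size` parameter.
    (r"graph_compute\(ggml_backend_t backend, ggml_cgraph \* cgraph\)",
     "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph, int batch_size)"),
    # 2) Drop the removed GGML_TENSOR_FLAG_COMPUTE check so compute nodes
    #    are no longer skipped (skipping them produces garbage output).
    (r"if \(!\(node->flags & GGML_TENSOR_FLAG_COMPUTE\)\) continue;\n", ""),
]

def apply_patches(source: str) -> str:
    """Apply each (pattern, replacement) pair, failing loudly on a no-op."""
    for pattern, replacement in PATCHES:
        patched, count = re.subn(pattern, replacement, source)
        if count == 0:
            raise RuntimeError(f"patch did not apply: {pattern}")
        source = patched
    return source

if __name__ == "__main__":
    sample = (
        "static void graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {\n"
        "    if (!(node->flags & GGML_TENSOR_FLAG_COMPUTE)) continue;\n"
        "}\n"
    )
    print(apply_patches(sample))
```

Failing loudly when a pattern no longer matches is the useful property here: a silent no-op after an upstream llama.cpp bump would reproduce the garbage-output bug instead of breaking the build.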
### Ollama upgrade: 0.9.3 -> 0.15.6
- Upgraded from the IPEX-LLM-bundled ollama 0.9.3 to the official ollama v0.15.6
- Switched from the IPEX-LLM portable zip to the official ollama binary
- Removed CUDA/MLX/Vulkan runners from the image to reduce size
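The runtime stage implied by the entries above could be sketched as follows; the base image, stage name, and paths are illustrative assumptions, not the repository's actual Dockerfile:

```dockerfile
# Sketch of the runtime stage only (builder stage omitted); all paths and
# tags are placeholder assumptions.
FROM ubuntu:24.04 AS runtime

# Official ollama binary instead of the IPEX-LLM portable zip; CUDA/MLX/
# Vulkan runners are simply never copied in, which keeps the image small.
COPY --from=builder /out/ollama /usr/local/bin/ollama

# SYCL backend built from upstream llama.cpp, plus the oneAPI runtime
# libraries it links against (SYCL, oneMKL, oneDNN, TBB, Level-Zero).
COPY --from=builder /out/libggml-sycl.so /usr/local/lib/ollama/
COPY --from=builder /opt/oneapi-runtime/ /usr/local/lib/

ENV LD_LIBRARY_PATH=/usr/local/lib
ENTRYPOINT ["/usr/local/bin/ollama", "serve"]
```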
### Intel GPU runtime stack
- **level-zero**: v1.22.4 -> v1.28.0
- **intel-graphics-compiler (IGC)**: v2.11.7 -> v2.28.4
- **compute-runtime**: 25.18.33578.6 -> 26.05.37020.3
- **libigdgmm**: 22.7.0 -> 22.9.0
### Docker Compose
- Device mapping changed to full `/dev/dri` access for SYCL/Level-Zero
- Added `ONEAPI_DEVICE_SELECTOR=level_zero:0` and `ZES_ENABLE_SYSMAN=1`
- Removed `OLLAMA_VULKAN=1`
- Disabled web UI authentication (`WEBUI_AUTH=False`)
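A compose service reflecting these changes might look like the sketch below. The image names and the split into two services are assumptions; the device mapping and environment variables mirror the entries above:

```yaml
services:
  ollama:
    image: ollama-intel-gpu:latest            # placeholder image name
    devices:
      - /dev/dri:/dev/dri                     # full DRM access for SYCL/Level-Zero
    environment:
      - ONEAPI_DEVICE_SELECTOR=level_zero:0   # pin the first Level-Zero GPU
      - ZES_ENABLE_SYSMAN=1                   # enable Level-Zero sysman telemetry
  webui:
    image: ghcr.io/open-webui/open-webui:main # assumed web UI image
    environment:
      - WEBUI_AUTH=False                      # authentication disabled
```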