Changelog
2026-02-12 — Switch to SYCL backend
GPU backend: Vulkan -> SYCL
- Replaced the Vulkan GPU backend with a custom-built SYCL backend for ~2x inference speed on Intel GPUs
- Multi-stage Dockerfile: builds `libggml-sycl.so` from upstream llama.cpp (commit `a5bb8ba4`) using Intel oneAPI 2025.1.1
- Added `patch-sycl.py` to fix two ollama-specific API divergences:
  - `graph_compute` signature (`int batch_size` parameter)
  - `GGML_TENSOR_FLAG_COMPUTE` removal (critical — without this patch all compute nodes are skipped, producing garbage output)
- Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero) into the runtime image
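The source-patching step above can be sketched as a small script that applies targeted substitutions to the upstream llama.cpp sources. The patterns and replacements below are illustrative assumptions, not the actual contents of `patch-sycl.py`:

```python
from pathlib import Path

# (pattern, replacement) pairs. These strings are hypothetical stand-ins
# for the two real divergences: the graph_compute signature change and
# the removed GGML_TENSOR_FLAG_COMPUTE flag check.
PATCHES = [
    ("graph_compute(ggml_cgraph * graph)",
     "graph_compute(ggml_cgraph * graph, int batch_size)"),
    ("if (node->flags & GGML_TENSOR_FLAG_COMPUTE)",
     "if (true /* flag removed; compute every node */)"),
]

def apply_patches(path: Path) -> int:
    """Apply each substitution in-place; return how many matched."""
    text = path.read_text()
    applied = 0
    for old, new in PATCHES:
        if old in text:
            text = text.replace(old, new)
            applied += 1
    path.write_text(text)
    return applied
```

Patching at build time, instead of forking llama.cpp, keeps the pinned upstream commit authoritative and makes the two divergences explicit and auditable.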
Ollama upgrade: 0.9.3 -> 0.15.6
- Upgraded from the IPEX-LLM-bundled ollama 0.9.3 to the official ollama v0.15.6
- Switched from IPEX-LLM portable zip to official ollama binary
- Removed CUDA/MLX/Vulkan runners from image to reduce size
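The multi-stage build described above might look roughly like the following. The base image tag, stage layout, and file paths are illustrative assumptions, not the project's actual Dockerfile; the `GGML_SYCL`/`icx`/`icpx` CMake options follow llama.cpp's documented SYCL build:

```dockerfile
# Stage 1: build libggml-sycl.so with the oneAPI toolchain (tag illustrative).
FROM intel/oneapi-basekit:2025.1.1-devel-ubuntu24.04 AS build
RUN git clone https://github.com/ggml-org/llama.cpp /src
WORKDIR /src
RUN git checkout a5bb8ba4 \
 && . /opt/intel/oneapi/setvars.sh \
 && cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
 && cmake --build build -j \
 && mkdir /out && cp $(find build -name 'libggml-sycl.so') /out/

# Stage 2: slim runtime image with only the SYCL backend and the oneAPI
# runtime .so files; no build toolchain, no CUDA/MLX/Vulkan runners.
FROM ubuntu:24.04
COPY --from=build /out/libggml-sycl.so /usr/lib/ollama/
COPY --from=build /opt/intel/oneapi/compiler/latest/lib/libsycl.so* /usr/lib/
```

Keeping the oneAPI toolchain confined to the build stage is what lets the runtime image stay small despite the heavyweight compiler dependencies.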
Intel GPU runtime stack
- level-zero: v1.22.4 -> v1.28.0
- intel-graphics-compiler (IGC): v2.11.7 -> v2.28.4
- compute-runtime: 25.18.33578.6 -> 26.05.37020.3
- libigdgmm: 22.7.0 -> 22.9.0
Docker Compose
- Device mapping changed to full `/dev/dri` access for SYCL/Level-Zero
- Added `ONEAPI_DEVICE_SELECTOR=level_zero:0` and `ZES_ENABLE_SYSMAN=1`
- Removed `OLLAMA_VULKAN=1`
- Disabled web UI authentication (`WEBUI_AUTH=False`)
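Taken together, the Compose changes above amount to a fragment along these lines. Service names and image references are illustrative; the device mapping and environment variables are the ones listed above:

```yaml
services:
  ollama:
    devices:
      - /dev/dri:/dev/dri                     # full DRI access for SYCL/Level-Zero
    environment:
      - ONEAPI_DEVICE_SELECTOR=level_zero:0   # pin SYCL to the first Level-Zero GPU
      - ZES_ENABLE_SYSMAN=1                   # enable Sysman for device/memory queries
  webui:
    environment:
      - WEBUI_AUTH=False                      # web UI authentication disabled
```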