Rewrite README with clear value proposition, architecture diagram, troubleshooting section, and streamlined structure. Update CHANGELOG to reflect full history of Vulkan-to-SYCL migration. Co-authored-by: Cursor <cursoragent@cursor.com>
37 lines
1.3 KiB
Markdown
37 lines
1.3 KiB
Markdown
# Changelog
|
|
|
|
## 2026-02-12 — Switch to SYCL backend
|
|
|
|
### GPU backend: Vulkan -> SYCL
|
|
|
|
- Replaced Vulkan GPU backend with custom-built SYCL backend for ~2x inference
|
|
speed on Intel GPUs
|
|
- Multi-stage Dockerfile: builds `libggml-sycl.so` from upstream llama.cpp
|
|
(commit `a5bb8ba4`) using Intel oneAPI 2025.1.1
|
|
- Added `patch-sycl.py` to fix two ollama-specific API divergences:
|
|
- `graph_compute` signature (`int batch_size` parameter)
|
|
- `GGML_TENSOR_FLAG_COMPUTE` removal (critical — without this patch all
|
|
compute nodes are skipped, producing garbage output)
|
|
- Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero)
|
|
into the runtime image
|
|
|
|
### Ollama upgrade: 0.9.3 -> 0.15.6
|
|
|
|
- Upgraded from IPEX-LLM bundled ollama 0.9.3 to official ollama v0.15.6
|
|
- Switched from IPEX-LLM portable zip to official ollama binary
|
|
- Removed CUDA/MLX/Vulkan runners from image to reduce size
|
|
|
|
### Intel GPU runtime stack
|
|
|
|
- **level-zero**: v1.22.4 -> v1.28.0
|
|
- **intel-graphics-compiler (IGC)**: v2.11.7 -> v2.28.4
|
|
- **compute-runtime**: 25.18.33578.6 -> 26.05.37020.3
|
|
- **libigdgmm**: 22.7.0 -> 22.9.0
|
|
|
|
### Docker Compose
|
|
|
|
- Device mapping changed to full `/dev/dri` access for SYCL/Level-Zero
|
|
- Added `ONEAPI_DEVICE_SELECTOR=level_zero:0` and `ZES_ENABLE_SYSMAN=1`
|
|
- Removed `OLLAMA_VULKAN=1`
|
|
- Disabled web UI authentication (`WEBUI_AUTH=False`)
|