Rework README for better GitHub presentation

Rewrite README with clear value proposition, architecture diagram,
troubleshooting section, and streamlined structure. Update CHANGELOG
to reflect full history of Vulkan-to-SYCL migration.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
2026-02-12 17:31:33 +00:00
parent c56646e7e7
commit 971852d3af
2 changed files with 130 additions and 109 deletions
+23 -27
View File
@@ -1,40 +1,36 @@
# Changelog
## 2026-02-12
## 2026-02-12 — Switch to SYCL backend
### Fix: Ollama not reachable from host via Docker port mapping
### GPU backend: Vulkan -> SYCL
The bundled IPEX-LLM `/start-ollama.sh` entrypoint hardcodes
`OLLAMA_HOST='127.0.0.1:11434'` and `OLLAMA_KEEP_ALIVE=10m`, overriding any
values set through Docker Compose environment variables.
- Replaced Vulkan GPU backend with custom-built SYCL backend for ~2x inference
speed on Intel GPUs
- Multi-stage Dockerfile: builds `libggml-sycl.so` from upstream llama.cpp
(commit `a5bb8ba4`) using Intel oneAPI 2025.1.1
- Added `patch-sycl.py` to fix two ollama-specific API divergences:
- `graph_compute` signature (`int batch_size` parameter)
- `GGML_TENSOR_FLAG_COMPUTE` removal (critical — without this patch all
compute nodes are skipped, producing garbage output)
- Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero)
into the runtime image
- Added a custom `start-ollama.sh` that respects environment variables
(`${OLLAMA_HOST:-0.0.0.0:11434}`, `${OLLAMA_KEEP_ALIVE:-24h}`) instead of
hardcoding them
- Mounted the script into the container as a read-only volume
(`./start-ollama.sh:/start-ollama.sh:ro`)
- Fixed `LD_LIBRARY_PATH` env var syntax in docker-compose.yml (`:` -> `=`)
### Ollama upgrade: 0.9.3 -> 0.15.6
### Updated Intel GPU runtime stack to latest releases
- Upgraded from IPEX-LLM bundled ollama 0.9.3 to official ollama v0.15.6
- Switched from IPEX-LLM portable zip to official ollama binary
- Removed CUDA/MLX/Vulkan runners from image to reduce size
### Intel GPU runtime stack
- **level-zero**: v1.22.4 -> v1.28.0
- Loader based on oneAPI Level Zero Specification v1.15.31
- Memory leak fixes, expanded multidriver teardown support
- **intel-graphics-compiler (IGC)**: v2.11.7 (build 19146) -> v2.28.4 (build 20760)
- Built with LLVM 16.0.6, opaque pointers support
- **intel-graphics-compiler (IGC)**: v2.11.7 -> v2.28.4
- **compute-runtime**: 25.18.33578.6 -> 26.05.37020.3
- Built with IGC v2.28.4 and level-zero v1.27.0
- Panther Lake production support, Wildcat Lake pre-release
- **libigdgmm**: 22.7.0 -> 22.9.0
- **ipex-llm ollama** (nightly): 2.3.0b20250612 -> 2.3.0b20250725
- Latest available nightly Ubuntu ollama portable zip
### Docker Compose adjustments
### Docker Compose
- Disabled persistent webui volume for stateless restarts
- Device mapping changed to full `/dev/dri` access for SYCL/Level-Zero
- Added `ONEAPI_DEVICE_SELECTOR=level_zero:0` and `ZES_ENABLE_SYSMAN=1`
- Removed `OLLAMA_VULKAN=1`
- Disabled web UI authentication (`WEBUI_AUTH=False`)
### README
- Formatting and heading structure improvements
- Updated tested GPU model to Intel Core Ultra 5 155H