release/Intel-Core-Ultra-7-155H #1

Open
eSlider wants to merge 0 commits from release/Intel-Core-Ultra-7-155H into main
Owner

Summary

Project Structure

Refactored into a clean <service>/Dockerfile + docker-compose.<service>.yml convention:

.
├── docker-compose.yml                # Main stack: Ollama (IPEX-LLM) + Open WebUI
├── docker-compose.sycl-ollama.yml    # SYCL-from-source Ollama + Open WebUI (alternative)
├── docker-compose.comfyui.yml        # ComfyUI image generation
├── docker-compose.sdnext.yml         # SD.Next image generation
├── docker-compose.whisper.yml        # OpenAI Whisper speech recognition
├── docker-compose.ramalama.yml       # RamaLama support
│
├── ipex-ollama/Dockerfile            # IPEX-LLM bundle build (Ollama v0.9.3, SYCL)
├── sycl-ollama/                      # SYCL-from-source build (Ollama v0.16.1)
│   ├── Dockerfile                    # Multi-stage: oneAPI build → minimal runtime
│   ├── patch-sycl.py                 # API compat patches (no-op since v0.16.1)
│   ├── start-ollama.sh               # Legacy entrypoint
│   └── test-glm-ocr.sh               # Vision model test script
│
├── docs/
│   ├── sycl-vs-vulkan.md             # SYCL vs Vulkan backend comparison
│   └── intel-arc-a770-context-limits.md  # VRAM & context length guide
└── ...
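
Under this convention, each optional stack is brought up independently via its own compose file (service choice here is just an example):

```shell
# Main stack (IPEX-LLM Ollama + Open WebUI)
docker compose up -d

# Any optional stack by its compose file, e.g. Whisper speech recognition
docker compose -f docker-compose.whisper.yml up -d

# Tear the optional stack down without touching the main one
docker compose -f docker-compose.whisper.yml down
```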

Custom Dockerfiles

  • ipex-ollama/Dockerfile — IPEX-LLM bundle-based image (Ollama v0.9.3):

    • BuildKit # syntax=docker/dockerfile:1.4 with --mount=type=cache for apt and download caching
    • ARG version pins for all Intel GPU runtime components (bump in one place)
    • Latest runtimes: Level Zero v1.28.0, IGC v2.28.4, compute-runtime 26.05.37020.3, IPEX-LLM v2.3.0b20250725
    • Clean ENV section — no duplicates, no unverified vars, NPU disabled by default
  • sycl-ollama/Dockerfile — SYCL-from-source image (Ollama v0.16.1):

    • Multi-stage build: Stage 1 compiles ggml-sycl with Intel oneAPI icpx, Stage 2 is minimal runtime
    • sycl-ollama/patch-sycl.py — backward-compatible API patching (no patches needed since v0.16.1 — APIs converged)
    • GGML commit ec98e200 (llama.cpp tag b7437) matching Ollama v0.16.1
    • Drops in libggml-sycl.so + stripped oneAPI runtime libs alongside official Ollama binary
    • Includes test-glm-ocr.sh vision model test script
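
The multi-stage pattern above can be sketched roughly as follows. Base image tag, build flags, and install paths are illustrative, not the exact contents of sycl-ollama/Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1.4

# Stage 1: compile the SYCL backend with the Intel oneAPI toolchain
# (image tag and cmake flags are illustrative)
FROM intel/oneapi-basekit AS build
ARG GGML_COMMIT=ec98e200   # GGML commit matching Ollama v0.16.1 (llama.cpp tag b7437)
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y git cmake ninja-build
RUN git clone https://github.com/ggml-org/llama.cpp /src \
 && cd /src && git checkout "${GGML_COMMIT}" \
 && . /opt/intel/oneapi/setvars.sh \
 && cmake -B build -G Ninja -DGGML_SYCL=ON \
          -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
 && cmake --build build

# Stage 2: minimal runtime — drop the compiled backend and stripped oneAPI
# runtime libs alongside the official Ollama binary (install steps elided)
FROM ubuntu:24.04
COPY --from=build /src/build/bin/libggml-sycl.so /usr/lib/ollama/
```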

Docker Compose

  • docker-compose.yml (main stack):

    • All env vars configurable via ${VAR:-default} syntax (override with .env or shell)
    • shm_size: "16G" for SYCL/Level Zero shared memory (Docker defaults to 64 MB)
    • no_proxy / NO_PROXY on all services — prevents corporate/system HTTP proxies from intercepting container-to-container traffic
    • Intel GPU perf tuning (SYCL immediate command lists, SDP fusion, persistent cache, XeTLA)
    • Ollama context/memory management (context length, KV cache quantization, flash attention)
    • Detailed inline comments explaining each variable's impact
    • Restored full open-webui service with OLLAMA_BASE_URL, RAG web search, telemetry opt-out
  • docker-compose.sycl-ollama.yml (alternative stack):

    • Self-contained alternative using SYCL-from-source build (Ollama v0.16.1)
    • Same env-var driven defaults, no_proxy, and Open WebUI config
    • Shares ollama-volume so models persist when switching between stacks
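
The env-var, shared-memory, and proxy conventions above look roughly like this in a compose file (keys and defaults shown are a sketch of what the PR describes, not a verbatim excerpt):

```yaml
services:
  ollama:
    shm_size: "16G"   # SYCL/Level Zero shared memory; Docker's default is 64 MB
    environment:
      OLLAMA_CONTEXT_LENGTH: "${OLLAMA_CONTEXT_LENGTH:-4096}"  # override via .env or shell
      no_proxy: "localhost,127.0.0.1,ollama,open-webui"
      NO_PROXY: "localhost,127.0.0.1,ollama,open-webui"
    volumes:
      - ollama-volume:/root/.ollama   # shared volume so models persist across stacks
volumes:
  ollama-volume:
```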

Documentation

  • docs/sycl-vs-vulkan.md — SYCL vs Vulkan comparison:

    • Performance benchmarks (SYCL 40–100% faster on Intel Arc)
    • Three backend options: IPEX-LLM bundle, SYCL from source, upstream Vulkan
    • Full Dockerfile snippets showing the multi-stage SYCL build
    • How patch-sycl.py works (and why it's a no-op since v0.16.1)
    • Step-by-step guide for updating to new Ollama versions
    • Troubleshooting (device detection, ABI mismatch, OOM, kernel 6.18+ regression)
  • docs/intel-arc-a770-context-limits.md — VRAM & context guide:

    • VRAM budget breakdown (weights + KV cache + overhead)
    • Context length vs VRAM trade-off tables (f16 / q8_0 / q4_0)
    • Recommended settings by model size (7B, 13B, 30B+) for 16 GB Arc
    • Environment variables reference for all Ollama and Intel tuning knobs
    • Building a custom image section with version pin table
  • README.md:

    • Tested Hardware table, Documentation section, Project Structure tree
    • SYCL-from-source setup command and validation output in the Setup section
    • Fixed broken Whisper markdown link
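
The VRAM budget arithmetic the context-limits guide describes can be sketched as a quick estimator. The model shape below is an illustrative 7B Llama-style configuration (32 layers, 32 KV heads, head dimension 128), not a figure taken from the guide:

```python
def kv_cache_bytes(n_layers: int, ctx_len: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Estimate KV cache size: K and V tensors, per layer, per position."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem

# Illustrative 7B Llama-style model: 32 layers, 32 KV heads, head_dim 128
for ctx in (2048, 4096, 8192):
    f16 = kv_cache_bytes(32, ctx, 32, 128, 2)  # f16 = 2 bytes/element
    q8 = kv_cache_bytes(32, ctx, 32, 128, 1)   # q8_0 ≈ 1 byte/element (rough)
    print(f"ctx={ctx}: f16 ≈ {f16 / 2**30:.1f} GiB, q8_0 ≈ {q8 / 2**30:.1f} GiB")
```

At 8192 context this model's f16 KV cache alone costs 4 GiB, which is why the guide pairs context-length tables with KV cache quantization options.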

Build & test verification (2026-02-16)

IPEX-LLM stack (ipex-ollama)

  • docker build -t ipex-ollama:latest ./ipex-ollama/ — all layers build successfully
  • Ollama v0.9.3 starts and registers all API routes
  • Intel GPU detected at startup (using Intel GPU)
  • Server listens on 0.0.0.0:11434
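
The last two items can also be confirmed externally against the standard Ollama HTTP API (host port as published by the compose file):

```shell
# Confirm the server answers on the published port; should report version 0.9.3
curl -s http://localhost:11434/api/version

# List the models the API can see
curl -s http://localhost:11434/api/tags
```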

SYCL-from-source stack (sycl-ollama)

  • docker compose -f docker-compose.sycl-ollama.yml build — all stages pass (~87s)
  • patch-sycl.py exits with code 0 — no patches needed (APIs converged in v0.16.1)
  • ggml-sycl compiled successfully → libggml-sycl.so built and stripped
  • Ollama v0.16.1 starts, discovers SYCL backend:
    name=SYCL0 description="Intel(R) Arc(TM) Graphics" type=discrete total="28.0 GiB"
    
  • ollama list returns models from shared volume (7 models loaded)
  • Inference test passed: ollama run llama3.2:1b responded correctly using SYCL0 compute buffer (1074 MiB allocated)
  • Container cleans up properly with docker compose down
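
Because both stacks mount the same ollama-volume, switching between them preserves downloaded models. A minimal round-trip (the container name `ollama` is assumed from the compose service):

```shell
# Stop the main (IPEX-LLM) stack and start the SYCL-from-source one
docker compose down
docker compose -f docker-compose.sycl-ollama.yml up -d

# Models pulled under the other stack should still be listed
docker exec ollama ollama list
```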

Test plan (remaining manual checks)

  • [x] Build SYCL-from-source stack: docker compose -f docker-compose.sycl-ollama.yml up --build
  • [x] Verify Ollama v0.16.1 responds
  • [x] Run a model and confirm correct inference output
  • [x] Confirm Intel GPU is used (SYCL0 device in logs)
  • [x] Verify patch-sycl.py exits with code 0 during build
  • [ ] Run the IPEX-LLM stack: docker compose up -d
  • [ ] Verify Open WebUI loads at http://localhost:4040 and connects to Ollama
  • [ ] Test with custom env overrides (e.g. OLLAMA_CONTEXT_LENGTH=8192 in .env)
  • [ ] Verify no_proxy prevents proxy interference on corporate networks
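
A sketch of the env-override check (the container name `ollama` is assumed from the compose service):

```shell
# Override a default via .env, then restart the stack
echo "OLLAMA_CONTEXT_LENGTH=8192" >> .env
docker compose up -d

# Confirm the override reached the container
docker exec ollama env | grep OLLAMA_CONTEXT_LENGTH
```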

See https://github.com/eleiton/ollama-intel-arc/pull/38

Reference: eSlider/ollama-intel-gpu#1