release/Intel-Core-Ultra-7-155H #1

Open
eSlider wants to merge 0 commits from release/Intel-Core-Ultra-7-155H into main
Owner

Summary

Project Structure

Refactored into a clean <service>/Dockerfile + docker-compose.<service>.yml convention:

.
├── docker-compose.yml                # Main stack: Ollama (IPEX-LLM) + Open WebUI
├── docker-compose.sycl-ollama.yml    # SYCL-from-source Ollama + Open WebUI (alternative)
├── docker-compose.comfyui.yml        # ComfyUI image generation
├── docker-compose.sdnext.yml         # SD.Next image generation
├── docker-compose.whisper.yml        # OpenAI Whisper speech recognition
├── docker-compose.ramalama.yml       # RamaLama support
│
├── ipex-ollama/Dockerfile            # IPEX-LLM bundle build (Ollama v0.9.3, SYCL)
├── sycl-ollama/                      # SYCL-from-source build (Ollama v0.16.1)
│   ├── Dockerfile                    # Multi-stage: oneAPI build → minimal runtime
│   ├── patch-sycl.py                 # API compat patches (no-op since v0.16.1)
│   ├── start-ollama.sh               # Legacy entrypoint
│   └── test-glm-ocr.sh               # Vision model test script
│
├── docs/
│   ├── sycl-vs-vulkan.md             # SYCL vs Vulkan backend comparison
│   └── intel-arc-a770-context-limits.md  # VRAM & context length guide
└── ...
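
Under this convention, each optional stack is brought up independently via its own compose file (service choice here is just an example):

```shell
# Main stack (IPEX-LLM Ollama + Open WebUI)
docker compose up -d

# Any optional stack by its compose file, e.g. Whisper speech recognition
docker compose -f docker-compose.whisper.yml up -d

# Tear the optional stack down without touching the main one
docker compose -f docker-compose.whisper.yml down
```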

Custom Dockerfiles

  • ipex-ollama/Dockerfile — IPEX-LLM bundle-based image (Ollama v0.9.3):

    • BuildKit # syntax=docker/dockerfile:1.4 with --mount=type=cache for apt and download caching
    • ARG version pins for all Intel GPU runtime components (bump in one place)
    • Latest runtimes: Level Zero v1.28.0, IGC v2.28.4, compute-runtime 26.05.37020.3, IPEX-LLM v2.3.0b20250725
    • Clean ENV section — no duplicates, no unverified vars, NPU disabled by default
  • sycl-ollama/Dockerfile — SYCL-from-source image (Ollama v0.16.1):

    • Multi-stage build: Stage 1 compiles ggml-sycl with Intel oneAPI icpx, Stage 2 is minimal runtime
    • sycl-ollama/patch-sycl.py — backward-compatible API patching (no patches needed since v0.16.1 — APIs converged)
    • GGML commit ec98e200 (llama.cpp tag b7437) matching Ollama v0.16.1
    • Drops in libggml-sycl.so + stripped oneAPI runtime libs alongside official Ollama binary
    • Includes test-glm-ocr.sh vision model test script
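
The multi-stage pattern above can be sketched roughly as follows. Base image tag, build flags, and install paths are illustrative, not the exact contents of sycl-ollama/Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1.4

# Stage 1: compile the SYCL backend with the Intel oneAPI toolchain
# (image tag and cmake flags are illustrative)
FROM intel/oneapi-basekit AS build
ARG GGML_COMMIT=ec98e200   # GGML commit matching Ollama v0.16.1 (llama.cpp tag b7437)
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y git cmake ninja-build
RUN git clone https://github.com/ggml-org/llama.cpp /src \
 && cd /src && git checkout "${GGML_COMMIT}" \
 && . /opt/intel/oneapi/setvars.sh \
 && cmake -B build -G Ninja -DGGML_SYCL=ON \
          -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
 && cmake --build build

# Stage 2: minimal runtime — drop the compiled backend and stripped oneAPI
# runtime libs alongside the official Ollama binary (install steps elided)
FROM ubuntu:24.04
COPY --from=build /src/build/bin/libggml-sycl.so /usr/lib/ollama/
```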

Docker Compose

  • docker-compose.yml (main stack):

    • All env vars configurable via ${VAR:-default} syntax (override with .env or shell)
    • shm_size: "16G" for SYCL/Level Zero shared memory (Docker defaults to 64 MB)
    • no_proxy / NO_PROXY on all services — prevents corporate/system HTTP proxies from intercepting container-to-container traffic
    • Intel GPU perf tuning (SYCL immediate command lists, SDP fusion, persistent cache, XeTLA)
    • Ollama context/memory management (context length, KV cache quantization, flash attention)
    • Detailed inline comments explaining each variable's impact
    • Restored full open-webui service with OLLAMA_BASE_URL, RAG web search, telemetry opt-out
  • docker-compose.sycl-ollama.yml (alternative stack):

    • Self-contained alternative using SYCL-from-source build (Ollama v0.16.1)
    • Same env-var driven defaults, no_proxy, and Open WebUI config
    • Shares ollama-volume so models persist when switching between stacks
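
The env-var, shared-memory, and proxy conventions above look roughly like this in a compose file (keys and defaults shown are a sketch of what the PR describes, not a verbatim excerpt):

```yaml
services:
  ollama:
    shm_size: "16G"   # SYCL/Level Zero shared memory; Docker's default is 64 MB
    environment:
      OLLAMA_CONTEXT_LENGTH: "${OLLAMA_CONTEXT_LENGTH:-4096}"  # override via .env or shell
      no_proxy: "localhost,127.0.0.1,ollama,open-webui"
      NO_PROXY: "localhost,127.0.0.1,ollama,open-webui"
    volumes:
      - ollama-volume:/root/.ollama   # shared volume so models persist across stacks
volumes:
  ollama-volume:
```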

Documentation

  • docs/sycl-vs-vulkan.md — SYCL vs Vulkan comparison:

    • Performance benchmarks (SYCL 40–100% faster on Intel Arc)
    • Three backend options: IPEX-LLM bundle, SYCL from source, upstream Vulkan
    • Full Dockerfile snippets showing the multi-stage SYCL build
    • How patch-sycl.py works (and why it's a no-op since v0.16.1)
    • Step-by-step guide for updating to new Ollama versions
    • Troubleshooting (device detection, ABI mismatch, OOM, kernel 6.18+ regression)
  • docs/intel-arc-a770-context-limits.md — VRAM & context guide:

    • VRAM budget breakdown (weights + KV cache + overhead)
    • Context length vs VRAM trade-off tables (f16 / q8_0 / q4_0)
    • Recommended settings by model size (7B, 13B, 30B+) for 16 GB Arc
    • Environment variables reference for all Ollama and Intel tuning knobs
    • Building a custom image section with version pin table
  • README.md:

    • Tested Hardware table, Documentation section, Project Structure tree
    • SYCL-from-source setup command and validation output in the Setup section
    • Fixed broken Whisper markdown link
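
The VRAM budget arithmetic the context-limits guide describes can be sketched as a quick estimator. The model shape below is an illustrative 7B Llama-style configuration (32 layers, 32 KV heads, head dimension 128), not a figure taken from the guide:

```python
def kv_cache_bytes(n_layers: int, ctx_len: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Estimate KV cache size: K and V tensors, per layer, per position."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem

# Illustrative 7B Llama-style model: 32 layers, 32 KV heads, head_dim 128
for ctx in (2048, 4096, 8192):
    f16 = kv_cache_bytes(32, ctx, 32, 128, 2)  # f16 = 2 bytes/element
    q8 = kv_cache_bytes(32, ctx, 32, 128, 1)   # q8_0 ≈ 1 byte/element (rough)
    print(f"ctx={ctx}: f16 ≈ {f16 / 2**30:.1f} GiB, q8_0 ≈ {q8 / 2**30:.1f} GiB")
```

At 8192 context this model's f16 KV cache alone costs 4 GiB, which is why the guide pairs context-length tables with KV cache quantization options.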

Build & test verification (2026-02-16)

IPEX-LLM stack (ipex-ollama)

  • docker build -t ipex-ollama:latest ./ipex-ollama/ — all layers build successfully
  • Ollama v0.9.3 starts and registers all API routes
  • Intel GPU detected at startup (using Intel GPU)
  • Server listens on 0.0.0.0:11434
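
The last two items can also be confirmed externally against the standard Ollama HTTP API (host port as published by the compose file):

```shell
# Confirm the server answers on the published port; should report version 0.9.3
curl -s http://localhost:11434/api/version

# List the models the API can see
curl -s http://localhost:11434/api/tags
```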

SYCL-from-source stack (sycl-ollama)

  • docker compose -f docker-compose.sycl-ollama.yml build — all stages pass (~87s)
  • patch-sycl.py exits with code 0 — no patches needed (APIs converged in v0.16.1)
  • ggml-sycl compiled successfully → libggml-sycl.so built and stripped
  • Ollama v0.16.1 starts, discovers SYCL backend:
    name=SYCL0 description="Intel(R) Arc(TM) Graphics" type=discrete total="28.0 GiB"
    
  • ollama list returns models from shared volume (7 models loaded)
  • Inference test passed: ollama run llama3.2:1b responded correctly using SYCL0 compute buffer (1074 MiB allocated)
  • Container cleans up properly with docker compose down
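
Because both stacks mount the same ollama-volume, switching between them preserves downloaded models. A minimal round-trip (the container name `ollama` is assumed from the compose service):

```shell
# Stop the main (IPEX-LLM) stack and start the SYCL-from-source one
docker compose down
docker compose -f docker-compose.sycl-ollama.yml up -d

# Models pulled under the other stack should still be listed
docker exec ollama ollama list
```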

Test plan (remaining manual checks)

  • [x] Build SYCL-from-source stack: docker compose -f docker-compose.sycl-ollama.yml up --build
  • [x] Verify Ollama v0.16.1 responds
  • [x] Run a model and confirm correct inference output
  • [x] Confirm Intel GPU is used (SYCL0 device in logs)
  • [x] Verify patch-sycl.py exits with code 0 during build
  • [ ] Run the IPEX-LLM stack: docker compose up -d
  • [ ] Verify Open WebUI loads at http://localhost:4040 and connects to Ollama
  • [ ] Test with custom env overrides (e.g. OLLAMA_CONTEXT_LENGTH=8192 in .env)
  • [ ] Verify no_proxy prevents proxy interference on corporate networks
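
A sketch of the env-override check (the container name `ollama` is assumed from the compose service):

```shell
# Override a default via .env, then restart the stack
echo "OLLAMA_CONTEXT_LENGTH=8192" >> .env
docker compose up -d

# Confirm the override reached the container
docker exec ollama env | grep OLLAMA_CONTEXT_LENGTH
```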

See https://github.com/eleiton/ollama-intel-arc/pull/38

Reference: eSlider/ollama-intel-gpu#1