Merge pull request #3 from eleiton/use-intel-docker-image

Use the Ollama docker image provided by Intel
Authored by eleiton on 2025-03-11 22:43:03 +01:00; committed by GitHub.
3 changed files with 22 additions and 75 deletions

Dockerfile

@@ -1,53 +1,8 @@
-FROM ubuntu:24.04
+FROM intelanalytics/ipex-llm-inference-cpp-xpu:latest
 ENV DEBIAN_FRONTEND=noninteractive
-ENV PIP_BREAK_SYSTEM_PACKAGES=1
+ENV OLLAMA_NUM_GPU=999
 ENV OLLAMA_HOST=0.0.0.0:11434
-# Install base packages
-RUN apt update && \
-apt install --no-install-recommends -q -y \
-wget \
-gnupg \
-ca-certificates \
-python3-pip \
-pkg-config \
-build-essential \
-python3-dev \
-cmake
-# Install Client GPUs
-# Reference: https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop-24-04-lts
-RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
-gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg && \
-echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu noble unified" | \
-tee /etc/apt/sources.list.d/intel-gpu-noble.list && \
-apt update && \
-apt install -y libze-intel-gpu1 libze1 intel-opencl-icd clinfo intel-gsc && \
-apt install -y libze-dev intel-ocloc && \
-apt install --no-install-recommends -q -y \
-udev \
-level-zero \
-libigdgmm12
-# Install oneAPI Base Toolkit
-# Reference: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?packages=oneapi-toolkit&oneapi-toolkit-os=linux&oneapi-lin=apt
-RUN apt update && \
-apt install -y gpg-agent wget && \
-wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | \
-gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null && \
-echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | \
-tee /etc/apt/sources.list.d/oneAPI.list && \
-apt update && \
-apt install -y intel-oneapi-base-toolkit
-# Install serve.sh script
 COPY ./scripts/serve.sh /usr/share/lib/serve.sh
-# Install ipex-llm[cpp] using pip
-# Reference: https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp
-RUN pip install --pre --upgrade ipex-llm[cpp]
-# Set entrypoint to run the serve.sh script
 ENTRYPOINT ["/bin/bash", "/usr/share/lib/serve.sh"]
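With the Intel base image, the Dockerfile reduces to the environment variables, the serve script, and the entrypoint. For orientation, below is a minimal sketch of building and running the resulting image on its own, assuming the host exposes its Intel GPU at /dev/dri; the image tag and container name are illustrative, and the repository's compose setup remains the intended way to start the full stack.

```bash
# Build the image from this Dockerfile (tag is illustrative).
podman build -t ollama-intel-arc .

# Run it with the Intel GPU device passed through; Ollama listens on
# 0.0.0.0:11434 as set by ENV OLLAMA_HOST above.
podman run -d --rm \
  --name ollama-intel-arc \
  --device /dev/dri \
  -p 11434:11434 \
  ollama-intel-arc
```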

README.md

@@ -9,9 +9,9 @@ This repository provides a convenient way to run Ollama as a backend and Open We
 ## Services
 1. Ollama
-* Runs llama.cpp and Ollama with IPEX-LLM on your Linux computer with Intel GPU.
-* Built following the guidelines from [Intel](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md).
-* Uses [Ubuntu 24.04 LTS](https://ubuntu.com/blog/tag/ubuntu-24-04-lts), Ubuntu's latest stable version, as a container.
+* Runs llama.cpp and Ollama with IPEX-LLM on your Linux computer with Intel Arc GPU.
+* Built following the guidelines from [Intel](https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/README.md).
+* Uses the official [Intel ipex-llm docker image](https://hub.docker.com/r/intelanalytics/ipex-llm-inference-cpp-xpu) as the base container.
 * Uses the latest versions of required packages, prioritizing cutting-edge features over stability.
 * Exposes port `11434` for connecting other tools to your Ollama service.
@@ -29,7 +29,12 @@ $ podman compose up
 ```
 ## Validate
-You should see this partial output in your console, indicating your arc gpu was detected
+Run the following command to verify your Ollama instance is up and running
+```bash
+$ curl http://localhost:11434/
+Ollama is running
+```
+When using Open WebUI, you should see this partial output in your console, indicating your arc gpu was detected
 ```bash
 [ollama-intel-arc] | Found 1 SYCL devices:
 [ollama-intel-arc] | | | | | |Max | |Max |Global | |
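If the stack is running detached rather than in the foreground, the same checks can be made from the host. A small sketch, assuming the Ollama container is named ollama-intel-arc as in the log prefix above:

```bash
# Confirm the Intel Arc GPU was detected as a SYCL / Level Zero device.
podman logs ollama-intel-arc 2>&1 | grep -A 4 "Found 1 SYCL devices"

# Confirm the API answers on the published port.
curl -s http://localhost:11434/
```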
@@ -38,13 +43,8 @@ You should see this partial output in your console, indicating your arc gpu was
 [ollama-intel-arc] | |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
 [ollama-intel-arc] | | 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 62400M| 1.6.32224+14|
 ```
-Run the following command to verify your Ollama instance is up and running
-```bash
-$ curl http://localhost:11434/
-Ollama is running
-```
 ## Usage
+* Run the services using the setup instructions above.
 * Open your web browser to http://localhost:3000 to access the Open WebUI web page.
 * For more information on using Open WebUI, refer to the official documentation at https://docs.openwebui.com/ .
@@ -55,7 +55,7 @@ $ podman compose down
 ```
 ### ollama-intel-arc Image
-If there are new updates in ipex-llm[cpp] or oneAPI or GPU drivers, you may want to update the Ollama image and containers, to stay updated.
+If there are new updates in the [ipex-llm docker image](https://hub.docker.com/r/intelanalytics/ipex-llm-inference-cpp-xpu), you may want to update the Ollama image and containers, to stay updated.
 First check any containers running the docker image, and remove them
 ```bash
@@ -89,16 +89,14 @@ You can connect directly to your Ollama container by running these commands:
 ```bash
 $ podman exec -it ollama-intel-arc /bin/bash
-> source /opt/intel/oneapi/setvars.sh
-> /usr/local/lib/python3.12/dist-packages/bigdl/cpp/libs/ollama -v
+$ /llm/ollama/ollama -v
 ```
 ## My development environment:
 * Core Ultra 7 155H
 * Intel® Arc™ Graphics (Meteor Lake-P)
-* Fedora 40
+* Fedora 41
 ## References
-* [Intel guidelines for installing Linux GPU support](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md)
 * [Open WebUI documentation](https://docs.openwebui.com/)
-* [Ollama Quickstart](https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md)
+* [Intel - ipex-llm](https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md)
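Building on the snippet above, the bundled binary at /llm/ollama/ollama can also be driven non-interactively through podman exec to manage models; a hedged sketch, with the model name purely illustrative:

```bash
# Check the bundled Ollama build, then pull and list models
# (the model name is an example, not one required by this repository).
podman exec ollama-intel-arc /llm/ollama/ollama -v
podman exec ollama-intel-arc /llm/ollama/ollama pull llama3.2
podman exec ollama-intel-arc /llm/ollama/ollama list
```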

scripts/serve.sh

@@ -1,15 +1,9 @@
 #!/bin/sh
-# Reference: https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md#3-run-ollama-serve
-export OLLAMA_NUM_GPU=999
-export no_proxy=localhost,127.0.0.1
-export ZES_ENABLE_SYSMAN=1
-source /opt/intel/oneapi/setvars.sh
-export SYCL_CACHE_PERSISTENT=1
-# [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
-export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-# [optional] if you want to run on single GPU, use below command to limit GPU may improve performance
-export ONEAPI_DEVICE_SELECTOR=level_zero:0
-/usr/local/lib/python3.12/dist-packages/bigdl/cpp/libs/ollama serve
+cd /llm/scripts/
+source ipex-llm-init --gpu --device Arc
+mkdir -p /llm/ollama
+cd /llm/ollama
+init-ollama
+./ollama serve
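With serve.sh now initializing the IPEX-LLM environment for the Arc GPU (ipex-llm-init), setting up the bundled Ollama under /llm/ollama (init-ollama), and starting the server, the service can be exercised from the host over Ollama's standard HTTP API once the container is up. A brief sketch, using an illustrative model name:

```bash
# Pull a model and run a single non-streaming generation against the
# container's Ollama endpoint (model name is illustrative).
curl -s http://localhost:11434/api/pull -d '{"name": "llama3.2"}'
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello from an Intel Arc GPU.",
  "stream": false
}'
```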