Adding OpenAI Whisper integration

This commit is contained in:
eleiton
2025-06-08 13:30:10 +02:00
parent 2ce7a0224a
commit 09a60eb520
3 changed files with 90 additions and 4 deletions


@@ -1,10 +1,14 @@
# Run Ollama, Stable Diffusion and Automatic Speech Recognition with your Intel Arc GPU
[[Blog](https://blog.eleiton.dev/posts/llm-and-genai-in-docker/)]
Effortlessly deploy a Docker-based solution that uses [Open WebUI](https://github.com/open-webui/open-webui) as your user-friendly
AI interface and [Ollama](https://github.com/ollama/ollama) for integrating Large Language Models (LLMs).
Additionally, you can run [ComfyUI](https://github.com/comfyanonymous/ComfyUI) or [SD.Next](https://github.com/vladmandic/sdnext) Docker containers to
streamline Stable Diffusion capabilities.
You can also run an optional Docker container with [OpenAI Whisper](https://github.com/openai/whisper) to perform Automatic Speech Recognition (ASR) tasks.
All these containers have been optimized for Intel Arc Series GPUs on Linux systems by using [Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
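Since everything here depends on the XPU backend being visible inside the containers, a quick sanity check can save debugging time. The snippet below is a sketch, not part of this repository; it assumes `torch` (and optionally `intel_extension_for_pytorch`) is installed, as it is in the IPEX base images:

```python
def xpu_available() -> bool:
    """Report whether PyTorch can see an Intel XPU device.

    Falls back to False when torch (or its xpu backend) is absent,
    so the check is safe to run anywhere.
    """
    try:
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())

print("xpu available:", xpu_available())
```

Running this inside any of the containers should print `xpu available: True` when the `/dev/dri` device passthrough is working.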
@@ -27,13 +31,17 @@ All these containers have been optimized for Intel Arc Series GPUs on Linux syst
3. ComfyUI
* The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu)
4. SD.Next
* All-in-one for AI generative image based on Automatic1111
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu)
* Uses a customized version of the SD.Next [docker file](https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.ipex), making it compatible with the Intel Extension for PyTorch image.
5. OpenAI Whisper
* Robust Speech Recognition via Large-Scale Weak Supervision
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu)
## Setup
Run the following commands to start your Ollama instance with Open WebUI
```bash
@@ -54,6 +62,11 @@ For SD.Next
$ podman compose -f docker-compose.sdnext.yml up
```
If you want to run Whisper for automatic speech recognition, run this command in a different terminal:
```bash
$ podman compose -f docker-compose.whisper.yml up
```
## Validate
Run the following command to verify your Ollama instance is up and running
```bash
@@ -89,6 +102,45 @@ When using Open WebUI, you should see this partial output in your console, indic
* That's it, go back to Open WebUI main page and start chatting. Make sure to select the `Image` button to indicate you want to generate Images.
![screenshot](resources/open-webui-chat.png)
## Using Automatic Speech Recognition
* This is an example of a command to transcribe audio files:
```bash
podman exec -it whisper-ipex whisper https://www.lightbulblanguages.co.uk/resources/ge-audio/hobbies-ge.mp3 --device xpu --model small --language German --task transcribe
```
* Response:
```bash
[00:00.000 --> 00:08.000] Ich habe viele Hobbys. In meiner Freizeit mache ich sehr gerne Sport, wie zum Beispiel Wasserball oder Radfahren.
[00:08.000 --> 00:13.000] Außerdem lese ich gerne und lerne auch gerne Fremdsprachen.
[00:13.000 --> 00:19.000] Ich gehe gerne ins Kino, höre gerne Musik und treffe mich mit meinen Freunden.
[00:19.000 --> 00:22.000] Früher habe ich auch viel Basketball gespielt.
[00:22.000 --> 00:26.000] Im Frühling und im Sommer werde ich viele Radtouren machen.
[00:26.000 --> 00:29.000] Außerdem werde ich viel schwimmen gehen.
[00:29.000 --> 00:33.000] Am liebsten würde ich das natürlich im Meer machen.
```
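Whisper prints each segment as `[MM:SS.mmm --> MM:SS.mmm] text`. If you want subtitles out of a transcription, a short script can convert that console output to SRT. This is a sketch, not part of this repository; it assumes the default timestamp format shown above:

```python
import re

# Matches Whisper console segments like "[00:08.000 --> 00:13.000] text".
SEGMENT = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]\s*(.*)")

def to_srt_time(minutes: str, seconds: str) -> str:
    """Convert Whisper's MM:SS.mmm pair to SRT's HH:MM:SS,mmm form."""
    total_ms = round((int(minutes) * 60 + float(seconds)) * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def console_to_srt(output: str) -> str:
    """Turn Whisper's console output into numbered SRT subtitle blocks."""
    blocks = []
    for line in output.splitlines():
        match = SEGMENT.match(line.strip())
        if not match:
            continue
        m1, s1, m2, s2, text = match.groups()
        blocks.append(
            f"{len(blocks) + 1}\n{to_srt_time(m1, s1)} --> {to_srt_time(m2, s2)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Piping the transcription output through `console_to_srt` yields standard SRT blocks that most video players accept.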
* This is an example of a command to translate audio files:
```bash
podman exec -it whisper-ipex whisper https://www.lightbulblanguages.co.uk/resources/ge-audio/hobbies-ge.mp3 --device xpu --model small --language German --task translate
```
* Response:
```bash
[00:00.000 --> 00:02.000] I have a lot of hobbies.
[00:02.000 --> 00:05.000] In my free time I like to do sports,
[00:05.000 --> 00:08.000] such as water ball or cycling.
[00:08.000 --> 00:10.000] Besides, I like to read
[00:10.000 --> 00:13.000] and also like to learn foreign languages.
[00:13.000 --> 00:15.000] I like to go to the cinema,
[00:15.000 --> 00:16.000] like to listen to music
[00:16.000 --> 00:19.000] and meet my friends.
[00:19.000 --> 00:22.000] I used to play a lot of basketball.
[00:22.000 --> 00:26.000] In spring and summer I will do a lot of cycling tours.
[00:26.000 --> 00:29.000] Besides, I will go swimming a lot.
[00:29.000 --> 00:33.000] Of course, I would prefer to do this in the sea.
```
* To transcribe your own audio files instead of remote URLs, place them in the `~/whisper-files` folder (mounted into the container at `/app`) and reference them by name:
```bash
podman exec -it whisper-ipex whisper YOUR_FILE_NAME.mp3 --device xpu --model small --task translate
```
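To process everything in the folder in one go, a small wrapper can generate one `podman exec` command per file. This is a sketch, not part of this repository; the `build_commands` helper and the `*.mp3` filter are assumptions:

```python
import shlex
from pathlib import Path

def build_commands(folder: str, model: str = "small", task: str = "transcribe") -> list[str]:
    """Build one `podman exec` whisper command per .mp3 file in the folder."""
    commands = []
    for audio in sorted(Path(folder).glob("*.mp3")):
        cmd = [
            "podman", "exec", "-it", "whisper-ipex", "whisper", audio.name,
            "--device", "xpu", "--model", model, "--task", task,
        ]
        commands.append(shlex.join(cmd))
    return commands
```

Each returned string can be run as-is; the bare file name is enough because `~/whisper-files` is mounted at `/app`, the container's working directory.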
## Updating the containers
If there are new updates to the [ipex-llm-inference-cpp-xpu](https://hub.docker.com/r/intelanalytics/ipex-llm-inference-cpp-xpu) Docker image or to the Open WebUI Docker image, you may want to update your containers to stay up to date.

docker-compose.whisper.yml

@@ -0,0 +1,18 @@
version: '3'
services:
  whisper-ipex:
    build:
      context: whisper
      dockerfile: Dockerfile
    image: whisper-ipex:latest
    container_name: whisper-ipex
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - whisper-models-volume:/root/.cache/whisper
      - ~/whisper-files:/app
volumes:
  whisper-models-volume: {}

whisper/Dockerfile

@@ -0,0 +1,16 @@
FROM intel/intel-extension-for-pytorch:2.7.10-xpu
ENV USE_XETLA=OFF
ENV SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
ENV SYCL_CACHE_PERSISTENT=1
# Install required packages
RUN apt-get update && apt-get install -y ffmpeg
# Install the openai-whisper package from PyPI
RUN pip install --upgrade pip && pip install -U openai-whisper
# Set the working directory to /app
WORKDIR /app
# Keep the container running so whisper can be invoked via `podman exec`
CMD ["tail", "-f", "/dev/null"]