Adding OpenAI Whisper integration
README.md
@@ -1,10 +1,14 @@
# Run Ollama, Stable Diffusion and Automatic Speech Recognition with your Intel Arc GPU

[[Blog](https://blog.eleiton.dev/posts/llm-and-genai-in-docker/)]

Effortlessly deploy a Docker-based solution that uses [Open WebUI](https://github.com/open-webui/open-webui) as your user-friendly
AI Interface and [Ollama](https://github.com/ollama/ollama) for integrating Large Language Models (LLM).

Additionally, you can run [ComfyUI](https://github.com/comfyanonymous/ComfyUI) or [SD.Next](https://github.com/vladmandic/sdnext) docker containers to
streamline Stable Diffusion capabilities.

You can also run an optional docker container with [OpenAI Whisper](https://github.com/openai/whisper) to perform Automatic Speech Recognition (ASR) tasks.

All these containers have been optimized for Intel Arc Series GPUs on Linux systems by using [Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
@@ -27,13 +31,17 @@ All these containers have been optimized for Intel Arc Series GPUs on Linux syst
|
|||||||
|
|
||||||
3. ComfyUI
|
3. ComfyUI
|
||||||
* The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
|
* The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
|
||||||
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu&version=v2.6.10%2Bxpu&os=linux%2Fwsl2&package=docker)
|
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu)
|
||||||
|
|
||||||
4. SD.Next
|
4. SD.Next
|
||||||
* All-in-one for AI generative image based on Automatic1111
|
* All-in-one for AI generative image based on Automatic1111
|
||||||
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu&version=v2.6.10%2Bxpu&os=linux%2Fwsl2&package=docker)
|
* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu)
|
||||||
* Uses a customized version of the SD.Next [docker file](https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.ipex), making it compatible with the Intel Extension for Pytorch image.
|
* Uses a customized version of the SD.Next [docker file](https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.ipex), making it compatible with the Intel Extension for Pytorch image.
|
||||||
|
|
||||||
|
5. OpenAI Whisper
|
||||||
|
* Robust Speech Recognition via Large-Scale Weak Supervision
|
||||||
|
* Uses as the base container the official [Intel® Extension for PyTorch](* Uses as the base container the official [Intel® Extension for PyTorch](https://pytorch-extension.intel.com/installation?platform=gpu)
|
||||||
|
|
||||||
## Setup
|
## Setup
|
||||||
Run the following commands to start your Ollama instance with Open WebUI
|
Run the following commands to start your Ollama instance with Open WebUI
|
||||||
```bash
|
```bash
|
||||||
@@ -54,6 +62,11 @@ For SD.Next
$ podman compose -f docker-compose.sdnext.yml up
```

If you want to run Whisper for automatic speech recognition, run this command in a different terminal:

```bash
$ podman compose -f docker-compose.whisper.yml up
```

## Validate

Run the following command to verify your Ollama instance is up and running

```bash
@@ -89,6 +102,45 @@ When using Open WebUI, you should see this partial output in your console, indic
* That's it, go back to Open WebUI main page and start chatting. Make sure to select the `Image` button to indicate you want to generate Images.

![OpenWebUI](docs/open-webui.png)

## Using Automatic Speech Recognition

* This is an example of a command to transcribe audio files:

```bash
podman exec -it whisper-ipex whisper https://www.lightbulblanguages.co.uk/resources/ge-audio/hobbies-ge.mp3 --device xpu --model small --language German --task transcribe
```

* Response:

```bash
[00:00.000 --> 00:08.000] Ich habe viele Hobbys. In meiner Freizeit mache ich sehr gerne Sport, wie zum Beispiel Wasserball oder Radfahren.
[00:08.000 --> 00:13.000] Außerdem lese ich gerne und lerne auch gerne Fremdsprachen.
[00:13.000 --> 00:19.000] Ich gehe gerne ins Kino, höre gerne Musik und treffe mich mit meinen Freunden.
[00:19.000 --> 00:22.000] Früher habe ich auch viel Basketball gespielt.
[00:22.000 --> 00:26.000] Im Frühling und im Sommer werde ich viele Radtouren machen.
[00:26.000 --> 00:29.000] Außerdem werde ich viel schwimmen gehen.
[00:29.000 --> 00:33.000] Am liebsten würde ich das natürlich im Meer machen.
```
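
Each output line pairs a `[start --> end]` window with its text. If you need the segments programmatically, a minimal sketch in plain Python (no Whisper dependency; the `MM:SS.mmm` timestamp layout is assumed from the output above and may differ for audio longer than an hour):

```python
import re

# Matches Whisper CLI output lines such as:
# [00:08.000 --> 00:13.000] Außerdem lese ich gerne ...
LINE_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]\s*(.*)")

def parse_segments(output: str):
    """Return (start_seconds, end_seconds, text) tuples from Whisper CLI output."""
    segments = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped line
        sm, ss, em, es, text = m.groups()
        start = int(sm) * 60 + float(ss)
        end = int(em) * 60 + float(es)
        segments.append((start, end, text))
    return segments

example = "[00:19.000 --> 00:22.000] Früher habe ich auch viel Basketball gespielt."
print(parse_segments(example))
# → [(19.0, 22.0, 'Früher habe ich auch viel Basketball gespielt.')]
```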

* This is an example of a command to translate audio files:

```bash
podman exec -it whisper-ipex whisper https://www.lightbulblanguages.co.uk/resources/ge-audio/hobbies-ge.mp3 --device xpu --model small --language German --task translate
```

* Response:

```bash
[00:00.000 --> 00:02.000] I have a lot of hobbies.
[00:02.000 --> 00:05.000] In my free time I like to do sports,
[00:05.000 --> 00:08.000] such as water ball or cycling.
[00:08.000 --> 00:10.000] Besides, I like to read
[00:10.000 --> 00:13.000] and also like to learn foreign languages.
[00:13.000 --> 00:15.000] I like to go to the cinema,
[00:15.000 --> 00:16.000] like to listen to music
[00:16.000 --> 00:19.000] and meet my friends.
[00:19.000 --> 00:22.000] I used to play a lot of basketball.
[00:22.000 --> 00:26.000] In spring and summer I will do a lot of cycling tours.
[00:26.000 --> 00:29.000] Besides, I will go swimming a lot.
[00:29.000 --> 00:33.000] Of course, I would prefer to do this in the sea.
```

* To use your own audio files instead of web files, place them in the `~/whisper-files` folder and access them like this:

```bash
podman exec -it whisper-ipex whisper YOUR_FILE_NAME.mp3 --device xpu --model small --task translate
```
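
The bracketed windows in the transcripts above map directly onto SRT subtitle timestamps. A minimal sketch of the conversion in plain Python (illustrative only; recent `openai-whisper` versions can also write subtitle files themselves via an output-format option):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(19.0, 22.0, "Früher habe ich auch viel Basketball gespielt.")]))
```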

## Updating the containers

If there are new updates in the [ipex-llm-inference-cpp-xpu](https://hub.docker.com/r/intelanalytics/ipex-llm-inference-cpp-xpu) docker Image or in the Open WebUI docker Image, you may want to update your containers to stay up to date.
docker-compose.whisper.yml (new file)
@@ -0,0 +1,18 @@
version: '3'

services:
  whisper-ipex:
    build:
      context: whisper
      dockerfile: Dockerfile
    image: whisper-ipex:latest
    container_name: whisper-ipex
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - whisper-models-volume:/root/.cache/whisper
      - ~/whisper-files:/app

volumes:
  whisper-models-volume: {}
whisper/Dockerfile (new file)
@@ -0,0 +1,16 @@
FROM intel/intel-extension-for-pytorch:2.7.10-xpu

ENV USE_XETLA=OFF
ENV SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
ENV SYCL_CACHE_PERSISTENT=1

# Install required packages
RUN apt-get update && apt-get install -y ffmpeg

# Install the openai-whisper package
RUN pip install --upgrade pip && pip install -U openai-whisper

# Set the working directory to /app
WORKDIR /app

# Keep the container alive so commands can be run via `podman exec`
CMD ["tail", "-f", "/dev/null"]