Building an AI Video Subtitle Generator with Whisper

Whisper is an open-source speech-recognition model from OpenAI that can transcribe and translate audio, which makes it a good fit for generating subtitles from audio or video files.

In this blog, we'll use Docker to simplify the setup. First, we'll mount our audio or video files into a Docker container running Ubuntu. Then we'll install Whisper inside the container and use it to generate transcription files like .srt, .vtt, etc. for our subtitles. And thanks to volume mapping, the generated files will also be available on our local system.

  1. Let's begin by opening an empty project in VSCode.

  2. Create a file named "Dockerfile" and copy the following content into it.

     FROM ubuntu

     RUN apt update && apt install -y ffmpeg python3 python3-pip

     RUN pip install setuptools-rust
     RUN pip install -U openai-whisper

     ENTRYPOINT [ "bash" ]
    

    Now we'll build our Docker image from this file. Before you do, make sure your Docker daemon is running.
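One note: on recent Ubuntu base images, installing ffmpeg can pull in tzdata, which prompts for a timezone and stalls a non-interactive build. If you hit that, a hedged variant of the apt step sets DEBIAN_FRONTEND so the build runs unattended (the rest of the Dockerfile stays the same):

```dockerfile
# Replace the apt line in the Dockerfile above with this noninteractive variant:
RUN DEBIAN_FRONTEND=noninteractive apt update && \
    DEBIAN_FRONTEND=noninteractive apt install -y ffmpeg python3 python3-pip
```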

  3. After that go ahead and run the following command:

     docker build -t my-whisper .
    

    This might take a while because Docker has to pull the Ubuntu image and run all the installs. Just hang tight and wait for it to finish!

    Now get a sample video whose subtitles you want to generate. Create a folder named "sample" on your Desktop and place your video in this folder.
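If you prefer the terminal, that setup can be sketched like this (the path assumes a standard Desktop folder; adjust it to your system):

```shell
# Create the folder that will hold Whisper's input (hypothetical path)
mkdir -p "$HOME/Desktop/sample"

# Copy your video into it, adjusting the source path to your file, e.g.:
# cp ~/Downloads/my-video.mp4 "$HOME/Desktop/sample/sample.mp4"

# Confirm the folder exists before mounting it into the container
ls -d "$HOME/Desktop/sample"
```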

  4. Now copy the path of the sample folder and run the below command:

     docker run -it -v <your sample folder path here>:/home/videos my-whisper
    

    After running this command, you will be in the container's bash shell. If you navigate to the /home/videos directory, you can access the sample video inside your container.

    Now, inside /home/videos, run the below command to generate English subtitles for a Hindi video.

     whisper sample.mp4 --language Hindi --task translate
    

    After hitting enter, generating the subtitles may take some time: AI models like Whisper need a GPU to run quickly. Since I am running it on a CPU it will take longer, but if you have a GPU it will be much faster.

    As the command finishes generating subtitles, run ls and you'll see the transcription files in your container, and, thanks to the volume mapping, in your Desktop sample folder as well.

    You can also test these subtitles by applying them to your video using tools like veed.io.
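By default, Whisper writes one transcription file per supported format next to the input, named after the video (you can restrict this with a flag like --output_format srt). A quick sketch of the filenames to expect for sample.mp4, assuming the CLI's default format list:

```shell
video="sample.mp4"
base="${video%.*}"   # strip the extension -> "sample"

# Whisper's default writers produce these five formats
for ext in txt srt vtt tsv json; do
  echo "expect: ${base}.${ext}"
done
```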

So yayy!! That's all in this blog, and we have successfully generated our subtitles using the Whisper AI model.

This blog is inspired by Piyush Garg's video, so go and check that out as well.

If you liked this blog, please give it a thumbs-up!! 👍🏼
