13 November 25
Transcribing Catalan With My New Workstation
Thanks to the Easy Languages folks, I learned the power of target-language subtitling of video content in language learning, and this has been a big part of my Catalan studies. The Easy Languages approach is double subtitling; for Catalan, that means subtitles in both Catalan and English. But it is also very helpful to watch videos that are singly subtitled in the target language, e.g. Catalan subtitles for Catalan video, and I have been watching these where I can find them. The YouTube channel Català al Natural does this specifically for language learning, and as I’ve described earlier, I have watched many episodes of the TV series El Foraster this way.
But most of the Catalan content on YouTube has no subtitling available, which limits its utility to a beginner in the language. What to do? I came up with a plan for adding automated subtitling to the video content, and tried this out yesterday with much success. The workflow is as follows: a) download the YouTube video to my workstation, b) run speech-to-text software over the audio channel of the downloaded video, and c) display the transcribed text as subtitles while watching the locally stored video.
This approach came together very easily using my new workstation. The details are as follows. First, I used the program yt-dlp to download the video from YouTube. The next step is the speech-to-text conversion. I used Whisper here, which I believe is the best open-source speech-to-text converter; at least, that is what I gathered from working with the AI institute a year and a half ago. This is software from the belly of the AI beast, coming from the company OpenAI. It is multilingual, and Catalan is one of the better-performing languages in the software. The output from this program consists of transcribed text with timestamps. Finally, I watched the video in the program Celluloid, which turns out to be smart enough to take the text with timestamps and overlay it on the video as subtitles at the right times.
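The three steps above can be sketched as a small Python script. This is only an illustration under stated assumptions, not the exact commands I ran: the function names, the `medium` model size, the choice of SRT as the subtitle format, and remuxing to an `.mkv` container (so the downloaded file name is predictable) are all my own choices for the example. It also assumes that `yt-dlp`, the `whisper` command-line tool, and `celluloid` are installed and on the PATH.

```python
import subprocess
from pathlib import Path


def download_command(url: str, stem: str) -> list[str]:
    """Build a yt-dlp command that saves the video as <stem>.mkv."""
    # --remux-video mkv is an assumption made here so the final
    # extension is known in advance; yt-dlp fills in %(ext)s itself.
    return ["yt-dlp", "--remux-video", "mkv", "-o", f"{stem}.%(ext)s", url]


def transcribe_command(video: Path, model: str = "medium") -> list[str]:
    """Build a Whisper command that transcribes Catalan speech to SRT."""
    # "ca" is the language code for Catalan; the model size is an
    # assumption chosen for the example, not a recommendation.
    return [
        "whisper", str(video),
        "--language", "ca",
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(video.parent),
    ]


def subtitle_path(video: Path) -> Path:
    """Subtitle file expected next to the video, with the same base name."""
    return video.with_suffix(".srt")


def run_pipeline(url: str, stem: str) -> None:
    """Download, transcribe, then open the video in the player."""
    video = Path(f"{stem}.mkv")
    subprocess.run(download_command(url, stem), check=True)
    subprocess.run(transcribe_command(video), check=True)
    # Assumption: the player picks up a subtitle file that shares the
    # video's base name; otherwise load subtitle_path(video) by hand.
    subprocess.run(["celluloid", str(video)], check=True)
```

Calling `run_pipeline(url, "vilaweb-clip")` with a real video URL would run all three steps in order. Depending on the Whisper version, the subtitle file may be named slightly differently from what `subtitle_path` returns, in which case it can be loaded manually from the player's menu.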
It greatly helps the accuracy of the transcription not to have to do it in real time, as the software can take advantage of looking at the language context around the current timepoint to produce a better transcription. My new workstation is very helpful here, having a graphics card with 12 GB of VRAM. It still takes a while: it was transcribing at about 4x real time (that is, a 12-minute video took about 3 minutes to transcribe). The output seems very good, though as a beginner in the language I am not the best one to judge.
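The speed arithmetic can be written out as a rough estimate. The sketch below uses the 12-minute and 3-minute figures from above; the 60-minute video is a hypothetical example, and the estimate assumes the rate stays about the same for longer videos, which I have not verified.

```python
def realtime_factor(video_minutes: float, transcribe_minutes: float) -> float:
    """How many minutes of video are transcribed per minute of computing."""
    return video_minutes / transcribe_minutes


def estimated_transcribe_minutes(video_minutes: float, factor: float) -> float:
    """Rough transcription time, assuming the factor stays constant."""
    return video_minutes / factor


factor = realtime_factor(12, 3)
print(factor)  # 4.0

# Hypothetical 60-minute video at the same rate.
print(estimated_transcribe_minutes(60, factor))  # 15.0
```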
I tested this system today with a couple of recent videos from VilaWeb, and was pleased with how it helped. I might try experimenting with double subtitling à la Easy Languages, since I think the video playback software supports it after some fiddling.
