Introducing Automatic Language Identification
With a minimum of 30 seconds of audio, Amazon Transcribe can efficiently generate transcripts in the spoken language without wasting time and resources on manual tagging. Automatic identification of the dominant language is available in batch transcription mode for all 31 languages. Thanks to sampling techniques, language identification happens much faster than the transcription itself, in the matter of seconds.
If you’re already using Amazon Transcribe for speech recognition, you just need to enable the feature in the StartTranscriptionJob API. Before your transcription job is complete, the response of the GetTranscriptionJob API will tell the dominant language of the audio recording, and its confidence score between 0 and 1. The transcript lists the top five languages and their respective confidence scores.
Of course, if you want to use Amazon Transcribe exclusively for automatic language identification, you can simply process the API response and ignore the transcript. In this case, you should stick to short 30-45 second audio recordings to minimize costs.
You can also restrict languages that Amazon Transcribe tries to identify, by passing a list of languages to the StartTranscriptionJob API. For example, if your company call center only receives calls in English, Spanish and French, then restricting identifiable languages to this list will increase language identification accuracy.
Now, I’d like to show you how easy it us to use this new feature!
Detecting the Dominant Language With Amazon Transcribe
First, let’s try a high quality sample. I’ll use the audio track from one of my breakout sessions at AWS Summit Paris 2019. I can easily download it using the youtube-dl tool.
$ youtube-dl -f bestaudio https://www.youtube.com/watch?v=AFN5jaTurfA
$ mv AWS & EarthCube _ Deep learning démarrer avec MXNet et Tensorflow en 10 minutes-AFN5jaTurfA.m4a video.m4a
Using ffmpeg, I shorten the audio clip to 1 minute.
Read More