When using Deepgram's transcription services, users may occasionally encounter "hallucinations": transcribed text that does not correspond to any content in the audio file.
A hallucination occurs when the model generates words or phrases that were never spoken in the original audio, which can lead to unexpected entries such as people's names or sequences of numbers appearing in the transcription output.
In reports of hallucinations, users have noted the appearance of names such as "Joshua Sandler" and sequences like "(zero zero six:fifty seven)" which were not present in the audio files being transcribed.
Switch Models: If you encounter hallucinations with Nova-2, consider using the Nova-1 model; users have reported fewer hallucinations after switching. Adjust the model parameter in your API request as shown below (model=nova selects Nova-1, while model=nova-2 selects Nova-2):
curl --location 'https://api.deepgram.com/v1/listen?smart_format=true&language=en&model=nova' \
--header 'Content-Type: audio/mpeg' \
--header 'Authorization: Token YOUR_API_KEY' \
--data-binary '@/YOUR_FILE'
Review API Settings: Use smart_format=true to apply punctuation and paragraph formatting automatically, without needing to explicitly set paragraphs and punctuate to true.
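For comparison, the request shown earlier relies on smart_format=true alone, while the sketch below requests punctuation and paragraphing individually. It uses the same placeholder API key and file path as above:

# Equivalent formatting without smart_format: enable the punctuate
# and paragraphs features individually via query parameters.
curl --location 'https://api.deepgram.com/v1/listen?punctuate=true&paragraphs=true&language=en&model=nova' \
--header 'Content-Type: audio/mpeg' \
--header 'Authorization: Token YOUR_API_KEY' \
--data-binary '@/YOUR_FILE'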
Hallucinations in transcription are an acknowledged phenomenon when using advanced models such as Nova-2, and can usually be mitigated by trying alternative models like Nova-1. It is always good practice to evaluate the output when processing transcriptions and make model adjustments as necessary.
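For example, one lightweight way to evaluate output is to transcribe the same file with both models and diff the resulting transcripts; text that appears in one transcript but not the other is a candidate hallucination. This is only a sketch: it assumes jq is installed and uses the same placeholder API key and file path as the examples above.

# Transcribe the same file with Nova-2 and Nova-1, then diff the transcripts.
for MODEL in nova-2 nova; do
  curl --silent --location "https://api.deepgram.com/v1/listen?smart_format=true&language=en&model=$MODEL" \
    --header 'Content-Type: audio/mpeg' \
    --header 'Authorization: Token YOUR_API_KEY' \
    --data-binary '@/YOUR_FILE' \
  | jq -r '.results.channels[0].alternatives[0].transcript' > "transcript_$MODEL.txt"
done
# Differing lines flag passages worth listening to in the source audio.
diff transcript_nova-2.txt transcript_nova.txt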
If the problem persists, or for more complex troubleshooting, reaching out to Deepgram's support team or the Deepgram Community channels can provide additional assistance.