In some scenarios, users may experience erratic endpointing behavior and instability when using Deepgram's nova transcription models, especially with multilingual audio settings or specific configurations in live transcription via websockets.
Multilingual transcription, while powerful, can introduce complexity and issues with endpointing, the process by which the system identifies the end of a sentence or utterance. Unstable endpointing can lead to repeated non-final responses or missing utterance-end markers, disrupting the flow of a transcription service or application.
Some users may notice intermittent endpointing errors where transcriptions continue without finalizing (`is_final: false`), with the same interim transcript repeated multiple times under a false non-final indicator.
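To make this pattern visible in your own logs, a handler along the following lines can help. This is a minimal sketch assuming the v3-style Deepgram Python SDK; event and field names may differ in other SDK versions, so treat it as illustrative rather than definitive:

```python
import os
from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
dg_connection = deepgram.listen.live.v("1")

def on_transcript(self, result, **kwargs):
    text = result.channel.alternatives[0].transcript
    if not text:
        return
    # A healthy stream alternates interim results with finals;
    # a long run of is_final=False lines is the symptom described above.
    print(f"is_final={result.is_final}: {text}")

def on_utterance_end(self, utterance_end, **kwargs):
    # Only emitted when utterance_end_ms is set on the connection.
    print("UtteranceEnd received")

dg_connection.on(LiveTranscriptionEvents.Transcript, on_transcript)
dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)

dg_connection.start(LiveOptions(
    model="nova-2",
    interim_results=True,       # required for utterance_end_ms to take effect
    utterance_end_ms="1200",
))
# ... stream audio with dg_connection.send(chunk), then dg_connection.finish()
```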
If you run into these problems, a few adjustments are worth trying:

- Switching between model versions (`nova-2`, `nova-3`, `nova-2-conversationalai`) may help, as each version may handle language settings differently.
- Specifying a single language such as `language=es` instead of `language=multi`, if applicable to your audience, can sometimes rectify these issues.
- Fine-tuning the `endpointing` and `utterance_end_ms` parameters can stabilize outputs, particularly when moving from `nova-2` to newer models. Consider starting with `endpointing=300` and `utterance_end_ms=1200` as a baseline; see the configuration sketch after this list.
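As one illustration, the suggestions above map onto the live-streaming query string like this. The values shown are the baseline mentioned above, not guaranteed fixes, and should be tuned against your own audio:

```python
from urllib.parse import urlencode

params = {
    "model": "nova-2",           # or "nova-3", "nova-2-conversationalai"
    "language": "es",            # a single language instead of "multi"
    "interim_results": "true",   # utterance_end_ms has no effect without this
    "endpointing": 300,          # ms of trailing silence before finalizing
    "utterance_end_ms": 1200,    # word gap that triggers an UtteranceEnd message
    "encoding": "linear16",
    "sample_rate": 16000,
}
url = f"wss://api.deepgram.com/v1/listen?{urlencode(params)}"
# Connect with any WebSocket client, passing the header:
#   Authorization: Token <DEEPGRAM_API_KEY>
```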
Some users may experience rare instances where the stream fails to respond or becomes unstable. This can be due to complex audio inputs or server-side anomalies.
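There is no universal fix for this, but a client-side keep-alive plus reconnect-with-backoff loop is a common mitigation pattern. Below is one possible sketch, not an official Deepgram recipe, using the third-party `websockets` package (the header keyword is `extra_headers` in older releases of that package and `additional_headers` in newer ones). The `KeepAlive` control message is Deepgram's documented way to hold an otherwise idle connection open:

```python
import asyncio
import json
import os

import websockets  # pip install websockets

async def send_keepalives(ws, interval=5):
    # Deepgram closes streams that go silent; periodic KeepAlive text
    # frames keep the connection open while no audio is flowing.
    while True:
        await asyncio.sleep(interval)
        await ws.send(json.dumps({"type": "KeepAlive"}))

async def run(url):
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    backoff = 1
    while True:
        try:
            async with websockets.connect(url, extra_headers=headers) as ws:
                backoff = 1  # reset after a successful connection
                keepalive = asyncio.create_task(send_keepalives(ws))
                try:
                    async for message in ws:
                        print(message)  # transcript / UtteranceEnd JSON
                finally:
                    keepalive.cancel()
        except (websockets.ConnectionClosed, OSError):
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, 30)  # capped exponential backoff
```

Run it with `asyncio.run(run(url))`, using a URL built as in the earlier sketch.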
If instability persists, make sure that parameters such as `diarize` or multiple `keywords` values aren't conflicting with the chosen model's capabilities or limitations.

Users may encounter unique challenges when configuring Deepgram's transcription models to handle diverse and multilingual audio streams. Adjusting API parameters, keeping track of model versions, and seeking support when needed all provide pathways to resolving complex endpointing and stability issues.