The Deepgram Speech-to-Text API offers a feature known as "endpointing" that is crucial for understanding when transcriptions should be finalized and returned after a period of silence. This article explains how to configure the endpointing
parameter to optimize the flow of transcription in your applications.
Endpointing is a feature that determines the amount of silence required to decide that a user has paused or finished speaking. By default, Deepgram's endpointing is set to finalize the transcript after detecting 10 milliseconds of silence (endpointing=10
). This enables a quick response after the completion of speech.
You can configure the endpointing parameter by setting it to an integer that represents the millisecond value of desired silence. For example, if you wish to wait 500 milliseconds (0.5 seconds) before finalizing the transcript, you would set the endpointing parameter to 500:
endpointing=500
This setup is useful if you expect users to pause for longer periods but still want to capture their complete thoughts before endpointing.
If your application's needs are such that endpointing should be disabled—meaning that transcriptions are returned at a cadence determined solely by Deepgram's chunking algorithms—you can set the parameter to false
:
endpointing=false
This ensures continuous transcription without waiting for defined pauses.
Adjust the endpointing parameter to suit the nature of your audio input to achieve an optimal balance between speed and accuracy in transcriptions. If issues persist or the system behavior seems inconsistent, reach out to your Deepgram support representative (if you have one) or visit our community for assistance: Deepgram Community.