
Speech Translation

Speech Translation enables users to convert speech in more than 60 languages into text in a chosen target language. Sixteen target languages are available for the translation output.

Supported languages

For the complete list of the more than 60 supported audio (source) languages and 16 translation (target) languages, visit this documentation page.

Uploading files

  1. Select the audio language used in the recordings. If you are unsure of the language, use auto-detect mode, which identifies the language automatically and then proceeds with translation.
  2. Select the translation language into which you want the speech in the recordings translated.

If you don't have your own files, you can use the provided Phonexia examples to explore how speech translation works.

Results

After processing, the translations will appear in the right panel. Please note that the translation process may take a while.

caution

Leaving the page for an extended period while awaiting the results may interrupt the process. If this happens, you will need to restart the audio processing.

Once the recordings are processed, you can play the original audio file while viewing the corresponding translated text, with the spoken segments highlighted in real time.

info

The audio language and translation language can be changed using the dropdown menu next to the filename in the left card. When either language is modified, the entire translation process restarts from the beginning.

Export formats

Whether you export results in bulk or individually, you can choose from the following export formats.

Plain text

This format provides plain text without timestamps or any additional metadata. The translations of all speech are merged into a single text, with no indication of individual channels.

Hello Andreas, I just want to let you know that we decided with my brother-in-law
that we'd rather go to Vienna this weekend,
because there is an exhibition in the Nature Science Museum on the topic of crystals
and you know that my nephew is back to minerals.
So we'll come to Berlin later
and I hope it will fit you at the end of the month at some point.
Please let me know.
Okay, bye.

Text with timestamps

This format contains two types of data: timestamps and text. The translations of all speech are merged into a single text, with no indication of individual channels.

00:00:00 Hello Andreas, I just want to let you know that we decided with my brother-in-law
00:00:05 that we'd rather go to Vienna this weekend,
00:00:09 because there is an exhibition in the Nature Science Museum on the topic of crystals
00:00:15 and you know that my nephew is back to minerals.
00:00:18 So we'll come to Berlin later
00:00:22 and I hope it will fit you at the end of the month at some point.
00:00:27 Please let me know.
00:00:29 Okay, bye.
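If you need to post-process this export, each line can be split into a timestamp and its text. The following Python sketch assumes, based only on the sample above, that every line starts with an HH:MM:SS timestamp followed by whitespace and then the translated text; `parse_timestamped_text` is a hypothetical helper, not part of any Phonexia tool.

```python
import re

# Assumed line layout, inferred from the sample above: "HH:MM:SS<whitespace>text"
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(.*)$")


def parse_timestamped_text(export: str) -> list[tuple[int, str]]:
    """Return (start_seconds, text) pairs from a "Text with timestamps" export."""
    segments = []
    for line in export.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unexpected lines
        hours, minutes, seconds, text = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        segments.append((start, text))
    return segments


sample = """00:00:00 Hello Andreas, I just want to let you know that we decided with my brother-in-law
00:00:05 that we'd rather go to Vienna this weekend,
00:00:29 Okay, bye."""

for start, text in parse_timestamped_text(sample):
    print(start, text)
```

Because `\s+` accepts any run of whitespace, the sketch also tolerates lines where the timestamp is followed by more than one space.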

CSV and XLSX formats

Both formats contain the translated text and identical metadata: transcription technology, language code, source language code, detected source language code, channel, segment timestamps, and confidence score.

info
  • Language code is the code of the target (translation) language.
  • Source language code is the code of the original language of the audio specified by the user.
  • Detected source language code is the code of the original language of the audio as identified by the system.

The .CSV format is well-suited for users who work with large datasets, as it facilitates automated processing and filtering based on specific metadata. Start time and end time of each segment are represented in seconds in this format.

Transcription technology,Language code,Source language code,Detected source language code,Channel,Start time,End time,Confidence score,Transcription
Built on Whisper,en,de,de,0,0.34,5.78,,"Hello Andreas, I just want to let you know that we decided with my brother-in-law"
Built on Whisper,en,de,de,0,5.78,9.16,,"that we'd rather go to Vienna this weekend,"
Built on Whisper,en,de,de,0,9.16,15.18,,because there is an exhibition in the Nature Science Museum on the topic of crystals
Built on Whisper,en,de,de,0,15.18,18.97,,and you know that my nephew is back to minerals.
Built on Whisper,en,de,de,0,18.97,22.22,,So we'll come to Berlin later
Built on Whisper,en,de,de,0,22.22,27.37,,and I hope it will fit you at the end of the month at some point.
Built on Whisper,en,de,de,0,27.37,29.17,,Please let me know.
Built on Whisper,en,de,de,0,29.17,30.49,,"Okay, bye."
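As an illustration of automated processing, the CSV export can be read with Python's standard `csv` module. This sketch assumes the column headers shown in the sample above; `load_segments` is a hypothetical helper name. It filters by channel and totals the translated speech duration from the start and end times, which are in seconds.

```python
import csv
import io
from typing import Optional


def load_segments(csv_text: str, channel: Optional[int] = None) -> list[dict]:
    """Parse a CSV export, optionally keeping only one channel."""
    segments = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if channel is not None and int(row["Channel"]) != channel:
            continue
        score = row["Confidence score"]
        segments.append({
            "start": float(row["Start time"]),  # seconds
            "end": float(row["End time"]),      # seconds
            # The confidence column can be empty, as in the sample above
            "confidence": float(score) if score else None,
            "detected_language": row["Detected source language code"],
            "text": row["Transcription"],
        })
    return segments


sample = """Transcription technology,Language code,Source language code,Detected source language code,Channel,Start time,End time,Confidence score,Transcription
Built on Whisper,en,de,de,0,0.34,5.78,,"Hello Andreas, I just want to let you know that we decided with my brother-in-law"
Built on Whisper,en,de,de,0,5.78,9.16,,"that we'd rather go to Vienna this weekend,"
Built on Whisper,en,de,de,0,29.17,30.49,,"Okay, bye."
"""

segments = load_segments(sample, channel=0)
speech = sum(s["end"] - s["start"] for s in segments)
print(f"{len(segments)} segments, {speech:.2f} s of speech")
```

Using `csv.DictReader` rather than splitting on commas matters here, because quoted fields such as `"Okay, bye."` contain commas of their own.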

The .XLSX format provides a clear, comprehensive, and human-readable overview of the metadata and textual content, catering to users who prefer a more graphical data representation. In this format, timestamps are shown as HH:MM:SS.

Table showing a list of translations including metadata such as selected variant of transcription technology, language code, source language code, detected source language code, channel and timestamps.
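Because the CSV and JSON exports give segment times in seconds while the XLSX export shows HH:MM:SS, it can help to convert between the two when comparing exports. The Python sketch below truncates to whole seconds, which is consistent with the Text with timestamps sample (for example, 18.97 s appears as 00:00:18); whether the XLSX export truncates or rounds is an assumption. Both function names are hypothetical.

```python
def seconds_to_hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping the fractional part."""
    whole = int(seconds)  # truncates, e.g. 18.97 -> 18
    hours, rest = divmod(whole, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def hms_to_seconds(stamp: str) -> int:
    """Parse an HH:MM:SS timestamp into whole seconds."""
    hours, minutes, secs = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + secs


print(seconds_to_hms(18.97))       # 00:00:18
print(seconds_to_hms(3725.5))      # 01:02:05
print(hms_to_seconds("00:00:29"))  # 29
```

Note that the round trip loses the fractional part, so `hms_to_seconds(seconds_to_hms(5.78))` gives 5, not 5.78.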

JSON format

This format provides the same metadata as the CSV and XLSX formats in a machine-readable form.

{
  "one_best": {
    "segments": {
      "segments": [
        {
          "channel_number": 0,
          "start_time": 0.34,
          "end_time": 5.78,
          "language": "en",
          "text": "Hello Andreas, I just want to let you know that we decided with my brother-in-law",
          "source_language": "de",
          "detected_source_language": "de"
        },
        {
          "channel_number": 0,
          "start_time": 5.78,
          "end_time": 9.16,
          "language": "en",
          "text": "that we'd rather go to Vienna this weekend,",
          "source_language": "de",
          "detected_source_language": "de"
        },
        {
          "channel_number": 0,
          "start_time": 9.16,
          "end_time": 15.18,
          "language": "en",
          "text": "because there is an exhibition in the Nature Science Museum on the topic of crystals",
          "source_language": "de",
          "detected_source_language": "de"
        }
      ]
    }
  }
}
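The JSON export can be read with Python's standard `json` module. This sketch assumes the key path shown in the sample (`one_best`, then `segments`, then `segments`) and groups the translated segments by channel in start-time order; `segments_by_channel` is a hypothetical helper name, and other keys the export might contain are not covered.

```python
import json
from collections import defaultdict


def segments_by_channel(export: str) -> dict[int, list[str]]:
    """Group translated segment texts by channel, in start-time order."""
    data = json.loads(export)
    # Key path as it appears in the sample: one_best -> segments -> segments
    segments = data["one_best"]["segments"]["segments"]
    grouped = defaultdict(list)
    for seg in sorted(segments, key=lambda s: (s["channel_number"], s["start_time"])):
        grouped[seg["channel_number"]].append(seg["text"])
    return dict(grouped)


sample = """
{"one_best": {"segments": {"segments": [
  {"channel_number": 0, "start_time": 0.34, "end_time": 5.78, "language": "en",
   "text": "Hello Andreas, I just want to let you know that we decided with my brother-in-law",
   "source_language": "de", "detected_source_language": "de"},
  {"channel_number": 0, "start_time": 5.78, "end_time": 9.16, "language": "en",
   "text": "that we'd rather go to Vienna this weekend,",
   "source_language": "de", "detected_source_language": "de"}
]}}}
"""

for channel, texts in segments_by_channel(sample).items():
    print(channel, " ".join(texts))
```

Unlike the plain-text export, this keeps channels apart, which is useful for recordings where each speaker is on a separate channel.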