At the moment, Machine Translation (MT) and Speech Recognition (SR) cannot be fully utilized together. This is mainly because people speak continuously, so an acoustic step is needed to break the flow down into sentences or smaller segments, whose output is then sent to what is referred to as an audio optimization layer. From that point, linguistic optimization is required to improve translation accuracy, for example by marking interrogative sentences with question marks.
For translation to be effective, features of spontaneous speech such as hesitations, repetitions and self-corrections have to be cleaned up between automatic SR and MT. Microsoft has started to build a function it calls TrueText, which transforms what you actually said into what you wanted to say. It has been trained on real-world data, so it works best when handling the most common errors.
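As a rough illustration of the cleanup step that sits between SR and MT, here is a minimal Python sketch. It is not how TrueText works internally (that system is trained on real-world data); this is only a hand-written toy. The filler list, the question cue words and the function name clean_segment are all assumptions made for the example.

```python
import re

# Hypothetical filler list; a trained system would learn these from data.
FILLERS = {"um", "uh", "er", "erm", "hmm"}
# Hypothetical cue words used to guess that a segment is a question.
QUESTION_STARTERS = {"who", "what", "when", "where", "why", "how",
                     "is", "are", "do", "does", "did", "can", "could"}


def clean_segment(text: str) -> str:
    """Drop fillers and immediate word repetitions, then add end punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    cleaned = []
    for word in words:
        if word in FILLERS:
            continue  # drop hesitation sounds
        if cleaned and cleaned[-1] == word:
            continue  # drop a word that repeats the previous kept word
        cleaned.append(word)
    if not cleaned:
        return ""
    # Annotate likely interrogatives with a question mark before translation.
    mark = "?" if cleaned[0] in QUESTION_STARTERS else "."
    sentence = " ".join(cleaned)
    return sentence[0].upper() + sentence[1:] + mark


print(clean_segment("um where where is uh the the station"))
```

A rule-based approach like this only catches the simplest hesitations and repetitions. Self-corrections such as "on Monday, no, Tuesday" need a model that understands which part of the sentence is being replaced.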
It has been suggested that future developments in S2S technology might also include more ways of assessing quality than present automatic techniques such as BLEU scores.
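For readers unfamiliar with BLEU, the sketch below shows the basic idea in Python: it compares word sequences (n-grams) in a machine translation against a human reference and penalizes output that is too short. This is a simplified, unsmoothed, single-reference version written for illustration; the function names are assumptions, and production tools handle tokenization, smoothing and multiple references differently.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU against one reference, no smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, times the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat on the mat", "the cat sat on the mat"))
```

Because the score only measures word overlap with a reference, a fluent translation that uses different wording can score poorly, which is one reason additional ways of assessing quality are of interest.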
The Uses of Speech to Speech Translation
- Translation and transcription of telephone conversations.
- Speech analytics, which could be used to gauge how well a customer service agent performs. S2S technology should make it possible to extend this type of real-time analysis in the not too distant future.
- There is still a large gap in translation technology when it comes to communicating in multiple languages. Situations such as humanitarian missions, news broadcasting, and the interpreting of seminars, lectures and political speeches still lack useful speech recognition technology that can translate instantly between multiple languages.
The Cost of Breaking Down Language Barriers
There are more than 7,000 languages in the world today, and huge translation and speech databases need to be collected to support even a single language. Many different kinds of software are involved for authentic speech-to-speech translation to take place.
Challenges are Too Many to Count
Although speech recognition is an important area, younger people now prefer to communicate by sending Instant Messages (IM) or using Snapchat rather than calling.
A second, quite interesting obstacle is that in China, holding a phone up to somebody during a translated exchange could be seen as quite threatening. In this situation the technology could be built into a watch or some type of wristband instead.