"For an appreciation of young front way filed" ... "It's nothing the types"..."swirl around the case of a patient in Cottbus" ...make sense to you? If you've ever clicked "Translate this" on a webpage or underneath a Facebook post, you might be rather sceptical about the idea of a machine being able to transform whole sentences from one language to another. Sure, it can replace one language's words with their corresponding words in another language, but can they make it make sense?
The next big step in machine translation (MT) is to refine systems for highly particular domains of language – especially types of language that are repetitive and thus predictable. This allows translators and programmers to “teach” a machine how to translate specific types of text – say, a manual with very predictable jargon and sentence structure. Deepan Patel, MT solutions architect at Milengo, a translation company that specialises in machine translation, explains more: “Technical documentation which generally contains highly specialist and standardized terminology and a language style which contains minimal syntactical variation e.g. ‘Click on X to open Y’ is especially suitable for customizable machine translation and human post-editing, as you can train MT systems using defined terminology rules and language models in order to produce good, literal translations of content.”
So – good news if you need software manuals translated, bad news if you want to explain the poetry of an Argentinian proverb to a Russian: “Anything which requires more creative language input from a linguist, such as literary content, is not really suitable for machine translation.”
Though we can train machines to give ever-better approximations of what we humans actually mean, the big differentiator, as usual, is data. If you have a lot of content that has already been translated well between two languages, you can tell the machine a lot about how you want it to do its job. This is, however, not as easy as it sounds: “Essentially an MT system is only as good as the data with which you ‘feed’ it. Some companies are not very good at maintaining their language content databases (called translation memories in the industry), often key terminology has been translated inconsistently, the bilingual sentences are not always correctly aligned, and these databases contain a mixture of different registers of content e.g. you might get a database which contains blog post translations, user guides content, or even content from the marketing department all mixed in together.”
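To make the inconsistency problem a little more concrete, here is a minimal Python sketch of one check that could be run over a translation memory before it is used as training data: flagging source sentences that have been translated in more than one way. It is an illustration only, not a description of how Milengo or any particular MT vendor cleans its data. The function name, the list-of-pairs data layout and the English-German sample entries are all assumptions invented for this example.

```python
from collections import defaultdict


def find_inconsistent_segments(translation_memory):
    """Return source segments that map to more than one distinct target.

    `translation_memory` is assumed to be an iterable of
    (source_sentence, target_sentence) pairs; real translation
    memories are usually stored in richer formats.
    """
    targets_by_source = defaultdict(set)
    for source, target in translation_memory:
        # Normalise whitespace and case so trivial differences
        # in the source text are not treated as separate segments.
        key = " ".join(source.split()).lower()
        targets_by_source[key].add(" ".join(target.split()))
    # Keep only the segments that have two or more different translations.
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Hypothetical English-German entries, invented for illustration.
sample_tm = [
    ("Click on Save to open the dialog",
     "Klicken Sie auf Speichern, um den Dialog zu öffnen"),
    ("Click on Save to open the dialog",
     "Klicken Sie auf Sichern, um das Dialogfeld zu öffnen"),
    ("Close the window", "Schließen Sie das Fenster"),
]

inconsistent = find_inconsistent_segments(sample_tm)
for source, targets in inconsistent.items():
    print(source)
    for target in targets:
        print("  ", target)
```

A check like this only catches exact repeats of a source sentence. Misaligned sentence pairs and mixed registers, the other two problems Patel mentions, would need different and considerably more involved checks.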
And then there is the issue of which languages you want to translate between – the “harder” the language pairing, the more data you need to get a meaningful interpretation from a computer. Patel suggests that “Any English to Asian language MT system will require either at least double the language data for MT system building than for European languages in order to produce usable content for post-editing.”