A couple of weeks ago a prospective client told me about her intention to implement a Google auto-translate plugin (in Spanish) in her online shop. It would be created with PrestaShop. I held my breath, counted to ten and used a polite tone to tell her that it was not an ideal option. However, a jarring alarm was blaring in my brain and a huge red light was flashing brightly.
In order to prove myself right, I did a machine translation test with one of the online shop’s luxury products: a pair of Rockstud flats (“bailarinas Rockstud” in Spanish). An item priced between €600 and €900. Not each one, but the pair.
I started with Google. The quickest option and the one they wanted to implement.
As I expected, it translated the term “bailarinas” as if referring to a professional dancer and not to the item of footwear that they actually sell in the shop. Apart from the possible risk of being seen as a website promoting human trafficking in its artistic form, the shop was not reaching its target audience with this translation.
I then tried other machine translation engines. First of all, Microsoft’s Bing.
The query returned an equally erroneous result, and I kept getting more errors, until Yandex got it right.
That is certainly an impressive result, because these translation engines work on a largely statistical basis. The probability that “dancer” is the correct translation of “bailarina” is much greater than the probability of “flat shoes” being correct. That’s why most of them give us an inaccurate translation. So why does Yandex get it right? Because this probability is established on the basis of millions of words lined up in combinations of n-grams, i.e., in groups of one, two, three, four or more words. It just so happens that Yandex contains the segment “Rockstud flats” in its database, with its correct translation, while Google and Bing do not. “We have hundreds of millions of related terms, but not these”, they argue from their respective headquarters.
But that’s the way it is with today’s statistically based machine translation: if you have a similar segment, you’ll get it right, and if you don’t, you won’t. Which is why it offers such variable results. Excellent in some cases, good in quite a few cases and improvable in most. With some laughable results, it must be said.
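To make the idea concrete, here is a toy sketch in Python of how a segment lookup of this kind might behave. Everything in it is an assumption for illustration only: the two phrase tables (`ENGINE_WITHOUT_SEGMENT` and `ENGINE_WITH_SEGMENT`), the `translate` function and the greedy longest-match strategy are made up, and real engines weigh probabilities across enormous numbers of segments rather than looking things up in a small dictionary.

```python
# Hypothetical phrase tables: invented data, not taken from any real engine.
ENGINE_WITHOUT_SEGMENT = {
    ("bailarinas",): "dancers",
    ("rockstud",): "Rockstud",
}
ENGINE_WITH_SEGMENT = {
    ("bailarinas",): "dancers",
    ("rockstud",): "Rockstud",
    # The two-word segment that happens to be in this engine's data.
    ("bailarinas", "rockstud"): "Rockstud flats",
}


def translate(text, phrase_table, max_n=4):
    """Translate by greedily matching the longest known n-gram first."""
    words = text.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest n-gram starting at position i, then shorter ones.
        for n in range(min(max_n, len(words) - i), 0, -1):
            segment = tuple(words[i:i + n])
            if segment in phrase_table:
                output.append(phrase_table[segment])
                i += n
                break
        else:
            # No match of any length: pass the word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("bailarinas Rockstud", ENGINE_WITHOUT_SEGMENT))
print(translate("bailarinas Rockstud", ENGINE_WITH_SEGMENT))
```

The first table only knows the single words, so it falls back to the most common sense of “bailarinas”. The second one contains the full two-word segment, so it returns the footwear. Same input, same method, different data.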
I could recommend that my client implement the Yandex plugin instead of the Google plugin, but I’m not going to, because it could lead to errors with other product translations. Machine translation, for the time being, does not guarantee total reliability.
Amazon does not see this lack of guaranteed reliability as a problem. Faced with the large volume of content they publish on their site, they opt for the lesser evil and offer an unsupervised automated translation. They believe that this is better than nothing. To a certain extent they may be right, but with the risk of ending up with errors like the ones I describe in another post on our blog.