Machine translation (MTL) is becoming increasingly common on the web, yet its output is far from trustworthy, and the companies offering these services are obscuring that fact.

Increasing Ubiquity

Machine translation, originally, was little more than an interesting research topic. With the very limited computational resources of the 50s and lack of established research in the field, early machines were capable of handling a mere few hundred words at a time. The systems used fixed syntactical rules and a word list to attempt to replicate constructs from one language into another.

The first large-scale research effort came, like a lot of modern technology, from the military. In the 60s, the US funded research into translation from Russian into English, but noted its limitations and eventually abandoned the project because it was both more costly and less accurate than human translation.

By the 80s and 90s, more powerful computers and further research had improved the capabilities of rule-based translation systems. Research also began on systems that abandoned the rule-based approach in favor of statistical models built by analyzing large bodies of sample translation data. This approach allowed for greater flexibility, and has since grown to dominate research in the field.

Today, the explosive growth of the web, continued improvements in computing power, and a wealth of new research have made machine translation easily accessible to individuals at little or no cost. Google Translate lets anyone translate text instantly between a startling variety of language pairs. Facebook and Twitter posts are translated at the click of a button, and some modern browsers extend this treatment to entire websites. News sites, rather than bothering with the expense and time of locating a professional, sometimes run foreign material through machine translation software.¹ Google has begun showing off a prototype headset for live machine translation. Articles from reputable sources praise Google Translate as “really, really accurate”² and even “approaching human-level accuracy.”³ Almost everyone in the US has likely interacted with machine-translated content at some point.

The problem? Machine translation is now used by police⁴, government⁵, news sites, and other influential organizations, and although it is presented as reliable, it is far from it.

Existing Limitations

In this section, discussion of translation quality concerns the Japanese-English language pair, which is as prominent as any other on both Google Translate and Bing Translator, the latter being the service used to automatically translate tweets. Anecdotally, both services perform poorly on a variety of sources, from news articles to tweets, with Google marginally ahead of Bing. For a third-party opinion from an accomplished professional translator, using examples from a real Japanese game, Clyde Mandelin’s Funky Fantasy Project offers an entertaining look at Final Fantasy IV’s script run through Google Translate.

Other language pairs may fare better than Japanese-English. Spanish-English, for example, seems to perform reasonably well on common sentences, and Google claims its Chinese-English translations are highly reliable, though that claim is open to doubt. However, Google Translate offers over 10,000 language pairs, all equally prominent, with nothing to indicate how accurate any given pair is. Claims about the quality of a few individual pairs do not reflect the overall trustworthiness of the service.

Lack of Disclosure

One could argue that organizations using machine translation for important tasks are themselves solely at fault. After all, any machine translation service is going to offer a legal disclaimer absolving its provider of responsibility for mistakes, and Google’s is no exception. However, while this absolves providers of legal responsibility, the increased availability and promotion of these services means that the companies offering them carry an ethical burden. A bit of legalese hidden away on the site with a general absolution of liability does not qualify as effective disclosure of the service’s limitations.

When viewing translated tweets and Facebook statuses, users see no indication that the translation is anything but accurate. Likewise, when browsers offer to automatically translate web pages, there is no disclosure to users that the translation may not even remotely resemble the original. It is simply presented as an objective translation of the material. Even if you assume that in-app translations have no room for such disclosure, surely the same cannot be said of the services’ own sites. Yet these, too, carry no notice to users of the service’s limitations.

This was not always the case. Google Translate’s old FAQ, for example, included the following warning:

Even today’s most sophisticated software, however, doesn’t approach the fluency of a native speaker or possess the skill of a professional translator. Automatic translation is very difficult, as the meaning of words depends on the context in which they’re used.

This is an unambiguous, honest representation of even the current state of machine translation. During the site’s redesign, as part of an overhaul of the full Google brand, the decision must have been made that Translate was “good enough” to do away with such a disclaimer. It’s impossible to know who decided this or how, but it’s a step in the wrong direction. Rather than removing this information, companies should work to make it as clear as possible to users.

How to Improve

How, then, can these services more accurately represent themselves? The first and hopefully obvious way is simply to disclose that they have flaws. Whether in a popup disclaimer, a message bar, or some other visible location, a clear, unambiguous explanation of the software’s limitations would inform users that the service is not trustworthy enough for any situation where mistranslations are a concern. This is the least that these services could do in an effort to behave responsibly, and yet, disappointingly, not one of them seems to do so today.

Additionally, the statistical models used could report a confidence score, or some similar measurement, for each translation. Estimating probabilities is central to how these models work, and they are almost certainly already computing such values internally. While confidence is not necessarily reflective of accuracy, displaying it still gives users an additional piece of information with which to judge quality. Making this information available may lower user confidence in the results, but that’s the whole point! Rather than presenting everything as totally accurate, letting users evaluate for themselves whether they want to trust a given result is a viable and honest way to frame the output.
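
To make the idea concrete, here is a rough sketch of how such a measurement might be surfaced. It assumes the translation model exposes a log-probability for each word or token it outputs; that input, the function names, and the thresholds are all hypothetical choices for illustration, not a description of how Google, Bing, or any other real service works. The score is the geometric mean of the per-token probabilities, one common way to keep longer sentences from being penalized simply for their length.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric mean of per-token probabilities, a value in (0, 1].

    token_logprobs is an assumed input: one natural-log probability per
    output token, as a hypothetical translation model might report.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def confidence_label(score, low=0.4, high=0.7):
    """Map a score to a coarse label. The thresholds are arbitrary."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def annotate(translation, token_logprobs):
    """Attach a visible confidence note to a translated string."""
    score = sequence_confidence(token_logprobs)
    label = confidence_label(score)
    return f"{translation} [model confidence: {label}, {score:.0%}]"


if __name__ == "__main__":
    # Made-up log-probabilities for a three-token translation.
    print(annotate("Good morning, everyone.", [-0.1, -0.9, -1.6]))
```

As noted above, a score like this reflects how sure the model is, not whether the translation is right, so any real thresholds would need to be calibrated against human judgments for each language pair.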

Machine translation has come a long way, but still has a long way to go. Rather than pretending that we’ve reached the end, companies offering this service should do their best to inform users as to its limitations, lest a potentially useful tool be unwittingly used for a task it is not yet capable of.

Footnote: