Language mystery: Maschinenübersetzung (machine translation)


Wednesday, 12 October 2016

“We ran out of legs”

This sentence appeared this week in an Internet newspaper report. How would you translate it into other languages? Surely it can't be that difficult! Every word is short, and I'm sure you had no difficulty understanding any of the words. So what is the speaker trying to tell us?

I looked at various on-line machine translation engines to see how they would translate it into my second language, German.

I entered the English sentence “We ran out of legs.” and found four different suggestions:
Wir liefen aus Beinen (Google Translate)
Wir rannten aus Beinen (Microsoft Translation)
Wir sind an Beinen knapp geworden (Promt online translator)
Wir hatten keine Beine mehr (Systranet)

So Google and Microsoft seem to think that Beinen (legs) is a place and that we left this place by running. Promt thinks that there is a shortage in the supply of legs, and Systranet says “We had no legs left”. I looked at a few other on-line translation websites, but I found that they had simply copied from one or more of the above sites.

Time for some context?
I found the above sentence in a BBC report on Tuesday's football match between Germany and Northern Ireland. The sentence was a quote from the Northern Ireland Manager Michael O'Neill: “In the last 20 minutes of the first half we had opportunities on the counter-attack and we could possibly have done a little bit better with those. We ran out of legs a little bit to threaten them.” So in context he is saying: our legs were tired, we weren't fit enough, we couldn't run fast enough. And it turns out that one of the on-line translations would actually work in a translation of the report (the one by Systranet), although in this case I suspect that this was more by accident than by design.

Easy if you know how?
Would you have understood the sentence from the outset if I had given you the context? I'm sure most of my readers would have had no problem, although some familiarity with football jargon (in this case the frequent metaphorical use of “legs”) would be helpful. But how would you fare if a report on a football match told you that one team had “parked the bus”? Or if the German report on the same game spoke of “Beton anmischen” (mixing concrete)? Would you instantly recognise that these images denote a densely packed defensive approach to the game? And how well would you understand the use of the word “leg” in another sporting context, such as cricket (leg before, leg sweep, leg spin, leg slip, leg side, short fine leg, leg boundary)?

The lesson for today
This very simple example sentence tells us a few things about translation.

1. Context is everything. Even a very simple sentence consisting of well-known words can be a complete mystery if you don't know the situation that it refers to.

2. Dictionaries will never catch up with usage. The way words are used is constantly changing, indeed they are often used in new and unique ways at the whim of the individual writer. Writing a dictionary is like trying to pin down a moving target.

3. Computers can only go so far. Humans are creative in the way they speak and write. If you use language creatively, I can normally understand you – as long as you do it in a language that I know well. But the computer hasn't a clue what we are talking about. The computer can recognise and manipulate patterns in the data, and some computer programs can do this very very well. But if our use of language goes off into uncharted territory, the computer is often up the creek without a paddle.

4. Subject knowledge is crucial. I can understand reports on football and cricket matches because I know the games and played them myself once upon a time. But show me a report about motorcycle speedway, deep sea diving or Mah Jong, and your guess is probably much much better than mine.
Applying this to my regular translation work: I have developed expertise in translating materials such as contracts, legal reports, court papers, architectural descriptions, building specifications and similar areas. I can understand what the writers are talking about in German, and this enables me to translate their texts into English. But I would be hopeless in subject areas such as cookery and textile design, and I am uncomfortable with medical texts.

5. There's more to it than meets the eye. Translation is a highly specialised skill, and most people don't understand what it involves. And to my translator friends and colleagues: I would suggest that you specialise, learn to understand your subject areas extremely well in all of your languages, and never forget that you are offering a specialised expert service.

Wednesday, 8 May 2013

Humpty Dumpty and the TAUS quality concept

The “Translation Automation User Society” (TAUS) is a think tank which promotes the use of machine translation and technology within the translation industry. It organises events and offers services such as data sharing and language technology training. A recent article on the TAUS blog focused on the problem of quality evaluation in automated translation. It proposes a model called “dynamic quality evaluation”. This model has also been discussed on the LinkedIn group “Translation Automation”, and Rahzeb Choudhury of Leeds University kindly sent me a link to a longer report in PDF format, the Dynamic Quality Framework Report.

Looking at these materials, I find the underlying logic rather suspect, like a circular argument. It is worth considering the reasons for this.

The TAUS demographics

The Dynamic Quality Evaluation Framework report is based on a study conducted with a number of major multinational organisations (“reviewers”) which have a high volume of text which needs translation. Most of these organisations are large businesses with high volume technical products such as Dell, Google, Microsoft, Philips and Siemens. The organisations also include the EU, which has a high volume of translations between the national languages in the European Community.

In other words, the work of TAUS, at least in this particular instance, is based on a very limited sample, i.e. major international organisations with an extremely high volume of multilingual text requirements, most of which service a limited range of subject areas. There is no consideration given to highly complex and confidential legal texts which will be read in different jurisdictions, no mention of complicated architectural texts, of urban planning, high-powered business management documents and much more. Given this highly selective demographic situation, it is not surprising that TAUS claims broad agreement on certain priorities in its reports and other documents. I would suggest, however, that the translation industry is much broader than the demographic group represented by TAUS.

The part and the whole

This limited demographic sample would not in itself be a problem if TAUS freely admitted that the study deliberately focuses on a certain scenario and certain types of translation work. But the actual usage in the report exacerbates the problem and is often misleading. For example, there are frequent references to “the translation industry”, although the descriptions and conclusions actually apply to clients (and perhaps selected suppliers) in the translation technology industry working on high volume automated translation in specified subject domains.

If the work of TAUS claimed to be impartial academic research, it would take a far more self-critical approach to its own sampling procedures and would openly point out the limitations of its material. Instead, it acts like a political pressure group, presenting its results in the way that most suits its own agenda. In some of the TAUS material that I have read, I have wondered whether this confusion is deliberate, or whether it reflects a genuine inability to perceive that there are different perspectives on the issues.

Dynamic quality evaluation – a definition of convenience?

The report on “dynamic quality evaluation” uses this very problem as its starting point. It states, for example, “Quality evaluation (QE) in the translation industry is problematic”. The blog post claims “The industry needs common measurable definitions”. Both of these statements pose more questions than they answer. Which sector(s) of the translation industry is TAUS referring to? What quality is referred to, who wants to evaluate this quality, for what purpose and in what kinds of text? What measurements could be used to define something as flowing and variable as language? To what extent would industrial-scale evaluation and defined measurements miss the essential characteristics of the material they are used on?

Instead of dealing with these fundamental issues, TAUS posits a quality evaluation system with three main elements, which it calls utility, time and sentiment. We are told that utility refers to the functionality of the content, time refers to how quickly the translation is needed and sentiment denotes the effect of the resulting text on the brand image. You may notice that the actual quality of a text is not one of the three elements. So where does it come in? As far as I can gather, it seems to appear as a sub-category of “Utility” and to be marginally touched on in the category “Sentiment”. At the stroke of the categoriser's computer keyboard, the quality of the text itself is relegated to a mere sub-category.

The pinnacle of the “dynamic quality” logic is reached in the blog post. At the conference which is reported on the blog, there were apparently some participants who did not agree with the majority opinion – they advocated absolute rather than relative quality, and they felt that universal measurable standards did not do justice to the phenomenon of translation. Then comes the classic conclusion: most participants at the conference felt that “unless we maintain the simplicity of the model we get lost in endless details and personal requirements, and we end up … having no generalizable reference …”

Get yourself a cup of coffee and sit down and consider this sentence for a few moments. I would paraphrase it like this: some people argue that the world of language and translation is complicated, but we can’t handle a complex world because we could not then create the simple and measurable system that we want. We must have simplicity, so let there be simplicity. Simplicity rules, simply because we want it to rule.

This is rather like the semantic principle expressed by Humpty Dumpty in Lewis Carroll's novel “Through the Looking-Glass”: “When I use a word, it means just what I choose it to mean – neither more nor less.” It would be a wonderfully simple way to use language: I say what I want, and it means what I want. The only problem is the puzzled expression on the faces of my listeners.

The toxic disclaimer

The final section of the blog post is where TAUS dances on the borderline of imperialism. In the title of this section, and three times in the paragraphs, it mentions the possibility of applying for the “dynamic quality” system to be certified as a standard. Each time, the possibility is retracted, at least partially, rather like the Mock Turtle's song in Carroll's “Alice's Adventures in Wonderland”: “Will you, won't you, will you, won't you, will you join the dance?” In a TAUS context, this translates as “we would not be so sure that we would want to apply for official standardisation” and “Whether we go for standard certification is a decision we can take together when we get to this crossroads”.

Together? Dear TAUS, does this mean that you will gather all of the translators in the world and involve us in deciding whether to apply for certification of a standard? I think not. Your agenda seems to be domination of the translation industry rather than cooperation with real life translators. You do not look kindly on people like me who have differing opinions, far less do you take us seriously. For you, we are unwelcome “quality gatekeepers” who are “blinkered by prior assumptions”. Ho hum, I suppose Humpty would be proud of these sweeping allegations.

Unintended consequences

The occupation of Gaul by the Roman Empire gave rise to the insurrection by Asterix and Obelix in the wonderful French comics and films. Many other literary parallels come to mind, such as Luke Skywalker and the Empire, Thursday Next and Goliath Corporation, etc. If you continue to play Humpty with the values which translators hold dear, please do not be surprised when you meet opposition. Every group which aspires to global domination must expect resistance. The rhetoric adopted by TAUS and others will bring forth myriad Luke Skywalkers, and your glorious automated future will be lit up by the flash of lightsabres all over the globe.

Previous related posts on this blog

Would I advise my grandchildren to translate?

Still building Babel?

Fight the machine? (1)

Fight the machine? (2)

Wednesday, 25 April 2012

Computer language mystery solved by humans

Computers have languages, too. According to an article in American Scientist, even the experts do not agree how many programming languages there are – estimates range from 2,500 to over 8,500.

One recent example which highlighted this variety was the mystery of the programming language used in the creation of “Duqu”, a computer Trojan which has been studied by heavyweight anti-virus companies like Symantec, Kaspersky Labs and F-Secure. These IT giants were able to see the code which this Trojan consisted of, but they were not able to identify which programming language had been used to write it.

Why didn’t they ask a computer?

To me, as a mere computer user without a programming background, the solution appears simple. It is a computer language, and a computer is obviously able to follow the instructions in the code (otherwise the Trojan would be of no use to the crooks who created it). So a computer should be able to identify what language it is. This seems to be an obvious logical conclusion.

But it is not so. Igor Soumenkov, a Kaspersky Lab Expert, wrote a blog article “The Mystery of the Duqu Framework”. The article outlines the history of the study of Duqu and the structure of the threat which it poses, and it ends with an appeal which amazed me: “We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or the programming language that can generate similar code constructions, to contact us or drop us a comment in this blogpost.”

Digital guesswork?

Soumenkov received a flood of blog comments and e-mail responses, and the mystery of the programming language has now been solved. But it is interesting to check out the wording of the 159 comments on the original blog article. They are peppered with phrases like:

That code looks familiar

It may be a tool developed by ...

I think it's a ...

What about ...?

Just a guess ... the first thing that pops to my mind is ...

Sounds a lot like ...

I am not a specialist but I would say it could be ...

One more guess ...

This does smell to me a little bit like ...

I'm gonna take a wild guess ...

Plus a generous sprinkling of words like might, perhaps, maybe, probably, similar, clue, feel, remember, possibility and similar vague terms.

Data or brains?

For me, this throws an interesting light on the use of computers in natural language processing. The human guesswork in the comments on Duqu included many ideas that turned out to be wrong, but the brainstorming process was helpful to the computer experts involved, and the fuzzy process of human thinking led to a solution which evidently was not possible with the computer alone. And all of this for a language which is only useful in computers and has no meaning for human communication (when did you last _class_2.setup_class13)[esi]?).

The situation in translation between human languages is comparable. Automatic translation programs from Google, Microsoft, IBM and others can achieve a certain amount of pattern recognition and sometimes come up with plausible solutions. But only a competent human being can evaluate whether this solution is really accurate or appropriate. So these programs can be a useful tool in the hands of an expert, but there is a distinct risk that they may get the wrong end of the stick.

Friday, 2 March 2012

Would I advise my grandchildren to translate?

Bang, bang, bang.

Is this another nail in the coffin of freelance translation as a career?

A recent article on the blog of the Translation Automation User Society (TAUS) does not hold out much hope for specialist translators. The title of the article is “Who gets paid for translation in 2020?”. I would love to quote the author of this article by name, but no name is given. Perhaps this is a model article, generated by a computer, untouched by human hand. This would graphically illustrate the creed which underlies the article:

“In 2020 words are ‘free’. Almost every word has already been translated before. Our words will be stored somewhere and used again, legitimately in the eyes of the law or not. .... Even today ‘robots’ are crawling websites to retrieve billions of words that help to train machine translation engines. The latent demand for translation created by unprecedented globalization is making piracy an act of common sense.”

The TAUS vision paints a glowing picture of a completely automated future, with instant computerised translation in every hand-held device, every computer application and on every website, without any need for specialist intervention. To achieve this, TAUS aims to build up a database of all the translation work done in the world. It seems to envisage three methods to do this:

BEG, SCAVENGE and STEAL

BEG: In conference lectures, blog articles and other publications, TAUS calls on translators to donate their translations to its central database. The reward for doing this is to know that we are contributing to the BRAVE NEW WORLD of global computerised translation. There may be some payback in the form of access to databases provided by others, but the rhetoric of the begging prose is that we should contribute for free to the ideal of a humanity without language barriers.

SCAVENGE: The above quote speaks of the “robots” which are retrieving billions of translated words to train machine translation engines. But a scavenger takes everything that it can find. A scavenger cannot afford to be fussy about quality. There are two experts in the industry who have important things to say about this. First of all Kirti Vashee in his blog eMpTy Pages. Kirti is an ardent advocate of machine translation, but he insists that the data used to train the translation engines must be of extremely high quality. The danger of the TAUS vision of innumerable robots scavenging for more and more data is that this can include lots of low quality data, so the resulting translations will be inherently problematical. The other expert is Miguel Llorens, a highly insightful freelance translator who ridicules many of the assumptions of the machine translation gurus and elegantly criticises buzzwords such as the “content tsunami” and “crowdsourcing”.

As an aside: Kirti and Miguel disagree on many things - I suppose it is not often that they are recommended as two leading experts in the debate on machine translation.

STEAL: It has often been suggested that Internet giants such as Google and Facebook are in fact data-gobbling monsters which think nothing of violating data protection standards. But at least in their public statements, they usually claim to respect the privacy of their users and to comply with data protection laws. Not so TAUS. In the above quotation, TAUS explicitly suggests that piracy is “an act of common sense”. I wonder if the similarity to the confiscation of private assets in the ideology of Marx, Stalin and others is merely accidental. Brave new world indeed!

Translation and my grandchildren

By the time the brave new world predicted by TAUS comes to pass (2020), my own translation career will be drawing to a close, or will perhaps already have ended. But what about my wonderful grandchildren? They will be on the threshold of their working lives (and some will be still in primary school). What should I tell them if they ask about translation as a career?

I will say: “Why not - if that is what you are really good at.” Of course I will point out the general principles of working in a career like translation: real language expertise in two languages, realistic self-appraisal and self-management, translating skills, the need for solid specialisation, how to use the tools of the trade (including computer-aided translation and various forms of machine translation), how to advertise and find customers and much more.

This is because essentially I do not accept the TAUS creed that “Almost every word has already been translated before.” Even at the word level, in my work I regularly come across newly created terms or compound words (German legal and architectural prose has an amazing level of inventiveness in this respect). And at the sentence level, every language on earth has an incredible potential for creative new combinations of ideas and even new linguistic structures - after all, I believe that we are still building the tower and city of Babel.
