Wednesday, 19 October 2011

Deep mining with Déjà Vu X2

"Déjà Vu" is a translation memory program created by the company Atril. It stores my previous translation work in databases and uses these databases to help me in every new translation job. There are three types of database. The "translation memory" (TM) contains the sentences in my source language together with my translations of these sentences. My main TM has about 385,000 sentence pairs in my two languages (German and English), so it is effectively an archive of all the work I have done since I started using Déjà Vu in late 1999. The "termbase" (TB) contains terminology items which I have entered. My main TB has about 54,500 terminology pairs. In addition, there is a "lexicon" for each project, which is a place to put proper nouns, client-specific terminology etc. When I work on a new translation project, the program calls on these databases to offer as much help as possible. Sometimes this enables me to work much faster on a project. But most of my projects, especially contracts, have long and complicated sentences, so the speed gains are usually more modest. For me, the main benefit of Déjà Vu is as a quality tool which enables me to be more consistent in my work.
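To make the idea of these databases more concrete, here is a minimal sketch in Python of how a translation memory and a termbase might be consulted for a new sentence. It is only an illustration: the German sentences, the function names and the 0.75 similarity threshold are invented for this example, and the character-based similarity from Python's difflib module is an assumption, not a description of how Déjà Vu scores its matches.

```python
from difflib import SequenceMatcher

# Toy databases; the entries are invented for illustration only.
translation_memory = {
    "Der Käufer zahlt den Kaufpreis.": "The purchaser shall pay the purchase price.",
    "Der Verkäufer übergibt das Grundstück.": "The seller shall hand over the property.",
}
termbase = {"Kaufpreis": "purchase price", "Grundstück": "property"}


def best_tm_match(sentence, memory, threshold=0.75):
    """Return (score, source, target) for the most similar stored sentence,
    or None if no stored sentence reaches the (assumed) threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, sentence, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


def term_hits(sentence, terms):
    """Return the termbase entries whose source term occurs in the sentence."""
    return {src: tgt for src, tgt in terms.items() if src in sentence}


new_sentence = "Der Käufer zahlt den Kaufpreis sofort."
match = best_tm_match(new_sentence, translation_memory)
hits = term_hits(new_sentence, termbase)
print(match)
print(hits)
```

In this toy example the new sentence differs from a stored one by a single word, so the stored translation is offered as a fuzzy match and the termbase supplies "purchase price" as a terminology hit. The translator then only has to adapt the suggestion.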

Over the last 12 years I have seen three generations of the program. The first version was known by the abbreviation "DV3". The next generation, DVX, was released in May 2003. The latest version is DVX2, which was released in May 2011.

Each new version has new features. A list of new features in DVX2 can be found here. One new feature which has puzzled many people is "DeepMiner". The theory is that it uses both the TM and the terminology databases to retrieve even more material. But how does it work in practice? There is a training video which uses an extremely simple example to show cross-analysis between the sentences "I have a brown dog" and "I have a black dog" when translating them into French.
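The cross-analysis idea behind the dog example can be sketched in a few lines of Python. This is a hedged illustration, not Atril's algorithm: the French translations, the function names and the use of difflib to compare word lists are all assumptions made for this sketch. It compares two stored sentence pairs, finds the one place where the source sentences differ and the one place where the translations differ, and concludes that those differing words correspond to each other.

```python
from difflib import SequenceMatcher


def differing_words(a, b):
    """Return (words_in_a, words_in_b) for each place where the word lists differ."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


def infer_subsegment_pairs(pair_1, pair_2):
    """Guess word correspondences from two TM pairs that differ in one spot.

    Returns an empty dict unless both the sources and the targets differ
    in exactly one place, because otherwise the guess would be unreliable.
    """
    src_1, tgt_1 = (text.split() for text in pair_1)
    src_2, tgt_2 = (text.split() for text in pair_2)
    src_diffs = differing_words(src_1, src_2)
    tgt_diffs = differing_words(tgt_1, tgt_2)
    if len(src_diffs) != 1 or len(tgt_diffs) != 1:
        return {}
    (src_a, src_b), (tgt_a, tgt_b) = src_diffs[0], tgt_diffs[0]
    return {src_a: tgt_a, src_b: tgt_b}


# The French renderings below are assumed for this example.
tm = [
    ("I have a brown dog", "J'ai un chien brun"),
    ("I have a black dog", "J'ai un chien noir"),
]
pairs = infer_subsegment_pairs(tm[0], tm[1])
print(pairs)
```

With only two short sentences this is trivial. With hundreds of thousands of long sentence pairs, the number of comparisons and the ambiguity of the differences both grow very quickly.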

So far so good. In practice, however, my sentences are never as simple as this example, and the size of my databases means that DeepMiner has to work much harder. As a result, using DeepMiner on a largish project with big databases can be very slow. And in my experience, DeepMiner is sometimes not helpful because it tries to be too clever and reconstruct the solution from similar sentences in the TM, and in the process it may overlook what I have in my termbase and lexicon. Thankfully, it is easy to switch the DeepMiner function on or off.

So how helpful is this new function? To illustrate this, let's look at one example sentence from a complicated German land purchase and partitioning contract in two alternative versions, with and without DeepMiner:

My translation:
I make the following declarations not in my own name, but as a manager with power of sole representation of ...

Looking at the first half of the sentence, where do the phrases "my own name" and "the following declarations" come from in the example with DeepMiner? They are not in the terminology hits for this segment, and there is no whole sentence match. But the TM has many matches containing "die nachstehenden Erklärungen" and the translation "the following declarations" (although "nachstehend" on its own is only in the TB as "hereinafter"). The first three words "my own name" seem strange at first sight. Somehow, DeepMiner seems to have found a correlation between the words "ich ... im eigenen Namen" and the English "my own name", in spite of the fact that the TB entries which use "im eigenen Namen" only offer the English "its own name" and "his own name".

At least in this example, DeepMiner offers solutions which go beyond the conventional assembly and pretranslation routines in the previous version, DVX. In my experience, it is still a matter of trial and error: sometimes it finds surprisingly good suggestions, but sometimes it is not really helpful. One possible workflow to get the best of both worlds is to "Pretranslate" the whole file with DeepMiner activated and then, if the suggestion for a sentence is not helpful, to "Assemble" that individual sentence without DeepMiner. To do this, the settings for Pretranslate are:

And the settings for Assemble (under Tools>Options>General) are:

I am still experimenting to find out how DeepMiner can be used to best advantage, so perhaps I will be able to add more insights at a later date. Before too long (hopefully) I will comment on some of the other features of DVX2 such as AutoWrite, the information design options in the variable grid layout etc.