Wednesday 19 October 2011

Deep mining with Déjà Vu X2

"Déjà Vu" is a translation memory program created by the company Atril. It stores my previous translation work in databases and uses these databases to help me with every new translation job. There are three types of database. The "translation memory" (TM) contains the sentences in my source language together with my translations of these sentences. My main TM has about 385,000 sentence pairs in my two languages (German and English), so it is effectively an archive of all the work I have done since I started using Déjà Vu in late 1999. The "termbase" (TB) holds terminology items which I have entered. My main TB has about 54,500 terminology pairs. In addition, there is a "lexicon" for each project, which is a place to put proper nouns, client-specific terminology etc. When I work on a new translation project, the program calls on these databases to offer as much help as possible. Sometimes this enables me to work much faster on a project. But most of my projects have long and complicated sentences, especially contracts, so the speed gains are usually more modest. The main benefit of Déjà Vu for me is as a quality tool which enables me to be more consistent in my work.

Over the last 12 years I have seen three generations of the program. The first version was known by the abbreviation "DV3". The next generation, DVX, was released in May 2003. The latest version is DVX2, which was released in May 2011.

Each new version has new features. A list of new features in DVX2 can be found here. One new feature which has puzzled many people is "DeepMiner". The theory is that it uses both the TM and the terminology databases to retrieve even more material. But how does it work in practice? There is a training video which uses an extremely simple example to show cross-analysis between the sentences "I have a brown dog" and "I have a black dog" when translating them into French.
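
To make the idea of cross-analysis more concrete, here is a toy sketch in Python. It is not Atril's algorithm, which is not documented in this post. It only illustrates the general principle: if two source sentences differ in one place and their translations also differ in one place, the differing fragments can be paired up. The French translations and the function name cross_analyse are my own assumptions for the illustration.

```python
from difflib import SequenceMatcher


def cross_analyse(pair_a, pair_b):
    """Compare two (source, target) sentence pairs word by word and
    return the differing fragments as (source_fragment, target_fragment).

    Toy illustration only: it assumes exactly one fragment differs on
    each side, which is rarely true for real sentences.
    """
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b

    def differing(a, b):
        a_words, b_words = a.split(), b.split()
        matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
        # Keep only the spans where one run of words replaces another.
        return [
            (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag == "replace"
        ]

    src_diff = differing(src_a, src_b)
    tgt_diff = differing(tgt_a, tgt_b)

    # Only trust the result when there is a single difference on each side.
    if len(src_diff) == 1 and len(tgt_diff) == 1:
        (s_a, s_b), (t_a, t_b) = src_diff[0], tgt_diff[0]
        return [(s_a, t_a), (s_b, t_b)]
    return []


# Assumed French translations of the two sentences from the training video.
tm = [
    ("I have a brown dog", "J'ai un chien brun"),
    ("I have a black dog", "J'ai un chien noir"),
]
pairs = cross_analyse(tm[0], tm[1])
print(pairs)
```

With sentences this simple, the sketch pairs "brown" with "brun" and "black" with "noir". As soon as word order or sentence length differs between the two languages, a naive comparison like this returns nothing, which hints at why real sentences are so much harder.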

So far so good. In practice, however, my sentences are never as simple as this example, and the size of my databases means that DeepMiner has to work much harder. As a result, using DeepMiner on a largish project with big databases can be very slow. And in my experience, DeepMiner is sometimes not helpful because it tries to be too clever and reconstruct the solution from similar sentences in the TM, and in the process it may overlook what I have in my termbase and lexicon. Thankfully, it is easy to switch the DeepMiner function on or off.

So how helpful is this new function? To illustrate this, let's look at one example sentence from a complicated German land purchase and partitioning contract in two alternative versions, with and without DeepMiner:

My translation:
I make the following declarations not in my own name, but as a manager with power of sole representation of ...

Looking at the first half of the sentence, where do the phrases "my own name" and "the following declarations" come from in the example with DeepMiner? They are not in the terminology hits for this segment, and there is no whole sentence match. But the TM has many matches containing "die nachstehenden Erklärungen" and the translation "the following declarations" (although "nachstehend" on its own is only in the TB as "hereinafter"). The first three words "my own name" seem strange at first sight. Somehow, DeepMiner seems to have found a correlation between the words "ich ... im eigenen Namen" and the English "my own name", in spite of the fact that the TB entries which use "im eigenen Namen" only offer the English "its own name" and "his own name".
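
One plausible way to picture how a phrase can be recovered from the TM without a termbase entry is to count which target phrases keep turning up whenever the source phrase appears. The sketch below is only my guess at the general principle, not a description of what DeepMiner actually does. The two TM entries are invented for the illustration, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(words, max_len=3):
    """Return the set of all word n-grams up to max_len words long."""
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def best_target_phrase(source_phrase, tm, max_len=3):
    """Return the longest target n-gram that occurs in every TM entry
    whose source side contains source_phrase (case-insensitive)."""
    needle = source_phrase.lower()
    hits = [tgt for src, tgt in tm if needle in src.lower()]
    if not hits:
        return None
    counts = Counter()
    for tgt in hits:
        # Each entry contributes at most once per n-gram.
        counts.update(ngrams(tgt.lower().split(), max_len))
    common = [phrase for phrase, count in counts.items() if count == len(hits)]
    # Prefer the longest phrase; break ties alphabetically for stable output.
    return max(common, key=lambda p: (len(p.split()), p), default=None)


# Invented TM entries, for illustration only.
toy_tm = [
    ("Die nachstehenden Erklärungen sind bindend",
     "The following declarations are binding"),
    ("Der Käufer gibt die nachstehenden Erklärungen ab",
     "The purchaser makes the following declarations"),
]
suggestion = best_target_phrase("die nachstehenden Erklärungen", toy_tm)
print(suggestion)
```

In this toy TM the only three-word phrase shared by both translations is "the following declarations", so that is what gets suggested, even though no termbase entry exists for it. A real TM with hundreds of thousands of entries would need far more careful statistics, which may also explain the odd suggestions that crop up from time to time.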

At least in this example, DeepMiner offers solutions which go beyond the conventional assembly and pretranslation routines in the previous version of DVX. In my experience, it is still a matter of trial and error - sometimes it finds surprisingly good suggestions, but sometimes it is not really helpful. One possible workflow to get the best of both worlds is to "Pretranslate" the whole file with DeepMiner activated and then, if the solution is not helpful, to "Assemble" the individual sentence without DeepMiner. To do this, the settings for Pretranslate are:

And the settings for Assemble (under Tools>Options>General) are:

I am still experimenting to find out how DeepMiner can be used to best advantage, so perhaps I will be able to add more insights at a later date. Before too long (hopefully) I will comment on some of the other features of DVX2 such as AutoWrite, the information design options in the variable grid layout etc.

9 comments:

  1. Thanks Victor, nice analysis. I will be sending it to some of my friends.

  2. Thanks Charles, glad you found it helpful.

  3. Thanks, Victor. I am looking into alternatives to SDL Studio and this is a very enlightening analysis of the features which DV offers.

  4. Thanks for your comment Nicolas. More could be said, of course. If you think DVX2 might fit the bill for you, it could be worthwhile downloading the 30 day demo from the Atril website (www.atril.com). It is free and works without any limitations.

  5. Thanks for the introduction to Deep mining, Victor. Déjà Vu is looking more and more appealing to me.

  6. Hi Alejandro, I'm sure Mina would approve. Not sure about Pam and Calvo though.

  7. Thanks for taking the trouble to share this. BTW, is it me or has Atril not updated the Help/Instructions on DVX2? There is no reference to DeepMiner, nor to CodeZapper, supposedly two of their supa-dupa innovations.
    Anyway, my experience of DeepMiner has been much more sobering (I have only been using it for a week). Here are two examples.
    In a translation of a restaurant guide (DE>EN) it suggested "and" as a translation for "der". Nowhere in my TBs is there der=and. In another segment it suggested "Küchenchef" = "and his team". Later, by looking through the TM, I found the original segment, which was "Küchenchef xxxx und sein Team", and in fact "Küchenchef" was even in the TB as a single word, yet DM had failed to find it. Under Translation Menu>Pretranslate, one of the options is "For Portions not found in Databases>Use DM/Use MT/...", which implies it looks in the DBs first.
    Not too impressive. I hope I can fine-tune it later...

  8. Hi Gordon. There is more information on DeepMiner on the Atril website. http://helpdesk.atril.com/index.php?pid=tutorials&cmd=viewcatclient&id=32 offers a choice of manuals. DM is dealt with on pages 152-160 (DVX2Pro) or 154-162 (Workgroup). There is also a video at http://helpdesk.atril.com/index.php?pid=tutorials&cmd=viewcatclient&id=32
    Like you, I suspect that DM looks so intensively in the TMs that it may overlook TB material. As I wrote in the blog, the results are not always helpful. But it is another resource in DVX2, and as I explain above, there are ways to get the best of both worlds.

  9. I find your explanation of DeepMiner really illuminating. From Atril's video alone, it was not really clear to me how DeepMiner works or what it is actually good for. Many thanks!
