Language mystery: DVX2


Tuesday, 17 January 2012

12 facts, hints and ideas on databases in DVX2

Déjà Vu X2 is a “Translation Memory” program (TM). It does not come with pre-packaged language content. Instead, it remembers your own work, i.e. it acts as a “memory” for what you have “already seen” (= “déjà vu” in French).

1. There are three types of memory:
The TM (Translation Memory), the TB (Termbase) and the lexicon for each project.

The TM is a database where you can save the sentences from your source text together with your finished translation.
The TB is a terminology database which you can use for single words or whole phrases.
The lexicon is a database which only applies to the individual project. For every project file you can create a new lexicon.

When you then work on your project DVX2 combines the content of these three database types to suggest translations and help you in your work. The methods which DVX2 uses to make these suggestions are known as “Pretranslate”, “Assemble” and “AutoAssemble” – but that is another topic for another day.

2. Big Mama and Big Papa:
You can keep all of your work in just one TM (“Big Mama”) and one TB (“Big Papa”). If you are careful to give your entries the appropriate subject and client codes, DVX2 will take these codes into account when suggesting translations from your databases. My main TM contains about 400,000 sentence pairs accumulated over 12 years, and my main TB has about 55,000 entries.

3. Separate TMs and TBs:
In DVX2 Professional you can have up to 5 TMs and 5 TBs open in any project, and DVX2 Workgroup has no limitation. So you can use your Big Mama/Papa together with external databases, e.g. a TM or terminology list provided by the client, general reference material such as the EU DGT database, or terminology lists from major enterprises such as Microsoft, SAP or from various banks. Or you may even decide to keep separate databases for different subjects or clients instead of a Big Mama or Big Papa. You may feel that this is safer if you work on texts for competing engineering or IT firms which deliberately use different terminology for their own brands. The problem is that it may be more difficult to access all of your reference material, for example if you know that you have dealt with a term or sentence in DVX2, but you can’t remember which database you were using at the time.

4. Fuzzy matching:
You can allow DVX2 to find matching material which is not quite exact. Under Tools>Options>General you can set a percentage figure for the variants which DVX2 is allowed to find (= “Minimum Score”). The default setting is 75%, but depending on the type of inflections which occur in your languages it may be useful to set it to 50% or less. The percentage applies to both the TM and TB. It does not apply to the lexicon – only exact matches are found in the lexicon. And the “minimum score” does not affect the performance of the DVX2 functions DeepMiner and AutoWrite.
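
The effect of a minimum score can be illustrated with a small sketch. This is not how DVX2 calculates its scores (the program's own metric is not described here); it uses the similarity ratio from Python's standard `difflib` module as a stand-in, and the tiny termbase and the function names are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Similarity as a percentage (difflib ratio, NOT DVX2's own metric)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(term, entries, minimum_score=75.0):
    """Return (score, source, target) for every entry at or above the threshold."""
    hits = [(round(score(term, src), 1), src, tgt) for src, tgt in entries.items()]
    hits = [hit for hit in hits if hit[0] >= minimum_score]
    return sorted(hits, reverse=True)  # best match first


# Hypothetical mini termbase: one uninflected form per term.
termbase = {
    "öffentlich": "public",
    "Gebäude": "building",
    "Grünfläche": "green space",
}

for inflected in ["öffentlichen", "Gebäudes", "Grünflächen"]:
    print(inflected, fuzzy_lookup(inflected, termbase, minimum_score=75))
```

With a lower threshold more inflected variants are caught, but so are more unrelated words, which is the trade-off behind choosing a figure such as 75% or 50%.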

5. Adding new entries:
This is very quick and easy in DVX2. For the TM you enable AutoSend (either with the tick box at Tools>Options>Environment, or via the icons at the bottom of the DVX2 window – AutoSend is the second icon from the right). Then all you need to do is press CTRL-DownArrow when you have finished each segment. For the lexicon you have to highlight the word or phrase in the source and target text, then hit the F10 key. For the TB you again highlight the word or phrase in the source and target text, then hit F11. This brings up the following window:

Here you can edit the term in either language to add or remove declensions, correct spelling problems etc. You can check that the terms are marked with the right subject and client codes. There are additional fields, too (Definition, Part of Speech, Gender, Number, and you may also see a field called Context). I have not yet seen any reason to use any of these fields, although some users may have found ways to do so.
The termbase (TB) is one of the keys to productivity in DVX2. It is advisable to add words, and even whole phrases, as often as you can. Some users have the principle of adding an entry to the TB in every single sentence they translate. Steven Marzuola’s article about using the terminology database was based on the previous version of DVX (now often called DVX1), but it offers great advice which is also relevant to DVX2.

6. Subject and client codes:
These are important, because DVX2 refers to them when it decides what material to offer to help you with your current translation. When you first install DVX2, you will see a suggested list of subjects, but you can easily delete this and create your own list if you think this is better for your work. Each subject consists of a short index code (435 in my example above) and a descriptive text (Regional planning/ecology). When DVX2 decides how close the subject is to your current project, it works hierarchically, so in this example it would consider that entries with my subject codes 43 (Urban planning) and 4 (Building) are closely related. You can use letters instead of numbers if this suits your work.
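
One way to picture this hierarchy is to treat each code as a path in a tree, so that a shorter code is the parent of every longer code that starts with it. The sketch below is only my reading of that idea, not DVX2's actual ranking logic. The function name and the subject "7" are hypothetical; the other three codes are the ones mentioned above.

```python
def subject_distance(entry_code: str, project_code: str) -> int:
    """Steps between two codes in the code tree (0 = identical subject).

    Assumption: relatedness depends only on the shared leading characters.
    """
    shared = 0
    for a, b in zip(entry_code, project_code):
        if a != b:
            break
        shared += 1
    return (len(entry_code) - shared) + (len(project_code) - shared)


subjects = {
    "4": "Building",
    "43": "Urban planning",
    "435": "Regional planning/ecology",
    "7": "Finance",  # hypothetical unrelated subject, for contrast only
}

project = "435"
for code in sorted(subjects, key=lambda c: subject_distance(c, project)):
    print(subject_distance(code, project), code, subjects[code])
```

Because the comparison works character by character, it behaves the same way for codes made of letters.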

7. Build lexicon:
This is a function which you can find in the “Lexicon” menu, and which is sometimes useful in preparation for a job which is heavy on terminology. I use this function for between 5% and 10% of my jobs. My procedure is as follows. First I call up “Build lexicon” and define the maximum number of words (usually 4). The program then takes a couple of minutes to find solutions. Then I open the lexicon (with the Project Explorer), click on the heading over the left hand column and define the sort criteria: 1. Number of words (descending), 2. Frequency (descending). Then I go through the list manually from the top. First I decide which four-word phrases are worth adding a lexicon entry for. This is usually only worthwhile for phrases which are meaningful in themselves and which occur frequently. When I get down to phrases which appear three times or less, I then use the scroll bar to move down to the most frequent three-word phrases. And so on, until I have defined a number of lexicon entries. Then I select “Remove entries” from the Lexicon menu, click on “Entries with empty targets” and OK. Typically, this gives me between 30 and 50 lexicon entries for a job consisting of several hundred segments, but they are entries which occur frequently and require consistency, so this preliminary process improves the results achieved by Pretranslate or Assemble as I work on the job.
This function (Build lexicon) can also be used to identify terms that can be used for a terminology list to be delivered to the client if this is part of the client’s instructions for the job. Over the years I have only had one such project, but this may be relevant for translators who often work in highly technical fields.
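
For readers who like to see the logic spelled out, the procedure above can be approximated in a few lines. This is a simplified sketch under my own assumptions (plain whitespace tokenisation, hypothetical function names), not a description of what DVX2 does internally: it counts every phrase of up to four words, sorts by number of words and then by frequency (both descending), and finally drops the entries which were left without a target.

```python
from collections import Counter


def build_lexicon(segments, max_words=4):
    """Count every phrase of 1..max_words consecutive words in the source text."""
    counts = Counter()
    for segment in segments:
        words = segment.split()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def sorted_candidates(counts):
    """Sort by number of words (descending), then frequency (descending)."""
    return sorted(counts.items(), key=lambda item: (-len(item[0].split()), -item[1]))


def remove_empty_targets(lexicon):
    """Keep only the entries that were given a translation."""
    return {source: target for source, target in lexicon.items() if target}


segments = ["the public green spaces", "public green spaces are open"]
counts = build_lexicon(segments)
for phrase, frequency in sorted_candidates(counts)[:6]:
    print(frequency, phrase)
```

The manual step in the middle, deciding which phrases are meaningful enough to translate, is of course the part that no script can do for you.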

8. Names, places and proprietary titles:
These are the classic elements which should be added to the lexicon. If you have a product name or number, this is normally only relevant to the job in hand. You do not usually want this term to occur in jobs for other clients. The same applies to the names of the people who work for the client. Therefore, such elements should only be sent to the lexicon, and not to the termbase. But some names occur so often that they may be useful in the TB. My general principle here: if names could be confused with actual words in the language, they are not suitable for the TB. So the common German name Helmut is not in my TB because, depending on the level of fuzzy matching, it could be confused with the word Helm=helmet (and the declined forms Helme/Helmen/Helmes). Similarly, the surname Kohl is not in the TB to avoid confusion with Kohl=cabbage (and the near-match Kohle=coal). But the two names together are in the TB – i.e. the former German Chancellor Helmut Kohl. And other famous politicians are there too with the spelling in German and English, such as Gorbatschow/Gorbachev.

9. Adapting your use of the databases to your languages:
In some cases, your language pair and translation direction will influence the way you use the different databases because of issues such as word order and inflection. One example of this is the English phrase “public green spaces”. In French the words come in a different order, e.g. “espaces verts publics”, and alternative wordings are possible, e.g. “espaces verts des lieux publics”, “espaces verts ouverts au public”, “espaces verts pour le public” etc. (Thanks to Dave Turner for providing these and other examples). In German the first translation that comes to mind is “öffentliche Grünflächen”, although the first word could also be declined as “öffentlichen”.
If you are translating from French to English, you will probably want to enter each and every French phrase as a lexical unit, especially if it occurs frequently in the type of text you deal with. Merely entering the elements does not help very much, because the order of the words must be changed. Depending on your type of work and the frequency of such phrases, you may decide to store them in the lexicon, the TB or the TM.
If you are translating from German, in this case it is sufficient to add the two words to the termbase and let DVX2 handle the endings as “fuzzy matches”. Even if we consider phrases with a greater number of inflected variants such as “public building”, (“öffentliche Gebäude”, “öffentliches Gebäude”, “öffentlichen Gebäudes”, “öffentlichem Gebäude”), it is still possible to enter just one version of each word and use fuzzy matching. The advantage here is that although the German source is inflected, the English target phrase is not.
Translating from a largely uninflected language into inflected languages like French and German can be more complicated, so you will have to find a strategy which fits the languages that you work with. There is no single solution which will work for all languages and all subject areas, but DVX2 offers flexibility in the use of the databases.

10. Looking things up in the database:
There are various ways to access the information that is in your databases. The first is that DVX2 uses this information to compile its suggested translation (when you use the functions “Pretranslate”, “Assemble” or “AutoAssemble”). The second follows from the first: some words or phrases in the suggested translation are underlined in blue. These are terms for which your databases contain several possibilities. Right-clicking on the word or phrase will show you the other suggestions, and you can examine these and select them with the mouse or by using the number shown. The third way to see the relevant content of your database is by looking at the “Portions” window or windows. There are several screenshots illustrating this here. The fourth way to look up the information is to use Scan (CTRL-S) to call up a concordance from the TM, or Lookup (CTRL-L) to see entries from the TB.

11. Moving databases to another computer:
If you need to move your work to a different computer, e.g. to work on a laptop while you are travelling, you will need to copy certain files to the other computer. The first file is your project file, which has the extension .dvprj. The project file contains the lexicon, so no special steps are needed to transfer the lexicon. The termbase is a single file with the extension .dvtdb. The TM consists of at least four files. The main content is in a file with the extension .dvmdb. Then there is an index file for each of your languages; my index files have the extension en.dvmdi and de.dvmdi (for English and German). There is also a file with the extension .dvmdx. When you open the project on the other computer, DVX2 may complain that it cannot find the databases. But this is not a problem – when the project is open, you can select them with Project>Properties>Databases.
Another file which is worth moving to the other computer is the settings file with the extension .dvset. This contains your subject and client lists and various other settings. And don’t forget your dongle, or if you use an electronic licence key, make sure that the key will apply to the other computer.
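
If you move your work often, the copying can be scripted. The sketch below simply collects every file with one of the extensions listed above and copies it to another folder. It assumes, for simplicity, that the project, the databases and the settings file all sit in one folder, which will not be true for every setup. The function names and folder paths are hypothetical.

```python
from pathlib import Path
import shutil

# File extensions named in the text above.
DVX2_EXTENSIONS = {".dvprj", ".dvtdb", ".dvmdb", ".dvmdi", ".dvmdx", ".dvset"}


def collect_dvx2_files(folder):
    """List the DVX2 files found directly in the given folder."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in DVX2_EXTENSIONS
    )


def copy_for_travel(source_folder, target_folder):
    """Copy all DVX2 files to the target folder and return their names."""
    target = Path(target_folder)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in collect_dvx2_files(source_folder):
        shutil.copy2(path, target / path.name)  # copy2 keeps the timestamps
        copied.append(path.name)
    return copied


# Example call (hypothetical paths):
# copy_for_travel("D:/DVX2/work", "E:/laptop_transfer")
```

Index files such as `en.dvmdi` and `de.dvmdi` are picked up because only the last part of the file name is compared with the list.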

12. How to find out more:
For more detailed information it is worth looking at the DVX2 User Guide for DVX2 Professional or DVX2 Workgroup. The link is at the bottom of the page, and the user guides are PDF files with over 600 pages. On the website http://www.atril.com there are also links to various videos, webinars and training courses, and also to the mailing list dejavu-l (under Support>Technical forum).
I already mentioned Steven Marzuola’s article on terminology databases. It is also worth looking at Nelson Laterman’s collection of tips and tricks for DVX1 (and even its predecessor DV3).
I am sure there are plenty of tips and questions which I have not covered, so I am looking forward to reading comments by my readers.

Friday, 11 November 2011

DVX2 screenshot gallery

At first sight, the screen of the Translation Memory program Déjà Vu X2 (DVX2) is just a mass of boxes, a chaotic pattern of vertical and horizontal lines. What are they all for? Where in this enormous jigsaw puzzle can I find the text I want to translate? What other information is provided on the screen, and how is it helpful? The best way to explore this is with screenshots.

The classic layout

When you start working on a project with DVX2, the screen will probably look something like this. The pane at the top left is the working area. The left column is headed "German" - that is my source language. The right column, English (United Kingdom), is where my translation goes.

At the bottom left and bottom right of the screen I can see my reference material. At the bottom right I have terminology suggestions ("AutoSearch Portions"), and at the bottom left I have similar sentences ("AutoSearch Segments"). The top right ("Project Explorer") shows me the files in the project. When I am working on the translation, I normally hide this pane so that I have the full window height for the terminology.

There are various ways to personalise this layout. I can change the font and type size in the various windows, and I can also change the arrangement of the different panes in the working window.

My personal layout

Modern monitors, laptops and netbooks tend to have a wide screen. There is not much space to display elements above each other, so it is sometimes better to display the elements side by side. Therefore, my normal DVX2 screen looks like this:

In this "tramline" layout, the working area is in the middle of the screen and the reference material is arranged to the right and left. It provides more context (i.e. the text before and after the active sentence). The shorter lines could be a disadvantage for longer sentences, and especially on smaller screens. The above screenshot is taken from my 22" monitor. On my 10" netbook, this layout is rather more cramped, although it would be just about workable:

One way to make the lines longer in the working area is to work in a separate text area at the bottom of the screen and to split this text area vertically (Tools>Options>Environment). The active sentence is highlighted in the grid, but the working area is now at the bottom, i.e.:

I often get jobs with very long sentences, and sometimes the reference pane on the left is empty for most segments. In such jobs, I can simply hide this column, which gives me longer text lines even without using the separate text area:

Hide and display

In the last screenshot, note the little tabs on the left and right of the screen. They are "mouse-over" tabs. If I want to have a quick look at "AutoSearch Segments", I simply move the mouse over the tab, and the AS Segments pane opens up, but closes again when I return the mouse to the main grid.

Note also the little drawing pin icon at the top right of the "AS Portions" pane. This is a three-way switch for the display of this pane. It can either be fully displayed, as it is here, folded away like the "AS Segments" pane, or it can hover as in the mouse-over function. The combination of the tabs and the drawing pin icons takes a bit of practice, but it helps me to be flexible in using the screen layout.

Smaller details

There are a number of smaller details in the screen layout which can be useful.

The top of the DVX2 window shows the name and path of the current project. For example, the project I used for these screenshots is on drive D at the location shown.

These six icons are in the middle of the bottom edge of the DVX2 window. Mousing over them displays what they mean - here I had the mouse over the first icon (AutoWrite). The background colour shows me whether the function is on or off. Here, for example, AutoWrite, AutoAssemble, AutoPropagate and AutoCheck are enabled, but AutoSearch and AutoSend are disabled. These functions can also be switched on or off via Tools>Options>Environment, but the icons are quicker.

This is the area above the working part of the grid, and it contains a few hidden details. The grid language heading boxes (here "German" and "English") switch between alphabetical and chronological view of the project sentences. The language field with the flag has a little arrow to the right, which leads to a list of the target languages in the project (useful for project managers, but not usually for freelancers like me). The box "All segments" also has a little arrow, which opens up a list of types of sentence (all fuzzy matches, all exact matches etc.). The empty box on the left is a row finder. If I know the number of a segment, I can type it here, and DVX2 jumps to that segment (useful if I am proofreading and notice that a segment needs more work when I have finished proofing - I simply jot down the number and jump to the segment afterwards).

The tabs above this row show the name of the files which I have opened, so I can move to another file simply by clicking the tab. That in itself does not sound special. But these tabs can also be used to display files side by side (or one above the other). I can then compare my work on two files in context, for example like this:

This article only looks at the main grid, in other words the screen which I usually see when I work on a project. It does not explore the menu or any of the subsidiary screens, nor does it examine the efficiency of the many functions of the program. But I hope that this visual summary gives a general impression of the working environment.

Wednesday, 19 October 2011

Deep mining with Déjà Vu X2

"Déjà Vu" is a translation memory program created by the company Atril. It stores my previous translation work in databases and uses these databases to help me in every new translation job. There are three types of database. The "translation memory" (TM) contains the sentences in my source language together with my translations of these sentences. My main TM has about 385,000 sentence pairs in my two languages (German and English), so it is effectively an archive of all the work I have done since I started using Déjà Vu in late 1999. The "termbase" (TB) has terminology items which I have entered. My main TB has about 54,500 terminology pairs. In addition, there is a "lexicon" for each project, which is a place to put proper nouns, client-specific terminology etc. When I work on a new translation project, the program calls on these databases to offer as much help as possible. Sometimes this enables me to work much faster on a project. But usually I have projects with long and complicated sentences, especially contracts, so the speed gains are usually more modest. The main benefit of Déjà Vu for me is as a tool for quality which enables me to be more consistent in my work.

Over the last 12 years I have seen three generations of the program. The first version was known by the abbreviation "DV3". The next generation, DVX, was released in May 2003. The latest version is DVX2, which was released in May 2011.

Each new version has new features. A list of new features in DVX2 can be found here. One new feature which has puzzled many people is "DeepMiner". The theory is that it uses both the TM and the terminology databases to retrieve even more material. But how does it work in practice? There is a training video which uses an extremely simple example to show cross-analysis between the sentences "I have a brown dog" and "I have a black dog" when translating them into French.
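
The idea of cross-analysis can be sketched in a few lines of code. This is only my own toy illustration of the principle, not Atril's algorithm: it compares two TM entries which differ in a single word on each side and concludes that the differing words translate each other. The French sentences are assumed for the purposes of the example, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def replaced_spans(a_words, b_words):
    """Return the (old, new) word spans that differ between two word lists."""
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


def cross_analyse(pair_a, pair_b):
    """Infer sub-segment translations from two (source, target) pairs.

    Only the simplest case is handled: exactly one differing span per side.
    """
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b
    src_diff = replaced_spans(src_a.split(), src_b.split())
    tgt_diff = replaced_spans(tgt_a.split(), tgt_b.split())
    if len(src_diff) == 1 and len(tgt_diff) == 1:
        (s1, s2), (t1, t2) = src_diff[0], tgt_diff[0]
        return {s1: t1, s2: t2}
    return {}  # too many differences to draw a conclusion


tm = [
    ("I have a brown dog", "J'ai un chien brun"),  # assumed translation
    ("I have a black dog", "J'ai un chien noir"),  # assumed translation
]
print(cross_analyse(tm[0], tm[1]))
```

Real sentences rarely differ in just one word, so a practical implementation would need statistics over many sentence pairs rather than a single comparison.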

So far so good. In practice, however, my sentences are never as simple as this example, and the size of my databases means that DeepMiner has to work much harder. As a result, using DeepMiner on a largish project with big databases can be very slow. And in my experience, DeepMiner is sometimes not helpful because it tries to be too clever and reconstruct the solution from similar sentences in the TM, and in the process it may overlook what I have in my termbase and lexicon. Thankfully, it is easy to switch the DeepMiner function on or off.

So how helpful is this new function? To illustrate this, let's look at one example sentence from a complicated German land purchase and partitioning contract in two alternative versions: with and without DeepMiner:

My translation:
I make the following declarations not in my own name, but as a manager with power of sole representation of ...

Looking at the first half of the sentence, where do the phrases "my own name" and "the following declarations" come from in the example with DeepMiner? They are not in the terminology hits for this segment, and there is no whole sentence match. But the TM has many matches containing "die nachstehenden Erklärungen" and the translation "the following declarations" (although "nachstehend" on its own is only in the TB as "hereinafter"). The first three words "my own name" seem strange at first sight. Somehow, DeepMiner seems to have found a correlation between the words "ich ... im eigenen Namen" and the English "my own name", in spite of the fact that the TB entries which use "im eigenen Namen" only offer the English "its own name" and "his own name".

At least in this example, DeepMiner offers solutions which go beyond the conventional assembly and pretranslation routines in the previous version of DVX. In my experience, it is still a matter of trial and error - sometimes it finds surprisingly good suggestions, but sometimes it is not really helpful. One possible workflow to get the best of both worlds is to "Pretranslate" the whole file with DeepMiner activated and then, if the solution is not helpful, to "Assemble" the individual sentence without DeepMiner. To do this, the settings for Pretranslate are:

And the settings for Assemble (under Tools>Options>General) are:

I am still experimenting to find out how DeepMiner can be used to best advantage, so perhaps I will be able to add more insights at a later date. Before too long (hopefully) I will comment on some of the other features of DVX2 such as AutoWrite, the information design options in the variable grid layout etc.
