Internal fuzzy matches explained

Most translation tools can produce analysis reports that include 100% matches, fuzzy matches and internal repetitions. Some tools can also report “internal fuzzies” or “fuzzy repetitions”: segments that have no match in the TM but are quite similar to each other within the document (or set of documents) to be translated.

This video shows how three tools implement this feature: memoQ translator pro (homogeneity analysis), Wordfast Pro (internal fuzzy matches) and SDL Trados Studio (internal fuzzy matches). Explanations from the online help of each tool:

memoQ: Analysis against the segments within the selected scope is called homogeneity analysis. This is one of memoQ’s power features. Check this check box to emulate building a translation memory during translation, and see the savings that will result from the internal similarities within the project. Using homogeneity, you are able to see the benefits of your future contribution – i.e. the contribution while you will be translating – to the translation memory. You are also able to give a much better estimation of your resources to be spent on translation than without homogeneity. If you use the analysis to give a quotation, always look for the aggregate results as they reflect the real productivity gain through using memoQ.

Wordfast Pro: Calculate internal fuzzy matches in source files. For example, if there is a partial repetition of segments in a source file, it will be calculated as an internal fuzzy match. Select the percentage that should be calculated for such segments.

SDL Trados Studio: When this option is selected, the Analyze Files report shows the internal fuzzy match word counts. Internal fuzzy match analysis calculates the maximum additional leverage that can be obtained by the translator interactively translating the document with a translation memory. It assumes that the translator will translate the document from start to end, segment by segment. After each segment is confirmed, the translation memory is updated, and the best match applied to the next segment.
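The segment-by-segment logic described in the Studio help can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not how any of these tools actually computes its figures: the function names, the 75% threshold and the word-level score based on Python's difflib are all hypothetical stand-ins for each vendor's own fuzzy-matching algorithm, and the sketch starts from an empty TM.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1].

    Assumption: difflib's ratio stands in for a tool's proprietary fuzzy score.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def internal_fuzzy_analysis(segments, threshold=0.75):
    """Walk the segments in order, simulating a TM that grows as each one is confirmed."""
    simulated_tm = []  # segments "already translated" earlier in the document
    report = []
    for segment in segments:
        # Best score against everything confirmed so far (0.0 for the first segment).
        best = max((similarity(segment, seen) for seen in simulated_tm), default=0.0)
        if best == 1.0:
            category = "repetition"
        elif best >= threshold:
            category = "internal fuzzy"
        else:
            category = "no match"
        report.append((segment, category, round(best, 2)))
        simulated_tm.append(segment)  # the "confirmed" segment is now available as a match
    return report


# Hypothetical sample document, one segment per string.
segments = [
    "Click the Save button to store your changes.",
    "Click the Cancel button to discard your changes.",
    "Click the Save button to store your changes.",
    "Restart the application.",
]

for segment, category, score in internal_fuzzy_analysis(segments):
    print(f"{category:15} {score:4.2f}  {segment}")
```

Because the simulated TM only ever contains earlier segments, the first occurrence of a group of similar segments always counts as "no match" and only the later ones are reported as internal fuzzies, so the result depends on the order in which the segments are processed.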

Déjà Vu X2 (usually included in comparisons with memoQ, Wordfast Pro and SDL Trados Studio) does not support the calculation of internal fuzzy matches. Other tools reported to support the feature include: SDLX, MetaTexis, Transit. If you are aware of other tools, please mention them in a comment!

6 comments to Internal fuzzy matches explained

  • Hi Dominique,

    Nice to see a clear explanation of this… very useful. Maybe worth mentioning that Workbench sort of has this too. The “Use TM from previous analysis” feature allows you to add more files later and then analyse these based on what you would get if you translated the previous one first. So whilst you get no value on a single file as you have shown here, you do get an interesting benefit of being able to decide what order you tackle the files in. This is something none of the modern tools will do until you actually translate the files… I think.

    I can also confirm SDLX had this for many years.

    Regards

    Paul

    • Dominique

      Hi Paul,
      Actually, I did see your post from 11-May-2011 (http://www.proz.com/post/1737371), but as you said, it will not work if the fuzzy repetitions are contained in the same file.
      I remember attending a presentation by Kilgray where they explained that the order in which segments are translated matters in projects involving multiple translators: obviously the translator grabbing the first occurrence of fuzzy repetitions will do the hard work (having to translate from scratch), while those starting later would be “free riders”, reaping the benefits of translations made by the first translator. I’m not sure whether their feature was related to homogeneity or to the ability to run analysis reports afterwards, or both, but I think it also relates to what you say about the order.
      Cheers,

      Dominique

  • Hi Dominique,
    Great comparison between these 3 tools. Of course, internal fuzzies aren’t something we want agencies to catch on to. It’s bad enough to see some of the fuzzy rates that are offered (I refused one the other day that was 50% of my rate for 50-75% fuzzy matches!), so if we get similar offers for internal fuzzies, it won’t be good news. However, fuzzy matches within a file are probably more useful matches than ones from a TM. Worth a thought.

    To pick up on something Paul mentioned in his comment, when you merge files in a project in Studio, I find it useful to choose the order in which you want to translate the files. After selecting them and clicking “merge”, I give quite a lot of thought when re-arranging them – specifically for this reason.
    Emma

    • Dominique

      Hi Emma,
      Thanks for your comments, and for retweeting too! You are right: if there is a way to misuse an otherwise useful feature, some agencies will no doubt do it. Discounts for fuzzy matches below 75% make little sense anyway, since repairing these fuzzies often takes as much time as translating from scratch.
      Cheers,
      Dominique

  • Hi Emma,
    Using this feature in Studio to see how many fuzzy repetitions you might get is quite useful in terms of activating another feature. If you are getting a lot then you can create a bilingual file containing these potential matches (frequently occurring segments) and then translate this first. Of course you have no direct context, but if the material lends itself to this then after translating it won’t matter what order you translate the files in because all the possible matches will now be in the TM, or in the files if you then pretranslate.

    Interesting possibilities.

    Paul

    • Yes, exporting frequently occurring segments into a separate file is a useful resource. The only thing, as you say, Paul, is that you lose the context, so when I use this workflow I put the source file on a 2nd monitor for reference. It works well!
      Emma
