
Terminology recognition and auto-completer in OmegaT 3.0

Terminology recognition in OmegaT is handled by tokenizers. Starting with version 3.0.0, tokenizers are included in the standard OmegaT distribution; in previous versions, they had to be downloaded separately. They are also selected automatically when a project is created, whereas previous versions required them to be specified on the command line. Tokenizers are especially important for terminology recognition in heavily inflected languages, where a single glossary term can appear in the text in many different word forms. This video shows how the tokenizer works with Finnish as the source language.
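To illustrate why a tokenizer matters for an inflected language, here is a minimal Python sketch. It is not OmegaT's or Lucene's actual code: it assumes a hypothetical `toy_stem` function that strips a handful of Finnish case endings, whereas the real Lucene Finnish stemmer uses a much richer rule set. Both the glossary term and the words of the source segment are reduced to stems before they are compared, so the glossary entry "talo" ("house") is recognized in "talossa" ("in the house"), which an exact word match misses.

```python
import re

# Hypothetical, drastically simplified list of Finnish case endings.
# A real stemmer (e.g. Lucene's) handles far more endings and exceptions.
SUFFIXES = ["ssa", "ssä", "sta", "stä", "lla", "llä", "lta", "ltä", "lle", "n", "a", "ä"]


def toy_stem(word):
    """Strip the longest matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into lowercase words (\\w matches Unicode letters such as ä)."""
    return re.findall(r"\w+", text.lower())


def find_terms(segment, glossary, stem):
    """Return the glossary terms whose words all occur, once stemmed, in the segment."""
    segment_stems = {stem(token) for token in tokenize(segment)}
    return [term for term in glossary
            if all(stem(word) in segment_stems for word in tokenize(term))]


glossary = {"talo": "house", "auto": "car"}
print(find_terms("Asun talossa.", glossary, toy_stem))    # ['talo']
print(find_terms("Asun talossa.", glossary, lambda w: w))  # [] (exact matching only)
```

Passing the stemming function as a parameter makes the comparison with plain exact matching explicit: the same lookup fails without the stemmer, which is the gap a language-specific tokenizer fills.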

Starting with OmegaT version 3.0.1, recognized terminology can be inserted in the target segment via a new auto-completer feature, which works entirely in the editor pane and with the keyboard. The shortcut is Ctrl+Space on Windows and Esc on OS X (the latter keeps it consistent with the system-wide completion engine). In previous versions, one had to right-click with the mouse in the glossary pane. This video shows how terminology can be inserted in the target segment, using a sample Finnish-English project.

Related link:
Finnish tokenizer (Lucene)

Related posts:
First steps with OmegaT
Machine translation in OmegaT for Mac
Machine translation in OmegaT for Windows

11 comments to Terminology recognition and auto-completer in OmegaT 3.0

  • Jean-Christophe Helary

    Dominique,

    As was mentioned on the OmegaT user list, the shortcut for the autocompletion on OSX is Esc to stay consistent with the system-wide completion engine.

    On the other hand, Shift+Esc triggers the display of the contextual menu.

  • Dominique

    Thanks, Jean-Christophe! I’ve edited the post according to your comment.

  • Thank you for this video. It is my pleasure to put a link to it on my blog. Explaining the latest features in OmegaT is very important, in my opinion, because not all users follow the mailing list, where the announcements are made. And even when they do, they might miss an announcement or have a hard time figuring out how to benefit from the new features.
    Cordially yours,
    Roman

  • This was sooo useful

    Thanks Dominique

  • Hi Dominique,

    Do you happen to know if any other CAT tools are currently using the Lucene tokenizers? Any idea if Kilgray might have used them for their new fuzzy matching in termbases, for example?

    Michael

    • Dominique

      Hi Michael,
      I’m not aware of any other tools that took the approach of OmegaT for terminology recognition. I believe tokenizers such as the Lucene tokenizers are language-dependent (a separate tokenizer has to be created for each language), whereas Kilgray’s fuzzy matching is generic. You could ask them!
      Cheers,
      Dominique

  • SafeTex

    Hi

    Don’t know what has happened with this video on YouTube, but it keeps cutting off before the end.

    YouTube shows it to be 6 mins long, but it gets to the end of 6 mins and then cuts off, sometimes immediately, sometimes a few seconds or even minutes later. I couldn’t get it to play to the end.

    Just in case you are not aware of this and perhaps know the solution.

    Regards

    SafeTex

    • Dominique

      Thanks, Dave, for letting me know about this issue! I just played the video and was able to watch it until 7:06, after which it was suddenly cut off. I have no idea what is causing this.

      Here is a workaround that will let you watch the video until the end: replay the video and pause it right away; let YouTube load the entire video into its cache (you will see the grey progress indicator move to the end); once this is done, move the playback position to the point where you got cut off (about 6 minutes) and press the Play button. I just tested it in Chrome and it worked.

  • Hi, Dominique.
    Thank you for your videos. Some of them refer to integrating MT and TMs. If I am not mistaken, they refer to web-based MT services. Did you cover desktop MT systems? Thank you in advance.

    Merry Christmas!

    Oleg

    • Dominique

      Hi Oleg,
      I do have one video covering a desktop MT system: SYSTRAN. Wordfast Classic can integrate with at least two such systems: SYSTRAN and ProMT. You’re probably interested in ProMT, which supports Russian (ProMT is a Russian company).
      I’m not aware of any other CAT tools that integrate with the desktop versions of SYSTRAN and ProMT. Déjà Vu (and maybe some other tools too) integrates with the server version of SYSTRAN, but it’s an enterprise solution, probably way too expensive for freelancers.
      Hope this helps,
      Dominique
