
 A computer journal for translation professionals


Issue 14-8-238
(the two hundred thirty eighth edition)  
Contents
1. New World Localization (Premium edition)
2. SDL's Language Cloud
3. The Canadian Response (Premium edition)
4. New Password for the Tool Box Archive
The Last Word on the Tool Box Journal
Blazing Trails That Never End

I recently asked myself why so many translators who are deeply engaged with translation technology don't continue their technological exploration and interest once they reach a certain level of expertise.

Just so you know, no one has ever accused me of being too subtle. So to make sure we're all on the same page, the illustration I will be using here is intended as a caricature, or a description in which, according to Merriam-Webster, "certain striking characteristics are exaggerated in order to create a comic or grotesque effect."

In that spirit, let's imagine the life of a typical, successful, "technically adept" translator. Our composite translator probably reached this point in life through one of two paths.

Following one path, they receive a translation degree and -- depending on the era and location -- attain a certain level of technological proficiency through instruction (which will reflect the allegiances of the corresponding professor and school). Once they've launched out "into the wild," they apply what they have learned and continue to refine their use of technology to their particular needs and circumstances. When they've reached a level of competence they feel comfortable with, they consider themselves well equipped and stop looking for improvements.

On the other route, the self-trained translator looks for technology solutions by searching the web, newsgroups, and translator portals and by going to conferences or talking to colleagues. They find a first set of technology that they settle on, though in the early years they continue to change or tweak it as they learn more about the industry. Once they've assembled a suite of tools that work for them, however, they feel well equipped and stop looking for improvements.

You won't have missed the commonality between the two: In the "end" they feel well equipped and see no room for further improvement.

"Feel well equipped..."

For many years it's been easy to share a wide variety of opinions in topic-oriented discussion forums, going all the way back to the LANTRA-L list and CompuServe's Language Forum (FLEFO). Today, it's actually hard not to express ideas through all-pervasive blogs (and responses to blog posts), Facebook, Twitter, LinkedIn, and others. For our "successful and technically adept" translator, it's even harder to abstain: after all, good translators also have above-average writing skills and know how to express themselves very effectively. When they persuasively describe the technology they use as one of the cornerstones of their success, they quickly realize that it earns them admiration and a leadership position among their peers.

"...stop looking for improvements"

They readily embrace gradual change in the technology they're already using because they're able to integrate it quickly into their expertise portfolio, thus retaining their position as one of the public champions of "their" technology. But the more fundamental paradigm changes -- where the existing technology is completely replaced by something new -- are more difficult. Really difficult, in fact. These threaten to challenge their hard-earned status and identity as a community leader, and might even imperil the important business opportunities that arise from that identity.

So what do they do? They use their status to rail against the new technology, converting the perceived identity threat into a platform that allows them to predict doom for the community as a whole. Since they do indeed have considerable influence, especially among less experienced translators, their rallying cry becomes the rallying cry of many, with the result that the natural and ever-ongoing development of technology gets stuck.

Remember: This is a caricature. Still, if we're honest, do we not recognize a kernel of truth in the midst of my hyperbole? And since I'm calling for honesty, I'm very specifically not excluding myself from the same guilt.

What can be done to avoid these knee-jerk responses that have the potential to technologically stagnate an entire generation of translators? I can think of three things.

First, we need to de-politicize the situation. The Oxford Dictionary defines "politics" as "activities aimed at improving someone's status or increasing power within an organization." If even some of my exaggerated illustration is true, we are dealing with politics rather than arguments based only on fact. Once we recognize the difference between politics and fact, discussions about the future of translation technology should become much more productive.

Second, we might need a change of values. Expertise should be rewarded, but only if it does not promote stagnation. By its very essence, technology undergoes constant development. Those intrepid translators who master today's technology while continuously exploring new possibilities -- embracing some and rejecting others -- should be rewarded with the most prestige in the community.

Finally, and perhaps most importantly and practically, I would love to see technology developers reach out to the language community -- and language community leaders in particular -- to make them part of the development process. Not only will this raise the likelihood of creating a successful product that benefits the community, but it will develop technology champions in the process.

(Versions of this will also appear in the ATA Chronicle and ITI Bulletin.) 

ADVERTISEMENT

Translation Forum Russia 2014 in Ekaterinburg

Translation Forum Russia, Russia's top language industry conference with a focus on practical issues of interpreting and translation, will be held for the fifth time from September 26 to 28.

For the first time there will be a GALA Localization Forum as part of the Translation Forum.

Come join us!  

Learn more right here.

1. New World Localization (Premium edition)

Recently I stumbled on the folks from Lingohub, an Austrian/German tool and company that is offering a platform for the localization of mobile and web applications. I'll have more to say on the tool later, because the reason I stumbled on them was not the actual platform but a number of very interesting and well-written articles on app localization on their blog. At about the same time, Sebastian Haselbeck from Lingohub contacted me because he wanted to share with me (and with you) a survey that Lingohub had just completed among translators about a whole range of issues, from billing to marketing to translation technology.

It won't surprise you that I found the answers on technology the most interesting. According to the survey app, 62% of all translators use a translation environment tool (yes, the survey still uses the term "CAT tool," but they too shall see the truth at some point) and 34% do not use a TEnT. I would guess that if the survey represented the worldwide community of translators, the "yes" percentage would be significantly lower.

Or how about this: Regarding which TEnT translators use, 34% said Trados, 21% Wordfast, and 12% memoQ. I think it's surprising to see that only a third of TEnT users use Trados (I would venture to guess that that would have looked very different just a few years ago), and I think it's also surprising that significantly more translators use Wordfast than memoQ -- one possible explanation for that might be that "Wordfast" essentially refers to two if not even three tools (Wordfast Classic, Wordfast Pro, and Wordfast Anywhere).

You might find the survey results interesting, and you can find them right here. With surveys like this it is important to know who participated and how the participants were found. I asked Sebastian that question, and he said:

As for our outreach, we had a few hundred translators we know that we reached out to, and aside from social media snowball effects we also contacted academics around the globe for them to spread the word. It's not a scientific method, but low cost and I think we got a pretty interesting picture. We hope it triggers some additional research and gets a discussion going.

When talking to Sebastian, I also learned that there was actually a "method to the madness" of that survey. You will notice that the download location is actually not the lingohub.com site but Lingo.io. That's not an oversight or coincidence. Lingo.io (or LingoIO) will be a new tool that the Lingohub team is planning to launch later this year. Eventually it will be a full-fledged browser- and cloud-based translation environment tool as well as a community of translators. At first it will connect to Lingohub -- in fact, it will replace Lingohub's own translation editor -- but it will also support a modest number of file formats, to which the team will keep adding until the typical range of formats is supported.

Aside from the community idea (which I have to admit I'm always skeptical of, particularly because it has been tried so often and has so far always failed; I recommended to Sebastian -- and others before him -- partnering with existing communities such as TranslatorsCafe.com), the makers of the new TEnT-to-be also hope to more easily connect with blogging or ebook platforms by integrating the tool either right into the platforms or via add-ons. That part sounds really interesting to me. (I'm sure that you also must have thought about the untapped market of self-published ebooks and this could present an easier way to get to that market.)

And why not use the current Lingohub translation editor for all of that? Well, Sebastian said, Lingohub really was built with the developer and not the translator in mind. This is visible in the messaging of the tool as well as its functionality. For instance, right now there is no translation memory functionality available (though that was just introduced into the beta version), nor are there many of the other translation-related features that you appreciate (and need). These will be available in the LingoIO tool, along with additional features such as billing.

I was surprised that Lingohub does not offer a WYSIWYG environment for the translation process. WYSIWYG stands for "what-you-see-is-what-you-get" and relates to the fact that the translator sees what the translated segment looks like in its final destination in the software interface. This is one of the features that most localization tools for desktop and server products have particularly prided themselves on. For the localization of mobile and web apps, the makers of Lingohub decided that this was not practical since the formats are just too diverse (you can find a partial list of supported formats and platforms right here). Instead, they make it possible for the translators to generate screenshots of the specific interface that is being translated.

Lingohub has about 1,300 users right now (approximately 30% of whom are translators), primarily because of the functionality that app developers appreciate. This includes a direct integration into GitHub, the leading web-based hosting service that a large part of the development community uses for access and revision control and management of its development files.

I'm looking forward to seeing the new LingoIO product, whose release is planned for the fall. It should combine some new-world technical aspects with some good old (and maybe new) translation functionality.

ADVERTISEMENT

SDL Trados Studio 2014 - You've spoken, we've listened

"Technology is nothing. What's important is that you have a faith in people, that they're basically good and smart, and if you give them tools, they'll do wonderful things with them." Steve Jobs, Rolling Stone

Translators from all corners of the Earth have given their feedback on features and improvements in SDL Trados Studio 2014.

See the results that will continue to help translators do wonderful things.

Or, try it for yourself.

2. SDL's Language Cloud

A couple of years ago I gave a workshop at the AMTA conference (Association for Machine Translation in the Americas) on the various ways that MT can be utilized by translators. (It's interesting to imagine what I would say today, two years later, considering how technology to utilize MT proposals has changed. It would certainly be quite different.) Back then I had asked SDL to give me access to the customized LanguageWeaver (now: BeGlobal) engine for travel sites. I wanted to see for myself and share with the workshop participants whether a customized engine performs better on an industry-specific text with rather uncontrolled language (user complaints with lots of colloquial expressions and typos) than something like Google Translate (which is often used for this kind of data). I ran the test in German, French, and Spanish, and our (very unscientific) impression was that the SDL product clearly outperformed Google. This still doesn't mean that it produced a satisfactory translation per se, but it was easier to understand as a gist draft. (You can see for yourself right here.)

There has been lots of talk in the industry about SDL's ongoing efforts to improve its BeGlobal MT engine. A relatively large number of folks have apparently been working to clean and update data for their baseline engines (the non-specific MT engines) so they can now offer about 100 baseline languages. When I asked SDL's Paul Filkin and Sinclair Morgan about some specifics of these efforts, though, they weren't able to share any details.

Regardless of the extent of the effort, it has to pay out at some point, and we are now seeing some first attempts to make that happen.

Users of SDL products know that the generic SDL BeGlobal engine has been part of products like Trados Studio for some time (alongside a number of other machine translation offerings that can typically be accessed with external plug-ins), but with the Studio 2014 update that was just released, users of all kinds will now have access to an overhauled system. Access to the generic baseline engines will continue to be free for the supported languages, but there will also be chargeable access to specifically trained engines for industries like automotive, consumer electronics, travel, IT/high tech, and life sciences. The fee model will most likely not affect the freelance translator (up to 600,000 characters/month will be free, so most freelancers will be exempt), but it will be a different story for LSPs and translation buyers.

In my conversation with Paul and Sinclair, I pressed repeatedly for assurances that all the source and/or target data would stay confidential and would not be used in any way by SDL, and they assured me that SDL would indeed not save or use any of the data. After the interview they again confirmed that in Trados Studio, the machine translation data is not stored beyond the duration of a request for machine translation for a segment or file.

Still, this is only the first step of the "Language Cloud" offering. At a recent conference, SDL introduced the XMT moniker for "eXtended Machine Translation," a new platform for statistical machine translation that incorporates their previous MT technologies but also enables SDL to roll out improvements to baseline quality more quickly. BeGlobal is a statistical machine translation engine (it's based on the analysis of large amounts of data), and SDL's machine translation teams are attempting to improve the output of that kind of processing with language-specific rules and terminology that might override or complement the data-only-based MT output of their machine translation engines.

This is part of a long-term rollout that will include self-training with your translation memories (on top of the specialized engines). The translation memories will automatically go through cleaning routines, and you will also be able to upload glossary-like TBX files to further train the engines. (This last point is particularly difficult for other statistical-based engines, so it'll be interesting to see how it plays out.) All this happens right from inside Trados Studio and is available with the latest update to Trados Studio, which was just made available this week. You can see the current list of available specialized engines right here.

Access to the Language Cloud module is also available through an API, so I wouldn't be surprised to see access in other translation environment tools, as well. 

ADVERTISEMENT

The Words You Want. Anywhere, Anytime

Let the new WordFinder open a new world of opportunities -- get access to millions of words and translations from the best dictionaries, on your computer, via a web browser, on your smartphone or tablet. Stuffed with lots of smart features. The new WordFinder has what you need as a translator in your everyday work - anywhere, anytime!

Read more at www.wordfinder.com  

3. The Canadian Response (Premium edition)

Most of the talk about machine translation these days is either about the free MT engines by Google/Microsoft, rules-based engines like Systran and PROMT, SDL's LanguageWeaver/BeGlobal, or the many different statistical MT systems based on the open-source Moses platform. There are some others as well, not surprisingly including one that was developed by the National Research Council (NRC) of Canada. It's "not surprising" because historically Canada has had a tendency to be very independent-minded when it comes to translation technology (for instance, Canadian translation environment tools have had a much stronger emphasis on corpora rather than translation memories, and there has been more emphasis on alignment than anywhere else).

I recently met Eric Joanis, one of the developers of NRC's MT system Portage, and I continued my conversation with him and other NRC representatives as well as folks from Terminotix, the company that is now selling commercial installations of the system.

Now, when I say that the development was "independent-minded," I don't mean to say that it happened in a vacuum. In fact, the developers who have worked on the system since 2004 are eager to describe their indebtedness to leading MT researchers such as Philipp Koehn of the Moses project and Asia Online and Franz Och of -- I was almost going to write "Google Translate," but it was announced last week that Och has left Google to join genome scientist J. Craig Venter since "understanding the human genome at the scale that we are trying to do it is going to be one of the greatest translation challenges in history." While Och is not the only smart guy working on machine translation, when I talked to him in preparation for the Found in Translation book, I was blown away by his combination of smarts and humility. As a result, I would venture to guess that the speedy development of Google Translate will either slow down or, more likely, has already slowed down sufficiently so there were no longer enough challenges for someone like Och.

Back to Portage, which cleverly also stands for "Probabilistically Optimized Rules for Translation Automatically Generated from Examples" (aside from the more obvious English and French meaning of "portage"). Like most anything in Canada, there has been an emphasis on the EN<>FR translation engine -- though it would be more correct to refer to it as EN-CA<>FR-CA since the baseline engine for that language combination has been fed with all the translated Canadian parliamentary proceedings in the Hansard Corpus and the websites of the government of Canada (gc.ca).

You can take a look at that language combination right here -- but keep in mind that this is only the baseline engine, which really is not supposed to be used without further training.

As you peruse the site, you might notice a couple of interesting things about how the MT engine processes text.

Most MT engines have a problem dealing with tags, especially the {1}tags{2} that most translation environment tools produce. Portage uses a "two-stream process" in which the tags are stripped, the machine translation is performed, and then the tags are reinserted into the target text. This achieves relatively high accuracy because it can ask the MT engine which parts of the target text correspond to which parts of the source text and then set the tags accordingly. Many other tools essentially start a new segment every time they encounter a tag, which is less than helpful.

(By the way, this is a strategy that could also be used by translation environment tools with fuzzy matches or assembled matches -- and I'm not sure why this is not already happening.)
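To make the "two-stream" idea a little more concrete, here is a minimal Python sketch of how stripping tags, translating, and reinserting tags by way of word alignment could work in principle. This is not Portage's actual code. The function names, the assumption that tags come in consecutive opening/closing pairs, and the hard-coded toy "engine" with its alignment table are all hypothetical and only there for illustration.

```python
import re

TAG = re.compile(r"\{\d+\}")
TOKEN = re.compile(r"\{\d+\}|[^\s{}]+")


def strip_tags(source):
    """Separate the plain words from the tags.

    Assumes (for this sketch only) that tags come in consecutive
    open/close pairs, e.g. {1}...{2}.
    Returns the plain tokens and a list of tagged spans:
    (open_tag, close_tag, first_source_index, last_source_index).
    """
    plain, marks = [], []
    for tok in TOKEN.findall(source):
        if TAG.fullmatch(tok):
            marks.append((tok, len(plain)))  # position the tag sat at
        else:
            plain.append(tok)
    spans = [
        (marks[i][0], marks[i + 1][0], marks[i][1], marks[i + 1][1] - 1)
        for i in range(0, len(marks) - 1, 2)
    ]
    return plain, spans


def toy_translate(plain):
    """Hypothetical stand-in for an MT engine that reports word alignment.

    Returns target tokens and a mapping: source index -> target indices.
    """
    assert plain == ["Click", "the", "red", "button", "now"]
    target = ["Cliquez", "sur", "le", "bouton", "rouge", "maintenant"]
    alignment = {0: [0, 1], 1: [2], 2: [4], 3: [3], 4: [5]}
    return target, alignment


def reinsert_tags(target, alignment, spans):
    """Wrap the target words aligned to each tagged source span."""
    before, after = {}, {}
    for open_tag, close_tag, first, last in spans:
        covered = [t for s in range(first, last + 1)
                   for t in alignment.get(s, [])]
        if not covered:
            continue  # nothing aligned; a real system would need a fallback
        before.setdefault(min(covered), []).append(open_tag)
        after.setdefault(max(covered), []).append(close_tag)
    return [
        "".join(before.get(i, [])) + tok + "".join(after.get(i, []))
        for i, tok in enumerate(target)
    ]


def translate_with_tags(source):
    plain, spans = strip_tags(source)
    target, alignment = toy_translate(plain)
    return " ".join(reinsert_tags(target, alignment, spans))


print(translate_with_tags("Click the {1}red button{2} now"))
```

Note how the tag pair ends up around "bouton rouge" even though the word order changed in the target. A tool that restarts the segment at every tag would never see "red button" as one unit.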

You might also see that you can enter a confidence level filter. This means that every individual machine-translated segment will internally be assigned a confidence or quality estimation level, and only those that meet that level will actually be returned as machine translation. I asked how this works, but the only answer I had received by press time is that it's a complex process and someone with more details would follow up. I'll let you know once I learn more.
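Since I don't know yet how the confidence score itself is computed, the following Python sketch only illustrates the filtering step that happens once a score exists. Everything in it is an assumption made for illustration: the Segment class, the 0-to-1 score range, and the rule that a segment passes when its score is at or above the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    source: str
    mt_output: str
    confidence: float  # hypothetical quality estimate between 0 and 1


def apply_confidence_filter(
    segments: List[Segment], threshold: float
) -> List[Tuple[str, Optional[str]]]:
    """Keep the MT output only for segments at or above the threshold.

    Segments below the threshold come back with None, meaning they are
    left for the translator to handle without an MT proposal.
    """
    return [
        (seg.source, seg.mt_output if seg.confidence >= threshold else None)
        for seg in segments
    ]


segments = [
    Segment("Good morning.", "Bonjour.", 0.92),
    Segment("It's raining cats and dogs.", "Il pleut des chats et des chiens.", 0.31),
]

for source, proposal in apply_confidence_filter(segments, threshold=0.6):
    print(source, "->", proposal if proposal is not None else "(no MT proposal)")
```

The scores in the example are invented. The hard part, estimating them reliably, is exactly what this sketch leaves out.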

Looking at the above-mentioned site, one might have the impression that French is the only supported language, but this is not the case. In fact, Portage has been participating in the US National Institute of Standards and Technology's (NIST) important Open Machine Translation Evaluation for several years in ZH>EN and AR>EN and placed relatively well in comparison to other systems before 2012. After undergoing some major changes before the last competition in 2012, however, it won virtually every single category in which it placed (you can see the results right here if you search for "NRC"). Since these results were both somewhat surprising and really important for Portage's standing in the MT community, the Portage team did very little to hide its delight, as you can see in this video -- which also serves as a really good introduction to what the system is about and what the NIST competition is (at least during the first 10 minutes or so).

I talked to Larry Rogers from CLS Lexi-tech Ltd. (Canada), whose company is presently using the French Portage engine for 60% of their workload regardless of text type (with the goal of reaching 80%); the European and Asian operations of CLS Lexi-tech are using it for Danish and Chinese as well.

(I asked Larry whether his translators are happy having everything without a perfect or fuzzy TM match translated by MT. His answer: "Happy? No, definitely not. There are exceptions, of course, but as a general rule, translators are skeptical to change and resistant to changing work methods. Having said that, it was a similar challenge with TM systems many years ago, and now they're accepted like using a word processor. I suspect the same will apply to MT in the very near future. You can't hold back progress, and it is undeniable that MT will play a very big role in our industry. In many cases, it already is.")

The system can be installed on site or hosted by its reseller, Terminotix. The cost is a per-user fee with a minimum six-month commitment, and the initial training of the engine can be done either by the user or by NRC and/or Terminotix. The ongoing training (with new materials) is typically done by the users (which, according to another LSP that I talked to, "is not a trivial task. It requires some basic Linux skills and a procedure involving multiple commands.")

Just like I did with SDL, I also asked whether data that is being used for the training at any stage will be shared with anyone else, and it was confirmed that this indeed never happens.

Aside from using the MT system through a pretranslation option comparable to the one mentioned above, the MT engine is also accessible through Terminotix's own Logiterm (no surprise here -- you can find my latest review of Logiterm in issue 220 of the Tool Box Journal in the archives), in MS Word through the Terminotix toolbar I've mentioned a number of times before, and now also through a newly released add-on for Trados Studio. That add-on also connects you to a large number of terminology resources that you can find listed on its page. The problem with those other resources is that the keyboard shortcut to launch the add-on is the same one needed to launch the concordance search in Studio, which makes it difficult to use and/or requires you to change shortcut keys. On the other hand, Portage MT matches are displayed like other MT matches in the Matches pane and, just like matches from other MT systems, can also be accessed as partial AutoSuggestions with CodingBreeze's MT AutoSuggest.

By the way, while the Terminotix add-on officially costs $95 for access to all the resources, you can also download it for free with access to Portage (if you have access to a Portage server) and a limited number of other resources, including IATE and Linguee.

Aside from all that, Portage offers a SOAP interface that allows for integration with any other tool or platform.

My hope is that Terminotix, the company marketing Portage, will find creative and positive ways to market the tool beyond the Canadian border. In the past, Canadian tools have not always been so successful outside of Canada. While it's extremely enjoyable to talk to Canadian developers and vendors who lack the hyperbole of so many US and "Americanified" companies, they need to develop a good marketing-savvy middle ground that will ensure their products' success.

4. New Password for the Tool Box Archive

As a subscriber to the Premium version of this journal you have access to an archive of Premium journals going back to 2007.

You can access the archive right here. This month the user name is toolbox and the password is lamems.

New user names and passwords will be announced in future journals.

The Last Word on the Tool Box Journal

If you would like to promote this journal by placing a link on your website, I will in turn mention your website in a future edition of the Tool Box Journal. Just paste the code you find here into the HTML code of your webpage, and the little icon shown on that page will be displayed with a link to my website.

If you are subscribed to this journal with more than one email address, it would be great if you could unsubscribe redundant addresses through the links Constant Contact offers below.

Here is one website that mentioned the Tool Box Journal recently:

www.ltcinnovates.com 

Should you be interested in reprinting one of the articles in this journal for promotional purposes, please contact me for information about pricing.

2014 International Writers' Group