Issue 15-9-253
(the two hundred fifty-third edition)
1. The Tool Box Journal and InterpretAmerica Present: The Tech-Savvy Interpreter
2. Wordslow
3. Smart as a CAT (Premium Edition)
4. . . . Dealing with Image-Based PDFs
5. XTM 235 (Premium Edition)
6. New Password for the Tool Box Archive
The Last Word on the Tool Box

I'm so pleased that Katherine Allen and Barry Slaughter Olsen from InterpretAmerica have agreed to write a regular column for the Tool Box Journal.

No, let's try that again:

I can't believe how lucky you and I are that Katherine Allen and Barry Slaughter Olsen from InterpretAmerica have agreed to write a regular column for the Tool Box Journal.

It's going to be great.

Traditionally there's been a very distinct differentiation between translators and interpreters (I'm the first to confirm that I am not an interpreter), but for many the distinction is not so clear. In many languages there is not even a different term for the two professions (see above), and it's also clear that some of the differences between translators and interpreters are fraying. (Anyone translating tweets in real-time? Or would that be written interpretation?)

And even for those of us who feel quite comfortable staying where we are without venturing into the Other -- it's going to be a blast to hear what's happening over there technologically.

So, please join me in welcoming Katherine and Barry!

1. The Tool Box Journal and InterpretAmerica Present: The Tech-Savvy Interpreter 

A column about interpreting in a translation technology newsletter? Really? Why? Well, much like translators did at the turn of the 21st century, interpreters are seeing their work and workplace change and expand because of technology. And let's be honest, there are many translators who also interpret. If you are one of them, or moving towards becoming an interpreter too, then this column in the Tool Box Journal is for you.

Today, technology is a huge component of any successful translation practice. This is quickly becoming the case for interpreting as well because the influence of technology on interpreting is a rising tide that interpreters need to understand to remain competitive and successful.

In the coming months, this space will provide news and insight about a wide range of technical aspects of interpreting, from the underlying technologies that are fueling the growth of remote interpreting like standards-based video codecs and WebRTC to specific hardware and software that help interpreters become more productive and make their job easier, like terminology databases, web-based interpreting platforms, new headsets, smart pens and tablet computers.

Do you have a question about a specific technology? Or would you like to learn more about a specific interpreting platform, interpreter console or supporting technology? Send us an email at

2. Wordslow

For a long time, Atril was the company among vendors of translation environment tools that many liked to mock (a little, and very kindly at that) for the long time it took to release promised upgrades to their software. Well, I think we can clearly pass that honor to Wordfast now, or, more specifically, the team around Wordfast Pro (and we also do it kindly).

Wordfast Pro 4 was announced with great fanfare at the ATA conference in San Antonio in, yes, 2013. Just now, two years later, I had a talk with Wordfast sales and marketing manager John Di Rico, who is quite confident that it will be released in January of 2016. We shall see. (But, again, we want to stay kind.)

The version that I looked at and that we talked about (which actually already carries the version 4.5 designator) is in beta or maybe even pre-beta stage, so I won't bore you with the bugs or not completely developed features that are apparent, but it still gives a pretty good idea of what the final product will be like.

I asked John what he thought the outstanding features of the tool are, and he pointed out the ease of use (easier than other tools and easier than the previous Wordfast Pro 3) and the low price (if it stays the same as it is presently, the regular price will be 400 euros and the update for existing users of Wordfast Pro 3 will be free). I can see where he's coming from. While I'm not sure that it is indeed easier to use than all other tools in all of its aspects, it's simple enough once you understand some of its core principles.

Like most other tools, Wordfast Pro 4 will no longer have the traditional menus but a ribbon interface and a different and fresh-looking one at that (though it's unfortunately not very customizable). This is what the ribbon interface looks like with an open translation file ("TXLF Editor"):

Wordfast 4

You can see the other views listed in the Wordfast 4 menu, through which you can also cycle with the Alt+W keyboard shortcut (that's essentially the central trick you'll need to know when working with this version of Wordfast).

You can also see a couple of other new features in the screenshot, including the bilingual table, a two-column Word file that allows you to translate and/or edit outside the Wordfast environment, and the preview which allows you to preview ongoing translation in its native environment (Word, InDesign, web browser, etc.). One response to those features would be: Great! I would expect that response especially from users of Wordfast Pro 3. The other response might be: Don't essentially all tools have those features?

And I suspect that these responses will predominate once Wordfast Pro 4 is released. Aside from the great usability and the (continued) low price, I could really not see any groundbreaking new feature in the release that I looked at. I confess that this disappointed me a little because we had been promised a bit more. When I asked tool vendors in the webinars I did in 2014 about a number of features (such as generic terminology resources, subsegment matching from MT, automatic fixing of fuzzy match segments using subsegment TM and MT suggestions, specific user profiles for different project participants, and exportable keyboard shortcuts), Wordfast's response was often: "It will be there in Wordfast 4." I see only very few of those promises in this new version (and none of the ones mentioned just now).

Here are some new and important features, though:

  • The translation format is not the odd TXML format anymore but a standard XLIFF format (which can be easily processed in tools like Déjà Vu, memoQ, and Trados Studio -- in the latter even with preserved WYSIWYG functionality)
  • AutoComplete from complete MT and TM matches as well as TM subsegments
  • Package export and import for projects to ease communication between PM and translator
  • WYSIWYG display for the most common formatting choices
  • Filtering on segment status
  • Templates for projects
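
For readers who like to peek under the hood, here is a minimal sketch of why a standard XLIFF format matters: any script or tool can pull the segments out with an ordinary XML parser. The sample file, the XLIFF 1.2 namespace, and the function name read_segments are my own assumptions for illustration; I have not verified which XLIFF version or extensions Wordfast's TXLF files actually use.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace (an assumption about which XLIFF version is in play).
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# A tiny made-up XLIFF document, not a real Wordfast file.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.docx" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <target>Hallo Welt</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Good morning</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def read_segments(xliff_text):
    """Return (id, source, target) tuples; target is None when untranslated."""
    root = ET.fromstring(xliff_text)
    segments = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        segments.append((
            unit.get("id"),
            # itertext() also collects text nested inside inline tags.
            "".join(source.itertext()) if source is not None else "",
            "".join(target.itertext()) if target is not None else None,
        ))
    return segments


if __name__ == "__main__":
    for seg_id, src, tgt in read_segments(SAMPLE):
        print(seg_id, src, "->", tgt)
```

Real files carry inline tags and tool-specific attributes that this sketch simply flattens or ignores, so treat it as a starting point rather than a converter.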

So, all around a great catch-up to other tools, but nothing where I would say that this really makes Wordfast Pro stand out. And the reason that I harp on that so much is that I think it's strange because the other Wordfast products -- Wordfast Classic and Wordfast Anywhere -- do have features that make them stand out. Wordfast Classic, for instance, uses AutoSuggest with subsegments from multiple MT engines (and has done that for quite some time) and Wordfast Anywhere -- well, you'll read about that elsewhere in the newsletter.

3. Smart as a CAT (Premium Edition)

I can vouch for that: Cats are smarter than dogs. Yes, I have to admit that I have given up my disdain of dogs. I love my dog (along with my cat) with all my heart -- but he ain't smart. Yesterday we went out to the beach (as we do every day) and he found a gigantic seven-foot sea lion lying in the surf (typically not a good sign for the health of the animal). My dog ran up to bark at the ailing creature and the poor, sick sea lion, probably for the last time in his life, reared himself up to twice the height of my dog, barked back, and then slid back into the ocean. My dog (with the unfortunate name of Chase) spent the next 20 minutes swimming and diving in the surf, searching for his new acquaintance so he could continue the posturing -- with no such luck (in any sense of the word). Today we found the poor sea lion dead on the beach, and what does my stupid dog do? Turns around and looks at me with a grin to say: See what I did?

Cats on the other hand? Check out these real-life stories of what your office would be like if your CEO were a cat. Smart!

Anyway . . .

I touched base last week with Anna Sidorova, the marketing chief at ABBYY Language Services, about SmartCAT. I've written at length here about SmartCAT before (see edition 240 of the Tool Box Journal) and in a review in MultiLingual (if you want to subscribe for free to the digital edition for the next six months, go here and enter the code that you'll find on that page -- you're welcome!), so I don't have to go through the whole rundown of what it actually is and what its features are again. Instead, I would like to reflect on where (I think) SmartCAT stands after some time on the market, what that means for all of us, and then look at some of the new features.

I have spoken rather highly about SmartCAT (and still do). It's a well-designed cloud- and browser-based translation environment tool with an impressive feature set and a virtually unbeatable price tag for freelance translators -- free (unless you want to use machine translation and PDF conversions, for which you'll have to pay after 100 free pages).

Based on that and some (almost) unique features -- such as the OCR conversion of image-based PDFs and graphic files into readable text -- I had the sense that SmartCAT could become a real game changer. The numbers that ABBYY prides itself on (more than 40,000 registered users and 5,000 to 7,000 users on a daily basis) sound impressive but, to be honest, I haven't seen much of that in my conversations with other translators or LSPs. Now, this doesn't mean that the numbers are not correct -- I'm sure they're accurate -- but the world of translation is complex and multi-faceted. Anyway, from my perspective SmartCAT has so far not been the game changer I'd hoped it would be, though that could certainly still change.

At this time, ABBYY is still looking for better ways to monetize SmartCAT, and that brings us to some interesting considerations. As I said, freelancers have to pay only for specific services -- and, according to Anna, these payments are essentially only covering costs. LSP clients can buy licenses for project managers, but ABBYY is looking at other ways to make money as well. To that end, ABBYY has been actively building up a marketplace that contains thousands of translator profiles.

ABBYY is not exactly the only tool vendor following this model. Others, including XTM, Text United, MateCat (see the ad in this newsletter), and the ill-fated Lionbridge Translation Workspace, have been or are trying that as well (and don't forget that Trados' TranslationZone originally had that particular goal as well). The question is whether access to these databases of translators is attractive enough to pay for and, if so, how it should be paid for. One possibility that ABBYY is looking at is to process payment on their site and then ask for a commission. I'm not sure whether this is the best approach -- after all, it seems unlikely that LSPs want to disclose how much they pay to a competitor (the service provider ABBYY Language Services might be a different branch of ABBYY's organization than the tool development division, but I'm not sure this is enough to convince other LSPs).

Anyway, no final decision has been made, but I'm very interested in what ABBYY (and others) will try to do. The old licensing model of initial license purchase followed by regular updates seems to be largely on the way out, and how exactly software is paid for in a multi-tiered industry like ours is not only interesting in an academic kind of way but is going to be of very practical concern to us all.

Let's talk about a selection of the newer features in the latest version of SmartCAT.

  • SmartCAT is one of the few tools that offer morphological processing (the ability to recognize different forms of one word as the same word), and it actually does it for more languages than anyone else, namely 38 in total. (I'll follow up with a list of those on my Twitter account -- it's too late to get them before the Tool Box Journal goes out.) It has now added another 30 that you can process in SmartCAT even though they're not morphologically supported.
  • If you use machine translation, tags are now automatically inserted in the correct slots in the machine translation output (which, technically speaking, really is not so hard to do and should really be done by all tools).
  • It's now possible to upload a new version of an already translated file and have only the latest changes flagged for translation. Next week, SmartCAT is releasing a "hot folder" process in which a local directory is watched for changes in files and an automatic update is executed in SmartCAT when those occur.
  • An API is now (and has been for some time) available to embed the system into your own system or at least communicate with it.
  • And the directory of translators in the above-mentioned marketplace is more deeply integrated so that LSPs (or translation buyers) can look right within SmartCAT for translators or publicly evaluate translators once they complete a project with them.
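
In case the "hot folder" idea mentioned in the list above sounds mysterious, here is a minimal sketch of how such a process can work in principle: take a snapshot of a directory, wait, take another, and act on whatever is new or different. This is my own illustration under simple assumptions (polling, comparing time stamps and sizes) and not a description of how SmartCAT implements it; the on_change callback is a hypothetical stand-in for whatever would push the file to the tool.

```python
import os
import time


def snapshot(folder):
    """Map each file name in the folder to its (modification time, size)."""
    state = {}
    for entry in os.scandir(folder):
        if entry.is_file():
            info = entry.stat()
            state[entry.name] = (info.st_mtime_ns, info.st_size)
    return state


def changed_files(before, after):
    """Names that are new or whose time stamp or size differs."""
    return sorted(name for name, sig in after.items() if before.get(name) != sig)


def watch(folder, on_change, interval=5.0, rounds=None):
    """Poll the folder and call on_change(name) for each new or updated file.

    on_change is a hypothetical callback; a real setup would hand the file
    to the translation tool there. rounds=None keeps polling forever.
    """
    before = snapshot(folder)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        after = snapshot(folder)
        for name in changed_files(before, after):
            on_change(name)
        before = after
        done += 1
```

Calling watch("C:/translations/outgoing", print) would simply print the name of every file that appears or changes in that (made-up) folder. Deleted files are ignored here, which may or may not be what a real hot folder should do.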

I spent some time looking at the directory and who is listed in there (in part to find an answer to my questions about SmartCAT's impact). One thing that particularly interested me was the prices that translators asked for. I found the vast majority of rates unacceptably low. But in the dozen or so language combinations that I looked at, I always happily found a few translators with decent rates. (And I even found a few names that I knew through social media and other avenues.)

Oh, and I almost forgot: The OCR service was "updated for improved recognition of text in image files." Anna actually mentioned that as the very first thing when we talked because some time back I had complained about a rather lousy result that I'd encountered with SmartCAT (as well as with other tools). But let's talk about that in . . .


4. . . . Dealing with Image-Based PDFs

When I talked about Trados Studio 2015 in the June edition of the Tool Box Journal, I mentioned how different tools deal with image-based PDFs and how difficult it was to find a halfway decent result.

First, though, it's important to remember that the vast majority of PDF conversion tools just don't deal with image-based PDFs at all. Period. (That's actually three periods.) (Now it's four.)

The ones that do employ an internal process of optical character recognition (OCR). ABBYY SmartCAT naturally has a real leg up on that since ABBYY is the owner of PDF Transformer, which many regard as the best tool in the class of stand-alone PDF converters. But does that automatically mean that SmartCAT is the best tool?

I did some testing with an admittedly difficult document -- my own bibliography -- which contains a number of different languages and a number of made-up words (don't ask!). I converted the first page into a graphic and then had the tools have a go at it.

This is a section of the original:

OCR Original

Here is what the latest version of ABBYY Transformer+ made of it (an earlier test with an older version of Transformer was very poor):

ABBYY Transformer

You can see that in general it did a good job. It did not like words like "lobotomy" (who does?) and other strange words; it had a difficult time when formats were switched (see the "mATA" rather than "in ATA" in the first line); and it had a hard time knowing where to use commas and where to use periods.

Here is what we get from ABBYY SmartCAT:


Overall I would say it's about the same, with slightly different errors but overall very acceptable.

Now, to give you an appreciation of the quality, let's see what the latest version of Adobe Acrobat (DC) does with its integrated OCR capabilities:


Clearly a lot more errors, and many of you would agree that it would be easier to retype the text than to correct it before translating it.

Or how about this from Trados Studio, which also boasts OCR capabilities:

Trados Studio

It's sort of humorous that the only reason it got better halfway through the mess is that it essentially gave up and just placed an image starting with "Good Right" -- well, it's neither good nor right, I'm afraid.

But the most surprising result comes from a tool that you wouldn't really expect it from (at least, I didn't): Wordfast Anywhere.

Look at this:


I would call this really, really good. It capitulated with terms like "TEnT" or "Craze," but my sense is that this is maybe the best of all results. And what's behind it? I have absolutely no idea, but Wordfast's Yves Champollion promised to tell me over a few beers sometime. So, I'd like to go on record to say that I'm ready to sacrifice myself for you, and I can't wait to drink with him and find out.

Here is one other curious thing that was introduced in the latest version of Acrobat. It's a feature that's not helpful for translation but might be helpful for last-minute touch-ups: Acrobat DC internally converts image-based documents into editable PDFs by essentially faking the look of fonts that it does not have installed or characters that in reality it does not recognize. And if you enter new characters, it will also try to emulate the look of the previous text.

Our particular example does not look too impressive, but I tried with other text containing fonts that I didn't have installed and it worked quite well. Here's how it looks with our example inside Acrobat (note the text that I entered at the end and which was automatically entered in the correct font and style):

Acrobat Internal

5. XTM 235 (Premium Edition)

Can you believe that the license plate of my sad and broken Buick actually carries the name of a translation environment tool? If that's not fate, I don't know what is. Now if I could only figure out what the "235" intends to communicate to me . . .

It's nice to hear when additional features are introduced in any given tool to help existing users go to the next level by providing feature parity with competitors. But when it comes down to it, that doesn't really interest me as much as when I see new concepts being developed or existing concepts being broadened and refined.

I had a chat with the folks from XTM this past week, and I enjoyed it because there was a mix of both categories of features.

Take, for instance, XTM's new right-click context menu. Would any user of a desktop tool think that's even remarkable? Not so much. But for a browser-based tool it's a different story altogether, and it's a welcome addition in the new version 9 of XTM.

Oh, and if you don't really know the first thing about the browser-based translation environment tool XTM, I've written a number of articles in the Tool Box Journal in the last few years that you can find in editions 117, 174, 220, 234, and 247 (if you're a premium subscriber, that is).

Alignment -- the ability to convert documents that were translated outside a translation environment into a translation memory -- doesn't really sound like such a hot thing to most people. After all, most tools already have something to offer in that category. Andrzej Zydron and his XTM team did something interesting, though. They went back to the beginning of the technology, looked at the underlying assumptions back then, and analyzed what wasn't working so well (and anyone who has ever aligned documents knows that there are definitely things that don't work well with alignment).

They concluded that the only way to really and dramatically improve alignment would be with the help of very comprehensive lexicon data, which they acquired in 50 language pairs as the basis of their alignment. That's a very smart approach that I wouldn't be surprised to see others follow as well. You can read about the process of how they got there and what languages are supported in this well-written and informative post by Andrzej.

Reading the post reveals that any alignment process results in two different files: one that lists all segment pairs with at least 90% confidence (the confidence is assigned on the basis of the lexicon data, as well as items like function words, numerals, named entities, and of course punctuation) and another file that contains every segment pair and therefore needs further editing. That's smart as well. If you're in a hurry, you just take the "good" file and import that into a TM; if you really want to squeeze everything out of a text, you take the file that might need some work.
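
To make the two-file idea a bit more tangible, here is a toy sketch of lexicon-based confidence scoring with a 90% cut-off. Everything in it is my own assumption for illustration: the four-word English>German lexicon, the scoring formula (the share of lexicon words and numerals in the source that turn up in the target), and the function names. XTM's actual method also weighs function words, named entities, and punctuation and is certainly more sophisticated.

```python
import re

# Toy English>German lexicon; purely illustrative, not XTM's data.
LEXICON = {
    "house": {"haus"},
    "red": {"rot", "rote", "rotes"},
    "car": {"auto", "wagen"},
    "costs": {"kostet"},
}


def tokens(text):
    """Lowercased words and numbers, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def confidence(source, target):
    """Share of checkable source tokens (lexicon words, numerals) found in the target."""
    target_tokens = set(tokens(target))
    checked = matched = 0
    for tok in tokens(source):
        if tok.isdigit():
            checked += 1
            matched += tok in target_tokens
        elif tok in LEXICON:
            checked += 1
            matched += bool(LEXICON[tok] & target_tokens)
    # Pairs with nothing to check get no confidence at all.
    return matched / checked if checked else 0.0


def split_alignment(pairs, threshold=0.9):
    """Return (confident pairs, all pairs with scores), mirroring the two output files."""
    scored = [(src, tgt, confidence(src, tgt)) for src, tgt in pairs]
    confident = [(src, tgt) for src, tgt, score in scored if score >= threshold]
    return confident, scored


if __name__ == "__main__":
    pairs = [
        ("The red house costs 300000 euros.", "Das rote Haus kostet 300000 Euro."),
        ("The car is red.", "Das Haus ist alt."),
    ]
    confident, scored = split_alignment(pairs)
    print(len(confident), "of", len(scored), "pairs are ready for the TM")
```

The first list is the "good" file you would import right away; the second keeps every pair along with its score so that a human can review the doubtful ones.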

But let's talk about the work part.

Andrzej likes Excel. He has his reasons, including that everyone has a copy of Excel or its equivalent, but I happen to think that Excel is not a great editing environment. Unfortunately, the alignment results of both kinds are output in Excel (Excel is also the tool of choice for the offline work environment for XTM). Yes, you can do many things with Excel, but at its heart it's still a number-crunching program, and support for anything with text is sort of, well, not so great.

Here is another new thing that was introduced with XTM 9: preprocessing. The preprocessing of a source text comes into play a) when the text is so badly written that it makes a lot of sense to translate it from something like Chinglish, Hinglish, or Denglish (or is that Germlish?) into English (or from super corporate-speak into understandable language or vice versa) before sending it to translators in the 25 required languages; and b) when pivot languages are used. Pivot languages come in when a text is produced in, say, Chinese and has to be translated into 12 languages, and it's just a lot easier (and cheaper) to find the English>Finnish translator than the Chinese>Finnish translator. (Please don't send me emails: I'm sure there are plenty of excellent ZH>FI translators!)

None of this is essentially new, and it can be done with virtually any tool, but it is new that a project can be set up to serve exactly those purposes, and the automatic workflow will then push the different stages within the right language combinations to the correct parties.
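
As a rough illustration of what such a workflow has to figure out, here is a small sketch that turns a source language, a list of target languages, and an optional pivot language into an ordered list of translation jobs. The function name plan_jobs and the language codes are hypothetical, and this says nothing about how XTM models its projects internally.

```python
def plan_jobs(source, targets, pivot=None):
    """Return ordered (from, to) translation jobs, routed through a pivot if given.

    Without a pivot (or when the pivot is the source itself), every target is
    translated directly from the source. With a pivot, the source>pivot job
    comes first and all remaining targets are translated from the pivot.
    """
    if pivot is None or pivot == source:
        return [(source, target) for target in targets]
    jobs = [(source, pivot)]
    jobs += [(pivot, target) for target in targets if target != pivot]
    return jobs


if __name__ == "__main__":
    for src, tgt in plan_jobs("zh", ["fi", "de", "en"], pivot="en"):
        print(f"{src}>{tgt}")
```

The first job has to be finished (and ideally reviewed) before the others can start, which is exactly the kind of sequencing an automatic workflow takes off the project manager's plate.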

The Visual Editor is also new. As in most other tools, XTM's translation interface is tabular and independent of the original file format. In a way that's good because you just don't have to worry about things that are not directly related to the text. You and I also know that it is sometimes helpful to see text in its actual environment, though -- partly for context purposes, but also because we need to know whether the particular segment is a heading or a bullet point or something altogether different. The majority of tool vendors have addressed that problem with an on-the-side or altogether separate WYSIWYG (what-you-see-is-what-you-get) preview that reflects your translation in real time if you choose to enable it. The XTM developers have chosen a different path: In XTM you can choose to work right within the respective WYSIWYG interface and still have TM, MT, and term matches presented right in that view.

Presently this is only enabled for HTML and XML files with stylesheets, but it will also be available in January (for version 9.1) or April (for version 9.2) for Office and InDesign files.

We'll see whether this will make translation work more productive or more accurate (please let me know if you have gathered any experience already). To me it seems that this will be a feature liked by some and avoided by others.

But isn't that the way it always is . . .  

