| 1. Training the Machine | |
I spent this last week training a rules-based machine translation engine (in this case, PROMT -- Premium subscribers with access to the archives can read about this in the 184th Tool Kit), and I finally understood a rather fundamental truth about MT (I've never claimed to be particularly quick-witted!).
The translation results out of the box (language direction English into German) were really rather pitiful. And while they got better after changing some basic settings (such as the text genre and form of address for the user group), they still were not good and not much of a productivity boost for any translation effort. Plus, they were decidedly worse than what Google Translate would have given me.
But then I set out to work on it. I imported large and high-quality translation memories that the client had given me, I extracted terminology from the source texts that I translated based on the data in the TM and defined grammatically, and I imported and fine-tuned glossaries. Now the result was better, but still far from what it was supposed to be. So the rest of the week was spent "translating" the source texts with the MT engine in its current state, hunting down terms that were not recognized at all or were recognized incorrectly and entering them into the dictionary (and "entering them" here does not just mean sending source and target translation, but defining them grammatically as well), then retranslating, hunting down more terms, and on and on.
At some point it was fun to see the results getting better each day, and at the end of the week the result was certainly better (though maybe less idiomatic) than that of Google Translate. Assuming that the final product will be a translation of "high" quality, I would guess that the post-editor will save somewhere between 20% and 40% of the time that it would take to translate the text within the traditional TEnT environment.
Clearly this is not something that would make sense for a smallish kind of project, but for a very targeted project with a few hundred thousand words this can be a large productivity boost. (And if the goal is to deliver "only usable" translation, the saving would naturally be higher.)
But here is what I learned. I know that more and more translators are using tools like Google Translate or Bing Translator as productivity boosts. Aside from the confidentiality issues (your clients probably will not be too thrilled to see you uploading their data to Google or Microsoft), the results might look like decent productivity boosts at first glance (this obviously depends on your language combination and the kind of translation you're doing). But if you are inclined to use MT as a productivity tool and you generally work in the same domain, you will achieve greater boosts with MT tools that you can train. In turn, however, this means that you will have to make a heavy investment of time before you see any results.
(And if tools like PROMT ever become widely used translation productivity tools, they might also change some things about their user-friendliness and the efficiency with which external data, in particular TM data, is being used.)
I did have a talk with a representative from the market-leading rules-based machine translation tool Systran this week -- more on that in the next newsletter.
|
| 2. Review of "Reviewing Reviewing" (Premium Content) | |
In the last issue of the newsletter, I reported on the dearth of good review features in translation environment tools. These features include a) a user-friendly display of changed items (possibly along the lines of Word's Track Changes feature), b) more advanced tracking capabilities so that one can track the progress of edits through multiple review passes, and c) the ability to have the very final text rather than a preliminary form reflected in the translation memory.
These are all no-brainers considering the typical workflow of translation, editing, proofreading, possibly desktop-publishing, and then another review. While some of these features are available in some browser-based tools (Wordbee, XTM, one2edit, Translation Workspace), they are virtually absent in desktop-based tools. (Note that both SDL and Kilgray have announced the introduction of some features for their desktop tools for later this year.)
That's why I was very happy to receive a response from a representative of the Austrian company and Trados reseller Kaleidoscope about its development of globalReview, a tool that does everything I described above, plus a little more.
I had a nice meeting with them this week, and this is what I learned:
globalReview is not primarily built for the freelance translator (though it does offer a SaaS -- Software as a Service -- solution that might at some point make that possible). Instead, it's designed for medium to large language service providers or translation buyers who have a high enough output of documents to justify the cost (you'll need to purchase a server, licenses per project manager and language, and -- if you use it for InDesign -- a license for a third-party product, but more on that later).
The idea of the product is to allow already-translated Trados .ttx and .sdlxliff files to be displayed in a tabular, web-based interface for which an administrator gives out certain role-based rights. For instance, it can be set up so that a reviewer (editor) can only review each segment and enter edits in a separate third column, where only the translator can accept those edits. The edits themselves can be displayed in the Track Changes-like fashion that is so familiar from MS Word, but they're also logged in a separate file so that each time an edit is made, the project manager can track back who did what. Of course, there is also a commenting feature and various filter variants so that you can view only rows with a certain status. (Another feature that I thought was interesting was the ability for the translator to accept -- or reject -- a change that a reviewer had made but still use the old translation in the TM because it might actually be more useful as a linguistic asset.)
Once the file is in its final stage with no more edits to implement, it is brought back into the Trados environment where it's sent to the TM in its final form and then output to the original format.
If InDesign files are an important aspect of your workload, and you don't mind spending another 15,000 Euro or so, it is also possible to add the FlyerEx engine. This engine allows you to export an XML format out of InDesign that maintains all formatting data, convert it into .ttx or .sdlxliff files, translate it within Trados (or any other supporting TEnT), and bring it into the web-based interface. In contrast to all other source formats, you now have the option to edit the text in a web-based InDesign replica with all formatting and images intact. (Alternatively, you can also edit in the tabular interface or you can switch back and forth.)
This WYSIWYG (what-you-see-is-what-you-get) interface is very much like working in one2edit, only here you don't actually use the InDesign server. The benefit is that there are no language limitations for this tool (unlike globalReview, one2edit does not work with bi-directional languages), but the drawback is that any DTP work carried out in the web-based interface is not saved in the Trados file and will have to be repeated in a local copy of InDesign at the end of the process.
In sum, it's a very interesting tool, particularly because it supports all file formats that Trados supports and provides all the review features that you might want.
|
| 3. Unicode Miracles | |
Here is what Richard Ishida says about himself:
Richard works for the W3C - World Wide Web Consortium. When he's not working you can find him developing small Unicode tools, learning about non-Latin writing systems, sleeping (...)
The latter activity I can identify with, but, boy, I would love to have his passion for developing helpful and oh-so-beautiful tools. A while back I pointed you to his Unicode code converter, a tool that looks puzzling at first but becomes very useful if you need to convert any amount of text from one code to another. Many times that can be achieved with text editors, but sometimes it cannot -- and it's those times that make us worry. All you need to do here is paste text into the first text box, click Convert, and you'll have instantaneous conversions in all kinds of common and uncommon encodings of your text.
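For the curious, here is a minimal Python sketch of the kind of conversion described above. It is not how Richard's converter works internally; the function name describe and the selection of output formats are my own assumptions, chosen only to show one piece of text in several encodings and escape notations.

```python
def describe(text: str) -> dict[str, str]:
    """Return the same text in several common encodings and escape notations.

    Hypothetical helper for illustration; the set of formats is an assumption.
    """
    return {
        # Unicode code points in the usual U+XXXX notation
        "code points": " ".join(f"U+{ord(ch):04X}" for ch in text),
        # Raw bytes of the UTF-8 and UTF-16 (big-endian) encodings, in hex
        "UTF-8 bytes": text.encode("utf-8").hex(" ").upper(),
        "UTF-16BE bytes": text.encode("utf-16-be").hex(" ").upper(),
        # Hexadecimal numeric character references for HTML/XML
        "HTML references": "".join(f"&#x{ord(ch):X};" for ch in text),
        # Python-style backslash escapes
        "Python escapes": text.encode("unicode_escape").decode("ascii"),
    }


if __name__ == "__main__":
    for label, value in describe("é€").items():
        print(f"{label}: {value}")
```

Pasting a different string into the call at the bottom gives the corresponding values for that text.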
Yes, you say, that might be handy, but beautiful?
Check this out: under Unicode Block, select any of the options. It's OK to be daring -- don't just stay with Greek or Latin. Scroll down a little and select some of the South or Southeast Asian scripts. Or how about Phags-pa or Mandaic? And then simply marvel at the beauty of human written expression. (Just don't select Phaistos Disc -- we'll talk about that separately.) I really think it's meaningful to scroll through these samples to remind ourselves of the beauty of what we do. We may not work in any of the super-fancy scripts, but in the end we're all doing the same, transferring information between languages and scripts -- and that's beauty in its own right. (Thanks so much to Peter Reynolds for pointing this site out again.)
Of course, there are also practical reasons for this page, such as the ability to copy the characters down below into the text field in the upper right-hand side and copy and paste them into your documents, or the ability to find out the underlying coding for each of the characters. (One thing that I'm missing is a list of fonts for some of the extra-rare languages. Let's ask Richard to consider this for the next release.)
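If you would rather look up the underlying coding of a script's characters on your own machine, something along these lines should do. This is only a sketch using Python's standard unicodedata module; the function name list_block is made up for illustration, and the range used (U+0370 to U+03FF, the Greek and Coptic block) is just an example.

```python
import unicodedata


def list_block(start: int, end: int) -> list[tuple[str, str, str]]:
    """List (code point, character, official name) for a code point range.

    Hypothetical helper; code points without a name in Python's Unicode
    database (for example unassigned ones) are skipped.
    """
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if name:
            rows.append((f"U+{cp:04X}", ch, name))
    return rows


if __name__ == "__main__":
    # Greek and Coptic block, used here only as an example range
    for code, char, name in list_block(0x0370, 0x03FF):
        print(code, char, name)
```

Which characters show up depends on the Unicode version that ships with your Python installation, and whether they display properly depends, as always, on your fonts.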
|
| 4. Cerenkov Radiation | |
We have had a number of success stories with translation technology over the years. Certainly Trados and Wordfast come to mind, IntelliWebSearch also has to be up there somewhere, and memoQ is well on its way. But no success story has impressed me more than that of ApSIC Xbench -- a tool that most service providers nowadays assume will be installed on the translator's computer. (I must have worked on three different jobs for LSPs last month where one part of the instructions included the use of Xbench -- almost on equal footing with MS Word or a browser.)
Xbench is a tool that allows you to read an incredibly long list of file formats that contain glossaries, termbases, and translation memories and search them at the speed of Cherenkov radiation -- for the molecularly uninitiated, that's really, incredibly fast. It also allows you to perform a number of formal (language-unspecific) quality assurance processes on these files.
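To make "search" and "formal quality assurance" a little more concrete, here is a rough Python sketch of both ideas on a tab-delimited glossary. This is in no way Xbench's implementation; the sample data, the function names, and the choice of a mismatched-numbers check as the QA example are all my own assumptions.

```python
import csv
import io
import re

# Invented sample data: one source/target pair per line, separated by a tab.
SAMPLE = (
    "translation memory\tÜbersetzungsspeicher\n"
    "Press F5 to refresh\tDrücken Sie F5 zum Aktualisieren\n"
    "Wait 30 seconds\tWarten Sie 20 Sekunden\n"
)


def load_glossary(handle) -> list[tuple[str, str]]:
    """Read (source, target) pairs from a tab-delimited file-like object."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def search(entries, term: str) -> list[tuple[str, str]]:
    """Case-insensitive substring search in source and target."""
    needle = term.casefold()
    return [
        (src, tgt)
        for src, tgt in entries
        if needle in src.casefold() or needle in tgt.casefold()
    ]


def number_mismatches(entries) -> list[tuple[str, str]]:
    """Formal check: flag pairs whose digit sequences differ."""
    flagged = []
    for src, tgt in entries:
        if sorted(re.findall(r"\d+", src)) != sorted(re.findall(r"\d+", tgt)):
            flagged.append((src, tgt))
    return flagged


if __name__ == "__main__":
    entries = load_glossary(io.StringIO(SAMPLE))
    print(search(entries, "speicher"))
    print(number_mismatches(entries))
```

The check is "formal" in the sense that it needs no knowledge of either language: it only compares the numbers that appear on each side.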
Just yesterday, a new version of Xbench (2.9) was released into beta. Before its release, the development team actually queried the translation community about what kinds of formats were still needed, and as far as I know, virtually all wishes were granted. No wonder, the doubters might say, considering the ridiculous prices that we pay for software nowadays. Well, not for this tool. It's (still) free!
Here is the never-ending list of supported file formats:
- Tab-delimited text files (*.txt)
- XLIFF files (*.xlf, *.xlif, *.xliff)
- TMX memories (*.tmx)
- TBX/MARTIF glossaries (*.xml, *.tbx, *.mtf)
- Trados exported memories (*.txt), MultiTerm 5 glossaries (*.txt), MultiTerm XML glossaries (*.xml), TagEditor files (*.ttx), Word uncleaned files (*.doc, *.rtf), Trados Studio files (*.sdlxliff, *.sdlproj)
- SDLX ITD files (*.itd), translation memories (*.mdb)
- STAR Transit 2.6/XV directory tree
- PO files (*.po)
- IBM TranslationManager/OpenTM2 exported dictionaries (*.sgm), installed and exported folders (*.fxp), exported translation memories (*.exp)
- Wordfast memories (*.txt), glossaries (*.txt)
- Wordfast Pro TXML files
- DejaVu X/Idiom files (*.wsprj, *.dvprj), translation memories (*.wstm, *.dvmdb)
- Logoport RTF files (*.rtf)
- Microsoft software glossaries (*.csv)
- Mac OS X glossaries (*.ad)
- Remote Xbench Server glossaries
Whether we're an industry or not, I feel really honored to work with such generous folks in our field of endeavor.
|
| 5. Interesting Finds (Premium Content) | |
1. Normally, developers are not particularly good with words. One of my hobbies is collecting the "interesting" product names created by developers. Consider two of my favorites: Rname-it and 1-4a Rename. In my book, these take the cake for names in greatest need of going back to the drawing board.
However, in an atypical case, the developer of these utilities to export text out of Photoshop .psd files into a text file and back could also have had a career as a marketing writer. Here is how he describes his tools:
What if you could extract all text strings from a PSD file into a TXT file? Sure, it's possible, thanks to PS_BRAMUS.TextExport, the PSD2TXT script I wrote a few months ago. Now, what if you wanted to do that in the opposite direction and import strings from a TXT file into a PSD file (viz. TXT2PSD)? Look no further, PS_BRAMUS.TextConvert is here, and does both!
I haven't tried it, so I'll leave it up to you whether it indeed does all the wonders it describes, but it would certainly be nice (and inexpensive) if it would.
2. I know that I've written about ASAP Utilities a good number of times over the years, but I just took the plunge last week and actually paid for the professional version -- and I felt really good about it after all the ways this tool has helped me in the past.
ASAP Utilities must be the most exciting Excel add-on ever. It's the kind of tool that allows you to sample all the possibilities of Excel and actually use them without being an expert.
So many of the text-related (and other) tasks in Excel suddenly become so much easier with one of the more than 300 different utilities. Some of my favorite functions include the ability to count characters in individual cells, helpful formatting and selection functions, and filtering options that I did not even know existed.
ASAP Utilities shows up as a separate menu in Excel 2003 and as its own ribbon in Excel 2007 and 2010.
If you can't find any other good use for this tool, simply select Start: Funny (error) messages. Messages appear such as "No error . . . yet" or "Are you working hard?" And when you click on the latter, the resulting "Nonsense, you're playing with these silly messages" might add a sense of levity to your otherwise, well, hard work.
3. One of my annoyances with MS Word has been that I could not batch delete comments. Well, silly me, of course I can! In fact, in Word 2007 and 2010 it's one of the commands under the Delete Comments button in the Review ribbon. In earlier versions, simply do this: Select Edit> Replace (or press Ctrl+H), click the More button to see the extended options, click on Format> Style and select the Comment Reference style, leave the Replace With box empty, and click on the Replace All button. That was easy!
|
| 6. A Love Story (continued) | |
I already tipped my hand that I'm going to talk about the characters on the mysterious Phaistos disc, but it certainly is a worthy addition to the family of remarkable characters that I've been collecting over the years.
If you have a JavaScript-enabled browser, hold your cursor over the character for a definition.
|
| 7. New Password for the Tool Kit Archive | |
As a subscriber to the Premium version of this newsletter you have access to an archive of Premium newsletters going back to May 2008.
You can access the archive right here. This month the user name is toolkit and the password is privilege.
New user names and passwords will be announced in future newsletters.
|
| The Last Word on the Tool Kit | |
If you would like to promote this newsletter by placing a link on your website, I will in turn mention your website in a future edition of the Tool Kit. Just paste the code you find here into the HTML code of your webpage, and the little icon shown on that page will appear on your site with a link to my website.
Here is a website that added the Tool Kit link this week:
www.nononsensetranslations.com
© 2011 International Writers' Group
|
|