The 194th Tool Kit - Premium Edition

A computer newsletter for translation professionals

Issue 11-6-194
(the one hundred ninety-fourth edition)

Contents

1. The Hardest Working Dictionary (Premium Edition)

2. Precious

3. Trips and Ticks

4. 84000

5. Machine Translation Brain Hurts

6. New Password for the Tool Kit Archive

The Last Word on the Tool Kit

2029

Ever since the release of the apocalyptic movie 2012, many of our high school students here in town have been convinced that the world will come to an end next year. They feel very strongly about it -- so strongly that I've given up talking to them about it.

The difference between them and famed futurist Ray Kurzweil is that Kurzweil has a formidable brain behind his predictions -- and, as far as I know, he doesn't believe that the world is coming to an end in 2012. However, he does believe very strongly that the world is changing all the time, in far more radical ways than you and I might think is possible or even reasonable.

Common Sense Advisory's Nataly Kelly sat down with Ray last week to talk about translation, translation technology, and us, and you can view the interview at the Huffington Post. So what's ahead for us, according to Ray?

By the year 2029, Ray predicts that computers and human translators will be equal in their translation ability. He derives this from his long-held assumption that computers will at that point pass the "Turing test," a benchmark where humans and computers will be indistinguishable in an interview situation. Since that requires language processing, the language processing required for at-par-with-humans translation will also be there. Still, he says, no reason to worry (and I'm slightly paraphrasing): there will be plenty of other opportunities for humans within the field of language. He reminds us that the synthesizer has not managed to replace traditional musical instruments and musicians.

I really don't have much to add to all of this (I think I did mention the "formidable brain" factoid above), but I sure do know that we're very much needed right now, and in a pretty traditional way at that.

One thing we're not good at -- and, yes, I've mentioned this many, many times before -- is using words to clearly express what we actually do. This very clever and funny video clip by CSOFT's Zachary Overline makes it very apparent that "localization" -- or the Chinese equivalent 本地化 -- does not foster communication. And the term CAT? Well, see for yourself.

1. The Hardest Working Dictionary (Premium Edition)

I'm not sure that this is true (actually, I'm not sure what it actually means), but the Global Glossary that goes with that byline is certainly an interesting project.

I learned about it through a posting at tr_jobs, the translation and interpretation jobs list that has been run faithfully for many years by Czech translator Radovan Pletka. And maybe it's no surprise that it was he who called attention to the Global Glossary since its creator, Michal Boleslav Měchura, is Czech as well, though he lives in Ireland (and speaks Irish!).

Unlike other dictionaries, this does not consist of direct user contributions; instead, it consists of 48 bilingual glossaries from open-source dictionaries such as Wiktionary, OmegaWiki, FreeDict, and various others in 22 languages. Michal's plan is to "keep adding new languages and eventually to cover all the languages of the world -- or at least those for which freely available glossaries exist."

If you would like to contribute to the dictionary, you can do so by contributing to its above-mentioned sources, which in turn will then be brought into the Global Glossary. If you know of open-source dictionaries that should be imported, let Michal know -- you will find his contact information on the website.

I found it intriguing that this glossary was not built for translators or by a translator, and yet it has productivity features that can be helpful for many of us, especially for translators in less-well-supported languages, including the upcoming ability to download large dictionary data files (unfortunately not in standard TBX format).

I had a chance to ask Michal about his reasons for creating this resource and talk with him about his future plans. He said:

Quite frankly, I designed Global Glossary for myself. I am not a translator, but I use several languages in my daily life, both for work and socially. This means I'm often having to translate things in my head, to think how to express in one language a thought that originated in another. Sometimes, when you're looking for ideas how to express something, the size of the glossary you have at hand can make a big difference. So I said to myself, wouldn't it be cool if I could build a huge database where all the lexical data I can get my hands on would be accessible in one place, without having to hunt up and down the Web every time I want to look something up? (. . .) In fact, I have huge plans for what I'm going to do in the future with all the data I will have collected. Once you have a large collection of bilingual glossaries, there is no end to the amount of exciting ways you can repackage and redistribute them: browser plugins, mobile apps, what have you. This is an area I'm definitely going to look at in the near future.

In response to my question about a collaborative feature that allows users to evaluate and enhance data, he said:

I did originally think I was going to build a website where people could not only search bilingual glossaries but also edit them collaboratively. I decided against it, though, at least for the time being, and for two reasons. Reason number one, to build and run a good crowd-sourcing website where content prospers and quality emerges, that is a big task, a whole new ballgame, and not one I'm ready to get involved in right now. Second reason, I didn't want to fork data away from already-established crowd-sourcing places like Wiktionary and take the wind out of their sails. So I decided I'm going to be happy with just recycling data from elsewhere for now.

When asked about "big" versus "small" languages -- prompted by his interest in languages like Irish -- he said:

There is no particular emphasis on small languages here. If anything, I could be accused of the opposite! I'm sure you've noticed that the first languages I added to Global Glossary were big major ones: English, French, Spanish and so on. I decided to start with large international languages and then work my way down the imaginary "language pyramid" (with occasional side trips prompted by personal interests and prejudices -- for example, I added Czech pretty early, even though it's by no means a major international language).

In fact, I think I'm going to hit a problem once I start reaching the bottom of that pyramid. For many of the small languages, the glossaries you can get from places like Wiktionary are not only small but also of questionable quality. I guess it's a simple question of numbers and critical mass. In large languages, a lot of people contribute to these collaborative dictionaries, a critical mass is reached and quality sort of emerges by itself. In smaller languages, though, only very few people contribute and the quality of what emerges from that can be literally all over the place. This is in fact one of the reasons why I haven't imported Irish into Global Glossary yet, even though it is one of my favorite languages in the world. The glossaries you get in Irish from Wiktionary, OmegaWiki and FreeDict are a bit shaky: lots of misspellings, stuff that's obviously been thrown in by people who don't even speak the language, words that aren't even in Irish. In large languages these things would be weeded out by the rest of the community but in small languages not. So, even though I have a lot of respect for the collaborative dictionary-writing model, I have to admit it doesn't seem to work well in small communities.

So, why do I give this project so much room in this newsletter? Because it's a project that seems to be run by someone who feels very passionately about it, really would like to hear about other data sources he could include that fit the framework of this project, and would certainly be open to suggestions that could make this project even more usable for professional translators, particularly those of languages with fewer resources.

It's one of those projects that aims high and does it on a good foundation -- sort of like the new Tower of Babel that was recently erected in Buenos Aires. The difference from the first one? This one was finished and then taken down voluntarily.

ADVERTISEMENT

SDL Trados Studio 2009 Sale!

If you have been waiting for the right price to invest in Studio 2009 then don't delay!

Until 30th June we are offering you an amazing 40% off new licenses and 30% off upgrades, so make sure you don't miss out!

To buy or upgrade visit -- www.sdl.com/promo/toolkit6

2. Precious

Most of you will have heard that Google has officially announced the re-introduction of its API for Google Translate after the massive public outcry (or maybe because it was their plan all along) over its announced ~~depreciation~~ deprecation. The difference is that the new version will be a paid service of some kind, so it will be interesting to see which translation environment tool vendors will shell out the dough for that (those that do will most certainly not end up being the ones who actually pay for it in the long run).

In the last newsletter I gave some room for tool vendors to state what they thought about Google's (now: temporary) withdrawal, and it was remarkable that they seemed to be much calmer and more relaxed about the whole thing than many of us were. Looks like they knew what they were doing.

3. Trips and Ticks

In some applications it can happen that your clipboard, i.e., the place that stores copied items and from which you can paste those items, gets stuck. You might get an error message that you can't copy any "additional" items, or you just can't copy them -- all you can paste is the previous clipboard content. In earlier versions of Windows there was a relatively easy way to "purge" (that word sounds a bit too bulimic to me) the clipboard, but not in Windows Vista and 7.

If you regularly encounter problems with your clipboard and don't want to reboot your computer to empty it, you can create a link on your desktop (or anywhere else) that allows you to clean things out by simply double-clicking it. Here are the steps to create the link:

Right-click on your desktop and select New > Shortcut. In the text field that appears, enter

cmd /c "echo off | clip"

Once you've done that, click Next, enter a descriptive name for the shortcut icon (if you don't share my feelings about the word "purge," you could type "Purge Clipboard"), and click Finish.

That's it.

Now you can simply double-click on the icon any time you feel like (or are man-handled into) emptying your clipboard.
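If you would rather trigger the same cleanup from a script than from a desktop icon, here is a minimal Python sketch. It simply hands the same "echo off | clip" pipeline to the Windows command shell. The names clear_clipboard and dry_run are my own inventions for illustration, and the sketch assumes a Windows version that ships the clip command (Vista and later); on any other system it does nothing except return the command text.

```python
import subprocess
import sys

# The same pipeline the desktop shortcut runs (without the "cmd /c" wrapper,
# because shell=True already starts the Windows command shell).
CLEAR_PIPELINE = "echo off | clip"


def clear_clipboard(dry_run: bool = False) -> str:
    """Overwrite the Windows clipboard by piping 'echo off' into clip.

    Returns the pipeline that was (or would have been) run. Hypothetical
    helper: it only executes on Windows and when dry_run is False.
    """
    if sys.platform == "win32" and not dry_run:
        # check=True raises CalledProcessError if the shell reports a failure.
        subprocess.run(CLEAR_PIPELINE, shell=True, check=True)
    return CLEAR_PIPELINE


if __name__ == "__main__":
    print("Ran:", clear_clipboard())
```

The dry_run switch is only there so you can see what would be executed before letting it touch your clipboard.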

TMbuilder is "the easiest Translation Memory export creator," according to its creator. I'm not sure this is true -- there are a number of tools that do similar feats and similarly easily -- but it's still a helpful little tool if you have data in Excel or text-based formats that you would like to have in a TM-compatible TMX format. And: It's free!
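To give you an idea of what such a conversion involves, here is a rough Python sketch that turns tab-delimited source/target pairs (the kind of text you get when you save two Excel columns as tab-separated text) into a bare-bones TMX 1.4 file. This is not how TMbuilder works internally, which I don't know; the function name tsv_to_tmx, the header values, and the rule of skipping incomplete rows are all my own assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tsv_to_tmx(tsv_text: str, src_lang: str, tgt_lang: str) -> str:
    """Build a minimal TMX 1.4 document from 'source<TAB>target' lines."""
    tmx = ET.Element("tmx", version="1.4")
    # The TMX header requires these attributes; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv_to_tmx_sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    # QUOTE_NONE keeps quotation marks in the segments untouched.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and rows missing a source or a target.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        tu = ET.SubElement(body, "tu")  # one translation unit per row
        for lang, text in ((src_lang, row[0]), (tgt_lang, row[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text.strip()

    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    pairs = "Save the file.\tSpeichern Sie die Datei.\n"
    print(tsv_to_tmx(pairs, "en", "de"))
```

A real tool would also have to deal with inline formatting, more than two languages, and the XML declaration and encoding of the output file, all of which this sketch leaves out.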

Finally, when you're on a sales call and you want to really impress your client with cool (and meaningless) terms like "innovate synergistic convergence" or "streamline best-of-breed partnerships," the Web Economy Bullsh** Generator might be the perfect tool for you. (Thanks to Ken Clark for this priceless tip. Ken's blog recently won second place in the Language Blog of the Year competition, while first place went to the incredible twins. Congratulations, all!)

ATA will host the
International Federation of Translators (FIT) XIX World Congress
this summer, August 1-4, in San Francisco, California.

FIT's triennial conference and exhibition offers a unique educational program focusing on international and intercultural translation and interpreting. Presentations will include workshops, panel discussions, and scholarly papers from a diverse and experienced group of language professionals. The meeting is a tremendous opportunity for attendees--and ATA members--to share, learn, and network with colleagues from around the world.

The FIT World Congress offers you an excellent program of continuing education, and the host city of San Francisco is an unforgettable experience with its Golden Gate Bridge, timeless neighborhoods, and historic cable cars.

Registration discounts end June 30. Look for details and registration information online at http://www.fit2011.org/register.htm.

See you in San Francisco!

4. 84000

84000 is an auspicious number symbolising the infinity and vastness of the Buddha's teachings (...) 84000 represents and encapsulates the magnitude of the task that lies in front of us -- to translate the vast collection of the Buddha's teachings within a definite and specific time period.

So reads the announcement of the 84000 project -- a project to translate all Buddhist scriptures from Tibetan and classical Chinese into modern languages, primarily English, within the next 100 years.

Buddhism has been one of the great success stories of translation. If it hadn't been for translations into Tibetan and Chinese, the vast majority of Buddhist scriptures would have been lost when many of the original Sanskrit writings were destroyed under Muslim rule in India. Fortunately, by that time major translation projects had been completed in Tibet and China. The goal of the 84000 project, therefore, is not to translate original scriptures, but instead translated versions of the originals since this is the only way to gain access to the Buddhist teachings within those writings. Presently less than 5% of the classical Tibetan texts and only 15% of the classical Chinese texts have been translated, making the amount of work to be done enormous.

Many years ago I took a couple of semesters of Buddhology from a professor who worked with his three or four students to translate classical Chinese Buddhist sutras into German. The translation was nothing short of painful because Buddhism in ancient China had developed its own vocabulary and terminology that was accessible only to someone with a comprehensive background in Buddhism -- which most of us students did not have at the time.

Since the initiators of this project are aware of the intrinsic difficulties of translating these texts into English, they are not primarily looking for volunteer translators -- they already have a number of teams working on translations -- but instead they are looking for donations to sponsor the ongoing translations.

It's great to see how the project is set up. It's very quality-driven and, quite frankly, well-paid. You can find out more about the processes right here. (Thanks to Craig Meulen for pointing me to this project.)

5. Machine Translation Brain Hurts

Geoff Koby of Kent State had an interesting comment after last week's newsletter about machine translation post-editing:

I think automated translation, if good quality, could speed up good translators by giving them a basic sentence to edit -- but the quality isn't there yet. And for less strong translators and particularly for students, automated translation is actively dangerous, because it takes high-level skill to discern the right from the wrong, and from the "sounds right but is completely off base."

What this reconfirmed to me once again was that the job description of "MT post editor" has very little to do with "translator" or even "editor." I know that I have written about this in the past, but just these past couple of weeks I've realized more than ever before how vast the difference between the activities of "traditional" editing and MT post-editing is. So that I could write about it more intelligently (if only!), I accepted a fairly large MTPE job that occupied me for ten days or so.

What I found was that the actual task was not so very different from editing someone else's translation (though I would have strongly advised my client to look for another translator), but the goal was very different. The goal stated in the job description was to produce grammatically correct German, with little or no concern for sentence structure or stylistic requirements. Consistency was necessary only when user interface terms were mentioned that had to match the actual software terminology (I was working on software instructions), but I was instructed not to worry about any other textual inconsistencies.

It was hard to get used to these instructions, but if I went overboard with editing at first I was the only one who had to "pay" for that so no harm was done to the project. At some point I got into the groove, developed an eye for true offenders, and left the other stuff as it was -- understandable but not pretty.

The problem was the next job: a "normal" translation job that required what we're all used to doing -- idiomatic language, consistent terminology, and an adequate and consistent style. That's where I struggled. I spoke with a colleague about that experience earlier today and she said: "Well, yeah, it's like code-switching!" And that's exactly what it is. It hurts the brain.

OK, maybe it doesn't hurt the brain, but I spent much more time after returning to my first "real" translation job than I would have usually -- I questioned and re-questioned everything because I knew that my senses had been corrupted by the MT post-editing. I know I'm back to "normal" now, but it turned out I needed not one but two brain washes for one MTPE job.

MT post-editing will get much larger than it is today. But the folks who do this kind of job will have to more or less do only that -- switching is hard and dangerous.

6. New Password for the Tool Kit Archive

As a subscriber to the Premium version of this newsletter you have access to an archive of Premium newsletters going back to May 2008.

You can access the archive right here. This month the user name is toolkit and the password is costarica.

New user names and passwords will be announced in future newsletters.

The Last Word on the Tool Kit

If you would like to promote this newsletter by placing a link on your website, I will in turn mention your website in a future edition of the Tool Kit. Just paste the code you find here into the HTML code of your webpage, and the little icon shown on that page will appear on your site with a link to my website.

Here are some websites that added the Tool Kit link this week: