i-sako.com


Saturday, July 12, 2008

A little dose of Olifant can save your life

Well, maybe not your life, but certainly your life’s work.

For technical translators, there are few resources as precious as your (or, more likely, your team’s) translation memory. It represents all the work you’ve done since you started using translation memories, which in the case of translation teams can easily represent hundreds or even thousands of hours of collective labor, even over a relatively short period of time.

The makers of computer-assisted translation (CAT) tools often count on this investment of time and effort to keep their customers locked into whichever translation memory format their application uses, a strategy that is successful more often than not because of the difficulty involved in converting translation memory files saved in one format to another format. Although there is a standard format for translation memories, known as Translation Memory Exchange (TMX), the reality of the market is that each vendor’s support for this format is often less wholehearted than their users would probably like. After all, what incentive do these businesses have to make sure that their customers can easily transfer their existing translation memory files to another company’s translation tools? There are certainly translation tools that use standard TMX files as their default memory format (OmegaT is one example), but most tend to default to their own proprietary formats, for a variety of reasons.

Most translation tools will accept TMX files as input, but most of these are quick to convert them to their own proprietary formats. Sometimes the reasons for this are easy to appreciate. Wordfast, for example, has a long tradition of using tab-delimited text files as its memory format, which has the advantage of being easy to work with (although it does suffer quite a bit when it comes to TM stability). Likewise, Felix defaults to its own TM format, which is also XML based, but is a bit more streamlined than TMX and probably a lot more comprehensible to everyday users. In other cases, the rationale for using a proprietary TM format is often somewhat less clear, but vendor lock-in is certainly one reason. In any case, most users of more than one translation memory tool can readily attest that, when you need to transfer your memory files from one tool’s format to the format preferred by another tool, the process is not always a painless one.

Enter Olifant, one component of the Okapi Framework, a suite of open source translation support tools. Olifant exists solely to manage translation memories stored in various formats and to perform conversions between them. The Okapi project aims to promote open standards where they exist and to offer its own open standards where none currently exist. As part of that project, Olifant serves as a general purpose TM-management tool, enabling users to convert TMs from one format to another, merge TMs together, edit their contents, filter entries based on SQL queries (a very useful function), flag duplicate entries, perform complex search-and-replace operations based on regular expressions (a particularly useful technique that has saved me countless hours on many occasions), and search for and eliminate characters that are considered invalid in other formats. It also offers a remarkable number of similar functions designed to make the complex task of managing multiple TMs in varying formats less daunting.
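To give a rough idea of what duplicate flagging and regex-based search-and-replace amount to, here is a minimal Python sketch. It is not Olifant's implementation or interface; the function names and the sample units are hypothetical, and a memory is assumed to be nothing more than a list of (source, target) pairs.

```python
import re


def flag_duplicates(units):
    """Return the indexes of units whose (source, target) pair appeared earlier."""
    seen = set()
    duplicates = []
    for index, (source, target) in enumerate(units):
        key = (source.strip(), target.strip())
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def replace_in_targets(units, pattern, replacement):
    """Apply a regular-expression substitution to every target segment."""
    compiled = re.compile(pattern)
    return [(source, compiled.sub(replacement, target)) for source, target in units]


# Hypothetical sample memory: (source, target) pairs.
units = [
    ("ファイルを開く", "Open the  file"),
    ("ファイルを保存する", "Save the file"),
    ("ファイルを開く", "Open the  file"),
]

duplicate_indexes = flag_duplicates(units)
cleaned = replace_in_targets(units, r" {2,}", " ")  # collapse runs of spaces
print(duplicate_indexes)
print(cleaned[0][1])
```

A dedicated tool does far more than this (attributes, multiple languages, undo), but the core operations are about this simple.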

Just recently, I was faced with the task of converting a TM of some 80,000 translation units in Wordfast’s relatively forgiving tab-delimited text format into Déjà Vu X’s much more stringent relational database format. After spending more hours than I care to count trying and failing to make a direct conversion between those two formats, I called in Olifant to mediate the conversion, which it did flawlessly and gracefully, reducing a task that I had been struggling with for quite some time to something that could be accomplished in about an hour.
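For readers curious about what such a mediated conversion involves, the sketch below turns tab-delimited text into a bare-bones TMX 1.4 document using only the Python standard library. It is an illustration under stated assumptions, not what Olifant does internally: the function name is hypothetical, the column positions of the source and target segments are parameters because the exact Wordfast column layout is not described here, and the header values (such as the `creationtool` name) are placeholders. It also strips control characters that are not allowed in XML 1.0, one example of the "invalid characters" problem mentioned above.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree writes this attribute out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Control characters that XML 1.0 does not allow (tab, LF, and CR are fine).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def tab_delimited_to_tmx(text, src_lang, tgt_lang, src_col=0, tgt_col=1):
    """Convert tab-delimited lines into a minimal TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # placeholder value
        "creationtoolversion": "0.1",      # placeholder value
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in text.split("\n"):
        fields = line.rstrip("\r").split("\t")
        if len(fields) <= max(src_col, tgt_col):
            continue  # skip blank or incomplete lines
        tu = ET.SubElement(body, "tu")
        for lang, col in ((src_lang, src_col), (tgt_lang, tgt_col)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            seg = ET.SubElement(tuv, "seg")
            seg.text = INVALID_XML_CHARS.sub("", fields[col])
    return ET.tostring(tmx, encoding="unicode")


sample = "Hello\tBonjour\nBad\x07 char\tMauvais\n"
tmx_text = tab_delimited_to_tmx(sample, "en", "fr")
print(tmx_text)
```

A real memory also carries dates, user IDs, and other attributes per unit, which is exactly the kind of detail that makes a purpose-built tool worth having.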

In other cases, Olifant has helped me satisfy the requests of clients who have contacted me after a job is done to ask if I can provide the translation memory along with the finished translation. In some cases, the format they request is different from the one produced by the tool I used for that particular job, but Olifant makes it easy to provide the memory in the format they want. In other cases, the translation units for the job in question are mixed in with those for other jobs from that same client. Cases like this are often a bit trickier, but clever use of Olifant can help fish the relevant translations out of the many other ones in the same memory.
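Fishing one job's units out of a mixed memory is essentially a filtering query. The sketch below uses Python's built-in `sqlite3` module as a stand-in to show the idea; it does not reproduce Olifant's own query syntax, and the `job` field, table name, and sample data are all hypothetical (a real memory would need some attribute, such as a date or project label, to filter on).

```python
import sqlite3


def extract_job(units, job):
    """Return the (source, target) pairs that belong to one job."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tm (source TEXT, target TEXT, job TEXT)")
        conn.executemany("INSERT INTO tm VALUES (?, ?, ?)", units)
        cursor = conn.execute(
            "SELECT source, target FROM tm WHERE job = ? ORDER BY rowid",
            (job,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical memory in which units from two jobs are mixed together.
units = [
    ("Open the file", "Ouvrir le fichier", "manual-a"),
    ("Save the file", "Enregistrer le fichier", "manual-b"),
    ("Close the file", "Fermer le fichier", "manual-a"),
]

rows = extract_job(units, "manual-a")
print(rows)
```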

If your workflow is irrevocably wedded to a specific tool, you may never have any use for the kind of functionality that Olifant offers, but the reality of the translation market is that no single tool offers a complete solution to all the possible problems that translators face (not yet, anyway), so the need to move fluidly from one application to another is a common one. For translators who need this flexibility, Olifant is quickly evolving into an indispensable resource that a technical translator really cannot afford not to have in his or her toolkit.

Posted by Sako in • Technology

Wednesday, June 25, 2008

Take Your Tools With You With PortableApps

If you ever find yourself working on more than one computer (and these days, who doesn’t?), you’ve probably found yourself wishing that your data and applications could be quickly and easily ported to the various machines you work on, so that you wouldn’t need to spend so much time installing applications on your various machines, configuring them, and keeping them all in sync. If you have ever felt this way, you might be interested in the PortableApps.com suite of open source applications that can be run from just about any computer that can read from a USB memory device or just about any other type of storage device, like a memory card or even your iPod.

I’ve been getting a tremendous amount of mileage out of the PortableApps suite, which includes portable editions of such notable applications as Mozilla Firefox (in which this blog entry was written), Mozilla Thunderbird, the GIMP, and OpenOffice.org.

Although the better-known applications above tend to get most of the attention, there are a few others that are also quite noteworthy, among them:

AbiWord Portable
If you need a word processor, but not necessarily all of the other applications included in OpenOffice.org, this lightweight application is a great alternative.

Sumatra PDF Portable
The sheer bulk of Adobe Reader is enough to send many users to popular alternatives like Foxit Reader, but it would be a mistake to overlook Sumatra PDF Portable, which might not have quite as many features, but is often even faster than Foxit, which is itself pretty impressive.

Task Coach Portable
This handy combination to-do list and task manager has functions for helping you manage your tasks through to completion, of course, but it also has functions for keeping track of how much time you are spending on each, which tasks are taking more time than you’ve budgeted for them, and how much revenue you’ve generated from each task.

GnuCash Portable
After you’ve found out from Task Coach how much money you’ve been earning on each task, the next logical step is to keep track of your total earnings in an application like GnuCash.

In addition to the applications I’ve mentioned already, there are also a number of very useful utilities and development tools available.

The only real downside to using the PortableApps software is that it is currently for Windows users only (or Linux, if you use Wine). (Mac users, if you know of similar offerings on the Mac side of the fence, please feel free to share them in the comments section.) For anyone who regularly needs to use more than one computer, the advantages of using PortableApps are numerous and compelling—try some and see for yourself! 

Posted by Sako in • Technology

Sunday, June 01, 2008

Meet Felix, a clever new CAT tool made by GITS

If you use computer-assisted translation (CAT) tools in your work—particularly if you translate between English and Japanese—you owe it to yourself to take a look at Felix, a new application released last month by Okinawa-based Ginstrom IT Solutions (GITS). Although Felix is technically not new in the sense that it is the reincarnation of TransAssist, the developer’s vision for Felix and ambitious development roadmap leave me feeling comfortable thinking of Felix as a new development in the CAT market, even if the application itself has already been under continuous development for a number of years.

Because the friendly folks at GITS have already made a demo showing how Felix works in Microsoft Word, and because there is a trial version that you can download and try on your own for translations of up to 500 units, I will not attempt to show how it works here, but will instead focus on some of the things that I think make Felix unique.

Overwriting the Source Text

As far as I know, Felix is the only CAT tool that works by overwriting the source text as you translate. Although this approach is perhaps not as cautious as the ones used by tools like OmegaT and Déjà Vu X (both of which import the source text into an external editor and then export your finished translation as a separate file, leaving the original document unchanged) or Wordfast and SDL Trados (both of which create what are commonly called “bilingual” or “uncleaned” files that include both the source and target text until the translation is finished and the document is “cleaned up”), it does have the advantage of being both very easy to work with and very fast. Once you have finished working through all of the source material, you’re done. There is no export step or cleanup process at the end of the job, which, as any translator who has used other tools can tell you, can be the most panic-inducing part of any CAT-based process if it does not work as expected. Felix cleverly sidesteps potential problems in this area by eliminating these steps entirely.

The only real caveat here is that you will want to work on a copy of the document you are translating, rather than the original document, just to make sure that you have a backup of the original text in the unlikely event that something goes wrong while you are working. This is a common sense rule that should be observed when using any CAT tool, however, so it does not stand out as something that you would need to pay special attention to when using Felix.

Unobtrusive Control Over Formatting

Many clients may express a strong preference for translations to reflect the same formatting as the original document, but actually formatting is every bit as translatable as the rest of the document. In technical documentation, for example, it is common for the commands in the menus of software applications to be enclosed in brackets in Japanese, but those same commands are typically written without the brackets in bold text when translated into English. Formatting changes like this are important to ensure that the translation reads naturally in the target language. Some CAT tools make the mistake of assuming that formatting present in the original must also be present in the translation (this is particularly true of the ones that import the source text into an external editor), but Felix avoids making any assumptions in this area and instead allows the translator to control the way the text is formatted as an integral part of its translation workflow.

Works in the Application in Question

SDL Trados and Wordfast both work directly in Microsoft Word, but when it comes to Excel or PowerPoint documents, the translator is required to either switch to a different application (Trados) or attempt to bring text from those types of documents into Word for translation and then export the finished product back to the original files (Wordfast). Neither of these approaches is as convenient or as intuitive as Felix’s approach of opening the file in question and doing the translation right in the application that created it.

In the case of HTML files, Felix also requires the use of a different application (the WYSIWYG TagAssist editor), but even in this case the translator is still working in an environment that makes it possible to edit the content of the document while translating, which is an important thing to be able to do in many cases.

Unlimited TMs/glossaries

The ability to draw from an unlimited number of translation memories and glossaries is not unique to Felix (OmegaT and Déjà Vu X offer this as well), but as far as I know Felix is the only CAT tool that works directly in the application in question that offers this capability. Wordfast, although it comes close, is limited to three glossaries, one active TM, one background TM, and one “very large translation memory,” which is a kind of Web-based shared repository for translations that Wordfast users are able to draw from. Felix’s approach to managing TMs and glossaries is flexible, powerful, and best of all uncomplicated.

Felix is Scriptable

If you know what you are doing, most CAT tools can be scripted to some degree, but aside from OmegaT (the source code for which is available to anyone who is interested), Felix is probably the most amenable to user-provided extensions of its native capabilities. As far as I know, Felix is the only CAT tool that provides property and method specifications along with examples in VBA/Visual Basic and C++ that show you how to build on Felix’s functionality. Although many translators might never need to do this, it is nice that Felix is designed to help you do so if you choose to (it is also a refreshing acknowledgement that the translator’s skills may extend beyond simply the linguistic ones required for translation work). 

Complementary Tools from GITS

In addition to Felix and TagAssist, GITS has been busy producing a collection of complementary tools, including Count Anything, a word-counting utility that supports a variety of file types; Analyze Assist, a program that analyzes the documents you want to translate and compares their contents with the translations already stored in your memory files, making it possible to estimate how long the translation will take; and most recently Jamming2Felix, a utility for converting glossaries from the popular Jamming format into the Felix format. All of these things, when used together, make it easy to get started using Felix right away.
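The kind of analysis described for Analyze Assist can be pictured as comparing each segment of a new document against the sources already in a memory and sorting the results into exact, fuzzy, and new matches. The sketch below is only an illustration of that idea using Python's standard `difflib`; it is not how Analyze Assist is implemented, and the function names, the 0.75 fuzzy threshold, and the sample segments are assumptions.

```python
from difflib import SequenceMatcher


def best_match_ratio(segment, tm_sources):
    """Return the highest similarity ratio between a segment and any TM source."""
    return max(
        (SequenceMatcher(None, segment, source).ratio() for source in tm_sources),
        default=0.0,
    )


def analyze(segments, tm_sources, fuzzy_threshold=0.75):
    """Count how many segments are exact matches, fuzzy matches, or new."""
    counts = {"exact": 0, "fuzzy": 0, "new": 0}
    for segment in segments:
        ratio = best_match_ratio(segment, tm_sources)
        if ratio == 1.0:
            counts["exact"] += 1
        elif ratio >= fuzzy_threshold:
            counts["fuzzy"] += 1
        else:
            counts["new"] += 1
    return counts


# Hypothetical memory sources and a new document to be translated.
tm_sources = ["Save the file", "Open the file"]
document = ["Save the file", "Save the files", "Quit"]

counts = analyze(document, tm_sources)
print(counts)
```

The more segments that land in the exact and fuzzy buckets, the less new translation a job needs, which is what makes an estimate of the required time possible.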

Conclusion

For a single translator working primarily on translations of Microsoft Office documents or HTML files in a Windows environment, Felix is an attractively priced, compellingly robust offering. The 1.0 version does not yet support sharing TMs and glossaries over a network, so it is not yet suitable for use by translation teams of the sort you might find in an agency or corporate environment, but the development roadmap suggests that this kind of capability is coming soon. Once this is in place, I would recommend Felix for anyone who needs a translation tool that is simple yet powerful, easy to learn and easy to use, and lets you work the way you want to, without getting in the way.

Posted by Sako in • Technology

Thursday, May 15, 2008

Which Linux distro for the Eee PC?

Last night I met briefly with a friend who happened to have a copy of the latest release of Linux Mint on a USB drive, so I borrowed the drive for a few minutes to see if I could boot from it and run Linux on my Eee PC. I wasn’t really expecting much, but to my surprise it booted up flawlessly, thereby confirming my earlier ideas about using Linux on this machine. Mint, as I saw last night, works well. I’ve also noticed that the recent release of Fedora (the distribution I have used most over the years) comes with support for persistent Live USB key installations, which would be ideal. And, of course, there is also the ever-popular Ubuntu to try out as well. Choices, choices!

(Incidentally, the fact that this entry is being written while I am standing on a relatively crowded commuter train on my way to work is yet another reason why I think this PC was a really good fit for my needs.)

Posted by Sako in • Technology

Cell phone spam

I recently noticed this survey on cell phone spam (via What Japan Thinks), and can vouch for the fact that it is a constant problem with no particularly good solution (at least, none that I am aware of).

Although the volume of spam that comes to my cell phone (probably between five and ten messages a day) is far less than the volume that gets directed at my primary e-mail address (lots!), the difference is that there are far more robust filtering options available for traditional e-mail than are typically offered by the companies providing such services for cell phones.

On my cell phone there are only a few options for filtering spam, all of which basically boil down to specifying categories of messages that should just be ignored, but the problem is that the settings are overly broad and cannot be fine-tuned in any meaningful way. You either reject all mail not sent from other cell phones or you accept everything, for example. (That might be a bit of an overstatement, but not by much.) This is not a very viable set of alternatives for what I would consider "normal" e-mail use, so I end up accepting everything to make sure that I don’t miss something that might be important, like a message from a potential new client.

With my regular mail, however, spam filtering options abound, so even though a lot of spam is sent to my address, almost none of it ever makes it into my inbox (thanks to MailFoundry, which I would really like to see phone companies introduce to stem spam), and even if it does, it will typically get zapped almost immediately by the spam-filtering functions of whichever mail application I am using, which is not a function I’ve seen in most phones.

Why is it that phone companies do not take a much harder line on spam? The survey results indicate that it is pretty clearly a common problem. My pet theory is that the reason is two-fold: (A) They don’t get enough complaints about it to justify taking additional actions and (B) they charge you for the spam you receive anyway, so their motivation to eliminate that particular stream of revenue is too low for them to act on. Those are my ideas, anyway; I welcome anyone who is better informed about these things to set me straight.

Posted by Sako in • Technology