SINCE its earliest days, machine translation, the use of computers to translate documents from one language to another automatically, has suffered from exaggerated claims and impossible expectations. One characteristic (but apocryphal) tale tells of an American military system designed to translate Russian into English, which is said to have rendered the famous Russian saying "The spirit is willing but the flesh is weak" into "The vodka is good but the meat is rotten."
This sort of joke prompts a hollow laugh from those in the machine-translation (MT) business, because it demonstrates both the difficulty of getting computers to understand human languages and the high expectations that must be met if MT is to be taken seriously. Over the years there have been a number of promising new approaches in the field, and ever-cheaper processing and storage technology has helped improve matters. But progress has been painfully slow, and the decisive breakthrough that will transform the fortunes of MT has never appeared.
Now the Internet has given MT a much-needed shot in the arm. This is odd, because the ability to transmit information quickly and cheaply would not, on the face of it, appear to make translation any easier. Yet although the underlying technology of MT is much the same as it ever was, the rise of the Internet has changed the way that technology is perceived and the way it is used. And there are signs that, in future, it could improve the way MT works as well.
The idea of automating the process of translation using computers goes back to the late 1940s. Warren Weaver of the Rockefeller Foundation in New York wrote a memorandum suggesting that the code-breaking successes of the second world war, combined with electronic computers and the new "information theory" laid out by Claude Shannon, might form the basis of an automatic translation system. This prompted research at several American universities, and the first public demonstration of MT, the result of a collaboration between IBM and Georgetown University, took place in 1954. This early system, based on a simple bilingual dictionary with a few rules to determine word order, caused a surge of enthusiasm and funding.
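The dictionary-plus-rules approach of that first system can be sketched in a few lines. This is a toy illustration only: it assumes an invented French-to-English vocabulary and a single hypothetical reordering rule (a noun followed by an adjective is swapped into English order). It is not a reconstruction of the IBM-Georgetown program, which translated Russian.

```python
# A toy word-for-word translator in the spirit of early dictionary-based MT.
# The vocabulary, part-of-speech tags and the single reordering rule are
# hypothetical and invented for illustration.

# Hypothetical bilingual dictionary: French word -> (English word, part of speech)
LEXICON = {
    "le": ("the", "DET"),
    "la": ("the", "DET"),
    "chat": ("cat", "NOUN"),
    "souris": ("mouse", "NOUN"),
    "noir": ("black", "ADJ"),
    "grise": ("grey", "ADJ"),
    "mange": ("eats", "VERB"),
}


def translate(sentence: str) -> str:
    # Step 1: look up each word; unknown words are passed through unchanged.
    words = []
    for token in sentence.lower().split():
        english, tag = LEXICON.get(token, (token, "UNK"))
        words.append((english, tag))

    # Step 2: apply one word-order rule. French usually puts the adjective
    # after the noun and English before it, so swap NOUN ADJ -> ADJ NOUN.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "NOUN" and words[i + 1][1] == "ADJ":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return " ".join(word for word, _ in words)


print(translate("le chat noir mange la souris grise"))
# the black cat eats the grey mouse
```

Any word outside the tiny dictionary is simply passed through untranslated, which hints at why such systems faltered as soon as they met real text.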
For the next decade, MT researchers tried to overcome the limitations of simple dictionary-based systems using more complex approaches which analysed the source text using grammatical rules. "Today, the computer, or electronic brain, is well along toward picking up the burden of machine translation," declared the Atlantic Monthly in 1959. But despite such optimism, progress was slow, and in 1964 the American government established a committee to examine the prospects for MT. Its report, issued two years later, concluded that, compared with human translators, MT systems were slower, less accurate, and twice as expensive. Instead, the committee recommended that research should concentrate on devising systems to assist human translators, rather than trying to replace them altogether. As a result, American funding for pure MT research dried up.
In some fields, however, it was recognised that even a rough-and-ready translation was better than none at all. Systran, a company established by Peter Toma, a researcher at the California Institute of Technology in Pasadena, sold a Russian-to-English translation system to the United States Air Force in 1970, and the same system was subsequently adopted by the European Commission. During the 1970s, demand for translation systems began to emerge in the business community.
During the 1980s, rapid falls in the cost of computing power, combined with growing demand from governments and multinational companies, revived interest in MT and spurred renewed research. Many of the new systems worked by translating the source text into an intermediate language or symbolic representation, from which it could then be translated into any of several other languages. As computers became more powerful and storage became cheaper, further approaches emerged in the 1990s: analysis of parallel texts (the same text in two languages) led to statistical-translation systems, which did not rely on any underlying grammatical rules, and to example-based systems, which translated one sentence at a time by searching a database for similar sentences whose translations were already known.
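The example-based idea can be illustrated with its simplest ingredient, a retrieval step. This sketch assumes a hypothetical three-sentence database and uses the word-sequence similarity from Python's standard difflib module as a stand-in for whatever matching a real system employs. Genuine example-based systems also adapt the retrieved translation to account for differences in the input; that step is omitted here.

```python
from difflib import SequenceMatcher

# Hypothetical database of example sentences whose translations are known.
EXAMPLES = [
    ("where is the station", "où est la gare"),
    ("where is the hotel", "où est l'hôtel"),
    ("the train is late", "le train est en retard"),
]


def similarity(a: str, b: str) -> float:
    # Compare the two sentences word by word: 1.0 means identical,
    # 0.0 means no words in common.
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def translate_by_example(sentence: str) -> tuple[str, float]:
    # Pick the stored example whose source side is most similar to the input,
    # and return its known translation together with the similarity score.
    best_source, best_target = max(
        EXAMPLES, key=lambda pair: similarity(sentence, pair[0])
    )
    return best_target, similarity(sentence, best_source)


print(translate_by_example("the train is very late")[0])
# le train est en retard
```

Given "the train is very late", the closest stored example is "the train is late", so the sketch returns that sentence's translation unchanged and silently drops "very": exactly the kind of gap the omitted adaptation step is meant to fill.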
Even so, the quality of MT has not really improved very much over the past three decades, says John Hutchins, an expert on the history of machine translation at the University of East Anglia, in Britain. "If you look at quality of output now, compared with 1970, in many cases you can't see much improvement," he says. What has changed is that MT systems have now been plugged into the Internet. That changes the way they are used, and the expectations of them.
The network of Babel
The Internet has democratised MT and boosted demand dramatically, as users around the world struggle to understand pages in languages other than their own. And as companies set up increasingly elaborate websites, they have become aware of the need to maintain multiple sites in different countries and serve customers in different languages. Of America's 100 largest firms, 33 had multilingual websites at the end of 1999, and 57 did a year later. A study by Aberdeen Group, a management consultancy, found that, on average, users spend up to twice as long at a site, and are four times more likely to buy something from it, if it is presented to them in their own language. Another study by IDC, a technology consultancy, found that only 5% of the 50 top websites responded appropriately to e-mail queries in a foreign language; most simply asked for the message to be resent in English. All of which highlights the need for MT systems to provide on-the-fly translations, and for elaborate publishing systems that can manage multilingual websites.
The Internet changes the game for machine translation: users want speed, rather than quality, and are more likely to accept poor results
Arguably the best-known online MT system is Babel Fish, which relies on Systran software to translate pages retrieved by the AltaVista search engine. Anyone who has used Babel Fish will be familiar with the unintentional hilarity of the results; one popular game involves scrambling the lyrics of pop songs by translating them from English into another language and then back again (a "round-trip" translation). Other MT systems are also in use online, providing rough-and-ready translations of chat-room conversations and e-mail messages. Demand for such services is likely to grow as Internet users become more diverse. At the end of 2000, 48% of Internet users were English speakers, but this figure is expected to fall to 32% by the end of 2002.
Unfortunately, MT systems work best when they have been customised for a particular subject area, such as microbiology, aerospace or particle physics. This involves analysing typical documents and adding common words and technical terms to the system's dictionary. Using MT to translate Internet pages, which can be about anything at all, therefore produces terrible results, since no such customisation is possible. To make matters worse, most MT systems were designed for use with high-quality documents, whereas web pages, chat-rooms and e-mails often involve slang, colloquial language and ungrammatical constructions.
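The customisation step described above, analysing typical documents to find terms the dictionary lacks, might begin with nothing more elaborate than a frequency count. In this sketch the function name candidate_terms, the tokenising rule and the sample data are all hypothetical, and a human would still have to supply a translation for each term it surfaces.

```python
import re
from collections import Counter


def candidate_terms(
    documents: list[str], dictionary: set[str], top_n: int = 5
) -> list[tuple[str, int]]:
    # Hypothetical helper: count every word across the sample documents
    # (lower-cased; a word is a letter followed by letters or hyphens).
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z][a-z-]*", doc.lower()))
    # Keep only words the general dictionary does not already contain,
    # most frequent first: these are candidates for a human to review and add.
    missing = [(word, n) for word, n in counts.most_common() if word not in dictionary]
    return missing[:top_n]


# Hypothetical sample of microbiology text and a tiny general-purpose word list.
docs = [
    "The plasmid was inserted into the bacterium.",
    "Each plasmid carries a promoter.",
    "The promoter controls expression.",
]
general = {"the", "was", "inserted", "into", "each", "carries", "a", "controls", "expression"}
print(candidate_terms(docs, general))
# [('plasmid', 2), ('promoter', 2), ('bacterium', 1)]
```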
Even so, Steve McClure, an analyst at IDC, notes that the Internet has "refocused" MT from being a tool that provides a first draft for translators to becoming a general tool "for gaining a quick, partial understanding of perishable texts in high-volume environments without human involvement in the translation process."