Google Operating System is reporting that Google has switched to its own in-house translation system for all 25 available language pairs. Previously, the site used Systran for almost all of its translation processing, turning to its in-house software only for Arabic, Chinese, and Russian. AltaVista's Babelfish, one of the first and best-known online translation services, still uses Systran for language processing.
I thought I would put the two head-to-head to see which produces better results. Unfortunately, my high school Spanish has grown a bit rusty, and English is the only language in which I am proficient (if you can call this proficient), so translating from English into another language would do me no good. Instead, I visited Project Gutenberg and downloaded two versions of "Pierre et Jean" by Guy de Maupassant, one of my favorite writers: the original French and an English translation. I then fed the same French passage through both translators.
Here are the original passages:
Original French: Dès qu'il fut dehors, Pierre se dirigea vers la rue de Paris, la principale rue du Havre, éclairée, animée, bruyante. L'air un peu frais des bords de mer lui caressait la figure, et il marchait lentement, la canne sous le bras, les mains derrière le dos. Il se sentait mal à l'aise, alourdi, mécontent comme lorsqu'on a reçu quelque fâcheuse nouvelle. Aucune pensée précise ne l'affligeait et il n'aurait su dire tout d'abord d'où lui venait cette pesanteur de l'âme et cet engourdissement du corps. Il avait mal quelque part, sans savoir où; il portait en lui un petit point douloureux, une de ces presque insensibles meurtrissures dont on ne trouve pas la place, mais qui gênent, fatiguent, attristent, irritent, une souffrance inconnue et légère, quelque chose comme une graine de chagrin.
English (human translation): As soon as he got out, Pierre made his way to the Rue de Paris, the high-street of Havre, brightly lighted up, lively and noisy. The rather sharp air of the seacoast kissed his face, and he walked slowly, his stick under his arm and his hands behind his back. He was ill at ease, oppressed, out of heart, as one is after hearing unpleasant tidings. He was not distressed by any definite thought, and he would have been puzzled to account, on the spur of the moment, for this dejection of spirit and heaviness of limb. He was hurt somewhere, without knowing where; somewhere within him there was a pin-point of pain -- one of those almost imperceptible wounds which we cannot lay a finger on, but which incommode us, tire us, depress us, irritate us -- a slight and occult pang, as it were a small seed of distress.
And here are the translated versions (from French):
AltaVista Babelfish: As soon as it was outside, Pierre moved towards the street of Paris, the principal street of Le Havre, lit, animated, noisy. The a little fresh air of the edges of sea cherished the figure to him, and it went slowly, the cane under the arm, the hands behind the back. It was smelled badly at ease, weighed down, dissatisfied like when one received some annoying news. No thought specifies did not afflict it and it would not have known to say first of all from where this gravity came to him from the heart and this numbness from the body. It had some share badly, without knowing where; it carried in him a small painful point, one of these almost insensitive bruises which one does not find the place, but which obstructs, tire, sadden, irritate, an unknown and light suffering, something like a seed of sorrow.
Google Translate: When he was outside, Pierre went to the streets of Paris, the main street of Le Havre, lighted, busy, busy. The air of a little fresh seashores figure caressed him, and he walked slowly, cane under his arm, hands behind their backs. He felt uncomfortable weight, as unhappy when received some disturbing news. No precise thought it sad and knew he would say first of all that this was due gravity of the soul and the numbness of the body. He had badly somewhere without knowing where; He wore it a little painful point, one of these almost insensitive bruises which are not found instead, but the way, tired, sadden, irritate, suffering and unknown slight, some something like a seed of sorrow.
I think we can safely say that machine translation still can't compete with a human translator. Both Google and AltaVista get a lot wrong and produce some strange output ("It was smelled badly at ease"?), and each gets some things surprisingly close. Neither, though, is suitable for more than a quick, cursory understanding of something written in an unfamiliar language. Real translation is still best left to people who are fluent in the languages in question.
What do you think of our test, though? Which site do you feel came up with the best translation? Will you use one over the other?
Comments
I thought Google did a better job. Consider these passages:
Yahoo
Google
Google's translation read much more naturally. I agree that neither felt like a quality replacement for human translation. Perhaps, though, this will provide the impetus to pick up the pace in the automated translation space.
They both fared pretty poorly. I'm not sure French from the 1800s is a fair test, though (correct me if I'm wrong there). Maybe Harry Potter? It has multiple translations that should be easy to find. When I have to rely on machine translation, I usually use multiple sites and then go with what I think is best, since the output is always different. While Google seemed smoother grammatically in places, it sacrificed a lot of qualifiers and adjectives; AltaVista seemed more descriptive (animated, noisy vs. busy, busy, etc.).
Hi Pete,
First off, I love R/WW.
But I have to say your test is not a fair one. You selected an advanced piece with metaphors and literary embellishments.
To support my claim I checked the Gunning Fog index of the text:
The piece's Gunning Fog index is above 14.
The Gunning Fog index of the rest of the text in the article is below 13.
And the latest article on TechCrunch has an index of about 8.
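For readers unfamiliar with it, the Gunning Fog index estimates the years of formal education needed to understand a text on first reading: 0.4 × (average words per sentence + 100 × the share of "complex" words, meaning words of three or more syllables). The comment doesn't say which tool produced the figures above, so what follows is only a rough sketch in Python. It counts syllables with a crude vowel-group heuristic and skips the standard's exclusions (proper nouns, compound words, common suffixes), so its scores will drift from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels (y included)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable,
    # but "-le" endings (as in "table") usually do.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Approximate Gunning Fog index: 0.4 * (words/sentences + 100 * complex/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words: three or more syllables by the heuristic above.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

Because the syllable counter is a heuristic, its output is best treated as a relative comparison between texts rather than an exact score.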
My point ... it's just a baby. You have to give it time to grow.
Google's translation is not bad, though it is still far from good. Natural language translation is simply too difficult a problem, so we probably cannot blame it too much.
Recently, I tested Google by having it translate one of my online articles from English to Chinese. The article is not short and the content is not easy. Surprisingly, Google did quite a nice job. Although there are plenty of mistakes in its translation (many of them very clumsy by the standards of ordinary Chinese), a typical Chinese reader can basically get all the main points of my article. So I would say the job was fairly done. We cannot blame machines too much at present, can we?
-- Yihong
Hi Yoav, unfortunately, as I said, I only speak one language fluently (English) and the selection of public domain or freely available (i.e., that I could republish legally) content in two languages out there on the web is rather sparse. :)
Hi Josh,
Sorry for addressing Pete.
In essence, you are correct. Machine translation (at least statistical machine translation) will never be able to compete with human translation, because languages are partly systematic and partly creative. Metaphors and analogies will always be a huge challenge for machines. However, since most of us do not write prose, the way Google is approaching translation has a good chance of producing an excellent "non-prose" translator.
P.S.
You can use Google's advanced search with Creative Commons licensing parameters to find mountains of republishable text in any language.
Thanks for the tip! The key, of course, is to find stuff in English and another language, but definitely a useful tip, re: Google.
This is the group (machine translation) that turned me down for a C coding job last year :(
You can try this test; it's a comprehensive one, in both directions (French-English and the other way round). At least I think so!
Jean-Marie
Try some of these phrases from French to English in Google's new translator:
"sarkozy is chirac" (use quotes!)
or sarkozy chirac Kate cheney fesses amour
Found this out on Reddit.
Justin, it is not about what is more natural. It is about what is closest to the original work. And most novels are not written in the most "natural" way of speaking.
I know I shouldn't say it but...I for one welcome our misunderestimatingly automatically translating overlords.
Here's where Google excels: unlike Babelfish, Google's translation engine lets you mouse over text and suggest alternative translations (for website translations only, I think). Collectively, this should increase accuracy.
-mp
Need accurate, up-to-date and well-written translations? Why, look to bloated government institutions! The EU translates into approximately 293,507,406.39 languages.
Well, not quite, but quite a few nonetheless! If you go to an EU site, such as the European Law site, every page is available in 23 languages, translated by expert human translators.
For example, you can go to http://eur-lex.europa.eu/en/questions/questions.htm
Once there, in the top right, click "FR" to get it in French. Then plug the text into Google Translate. It's a pretty damn good translation! Much better than above.
The same is true of other institutions, like the UN--but typically the EU is far and away the best.
The automatic translation scene had been lacking real innovation before Google made its move.
I hope the news from Mountain View only encourages the competition to accelerate its innovation, because the world badly needs better machine translation for ALL language pairs.
Perfect! My PHP Google hack, which takes in a sentence, makes a request to Google, and updates my translation hash table with the result, will now be more accurate than Systran...
Anyone wanting the source code can sign up at my link and send me a private message, and I will send you the AJAX scripts to translate anything you want!
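The commenter's PHP script isn't shown, but the caching idea it describes (ask the remote service once per sentence, keep the answer in a hash table) can be sketched. The Python below is a minimal sketch that assumes a hypothetical `fetch_translation(sentence, source, target)` callable standing in for the request to Google; no real translation API is called. Entries are keyed by language pair and sentence, and `clear()` empties the table so later lookups pick up output from the switched-over engine.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature: (sentence, source_lang, target_lang) -> translated text.
Fetcher = Callable[[str, str, str], str]


class TranslationCache:
    """Hash-table cache of translations keyed by (source, target, sentence)."""

    def __init__(self, fetch_translation: Fetcher) -> None:
        # fetch_translation is a stand-in for the remote request; supply your own.
        self._fetch = fetch_translation
        self._table: Dict[Tuple[str, str, str], str] = {}
        self.misses = 0  # how many times the remote service had to be asked

    def translate(self, sentence: str, source: str, target: str) -> str:
        key = (source, target, sentence)
        if key not in self._table:
            # Cache miss: ask the (hypothetical) remote service once, then store it.
            self.misses += 1
            self._table[key] = self._fetch(sentence, source, target)
        return self._table[key]

    def clear(self) -> None:
        # Drop every cached entry so later lookups are fetched afresh,
        # e.g. after the remote service switches translation engines.
        self._table.clear()
```

Keying on the language pair as well as the sentence keeps, say, a French-to-English entry from being returned for a French-to-German request.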
There is plenty of marketing material available on enterprise websites, such as this blurb about Microsoft OneNote in English
http://office.microsoft.com/en-us/onenote/HA101656661033.aspx
and German
http://office.microsoft.com/de-de/onenote/HA101656661031.aspx
I thought this material would be easier for both Babelfish and Google to handle, but the translations turned out equally bad, distorting the meaning of the sentences. (In the En->De translation, Google makes OneNote a "place to collect people"...)
To #15 (Jones): Can you share some insight into what has improved in the automatic translation scene, other than Google making yet another attempt at a non-core business, or (hopefully) trying to bring down an external cost (the SYSTRAN monopoly)?
I was trying to do simple translation in Japanese with Systran, and it failed to properly translate simple, single words (oyasuminasai, "good night"). Through Babylon's dictionary program, I found lec.com, which is what Babylon uses for its translation services; you should look over that site and try the demo translator. Personally, I think it blows Systran's away.
Useful article, thanks. It just goes to show how fragile language is, and how its integrity can be compromised in the wrong 'hands'. This test suggests that the worldwide reach of the Web is only open to website authors who figure out that translation is more than clicking on a link.