Garbage in, garbage out. The system probably didn't have the lorem ipsum placeholder text in its dictionary for all languages, so it just mapped each word to whatever its algorithms could guess. Since there is no right guess, the result is pretty random.
The rest of the conspiracy nonsense in the article is pretty silly and stupid, honestly. There are a huge number of government documents translated into other languages that were probably used as the training set. I have programs of my own that rummage through SEC filings, for example. So NATO and countries being common mappings if you pick a random one isn't odd at all.
The comments to the article show a lot of funny results of statistical translation.
Even though most of the lorem ipsum examples are now fixed in Google Translate, as of now you still get 'lorem ipsum dolor' when you translate 'China' from English to Latin.
Maybe it's a simplistic explanation, but I would think that this was caused by the vast number of multi-language sites in which pages in languages other than English have not been written yet, so selecting one of them displays the lorem ipsum (Google Translate probably identifies these untranslated pages as Latin even though they were supposed to be language X).
The problem is the consistent politicization of each word in ways related to intelligence and the extremely good properties of lorem ipsum text (it's nonsense that doesn't stand out as nonsense - a holy grail of ciphers).
It's possible that this is statistical noise... but it seems particularly plausible that it started out that way and then someone gamed it into being far less noisy and more consistently intelligence-based.
I suspect words such as Company and China are pretty commonplace on the internet, so the data used for Google to learn is likely to include a number of these mapped to Lorem Ipsums.
Sentences making sense could be down to another part of the algorithm - Google doesn't just do word for word translation, but tries to map meanings based on the context of the sentence, and to ensure the output sentence is grammatically correct in the new language - as such the algorithms distort the results into sentences which would be unlikely to appear by chance, making them more appealing to us when suggesting conspiracy.
The small corpus makes it very easy to game the translation.
I think it was done on purpose for the DEFCON competition mentioned elsewhere in this thread. The translation must have stayed fixed for at least the duration of the conference for it to work, so I bet Lost published a lot of dummy material to make it so.
It would also make the cryptic references to CIA and stuff a bit easier to believe.
Very interesting... and somewhat creepy, the phrases that are coming up. I can confirm that "lorem" and "ipsum" don't work now, but playing around with other pieces of lipsum still gives odd phrases like "suspendisse bibendum duis" -> "suspend regional banking", "nostrud exercitation turpis fermentum" -> "Iraqis saying through Arizona", and "Curabitur duis bibendum" -> "Nike's restructuring".
An explanation I have is that the Chinese could be somehow using Google Translate to "latinise" news stories in order to bypass censorship.
A lot of the translations read like spam to me, with the mentions of "commerce", "home business", "the company" etc., and in Chinese marketing copy it's quite common to use "China" as part of the pitch: "China's first...", "China's biggest...", etc.
So perhaps a less sinister explanation is Chinese spam?
I work with machine translation - such artifacts would be a natural result from multilingual webpages that aren't yet fully translated. An article would have correct text in English, Chinese or Spanish but the "translation" in some other language could have some Lorem Ipsum left there.
Statistical machine translation systems would easily pick that up, as crawling 'multilingual' sites with same/similar content is a major source for machine translation training data.
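To make the mechanism above concrete, here is a minimal sketch of how leftover placeholder text could acquire "translations". It uses raw co-occurrence counting over sentence pairs, which is far cruder than what a real statistical MT system does (those use proper alignment models), and the `PAIRS` data and function names are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "parallel corpus": (placeholder_side, english_side) pairs,
# as a crawler might collect them from half-translated sites. Invented data.
PAIRS = [
    ("lorem ipsum dolor", "china daily news"),
    ("lorem ipsum", "china business"),
    ("lorem dolor sit", "china internet company"),
    ("ipsum dolor", "business news"),
]


def cooccurrence_table(pairs):
    """Count how often each source word appears alongside each target word."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                table[s][t] += 1
    return table


def best_guess(table, word):
    """Return the target word seen most often with `word`, or None if unseen.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = table.get(word)
    if not counts:
        return None
    return min(counts, key=lambda t: (-counts[t], t))


table = cooccurrence_table(PAIRS)
for word in ("lorem", "dolor", "amet"):
    print(word, "->", best_guess(table, word))
```

With so few pairs, whichever word happens to be common on the English side ends up as the "translation" of the filler, which is the garbage-in, garbage-out effect in miniature.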
I remember discussions about this before, once in 2010 and once in 2013. Already in 2010 Lorem Ipsum was translated to random words that are very prevalent on the internet: "hello world", "learn more" and "free on": http://www.xefer.com/2010/10/lorem-ipsum
Therefore I don't think that Google Translate is used by spies to communicate plans about China. Even if all the translated words were insidious, Occam's Razor still tells us it is unlikely that a public translation service is being used as a modern-day numbers station.
It could just be a selection bias, in that we think it's interesting so it makes the news. If it had translated to something more banal, we probably wouldn't be discussing it on Hacker News.
This was also mentioned in an article about the Defcon 22 contest, posted on HN too: https://news.ycombinator.com/item?id=8189549. Apparently, the translation has stopped working now?
A short story on that as a Defcon 22 badge competitor: when we reached the stage where we got the "Lorem ipsum" page, we first noticed that a bunch of the lines did not follow the "Lorem ipsum" format exactly and had strange capitalization. So we thought that the difference between the expected "Lorem ipsum" text and this text was the clue... We eventually figured out that if you pasted the entire block into Google Translate something strange would pop out (that was relevant to another hint - https://www.defcon.org/1057/SarangHae/ and then was useful again with what that email address returned).
Looks like Google updated their Latin translator to completely break the puzzle :)
It would have been interesting to see if you tried this while logged in on a Google account not belonging to a security researcher.
It's a remote chance, but maybe it brings up totally random results that are then passed through your account's search bubble filter. Hence the security-related topics.
At the time of a previous HN thread,
https://news.ycombinator.com/item?id=5200728
there were some amusing results from prefixes of the stock boilerplate, often changing letter-by-letter:
Lorem ipsu → Dummy Item
Lorem ipsum dolor → Welcome
Lorem ipsum dolor s → The Pussycat Dolls
Lorem ipsum dolor sit amet, c → This page is available
Lorem ipsum dolor sit amet, consectetur adipisicing elit → This page is half the battle WIN!
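If you want to repeat that letter-by-letter experiment, a small sketch like the following generates the prefixes to paste into the translator. It only builds the strings; it does not call any translation service, and `prefixes` and `BOILERPLATE` are names made up for this example.

```python
# The stock boilerplate as quoted in the list above.
BOILERPLATE = "Lorem ipsum dolor sit amet, consectetur adipisicing elit"


def prefixes(text, min_len=1):
    """Yield every prefix of `text` that is at least `min_len` characters long."""
    for end in range(min_len, len(text) + 1):
        yield text[:end]


# Start at 10 characters ("Lorem ipsu") and grow one character at a time.
for p in prefixes(BOILERPLATE, min_len=10):
    print(repr(p))
```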
What I've heard on the grapevine on this is that it's used as a method to defeat internet censorship in countries that are subjected to said censorship.
Not sure if that's true, but just passing that on.