Lorem Ipsum: Of Good and Evil, Google and China (krebsonsecurity.com)
142 points by panarky on Aug 18, 2014 | 28 comments



Garbage in, garbage out. The system probably didn't have the lorem ipsum placeholder text in its dictionary for all languages, so it just mapped the text to whatever its algorithms could guess. Since there is no right guess, the result is pretty random.

The rest of the conspiracy nonsense in the article is pretty silly and stupid, honestly. There are a huge number of government documents translated into other languages that were probably used as the training set. I have programs of my own that rummage through SEC filings, for example. So it isn't odd at all that NATO and country names are common mappings when you pick one at random.


> The rest of the conspiracy nonsense in the article is pretty silly and stupid, honestly.

Was also surprised to read something like this on Krebs


https://news.ycombinator.com/item?id=8152663

Evaluate for yourself whether or not he may have fallen a few notches lately.


I have to admit that recent events have altered my opinion of him, which is sad because I have enjoyed reading his site.


Brian is going off the deep end I guess?


The comments to the article show a lot of funny results of statistical translation.

Even though most of the lorem ipsum examples are now fixed in Google Translate, as of now you still get 'lorem ipsum dolor' when you translate 'China' from English to Latin.


Maybe it's a simplistic explanation, but I would think that this was caused by the vast number of multi-language sites in which pages in languages other than English have not been written yet, so selecting one of them displays the lorem ipsum (Google Translate probably identifies these untranslated pages as Latin even though they were supposed to be in language X).


The problem is the consistent politicization of each word in ways related to intelligence, combined with the extremely good properties of lorem ipsum text (it's nonsense that doesn't stand out as nonsense, a holy grail of ciphers).

It's possible that this is statistical noise... but it seems particularly plausible that it started out that way and then someone gamed it into being far less noisy and more consistently intelligence-based.


I suspect words such as Company and China are pretty commonplace on the internet, so the data used for Google to learn is likely to include a number of these mapped to Lorem Ipsums. Sentences making sense could be down to another part of the algorithm - Google doesn't just do word for word translation, but tries to map meanings based on the context of the sentence, and to ensure the output sentence is grammatically correct in the new language - as such the algorithms distort the results into sentences which would be unlikely to appear by chance, making them more appealing to us when suggesting conspiracy.


The small corpus makes it very easy to game the translation.

I think it was done on purpose for the DEFCON competition mentioned elsewhere in this thread. The translation must have stayed fixed at least for the duration of the conference for it to work, so I bet Lost published a lot of dummy material to make it so.

It would also make the cryptic references to CIA and stuff a bit easier to believe.


Very interesting... and somewhat creepy, the phrases that are coming up. I can confirm that "lorem" and "ipsum" don't work now, but playing around with other pieces of lipsum still gives odd phrases like "suspendisse bibendum duis" -> "suspend regional banking", "nostrud exercitation turpis fermentum" -> "Iraqis saying through Arizona", and "Curabitur duis bibendum" -> "Nike's restructuring".

An explanation I have is that the Chinese could be somehow using Google Translate to "latinise" news stories in order to bypass censorship.


To use a Google Service in China you have to bypass censorship, so there's really no point in using Google Translate in the way you suggested.


Reminds me of this hilarious bug, where Translate would randomly add the phrase "he now praises the iPad" to totally unrelated sentences: http://www.huffingtonpost.com/2013/01/05/google-bug-praise-t...


A lot of the translations read like spam to me, with the mentions of "commerce", "home business", "the company" etc., and in Chinese marketing copy it's quite common to use "China" as part of the pitch: "China's first...", "China's biggest..." etc.

So perhaps a less sinister explanation is Chinese spam?


Bad training data. Lorem Ipsum is the de facto placeholder text for so many webpages.


I work with machine translation - such artifacts would be a natural result from multilingual webpages that aren't yet fully translated. An article would have correct text in English, Chinese or Spanish but the "translation" in some other language could have some Lorem Ipsum left there.

Statistical machine translation systems would easily pick that up, as crawling 'multilingual' sites with same/similar content is a major source for machine translation training data.
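To make that mechanism concrete, here is a minimal sketch, not how Google Translate actually works. It assumes a toy "parallel corpus" in which two of three pages still contain placeholder text on the target side; the corpus strings, `cooccurrence_counts`, and `best_guess` are all hypothetical names made up for illustration. A naive co-occurrence count is enough to make "china" map to "lorem".

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus: (source page text, supposed translation).
# The last two pairs model pages whose translation was never written,
# so the "translation" is leftover lorem ipsum placeholder text.
# The first target string is an illustrative stand-in, not vetted Latin.
parallel_pairs = [
    ("china trade news", "nuntii commercii sinarum"),
    ("china company profile", "lorem ipsum"),
    ("china business report", "lorem dolor"),
]


def cooccurrence_counts(pairs):
    """Count how often each source word appears alongside each target word."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        for s in source.split():
            for t in target.split():
                counts[s][t] += 1
    return counts


def best_guess(counts, word):
    """Return the target word most often seen with `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


counts = cooccurrence_counts(parallel_pairs)
print(best_guess(counts, "china"))
```

Real statistical systems use far more sophisticated alignment models, but the failure mode is the same: with a small corpus, a few polluted page pairs dominate the statistics.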


I remember discussions about this before, once in 2010 and once in 2013. Already in 2010 Lorem Ipsum was translated to random words that are very prevalent on the internet: "hello world", "learn more" and "free on": http://www.xefer.com/2010/10/lorem-ipsum

Therefore I don't think that Google Translate is used by spies to communicate plans about China. Even if all the translated words were insidious, Occam's Razor still tells us it is unlikely that a public translation service is being used as a modern-day numbers station.

The 2013 discussion had translations like: "Cisco Security" and "Corporate Japan": http://googlesystem.blogspot.com/2013/06/lorem-ipsum-google-...

That's statistical machine translation for you.


That is correct. The interesting question is, why did it translate to 'China', rather than something more banal?


It could just be a selection bias, in that we think it's interesting so it makes the news. If it had translated to something more banal, we probably wouldn't be discussing it on Hacker News.


Agreed. It's the presence of China, NATO, Russia, and The Company all under one cryptic roof that makes it weird.


This was also mentioned in an article about the Defcon 22 contest, posted on HN too: https://news.ycombinator.com/item?id=8189549. Apparently, the translation has stopped working now?


A short story on that as a Defcon 22 badge competitor: when we reached the stage where we got the "Lorem ipsum" page, we first noticed that a bunch of the lines did not follow the "Lorem ipsum" format exactly and had strange capitalization. So we thought that the difference between the expected "Lorem ipsum" text and this text was the clue... We eventually figured out that if you pasted the entire block into Google Translate, something strange would pop out (which was relevant to another hint, https://www.defcon.org/1057/SarangHae/, and then was useful again with what that email address returned).

Looks like Google updated their Latin translator to completely break the puzzle :)

page in question: https://www.defcon.org/1057/FissilingualElucidation/


It still works by translating from English to Latin. I found a bunch by running a list of NSA keywords through it: https://i.imgur.com/UGMIPpE.png

All of the resulting translations seem to be from a text generated by lipsum.com.


I tested with a conventional English word list for comparison. Here are the Lorem hits: http://pastebin.com/yh26U7iz
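For anyone wanting to repeat this kind of comparison, a rough sketch of the filtering step follows. It assumes you have already collected the translations yourself into a dict (no translation API is called here); `LIPSUM_WORDS`, `lorem_hits`, and the `min_overlap` threshold are hypothetical names and choices for illustration, and the vocabulary is only a small subset.

```python
# Small, hypothetical subset of placeholder vocabulary; a real check
# would use a much fuller list taken from generated lorem ipsum text.
LIPSUM_WORDS = {
    "lorem", "ipsum", "dolor", "sit", "amet",
    "consectetur", "adipisicing", "elit",
}


def lorem_hits(translations, min_overlap=2):
    """Return the entries whose translation contains at least
    `min_overlap` distinct placeholder words."""
    hits = {}
    for word, translated in translations.items():
        tokens = {t.strip(".,").lower() for t in translated.split()}
        if len(tokens & LIPSUM_WORDS) >= min_overlap:
            hits[word] = translated
    return hits


# Already-collected English -> Latin results (assumed input).
sample = {"China": "lorem ipsum dolor", "table": "mensa", "sit": "sit"}
print(lorem_hits(sample))
```

Requiring at least two placeholder words avoids counting a single genuine Latin word such as "sit" as a hit.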


It would have been interesting to see if you tried this while logged in on a Google account not belonging to a security researcher.

It's a remote chance, but maybe it brings up totally random results that are then passed through your account's search bubble filter. Hence the security-related topics.


At the time of a previous HN thread, https://news.ycombinator.com/item?id=5200728, there were some amusing results from prefixes of the stock boilerplate, often changing letter-by-letter:

  Lorem ipsu → Dummy Item 
  Lorem ipsum dolor → Welcome
  Lorem ipsum dolor s → The Pussycat Dolls
  Lorem ipsum dolor sit amet, c → This page is available
  Lorem ipsum dolor sit amet, consectetur adipisicing elit → This page is half the battle WIN!


I guess it's just a Rorschach test for the Internet.


What I've heard on the grapevine on this is that it's used as a method to defeat internet censorship in countries that are subjected to said censorship.

Not sure if that's true, but just passing that on.



