Show HN: Write (almost) perfect encryption in an hour

tptacek · on Jan 24, 2015

The author implies (without saying it) that this is a one-time pad. There are two issues with that:

(a) This isn't a one-time pad at all, because the keystream comes from /dev/random. This is, instead, a simple stream cipher keyed from the /dev/random CSPRNG. Since /dev/random is difficult in practice to seed and run deterministically, the only mode you can run it in is "one time pad mode". How close to an actual OTP it is depends on your environment.

(b) One-time pads don't work well in practice. Enthusiasts fixate on the "unbreakable" point without having a full mental model of how a cryptosystem works. That robs them of the insight that (i) only a small part of the cryptosystem is "theoretically unbreakable" with an OTP, and (ii) that part is, in the history of cryptanalysis, the least risky component of the system. In exchange for hardening what was already the strongest component of any cryptosystem, OTP crypto makes grave sacrifices in other parts of the system that are routinely and profitably exploited. It's a terrible tradeoff.

Seconding the (pragmatic) skeptical comments in this thread. Don't use anything like this to encrypt real data. You are safer with any modern block or stream cipher than you are with a ad-hoc stream cipher, which is what this is in practice.

As a learning experience, though: keep at it! This isn't a bad start at all.

If you like/grok how this cipher works, a great next step towards practical cryptography is to learn Salsa20, which is a very simple algorithmic keystream generator that is fast and highly regarded by cryptographers.

But I think practical crypto engineering is a bad next step. The better next step is to take this knowledge and apply it to practical cryptanalysis. Trust me: if you're just getting started with crypto and you're a programmer, you will get 1000x more insight from learning to break systems than you will from building random systems. Here's a bunch of exercises that start in a place that is very compatible with this XOR code:

http://cryptopals.com/

dsacco · on Jan 24, 2015

Your second point is very important for engineers to take to heart if they're looking to develop cryptosystems.

One-time pads are fetishized because they are the only theoretical silver bullet to cryptanalysis, but they have a bunch of problems that prevent them from being serious solutions to real cryptography needs.

1. They are just about the worst compromise between security and utility. The one-time pad digest must be at least as long as the message, and any communication is going to be more or less stateless because you need to generate a new one-time pad for every message (by design) to prevent cryptanalysis. This is great for an ivory tower exercise, not so great for typical use cases.

2. One-time pads have a terrible ideal to implementation ratio: they look great theoretically and are almost never implemented securely. Among other things, it is very difficult to seed them properly.

3. They do not solve the problem of key exchange whatsoever, which is arguably a much larger vulnerability in practice. A one-time pad is alluring for its standalone strength, but in practice it won't help you elsewhere.

Practically speaking, one-time pads don't solve any real problem in cryptography in a meaningful way, and using them practically tends to attract the sort of programmer that goes into cryptography with a dangerously inaccurate belief from the outset: that you can really develop a silver-bullet to cryptanalysis in real-world cryptosystems.

Other than that, completely agree that it's good practice, but there needs to be a README disclosing that this is unsafe crypto (like just about every other repository anywhere).

lawl · on Jan 24, 2015

Oh god. Please don't use that code for anything remotely serious. It's fine as a learning project, but please, PLEASE make that very clear in the readme.md then.

from random.c

    srand(time(NULL));// seed

Here goes your security. All one needs to know is when you generated your keyfile. srand() and rand() ARE NOT secure for crypto applications.

TheJH_ · on Jan 24, 2015

Aaaactually, if you read the code, you'd see that he XORs the bytes from rand() with bytes from /dev/random. This project is horrible, but that's not one of the completely wrong parts.

lawl · on Jan 24, 2015

Oops. You're right. I stand corrected.

hoers · on Jan 24, 2015

>PLEASE make that very clear in the readme.md then.

This.

I wish people dabbling in crypto wouldn't be using phrases like "With this project you can be perfectly sure" or "[...]this encryption is unbreakable even in theory."

mholt · on Jan 24, 2015

This post may draw up a lot of criticism, but if nothing else, kudos to the author for implementing a crypto scheme to understand how it works and for encouraging others to examine and learn.

Now, that isn't to say that we should actually use this library (as-is). You know, don't roll your own crypto for real use. With this particular algorithm, you're putting a lot of faith in the RNG, and it's easy to miss that you need a huge key (or lots of small ones).

Finally, the claim that it's "unbreakable, even in theory" (which comes from the Wikipedia article) to me seems a bit misleading -- isn't the theory usually stronger than any particular implementation? (The word "even" throws me off.)

dllthomas · on Jan 24, 2015

I think the idea is that systems with a finite unicity distance can be "unbreakable, in practice" because brute force would take too long, but in theory if you had enough ciphertext and did exhaust the keyspace you could show that only one possible key produces realistic output. When the key has as many degrees of freedom as the plaintext, every possible valid output has a possible key.

swehner · on Jan 24, 2015

For one-time-pad encryption, both sides have a copy of the same secret key.

The name says "one-time," because if a key is used to encrypt two different plaintexts, information will leak.

In this implementation, the results of two encryptions may be based on the same bytes from the key file; not for all bytes, but for some. This is because it chooses bytes from the file (function generate_random_offsets() at https://github.com/pannous/xipher/blob/master/encrypt.c), then writes both the address of those bytes and the XOR'd value to the result of the encryption.

There is no provision to avoid picking the same offsets/key-bytes from one encryption-run to the next, except that they are chosen at random. Because the format of the encrypted messages is very simple, it can be easily determined for two encrypted messages, which key-bytes they share. Remedies: store between runs, remove 00 from file,define chunks in the file?

It's not too difficult to avoid sharing the secrets between two encryption runs; it is not even necessary to chose the key-bytes from the file at random. One could just use the key-bytes in sequence, send the address of the first key-byte, and store the next available key-byte for the next run. When the shared key-file doesn't have enough random bytes, because it has been used for encrypting too many message, it can error out -- which it should, but this implementation will never do that.

fletom · on Jan 25, 2015

> What if you don't trust the random key generator on your computer? Just xor the generated key with some other files.

> NOTE: You can increase security significantly if you xor/encrypt zipped files, as they already contain very little structure!

> If your key is too small or if you are using it too often, you may be reducing security.

I hope it goes without saying that this thing should not be taken at all seriously.

binwiederhier · on Jan 24, 2015

In regards to this comment: https://github.com/pannous/xipher/blob/b62a3b864d435a1e61066...: /dev/urandom isn't insecure. Both /dev/urandom and /dev/random use CSPRNG. There is a wonderful artical (with diagrams and all) at http://www.2uo.de/myths-about-urandom/

sz4kerto · on Jan 24, 2015

"if(num_cypher_bytes<100000)// or num_bytes"

If num_bytes << num_cypher_bytes (and even more if we know the key length) then the encryption leaks information. https://en.wikipedia.org/wiki/XOR_cipher Otherwise this is a stream cipher and it is really secure but it's inconvenient :)

But it is a good starting point for anybody who's interested in crypto.

striking · on Jan 24, 2015

Unlike a modern cipher like AES, XOR is vulnerable to a known-plaintext attack. If your keys are truly random then you're fine, but otherwise you risk exposing enough random output from your RNG that a seed can be found.

Also, weak RNG. Use arc4random. (And /dev/urandom isn't insecure, either.)

bluedino · on Jan 24, 2015

It's kind of a joke post. "Use a one-time pad! Proven unbreakable! And exchange the huge key on a physical CD with your intended destination!"

nawitus · on Jan 24, 2015

Tinfoil Chat[1] is a semi-practical application of the one-time pad encryption.

The paper mentions the inconvenience of sending the keys:

>Another common argument is that unlike symmetric PSKs, OTP keys also eventually run out and users need to exchange another keyfile. The messages of TFC only consume 420 bytes of key material. The drop in the price of storage space has made the issue negligible: A $10 microSD card can store 4GB of keyfiles to encrypt more than 9 million messages, which is in most cases enough for several years or decades.

1. https://www.cs.helsinki.fi/u/oottela/tfc.pdf

stouset · on Jan 24, 2015

One-time pads are also trivially malleable. Flip a bit in the ciphertext, and the same bit flips in the plaintext.

_clhx · on Jan 24, 2015

Let the end user worry about data integrity. ;)

laurent123456 · on Jan 24, 2015

Maybe I don't get it, but that seems completely unsecure, isn't it? In fact I wasn't sure if this was some kind of parody, especially XORing strings and calling that "almost perfect encryption".

tptacek · on Jan 24, 2015

Virtually every stream cipher is based on the idea of generating a keystream and adding it to the plaintext with XOR. XOR is also used to combine round keys with plaintext in the round functions of block ciphers and hashes.

If you can generate a secure keystream (for instance, from pure random data well in advance of your message encryption), you can use just XOR and nothing else, and that system will be difficult to attack mathematically.

XOR as a concept gets an unjustified bad rap amongst generalist engineers because of all the terrible Rube Goldberg XOR schemes other developers have come up with. It's authentically essential to strong cryptography, but very easy to misuse in weak crypto.

sz4kerto · on Jan 24, 2015

> especially XORing strings and calling that "almost perfect encryption".

If the key size is at least as large as the plaintext AND the key is truly random then it actually is a perfect encryption, because every single bit in the output is going to be random. It is called 'one time pad' -- https://en.wikipedia.org/wiki/One-time_pad

laurent123456 · on Jan 24, 2015

If the key must the same length as the message, and must be different for every message, I guess we can't really call that encryption. May as well communicate the complete message by which ever way the secret key was communicated.

sz4kerto · on Jan 24, 2015

No, think of the following scenario: you and your friend know that you want to communicate in the future about something, maybe the outcome of some future event. You generate a long random key, and somewhere later in the future, you send the information using the pre-shared key.

It's a real encryption, much-much better than basically anything before because it's really unbreakable.

danpat · on Jan 24, 2015

It's a useful technique if you're not ready to transmit the message yet, but you can safely distribute keys. Think of a spy going out into the field. They carry a booklet with the key. As messages arrive later, they progressively go through the booklet to decode them. The messages are perfectly secure over the wire.

It's not useful or practical in many situations that we associate with cryptography over the internet, but it certainly has its applications.

Almost every other cryptographic scheme exists in order to increase convenience, at the expense of some security. The one time pad method is "perfectly" secure (assuming the key is securely generated, and kept safe), everything else is less than perfect (although computationally improbable).

thegeomaster · on Jan 24, 2015

It is encryption, but not very convenient. You forget that there is one important use of encryption, and that is in military operations. A one-time pad would be a good choice in this situation---you give soldiers who need to receive encrypted messages the keys on paper so that they can be teared and destroyed, one by one. For each message, you take the next key, decrypt the message, and then destroy the key so it's never used. Of course, if they compromise the keys on any end, it's game over, but mathematically it's as secure as it gets.

aquark · on Jan 24, 2015

Except in some cases you may have a way of securely communicating the key at an earlier point in time.

For example, you generating a 1TB key before leaving on a trip you could imagine a VPN set up that encrypted 1TB of traffic in a way that was theoretically unbreakable by someone sniffing the traffic only.