If the average consumer thinks the problem Bitcasa is solving is a strong enough pain point and uses Bitcasa's product anyway, then I can see SpiderOak moving into this area regardless of the security issues it brings up. Unfortunately, the average consumer doesn't care enough about security for "implementation details" to sway them if the product solves a pain point they have, even if those implementation details make a world of difference in terms of privacy. There are already products out there that store files from a user's devices (laptops, desktops, etc.) in the cloud and, if they encrypt at all, make no mention of not having access to the encryption keys: https://www.sugarsync.com/ is an example.
A little off topic: I do think that the community has done a good job drilling it into the average user to look for the green bar and padlock in their browser when they go to their bank's website. It means Security. heh.
The encryption key is the hash of a chunk of data, so for identical chunks the encrypted data is identical. Understood.
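A minimal sketch of that idea, to make the dedup property concrete. The cipher here is a toy stand-in (SHA-256 in counter mode, XORed with the data), not what any real provider uses, and the function names are made up for illustration.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). A stand-in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(chunk: bytes) -> tuple[bytes, bytes]:
    """Derive the key from the chunk itself, then encrypt deterministically."""
    key = hashlib.sha256(chunk).digest()
    ciphertext = bytes(a ^ b for a, b in zip(chunk, keystream(key, len(chunk))))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream undoes the encryption."""
    return bytes(a ^ b for a, b in zip(ciphertext, keystream(key, len(ciphertext))))


# Two users who hold the same chunk produce the same key and ciphertext,
# so the provider can store one copy without seeing the plaintext.
alice_key, alice_ct = convergent_encrypt(b"same chunk of data")
bob_key, bob_ct = convergent_encrypt(b"same chunk of data")
other_key, other_ct = convergent_encrypt(b"different chunk")
```

Note that decryption needs the key, which is why the key has to be kept somewhere once the plaintext is gone.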
But suppose I want to decrypt. To generate the key, I need to hash the unencrypted chunk of data. But if I have unencrypted data, I don't really need decryption.
I guess my question is - how does decryption work when I have lost the files on my local machine, but I remember my password?
This comes up often. I took it upon myself to write it out without too many technical details. You can see it here (in particular, the section about uploading and syncing files): http://crypto.stanford.edu/~ananthr/dropbox/
The most obvious workaround for this would be to remember the key. Each user's fileset includes a copy of the key generated from each file, encrypted with that user's public key so that only their private key can decrypt it. The user would request the files, and also the encrypted keys. They would then decrypt said files locally.
[Wild thought] Create a file with a list of H(chunk) for every chunk you own. Encrypt this file with a local password that the storage provider does NOT see, and simply store it.
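A rough sketch of how that could look, including recovery after the local files are lost. Everything here is an assumption for illustration: the toy XOR stream cipher stands in for a real one, the JSON layout of the map is made up, and the salt and password are placeholders.

```python
import hashlib
import json


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode); encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client side: each chunk is encrypted under its own hash, H(chunk).
chunks = [b"first chunk", b"second chunk"]
chunk_keys = [hashlib.sha256(c).digest() for c in chunks]
stored_chunks = [xor_stream(k, c) for k, c in zip(chunk_keys, chunks)]

# The list of H(chunk) values is encrypted with a key derived from a
# password that never leaves the client, then uploaded like any other blob.
password = b"correct horse battery staple"  # placeholder
salt = b"per-user salt"  # placeholder; would be random and stored with the map
map_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
plain_map = json.dumps([k.hex() for k in chunk_keys]).encode()
stored_map = xor_stream(map_key, plain_map)

# Recovery: the local files are gone, but the user remembers the password.
# Download stored_map and stored_chunks, re-derive the map key, and decrypt.
recovered_map_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
recovered_keys = [
    bytes.fromhex(h) for h in json.loads(xor_stream(recovered_map_key, stored_map))
]
recovered = [xor_stream(k, c) for k, c in zip(recovered_keys, stored_chunks)]
```

The provider only ever holds `stored_chunks` and `stored_map`, so it can still deduplicate identical chunks but cannot read the map without the password.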
I re-read the paper, and they seem to be doing what you describe. The list of H(chunk) is called "Chunk Map", and local password is called "dedicated map key"...
... but after that they get into another round of encryption with the user's public key; that's where I get lost.
There are ways to text match against encrypted documents, at least approximately, without requiring homomorphic encryption.
One simple scheme would be to encrypt in ECB mode (which has its own disadvantages, which I'll ignore for now). Then, if we have a relatively long string to match, we can encrypt that string with the same key (after trimming to the block size) and simply search for its encrypted form in the document. If you see a match, and the string isn't an exact multiple of the block size, you need to decrypt a portion of the file (which you can easily do in ECB mode) and perform a more-exact match.
I wouldn't recommend this in a real system, but it's one example of how you can keep data encrypted at rest without requiring complete decryption for string matching.
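Here is a sketch of the ECB matching idea, under some loud assumptions: the block cipher is a toy 4-round Feistel network built from SHA-256 (a stand-in so the example runs on the standard library, not a secure cipher), padding is zero bytes, and the search only finds occurrences that start on a block boundary, which is a limitation of matching whole ECB blocks.

```python
import hashlib

BLOCK = 16  # block size in bytes
HALF = BLOCK // 2


def _round(key: bytes, r: int, half: bytes) -> bytes:
    """Feistel round function derived from SHA-256 (toy construction)."""
    return hashlib.sha256(key + bytes([r]) + half).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for r in range(4):
        left, right = right, _xor(left, _round(key, r, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for r in reversed(range(4)):
        left, right = _xor(right, _round(key, r, left)), left
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """Zero-pad to a whole number of blocks and encrypt each independently."""
    data += b"\x00" * (-len(data) % BLOCK)
    return b"".join(
        encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


def find_aligned(key: bytes, ciphertext: bytes, pattern: bytes) -> list[int]:
    """Return block-aligned offsets where pattern occurs in the plaintext."""
    whole = len(pattern) - len(pattern) % BLOCK
    if whole == 0:
        raise ValueError("pattern must cover at least one full block")
    probe = ecb_encrypt(key, pattern[:whole])  # trimmed to the block size
    tail = pattern[whole:]
    hits = []
    for off in range(0, len(ciphertext) - whole + 1, BLOCK):
        if ciphertext[off:off + whole] != probe:
            continue
        if tail:
            # Decrypt only the next block to check the leftover bytes.
            nxt = ciphertext[off + whole:off + whole + BLOCK]
            if len(nxt) < BLOCK or not decrypt_block(key, nxt).startswith(tail):
                continue
        hits.append(off)
    return hits


key = b"demo key"
aligned_doc = ecb_encrypt(key, b"A" * 16 + b"the quick brown fox jumps")
shifted_doc = ecb_encrypt(key, b"A" * 15 + b"the quick brown fox jumps")
```

With these inputs, searching `aligned_doc` for `b"the quick brown fox"` finds the match at offset 16, while the same search in `shifted_doc` finds nothing because the text no longer lines up with block boundaries.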
I don't, on the other hand, know of any results in homomorphic encryption that would allow you to do the same thing with that approach.
While that may be true, text search over documents has certain requirements that differ from the use cases being mentioned for homomorphic encryption over the past two years (since Craig Gentry's paper on fully homomorphic encryption). Although possible over structured data that you'd find in a database, I'm not sure if there's been a clear way to apply homomorphic encryption to search. If there has been progress made in this area, I'd love to know.
In documents, keywords are scattered through the plain text and conjunctive keyword search should be possible without giving way to dictionary attacks: If I search for 'fire truck' I want a sentence 'the truck was firing up' to potentially return as a search result, but at the same time I don't want to encrypt each tokenized word ('fire', 'truck') and pattern-match the ciphertext.
Encrypted search is an active research area. One of the more interesting/cited papers I've come across on the topic is Conjunctive, Subset, and Range Queries on Encrypted Data by Boneh and Waters [1] although there are several others. Boneh also authored an earlier paper on encrypted search involving public key encryption.
All the homomorphic schemes I've seen do addition and/or multiplication. Is there one that does substring matching, or can you somehow use addition and multiplication to search?
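For anyone unsure what "does multiplication" means here, a tiny illustration using unpadded textbook RSA, which is multiplicatively homomorphic. The primes are toy-sized and the helper names are made up; this shows the property only and says nothing about substring matching.

```python
# Toy textbook RSA parameters (far too small for real use).
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)  # modular inverse of e; requires Python 3.8+


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


# Multiplying two ciphertexts yields a ciphertext of the product of the
# plaintexts (mod n), without ever decrypting the inputs.
a, b = 7, 6
product_ct = (encrypt(a) * encrypt(b)) % n
```

Decrypting `product_ct` gives `a * b` as long as the product is smaller than `n`.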