> I was under the impression the use of SHA1 was only for hashing and not for a security signature.
This is mostly correct. The commands git commit -S and git tag -s sign commits and tags, respectively, using GPG. The signature covers only the commit object itself: the tree's SHA1, the parent commits' SHA1s, the commit/author data, and the commit message, not the entirety of the data.
For "file" objects, git computes the object's SHA1 over "blob " + ASCII decimal size of the blob + NUL + the data in the blob. The two PDFs in Shattered are the same size and thus have the same git object header. However, naïvely prefixing the two shattered PDFs with their git header results in different hashes; I presume this is b/c the header changes the internal state of SHA1, so it no longer matches the state the collision data was constructed for. You can see this yourself:
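For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the computation described above. The filenames shattered-1.pdf and shattered-2.pdf are my assumption about where the two PDFs were saved, and git_blob_hash is just a helper name made up for this example.

```python
import hashlib
import os


def git_blob_hash(data: bytes) -> str:
    """SHA1 of a git blob: "blob " + decimal size + NUL + data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Assumed local filenames; adjust to wherever the PDFs actually live.
    for name in ("shattered-1.pdf", "shattered-2.pdf"):
        if not os.path.exists(name):
            print(name, "not found, skipping")
            continue
        with open(name, "rb") as f:
            data = f.read()
        # Plain SHA1 of the file vs. the hash git would assign to it.
        print(name, "sha1:", hashlib.sha1(data).hexdigest())
        print(name, "git: ", git_blob_hash(data))
```

As a sanity check, git_blob_hash(b"") gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known hash of git's empty blob, and the result for a file should match what git hash-object prints for it.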
If you commit the two PDFs to git, those are the hashes they will have. They are different. Now, if you take the header into account when computing the collisions, you can create a collision. The paper seems to say that the attack takes a known (and controllable) prefix P, and finds two pairs of 512-bit (64-byte) blocks, M_1^(1) and M_2^(1) for the first file and M_1^(2) and M_2^(2) for the second, that cause the internal state of the hash to collide; after that, any (also controllable) suffix S (but the same for both files) can be appended.
This is why the diff[1] is as long as it is: each side of the diff is 128 bytes; the two M blocks.
Thus, if you computed a prefix P that started with the git blob header, then some data for your file, and ran this attack, you should be able to create two files that, when committed to git, collide. The two pieces of data in the header don't cause any trouble: the paper's method allows you to control the output size, mostly, so we can mostly choose any size we want; the type is always "blob" and is thus effectively constant.
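To make the layout concrete, here is a rough sketch of how one might plan such a prefix. It does not run the attack; it only works out the sizes. It assumes (my reading, not something the paper says about git) that the prefix P has to end on a 64-byte SHA1 block boundary and that the two differing blocks take exactly 128 bytes. plan_prefix, file_start, and the NUL padding are all made up for illustration.

```python
BLOCK = 64                  # SHA1 processes input in 64-byte blocks
COLLISION_LEN = 2 * BLOCK   # the two differing M blocks (assumed 128 bytes)


def plan_prefix(file_start: bytes, suffix: bytes, pad_byte: bytes = b"\x00"):
    """Return (header, padded_start, total_size).

    header + padded_start is the prefix P fed to the attack; it is padded
    so that it ends on a block boundary. total_size is the blob size that
    the git header has to announce.
    """
    pad = 0
    while True:
        total = len(file_start) + pad + COLLISION_LEN + len(suffix)
        header = b"blob " + str(total).encode("ascii") + b"\x00"
        misalign = (len(header) + len(file_start) + pad) % BLOCK
        if misalign == 0:
            return header, file_start + pad_byte * pad, total
        # Padding changes the size, which can change the header length,
        # so re-check on the next iteration.
        pad += BLOCK - misalign


header, start, total = plan_prefix(b"%PDF-1.3\n", b"tail")
print(header, len(header) + len(start), total)
```

The loop is there because the size appears in the header, so adding padding can add a digit to the header and shift the alignment again.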
Now, from what I can gather from the paper, there doesn't seem to be real control over the portion that differs; that's really what we need to take advantage of this beyond just creating collisions. This is a collision, but it's not what's called a chosen-prefix collision. A chosen-prefix collision lets me choose two different prefixes (which gives the attacker much more control over the differences in content, thus it becomes much easier to craft a "good" and a "bad" version); this attack requires both files to have the same prefix.
Now, here is the worst way I can think of as to how I could get a colliding object to you:
Imagine that I can convince someone you trust to sign a commit; this commit contains, either directly or indirectly via an ancestor commit, an object whose hash we will collide.
Now suppose I can later get you to download that commit and all its parents, except that I substitute one object's data for another's. The signature is still good: I've not changed the commit object in any way; it references objects by SHA1 hash, and the hash hasn't changed, "only" the data.
Here's another scenario, and you don't need signatures in this scheme; if I push a commit to master w/ the "good" version of the object, but before you pull it, I push to you a branch that contains the bad object, then git writes the bad object to your objects folder, under the colliding hash. You now pull master, but git doesn't pull the "good" version of the object, b/c it already has an object with that hash. Your master is now different; I've effectively poisoned your repo with the bad object.
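The poisoning scenario can be modelled with a toy content-addressed store. This is only a sketch of the scenario as I described it, not git's actual fetch code, which may perform additional checks. Since I don't have a real SHA1 collision handy, weak_hash deliberately truncates SHA1 to two hex digits as a stand-in for a broken hash; ToyObjectStore and find_collision are invented names.

```python
import hashlib


def weak_hash(data: bytes) -> str:
    # Stand-in for a broken hash: keep only the first 2 hex digits of SHA1.
    return hashlib.sha1(data).hexdigest()[:2]


class ToyObjectStore:
    """Stores objects by hash; an existing object is never overwritten."""

    def __init__(self):
        self.objects = {}

    def receive(self, data: bytes) -> str:
        key = weak_hash(data)
        # First write wins: if the key already exists, the new data is dropped.
        self.objects.setdefault(key, data)
        return key


def find_collision(good: bytes) -> bytes:
    # Brute force is feasible only because weak_hash is tiny.
    target = weak_hash(good)
    n = 0
    while True:
        candidate = b"bad version %d" % n
        if candidate != good and weak_hash(candidate) == target:
            return candidate
        n += 1


good = b"good version"
bad = find_collision(good)

store = ToyObjectStore()
store.receive(bad)         # attacker's branch arrives first
key = store.receive(good)  # later pull of master brings the "good" object
print(store.objects[key] == bad)  # the store still holds the bad data
```

The point of the model is the setdefault line: once an object with a given hash exists, later data with the same hash is silently treated as already present.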
Now, whether you can pull off this stunt or not, IDK. My point is that a) git's signatures don't cover the entirety of the repository, only a now-very-weak cryptographic hash, and b) git is (I believe) subject to object collision from this. But presently I'm not seeing how it can be maliciously taken advantage of. But then, people are really clever.
[1]: https://news.ycombinator.com/item?id=13721633