Per-file metadata increases significantly, but it gets rid of the per-piece data (which in BitTorrent v1 is 20 bytes of SHA-1 hash per piece and made up the bulk of the .torrent file).
The .torrent file only stores the merkle tree's root hash for each file, and the torrent client queries its peers for the rest of the merkle tree (verifiable against the root hash). The leaves of the merkle tree are the hashes of the file's 16 KiB blocks.
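To make the shape of that concrete, here's a minimal sketch of building such a root in Python (SHA-256 over 16 KiB leaves, zero-hash padding to a power of two; the real BEP 52 rules differ in some details, e.g. how a short final block is handled):

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # v2 hashes each file in 16 KiB leaf blocks

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(data: bytes) -> bytes:
    # Leaf layer: SHA-256 of each 16 KiB block of the file.
    layer = [sha256(data[i:i + BLOCK_SIZE])
             for i in range(0, max(len(data), 1), BLOCK_SIZE)]
    # Pad the leaf layer to a power of two with zero hashes,
    # so every file gets a balanced tree.
    while len(layer) & (len(layer) - 1):
        layer.append(bytes(32))
    # Hash sibling pairs upward until a single root remains.
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]
```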
Interesting consequences of this:
Piece size isn't baked into the file anymore (and I've seen torrents with 16 MB pieces); the client can dynamically choose its verification piece size by requesting only so many layers of the merkle tree (see the sketch below). Or it could skip requesting the tree and verify the whole file at once.
Merkle tree roots will be globally unique. You can scan torrent files for duplicated files and download common files from multiple swarms.
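To illustrate the "choose your own verification piece size" point, a hypothetical helper (the name and interface are mine, not from any client): collapsing the leaf layer k times yields one hash per 2^k blocks, so a client can verify incoming data at whatever granularity it requested. This assumes the leaf layer is already zero-padded to a power of two and blocks_per_piece is a power of two:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def piece_roots(leaf_hashes: list[bytes], blocks_per_piece: int) -> list[bytes]:
    # Collapse the leaf layer upward until each hash covers
    # blocks_per_piece 16 KiB blocks; a client verifying at that
    # granularity only needs this one layer, not the whole tree.
    layer = list(leaf_hashes)
    covered = 1
    while covered < blocks_per_piece:
        layer = [sha256(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
        covered *= 2
    return layer

# blocks_per_piece = 1 verifies every 16 KiB block individually;
# blocks_per_piece = len(leaf_hashes) yields just the file root.
```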
Right, in BitTorrent v1 the size of the .torrent file is O(number of files) + O(number of bytes), but with this it's just O(number of files) with a higher constant factor.
Piece size is still baked into the file (as piece length) and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this availability information is exchanged between clients as bitsets indicating "I have pieces {1, 2, 3, ..., 50, 52, ...}".
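A toy sketch of that rarest-first selection, modeling bitfields as Python sets of piece indices rather than the packed bit arrays real clients exchange:

```python
from collections import Counter

def rarest_first(my_pieces: set[int], peer_bitfields: list[set[int]]) -> list[int]:
    # Count how many connected peers have each piece.
    availability = Counter()
    for bitfield in peer_bitfields:
        availability.update(bitfield)
    # Request pieces we lack, scarcest first (piece index breaks ties).
    needed = set(availability) - my_pieces
    return sorted(needed, key=lambda p: (availability[p], p))

# We hold piece 0; pieces 1 and 3 are each held by one peer,
# piece 2 by two, so 1 and 3 come first.
print(rarest_first({0}, [{0, 1, 2}, {2, 3}]))  # [1, 3, 2]
```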
Merkle tree roots will only be unique for a given piece length. Piece length should still scale with total size to prevent huge bitsets: a 16 KB piece length on a 64 GB torrent would mean a 4-million-entry, 500 KB bitset (!), so it could take 500 KB of RAM per connected peer just to maintain state. Or maybe compressed bitsets make this problem irrelevant in practice?
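The arithmetic behind that 500 KB figure, as a quick check:

```python
def bitfield_bytes(total_size: int, piece_length: int) -> int:
    # One bit per piece, rounded up to whole bytes.
    pieces = -(-total_size // piece_length)  # ceiling division
    return (pieces + 7) // 8

print(bitfield_bytes(64 * 2**30, 16 * 2**10))  # 524288 (~512 KB per peer)
print(bitfield_bytes(64 * 2**30, 16 * 2**20))  # 512 bytes with 16 MB pieces
```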
> You can scan torrent files for duplicated files and download common files from multiple swarms.
This is one of the biggest things I feel is missing from the current protocol, and I'm very glad it's in the v2 draft. Now when a group of related torrents is repacked into a single torrent, all the swarms are complementary instead of competitive. You don't have to choose between seeding the big pack and seeding the individual files; do whichever you want and the whole swarm still benefits.