
Per-file metadata increases significantly, but it gets rid of the per-piece data (which in BitTorrent v1 is 20 bytes of SHA-1 hash per piece and made up the bulk of the .torrent file).

The .torrent file only stores the Merkle tree's root hash for each file, and the torrent client will query its peers for the rest of the Merkle tree (verifiable against the root hash). The leaves of the tree are the hashes of each 16 KiB block.

Interesting consequences of this:

Piece size isn't baked into the file anymore (and I've seen torrents with 16 MiB blocks); the client can dynamically choose its verification piece size by requesting only so many layers of the Merkle tree. Or it could skip requesting the tree and verify the whole file at once.
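A quick sketch of the layer/piece-size relationship described above, assuming the 16 KiB leaf blocks from the v2 draft (the function name is just for illustration):

```python
# Sketch: how a Merkle-tree layer maps to an effective piece size,
# assuming 16 KiB leaf blocks as in the BitTorrent v2 draft (BEP 52).
BLOCK = 16 * 1024  # leaf block size in bytes

def piece_size_at_layer(layer):
    """Each node `layer` levels above the leaves covers 2**layer blocks."""
    return BLOCK * (2 ** layer)

# Requesting only the nodes 10 layers up lets a client verify in
# 16 MiB pieces instead of 16 KiB blocks:
assert piece_size_at_layer(0) == 16 * 1024          # leaves: 16 KiB
assert piece_size_at_layer(10) == 16 * 1024 * 1024  # 16 MiB pieces
```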

Merkle tree roots will be globally unique. You can scan torrent files for duplicated files and download common files from multiple swarms.
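A hedged sketch of that scan: group files from several already-parsed v2 torrents by their root hash. The torrent names, paths, and root values below are all hypothetical placeholders, not real metadata.

```python
# Hypothetical parsed metadata: {torrent name: {file path: pieces-root}}.
from collections import defaultdict

torrents = {
    "pack.torrent":   {"album/track1.flac": "rootA", "album/track2.flac": "rootB"},
    "single.torrent": {"track1.flac": "rootA"},
}

# Invert to {root hash: set of torrents carrying a file with that root}.
by_root = defaultdict(set)
for torrent, files in torrents.items():
    for path, root in files.items():
        by_root[root].add(torrent)

# A file whose root appears in several torrents can be fetched from
# every one of those swarms at once:
assert by_root["rootA"] == {"pack.torrent", "single.torrent"}
```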



Right, in BitTorrent v1 the size of the .torrent file is O(number of files + number of bytes / piece length), but with this it's just O(number of files) with a higher constant factor.
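Back-of-the-envelope Python for the v1 term: the `pieces` field carries 20 bytes of SHA-1 per piece, so it grows with total size (the helper name is illustrative):

```python
# Rough size of the v1 piece-hash section: 20 bytes of SHA-1 per piece.
def v1_piece_hashes_bytes(total_size, piece_length):
    pieces = -(-total_size // piece_length)  # ceiling division
    return 20 * pieces

GiB = 1024 ** 3
# A 64 GiB torrent with 1 MiB pieces carries 20 * 65536 = ~1.25 MiB of
# piece hashes -- large v1 .torrent files are dominated by this field.
assert v1_piece_hashes_bytes(64 * GiB, 1024 ** 2) == 20 * 65536
```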

Piece size is still baked into the file (as piece length), and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this information is handled as bitsets shared between clients indicating "I have chunk {1, 2, 3, ... 50, 52, ... }".

Merkle tree roots will only be unique for each piece length. Piece length should still correlate with total size, to prevent huge bitsets-- a 16KB piece length on a 64GB torrent would have a 4 million item / 500KB bitset (!), so it could take 500KB of RAM per connected peer to maintain state-- or maybe compressed bitsets make this problem irrelevant in practice?
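Checking the arithmetic above (one bit per piece in the presence bitset):

```python
# Back-of-the-envelope check of the bitset sizes mentioned above.
def bitset_bytes(total_size, piece_length):
    pieces = -(-total_size // piece_length)  # ceiling division
    return -(-pieces // 8)                   # one bit per piece

GiB = 1024 ** 3
# 64 GiB torrent with 16 KiB pieces: ~4.2M pieces, a 512 KiB bitset
assert bitset_bytes(64 * GiB, 16 * 1024) == 512 * 1024
# With a 1 MiB piece length the per-peer bitset drops to 8 KiB
assert bitset_bytes(64 * GiB, 1024 ** 2) == 8 * 1024
```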


v1: O(path-depth * number of files + number of bytes)

v2: O(log(path-depth) * number of files)

that is assuming some constantish branching factor in your directory structure

> Merkle tree roots will only be unique for each piece length.

Merkle trees are independent of piece size, which means you can use them to dedup across torrents.


Oh, neat! I missed the part where larger piece sizes correspond to higher layers of the tree.

Presumably clients still reconstruct (and store, somewhere) the full Merkle tree to do incremental validation and support queries.


> You can scan torrent files for duplicated files and download common files from multiple swarms.

This is one of the biggest things I feel is missing from the current protocol and I'm very glad it's in v2 draft. Now when a group of related torrents are repacked into a single torrent all the swarms are complementary instead of competitive. You don't have to choose between seeding the big pack instead of the individual files, just do what you want and the whole swarm still benefits.


> Or it could skip requesting the tree and verify the whole file at once.

To clarify, this works by the client deterministically reconstructing the tree once they have the whole file, then checking the root's hash, correct?


Yeah, it's a deterministic tree of hashes.

Each leaf is the hash of a 16 KiB chunk. On the next layer up you have a series of nodes, each of which is the hash of the two leaves below it concatenated and hashed together.

You keep adding layers until you're left with a single hash at the root of the tree.
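A minimal sketch of that construction: SHA-256 over 16 KiB leaf blocks, pairing nodes upward until one root remains. (The real BEP 52 layout pads the leaf layer out to a power of two with zero-hashes; the padding here is simplified.)

```python
# Simplified hash tree: not the exact BEP 52 padding rules, just the
# deterministic pair-and-hash structure described above.
import hashlib

BLOCK = 16 * 1024  # 16 KiB leaf chunks

def merkle_root(data: bytes) -> bytes:
    # Leaves: SHA-256 of each 16 KiB chunk
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    # Pair adjacent nodes and hash until a single root remains
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(bytes(32))  # pad odd layers (simplified)
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

# Same bytes always produce the same root, so a downloader can rebuild
# the tree from the finished file and compare against the .torrent's root.
assert merkle_root(b"x" * 40000) == merkle_root(b"x" * 40000)
assert merkle_root(b"x" * 40000) != merkle_root(b"y" * 40000)
```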



