
Per-file metadata increases significantly, but it gets rid of the per-piece data (which in BitTorrent v1 is 20 bytes of SHA-1 hash per piece and made up the bulk of the .torrent file).

The .torrent file only stores the Merkle tree's root hash for each file, and the torrent client will query its peers for the rest of the Merkle tree (verifiable against the root hash). The leaves of the tree are the hashes of each 16 KiB block.

Interesting consequences of this:

Piece size isn't baked into the file anymore (and I've seen torrents with 16 MiB blocks); the client can dynamically choose its verification piece size by requesting only so many layers of the Merkle tree. Or it could skip requesting the tree and verify the whole file at once.
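A quick sketch of the layer/piece-size relationship described above, assuming the 16 KiB leaf blocks from the v2 draft (the function name is just for illustration):

```python
# Sketch: how a Merkle-tree layer maps to an effective piece size,
# assuming 16 KiB leaf blocks as in the BitTorrent v2 draft (BEP 52).
BLOCK = 16 * 1024  # leaf block size in bytes

def piece_size_at_layer(layer):
    """Each node `layer` levels above the leaves covers 2**layer blocks."""
    return BLOCK * (2 ** layer)

# Requesting only the nodes 10 layers up lets a client verify in
# 16 MiB pieces instead of 16 KiB blocks:
assert piece_size_at_layer(0) == 16 * 1024          # leaves: 16 KiB
assert piece_size_at_layer(10) == 16 * 1024 * 1024  # 16 MiB pieces
```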

Merkle tree roots will be globally unique. You can scan torrent files for duplicated files and download common files from multiple swarms.
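A hedged sketch of that scan: group files from several already-parsed v2 torrents by their root hash. The torrent names, paths, and root values below are all hypothetical placeholders, not real metadata.

```python
# Hypothetical parsed metadata: {torrent name: {file path: pieces-root}}.
from collections import defaultdict

torrents = {
    "pack.torrent":   {"album/track1.flac": "rootA", "album/track2.flac": "rootB"},
    "single.torrent": {"track1.flac": "rootA"},
}

# Invert to {root hash: set of torrents carrying a file with that root}.
by_root = defaultdict(set)
for torrent, files in torrents.items():
    for path, root in files.items():
        by_root[root].add(torrent)

# A file whose root appears in several torrents can be fetched from
# every one of those swarms at once:
assert by_root["rootA"] == {"pack.torrent", "single.torrent"}
```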



Right, in BitTorrent v1 the size of the .torrent file is O(number of files + number of bytes / piece length), but with this it's just O(number of files) with a higher constant factor.
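Back-of-the-envelope Python for the v1 term: the `pieces` field carries 20 bytes of SHA-1 per piece, so it grows with total size (the helper name is illustrative):

```python
# Rough size of the v1 piece-hash section: 20 bytes of SHA-1 per piece.
def v1_piece_hashes_bytes(total_size, piece_length):
    pieces = -(-total_size // piece_length)  # ceiling division
    return 20 * pieces

GiB = 1024 ** 3
# A 64 GiB torrent with 1 MiB pieces carries 20 * 65536 = ~1.25 MiB of
# piece hashes -- large v1 .torrent files are dominated by this field.
assert v1_piece_hashes_bytes(64 * GiB, 1024 ** 2) == 20 * 65536
```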

Piece size is still baked into the file (as piece length), and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this information is handled as bitsets shared between clients indicating "I have chunk {1, 2, 3, ... 50, 52, ... }".

Merkle tree roots will only be unique for each piece length. Piece length should still correlate with total size, to prevent huge bitsets-- a 16KB piece length on a 64GB torrent would have a 4 million item / 500KB bitset (!), so it could take 500KB of RAM per connected peer to maintain state-- or maybe compressed bitsets make this problem irrelevant in practice?
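Checking the arithmetic above (one bit per piece in the presence bitset):

```python
# Back-of-the-envelope check of the bitset sizes mentioned above.
def bitset_bytes(total_size, piece_length):
    pieces = -(-total_size // piece_length)  # ceiling division
    return -(-pieces // 8)                   # one bit per piece

GiB = 1024 ** 3
# 64 GiB torrent with 16 KiB pieces: ~4.2M pieces, a 512 KiB bitset
assert bitset_bytes(64 * GiB, 16 * 1024) == 512 * 1024
# With a 1 MiB piece length the per-peer bitset drops to 8 KiB
assert bitset_bytes(64 * GiB, 1024 ** 2) == 8 * 1024
```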


v1: O(path-depth * number of files + number of bytes)

v2: O(log(path-depth) * number of files)

that is assuming some constantish branching factor in your directory structure

> Merkle tree roots will only be unique for each piece length.

Merkle trees are independent of piece size, which means you can use them to dedup across torrents.


Oh, neat! I missed the part where larger piece sizes correspond to higher layers of the tree.

Presumably clients still reconstruct (and store, somewhere) the full Merkle tree to do incremental validation and support queries.


> You can scan torrent files for duplicated files and download common files from multiple swarms.

This is one of the biggest things I feel is missing from the current protocol and I'm very glad it's in v2 draft. Now when a group of related torrents are repacked into a single torrent all the swarms are complementary instead of competitive. You don't have to choose between seeding the big pack instead of the individual files, just do what you want and the whole swarm still benefits.


> Or it could skip requesting the tree and verify the whole file at once.

To clarify, this works by the client deterministically reconstructing the tree once they have the whole file, then checking the root's hash, correct?


Yeah, it's a deterministic tree of hashes.

Each leaf is the hash of a 16 KiB chunk. On the next layer up you have a series of nodes, each of which is the hash of the two leaves below it concatenated and hashed together.

You keep adding layers until you're left with a single hash at the root of the tree.
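A minimal sketch of that construction: SHA-256 over 16 KiB leaf blocks, pairing nodes upward until one root remains. (The real BEP 52 layout pads the leaf layer out to a power of two with zero-hashes; the padding here is simplified.)

```python
# Simplified hash tree: not the exact BEP 52 padding rules, just the
# deterministic pair-and-hash structure described above.
import hashlib

BLOCK = 16 * 1024  # 16 KiB leaf chunks

def merkle_root(data: bytes) -> bytes:
    # Leaves: SHA-256 of each 16 KiB chunk
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    # Pair adjacent nodes and hash until a single root remains
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(bytes(32))  # pad odd layers (simplified)
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

# Same bytes always produce the same root, so a downloader can rebuild
# the tree from the finished file and compare against the .torrent's root.
assert merkle_root(b"x" * 40000) == merkle_root(b"x" * 40000)
assert merkle_root(b"x" * 40000) != merkle_root(b"y" * 40000)
```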



