Hacker News
Building a BitTorrent client from the ground up in Go (jse.li)
692 points by eat_veggies on Jan 4, 2020 | 90 comments



Over the holidays, I challenged myself to learn Go by torrenting the Debian ISO -- from scratch. This post is a bit of a brain dump about everything I've learned over the past week.


> This isn’t the full story

> For brevity, I included only a few of the important snippets of code. Notably, I left out all the glue code, parsing, unit tests, and the boring parts that build character. View my full implementation if you’re interested.

Any chance you'll do independent write-ups to cover all the missing bits? Especially the ones that build character. :)


Would also be good to include how BT uses a Merkle tree, that's quite a key component in the software.


Does any client support the new standard? I was under the impression that no one did.


BitTorrent doesn't use a Merkle tree until the second version of the protocol, which isn't mainstream.



Wow, I've been looking for a BT tutorial at about this level of abstraction for two years or so. And would have liked to have read it a decade or two back as well.

Wondering if there are any similar docs or a book for implementing a server in Python, and a client in Python or Dart/Flutter? I know libtorrent bindings are a thing but the docs seemed quite dense and I didn't even know where to start—until now that is.



Same here, but for JavaScript.


Awesome write-up and a great way to dig into the inner-workings of BitTorrent. Nice work!

Glancing at the wiki page for BitTorrent[0], I'm a bit surprised that there isn't more of an effort to create cross-platform libraries and clients using Go or Rust for this. Seems like a perfect use case? For example, synapse [1].

[0] https://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clien...

[1] https://github.com/Luminarys/synapse


This is a mature Go BitTorrent library:

https://github.com/anacrolix/torrent


I didn't realize this existed, thanks for the link! It'll be a great codebase to look through :)



I think BitTorrent is a very 2000's/early 2010's technology. That's when it feels like it peaked, at least. The future is probably more toward WebTorrents [0] (with WebRTC) and IPFS [1]. And those have great cross platform libraries.

[0] https://webtorrent.io/

[1] https://ipfs.io/


Isn't webtorrent just bittorrent implemented in JS so it can run on browsers?


webtorrent is not compatible with bittorrent because browsers can't speak raw TCP/UDP as required by the bittorrent protocol. They use webrtc, which is not compatible with existing bittorrent clients and requires some centralized components (signalling servers) while bittorrent can operate fully decentralized.

Also running things in a browser is at cross purposes with torrent long-term availability, large-scale (giga or terabyte) file management, efficient IO and so on.


Yes, it is.


As a torrent newb, what's the tldr on why webtorrent over BitTorrent?


This is totally just my opinion, but the days of the mainstream consumer having a dedicated torrent client installed on their desktop computer (or even as a phone app) are pretty much over.

But as a streaming technology, torrents are still pretty great. And like it or not, being able to run in browsers is a major advantage right now.


Over? With all the new VoD services like Netflix, Disney+, Amazon Prime, etc., it all brings back memories of cable, meaning that torrenting is stronger than ever.


Probably not what you meant, but I’d think a licensing and technology arrangement that allowed VoD services to deliver content this way (like Skype when it was good) and use customers as a second line CDN would be awesome.


If customers would be subsidizing the corporation's bandwidth costs, I hope they'd be getting a good discount over what a centralized distribution service with the same licensed content would cost.

Donating bandwidth to a community torrent for the general good is not really the same thing as donating bandwidth to a for-profit corporation. I'm much more inclined to do the former than the latter.


This would ultimately be in the form of not further raising prices, and become a standard thing. Don’t have >15 Mbps up? No problem, extra $3 per month.


Didn't some mmorpgs do this to distribute updates over the client while you were playing?


Back in the 2010s, yes, not anymore though afaik, I assume that CDNs/traffic got cheap enough to not go that far. Microsoft actually introduced P2P update sharing with Windows Update for Windows 10, I think it's enabled by default.


P2P Windows updates make sense where large numbers of machines in the same LAN may be updated at the same time.


Blizzard used (uses?) it to distribute patches for WoW. Back in the day the patches were quite large (multi GB) compared to available bandwidth at the time.


BBC used to do this with their early implementations of iPlayer. It was very unpopular then because people had expensive bandwidth and the feature wasn't clearly described.

https://en.wikipedia.org/wiki/BBC_iPlayer#'iPlayer_1.0'


This is true for the streaming use case, where you have a browser open and you want to watch a video.

The case the author is going after is downloading a large Debian ISO, so for the use case of large, distributed file distribution there will always be solutions like the author's.


How many end users actually download linux distributions? I'd argue that most mainstream consumption is:

1. video streaming

2. centralized sources like news websites

3. sharing in group chats

End users don't really download large files like Debian outside of gaming and OS updates. To that end, Blizzard Downloader [0] uses BitTorrent, and Windows 10 uses some similar p2p system [1]

[0] https://wow.gamepedia.com/Blizzard_Downloader

[1] https://www.pcworld.com/article/2955491/how-to-stop-windows-...


Not anymore since the Battle.net launcher.


WebTorrent is a version of the BitTorrent protocol modified to work over WebRTC so that it can work in a browser.

Someone is actively working with the maintainer of libtorrent (popular implementation of BitTorrent) to get support for it into the library: https://github.com/arvidn/libtorrent/pull/4123


Wow that's a whole lot of progress in very little time. I've been waiting for this crossover since the early days of WebTorrent. Once enough popular clients implement this, in theory all clients could simply move over to using web sockets and deprecate the old methods.


Genuinely curious: Why are the old methods [0] not up to the mark? What extra do WebSockets [1] bring to the table?

[0] What are those? Plain Old TCP Sockets?

[1] You mean, WebRTC, right?


It's not that they aren't good enough. It uses both TCP and UDP (and μTP if enabled). Perfectly fine for its use case. I guess my thinking is more along the lines of, if everyone is using WebRTC (yes you're right, WebRTC not WebSockets, my mistake), there is no need to use and maintain multiple channels. If WebRTC works well enough, over time, once everyone is able to use WebRTC, you can trim a lot of the TCP code and thus keep maintenance overheads much lower. Especially when more peers can be included in your pool.

This would obviously require the majority of torrent clients to implement WebTorrent, as otherwise it wouldn't make sense and you'd exclude more than you would include. If I'm not mistaken, libtorrent is used by many of the open source torrent clients, so I think merging this code would suddenly give WebTorrent users an enormous number of peers once they all upgrade. For client implementations, upgrading probably requires next to no code changes.


WebRTC works in browsers. This alone is enough to trump every other consideration these days.


WebTorrent should just work in a browser, no special client necessary. This will allow projects like PeerTube to create websites where the primary content is distributed by the viewers themselves.


This will be hard to do relying only on a browser-based WebTorrent client, because if no one is watching a particular video, there will be no seeds.

You close browser = close WebTorrent = you stop seeding

You stop watching video = you stop seeding it

Long and large video file? = if most people watch at half of video, then data from beginning is already out of the buffer = no seed for first half of video.

To this day there is still no support for WebTorrent in other popular Bittorrent clients so they can't connect to each other. But there is some movement in libtorrent library so we will see.


That makes sense. I was wondering how browsers were communicating with UDP peers... Turns out they aren't!

A quick glance at the webtorrent readme says it uses UDP on node.

There is a webtorrent desktop app in beta that does both.[0]

Here's hoping it makes it to libtorrent. Even then, I'd imagine it would take quite a while to make it into the clients running on seedboxes, desktops, etc.


Brave Browser has WebTorrent built in. Just paste a magnet link into the URL bar and it will start downloading/seeding.


Alternatively, one could use https://instant.io from any browser to seed or download. Makes for a good alternative to https://send.firefox.com for really large files.


Opera supported proper torrents back in its Presto days.


instant.io and webtor.io are two great examples of in-browser webtorrent streaming/download sites!


Why would I download a sketchy ass torrent client when I can go to a website, download the torrent in my browser, and be able to rely on my browser's sandboxing against the torrent client? Even better, if it's video I'm torrenting, I can rely on my browser's sandboxing against the torrent's contents too.

It's more convenient, it's safer, it sucks for other torrentors because I don't seed as long, but let's be honest, most people downloading torrents don't care.


Why would a torrent client be sketchier than any other software? There is a plethora of good open source clients like qbittorrent, transmission... even aria2! At that rate, are you suspicious of wget too? Even more of Chrome? There are a lot of sketchy things going on in that one! It is the user's responsibility to choose which software they trust, and a BitTorrent client is no worse than anything else.


Me, personally, sure I can find a torrent client I trust.

My sibling who isn't a programmer. They would have no clue what is trustworthy or not. Downloading random executables from the internet is not a good idea. How should they know qbittorrent is safe as long as it's downloaded from www.qbittorrent.org but utorrent is basically malware? (Or at least was for awhile).

They already know they can trust firefox and chrome, so it's better for them to just do that.

That's before you get to questions about security of the clients, security of the website you are downloading the torrent clients from, etc.


> Downloading random executables from the internet is not a good idea.

OTOH, `apt search torrent` on Ubuntu probably doesn't recommend any malware. Though their GUI nowadays promotes snaps and I'm not sure how much that is better than random executables.


Yes, but a non-tech-savvy person is probably using Windows or Mac... My sibling in particular is definitely using windows (though they do use WSL for some of their work).


Sure. I thought a bit more about app stores and came up with the idea that Microsoft Store should be not that bad, because they probably have money to hire a lot of moderators. But search for "torrent" on microsoft.com recommends programs I haven't heard of, and search results for "qbittorrent" and "transmission" are outright fishy.


I’m extremely suspicious of chrome. More so than a torrent client. Google is a known evil.. emphasis on evil. It’s basically voluntarily downloading spyware.


Browsers are far more battle-tested than just about any other web-facing application on your computer.

Of course, you could make the personal decision to trust a client, and that is fine. But if you aren’t willing to blindly trust a client, the other guy’s point still stands - browsers are probably just the better choice here from a security POV.


> Browsers are far more battle-tested than just about any other web-facing application on your computer.

They also have a monstrous attack surface because they are "web-facing". A specialized client that only implements one protocol without any connection to the "web" is far easier to reason about and debug.

If you only consider the number of man-years an application has been battle-tested, you imply that design complexity and attack surface don't matter. If we account for complexity by using a metric like "(man-years of battle-testing)/(magnitude of attack surface)", a well-tested specialized client that hasn't had many recent bug reports is a much safer choice than anything running in a browser.

> blindly trust a client

That's even worse for the browser: you have to trust several orders of magnitude more code implementing a massive set of interdependent features. Yes, there are probably a lot more people working on fixing bugs in the browser, but there are also a lot of people adding/modifying features and thus creating new bugs.


But a bittorrent client isn't trying to be a browser and all the complex stuff that requires. All it's doing is downloading bittorrent files and having a usable GUI.

And as the original article demonstrated the first half of that is a weekend project.


With webtorrent clients you might just get a miner in your codebase:

https://github.com/DiegoRBaquero/BTorrent/issues/71#issuecom...


A "miner" wastes a bit of my cpu while I have the website open, frankly, who gives a crap? If I do give a crap I'll notice and close the webpage.

I'm much more concerned about "real" viruses like ransomware that the browser does successfully protect against.


Lots of people give a crap, myself included. Why would I run a resource hungry browser process with a huge attack surface and support for dozens of protocols, plugins, etc for the entire length of the transfer just to download some files?

For a dedicated torrent client to infect your PC with a "real" virus, you'd need to download the torrent and execute the file yourself (PEBCAK). I trust my own judgement much more than some random webpage.


Moving the application layer to the browser won't magically solve trust and security problems in the long run. This is a battle OS vendors should resume fighting.


From a user's perspective it "solves it" because you already have to trust the browser's security.

Or in other words torrenting becomes no worse than everything else you do.


BusyBox (I think) has a torrent client in it and most Linux distributions ship with that if I remember correctly.

If you’re worried about how “sketchy” it is, most BusyBox applets can be read through and totally understood in a few hours tops.


The internet tells me busybox doesn't provide a torrent client.

But you're missing the point, even if it did it wouldn't be a torrent client non-technical users can use. It's not me, a programmer, who I'm talking about here.


The torrent client isn’t the sketchy part of torrenting.


iPhone.


A decade ago now, I wrote a bit torrent client [1] that started as a project for a software development class.

I think the focus of the class was roadmaps and other parts of the development process for a larger project. Most of the groups picked games or things like that. We chose to build a bit torrent client and I ended up continuing working on it for a year or so.

I was pretty proud of it at the time. I'm sure there are a bunch of implementation details that seemed like a great idea as a Sophomore in college that no longer sound so clever, but the implementation is pretty sound. I used it to download a lot of stuff over the years.

The part I think is most interesting is the code for managing the torrent pieces and writing them into the proper places for the various files within the torrent. (Not necessarily my implementation of it, just the math behind the process) [2]

I remember feeling really clever the first time everything came together and actually downloaded a full torrent. My first big software development victory.

[1] https://github.com/war1025/Torrent

[2] https://github.com/war1025/Torrent/blob/piecereg/tcl/tm/torr...


> I used it to download a lot of stuff over the years.

That's great to hear --- I know there are quite a few developers who would do lots of different projects, but only to the point of getting something that barely works, and then never use it again. Having it in regular use means you have to continue to fix bugs and make improvements, which a lot of developers unfortunately seem to hate doing, although it's often the more valuable skill. "A true craftsman makes his own tools," as the old saying goes.


> The part I think is most interesting is the code for managing the torrent pieces and writing them into the proper places for the various files within the torrent. (Not necessarily my implementation of it, just the math behind the process)

Can you explain a little bit how this is interesting mathematically? On first thought, it seems like you would probably use a lot of pwrite, or if you want convenience over safety, mmap all the files and write directly to memory. Not to diminish your achievement, but I didn't immediately see anything interesting here. Can you explain a bit more?


What I meant more-so was the math of a "piece" from a torrent is a fixed number of bytes.

The way the files are laid out for a torrent, they are listed in some order in the initial torrent file, along with their individual lengths.

The data for the torrent is then treated as the contiguous sequence of bytes for the files in the listed order.

That contiguous sequence is then broken into "pieces" of whatever fixed size. So finding which file(s), and what position within those files, piece 148 goes to, for example, involves some math that is pretty straightforward all in all, but is still rewarding the first time you've done such a thing.
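The arithmetic described above can be sketched in Go roughly as follows. This is only an illustration under the stated layout (files concatenated in listed order, then cut into fixed-size pieces); `Span`, `pieceSpans`, and the sample file sizes are made-up names and values, not code from the linked repository.

```go
package main

import "fmt"

// Span says: Length bytes of the piece belong at Offset inside file File.
type Span struct {
	File   int   // index into the file list from the .torrent
	Offset int64 // byte offset inside that file
	Length int64 // number of bytes that land in that file
}

func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

func max64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// pieceSpans maps one piece onto the files it overlaps. fileLens holds the
// file lengths in torrent order; pieceLen is the fixed piece size.
func pieceSpans(fileLens []int64, pieceLen int64, piece int) []Span {
	var total int64
	for _, n := range fileLens {
		total += n
	}
	begin := int64(piece) * pieceLen
	end := min64(begin+pieceLen, total) // the last piece may be short

	var spans []Span
	var fileStart int64
	for i, n := range fileLens {
		fileEnd := fileStart + n
		// Intersect the piece's byte range with this file's byte range.
		lo, hi := max64(begin, fileStart), min64(end, fileEnd)
		if lo < hi {
			spans = append(spans, Span{File: i, Offset: lo - fileStart, Length: hi - lo})
		}
		fileStart = fileEnd
	}
	return spans
}

func main() {
	files := []int64{100, 50, 200} // three files, 350 bytes in total
	// Piece 1 covers bytes 64..127, so it straddles files 0 and 1.
	fmt.Println(pieceSpans(files, 64, 1)) // [{0 64 36} {1 0 28}]
}
```

Each resulting span could then be written with something like `(*os.File).WriteAt`, which is Go's positional write and plays the role of pwrite here.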


Chapeau!


Strictly leeches (does not support uploading pieces)

Be careful with using this client --- the swarm for a Linux ISO is probably a bit more forgiving, but you may get banned very quickly by the tracker or other clients, because they will definitely notice that.


In the same vein, I started to implement a BitTorrent client in OCaml. https://github.com/phlalx/sawadee

Initially it was a project to learn about the Core/Async libraries. I will put it back to life at some point, but it grew quite big and became very time consuming for a single developer.


Nice! Looks like a good idea to implement something from scratch to improve the understanding of a language and also a tool. Any other ideas about stuff to implement? Thanks.


I built an http server and a markov chain library to learn python. They're both pretty worthwhile weekend-size projects.

I heard cryptopals [0] is really good for learning both crypto and a new programming language. It might be cool to build things like a DNS resolver, an X window manager, gameboy game, or something from coreutils like grep or tar

[0] https://cryptopals.com/


> It might be cool to build things like a DNS resolver

I wouldn’t recommend writing a recursive resolver as a weekend project, too many edge cases that you’ll get frustrated by, but that said, I do use writing an authoritative DNS server as a great way to fully learn a new language. You get file handling, grammar parsing, serialization/deserialization, bitpacking, network I/O, data structures, both TCP and UDP socket handling, daemonization, concurrency, etc etc. Covers most of the more difficult parts of many languages and once you’ve done it once or twice, doing it again for new languages is less about figuring out how to do it, but rather how to do it in X language. At last count, I think I’ve done upwards of twenty languages now.


Nice! Crypto stuff indeed sounds interesting.


I implemented a blockchain-based cryptocurrency for fun. It sounds cliche but the details are surprisingly complicated: what exactly do you use as the criteria to determine whether you should accept a block? What are the ways a malicious actor can try to send you an invalid block? Just how exactly do you confirm a double-spending attempt? How do you pick a block based on which to mine a new block? Etc.

https://github.com/kccqzy/SimpleBlockchain


I noticed my terminal emulator (Terminator) is a python app. I now have a todo item to try and write my own!


Excellent write up and very readable code.


I noticed a few things with the Go code that confused me. I haven't coded in Go in quite some time, so I may be way off.

`copy(buf[1:], h.Pstr)`

In this line, are you copying the entire buffer to a string? Doesn't it overflow into other data elements?

Also, and I may be wrong, in the following line, it appears that you are casting to a []byte when it's already a slice of bytes, which should still be fine.

`peers[i].Port = binary.BigEndian.Uint16([]byte(peersBin[offset+4 : offset+6])`

I really enjoyed the tone and the code. I'm not done with the article, but I love it so far.


thank you for the feedback! It's really helpful.

copy's arguments are like copy(destination, source) so we're copying the string into the buffer. Also, copy will never overflow because it will only copy up to the length of the shortest buffer.
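A small sketch of that behavior, using a deliberately undersized buffer. `writePstr` is a made-up helper that only mimics the shape of the quoted line; it is not the article's function.

```go
package main

import "fmt"

// writePstr mimics the quoted handshake line: a length prefix in buf[0],
// then as much of pstr as fits. It returns how many bytes copy wrote.
func writePstr(buf []byte, pstr string) int {
	buf[0] = byte(len(pstr))
	// copy(dst, src): a string is allowed as the source for a []byte
	// destination, and copy stops at the shorter of the two lengths.
	return copy(buf[1:], pstr)
}

func main() {
	buf := make([]byte, 8) // too small on purpose: only 7 bytes after the prefix
	n := writePstr(buf, "BitTorrent protocol")
	fmt.Println(n, string(buf[1:])) // 7 BitTorr
}
```

The string is truncated rather than written past the end of `buf`, so nothing after the slice is touched.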

you're right about the unnecessary cast to []byte -- that function used to take a string as arguments, and when I changed it, I didn't change the rest of it. I've removed it.
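For anyone following along, here is a rough sketch of the loop that line sits in once the cast is gone, assuming the compact tracker response format of 4 IP bytes followed by a 2-byte big-endian port per peer. The `Peer` type and `parsePeers` name are assumptions for illustration and may not match the article's code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// Peer is one entry from a compact tracker response.
type Peer struct {
	IP   net.IP
	Port uint16
}

// parsePeers splits peersBin into 6-byte records: 4 bytes of IPv4 address
// followed by a big-endian port.
func parsePeers(peersBin []byte) ([]Peer, error) {
	const peerSize = 6
	if len(peersBin)%peerSize != 0 {
		return nil, fmt.Errorf("malformed peers: %d bytes", len(peersBin))
	}
	peers := make([]Peer, len(peersBin)/peerSize)
	for i := range peers {
		offset := i * peerSize
		peers[i].IP = net.IP(peersBin[offset : offset+4])
		// peersBin is already a []byte, so no conversion is needed here.
		peers[i].Port = binary.BigEndian.Uint16(peersBin[offset+4 : offset+6])
	}
	return peers, nil
}

func main() {
	peers, err := parsePeers([]byte{192, 168, 1, 1, 0x1A, 0xE1})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(peers[0].IP, peers[0].Port) // 192.168.1.1 6881
}
```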


I played with this. It requires Go 1.13 and seems to time out and close if one of the trackers is unresponsive for more than 10 seconds. It has no command line options and crashes if no options are present. It also doesn't support magnet links, only torrent files.


@eat_veggies The writeup is very useful. Can you describe the process of learning two concepts simultaneously? Or did you already know the parts of BitTorrent and how to implement it in some other language?


Great write up, very clear explanation of the torrent protocol, and nice to see the rationale behind certain code decisions.


Any resources for doing the same thing in python?


Can it be done in node.js under a VM, so as to isolate the traffic and force it to use a VPN, etc.?


Found it based on comments here:

https://github.com/webtorrent/webtorrent

I guess the VM part is just to use a unikernel then try to find a way to auto use vpn.


Use proxychains for this


I wanted to do the same not that long ago, but I abandoned that idea when I could not find a UI library I would like to work with, and I was not looking to make a CLI application.


The blog doesn't discuss it, but the choice of DHT implementation in BitTorrent is Kademlia [0] whilst Chord is more popular, used by Amazon Dynamo and Facebook Cassandra [1]. One of the authors of Kademlia, David Mazières, later co-authored the Stellar crypto-currency protocol [2] with Jed McCaleb of Ripple fame.

That said, quite famously, Bram Cohen, the co-inventor of BitTorrent, failed the Google interviews.

[0] https://news.ycombinator.com/item?id=18711980

[1] https://news.ycombinator.com/item?id=3480480

[2] https://news.ycombinator.com/item?id=16125920


Kademlia, especially with pieces of SKademlia, is much better with untrusted peers and is much more popular in p2p projects such as BT, I2P, Ethereum, IPFS, etc.
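Since the post skips the DHT, here is a minimal sketch of the one idea that sets Kademlia apart: closeness between IDs is their XOR, compared as a big-endian number. This assumes the 160-bit IDs used by the BitTorrent DHT; `NodeID`, `distance`, and `closer` are illustrative names, not from any particular library.

```go
package main

import (
	"bytes"
	"fmt"
)

// NodeID is a 160-bit identifier, the size used by the BitTorrent DHT.
type NodeID [20]byte

// distance returns the Kademlia XOR metric between two IDs.
func distance(a, b NodeID) NodeID {
	var d NodeID
	for i := range d {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// closer reports whether a is nearer to target than b, comparing the XOR
// distances as big-endian integers.
func closer(a, b, target NodeID) bool {
	da, db := distance(a, target), distance(b, target)
	return bytes.Compare(da[:], db[:]) < 0
}

func main() {
	var target, a, b NodeID
	target[19] = 0x05
	a[19] = 0x04 // XOR distance to target: 0x01
	b[19] = 0x01 // XOR distance to target: 0x04
	fmt.Println(closer(a, b, target)) // true
}
```

A routing table would use a comparison like this to decide which known nodes to ask next when looking up a target ID.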


Any sources regarding the failed interview? Google (ha-ha) says nothing.


Maybe the down-voters think I'm a troll? I thought it was an interesting note about the creator of a protocol responsible for a third of all Internet traffic. Or, does it highlight the alleged ineffectiveness of FAANG-style tech interviews?

Btw, here's a ref: https://web.archive.org/web/20200105225938/https://www.there...



