Hacker News
New improvements to IPFS Bitswap for faster container image distribution (ipfs.io)
181 points by yankcrime on Feb 27, 2020 | 60 comments



Would be curious to know how using IPFS for internal container distribution compares to using BitTorrent. IIRC BitTorrent has found similar uses in the past.

Also, how well does BitSwap work when the underlying network is congested? Do IPFS nodes do any kind of congestion control?


You can't update a torrent; if the content changes, you have to create a new one. IPNS helps with that. And you can't share pieces across different torrents: if some peers still have the old torrent, they share it separately from the new one, even if the differences are minimal.


The article doesn't mention IPNS at all, nor does it talk about the need for mutating an image while it is being shared, so I'm not sure why you think IPNS is even desirable in this use-case?


"Inter-Planetary Name System (IPNS) is a system for creating and updating mutable links to IPFS content. Since objects in IPFS are content-addressed, their address changes every time their content does. That's useful for a variety of things, but it makes it hard to get the latest version of something."

https://docs.ipfs.io/guides/concepts/ipns/

TLDR IPNS are pointers to IPFS content (ie "latest"). If you're tracking your containers and pinning to their versions elsewhere, might not need IPNS.
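The pointer-versus-content distinction can be sketched in a few lines of Python. This is a toy model, not how IPFS or IPNS are implemented: a plain SHA-256 hex digest stands in for a real CID (which also encodes codec and multihash information), and `ContentStore` and `NameTable` are hypothetical classes. Changing the content changes its address, while the mutable name is what lets you say "latest".

```python
import hashlib


class ContentStore:
    """Immutable, content-addressed store: the key is derived from the bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 hex stands in for a real CID.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


class NameTable:
    """Mutable pointers from a stable name to the current content address."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}

    def publish(self, name: str, address: str) -> None:
        self._records[name] = address

    def resolve(self, name: str) -> str:
        return self._records[name]


store = ContentStore()
names = NameTable()

v1 = store.put(b"app v1")
names.publish("myapp/latest", v1)
v2 = store.put(b"app v2")          # new content, therefore a new address
names.publish("myapp/latest", v2)  # the name now points at v2

assert store.get(names.resolve("myapp/latest")) == b"app v2"
assert store.get(v1) == b"app v1"  # old versions stay reachable by hash
```

If you already track and pin image versions elsewhere, only the `ContentStore` half is needed.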


Thanks, but I already know what IPNS is. My points were that (a) it's not needed for this use-case, and (b) it's a distinct system from IPFS. I think you agreed in your TLDR.


I would think it's because IPNS is IPFS, and you asked how IPNS compares to BitTorrent. Maybe I misunderstand your question, but the reply seems totally on topic and a valid answer to it.


IPNS is a system that runs on top of IPFS. You do not need IPNS to use IPFS.


That's technically correct, but in practice the term "IPFS" is commonly used to refer to both ipfs and its optional feature ipns.


Not in my circles. IPFS is used a lot for storing and archiving files but IPNS is rarely used or mentioned. The only situations I've seen where IPNS would be useful, ENS was used instead as it's more reliable.


> And you can't share pieces across different torrents

So make each layer its own torrent?


That's still not great. Imagine the only difference between two versions of the layer is that you updated a single jar in a 200MB app bundle. The effective difference could be a few tens of blocks, but you still need to redownload the whole thing.

If we can manage the block assignment/padding to line up with IPFS blocks, that could result in massive savings.
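A rough Python sketch of the block-level argument: split two versions of a layer into fixed-size blocks, hash each block, and count how many blocks of the new version are not already available. The 256 KiB block size is, to my knowledge, the go-ipfs default chunker size, but treat it as an assumption here; `block_hashes` and `blocks_to_fetch` are hypothetical helpers. It also shows why alignment/padding matters: an in-place change costs one block, while inserting a single byte shifts every later block boundary.

```python
import hashlib
import os

BLOCK_SIZE = 256 * 1024  # assumption: 256 KiB fixed-size blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 of each fixed-size block of `data` (the last block may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """How many blocks of `new` are not already available from `old`."""
    have = set(block_hashes(old, block_size))
    return sum(1 for h in block_hashes(new, block_size) if h not in have)


old = os.urandom(32 * BLOCK_SIZE)  # 8 MiB stand-in for a layer
start = 10 * BLOCK_SIZE

# Overwrite exactly one block-aligned region in place (e.g. one small jar).
in_place = old[:start] + os.urandom(BLOCK_SIZE) + old[start + BLOCK_SIZE:]
# Insert a single byte instead: every later block boundary shifts.
shifted = old[:start] + b"x" + old[start:]

print(blocks_to_fetch(old, in_place))  # 1
print(blocks_to_fetch(old, shifted))   # 23 (of 33 blocks)
```

Content-defined chunking (IPFS also offers a Rabin chunker) is the usual way to keep block boundaries stable when bytes are inserted rather than overwritten.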


That is a limitation of the Torrent clients.

On top of that, most of them should already be able to recognize files that already exist, but that seems to be a file-level feature rather than a block-level one at this point.

Is there something I am misunderstanding?


Does anyone use IPNS for anything real? It performs terribly whenever I've tried it. It almost turned me off entirely from using IPFS until I realized that it's just an optional extra and the rest of IPFS is still useful without it. I really wonder how many people try out IPFS, run into issues with IPNS, and then write off the whole project because they thought IPNS was a central piece of it. I think the project would do really well to strike all references to IPNS from their getting-started guides, bring up reliable alternatives like DNSLink records (or even ENS), and then maybe bring up IPNS as an optional extra.


That should be irrelevant for container distribution, as container images are immutable.


although there was an IP to bittorrent for that :(


Netflix use IPFS? That's quite an endorsement and makes me take it a lot more seriously.


”Netflix and IPFS began collaborating on ways to incorporate peer-to-peer services into Netflix’s developer tooling”

Developer tooling is probably a pretty limited part of their traffic.


The volume of traffic doesn't really matter that much. Developer productivity is just as important as the service you're actually selling when you're the size of Netflix. If your N thousand engineers are suddenly unable to work, or slowed down by X%, that's a huge problem. Large companies treat (or should treat) developer tooling issues as seriously as application outages.

If Netflix is using IPFS for anything worth mentioning, it's almost certainly substantive enough to be considered an endorsement.


Of course the service itself is more important. If there are outages in the service, you will lose customers. If developers lose time at most your new features risk delays. If developers are less productive over time you lose a bit of money.

I say this as a developer for a FANG company.

It’s still an endorsement, but not nearly as strong as if the broadcasting were somehow relying on IPFS. As it is, this is probably just some engineering manager who made a non-crucial tool and put it on IPFS.


I don't disagree, but what they use it for, at what scale, and at what priority matters a lot.


Read the article? They were experimenting with using IPFS to distribute container images across AWS regions. And they do it at Netflix scale...


Didn't they talk about distributing Docker containers for their developers?

I mean, they have many devs, but that's still a few orders of magnitude below their number of viewers.



One big optimization that could help in some cases for container platforms like Fargate is not downloading the entire image just to run the container. Instead read files (or even just blocks) from network storage on demand.

This is basically how booting from a disk image works on most cloud platforms too.
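A minimal Python sketch of the on-demand idea: expose the image as a read-only blob and fetch fixed-size blocks only when a read touches them, caching what has already arrived. `LazyBlob` is hypothetical, and `fetch_block` is an assumed callback standing in for whatever network storage sits behind it (a registry, IPFS, a block device).

```python
from typing import Callable


class LazyBlob:
    """Read-only view of a remote blob that fetches blocks on demand."""

    def __init__(self, size: int, block_size: int,
                 fetch_block: Callable[[int], bytes]) -> None:
        self.size = size
        self.block_size = block_size
        self._fetch_block = fetch_block  # hypothetical: returns block `index`
        self._cache: dict[int, bytes] = {}

    def _block(self, index: int) -> bytes:
        # Fetch each block at most once; later reads hit the cache.
        if index not in self._cache:
            self._cache[index] = self._fetch_block(index)
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        """Return bytes [offset, offset + length), fetching only the blocks needed."""
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        out = bytearray()
        first = offset // self.block_size
        last = (end - 1) // self.block_size
        for index in range(first, last + 1):
            block = self._block(index)
            block_start = index * self.block_size
            lo = max(offset, block_start) - block_start
            hi = min(end, block_start + len(block)) - block_start
            out += block[lo:hi]
        return bytes(out)


remote = bytes(range(256)) * 64  # 16 KiB stand-in for an image
fetched: list[int] = []


def fetch_block(index: int) -> bytes:
    """Hypothetical remote fetch; records which blocks were requested."""
    fetched.append(index)
    return remote[index * 4096:(index + 1) * 4096]


blob = LazyBlob(len(remote), 4096, fetch_block)
header = blob.read(5000, 100)  # touches only block 1
print(fetched)  # [1]
```

A container that only reads a small part of its image then only pays for those blocks, at the cost of network latency on first access.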



I'm shocked to see IPFS used to make something Faster...


it's remarkable! a bootstrap technique to improve time coherence -- what an innovation!


I think that was a dig at IPFS's issues with real-world usage, where until recently a lot of traffic was wasted on metadata and every node used lots of bandwidth to gossip. Meanwhile, the actual throughput on a non-tuned node was not great at all.


Has this changed now?


I just realised that "recently" happened at the end of 2018 https://blog.ipfs.io/53-go-ipfs-0-4-18/ - oops.

There was also a change later which turned off nodes being the middleman by default, but not sure which version.

It's supposed to be much better these days, but I haven't tried it again in a few months.


Ah, that agrees with my experience, I noticed the IPFS node behaving much better some time in the last year.


It's awesome to see these kind of improvements on IPFS. A while back as a side project I created an IPFS-backed docker registry which allows you to push and pull docker images from IPFS [0]

[0] https://github.com/miguelmota/ipdr


Yay. I last used IPFS for leeching abandonware around November. Although it had a tough time getting started and would occasionally freeze up for several minutes, it worked well when it worked. It seems to be getting better since I first tried it.


I wasn't aware of this use case, can you post a link/CID (if it's legal)?


> The node sends out a want for each CID to several peers in the session in parallel, because not all peers will have all blocks. If the node starts receiving a lot of duplicate blocks, it sends a want for each CID to fewer peers. If the node gets timeouts waiting for blocks, it sends a want for each CID to more peers.

Trying to recall how the protocol works. Doesn’t this pattern of behavior mean that a lot of machines will end up with the beginning of a file and few will have the end? It sounds like the start of a download would be very fast and the end would slow down while it hunts for a source.

That may be why this is only 20% faster than Docker Hub.
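The quoted behavior amounts to a feedback loop on how many peers each want is sent to. Here is a toy Python model of that loop; it is not go-bitswap's code, and the starting value, the cap, and the one-step increase/decrease are made-up assumptions. It also assumes the caller keeps `peers` sorted best-first.

```python
from dataclasses import dataclass


@dataclass
class WantFanout:
    """Hypothetical controller for how many session peers get each want."""
    fanout: int = 3        # assumed starting value
    min_fanout: int = 1
    max_fanout: int = 16   # assumed cap

    def on_block(self, duplicate: bool) -> None:
        # Duplicate blocks mean we asked more peers than needed: narrow the broadcast.
        if duplicate:
            self.fanout = max(self.min_fanout, self.fanout - 1)

    def on_timeout(self) -> None:
        # A timeout means too few asked peers delivered: widen the broadcast.
        self.fanout = min(self.max_fanout, self.fanout + 1)

    def pick_peers(self, peers: list[str]) -> list[str]:
        """Peers that receive the want for the next CID (assumes best-first order)."""
        return peers[:self.fanout]


ctl = WantFanout()
ctl.on_block(duplicate=True)   # fan-out drops to 2
ctl.on_timeout()               # back up to 3
print(ctl.pick_peers(["a", "b", "c", "d"]))  # ['a', 'b', 'c']
```

Note that this loop only controls how many peers are asked; which blocks get requested first (the ordering concern above) is a separate decision.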


I don't see why blocks would be received in order.

If I remember correctly, a node with a full file will send out the blocks in parallel, so leechers should receive blocks in effectively random order.

The only reason some machines would have only the start rather than the end would be implementation-specific: maybe the want list is ordered, and the seeder responds by only shipping the first n blocks it reads from the want list.

If the seeder responds in random order, you'd avoid the problem of all leechers having the same blocks.
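A small Python simulation of the point above, under a deliberately simplified, hypothetical model: every leecher sends the full want list in order, and the seeder can ship only `per_leecher` blocks to each leecher in one round, either in want-list order or shuffled. It counts how many distinct blocks the swarm holds afterwards, i.e. how much the leechers could trade among themselves.

```python
import random


def distinct_blocks_after_round(num_blocks: int, num_leechers: int,
                                per_leecher: int, shuffle: bool,
                                seed: int = 0) -> int:
    """Distinct blocks held across all leechers after one seeder round (toy model)."""
    rng = random.Random(seed)
    held: set[int] = set()
    for _ in range(num_leechers):
        want_list = list(range(num_blocks))  # each leecher wants every block, in order
        if shuffle:
            rng.shuffle(want_list)  # seeder serves this leecher's wants in random order
        held.update(want_list[:per_leecher])
    return len(held)


print(distinct_blocks_after_round(100, 10, 10, shuffle=False))  # 10
print(distinct_blocks_after_round(100, 10, 10, shuffle=True))
```

With in-order service every leecher ends up with the same 10 blocks and has nothing to offer its peers; with shuffled service the swarm collectively holds many more distinct blocks.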


Files are not downloaded sequentially, they're chunked into blocks which are sent in parallel.


Each client downloads in random order, or all clients download in the same order?


"Web 2.0"

Hadn't seen that gem for a while.


3.0, no?


pretty sure we're already at 4.0 with the Internet of Shit (IoT) and Mobile Internet, depending on the one using the buzzword

But that's beside the parent's point: "Web 2.0" hasn't really been mentioned in ages.


"the container runtime can be modified to retrieve layers identified by their CIDs"

How do you do this? Exercise for the reader? :)

For the case of distributing containers in a datacenter with P2P, there's also this work:

https://github.com/uber/kraken
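One hedged answer to the "exercise for the reader": if a local IPFS daemon exposes its HTTP gateway (go-ipfs serves `/ipfs/<cid>` on `127.0.0.1:8080` by default), a pull helper could fetch a layer by CID and still verify it against the `sha256:` digest from the image manifest before handing it to the runtime. This is a sketch, not how Netflix or Kraken do it; `layer_url`, `verify_digest`, and `fetch_layer` are hypothetical names. The layer's CID is generally not the same as its manifest digest, which is why both are passed in.

```python
import hashlib
import urllib.request

DEFAULT_GATEWAY = "http://127.0.0.1:8080"  # assumed local go-ipfs gateway address


def layer_url(cid: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Gateway URL for a layer blob stored in IPFS under `cid`."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def verify_digest(blob: bytes, digest: str) -> bool:
    """Check a blob against a manifest-style digest such as 'sha256:<hex>'."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob).hexdigest() == expected


def fetch_layer(cid: str, digest: str, gateway: str = DEFAULT_GATEWAY,
                timeout: float = 60.0) -> bytes:
    """Fetch a layer through the gateway by CID, then verify the manifest digest."""
    with urllib.request.urlopen(layer_url(cid, gateway), timeout=timeout) as resp:
        blob = resp.read()
    if not verify_digest(blob, digest):
        raise ValueError(f"layer {cid} does not match {digest}")
    return blob
```

Where the digest-to-CID mapping comes from (an annotation, a side table, a modified registry) is left open here.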



I really don't understand why there is such hype around IPFS, which is still in the development stage, when other options are already out and working, like Sia Skynet, which isn't getting even a fraction of the attention IPFS is getting.


Skynet is barely two weeks old fwiw. As more people play around with it and see how strong it is I think it'll get a lot more attention. Lot of crypto projects are already planning to add support in the coming months.


Saving Netflix's bandwidth costs by sacrificing your privacy.

IPFS and bittorrent don't do anything to protect the data you are uploading and your IP address.

Case in point: https://iknowwhatyoudownload.com/en/peer/

Now every website you visit, any ad/tracker, any homecalling phone app can tell what movies and contents you watch and when you are at home. For years.


> Saving Netflix's bandwidth costs by sacrificing your privacy.

> IPFS and bittorrent don't do anything to protect the data you are uploading and your IP address.

And Netflix are using it across AWS for distributing container images, not touching client devices, unless you know something more than what the article says.

This doesn't have anything to do with customers' privacy.


IPFS/libp2p is meant to be modular in this regard. It's certainly possible to use Tor with IPFS to protect your IP address, but this is a work in progress (https://github.com/hashmatter/libp2p-onion-routing). OpenBazaar, which uses IPFS, can run as a hidden service: https://github.com/OpenBazaar/openbazaar-go/blob/master/docs...


Do not visit this site. If you visit it once and then visit it again after a while, they will fill it with crap that you did not download, presumably to blackmail you or something. Alternatively, they might start tracking you only once you visit it. Even if they are honest, it is extremely inaccurate (for example, it had 8.8.8.8 torrenting anime a while ago).


Bogus results can simply be the result of ISP IP address recycling which, in my case, is pretty obvious. Besides, why would they wait on an IP address visit to fill it with blackmail material? The suspicion doesn't make much sense to me.


Well, I am on 4G and they list my IP downloading whole movies and games through torrents. Doesn’t make much sense.


How does Netflix using IPFS between their servers sacrifice my privacy?


They're using it internally, not for streaming to customers.


I thought that IPFS is about high availability and fault tolerance, including some resistance against addressed censorship.

It never looked like an anonymizing tool to me; did anybody advertise it as such?


"resistance against addressed censorship" does not work at all when all your traffic is made public.

People can be prosecuted or otherwise harassed for sharing contents on a P2P system.

> It never looked like an anonymizing tool to me; did anybody advertise it as such?

You are confusing "anonymizing" with "leaking a lot of information to the whole world".

They constantly "forget" to tell people about the huge security impact.


IPFS helps you distribute content which may get taken down. It does not help you evade local police.

For the second scenario, you want another layer that maintains secrecy (like the Tor transport https://ipfs.io/ipfs/QmYKQvBsbYrRhdaGvQXcEoSam7s5gKVYULfRgNP...)


Well to be fair it would be quite imprudent to have a file system where everyone can see what everyone is doing.

In particular it's pointless to be able to circumvent censorship if you can't do so anonymously.


Circumventing censorship without strong anonymity is not necessarily pointless: you can publish something sensitive from a place where you'd not be prosecuted (e.g. from abroad). The point is to bring the message to those who are denied information.


I guess you mean "Making Netflix save bandwidth". You will get the same amount of data to watch the ninja turtles regardless if you use IPFS or not.


Yes. Edited to clarify.



