New improvements to IPFS Bitswap for faster container image distribution

jude- · on Feb 27, 2020

Would be curious to know how using IPFS for internal container distribution compares to using BitTorrent. IIRC BitTorrent has found similar uses in the past.

Also, how well does BitSwap work when the underlying network is congested? Do IPFS nodes do any kind of congestion control?

hiccuphippo · on Feb 27, 2020

You can't update a torrent; if the content changes, you have to create a new one. IPNS helps with that. And you can't share pieces across different torrents: if some peers still have the old torrent, they share it separately from the new one, even if the differences are minimal.

jude- · on Feb 27, 2020

The article doesn't mention IPNS at all, nor does it talk about the need to mutate an image while it is being shared, so I'm not sure why you think IPNS is even desirable in this use case.

toomuchtodo · on Feb 27, 2020

"Inter-Planetary Name System (IPNS) is a system for creating and updating mutable links to IPFS content. Since objects in IPFS are content-addressed, their address changes every time their content does. That's useful for a variety of things, but it makes it hard to get the latest version of something."

https://docs.ipfs.io/guides/concepts/ipns/

TL;DR: IPNS names are pointers to IPFS content (i.e. "latest"). If you're tracking your containers and pinning to their versions elsewhere, you might not need IPNS.
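A minimal sketch of the pointer-vs-content distinction, assuming a bare SHA-256 hex digest as a stand-in for a CID (real CIDs also carry version, codec and multihash prefixes) and a plain dict as a stand-in for IPNS (real IPNS records are signed and published to the network). `ContentStore`, `NameRegistry` and `myapp/latest` are hypothetical names for illustration only.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: an object's key is derived from its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Stand-in for a CID: a bare SHA-256 hex digest of the content.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


class NameRegistry:
    """Toy mutable name -> key mapping, standing in for IPNS records."""

    def __init__(self) -> None:
        self._names: dict[str, str] = {}

    def publish(self, name: str, key: str) -> None:
        # Repointing a name never changes the objects it pointed to.
        self._names[name] = key

    def resolve(self, name: str) -> str:
        return self._names[name]


store = ContentStore()
names = NameRegistry()

v1 = store.put(b"image v1")
names.publish("myapp/latest", v1)

v2 = store.put(b"image v2")
names.publish("myapp/latest", v2)  # "latest" moves; v1 is still retrievable by its key
```

If your deployment tooling already records `v1`/`v2` directly (the "pinning to their versions elsewhere" case), the name layer is never consulted.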

jude- · on Feb 27, 2020

Thanks, but I already know what IPNS is. My points were that (a) it's not needed for this use-case, and (b) it's a distinct system from IPFS. I think you agreed in your TLDR.

asdkhadsj · on Feb 27, 2020

I would think it's because IPNS is IPFS... and you asked how IPFS compares to BitTorrent. Maybe I misunderstand your question, but the reply seems totally on topic and a valid answer to it.

jude- · on Feb 27, 2020

IPNS is a system that runs on top of IPFS. You do not need IPNS to use IPFS.

zapita · on Feb 27, 2020

That's technically correct, but in practice the term "IPFS" is commonly used to refer to both ipfs and its optional feature ipns.

Sargos · on Feb 28, 2020

Not in my circles. IPFS is used a lot for storing and archiving files but IPNS is rarely used or mentioned. The only situations I've seen where IPNS would be useful, ENS was used instead as it's more reliable.

hobofan · on Feb 27, 2020

> And you can't share pieces across different torrents

So make each layer its own torrent?

viraptor · on Feb 27, 2020

That's still not great. Imagine the only difference between two versions of the layer is that you updated a single jar in a 200MB app bundle. The effective difference could be a few tens of blocks, but you still need to redownload the whole thing.

If we could manage the assignment/padding to match IPFS fragments, that could result in massive savings.
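A small sketch of why block-level addressing helps and why alignment matters, assuming fixed-size chunking (go-ipfs's default chunker splits files into fixed 256 KiB blocks; the example uses 1 KiB blocks to stay small). The helper names are hypothetical, and plain SHA-256 hashes stand in for CIDs.

```python
import hashlib

# go-ipfs's default chunker splits files into fixed 256 KiB blocks.
DEFAULT_BLOCK_SIZE = 256 * 1024


def chunk_hashes(data: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the hash of each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(old: bytes, new: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Count blocks of `new` that a node already holding `old` would still need."""
    have = set(chunk_hashes(old, block_size))
    return sum(1 for h in chunk_hashes(new, block_size) if h not in have)


def sample_bytes(n: int) -> bytes:
    """Deterministic, non-repeating filler data built from SHA-256 digests."""
    digests = (hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(n // 32 + 1))
    return b"".join(digests)[:n]


old = sample_bytes(100 * 1024)                        # 100 blocks of 1 KiB
patched = old[:51205] + b"0123456789" + old[51215:]  # same-length edit inside block 50
shifted = b"X" + old                                  # one byte inserted at the front

print(blocks_to_fetch(old, patched, 1024))  # 1
print(blocks_to_fetch(old, shifted, 1024))  # 101
```

An in-place, same-length change costs a single block, but one inserted byte shifts every boundary and invalidates all of them. That is the padding/alignment problem; content-defined chunking (go-ipfs also offers a Rabin-based chunker) is the usual way around it.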

heavenlyblue · on Feb 29, 2020

That is a limitation of the torrent clients.

On top of that, most of them should already be able to recognize that certain files already exist, but that seems to be more of a file-level feature at this point than a block-level one.

Is there something I am misunderstanding?

AgentME · on Feb 28, 2020

Does anyone use IPNS for anything real? It performs terribly whenever I've tried it. It almost turned me off IPFS entirely until I realized that it's just an optional extra and the rest of IPFS is still useful without it. I really wonder how many people try out IPFS, run into issues with IPNS, and then write off the whole project because they thought IPNS was a central piece of it. I think the project would do really well to strike all references to IPNS from their getting-started guides, bring up reliable alternatives like DNSLink records (or even ENS), and then maybe mention IPNS as an optional extra.

compsciphd · on Feb 28, 2020

That should be irrelevant for container distribution, as container images are immutable.

untoreh · on Feb 28, 2020

although there was an IP to bittorrent for that :(

pjc50 · on Feb 27, 2020

Netflix use IPFS? That's quite an endorsement and makes me take it a lot more seriously.

Grustaf · on Feb 27, 2020

"Netflix and IPFS began collaborating on ways to incorporate peer-to-peer services into Netflix’s developer tooling"

Developer tooling is probably a pretty limited part of their traffic.

bastawhiz · on Feb 27, 2020

The volume of traffic doesn't really matter that much. Developer productivity is just as important as the service you're actually selling when you're the size of Netflix. If your N thousand engineers are suddenly unable to work, or slowed down by X%, that's a huge problem. Large companies treat (or should treat) developer tooling issues as seriously as application outages.

If Netflix is using IPFS for anything worth mentioning, it's almost certainly substantive enough to be considered an endorsement.

Grustaf · on Feb 27, 2020

Of course the service itself is more important. If there are outages in the service, you will lose customers. If developers lose time at most your new features risk delays. If developers are less productive over time you lose a bit of money.

I say this as a developer for a FANG company.

It’s still an endorsement, but not nearly as strong as if the broadcasting were somehow relying on IPFS. As it is, this is probably just some engineering manager who made a non-crucial tool and put it on IPFS.

nolok · on Feb 27, 2020

I don't disagree, but what they use it for, at what scale, and at what priority matters a lot.

musingsole · on Feb 27, 2020

Read the article? They were experimenting with using IPFS to distribute container images across AWS regions. And they do it at Netflix scale...

k__ · on Feb 27, 2020

Didn't they talk about distributing Docker containers for their developers?

I mean, they have many devs, but that's still a few orders of magnitude below their number of viewers.

decentralised · on Feb 27, 2020

Have a look: https://awesome.ipfs.io/

etaioinshrdlu · on Feb 27, 2020

One big optimization that could help in some cases for container platforms like Fargate is not downloading the entire image just to run the container. Instead read files (or even just blocks) from network storage on demand.

This is basically how booting from a disk image works on most cloud platforms too.
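A minimal sketch of the on-demand idea: reads are mapped to fixed-size blocks, and only the blocks a read actually touches are fetched and cached. `LazyImage` and `fetch_block` are hypothetical; the callback stands in for whatever network request a real system would make, and this is not how containerd or any cloud platform actually implements it.

```python
from typing import Callable


class LazyImage:
    """Serves byte-range reads of a remote image, fetching blocks only when touched."""

    def __init__(self, size: int, block_size: int, fetch_block: Callable[[int], bytes]) -> None:
        self.size = size
        self.block_size = block_size
        self.fetch_block = fetch_block  # hypothetical network fetch for one block
        self.cache: dict[int, bytes] = {}

    def _block(self, index: int) -> bytes:
        # Fetch each block at most once; later reads hit the local cache.
        if index not in self.cache:
            self.cache[index] = self.fetch_block(index)
        return self.cache[index]

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, self.block_size)
            take = min(self.block_size - start, end - pos)
            out += self._block(index)[start:start + take]
            pos += take
        return bytes(out)


remote = bytes(range(256)) * 40  # 10240-byte stand-in for a remote image
fetched: list[int] = []


def fake_fetch(index: int) -> bytes:
    # Records which blocks were requested, then slices them from `remote`.
    fetched.append(index)
    return remote[index * 1024:(index + 1) * 1024]


image = LazyImage(len(remote), 1024, fake_fetch)
chunk = image.read(2000, 100)  # touches only blocks 1 and 2 out of 10
```

A container that only reads its entrypoint and a few libraries would pull a small fraction of the image this way; the cost is extra latency on the first access to each block.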

rochaporto_ · on Feb 27, 2020

That should be there soon: https://github.com/containerd/containerd/issues/3731

js4ever · on Feb 27, 2020

I'm shocked to see IPFS used to make something Faster...

DaniFong · on Feb 27, 2020

it's remarkable! a bootstrap technique to improve time coherence -- what an innovation!

viraptor · on Feb 27, 2020

I think that was a dig at IPFS's issues with real-world usage, where until recently a lot of traffic was wasted on metadata and every node used lots of bandwidth to gossip. Meanwhile, the actual throughput on a non-tuned node was not great at all.

stavros · on Feb 28, 2020

Has this changed now?

viraptor · on Feb 28, 2020

I just realised that "recently" happened at the end of 2018 https://blog.ipfs.io/53-go-ipfs-0-4-18/ - oops.

There was also a later change which stopped nodes from acting as the middleman by default, but I'm not sure in which version.

It's supposed to be much better these days, but I haven't tried it again in a few months.

stavros · on Feb 28, 2020

Ah, that agrees with my experience, I noticed the IPFS node behaving much better some time in the last year.

miguelmota · on Feb 28, 2020

It's awesome to see these kinds of improvements to IPFS. A while back, as a side project, I created an IPFS-backed Docker registry which allows you to push and pull Docker images from IPFS [0]

[0] https://github.com/miguelmota/ipdr

anonsivalley652 · on Feb 27, 2020

Yay. I last used IPFS for leeching abandonware around November. Although it had a tough time getting started and would occasionally freeze up for several minutes, it worked well when it worked. It seems to be getting better since I first tried it.

stavros · on Feb 28, 2020

I wasn't aware of this use case, can you post a link/CID (if it's legal)?

hinkley · on Feb 27, 2020

> The node sends out a want for each CID to several peers in the session in parallel, because not all peers will have all blocks. If the node starts receiving a lot of duplicate blocks, it sends a want for each CID to fewer peers. If the node gets timeouts waiting for blocks, it sends a want for each CID to more peers.

Trying to recall how the protocol works. Doesn’t this pattern of behavior mean that a lot of machines will end up with the beginning of a file and few will have the end? It sounds like the start of downloading would be very fast and the end would slow down while it hunts for a source

That may be why this is only 20% faster than Docker Hub.
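A sketch of the adaptive behaviour described in the quoted passage: duplicate blocks shrink how many peers each want goes to, timeouts grow it. The class name, starting value, bounds and step sizes (shrink by one, double on timeout) are illustrative assumptions, not Bitswap's real parameters or code.

```python
import random


class WantFanout:
    """Tracks how many session peers each want is sent to (hypothetical helper)."""

    def __init__(self, initial: int = 3, minimum: int = 1, maximum: int = 16) -> None:
        self.fanout = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_duplicate_block(self) -> None:
        # More than one peer answered the same want: we are over-asking.
        self.fanout = max(self.minimum, self.fanout - 1)

    def on_timeout(self) -> None:
        # Nobody answered in time: widen the search quickly.
        self.fanout = min(self.maximum, self.fanout * 2)

    def pick_peers(self, peers: list[str], rng: random.Random) -> list[str]:
        # A random subset per want, so different wants land on different peers.
        return rng.sample(peers, min(self.fanout, len(peers)))


f = WantFanout()
f.on_timeout()          # 3 -> 6
f.on_duplicate_block()  # 6 -> 5
chosen = f.pick_peers(["a", "b", "c", "d", "e", "f", "g"], random.Random(1))
```

Picking a random subset per want, rather than always the same first few peers, is one way to keep requests from piling up on whichever peers happen to hold the beginning of a file.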

setr · on Feb 27, 2020

I don't see why the blocks would be received in order.

If I remember correctly, a node with a full file will send out the blocks in parallel, so leechers should receive blocks effectively in random order

The only reason some machines would have only the start rather than the end would be an implementation detail: maybe the want list is ordered, and the seeder responds by shipping only the first n blocks it reads from the want list.

If the seeder responds in random order, you'd avoid the problem of all leechers having the same blocks.
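A toy simulation of the point above, assuming every leecher sends the same ordered want list to one seeder that ships at most `n` blocks per request. `serve` and `distinct_blocks_held` are hypothetical names, and this models only the ordering question, not Bitswap's actual scheduling.

```python
import random
from typing import Optional


def serve(want_list: list[str], n: int, rng: Optional[random.Random] = None) -> list[str]:
    """Blocks a seeder ships for one want list: first n in order, or n at random."""
    if rng is None:
        return want_list[:n]
    return rng.sample(want_list, min(n, len(want_list)))


def distinct_blocks_held(num_leechers: int, num_blocks: int, n: int,
                         rng: Optional[random.Random] = None) -> int:
    """Number of distinct blocks spread across all leechers after one round."""
    want_list = [f"block-{i}" for i in range(num_blocks)]
    held: set[str] = set()
    for _ in range(num_leechers):
        held.update(serve(want_list, n, rng))
    return len(held)


print(distinct_blocks_held(10, 100, 10))                    # 10: everyone holds the same prefix
print(distinct_blocks_held(10, 100, 10, random.Random(0)))  # far more distinct blocks to trade
```

With in-order serving, ten leechers collectively hold only ten distinct blocks and have nothing to trade with each other; random serving spreads coverage so they can fill in each other's gaps.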

tylersmith · on Feb 27, 2020

Files are not downloaded sequentially, they're chunked into blocks which are sent in parallel.

hinkley · on Feb 27, 2020

Each client downloads in random order, or all clients download in the same order?

tyingq · on Feb 27, 2020

"Web 2.0"

Hadn't seen that gem for a while.

jungong · on Feb 27, 2020

3.0, no?

411111111111111 · on Feb 27, 2020

pretty sure we're already at 4.0 with the Internet of Shit (IoT) and Mobile Internet, depending on the one using the buzzword

But that's beside the parent's point. "Web 2.0" hasn't really been mentioned in ages.

tln · on Feb 27, 2020

"the container runtime can be modified to retrieve layers identified by their CIDs"

How do you do this? Exercise for the reader? :)

For the case of distributing containers in a datacenter with P2P, there's also this work:

https://github.com/uber/kraken

humblebee · on Feb 27, 2020

This might be related: https://github.com/docker/distribution/pull/2906

psKama · on Feb 27, 2020

I really don't understand why there is so much hype around IPFS, which is still in the development stage, when there are other options that are already out and working, like Sia Skynet, which isn't getting even a fraction of the attention IPFS is getting.

Taek · on Feb 27, 2020

Skynet is barely two weeks old, FWIW. As more people play around with it and see how strong it is, I think it'll get a lot more attention. A lot of crypto projects are already planning to add support in the coming months.

marshmellowtest · on Feb 27, 2020

Saving Netflix's bandwidth costs by sacrificing your privacy.

IPFS and bittorrent don't do anything to protect the data you are uploading and your IP address.

Case in point: https://iknowwhatyoudownload.com/en/peer/

Now every website you visit, any ad/tracker, any phone app that calls home can tell what movies and content you watch and when you are at home. For years.

shakna · on Feb 27, 2020

> Saving Netflix's bandwidth costs by sacrificing your privacy.

> IPFS and bittorrent don't do anything to protect the data you are uploading and your IP address.

And Netflix are using it across AWS for distributing container images, not touching client devices, unless you know something more than what the article says.

This doesn't have anything to do with customers' privacy.

Acrobatic_Road · on Feb 27, 2020

IPFS/libp2p is meant to be modular in this regard. It's certainly possible to use Tor with IPFS to protect your IP address, but this is WIP: https://github.com/hashmatter/libp2p-onion-routing

OpenBazaar, which uses IPFS, can run as a hidden service: https://github.com/OpenBazaar/openbazaar-go/blob/master/docs...

dependenttypes · on Feb 27, 2020

Do not visit this site. If you visit it once and then visit it again after a while, they will fill it with crap that you did not download, in order to blackmail you or something, I presume. Alternatively, they might start tracking you only once you visit it. Even if they are honest, it is extremely inaccurate (it had 8.8.8.8 torrenting anime a while ago, for example).

hombre_fatal · on Feb 27, 2020

Bogus results can simply be the result of ISP IP address recycling which, in my case, is pretty obvious. Besides, why would they wait on an IP address visit to fill it with blackmail material? The suspicion doesn't make much sense to me.

heavenlyblue · on Feb 29, 2020

Well, I am on 4G and they list my IP downloading whole movies and games through torrents. Doesn’t make much sense.

detaro · on Feb 27, 2020

How does Netflix using IPFS between their servers sacrifice my privacy?

yjftsjthsd-h · on Feb 27, 2020

They're using it internally, not for streaming to customers.

nine_k · on Feb 27, 2020

I thought that ipfs is about high availability, fault tolerance, including some resistance against addressed censorship.

It never looked like an anonymizing tool to me; did anybody advertise it as such?

marshmellowtest · on Feb 27, 2020

"resistance against addressed censorship" does not work at all when all your traffic is made public.

People can be prosecuted or otherwise harassed for sharing contents on a P2P system.

> It never looked like an anonymizing tool to me; did anybody advertise it as such?

You are confusing "anonymizing" with "leaking a lot of information to the whole world".

They constantly "forget" to tell people about the huge security impact.

viraptor · on Feb 27, 2020

IPFS helps you distribute content which may get taken down. It does not help you evade local police.

For the second scenario, you want another layer which maintains secrecy. (Like the tor transport https://ipfs.io/ipfs/QmYKQvBsbYrRhdaGvQXcEoSam7s5gKVYULfRgNP...)

contravariant · on Feb 27, 2020

Well to be fair it would be quite imprudent to have a file system where everyone can see what everyone is doing.

In particular it's pointless to be able to circumvent censorship if you can't do so anonymously.

nine_k · on Feb 28, 2020

Circumventing censorship without strong anonymity is not necessarily pointless: you can publish something sensitive from a place where you'd not be prosecuted (e.g. from abroad). The point is to bring the message to those who are denied information.

knocte · on Feb 27, 2020

I guess you mean "making Netflix save bandwidth". You will get the same amount of data to watch the Ninja Turtles regardless of whether you use IPFS or not.

marshmellowtest · on Feb 27, 2020

Yes. Edited to clarify.