Was just casually (well ok maybe it’s more compulsive than that) browsing HN and was pleasantly surprised to find tus on the front page. I’m one of the core contributors and happy to answer questions. It’s late here though, so replies may take a few hours while I’m asleep :)
People ask that more often, yes; on the surface they have a lot in common. Both can be used to transmit huge files, both can chunk files up and only transmit the remaining parts, pick up and resume at a later point in time, and (in the case of tus, optionally with the Concat extension) send these chunks simultaneously.
Tus however works as a thin layer on top of HTTP, so it’s easy to drop into existing web sites/load balancers/auth proxies/firewalls. BitTorrent ports are often closed off at airports/hotels/on corporate networks. But websites work. And if you can access a website, you will be able to upload files to it with tus.
Another difference is that tus assumes classic client/server roles. The client uploads to the server. Downloading is done via your regular HTTP stack and is not facilitated by tus. BitTorrent facilitates both uploading and downloading in a single client. It is more peer-to-peer and decentralized in nature, whereas tus clients typically upload to a central point (like: many video producers uploading to Vimeo. Not a very contrived example, as Vimeo adopted tus).
There are more differences (Discoverability, trackers, pull vs push, pulling from many peers at once) but the comment is getting very long so I hope this already helps a bit :)
Yes, that is very helpful. Our S3 storage backend for tusd uses it, and our https://uppy.io file uploader does too, directly from the browser (so you can also choose not to use tus at all with it). S3 resumable uploads do come with a few limitations that make some people still choose tus though:
* chunks need to be >5MB, which can be problematic on flaky/poor connections (rural areas, tunnels, clubs/basements, people on the move switching connections all the time); see the sketch below for how tus lets you pick smaller chunks
* your S3 bucket needs to allow writes by the world, or you need to deploy signature authentication
* there’s an S3 vendor lock-in some might worry about
* it’s not an open protocol, so there’s no chance of advancing it with the community
That said, that still leaves a large audience for direct S3 resumable uploads, and I’m thankful AWS offers it!
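To illustrate the chunk size point above: a minimal sketch using tus-js-client, roughly how I remember its options (the endpoint URL and the 512 KB chunk size are made-up example values, not recommendations):

    import * as tus from "tus-js-client";

    // Grab a file from a plain <input type="file"> element.
    const input = document.querySelector("input[type=file]") as HTMLInputElement;
    const file = input.files![0];

    // Resumable upload with chunks well below S3's 5 MB multipart minimum.
    const upload = new tus.Upload(file, {
      endpoint: "https://tusd.example.com/files/",  // placeholder tusd endpoint
      chunkSize: 512 * 1024,                        // 512 KB per PATCH request
      retryDelays: [0, 1000, 3000, 5000],           // resume after connection drops
      metadata: { filename: file.name },
      onError: (err) => console.error("upload failed", err),
      onSuccess: () => console.log("upload finished at", upload.url),
    });
    upload.start();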
That’s a fair point. And I guess with e.g. Minio you could self-host too.
S3 is great and in fact, at Transloadit, we deploy a content ingestion network (a reverse CDN) of many regional tusd servers, close to our customers’ end users, but they all ultimately save to S3 using multipart uploads. We’re happy S3 customers.
So why the extra layer? Because it lets us offer resumability below 5MB and lower regional latencies, roll our own auth, and switch to a different cloud provider without introducing breaking changes on the customer-facing side (in case the new bucket provider does not offer an S3-compatible interface, or only a slightly incompatible one).
Ultimately you’re still locked in with AWS protocol-wise, and there’s no community platform for advancing it, so addressing any of these issues is going to be hard.
If I read the spec correctly, the PATCH method is actually used more as an APPEND, no?
It would seem logical and practical to allow PATCH to modify any part of a resource that is already present on the server and/or to extend it by appending. This would also make the whole thing useful beyond resuming interrupted uploads, e.g. to allow for rsync-style updating of existing files.
Yes, though APPEND is not an official HTTP method. Allowing parts to be modified at any location makes things a little more complex and comes with some overhead. If you do need to upload multiple chunks simultaneously, you can opt into our Concat extension, which does exactly that. Our latest blog post has some images to illustrate this.
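To make the append semantics concrete, here is a rough sketch of the core flow (HEAD to learn the offset, then PATCH to append from there), written against the tus 1.0 headers with plain fetch; the upload URL is assumed to have been returned by an earlier creation request:

    // Resume an interrupted upload per the tus 1.0 core protocol.
    async function resumeUpload(uploadUrl: string, data: Blob): Promise<void> {
      // 1. Ask the server how many bytes it already has.
      const head = await fetch(uploadUrl, {
        method: "HEAD",
        headers: { "Tus-Resumable": "1.0.0" },
      });
      const offset = Number(head.headers.get("Upload-Offset"));

      // 2. PATCH appends the remaining bytes at exactly that offset. Bytes
      //    before the offset are never rewritten, which is why it behaves
      //    like an APPEND rather than a general-purpose PATCH.
      await fetch(uploadUrl, {
        method: "PATCH",
        headers: {
          "Tus-Resumable": "1.0.0",
          "Upload-Offset": String(offset),
          "Content-Type": "application/offset+octet-stream",
        },
        body: data.slice(offset),
      });
    }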
My point is that you appear to be pushing for adoption of an extension that handles one specific use case for PATCH, when a more general extension is trivially possible with little to no extra effort.
(I hope I understand your proposal correctly; I fear I might not, so please clarify if needed, but) more chunks come at the expense of more requests. After a connection drop, each separate chunk needs to be renegotiated and transmitted. For some use cases that trade-off is well worth it, like when latency is low but TCP settings or QoS policies won’t let you saturate a single connection, so tus does offer sending multiple chunks in parallel, as an opt-in, via the Concat extension.
If your question is why not make Concat the default mode of operation, the additional roundtrips are the reason. For fragile connections these are often very costly, and we want tus to really shine in those situations, by default. If your users are all operating on big tubes, you’ll likely want to deploy Concat, but that’s not an assumption we want to make.
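For the curious, roughly what Concat looks like on the wire: each part is created as a ‘partial’ upload and uploaded independently (in parallel if you like), and a final creation request then stitches them together. The endpoint and helper names below are mine, not from the spec:

    // Create one partial upload; its bytes are sent with normal PATCH requests.
    async function createPartial(endpoint: string, length: number): Promise<string> {
      const res = await fetch(endpoint, {
        method: "POST",
        headers: {
          "Tus-Resumable": "1.0.0",
          "Upload-Concat": "partial",
          "Upload-Length": String(length),
        },
      });
      return res.headers.get("Location")!; // URL of this partial upload
    }

    // Once all partial uploads are complete, ask the server to concatenate
    // them, in order, into the final upload.
    async function concatenate(endpoint: string, partialUrls: string[]): Promise<string> {
      const res = await fetch(endpoint, {
        method: "POST",
        headers: {
          "Tus-Resumable": "1.0.0",
          "Upload-Concat": `final;${partialUrls.join(" ")}`,
        },
      });
      return res.headers.get("Location")!; // URL of the concatenated upload
    }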
The HTML5 File API has been around for a few years now, yet a lot of sites don't support resumable uploads. I know it adds a bunch of complexity server-side, as you have to restitch those pieces together, but it makes for a good user experience.
I hope that with a client like https://uppy.io and a server like tusd, it’s much more manageable these days. Less boilerplate to write and more battle-tested components, for sure.
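A minimal sketch of what that client side can look like with Uppy’s tus plugin (the endpoint is a placeholder, and this skips the CSS and any restrictions you’d normally configure):

    import Uppy from "@uppy/core";
    import Dashboard from "@uppy/dashboard";
    import Tus from "@uppy/tus";

    // Browser uploader with resumable uploads via the tus protocol.
    const uppy = new Uppy()
      .use(Dashboard, { target: "#uploader", inline: true })
      .use(Tus, { endpoint: "https://tusd.example.com/files/" });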
Slightly off-topic: why, after so many years, do Chrome & Firefox have such poor support for resuming interrupted file downloads? In the case of Firefox I am almost sure it was better in the past. I have to use 'wget -c' or https://www.freedownloadmanager.org/ for bigger files.
As I suspect you may already know, this is dependent on the server 1) indicating support for byte range requests and 2) correctly implementing it.
I don't think I have noticed Firefox getting worse at this over time, but I'm not downloading large files every day. Would you be willing to share where you're noticing this?
It depends on the server, which has to implement HTTP range requests [0]. Servers like nginx and Apache 2 should support it. I'm not certain about all the Node.js and Go backends out there. I don't think the support in Firefox has changed.
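For reference, resuming on the client side is just a Range header; a rough sketch with fetch, where the `partial` blob standing in for the bytes already downloaded is an assumption for illustration:

    // Resume a download from where it left off using an HTTP Range request.
    async function resumeDownload(url: string, partial: Blob): Promise<Blob> {
      const res = await fetch(url, {
        headers: { Range: `bytes=${partial.size}-` },
      });
      if (res.status !== 206) {
        // Server ignored the Range header (no byte-range support),
        // so it sent the whole file again; just use that.
        return res.blob();
      }
      // 206 Partial Content: append the missing tail to what we had.
      return new Blob([partial, await res.blob()]);
    }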
Love the work that Janko is doing in our ecosystem! There are implementations for most major languages. So a tus server could even just be some PHP code that you install with Composer and add to your existing Apache setup.
Zawinski's Law needs some revision. Not only do WWW apps expand until users can chat asynchronously, but WWW protocols expand until they incorporate ZMODEM. (-:
We are discussing this very topic here https://github.com/tus/tus-resumable-upload-protocol/issues/... — it has stalled a bit so I would be very happy to see you or other interested/concerned HN readers weigh in. People sharing concerns on GitHub is the main way the protocol has progressed.
> An origin server that allows PUT on a given target resource MUST send
> a 400 (Bad Request) response to a PUT request that contains a
> Content-Range header field (Section 4.2 of [RFC7233]),
Responding with 400 Bad Request is actually something that was added after some servers allowed Content-Range on PUT and others didn't.
It was never standard, but the end-result was that some clients assumed PUT + Content-Range would work, which meant that some servers would apply the change while others would ignore the header and overwrite the entire resource with the chunk.
There's no sane way to add support for this header and make older servers behave correctly, so now we have better facilities for this.
The standard way is to use PATCH + a mimetype that describes the update + perhaps using Accept-Patch to find out what formats are available. It's extremely doubtful that Content-Range for PUT will ever be standard. If there's going to be a future standard, it's likely PATCH based.
It could be possible with PUT and a new 'Expect' header, but not sure if that gives any advantages now over PATCH.
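To illustrate what a PATCH-based flow could look like: Accept-Patch (RFC 5789) is real, but the application/example-byte-range media type below is made up for the sake of the sketch; nothing like it is standardized:

    // Discover whether the server accepts a (hypothetical) byte-range patch
    // format, then send one chunk at a given offset.
    async function patchRange(url: string, offset: number, total: number, chunk: Blob) {
      const opts = await fetch(url, { method: "OPTIONS" });
      const formats = opts.headers.get("Accept-Patch") ?? "";
      if (!formats.includes("application/example-byte-range")) {
        throw new Error("server does not advertise a byte-range patch format");
      }
      await fetch(url, {
        method: "PATCH",
        headers: {
          "Content-Type": "application/example-byte-range",
          // Hypothetical: where in the resource the body should be written.
          "Content-Range": `bytes ${offset}-${offset + chunk.size - 1}/${total}`,
        },
        body: chunk,
      });
    }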
> There's no sane way to add support for this header and make older servers behave correctly, so now we have better facilities for this
KISS. Endow the "400 Bad Request" server response with a special header that acts like a cookie or nonce, with the semantics "this server does support Content-Range uploads and won't corrupt your resource". If the client resends the PUT + Content-Range request with the correct cookie/nonce added to it, it has acknowledged this semantics in turn, and the upload can now go through. This adds a roundtrip, but it's still trivial compared to what's being proposed here, and keeps the semantics of PATCH open for more complicated cases.
Or do a HEAD on the resource you want to resume uploading (this is recommended anyway to find out how many bytes have actually gone through), and if the response contains an "Accept-Ranges: bytes" header then the client can resume the upload.
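A sketch of the handshake this describes, with fetch; the X-Resume-Nonce header name is invented purely for illustration and nothing like it exists today:

    // Proposed two-step PUT: the first attempt learns whether the server
    // really understands Content-Range, the second acknowledges it.
    async function putWithResume(url: string, offset: number, total: number, chunk: Blob) {
      const headers: Record<string, string> = {
        "Content-Range": `bytes ${offset}-${offset + chunk.size - 1}/${total}`,
      };
      let res = await fetch(url, { method: "PUT", headers, body: chunk });
      const nonce = res.headers.get("X-Resume-Nonce");
      if (res.status === 400 && nonce) {
        // Server signalled "I support Content-Range on PUT and won't corrupt
        // your resource"; resend with the nonce to acknowledge in turn.
        res = await fetch(url, {
          method: "PUT",
          headers: { ...headers, "X-Resume-Nonce": nonce },
          body: chunk,
        });
      }
      return res;
    }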
You are right about those things, and some of the proposed solutions address them (that one and others).
It looks to me like PATCH is actually better; perhaps one of the patch formats could be a partial patch, for example if the Content-Type of the PATCH request is application/partial-content-patch then the first line of the body is the contents of the Content-Range header. In my opinion, this looks better than the other replies to the message that this message is in reply to (although I admit anything I write may be mistaken; I am not perfect).
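A tiny sketch of how a client could build that body; application/partial-content-patch is the hypothetical media type from the comment above, not anything registered:

    // First line of the body carries what would otherwise be the
    // Content-Range header, followed by the raw bytes for that range.
    function buildPartialContentPatch(offset: number, chunk: Uint8Array, total: number): Blob {
      const rangeLine = `bytes ${offset}-${offset + chunk.length - 1}/${total}\n`;
      return new Blob([rangeLine, chunk]);
    }

    // The request itself would then be:
    //   PATCH /the/resource
    //   Content-Type: application/partial-content-patch
    //   <body from buildPartialContentPatch>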
Yes, for browsers it’s cheaper to build upon HTTP, and it lets you move through airport/hotel/corporate firewalls without problems.
Tus is also used in datacenters for high-throughput & reliable transmissions. Probably in most cases rsync is a sensible choice, but sometimes maybe you already have tus, HTTP-based auth, load balancing, etc. in place that you want to leverage, or maybe you want to avoid exchanging SSH secrets.