Protobuf-ES: Protocol Buffers TypeScript/JavaScript runtime (buf.build)
205 points by jacobwg on Oct 31, 2022 | 183 comments



I was curious how this performed server-side versus protobufjs (what we're currently using). I hastily wired it up to protobufjs's benchmark suite (https://github.com/protobufjs/protobuf.js/tree/master/bench). The suite is pretty ancient, so getting buf-compiled ESM added was a challenge.

Granted, the benchmark is created for protobufjs and they probably optimize against it. Protobuf-ES was about 5.1x slower than protobufjs for encoding and 14.8x slower than protobufjs for decoding.

This was run on my M1 with node 16.14; not particularly scientific, etc, etc.


Any benchmarks against other PB implementations? Would love to see numbers against the JVM, and then against C++ or Rust.

My gut feeling anyway is that PB in JS is likely a failure against just using JSON (which the JS runtime implements in efficient native code).


No, but anecdotally the others are probably a lot faster. And you can do even better than that with zero-copy serializations.

But protobuf has very wide support and is decent enough in js, at least server-to-server. Having a schema is very valuable, and there are substantial size wins over JSON, even with gzip.


Protobuf is really easy to write a working but slow version for.


I gave up on protobufs years ago. The protobuf team has no idea how to write PHP and JS libraries. I got segfaults from using the PHP extension. The built-in toJSON would return invalid JSON (missing braces for binary types). Ridiculous stuff.

I really just prefer to use JSON for everything. It's much easier to debug and observe traffic (browser Network tab). I like JSON-RPC, very simple spec (basically one page long). I don't like REST.

All that said, I'm really glad to see the community take things into their own hands.


> It's much easier to debug and observe traffic (browser Network tab).

The DX for JSON things is much better. The UX for protobufs is much better (faster, less data over the wire, etc). Which you optimize for is up to you, but there isn't a straightforward "Use this tech because it's the best one."


> faster, less data over the wire, etc.

I've always wondered about this. Firstly, I'm fairly sure clientside JSON parsing is significantly faster than protobuf decoding, but even for data over the wire: JSON is pretty compressible, so surely the gains here are going to be marginal. Surely never enough benefit to UX to warrant the DX trade-off, right?


Protobuf parsing is far faster - it's a binary protocol. The underlying code is highly optimized and has to handle about 1/10th the total bytes. In computing, reducing memory access is often the best way to optimize.

PB can always be decoded to a text representation if you need to inspect it.


JS __is__ dumb at handling binary. The overhead is significant. The first thing to do when optimizing a Node.js program is always to replace loops that iterate through individual bytes of a binary buffer with some native (WASM?) equivalent. JSON, on the other hand, isn't affected by this overhead, because JSON.parse is a native method on every platform.

I was once mixing two buffers containing PCM. A simple task: take two numbers, average them, and put the result into another buffer. The native implementation was about 10x faster than the one I wrote in JS (or consumed 10x less CPU time).
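A minimal sketch of the kind of loop described above, assuming 16-bit signed PCM samples; the function name and types are illustrative, not from any particular library.

    function mixPcm(a: Int16Array, b: Int16Array): Int16Array {
      // Average each pair of samples into a third buffer.
      const out = new Int16Array(Math.min(a.length, b.length));
      for (let i = 0; i < out.length; i++) {
        out[i] = ((a[i] + b[i]) / 2) | 0; // truncate toward zero
      }
      return out;
    }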

A native Protobuf is definitely going to beat a native JSON implementation. A JS Protobuf is also likely to beat a JS JSON implementation.

But a JS Protobuf to native JSON? I doubt.


Do you have any links showing protobuf is faster?

There's nothing in your comment that hasn't already been said by sibling commenters, but as far as I've seen in the real world, JSON appears to be faster in practice. Which is all that counts.

You and the many other commenters making the same assumption (it's binary, ergo it must be fast) make a really good case for PB's adoption being rooted in theoretical assumptions rather than real-world benefit.

I get it. It makes sense that it should be faster. Nothing is self evident though. You gotta measure it.
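A minimal sketch of that kind of measurement, purely illustrative: the payload shape, iteration count, and the hypothetical `yourDecoder` are all made up; only the JSON.parse side is concrete.

    // Build an arbitrary JSON payload and time how long it takes to parse it.
    const payload = JSON.stringify({
      items: Array.from({ length: 10_000 }, (_, i) => ({ id: i, name: `item-${i}` })),
    });

    const runs = 100;
    const t0 = performance.now();
    for (let i = 0; i < runs; i++) JSON.parse(payload);
    const t1 = performance.now();
    console.log(`JSON.parse: ${((t1 - t0) / runs).toFixed(3)} ms per parse`);

    // Repeat the same loop with the protobuf decoder you want to compare,
    // e.g. yourDecoder.fromBinary(bytes), on an equivalent payload.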


I don't really get your attitude here. In particular, I'm not disagreeing that JSON parsers in the browser could be currently faster than protocol buffers.

I'm saying that computer science and hardware dictate that protocol buffers are faster for a wide range of reasons. That part's not in question: smaller data encodings have better cache use, and require far fewer dictionary (hash table) lookups at parse time, as well as far less work parsing strings. If you want to argue against my point there I don't know what to say.

If it was a priority to write a blindingly fast protocol buffer parser in JS, it's almost certain that an expert could write a faster one than a similar JSON parser.


> I don't really get your attitude here

My attitude here is that I made a specific observation: JSON is likely to be faster or at least negligibly slower in browsers in practice.

Everyone is replying either with theoretical speed comparisons or server-to-server non-JavaScript benchmarks, which don't seem relevant to the very specific observation I made up top...


If you had bothered to benchmark it, you'd have realized that lots of protobuf libraries are actually surprisingly slow.


> In our tests, it was demonstrated that this protocol performed up to 6 times faster than JSON.

https://auth0.com/blog/beating-json-performance-with-protobu...


The 6 times faster benchmark from that article is describing a Java server and Java client.

This thread is about protobuf vs JSON in a JavaScript environment.

The article you linked _does_ talk about JavaScript environments, too, but the numbers are much less impressive.


Json parsing is orders of magnitude slower than protobuf decoding.


I did some brief googling after reading your comment and I did find one article showing clientside protobuf being faster than JSON[0]. However they didn't isolate parsing - the only thing they measure is total request time to a Java spring application, so the JSON slowdown will include the Java JSON serialisation overhead as well as the request size/network overhead. My instinct is that these two will heavily favour protobuf making the JSON parse still likely to be faster.

It also shows a difference of 388ms (protobuf) vs 396ms (JSON) which is pretty negligible. Certainly not orders of magnitude.

Do you have other sources?

[0] https://auth0.com/blog/beating-json-performance-with-protobu...


Oh come on... how can one assume a binary, roughly TLV-encoded format is not faster than parsing strings (JSON is generally schemaless, by the way, and the dynamicity adds on top of that, while yes, protobuf also has variable-sized containers). It is like claiming that parsing a string to an int has no overhead over a straight int (yes, I know protobufs still require varint decoding, still a huge difference).
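A minimal sketch of the base-128 varint decoding protobuf uses for ints: each byte carries 7 payload bits, and a set high bit means "more bytes follow". This toy version only handles values that fit in 31 bits; real decoders deal with full 64-bit values (e.g. via BigInt or split words).

    function decodeVarint(buf: Uint8Array, pos: number): { value: number; next: number } {
      let value = 0;
      let shift = 0;
      for (;;) {
        const byte = buf[pos++];
        value |= (byte & 0x7f) << shift;  // accumulate the low 7 bits
        if ((byte & 0x80) === 0) break;   // high bit clear: last byte
        shift += 7;
      }
      return { value, next: pos };
    }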

It is also not only the speed: the size is usually an order of magnitude off as well (and no, compression doesn't cut it, and it trades size for computation again).

Sure, if size and speed do not matter, it is strange that you considered protobuf at all... but claiming they are never needed just means you have never worked on resource-constrained systems?

What you cite there, I assume most of that 400ms has nothing to do with the message encoding at all btw..


(a) You're making assumptions based on rules of thumb; I'm talking about real-world usage: your points make sense in theory but don't necessarily reflect reality

(b) I'm talking about a narrow & specific case. PB may outperform JSON in most cases but I'm very specifically referring to browsers where JSON is native (& highly optimised) whereas PB is provided by a selection of open source libraries written in javascript. So that domain heavily favours JSON perf-wise.


> You're making assumptions based

No, not at all... coming from embedded, where speed, memory size and also bandwidth did count, JSON was actually not just worse, it just wouldn't have been feasible (because our protobufs already barely fit memory and MTU constraints).


One important thing to consider with JSON is that a lot of people really, really care about JSON performance -- optimising parsing in assembler, and rewriting internal datastructures just to make serialising + deserialising JSON faster.

I'm sure given two implementations of equal quality protobuf would easily outperform JSON, but I can also believe the JSON implementation in (for example) v8 is very, very hard to beat.



I just benchmarked it on my computer -- the protobuf is twice as fast (well, 1.8x), which is good, but I don't think I'd use that as a basis for choosing the technology I use.

Of course, I might use protobuf because I prefer it in my code to JSON, and it certainly is faster (if only twice).


Have you stepped through protobuf processing code? There's a lot of special cases, ifs, branches here and there. Protobufs within protobufs. It's not like it's a size, then 100 floats packed together; there's more overhead than you'd think. (Not to mention the client-side allocations, etc.) I use protoc compiled to WASM for protobufs and it is fast, but there's a lot of WASM overhead to execute that code.

JSON parsing is also a lot of special cases and error testing, but the v8 team has spent a huge amount of time optimising JSON parsing (there are a few blog posts on it). I'm not assuming either way, but it's definitely not as cut and dried as one would assume.


Stepped through? Yes... as I hinted, coming from an embedded environment, I measured and compared highly optimized JSON parsing code (which even had many limitations, like very limited nesting and no lists) against nanopb => a clear winner on all points (memory requirements, performance, encoded size) - which is really not that surprising?


There are two ways to encode a repeated field (100 floats, but it could also be any size up to the limits of repeated fields): "Ordinary (not packed) repeated fields emit one record for every element of the field." That means type, value, type, value, etc.

However, "packed" fields are exactly a length followed by a byte array of the typed data. Not packing by default was an oversight in the original proto2 which is unlikely to be corrected, but packed is the default in proto3.


100 (or any N) floats prefixed by a size is exactly what you would get from `repeated float f = 1 [packed=true];`
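A minimal sketch of why the packed form decodes so cheaply, assuming the length prefix has already been consumed and `payload` holds just the packed float bytes; the function name is illustrative.

    function readPackedFloats(payload: Uint8Array): number[] {
      const view = new DataView(payload.buffer, payload.byteOffset, payload.byteLength);
      const out: number[] = [];
      // Protobuf encodes 32-bit floats as little-endian IEEE-754, back to back.
      for (let offset = 0; offset + 4 <= payload.byteLength; offset += 4) {
        out.push(view.getFloat32(offset, true /* littleEndian */));
      }
      return out;
    }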


They didn't assume, you did. They showed some real data and you reacted emotionally.


If there's a JSON parser faster than a PB parser (for the same underlying data content) it just means the JSON parser was optimized more. By every rule in computing, PB parsing is far faster than JSON for every use case for a simple reason: the messages use less RAM, and therefore, moving the data into the processor and decoding it takes less time.


Theoretical performance doesn't matter in UX, only real world. Yes, conceptually it's possible to make protobufs faster than JSON, but someone still has to build that. Fast native JSON parsers already exist; that's the benchmark protobufs have to beat significantly to make the worse DX worth it.


I believe the answer is „it depends”: https://medium.com/aspecto/protobuf-js-vs-json-stringify-per....


yes, sure it depends on the implementation, as the poster above said. You need to compare similarly optimized implementations.. but really: no surprise?!?


How can JavaScript code (PB decoder) be faster than native code (JSON parser)?


Much, much less processing to do. Most of pb decoding is just reading bytes until you fill your data structure.


It's protocols 101: PB is a binary protocol with a known schema, so of course it has to be faster than JSON for encoding/decoding. Now it doesn't mean it's going to be faster all the time, it depends on the maturity of the library / language, but on paper yes it is faster.


> it doesn't mean it's going to be faster all the time, it depends on the maturity of the library / language

I feel like I'm having to repeat myself a lot here as no one seems to have read the original comment correctly: we're talking about one specific language in one specific known environment here. No one is claiming that JSON outperforms PB in general: only that it does in browsers, where it's actually relevant for UX.


It’s relevant for UX throughout the entire stack.

Where I’m working now, we have a REST API for users to interact with and every call behind the scenes is proto. As we deal with quite large objects, the benefits of avoiding repeated serialization and deserialization add up quickly.

From the user’s perspective we have a performant app, and much of this is possible due to proto.


Thank you. Finally someone answered my original question.

So it sounds like the trade-off can be worthwhile in some cases: particularly for large objects where serialisation is a significant serverside bottleneck.

I'm curious: you say PB helps avoid "repeated serialisation/deserialisation": how? In my mind, when architecting an app that uses JSON/PB on the wire, serialisation happens once on output & deserialisation happens once on input. For both transfer formats. Surely you wouldn't be passing massive JSON strings around your app in memory?

Also curious which is the bigger bottleneck for your large objects: input or output. How large is large?



First two links are Go, so not relevant to client-side.

Third link is also server-side, but since it's NodeJS it's at least close enough / more relevant to client-side perf.

Here's the benchmark from the third link:

    benchmark        time (avg)             (min … max)
    ---------------------------------------------------
    encode-JSON  342.37 µs/iter   (311.93 µs … 1.19 ms)
    decode-JSON   435.9 µs/iter   (384.44 µs … 1.41 ms)
    encode-PB    946.43 µs/iter   (777.38 µs … 3.13 ms)
    decode-PB    770.79 µs/iter   (688.99 µs … 1.78 ms)
    encode-PBJS  696.75 µs/iter   (618.43 µs … 2.43 ms)
    decode-PBJS  455.36 µs/iter   (413.66 µs … 1.09 ms)

showing JSON to be significantly faster


ahh yea, i'm not sure why the rest of my comment didn't upload. i was going to say that i thought the common use case for protobufs was to more ergonomically communicate between microservices?

in any case, that's the only time i've ever seen it used in production. the first link is a go benchmark that i felt represented why someone would use it for those purposes, the second was linked to show that despite numerous (successful!) attempts to make deserializing/serializing data faster and smaller, JSON is still the most heavily used and i would wager it's mostly due to how easy it is to use as far as browsers are concerned. the third was a link to justify that claim and show that js-land is much, much different than go-land as far as proto's and JSON encoding/decoding are concerned!


Java to Java, uncompressed, is 6x faster per that article.

So yeah not a whole order of magnitude. I was using my experience as a guide where JSON parsing is a huge compute hog and Protobuf is not.

I've never experimented w/ Javascript or compression or any of the other things in that article, I guess YMMV.


I specifically referred to clientside in my original comment, so not talking about java to java.

Clientside is always going to be the pertinent metric for UX since it's processed on the user's device.


at least in the frontend (without WASM), it depends.

a few months ago i tested https://github.com/mapbox/pbf and while it was faster for deep/complex structs vs an unoptimized/repetitive JSON blob, it was much slower at shallow structs and flat arrays of stuff. if you spend a bit of time to encode stuff as flat arrays to avoid mem alloc, JSON parsing wins by a lot since it goes through highly optimized C or assembly, while decoding protobuf in the JS JIT does not.

of course it's not always feasible to make optimized over-the-wire JSON structs if you have a huge/complex API that can return many shapes of complex structs.


At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON. Also, one of the primary advantages of pbf is being able to decode partially and lazily (and adjust how things are decoded at low level), which is very important in use cases like vector maps.


> At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON.

we were streaming a few hundred float datapoints spread across a dozen(ish) flat arrays over websocket at 20-40hz and needed to decode the payload eagerly. plain JSON was a multi-factor speedup over pbf for this case. but it's fully possible i was holding it wrong, too!

even when your "bottleneck" is rendering/rasterization (10ms), but your data pipe takes 3ms instead of 1ms, it's a big effect on framerate, battery, thermals, etc.

i'm a big fan of your work! while i have you here, would you mind reviewing this sometime soon? ;)

https://github.com/mourner/flatbush/pull/44


protobufs have the great property of having a schema (and then generating code), which means it's pretty easy to set up a system where an accidental change of API fails CI tests for mobile apps and web.

This is doable with JSON, but I've never seen a JSON based setup actually work well at catching these kind of regressions.


OpenAPI?


Assuming your developer time is constrained, improved DX often also leads to better UX (more features). So even if you are optimizing for UX you may well be better off with JSON.


> also leads to better UX (more features)

More features is not a measure of better UX. In many cases (most cases!?) it's the opposite.


Sorry; I meant more polished features as much as more by count.


I don't develop in JS so can't comment on DX there, but I've found the DX to be pretty good when using protobuf in other languages.

That's mostly been down to having IDE autocompletion for data structures and fields once the protobuf code's been generated.

For many JSON APIs I've worked with there's only been human readable documentation, making them more error prone to work with (e.g. having to either craft JSON manually for requests, or writing a client library if one doesn't already exist).


There's also msgpack. Best of both worlds.


So does that make GraphQL the best then? JSON + faster/less data over the wire.


Not when you count the DX of the backend developers. Good luck making a performant GraphQL backend that doesn't suffer the N+1 problem, and have fun whitelisting the GraphQL queries produced by your frontend, because attackers will be supplying their own queries with no regards to performance.


Best experience I had with GraphQL was a B2B app where we had a fair amount of users, as well as the "backoffice" app also powered by GraphQL. Bad users we could just ban (the user base were great folks but could barely operate a computer, so it was fine).

Backend was with Absinthe+Elixir, so it was great (if I had to do it again today I would instead use Liveview, this was in 2017 where I had to retrofit a React app into something useable).

Public user facing is a different story, the last major one I saw was Tableau, though they are also business facing where they can just ban bad users. Github also has deprecated their GraphQL endpoints[0].

[0] https://github.blog/changelog/2022-08-18-deprecation-notice-...


Re: GitHub, that deprecation notice appears to be for GitHub Packages specifically. I don't see a deprecation notice on the general API: https://docs.github.com/en/graphql


> Bad users we could just ban

To be fair, it sounds like that would just make the DX wonderful no matter which stack you were using?


GraphQL has a DataLoader (to avoid N+1) and query complexity utilities to avoid those issues.


I know. Good luck implementing it performantly while also considering filtering, pagination, etc. It's doable of course, just not nearly as easy as people like to make it sound.


GraphQL isn’t magically faster. The equivalent endpoint in rest will be faster as you won’t need to translate the query to your backend persistence. GraphQLs benefits are not execution speed.


> JSON + faster

Only if you have a very competent backend team, who, apart from dataloader, will have to figure out caching.

> /less data

Graphql responses tend to be pretty deeply nested.


Apollo's Federation makes caching much easier to reason about as you can now selectively cache sub-query pieces at the service level for that specific responsible subgraph.


I think protobuf really works well on the backend and specifically with compiled languages like Go or C++, as seen by the usage at Google and the adoption of gRPC for Go-based cloud tooling. Beyond that it's a huge failure. The generated code and usage for other languages is not idiomatic. In fact it's a hindrance, and you can see that by the lack of adoption except by the largest orgs, who are enforcing it using some sort of grpc-web bridge with types for the frontend. Ultimately you can just convert proto to OpenAPI specs and do a much better job at custom client libs with that.

I'm not a frontend dev. Most of my time was spent on the backend but what I'll say is I much prefer the fluidity and dynamic nature of JavaScript and the built in ability to deal with JSON that naturally become objects. All the type stuff is easy to do but with docs you can get away with not needing it.

My feeling: Protobuf lives on for gRPC server-side stuff, but for everywhere else OpenAPI is winning.


It's worth checking out our take on a lot of these problems: https://buf.build/blog/connect-web-protobuf-grpc-in-the-brow...


Yea I'm aware of that. I wish you guys the best of luck. I tried a lot of this with Micro. I think it's the right direction especially if you can simplify the tooling. The hard part is just the adoption curve but I think you have a lot of funding to find your way through that.


JSON parsing is a minefield, especially in cross-platform scenarios (language and/or library). You won't encounter those problems on toy projects or simple CRUD applications. For example, as soon as you deal with (u)int64 values greater than 2^53, a simple round-trip through JavaScript can wreak silent havoc.
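A minimal illustration of that silent havoc; the id here is just 2^53 + 1.

    const payload = '{"id": 9007199254740993}';  // 2^53 + 1, a perfectly valid uint64
    const parsed = JSON.parse(payload);
    console.log(parsed.id);                      // 9007199254740992 - silently off by one
    console.log(JSON.stringify(parsed));         // '{"id":9007199254740992}'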

See http://seriot.ch/projects/parsing_json.html

Protobuf support for google's first-class citizen languages is usually very good, i.e. C++, Java, Python and Go. For other languages, it depends on each implementation.


Though you're not wrong, in what common cases are integers larger than 2^53 required?


Timestamps in nanoseconds is one.


That's not common, JS's built in Date doesn't even support nanoseconds.


I guess it depends in which domain you work? In system programming, "clock_gettime" gives you nanoseconds. If you work with GPS timestamps, you have nanoseconds.

Could it be that JS's Date doesn't support nanoseconds because it cannot represent them, which is the issue we are talking about here?

Don't get me wrong, I understand this is not something that everyone uses every day, but to me it's a pretty straightforward example that can happen in a wide range of situations. It certainly happened to me/colleagues several times in several companies.


Nice article


As always, each protocol/data format has its place. You need to maximize the amount of data you send in each packet? Then protobuf is better than JSON. Need to support a large number of clients without any fuss? Then JSON is better. Wanna pass around data you don't know the schema of? JSON again.

Contexts matters, there is no silver bullets, everything has trade offs and so on, and so on.


JSON messages in a compressed websocket stream are surprisingly tiny. Bigger than compressed protobuf packets but not by much, and much smaller than uncompressed protobuf packets.


Yeah, which is probably fine in most cases but sometimes not. Maybe the overhead is just 1.5x, but if you're doing thousands of messages per second (not the usual API<>browser communication for web users), then it matters. Again, it's trade-offs and highly contextual.


Honestly, gzipped json is likely much smaller than uncompressed protobuf.

If you were going to use a binary protocol, why choose one that has no partial parsing/toc these days. There are much better alternatives IMO (flatbuffers being one of them)


> Honestly, gzipped json is likely much smaller than uncompressed protobuf.

Likely not. See here for a comparison: https://nilsmagnus.github.io/post/proto-json-sizes/

Btw, binary formats can also be compressed, though it typically won't yield the same compression ratio as similar JSON would, since there will be less repetition in the binary format.


Or, we could have done a comparison with large strings and seen the opposite result. Silly benchmark is silly (or should I say, specific).


> Wanna pass around data you don't know the schema of? JSON again.

This is a false flag. If you don't know the schema on the receiving (or sending, for that matter) side, then you can't do anything with the data, other than pass it on. If you _do_ know what it looks like, then it has an implicit schema whether you call it a schema or not.


At the time, we needed interop with C. So that's why we chose protobufs. But it was a nightmare to work with in other languages. Including C++ for cross platform desktop apps where cross compiling became a problem too.

JSON in C is unfortunately way harder than in other modern languages (e.g. Go which makes it a breeze with struct tags and a great stdlib).


Surely the technical requirements of my specific use case are applicable to any use case.


The problem I see with JSON is its limited set of “native” types. I really wish it had specified support for proper numeric types (int, uint, various widths) and not just doubles. A timestamp type would be great as well.

What I really like about Protocol Buffers is that you must write a schema to get started. No more JSON.stringify anything. Everything else sucks though.


I think we could remove about a quarter of all Javascript programming time if JSON had a native Date type.
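A small illustration of the point above: Dates survive JSON.stringify only as ISO strings and have to be revived by hand after parsing.

    const out = JSON.stringify({ createdAt: new Date(0) }); // '{"createdAt":"1970-01-01T00:00:00.000Z"}'
    const back = JSON.parse(out);
    console.log(typeof back.createdAt);                     // "string", not Date
    const revived = new Date(back.createdAt);               // manual revival needed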


Hi there, I am the primary maintainer of the PHP library as of the last few years. I have heard that there used to be a lot of crashes; the code was almost completely rewritten in 2020 and is in a much better state now. If you find a segfault and you have a repro, file a bug and we will fix it.


I recommend Capnproto. Parsing time is zero, you can pretend you're a Microsoft programmer in the early 90s and just use the in-RAM struct as your wire format. Maybe it doesn't make sense for in-browser JS applications (though WASM is a different story) but for IPC and RPC in the general case, all parsing and unparsing does is generate waste heat.

ALWAYS favor a binary format unless you have a really good reason otherwise.


Capnproto is designed by Kenton, a former Google engineer who did a lot of work with protobufs at Google. I see Capnproto as the spiritual successor of protobuf, fixing many issues in protobufs.

Also, Capnproto is quite extensively used in some Cloudflare products.


I like protobufs but I was also disappointed at the JS protobuf options. I disliked both the JS object representation and RPC transport.

grpc-web in particular requires an Envoy proxy which seems absurdly heavyweight. I ended up using Twirp because Buf connect wasn't yet released or planned.

I rolled my own JS representation. The major differences from Connect:

- Avoid undefined if the message is not present on the wire and use an empty instance of the object instead. For recursive types, find the minimal set of fields to initialize as undefined instead of empty.

- Transparently promote some protobuf types, like google.protobuf.Timestamp to a proper Instant type (from js-joda or similar library). This makes a surprisingly large difference on reducing the number of jumps from the UI to the API.
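A minimal sketch of that kind of promotion, assuming a hypothetical decoded Timestamp shape of { seconds, nanos } and converting to a plain Date rather than js-joda's Instant; none of these names come from a specific library.

    interface Timestamp {
      seconds: number; // seconds since the Unix epoch
      nanos: number;   // sub-second nanoseconds
    }

    function toDate(ts: Timestamp): Date {
      // Date only has millisecond resolution, so nanos are rounded down.
      return new Date(ts.seconds * 1000 + Math.floor(ts.nanos / 1e6));
    }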


What about tRPC?


I would use tRPC if I used TypeScript in the backend. But I use PHP, so it's not viable.


your problem is that you're using PHP


Bad take. Modern PHP is great.


lmao


Why should ordinary developers use protobuf instead of JSON? You are just making your life harder.

If using compression the size is in the same ballpark (protobuf can be between 20% and 50% smaller). For 99% of users it should not make a difference. https://nilsmagnus.github.io/post/proto-json-sizes/#gzipped-...


JSON/REST does not declare its schema; it's like the difference between statically and dynamically typed languages.


A subjective opinion, but it's much easier to read some documentation and maybe check an OpenAPI spec than to have to deal with protobuf.

You also have solutions like GraphQL that define a schema, or you can publish some kind of schema (a good thing to do) but use JSON instead of a binary format.


I prefer reading proto files with services over OpenAPI yaml. Here's the pet store example to compare.

- https://github.com/project-flogo/grpc/blob/master/proto/grpc...

- https://github.com/OAI/OpenAPI-Specification/blob/main/examp...


Type checking and schema are two different things.

Both your application and documentation will assume certain state to be true, but only with type checking you can actually verify if it's correct.


Protobuf also does not declare its schema. Message parsers can be generated from a schema, but that's also true for REST over JSON. Even ad hoc REST APIs often have better self-declaration of resource types than protobuf.

(I still like protobuf, but the schemas are a terrible reason to like it.)


It has everything to do with automatic validation on both sides and little to do with the transfer size.


But you can do automatic validation fairly easily with JSON Schema. You don't need to choose a binary format to get validation.

The principal benefit is that you can use the schema to define the data format, which means you can pack the data in more tightly (you don't need a byte to say "this is an object" if you know that the input data must be an object at this point). That's a big benefit in certain situations, but if you're using this sort of stuff just to get validation then you're probably better off using JSON Schema and having a wire transfer format that you can read easily without additional tools.
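A minimal sketch of that JSON Schema route, using Ajv as one popular validator; the schema and field names here are illustrative.

    import Ajv from "ajv";

    const ajv = new Ajv();
    const validate = ajv.compile({
      type: "object",
      properties: {
        temperature: { type: "number", maximum: 120 },
      },
      required: ["temperature"],
      additionalProperties: false,
    });

    const ok = validate({ temperature: 451 });
    if (!ok) console.log(validate.errors); // fails: 451 exceeds the maximum of 120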


Introducing a binary format for payload validation is like shooting yourself in the foot because you have an itch.


I disagree, especially if your goal is to reduce the payload size. Another point to consider is that not all programming languages deal well with JSON, adding support for it can be just as painful.


Any programming language you can teach to process protobuf or other binary formats you can teach to process JSON. The hardest part of the JSON spec is UTF-8 support and if your programming language doesn't have strong UTF-8 support in 2022 you have other, presumably larger problems.


> For 99% of users it should not make a difference.

The link you included shows that protobufs are at least 15% better for all users, and as much as 57% better for cases where the data is small. Doesn't that mean for 100% of users it will actually make a difference?

Your users might not care about the difference but it will be there.


Usually when visiting a website, saving a few kilobytes on the client side on requests to backend does not make any difference.


Agreed. ProtoBufs slows code iteration velocity tremendously. Saving 100 milliseconds on page load while reducing developer efficiency by 30% = a net worse product for end users.


A feature that never ships has value for 0% of users.

Actually realizing that speed up for your users will take time away from delivering features.

Engineering is a trade off, always will be.


Haha awesome response.


> 57% better for cases where the data is small

You don't optimize things for the cases when they are fast. (Unless the gain is a couple of orders of magnitude; certainly not for a 50% speedup.)

The 15% gain is the one that matters. In practice, it comes at the expense of a more complex (thus larger, negating some of it) and less reliable system. It is very rare that this trade-off is worth it.


You'd also have to compare this against the download size of the protobuf library itself.


protobuf is much more concise and readable than OAS. You can define API contracts in protobuf and still serve JSON APIs via the standard-ish gRPC/JSON transcoding enabled by google.api annotations.


To talk to a server that doesn't speak json.


This only makes sense if you have a server that someone else put together that for some reason only speaks protobuf. I'm not aware of any language ecosystem that has protocol buffers but no json support, so if you're building a server from scratch this isn't a good reason to use protobufs.

And if you are faced with a server that only speaks protobuf, the same question applies to the original devs: why did they make that decision?


For non-niche use cases that is a bad developer experience.

If you are designing your own solution that uses protobuf instead of JSON, say goodbye to a range of useful tools that the whole industry uses. From testing to automation it will be harder at every step, and you will have to find custom solutions instead of the usual no-customization solutions that work OOTB with JSON.

It is a good way to frustrate your developers and generate sometimes brittle solutions related to testing/automation/infrastructure.


If PB becomes popular enough the tooling will materialize. The only difference is the schema needs to be discovered (somehow) with PB.


Choosing a technology because you're betting it will become popular and therefore better than it is now is a bad move (except for hobby projects). You should only choose to use a new tech if you will derive enough in its current form to justify the cost. Otherwise you may well end up stuck with technology that you were never happy with and that never hit the popularity thresholds you hoped for.


That's one of the core problems Buf solves.


The schema isn't part of the message. This is what people are complaining about when comparing it to json. Debugging is harder because it is impossible to know what is in the message without having the proto information available.

My suggestion/strawman is to add request/response debug headers that include this in a standardized way. Then tooling can start to pick this up (eventually). The developer experience can then start to approach json.


I would advise being pragmatic and not taking the risk. You are using it now, and my experience was worse with protobuf than with JSON.


I would advise you not to assume you know what pragmatism means for the problems of others.


For sure. There's a niche for protobuf. I was advising to be pragmatic regarding the tooling, since that was the context.


I am using it for sending data between game server and client. Encoding the messages in JSON would be just silly, although I wonder what is the standard in the game industry.


Protocol buffers are used in Dark Souls 3, Pokemon GO, Hearthstone and I'm sure many other games.


we use it at https://woogles.io for pretty much all communication (server-to-server and client-to-server). I do loathe dealing with the JS aspect of it and am very excited to move over to Protobuf-ES after reading this article (and shaving off a ton of repeated code and generated code).


Moving to es?

Isn't it slower than protobufjs?


Your case is one of those 1% if you have a real time game where a fraction of a second is important.


Large blocks of data. (Eg 10,000 floats)

Otherwise personally json wins


nothing to do with the size, but with having robust schemata.


I keep trying to understand and use protobuf but every time I look at it and its API (this article included) I get more confused and have absolutely no idea how to implement it.

I can't tell whether I'm just dumb or a really terrible developer, or if the docs or the thing itself is really hard to use?


There are a few tricks to make them successful:

1. Your schema is the source of truth.

2. The protoc should generate code as part of your build (try not to check in generated proto code if at all possible).

3. Use generated code to output bytes/parse bytes (this depends on your HTTP/RPC library).

The other trick is that you should use the exact same (!) schema file for your frontend and backend projects. This means that changing it should trigger regeneration of generated code for your clients and servers and then run CI on them.

So if you accidentally introduce a breaking API change, the CI for broken client will fail before you deploy it.
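A minimal sketch of step 3 above, assuming a `User` message generated by Protobuf-ES's protoc-gen-es from a hypothetical user.proto; the message and field names are made up, while the fromBinary/toBinary calls follow the Protobuf-ES generated API.

    import { User } from "./gen/user_pb";

    // Serialize for the wire on one side...
    const bytes: Uint8Array = new User({ name: "Ada" }).toBinary();

    // ...and parse back into a typed message on the other.
    const user = User.fromBinary(bytes);
    console.log(user.name);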


> The other trick is that you should use the exact same (!) schema file for your frontend and backend projects. This means that changing it should trigger regeneration of generated code for your clients and servers and then run CI on them.

You do not need to have the exact same schema file, in fact protobuf is carefully designed to avoid needing this. You need to follow some rules about what to do when fields are added or removed:

* Generally, roll out the server side first then, once that is complete, start rolling out the client afterwards.

* If a field is added (on the server side), make sure that it can be ignored on the client side, so old clients are not impacted. For example, don't add a "units" field that changes the meaning of existing "temperature" field (previously had to be fahrenheit, now can be celsius or fahrenheit). Instead add a separate field "temperature_celsius" and send both. (You can always remove the old one later on the server if new clients don't need it and you have 100% finished roll out of clients.) Note that receiving unexpected field data is not an error in protobuf, so the extra field won't cause any problems so long as it's not a problem at application level.

* You can equally remove a field so long as the client isn't relying on it (in this case you may need to roll out client update first). More accurately (with proto3 syntax) it will appear as empty/zero so this needs to be OK.

* You can't change a field's type e.g. from integer to double (or from one message type to another, but just adding a field to a message according to the above is OK). If you want to do that, go through a controlled process of adding a new field with the new type you want then removing the old field.

* You are free to reorganise the order fields appear in the proto file but don't renumber the fields - the field number is what defines it in the binary encoding. In particular, if you remove field number 2 (for example) you should leave a gap (fields 1, 3, 4,... remaining) rather than renumbering the remaining ones to be contiguous.

Depending on the application, it is often actually a good idea to have a completely separate copy of the proto file in the client and server applications, with the client proto typically lagging behind the server one.


I can empathize. I was the same way at first. What is it that you find confusing? Perhaps we can help clear it up or link you to helpful documentation (or improve our own docs).


Maybe what could be added is a debug header when using grpc. If it is present, the proto schema is sent with each request / response. Then the tooling can be enhanced to look for this.

I suspect this would not be much heavier than json so it could be always left on for those who are ok with the overhead.

Win win?


Protobufjs is good, but I can't use it because it's only a protobuf library, not a gRPC library. I end up having to use grpc-web, with all the problems it comes with.

I was hoping Buf could solve that problem... Maybe in the future! :)


They already have! Connect (https://github.com/bufbuild/connect-web) is what you're looking for, as it's grpc-web compatible.


The same reason, along with the fact that you had to generate code, as well as usually needing to convert it to a class afterward, was why I wrote my own typescript-native binary serializer[0] (mostly based on C-FFI for compatibility) a few years ago.

[0]: https://github.com/i404788/honeybuf


Shameless plug for my project Phero [0]. It’s a bit like gRPC but specifically for full stack TypeScript projects.

It has a minimal API, literally one function, with which you can expose your server’s functions. It will generate a Typesafe SDK for your frontend(s), packed with all models you’re using. It will also generate a server which will automatically validate input & output to your server.

One thing I’ve seen no other similar solution do is the way we do error handling: throw an error on the server and catch it on the client as if it was a local error.

As I said, it’s only meant for teams who have full stack TypeScript. For teams with polyglot stacks an intermediate like protobuf or GraphQL might make more sense. We generate a TS declaration file instead.

[0] https://github.com/phero-hq/phero


tRPC is another similar library.

https://trpc.io/docs/v10/quickstart


There’re some key differences though, one being you can use plain typescript types to define your models, instead of a validation lib like zod :)


Zod gives you a lot more of control on the schema.


That’s an interesting point, can you give an example of that?


You can use any custom validation there is for any field.


I really wish someone would create a fork/variant of Protocol Buffers for all the folks that are not afraid of versioning message schemas. With actual guarantees as to what is in the message and what isn’t.

I recently had to select a suitable data serialization format for a sort-of-distributed application. The goal was to force teams to declare/discuss contracts up-front, so schema-based. It was a very frustrating experience. All the formats with expressive schemas and type systems somehow don’t have a concept of required fields, or not anymore. How can you create a robust system like this?


Protocol buffers used to have required fields. I worked at a company that kept using an old version specifically to keep having them.

I’ve also talked to Googlers about this. My understanding is that having required fields created a robustness problem for them.

My guess is that it’s the same situation as for basically everything else: the problems Google is trying to solve for, and the constraints on the solutions to those problems, are just different from what the rest of us are dealing with. That doesn’t make protocol buffers bad, but it maybe does make the level of mindshare Google gets in this corner of the technology world bad.

That said, 100% agreed. Proto2 solves for that one use case (required fields), but not others such as "I want to talk to this API from the browser." I would love to have a solution that prioritized being easy to implement and support over being as heavily engineered - and, consequently, difficult to understand and use effectively - as FAANG-scale technologies tend to be.


Requirements change. I’ve seen 10+ year old proto files at google still being used by services that are still being actively developed.

Over those ten years your data model has evolved. Your first client, the one whose data model you were imitating when you added the required field, was deprecated five years ago and turned down last year. You have a dozen or a thousand client services, each using you as a backend in a slightly different way. Are you sure every single one of them is going to require that field? Are you sure the field will even be semantically meaningful for their use case?


Those are exactly the kinds of things I was thinking of when I suggested that proto is designed for Google scale.

It’s not that smaller companies never have these problems. It’s that, at smaller companies, the ways in which they manifest themselves and the cost/benefit ratios tend to favor different solutions to these problems. For example, at the company I was at that stuck with Proto2 so they could keep required fields, all the engineers worked in a single room and could resolve questions about the needs of all of a protocol's consumers by simply standing up and saying, "Hey everybody, …"


My understanding is that the powers that be within Google have decided that validating messages is outside the scope of schemas and serialization. protoc-gen-validate provides a portable way to perform validation: https://github.com/bufbuild/protoc-gen-validate

The problem with required fields is it kicks the can down the road when you want to deprecate a field. Keeping everything optional is much, much better for everyone in the long run.


The problem is with default values: they are not sent over the wire. You cannot determine whether that false was deliberate or the sender just forgot to set it to true.

I understand that "elastic" contracts may make some stuff easier. They do not help in forcing developers to create a message correctly, unfortunately.

Still, it's great to see someone is tackling the validation rule topic. One of my stakeholders is very… enthusiastic about validation. Just goes to show how bad software engineering is in practice in this org.


Er no, that's not true.

If you set the value of a field, then it will be serialized with that value. It doesn't matter if the value is the default for that field.


No, it is, at least for C# and the default/"official" code generator.

The docs (proto3) say this: “Also note that if a scalar message field is set to its default, the value will not be serialized on the wire.”


That's true for "singular" fields, but not "optional".

https://developers.google.com/protocol-buffers/docs/proto3#s...

If you don't like that, don't use "singular".


optional or repeated are not semantically correct for a required field though. I’d rather not pollute the contract this way. It doesn’t lend itself to automatically generating documentation from the schema either.

Guess I’ll take another look at proto2 then.


Validation is important, but the serialization layer is the wrong place to put validation logic.

A protobuf is a low-level abstraction, like a struct or record type in your favorite programming language. You want validation logic to be a separate layer on top. You don't want it so coupled to parsing/serialization that you literally cannot parse/serialize something that doesn't validate.

Protobuf has a rich facility for adding custom annotations to anything (messages, fields, etc). These annotations are the right place to put validation predicates. That will let anyone build a validation layer on top of protobuf, for example: https://scalapb.github.io/docs/validation/


In general, I agree. The problem is that serialization (at least with proto3) is “lossy”. As I mentioned elsewhere in the discussion, proto3 messages discard certain information in the name of efficiency. The end result (message semantically invalid) does not change, of course, but the “why” could.


Yes, proto3 "singular" fields are a big problem. But now that proto3 supports "optional" fields (which remember the difference between unset and explicit 0) you can use true optional fields for all new messages going forward: https://developers.google.com/protocol-buffers/docs/proto3#s...


Proto2 has required.

> required: a well-formed message must have exactly one of this field.

https://developers.google.com/protocol-buffers/docs/proto#sp...


Proto3 gained "required" back relatively recently. It's even in use in Opentelemetry proto files.


Thrift (both fbthrift and apache thrift) has required fields


You’re right! Dunno why I dismissed Thrift in my research, but I definitely missed this fact.


I'm seeing a lot of comments here along the lines of "why even bother with protobuf (just use JSON, you fools!), and gRPC is a pain anyway!" There are two very very critical things to know if you find yourself thinking either of those two things:

1. To me the compression that protobuf offers, while cool, is not what I find most important about it (or, rather, is not the feature that I like the most). What is most important (for me) is that the serialization _forces your server and client to have a typed API contract_. This is a pretty big deal. Imagine you're in a situation where you spent months laboring over your OpenAPI config, just to later realize that

> _oops! my OpenAPI config doesn't actually represent what my server is actually doing, and now my OpenAPI generated user documentation is wrong, and my TypeScript types I generated from the OpenAPI config are wrong too, which means I'll get runtime `TypeError`s on the frontend.

This is a pretty common situation for people to find themselves in, and it's a tough spot to be in because... well... you can't ever escape. Because no one is really reliably generating actual running servers from OpenAPI configs, you hit this situation reallllly fast. Protobuf, on the other hand, is legitimately a spec-first workflow (by force -> which to me is a very good thing). Take a look at Postman's state of the API report, and you'll see that even their customers are only doing spec-first 10% of the time (and, this is supposed to be the cream of the crop in cutting-edge spec-first development). Why? Because it's really really really hard to do in practice. But protobuf makes it easy. That's a big deal.

2. gRPC and protobuf are not as deeply tied as you might think. You can use protobuf without gRPC at all, and in fact a lot of people are doing this. You can even do it over WebSockets, or anything else. Unfortunately, because the two technologies are often mentioned in the same breath we tend to deeply associate them. Years ago I had a project I needed to pick a wire compression thing for (it was just for fun) and since I was tired of always just using messagepack over and over again, I evaluated other options. I picked FlatBuffers in the end, but I wish I could go back in time and ask my past self why I didn't pick Protocol Buffers instead. I KNOW I remember looking into protobuf, but I think I just assumed that in order to do protobuf well you had to also use gRPC and since I wasn't up for learning gRPC, I gave up on protobuf. That's the beauty of what tools like Protobuf-ES are changing -> they're making protobuf accessible to everyone.

..and besides, there are other things that protobuf unlocks (namely the "reflection" API (very much like GraphQL's introspection API if you're familiar with that), but also things like documentation generation and enforcement of backwards-compatible changes), but the above are the two most appreciated by me.


> What is most important (for me) is that the serialization _forces your server and client to have a typed API contract_.

I solved this by generating JSON schemas from my TypeScript files, and then enforcing those schemas on my endpoints.

Using the magic of code generation (actually just templated strings) I am then able to create client libraries in whatever languages are needed.

The unfortunate part of this is being limited by what TypeScript offers, no run time validation of the shape of fields, e.g. no way to say "this number has to be less than 120". JSON Schema is actually a lot more powerful than TypeScript, so going from TS->JSON Schema actually reduces expressiveness.

But then again, the vast majority of programming languages don't allow for actually good descriptions of types. :(


re: TypeScript doing "this number has to be less than 120".. As a fun aside in fact you _can_ write this kind of type today (since 4.5 when they added tail-recursion elimination on conditional types). You can even do things like `Range<80, 120>` to clamp to a range. If something like "Negated types" ever happens (https://github.com/Microsoft/TypeScript/pull/29317) it'll make even more options available.

Also, if you haven't checked it out, typescript-json-schema has some REALLY powerful validation it can do for things like your example (https://youtu.be/HHTDCY5uh_M?t=1379). You can do stuff like this

    export interface Shape {
       /**
        * The size of the shape.
        *
        * @minimum 0
        * @maximum 120
        * @TJS-type integer
        */
        size: number;
    }
and regarding the part about generating JSONSchema from TypeScript files. I totally know what you mean, I think that approach really has great potential, but there's still nothing _enforcing_ that contract. If you care about the frontend and backend always being in sync (i.e. through deploys of the backend/frontend, but also in situations like many people find themselves in with dashboard web apps where users many not often refresh their page through many backend and frontend deploys), then you'll probably benefit from a tool like protobuf or GraphQL or any others that tries to help with this validation.


> Also, if you haven't checked it out, typescript-json-schema has some REALLY powerful validation it can do for things like your example (https://youtu.be/HHTDCY5uh_M?t=1379). You can do stuff like this

Oh wow I didn't realize TS JSON Schema was capable of all that!

> but there's still nothing _enforcing_ that contract.

Sure there is, the schema on my REST endpoint. Express and AJV check all messages coming in.

It requires changes be non-breaking changes of course, new fields need to be optional, same as protobuf.

Removing a field means making it optional, removing it from all calling code, then removing it from the schema entirely. Real pain, but that is the price you pay if you want required fields. Only really doable if you can also control all the code, but since I'm working on internally facing services only, not an issue. :)


The protobuf javascript generator and runtime really shines when your stack is compiled using the closure compiler; it shrinks down to very little code. Unfortunately, very few people outside google use this.

The protobuf-gen-es plugin sounds nice, I'll definitely be checking it out.

However, I don't understand the desire for people to write protoc plugins in anything but C++, and I'm not even a C++ developer (I much prefer js/ts for many things). Instead of a single C++ source file with a built-in cool templating library which compiles trivially to a tiny binary, you inherit a huge dependency chain that includes the typescript compiler and the nodejs runtime. Totally unnecessary IMHO.


Tech Island.


Genuine question for someone who's used Protobufs in production. How do you avoid the sort of schema violation problems that CORBA and XML had back in their day? Are the schemas in each request?


We have rules around changing the schema. You can only add new fields and deprecate old fields. You can't rename or reuse fields. All new fields must be optional to make schema evolution possible.

In practice, this is pretty similar to how you maintain compatibility with old clients in JSON. But with protobuf, you have the schema file to make code reviewing this kind of change easier.


I don't know what schema violations you refer to, but XML has the issue that everything is a string until it isn't. The wire encoding of Protobuf has more type information, so I'd say it's half-including the schema compared to XML. Similar to ASN.1 binary encodings, but while ASN.1 (DER, CER) allows you to override the actual wire type, Protobuf always says an integer is an integer on the wire. That means you can do way more parsing and validation in library code without knowing the schema.

What's not in there is protection against violations such as sending multiple values for non-repeated fields (the latest value wins) or accidentally making non-backwards-compatible changes to the schema, e.g. reusing field numbers or changing types. The internal Google styleguide for Protobuf is a valuable resource for the collective avoidance of foot-firearms, and it's a bit sad it doesn't seem to be published externally.

I don't know of any existing linter that could check for backwards compatibility, but I am interested in one. Might break out one I wrote to a separate library. There aren't too many situations that commonly cause breakages. One reason may be that developer names (text identifiers) are separate from on-wire/persistent names (integer tags). This leaves some wiggle-room for improving your development environment by incrementally clarifying names without breaking backwards-compatibility with already compiled code.


> I don't know of any existing linter that could check for backwards compatibility , but I am interested in one.

Buf, from the OP, does this with `buf breaking`.


Predefined. Looking forwards to all the lies in this thread...

Theoretically, a protobuf schema allows additions and removals. In practice, schema-nazis will destroy your life and change whatever, because the only little thing that separates them from the peasants is that royal-like power of making breaking schema changes in prod and pinning the incident on you. Your code probably did not even use that data.


Arrow Flight RPC (and Arrow Flight SQL, a faster alternative to ODBC/JDBC) are based on gRPC and protobufs: https://arrow.apache.org/docs/format/Flight.html:

> Arrow Flight is an RPC framework for high-performance data services based on Arrow data, and is built on top of gRPC and the IPC format.

> Flight is organized around streams of Arrow record batches, being either downloaded from or uploaded to another service. A set of metadata methods offers discovery and introspection of streams, as well as the ability to implement application-specific methods.

> Methods and message wire formats are defined by Protobuf, enabling interoperability with clients that may support gRPC and Arrow separately, but not Flight. However, Flight implementations include further optimizations to avoid overhead in usage of Protobuf (mostly around avoiding excessive memory copies

"Powered By Apache Arrow in JS" https://arrow.apache.org/docs/js/index.html#powered-by-apach...


What's with your slightly on-topic but mostly off-topic comments? Your post history almost looks like something written by GPT-3, following the same format and always linking to a lot of external resources that only briefly touch the subject.


It looks like a bot which scans a comment, identifies some buzz phrases, and quotes those lines, replying with generic linked information (e.g. wikipedia) about those phrases.


No.

Read: https://westurner.github.io/hnlog/

If you have a question about what I just took the time to share with you here, that would be great. Otherwise, I'm going to need you on Saturday.


These are called citations to academics. Without citations, one is just posting their opinions on the internet.

I prepared all of these buddy. These links are very relevant.

Given the form of the way I've been accused, I won't be spending time defending anything for yas.


I find this discussion fascinating as my only protobuf experience is with IoT devices on bandwidth-limited connections where every byte matters, so hearing JS web frontend people talk about the same protocol to transfer "large chunks of real time enduser data" is an interesting far opposite experience.

Something "interesting" in the overall IoT protobuf experience is software licensing.

I can write software under GPL, or whatever, to access or otherwise interoperate with a REST API implemented under BSD, or whatever, no problem as long as it follows reasonably close to generic REST standards. Just look at your swagger API docs or whatever system you use, it'll interoperate just fine.

Ditto "most JSON stuff" just write your proprietary code to generate a JSON looking like some text documentation for a server under, perhaps, the MIT license, or whatever other license, and interoperability across licenses is easy and legal.

Things get weird with protobuf .proto3 files where the server and ANY clients need to use the same .proto3 file to compile to whatever language they'd like to use. So what license do YOU apply to your .proto3 files to make sure other people can (or intentionally cannot) redistribute your .proto3 or compile software against it, etc?

You can, intentionally or accidentally, implement all kinds of lockdown and restriction on APIs using protobufs via innocent or intentional copyright and license selection for a .proto3 file.

For better or worse, protobufs are a tool invented primarily to enable the prevention of interoperability. If it was just about saving bytes on the wire we'd be doing IoT using zipped up JSONs or similar technology.


Very good to have a smaller runtime. Some of the stated benefits of the new generator are silly and the comparison with ts-proto doesn't point out the actual improvement (generating d.ts files means your CI doesn't spend time type-checking and compiling ts sources), but nice to see this new alternative.


Oh, nice!

Another issue with the standard implementation is that the TypeScript declaration files are missing lots of stuff. E.g. deserialization functions actually take a jspb.ByteSource (which can be Uint8Array, base-64 string and others), but the declaration files use Uint8Array.


While I enjoy protobufs, it's one of those things I feel I have to re-learn every time I write a new implementation. Never quite get to grips with them in the same way as, say, regex.


About time. Looking forward to using this.


It is unfair to compare ts-proto to this. ts-proto has a gRPC generator.

> https://github.com/stephenh/ts-proto


protobuf-es does as well! https://github.com/bufbuild/connect-web


It's totally not obvious whether this is usable with a non-"connect" server-side implementation (Grpc.net, etc.)?



