Hello. I didn't invent Protocol Buffers, but I did write version 2 and was responsible for open sourcing it. I believe I am the author of the "manifesto" entitled "required considered harmful" mentioned in the footnote. Note that I mostly haven't touched Protobufs since I left Google in early 2013, but I have created Cap'n Proto since then, which I imagine this guy would criticize in similar ways.
This article appears to be written by a programming language design theorist who, unfortunately, does not understand (or, perhaps, does not value) practical software engineering. Type theory is a lot of fun to think about, but being simple and elegant from a type theory perspective does not necessarily translate to real value in real systems. Protobuf has undoubtedly, empirically proven its real value in real systems, despite its admittedly large number of warts.
The main thing that the author of this article does not seem to understand -- and, indeed, many PL theorists seem to miss -- is that the main challenge in real-world software engineering is not writing code but changing code once it is written and deployed. In general, type systems can be both helpful and harmful when it comes to changing code -- type systems are invaluable for detecting problems introduced by a change, but an overly rigid type system can be a hindrance if it means common types of changes are difficult to make.
This is especially true when it comes to protocols, because in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse.
I don't have time to address all the author's points, so let me choose a few that I think are representative of the misunderstanding.
> Make all fields in a message required. This makes messages product types.
> Promote oneof fields to instead be standalone data types. These are coproduct types.
This seems to miss the point of optional fields. Optional fields are not primarily about nullability but about compatibility. Protobuf's single most important feature is the ability to add new fields over time while maintaining compatibility. This has proven -- in real practice, not in theory -- to be an extremely powerful way to allow protocol evolution. It allows developers to build new features with minimal work.
Real-world practice has also shown that quite often, fields that originally seemed to be "required" turn out to be optional over time, hence the "required considered harmful" manifesto. In practice, you want to declare all fields optional to give yourself maximum flexibility for change.
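To make the compatibility argument concrete, here is a minimal sketch in plain Python. It does not use the real protobuf library: a message is modeled as a dict keyed by field number, and `SCHEMA_V1`, `SCHEMA_V2`, and `decode` are hypothetical names made up for illustration. The assumption is only that readers look up the fields they know about and fall back to a default when a field is absent.

```python
# Hypothetical schema versions: field number -> (field name, default value).
SCHEMA_V1 = {1: ("name", "")}
SCHEMA_V2 = {1: ("name", ""), 2: ("email", "")}  # v2 adds an optional field


def decode(wire: dict, schema: dict) -> dict:
    """Read only the fields this schema knows; absent fields get defaults."""
    return {
        name: wire.get(number, default)
        for number, (name, default) in schema.items()
    }


new_message = {1: "Ada", 2: "ada@example.com"}  # written by a v2 sender
old_message = {1: "Ada"}                        # written by a v1 sender

# An old reader skips the field it has never heard of.
old_reader_view = decode(new_message, SCHEMA_V1)
# A new reader tolerates the field being absent.
new_reader_view = decode(old_message, SCHEMA_V2)
```

Neither side needed to negotiate a version or keep a parallel code path. If `email` had been declared required, the v2 reader would have had to reject every message from a v1 sender.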
The author dismisses this later on:
> What protobuffers are is permissive. They manage to not shit the bed when receiving messages from the past or from the future because they make absolutely no promises about what your data will look like. Everything is optional! But if you need it anyway, protobuffers will happily cook up and serve you something that typechecks, regardless of whether or not it's meaningful.
In real-world practice, the permissiveness of Protocol Buffers has proven to be a powerful way to allow for protocols to change over time.
Maybe there's an amazing type system idea out there that would be even better, but I don't know what it is. Certainly the usual proposals I see seem like steps backwards. I'd love to be proven wrong, not on the basis of perceived elegance and simplicity, but in real-world use.
> oneof fields can't be repeated.
(background: A "oneof" is essentially a tagged union -- a "sum type" for type theorists. A "repeated field" is an array.)
Two things:
1. It's that way because the "oneof" pattern long predates the "oneof" language construct. A "oneof" is actually syntax sugar for a bunch of "optional" fields where exactly one is expected to be filled in. Lots of protocols used this pattern before I added "oneof" to the language, and I wanted those protocols to be able to upgrade to the new construct without breaking compatibility.
You might argue that this is a side-effect of a system evolving over time rather than being designed, and you'd be right. However, there is no such thing as a successful system which was designed perfectly upfront. All successful systems become successful by evolving, and thus you will always see this kind of wart in anything that works well. You should want a system that thinks about its existing users when creating new features, because once you adopt it, you'll be an existing user.
2. You actually do not want a oneof field to be repeated!
Here's the problem: Say you have your repeated "oneof" representing an array of values where each value can be one of 10 different types. For a concrete example, let's say you're writing a parser and the values represent tokens (number, identifier, string, operator, etc.).
Now, at some point later on, you realize there's some additional piece of data you want to attach to every element. In our example, it could be that you now want to record the original source location (line and column number) where the token appeared.
How do you make this change without breaking compatibility? Now you wish that you had defined your array as an array of messages, each containing a oneof, so that you could add a new field to that message. But because you didn't, you're probably stuck creating a parallel array to store your new field. That sucks.
In every single case where you might want a repeated oneof, you always want to wrap it in a message (product type), and then repeat that. That's exactly what you can do with the existing design.
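The token example can be sketched roughly as follows, again in plain Python rather than the protobuf API. The `Token` class and its field names are hypothetical. The oneof is modeled the way point 1 describes it, as a group of optional fields of which exactly one is expected to be set, and the wrapper gives the later-added `line` and `column` fields somewhere to live.

```python
from dataclasses import dataclass
from typing import Optional

# Names of the fields that together play the role of the "oneof".
ONEOF_FIELDS = ("number", "identifier", "string", "operator")


@dataclass
class Token:
    # The "oneof": exactly one of these is expected to be set.
    number: Optional[float] = None
    identifier: Optional[str] = None
    string: Optional[str] = None
    operator: Optional[str] = None
    # Added in a later revision; older data simply leaves these unset.
    line: Optional[int] = None
    column: Optional[int] = None

    def which(self) -> str:
        """Return the name of the oneof member that is set."""
        chosen = [n for n in ONEOF_FIELDS if getattr(self, n) is not None]
        if len(chosen) != 1:
            raise ValueError("expected exactly one oneof member to be set")
        return chosen[0]


# The "repeated" part: a list of wrapper messages, not a list of bare unions.
tokens = [
    Token(identifier="x"),               # produced before locations existed
    Token(operator="="),
    Token(number=1.0, line=1, column=5),  # produced after the change
]
kinds = [t.which() for t in tokens]
```

Old and new tokens sit in the same list without a parallel array, because the source location was added to the wrapper and not to the union itself.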
The author's complaints about several other features have similar stories.
> One possible argument here is that protobuffers will hold onto any information present in a message that they don't understand. In principle this means that it's nondestructive to route a message through an intermediary that doesn't understand this version of its schema. Surely that's a win, isn't it?
> Granted, on paper it's a cool feature. But I've never once seen an application that will actually preserve that property.
OK, well, I've worked on lots of systems -- across three different companies -- where this feature is essential.
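As a rough illustration of what that looks like, here is a sketch of an intermediary built against an older schema. It is plain Python with messages modeled as dicts keyed by field number; `KNOWN_TO_PROXY` and `proxy_forward` are hypothetical names and are not how the real library exposes unknown fields.

```python
# Field numbers the (hypothetical) proxy was compiled against.
KNOWN_TO_PROXY = {1: "request_id"}


def proxy_forward(wire: dict) -> dict:
    """Modify a field the proxy understands; carry the rest through untouched."""
    known = {n: v for n, v in wire.items() if n in KNOWN_TO_PROXY}
    unknown = {n: v for n, v in wire.items() if n not in KNOWN_TO_PROXY}
    # The proxy edits a field it knows about...
    known[1] = known.get(1, "") + "-via-proxy"
    # ...and re-emits the fields it does not know about verbatim.
    return {**known, **unknown}


# Field 7 was added by a newer client and is only understood by the backend.
incoming = {1: "req-42", 7: "payload added in a newer schema"}
forwarded = proxy_forward(incoming)
```

The backend still receives field 7 even though the proxy in the middle was never redeployed with the new schema.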
Sounds like you're pretty peeved about this, is your manager on you because your peers are out-delivering you?
You realize even if you're onboard the agentic coding hype train, you don't have to just blithely paste tickets to the agent and let them rip. You can have a long conversation about design and architecture, and have them write their own implementation plan based on that, then watch them ticking items off the list and review code for changes as they're completed while the agent forges ahead. A lot of times you don't even need to have a long conversation about this stuff, just write a readme that very clearly outlines what you're doing and how to do it, and the agent will read it and do just fine.
It's frustrating that this thread seems to be focused so heavily on people sitting around resting and vesting.
Having been inside Google (and multiple other FAANGs) this is generally untrue, and focusing on this element of the problem misses a much larger productivity problem:
Most engineers at Google aren't "sitting around doing nothing", they are very busy shipping projects that do not matter. Their days are filled with doing work that will not move the needle on any metric that matters to the company, but they are far from idle.
The misallocation of labor is a far bigger problem than said labor slacking off, and management must own it.
Google doesn't need their engineers to fly into startup mode, work 12 hour days, or never surf Reddit on company time. Their labor is severely under-utilized because they are assigned to zero/negative-impact projects or duplicative projects (hey, somehow you gotta ship 5 chat apps at the same time, right?)
Part of the problem is that Google's upper management refuses to engage with the product at all. Entire orgs are given very broad OKRs like "increase DAUs by 10%" with virtually no guidance as to what features management is interested in. Authority to ship features also rests close to the leaf nodes of direct line-managed teams. The expectation is that teams are entrepreneurial and invent features, implement them, and ship them all without direct upper management involvement.
The result is a bunch of bad product that doesn't do anything positive for the company, was never soberly evaluated by upper management prior to building, and would never have passed the smell test if it had been. This, above all other factors, is why Google produces so much product that it then has to scrap. This is the main cause of Google's low labor productivity - not because people are sitting around drinking coffee and eating free food - but because they are assigned to projects that do not pass muster, and there is an almost-comical aversion to validating product ideas before they are implemented.
The single biggest thing Google can do to improve its labor productivity isn't cracking down on slackers, it's forcing its management to actually engage with product definition so entire orgs don't burn years on things that don't matter.
As a maintainer, it is his choice which patches to accept. If you're not happy with his decisions, choose another project, fork it, or pay someone to do it for you.
"I started a manufacturing company in Little Elm, about 35 miles north of Dallas, to produce the first-ever automatically retracting syringe to eliminate the risk of nurses contracting HIV through accidental needle sticks. The syringe received rave reviews from nurses, hospital executives and public health officials, a major grant from the National Institutes of Health and robust private investment. But when my partners and I tried to sell it to hospitals, we were told time and time again that even though it was a better product — a lifesaving product — they weren’t able to purchase it. The primary supplier of syringes, which controlled 80 percent of the market, structured an arrangement with a vast network of hospitals that essentially closed our industry to new firms for good."
A market in which buyers are not free to choose better products is not a free market.
A market in which new entrants cannot compete fairly against established players is not a free market.
A market in which innovators have to get permission and pay established players for "access" (think ISPs) is not a free market.
And yes, a market in which economic and political power is concentrated in large corporations geographically clustered in a handful of giant metropolitan areas... is also not a free market.
Those corporations have both strong incentives and the means to change the rules of competition to their advantage.