After working (internally) with Ion and related tooling, I'd say I was the opposite of a fan. Protobuf's strength is the good tooling/codegen around it (especially inside Google), while with Ion you just get a "superset of JSON".
Ion never had nice code wrappers around serialized structures, and most of the time, especially with rich structures, it was a frustrating experience.
In 2013 I was generating code from SDL definitions which could also be used for data validation during serialization/deserialization. I had plenty of “model” packages which were just SDL definitions, build config, and maybe some unit tests to validate the schema constraints. (Edit: SDL is “schema definition language” which was a schema definition tool written for Ion with definitions in Ion.)
These were used in services and reactors which never touched raw Ion (at least not in any way different from Coral or BSF).
Full disclosure, I spent a lot of my free time working on Ion, both the supported implementations as well as my own. The additional data types are worth it alone, imho. Having to use JSON for most things now I’m frustrated at what is “missing”.
I built the first version of the IntelliJ plugin[1] to make working with the reactor stuff easier. It doesn't look like there have been many improvements to it since.
Might that be expected, though? Protobuf's tooling seems like a byproduct of the fact that you can't read protobuf, and that it's strict and type-safe enough that you can generate lots of things.
Ion is readable and (seemingly) not very strict about schema. Seems like that would not readily incentivise additional tooling.
If it is "easy to produce or consume in language X", that does not mean it is canonical; it means that language X has an extension that allows you to do so. Is there a place in the protobuf spec or documentation mentioning this as part of the protocol?
If you're working against a schema that means presumably there is a schema, and that defeats essentially the whole purpose of using a self-describing format like Ion. At that point, use something like protobuf that is schema-ful.
I think that talking about Ion without talking about PartiQL is not setting people up with proper context.
PartiQL is AWS's specification for a parser/query language that is compatible with standard SQL, but can query semi-structured or unstructured data (think JSON, Parquet, CSV/TSV etc)
I looked pretty deeply into this, but fell a bit short of understanding what they meant by "if your query engine supports PartiQL." Does that mean writing a new DB that delegates incoming queries to PartiQL? Not sure.
Anyways, they use it in Quantum Ledger DB, and a few other internal projects:
Ion precedes PartiQL by many years, maybe even a decade. The sole reason Ion exists is to make parsing JSON faster and less ambiguous, so that a few specific edge cases are handled efficiently. So far so good, right?
The problem is that it spreads, like an infection, to surrounding services. Inside Amazon there are literally hundreds of libraries that duplicate standard JSON libraries in various languages but support Ion instead of JSON. All of this is just to deal with interoperability.
Ion is slower than protobuf and less universally understood than JSON. Honestly it's just an annoyance.
Yeah, it's more like a spec; they expect you to write the query optimizer and analyzer, basically a new query engine. They offered a document-based "reference" implementation in Kotlin, though, so I think an expert could follow it.
The point is, AWS might not have made up its mind yet on whether this does more harm than good to their DB business.
You can query anything with this, exactly the same way. It's kind of wild I've never heard it talked about tbh.
I went looking for solutions to multi-datastore querying when I fiddled with a business intelligence side project. Pretty useless to only be able to query one type of data, and too time consuming to implement individual mappings.
Apache Calcite and Metabase Query Language (quite the exact usecase there for Metabase, haha) were the only things I could find.
The communication gap between businesses using COBOL and academic institutions teaching IT is the reason for the hatred of the language among young computing graduates. Whether you love it or not, you live with it and support your family with it, so teach the youngsters to love COBOL, for their grandmother is not going to die any sooner!
How does this compare to protobuf, thrift, msgpack etc?
It's roughly the same vintage as protobuf and Thrift, from Google and Facebook respectively, so perhaps it's just Amazon's equivalent, which they never released as quickly as the others did?
Obvious pros and cons, or yet another serialization format with no obvious benefits over anything else?
Just from reading their page and being familiar with the formats you mentioned:
vs. protobuf: Ion is self-describing, vs. needing a schema
vs. thrift: similar; Thrift needs a schema to interpret a binary file
Both Thrift and protobuf are really binary formats; they have a canonical textual representation, but it isn't actually used for serialization. It sounds like Ion supports serializing as text as a first-class concept (see the sketch below).
vs. msgpack: Ion has a corresponding text format, whereas msgpack is only binary. Additionally, Ion has a symbol type and msgpack doesn't.
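To make the text/binary duality concrete, here's a minimal sketch using the ion-python package's simpleion module (the module and keyword names are my reading of the package, so treat this as illustrative rather than canonical):

    # pip install amazon.ion  (Amazon's Python implementation of Ion)
    from amazon.ion import simpleion

    # Parse Ion text; unquoted field names are symbols, not strings
    value = simpleion.loads('{sku: "B000123", qty: 2}')

    # The same in-memory value serializes to either representation
    text_form = simpleion.dumps(value, binary=False)    # human-readable Ion text
    binary_form = simpleion.dumps(value, binary=True)   # compact Ion binary

    # Both encodings decode back to the same data model
    print(simpleion.loads(binary_form))

The point being that text and binary are two encodings of one data model, not two formats that need to be kept in sync by hand.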
I think the biggest benefit here is that it's a new chance for a format that fixes some of JSON's rough edges to gain critical mass. There's probably nothing ultra special about it that hasn't been solved in other formats, but maybe the timing will be right and everyone will just adopt it as a JSON replacement (sort of how people just gave up on XML and switched to JSON seemingly overnight). It's impossible to predict stuff like that.
Edit: upon noticing that it was released in 2016, it seems less likely everyone will jump on the ion bandwagon ...
If I'm not mistaken, there were plenty of text protobuf files used internally for a lot of things, and much, much less of anything else (okay, XML was prevalent for our team, maybe due to being Java-inclined). I've even seen examples of text protos pushed through the command line (it's possible, but you need to get it right).
There are some pain points that are being addressed:
1) timestamp: I have had issues with round-tripping timestamp representations quite a bit
2) decimal: currency is denoted in decimal rather than float, which shows the Amazon retail heritage. This is very useful.
3) symbols: I've had cases where a symbol table/dictionary would have made a big difference in serialized size
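For what it's worth, here's a rough sketch (again assuming ion-python's simpleion module) of how 1) and 2) round-trip with their types intact:

    from amazon.ion import simpleion

    # 19.99 with no exponent marker is an Ion decimal, not a binary float,
    # so currency amounts survive serialization exactly
    price = simpleion.loads('19.99')

    # Timestamps are a first-class type; precision and offset are preserved
    ts = simpleion.loads('2020-07-23T12:37:55.758145Z')

    # A round trip through the binary encoding keeps both values as-is
    blob = simpleion.dumps([price, ts], binary=True)
    print(simpleion.loads(blob))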
Re timestamp and decimal: it's probably no surprise that Ion is used heavily by QLDB, where having a very clear time for a change is important and a common use case is logging debits and credits in a financial ledger.
I don't know. It was common knowledge for me in college (as in it was taught as part of the curriculum) but as far as I can tell in the intervening 30+ years that knowledge seems to have been lost and relearned many times over.
Cash values should be represented in fixed precision to maintain the integrity of the transaction and your books, while the prices for securities represent something different.
In securities transactions, the quantity and quote are critical. You aren’t buying securities from Plaid, right?
If you try to liquidate or resize based on the Plaid quote, your brokerage or counterparty is going to provide a totally different quote, and one from a system engineered to provide quotes aligned exactly to the market standards.
It seems much more directly comparable with CBOR/JSON as they mention it a lot https://amzn.github.io/ion-docs/guides/why.html#dual-format-... . I use CBOR quite a bit. It sounds like it doesn't really offer too much different in the binary form other than in the textual form it maintains better types than JSON and the textual version matches the binary version (where JSON / CBOR are mismatched in terms of types). So, seems nicer as a cohesive textual/binary format. I'd be interested in seeing how well packed the data is in Ion vs CBOR.
As others have already pointed out, this was released in 2016 and already discussed on HN [0], and seemingly hasn't taken the world by storm since. But just glancing at the amzn Github activity, and it looks like the docs and the tooling [1] are recently and frequently updated (including a new CLI in Rust [2])?
Can anyone currently at Amazon shed some light on how prevalent Ion is internally?
I left Amazon a bit over a year ago, after being there seven years. It always struck me as a combination of "not invented here" syndrome and a solution in search of a problem. It has no real-world benefits over JSON, the tooling is limited, but you inevitably have to deal with some other team that regrets choosing it and now it's their API. I'm so happy I never have to look at it ever again, and seeing this post today is a real throwback to wasted engineering effort. Just let it go, Amazon.
Depends on the part of Amazon but it is pretty prevalent in Retail. The fact that it is both binary and self describing makes it pretty good for data at rest. You can still parse and understand that archival transaction data from 8 years ago.
The support for S-Expressions is both a blessing and a curse. The ability to write logic with native data structures in it is fundamentally interesting, but it leads to lots of reinvention of somewhat crappy Lisp implementations.
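For those who haven't run into it, an Ion S-expression is its own type in the data model, distinct from a list, which is exactly what invites the little-Lisp reinvention. A hedged sketch, again assuming ion-python's simpleion behaves as I recall:

    from amazon.ion import simpleion

    # An S-expression parses as an ordered sequence that carries its own
    # Ion type, so a consumer can tell "code" apart from plain list data
    expr = simpleion.loads('(add 1 2)')
    print(expr, expr.ion_type)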
The tooling ecosystem has been slowly improving outside of JVM, particularly the latest JS implementation.
In a vacuum, the support for type annotations, timestamps, decimals and binary serialization make it superior to JSON for use cases where self describing data is appropriate.
Looks nice. I saw that there is no PHP implementation yet. Would doing one and publishing it on GitHub get me something besides a "kudos" from Amazon? I am not asking for a position at Amazon, but maybe an interview?
The easiest way to get an interview at Amazon is to get a referral. If you can demonstrate competent programming abilities and have a half decent attitude, it shouldn't be too hard to get a referral from someone at Amazon, regardless of what projects you have under your belt.
I was thinking about spinning up a support library for Haskell, but it'd be a pretty serious investment of time when everyone's employment is cut back or up in the air already. It would be nice to get a crate of sanitiser or something.
Who's going to maintain it? Are you just doing that for an interview, or are you offering real support for the library? That's the reason people are paid to work on software, vs. someone doing it in their free time.
Now I'm confused: are you saying Amazon uses Hack internally, which compiles to PHP? The Hack website doesn't have much info and I'm not familiar with it. There's clearly an Amazon GitHub repo for an AWS SDK written in PHP, but you're adamant that Amazon does not use PHP at all. So which is it?
But if you take the initiative to open source a client library in PHP and it gets the attention of AWS it absolutely could result in an interview.
If you are interviewing and you whiteboard your solution in PHP they won't hold it against you. The language is less important than the concepts. Granted, if the only language you know is PHP that could be a risk in your career. I think that holds true for any developer, though.
Interesting that they don't have a Kotlin or Swift version. Do their iOS clients just communicate with plain JSON? Are they all secretly written in JavaScript?
> The following timestamp encoded as a JSON string requires 26 bytes
> ...
> This timestamp requires just 11 bytes when encoded in Ion binary
So, we just use JSON, and our solution to this problem has been to pass 64 bit unix timestamps around. It doesn't provide arbitrary precision, but for most use cases it is more than enough practical range & precision to get the job done. And of course we store & transmit everything as UTC, so there is no weirdness around needing to store additional timezone information. To give you an idea, our database columns are named things like CreatedUnixTimestamp.
It is also trivial to compare 64-bit timestamps without conversion, so any SQL storage of these as integers should yield massive speedups for queries against these columns, assuming you are coming from some more complex datatype like a string or byte array.
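Concretely, the convention amounts to something like this (stdlib-only sketch; CreatedUnixTimestamp is just the field name mentioned above):

    import json, time
    from datetime import datetime, timezone

    # Store and transmit UTC epoch seconds as a plain integer
    record = {"CreatedUnixTimestamp": int(time.time())}
    payload = json.dumps(record)

    # Integers compare and index cheaply; conversion to a datetime (and
    # eventually to the user's local time) happens only at the edge
    created = datetime.fromtimestamp(
        json.loads(payload)["CreatedUnixTimestamp"], tz=timezone.utc)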
> So, we just use JSON, and our solution to this problem has been to pass 64 bit unix timestamps around.
Passing an integer does not have the same semantics as passing a timestamp. Relying on out-of-band info to parse a document is a problem in the making.
> but for most use cases it is more than enough practical range & precision to get the job done.
Parsing s-expressions would also get the job done, even if it's a primitive s-expression that only supports cons cells and a string data type. However, people find value in enabling the parser to validate booleans, arrays, and objects.
Ion is just a logical next step. Timestamps are quite naturally a fundamental data type in communication between web services, particularly in binary form.
Pass 64-bit Unix timestamps around as JSON numbers? That's a bad idea, seeing as they're 64-bit floats. You're better off formatting your 64-bit integers as strings.
53 bits of usable range is plenty for our purposes. Our serializer & database are not hobbled by the limitations of javascript, so the representation is only compromised as it is processed at the end client. This is not a concern for us.
For reference, MAX_SAFE_INTEGER can represent something around the year 285428751.
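That figure is easy to sanity-check:

    # JS Number.MAX_SAFE_INTEGER is 2**53 - 1; treat it as epoch seconds
    max_safe = 2**53 - 1
    years = max_safe / (365.2425 * 86400)   # mean Gregorian year in seconds
    print(1970 + years)                     # roughly year 285 million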
You can have whatever timezone you want if you store/transmit things as UTC. The final client device javascript should be the point at which the conversion to local time occurs, because the browser is best aware of the correct timezone.
Everything on the server is just done in terms of UTC. I actually cannot think of a reason I would want to process a timestamp in terms of local time on the server.
What has bugged me a lot about JavaScript is that it lacked a standard representation of dates and decimals (like money), making it feel inferior for application development. I'm happy to finally see this addressed, both in JavaScript itself and also in serialisation formats.
(Though it looks like Ion is not solely targeting JS, I assume it is nice to consume Ion data in the frontend.)
Nope, backend if anything. For example, their new QLDB product uses it to get consistent hashing of documents on account of Ion being a canonical format.
For the public API, customers want JSON, so they get JSON. Internally there's Coral, and something like Coral/Protobuf is outright superior for the use case of an API where a schema can be distributed in advance. The only real use case for Ion is when you have data that's already JSON-formatted for whatever reason and you want to compress it for storage or transit.
I was hoping to see a UUID type, since so many people choose either unreadable base64 or wasteful strings. It looks like 0x12341234_1234_1234_1234_123412341234 should convey the bits, but it won't pprint or validate the way a dedicated type would. Ditto for IPv6 addresses.
An interesting point - I browse with JavaScript disabled. The example at the bottom of the page rendered for me without newlines, in a manner that meant the thing rendered in a completely unparsable way due to comments like:
// Field names
This experience has reminded me why JSON is such a great format.
And to have a whinge while I'm writing: "superset of JSON" is basically false advertising even though it is true; JSON's refusal to admit that line breaks are a thing is a major feature. I don't care if it is technically correct and useful to some customers; if line breaks matter, it is inappropriate to talk about a format's relation to JSON, because people will get the wrong idea. The JSON brand is so strong because it is nigh-impossible to get wrong. This format gets screwed up, e.g. for people who don't like JS.
I think "superset" is a clear relationship. It means "legal JSON is legal Ion", just like "legal JSON is legal YAML". I don't think it's inappropriate to point that out. In fact, it's an excellent feature.
Any element of an ISO 8601 time can have decimal fractions added, to any number of digits, but only the lowest-order element (according to Wikipedia, because I don't have the actual standard in front of me). But you can definitely have a timestamp of 2020-07-23T12:37:55.758145Z.
The standard accommodates timezones as offsets from UTC, because it's a representation of a timestamp, not a local time at a particular geographical location. So things like daylight-saving-time periods are not relevant.
This is awfully negative. JSON explicitly does not declare the represented range of floats or integers, and doesn’t have a distinct arbitrary-precision decimal type. I haven’t read the Ion spec, only the description, but since it’s advertising arbitrary precision, presumably any implementation that does not support that is not a correct implementation at all.
In practice that means having (or adding) support for arbitrary-precision numbers and decimals in the languages/platforms they want to cover. I am skeptical they would do that in C, for example.
If I recall correctly, Ion precedes even Google's protobuf, and is 20+ year old technology. This isn't the result of "yet another standard" but of parallel evolution.
Ion is actually two formats, with Ion data having a canonical representation both in binary and in human-readable text. The text format's file extension is ".ion" and the binary format's file extension is ".10n", and I think that's the entire motivation.