Transit: JSON Data Interchange Format

lars512 · on Oct 14, 2015

I couldn't get the why of the project from the Github page alone. Rich Hickey's post introducing it a year ago is clearer:

http://blog.cognitect.com/blog/2014/7/22/transit

JSON has become the go-to schemaless format between languages, despite being verbose and having problems with typing. Transit aims to be a general-purpose successor to JSON here.

Stronger typing allows the optional use of a more compact binary format (using MessagePack). Otherwise it too uses JSON on the wire.

Anyone who knows more, please correct me.

gioele · on Oct 14, 2015

> JSON has become the go-to schemaless format between languages, despite being verbose and having problems with typing. Transit aims to be a general-purpose successor to JSON here.

XML has become the go-to schemaless format between languages, despite being verbose and having problems with typing. JSON aims to be a general-purpose successor to XML here.

brianberns · on Oct 14, 2015

Yes, except that XML has a schema (XSD).

nmeofthestate · on Oct 14, 2015

Ahem. http://json-schema.org/

orless · on Oct 14, 2015

Quoting http://json-schema.org/latest/json-schema-core.html: "This Internet-Draft will expire on August 3, 2013."

orless · on Oct 15, 2015

... which, by the way, does not mean that JSON Schema is bad or unusable. I actually quite like it, it has certaint elegance. Just not really standard.

tel · on Oct 14, 2015

That's an exterior thing. It's perfectly reasonable to talk about XML the schemaless markup language.

zcdziura · on Oct 14, 2015

History repeats itself, after all.

vocaro · on Oct 14, 2015

XML is not schemaless. It was designed to be extensible through schemas. It's in the name, after all: eXtensible Markup Language.

XML also doesn't suffer the problems of a limited set of pre-defined types that JSON has. Its type system is extensible through schemas.

dragonwriter · on Oct 14, 2015

> XML is not schemaless.

Yes, it is, though there are a variety of schema schemes that can be used with XML.

> It was designed to be extensible through schemas.

It was designed to be extensible through custom tags (in fact, all tags are custom tags); it can be optionally be restricted through the use of schemas, not extended.

> XML also doesn't suffer the problems of a limited set of pre-defined types that JSON has.

Yes, it does, with an even smaller number of types. (Strings, with a couple different representations.)

> Its type system is extensible through schemas.

No, it isn't. Its structure can be constrained through schemas (XML Schema, Relax NG, etc.), which can provide a means of modeling a different type system on top of the data structure supported by XML, but that isn't really "extending" XMLs type system any more than using a Haskell compiler to generate x86 machine code is "extending" the type system of x86 machine code; its just providing a completely separate type system on a different layer.

Also, no matter how you view this, its not a distinction between XML and JSON, since you can do the same thing with JSON, and just as there are a variety of different schema standards that do this for XML, there are a variety that do the same thing for JSON.

brandonbloom · on Oct 14, 2015

Another important feature is stream-oriented caching of values. Essentially some range of opcodes in the stream language are reserved for referring to a cache of recently seen values. It's a "dynamic" encoding feature that's super beneficial for common payloads, like JSON API responses with lots of similar keys. This is something a protobuf/thrift-like should steal, as it can often be more efficient than vanilla field-number + value encoding.

jimktrains2 · on Oct 14, 2015

> maps (with arbitrary scalar keys, not just strings)

Floating point keys just seem like a terrible idea.

dragonwriter · on Oct 14, 2015

> Floating point keys just seem like a terrible idea.

I don't see why floating point (with a defined precision) keys are any worse fundamentally than any finite-domain scalar keys. The applications for which they are useful may be limited, and they may be the wrong choice for some uses, but that's true of anything.

jimktrains2 · on Oct 14, 2015

> floating point (with a defined precision)

If it's fixed-precision it's not floating point, it's basically an integer we interpret with a decimal point somewhere when we display it. If you didn't mean fixed-precision, then I'm not sure how actual floating points could sanely act as a hash key.

dragonwriter · on Oct 14, 2015

> If it's fixed-precision it's not floating point

I'm not referring to fixed point, but fixed-size representation.

> If you didn't mean fixed-precision, then I'm not sure how actual floating points could sanely act as a hash key.

I don't see what the problem is. Certainly, there's problems if you do certain floating point calculations and blindly look for the result as a key in a mapping, but that's not a problem with floats as keys, its a problem with that particular usage pattern.

jimktrains2 · on Oct 14, 2015

> I'm not referring to fixed point, but fixed-size representation.

I'm still not sure what you mean as any IEEE floating point (which in reality all anyone uses) is fixed-size. If you decide to truncate, round, do w/e to this value, you may still not get exactly the same answer for similar computations or even the same computation on different machines. There isn't a reason you can't (well, shouldn't) do float_val == 0.0 -- it won't work in all circumstances.

Please explain your method of making sure float_val == 0.0 while retaining the essence of being floating point (i.e. not using fixed-point).

tel · on Oct 14, 2015

Even with a defined precision they're weird in that they "ought to" be modeling the reals and therefore lack computable equality.

Which makes it hard to write a well-defined hash.

khgvljhkb · on Oct 14, 2015

Am I the only one amazed by what the Clojure community and core team are conjuring up?

Doing client-side programming with things like CLJS, Figwheel, Reagent and core.async feels miles ahead of what we have in moden-js-land (es6/7, babel, webpack, React, promises).

If you were to start a startup today, would you be comfortable going with something like Clojure/script?

Mikera · on Oct 14, 2015

Absolutely happy with my choice of Clojure as a startup (doing data science, analytics, some front-end work).

Biggest wins for us are:

- ClojureScript / figwheel as an awesome front end development combination.

- The combination of functional programming with immutable data structures

- Lisp "magical powers" (macros, interactive REPL etc.)

- Ability to exploit the Java library ecosystem whenever you need it

Biggest downside = Lack of types.

escherize · on Oct 14, 2015

I run my startup on Clojure and cljs.

Everything you've mentioned has been awesome for us. The arrows we take as pioneers are things like having to roll libraries for email-as-a-service, stripe, etc. Granted it's usually easy but can be time consuming.

grayrest · on Oct 14, 2015

The Clojurescript community are ahead but not by that much. To list specifics, cljs->babel+and immutable lib, figwheel->react-hot-loader, reagent->react 0.14 pure components, core.async->js-csp (or async/await).

In terms of non-component organization, I believe re-frame is a significant improvement over redux. Reactions are good when you don't control the endpoints, splitting out the reducers into pure functions is good, but adding middleware on them is the real win. On the other hand, it wouldn't be that hard to adapt the model to redux.

The next phase of organization is integrating Relay/Falcor concepts. David Nolan gave a talk about this in Om Next at NYC Clojure last month and there is a video. Om Next as presented is very compelling if you're on Datomic and less so otherwise.

As to your startup question:

I've been a full time cljs dev on a b2b app at Reuters for the past 9 months. I took up the job specifically because I wanted to write cljs. I had been involved with the Clojure community (I care about state) but only working on toy projects for the previous ~4 years.

My experience with Clojurescript is that it was less of an improvement on modern js than I was expecting. The biggest advantages are protocols and the standard library being both rich and standard. Nice to haves but non-critical are native syntax for immutable maps and multimethods. I guesstimate I write ~10% less code in cljs versus js but you're ultimately writing the same stuff.

Problems I've run into:

Full build time for this app is long. Our app is in the 15k LoC range across ~150 files and cold compile is 140s on a 2012 MBA. It's annoying but incremental compilation times are sub-second after some build config tweaking.

We have one component in particular that tends to get lost when switching branches and the missing namespace forces a fresh compile. Our cljs version is from June so this may be fixed. I've also spent a number of hours debugging problems that turned out to be stale build issues.

I tried a couple times unsuccessfully to get Emacs (cider) to connect to a figwheel repl. After a few evals things simply become unresponsive. Just using figwheel is good enough but I miss the in-editor repl. Haven't tracked down the reason, could be my lack of emacs knowledge.

If you're using core.async, the main loop has a try/catch/rethrow. This causes Chrome Dev Tools to break in the outer loop instead of actually at the problem. You have to explicitly err.stack in the console (which is not source mapped) and don't have access to the locals unless you manually set a breakpoint at the error and reload. You also get to learn to read the JS representation of Clojure literals. None of this is impossible and if you're working in a tight loop you tend to have a pretty good idea of what the error is without jumping through debugger hoops but if you're doing something like switching branches or refactoring it's annoying.

I like Reagent but I've had a number of times where its behavior doesn't match my expectations. In particular, figuring out what part of the vdom is invalidated on a ratom change caused me problems. There's a gotcha that sequences must be forced with doall or you'll get weird behavior. At the moment I have a very expensive reaction (list processing ~6k items) that's getting run 8 times in response to a single key change in the source ratom so I'll be tracking that down tomorrow.

I don't consider this list a reason to not adopt Clojurescript. I can make a similar list for the Babel stack.

As for the question of would I be comfortable, I like writing clojurescript but I'd only really recommend it if you're committed to full stack Clojure. It takes a number of weeks for a new frontend hire to ramp up on the language. I've discussed this with the other frontend specialist on the team and our consensus is that cljs is a better language but we're not that much more productive in the language compared to ES2015 so I'm not really convinced the weeks of ramp up time are worth it. Our experience with hiring has been that we've had very few candidates but they've all been skilled.

Scarbutt · on Oct 14, 2015

The Clojurescript community are ahead but not by that much. To list specifics, cljs->babel+and immutable lib, figwheel->react-hot-loader, reagent->react 0.14 pure components, core.async->js-csp (or async/await).

Immutable, js-csp, etc... were all inspired by Clojurescript, I guess that's the point the grandparent was trying to make.

retrogradeorbit · on Oct 14, 2015

> If you're using core.async, the main loop has a try/catch/rethrow. This causes Chrome Dev Tools to break in the outer loop instead of actually at the problem.

OMG! Is that what is causing that behaviour?! It's frustrating (though not a show stopper) and I was unsure of the cause. Thanks for pointing that out. I will have a deeper look.

creshal · on Oct 14, 2015

Reinventing XML, one data type at a time.

andyjohnson0 · on Oct 14, 2015

My thoughts exactly. JSON is great for Javascript clients, but if you're dealing with clients written in multiple languages then there is already a good language-neutral serialisation format: XML. Just because it's not fashionable (with some) doesn't mean it doesn't work.

Edit: So why the downvotes? How about a conversation instead?

oskarth · on Oct 14, 2015

The downvotes are because it comes across as a shallow, middlebrow dismissal.

An interesting and useful criticism would first engage with the strongest arguments of Rich Hickey (creator of Transit and edn). If you find something in his Language of the System talk (https://www.youtube.com/watch?v=ROor6_NGIWU) that you either (a) disagree with (b) think XML solves already, I and many others would certainly be interested in having that conversation.

Note that this doesn't mean the exposition of "Why Transit" can't be better, but that calls for constructive criticism on how explain the ideas better, or a question made in qood faith. What it doesn't call for is a hostile reply saying in effect "pff, already been done already, stop reinventing the wheel".

calibraxis · on Oct 14, 2015

Transcript of that talk: https://github.com/matthiasn/talk-transcripts/blob/master/Hi...

rjbwork · on Oct 14, 2015

I used to agree, but I've lately come over to the JSON side, I think. It's easier to read by a human, and it doesn't have this weird "should it be a node or an attribute" thing. Single things are properties, many things are arrays.

And now with Schemas and editor support for them, I think it is an acceptable replacement personally.

xg15 · on Oct 14, 2015

Also whitespace handling. XML is still based on SGML and its idea that a document is basically a single string of text with certain substrings "marked up" with metadata. This assumption is of course completely false for the overwhelming majority of XML use cases. Yet, it still influences many design decisions and processing steps in the XML toolchain, the most prominent being that simply formatting an XML document may change its contents.

JSON is a lot more "honest" in this respect, in that it's core data model is already useful for many applications, even without additional standards bolted on top of it. (Though those exist, this article being one of them)

usrusr · on Oct 14, 2015

Explicit arrays are what I consider the most important advantage of JSON over XML. In XML, every element is repeatable by default and only out-of-band specification can restrict cardinality. Two bytes extra per array, that convey so much structural self-documentation that is just not there in an XML document.

So JSON succeeded because XML is not verbose enough? Really did not see that one coming.

andyjohnson0 · on Oct 14, 2015

Nodes for data, attributes for metadata.

I think you made a good argument, although I personally prefer the more mature XML tooling and metadata support for versioning.

krick · on Oct 14, 2015

It's hardly answers the question, as strictly speaking there's no such thing as "metadata". Everything you can pass to another person can be described as "data", whether it is a color of your coat or the statement that you are describing the color of your coat. Basically, you can think of any attribute of an object as "metadata" as long as it can be only one per object and doesn't have any attributes itself, and both things can change easily and depend on the point of view.

On the contrary, I don't remember one single case when the object was described easily using "node/attribute" separation, but couldn't be described as easily using JSON. In fact, I don't even think it's possible, as you can always make two children for each object: "attributes" and "nodes".

So I guess it actually is unnecessary complication and not the benefit of using XML over JSON.

HolyHaddock · on Oct 14, 2015

That's fine in theory, but it's not a universal heuristic, and even then, different smart well-intentioned people independently applying it won't necessarily come to the same result.

dragonwriter · on Oct 14, 2015

> Nodes for data, attributes for metadata.

Its a nice soundbite, but it ends up being less than useful in practice, because all metadata is data, and almost any data can be viewed as metadata, a distinction which is both subjective and strongly influenced by the use to which a consumer is putting the data rather than being determined on the basis solely of the inherent nature of the data.

andyjohnson0 · on Oct 14, 2015

Its a rule of thumb (heuristic). Judgement is still required.

(A soundbite is something altogether different.)

pjc50 · on Oct 14, 2015

http://programmers.stackexchange.com/questions/61198/if-xml-...

http://blog.codinghorror.com/xml-the-angle-bracket-tax/

http://nothing-more.blogspot.co.uk/2004/10/where-xml-goes-as...

SOAP's serialisation of RPC predated the popularity of JSON, and has largely been replaced by JSON for web-RPC through REST APIs. Why? Because it's needlessly verbose and complicated.

(I do worry that new serialisation formats are being developed in a vacuum, and we'll reinvent ASN1 or something)

vvanders · on Oct 14, 2015

Every XML parser I've seen is garbage on the performance side of things.

Personally I'm a fan of YAML for both being fast and human readable.

creshal · on Oct 14, 2015

JSON is pretty great to parse in a variety of (web-relevant) languages.

And it's a good data exchange format if you're okay with its loose typing.

But if I need a strongly-typed, extensible markup language, I'd think really hard about inventing my own…

wtbob · on Oct 14, 2015

Sure, XML may be decent as an extensible markup language, but what if you need a strongly-typed data exchange format?

dragonwriter · on Oct 14, 2015

> what if you need a strongly-typed data exchange format

I'm not convinced that "strongly-typed" is an attribute that can meaningfully be possessed by a data exchange format. "Strongly-typed" is about allowed actions in processing data, not about the inherently-static format of data exchange or serialization.

dec0dedab0de · on Oct 14, 2015

I'm going to risk sounding thick here, but why would you need a strongly-typed data exchange format? I always thought the beauty of JSON was that it forced the sender to organize the data in a generic way, which allowed the receiver to interpret it however needed.

prodigal_erik · on Oct 14, 2015

Without a strongly-typed data exchange format, you constantly have to write code that makes assumptions you hope the sender followed. Timestamps are ISO formatted. No, timestamps are seconds since 1970. No, they're floats including milliseconds. Blobs are base64. No, hex. Money is a string. No, it's a float. No, it's an int in micros. An empty array is different than an omitted field. No, they're the same. No, multiple values are comma-delimited strings.

creshal · on Oct 14, 2015

> Money is a string. No, it's a float. No, it's an int in micros

No, it's a decimal number with attribute-dependant accuracy settings, with at least three different accuracy levels used in the same object used for different purposes.

(Just something I came across today at work. Thankfully, my SOAP library handles that automatically…)

dec0dedab0de · on Oct 14, 2015

But no matter what format any bit of data is in, it should still be documented. Then any consumers of the data are going to have to convert it into a format that suits their language and application.

wtbob · on Oct 15, 2015

> But no matter what format any bit of data is in, it should still be documented.

That's what types are: enforced documentation.

dec0dedab0de · on Oct 15, 2015

That's what types are: enforced documentation.

That's an interesting way to think about it, but I'm not sure it applies here. From the example given about money being a string or a float. A type would just enforce either or, but the consumer of the data would still need to know what it is, and convert it to whatever is necessary for them.

icebraining · on Oct 14, 2015

Turtle!

Well, I can dream..

hyperpallium · on Oct 14, 2015

XML is eXtensible (via DTD or XS), but JSON is a better Object Notation than any Markup Language.

aerique · on Oct 14, 2015

Hopefully something better than XML comes of it before the enterprise gets its grubby hands on it!

togusa · on Oct 14, 2015

Enterprise here. We like XML because it's explicit.

   { number: 1.0000000000000000000000000000001 }

Try parsing the above JSON consistently in several languages without a consistent schema definition.

oneeyedpigeon · on Oct 14, 2015

True, any problem domain that requires accuracy to 31 decimal places might not be best suited to JSON. I wonder how many decimal places would be required to represent the percentage of projects that would actually be affected by that...

togusa · on Oct 14, 2015

That's not the problem I'm outlining. It is that semantics around type handling are implied rather than specified by schema.

oneeyedpigeon · on Oct 14, 2015

I don't understand your point. The semantics around type-handling are very clearly defined in the JSON spec: a number is simply a series of digits (plus decimal point, - sign, etc.). If you choose to interpret that JSON in a language with less than infinite precision, then you need to accept that some data loss can occur. That will happen if you're mushing an XML string into a number type, too.

orless · on Oct 14, 2015

I'm also not sure I'm getting you. In JSON, types can be unambiguously deferred by syntax/via gramar. You can't mistake JSONString for JSONNumber/JSONNullLiteral/JSONBooleanLiteral, the type is absolutely clear from the syntax. So I don't see how a schema would help here. What would the schema express? Surely not "this is a number" as it is clear without the schema. What then?

And what exactly do you mean by "semantics around type handling are implied", especially in `{ number: 1.0000000000000000000000000000001 }` case?

aldanor · on Oct 14, 2015

It's not valid JSON to start with.

togusa · on Oct 14, 2015

I know. This was intentional and I'm glad someone noticed it finally.

Run it through JSON lint ( http://jsonlint.com/ ), then fix the error (unquoted number text), then run it again and watch the data loss occur due to my original point...

nadam · on Oct 14, 2015

Then use this instead:

{ number: "1.0000000000000000000000000000001" }

How is it worse than XML?

zamalek · on Oct 14, 2015

As soon as you are making a schema for JSON you've eliminated XML with no good justification. More and more, people are re-inventing XML on top of JSON: schemas, namespace, the works. JSON is great because when you don't need all the things that XML solves it's extremely succint and readable. Use any single copied feature from XML and it turns out being more verbose and less readable than XML.

Use the right tool for the job.

raydev · on Oct 14, 2015

"As soon as you are making a schema for JSON you've eliminated XML with no good justification"

Sure, if you don't consider "human-readable" a good justification. Some of us do.

togusa · on Oct 14, 2015

People moan about XML readability somehow forget that it uses approximately the same syntax as HTML. Imagine HTML in JSON. Nope...

dragonwriter · on Oct 14, 2015

> People moan about XML readability somehow forget that it uses approximately the same syntax as HTML

No, they don't.

> Imagine HTML in JSON.

Yes, sure, JSON is a crappy text markup language, and would be much less readable than HTML for that purpose.

OTOH, readability when used as a markup language for content consisting largely of prose text and readability when used as a structured serialization format for data that doesn't mostly consist of large blocks of annotated prose isn't necessarily the same thing.

orless · on Oct 14, 2015

The point is, XML is quite good for unstructured, semi-structured and strongly-structured data and is more versatile from this point of view.

dragonwriter · on Oct 14, 2015

> The point is, XML is quite good for unstructured, semi-structured and strongly-structured data

That's a highly-subjective and controversial point.

(To me, XML is the Java of data languages -- its a lot worse than the best alternative considered on its own for almost any purpose -- though the best alternative will vary by purpose -- but it has a fairly wide range of uses for which its not intolerably bad, and its often a better choice than its inherent features would suggest because of the strength and maturity of the ecosystem around it.)

zamalek · on Oct 14, 2015

> best alternative

For 100% of the XML feature-set I don't actually know of a viable alternative. If you are using XML for the right reasons and the right way (rare) there is currently little or nothing that can replace. That being said, there are a vanishingly small amount of problems that actually require XML - namespaces and extensibility are two of them.

dragonwriter · on Oct 14, 2015

> For 100% of the XML feature-set I don't actually know of a viable alternative.

Real problems rarely need 100% of the XML feature set to solve. The breadth of the feature set is why there are lots of problems for which XML is a tolerable solution based on its inherent features (which in turn is a big factor in why it has such a large ecosystem), but they often don't make it the best solution (especially before considering the ecosystem, which is important in choosing a tool, but not a reason to avoid developing a new alternative, since any new alternative is going to start with an ecosystem disadvantage, but with adequate inherent value should be able over time to gather an ecosystem of tools.)

orless · on Oct 15, 2015

I should have probably said "equally good".

togusa · on Oct 14, 2015

See: https://news.ycombinator.com/item?id=10385610

nadam · on Oct 14, 2015

Ok, now I see that you are complaining that the JSON Schema language is not strong enough. In this case I think people should create better new schame languages for JSON, as I find that the JSON syntax is better suited to data than XML, which is better suited to marked up text. (maps, arrays and atomic literals are fundamental data concepts, while tags, attributes, free text are closer to mark-up. )

togusa · on Oct 14, 2015

The thing with XML is it's actually much lower level. There are no types inferred by an XML document. It's literally just chunks of data. JSON defines strings, boolean, maps, arrays, numbers.

It's conceptually easier to think about JSON with simple data sets but it's terribly inflexible and you have to think about how things are represented inside strings. A couple of thought exercises on JSON:

1. How do you represent an image inside JSON?

2. How do you represent a reference to another part of the data in JSON (consider a DAG for example)?

3. How do you represent an ordered set or an unordered set in JSON?

4. How do you represent an unsigned value in JSON?

neoeldex · on Oct 14, 2015

Try creating that number is several languages.

#/usr/bin/python n = 1.0000000000000000000000000000001 print n # 1.0

//Javascript n = 1.0000000000000000000000000000001; console.log(n); // 1

icebraining · on Oct 14, 2015

  from decimal import Decimal
  n = Decimal("1.0000000000000000000000000000001")

just because float is the default, doesn't mean you have to use it. JSON doesn't have anything else, though.

orless · on Oct 14, 2015

JSON does not have float (nor double) either. JSON has JSONNumber.

progx · on Oct 14, 2015

Feel free to write a schema definition for json data (or use the existing). Why should this not be possible?

togusa · on Oct 14, 2015

Well in this case, JSON schema doesn't specify any more fundamental types other than "number". On the receiving end of a wire or network contract, how do we pick a storage type for "number"? We can't because the constraints of the type are undefined. Ergo JSON schema isn't a strong schema language. Be explicit is really important when defining contracts.

Type this in your address bar for an illustration:

    javascript:alert(1.0000000000000000000000000000001);

So where can we go here. Yep:

   { number: "1.0000000000000000000000000000001" }

which means we then break the encapsulation boundary of the metadata. Then we have a wire contract that says "this is a string" and a separate semantic contract that says "this is a decimal".

XML:

   <number>1.0000000000000000000000000000001</number>

Schema:

   <xs:element name="number" type="xs:decimal"/>

This is just one example. We can also serialize and deserialize complex self-relational composite types transparently at both ends of the channel.

This is a real world problem we encounter in the financial sector every day.

tel · on Oct 14, 2015

Otoh, what's the difference between

    { "number": "1.0000000000000000000000000000001" }

along with

    { "number": { "type": "ModelReal", precision: 64 } }

versus

    <number>1.0000000000000000000000000000001</number>

along with

    <xs:element name="number" type="xs:decimal"/>

In each case you have text data given meaning by an external semantics enforced via a schema. The real and unavoidable downside is that JSON actually contains a really lousy primitive. It wouldn't be bad except practically every implementation of JSON automatically performs a lossy coercion to IEEE floats.

orless · on Oct 14, 2015

Well, the `xs:decimal` example is quite interesting. In the XML Schema specification on xs:decimal (http://www.w3.org/TR/xmlschema-2/#decimal), you'll see the following note:

All minimally conforming processors must support decimal numbers with a minimum of 18 decimal digits (i.e., with a totalDigits of 18).

You have 32 decimal digits in 1.0000000000000000000000000000001, so "minimally conforming processors" are, according to spec, actually free to drop everying after 17th place after comma in this case.

You say "constraints of the type are undefined". I don't see how "double-precision 64-bit format IEEE 754 value" from ECMAScript spec is lesser defined than "decimal numbers with a minimum of 18 decimal digits".

togusa · on Oct 14, 2015

JSON doesn't define the number format or precision which is a major issue. It's not IEEE754, its a string of digits. What you have done is prove the point that it's implementation specific, in this case JavaScript.

You're right about decimal precision however so I conceded there but the precision and capability is defined.

orless · on Oct 14, 2015

I just see no big difference between xs:decimal and JSONNumber. None of them is defined precisely enough to guarantee unambiguous handling of numbers like 1.0000000000000000000000000000001.

Tepix · on Oct 14, 2015

On Firefox, the maximum number of digits that yield values different to 1 are:

javascript:alert(1.000000000000001);

dustingetz · on Oct 14, 2015

If Transit is just a human readable XML that is easy to use, that's a great thing because people will actually use it!

Skinney · on Oct 14, 2015

Not really. Transit doesn't care about the underlying data-layer. It could be JSON, could be msgpack, could be XML. Currently, only JSON and msgpack backends exist. XML is probably ruled out due to performance.

Skinney · on Oct 14, 2015

Why the downvote? XML could also benefit from automated tag-shortening (in my last place of employement, we encoded XML-tags and attribute-keys manually).

escherize · on Oct 14, 2015

The most novel use of transit is in the Sente [1] library for clojure/script. It is an abstraction over long-polling / websockets that lets us treat it as a core.async channel (which is like a go-block in Go).

It's worked awesome for updates, and using Transit to keep the transmissions minimal has let us focus on the API for a realtime system.

[1] - https://github.com/ptaoussanis/sente#sente-channel-sockets-f...

retrogradeorbit · on Oct 14, 2015

In a project I tried both chord and sente and settled on chord. Chord was much simpler.

It was a point to point system. Sente seemed to have better support for point to multipoint (like a chat app). It was overkill for what we were doing and chord fit the bill nicely.

https://github.com/jarohen/chord

mukundmr · on Oct 14, 2015

Why choose this over Google's Protocol buffers? https://github.com/google/protobuf

mstade · on Oct 14, 2015

One reason for using something like Transit in browser-land is performance, believe it or not. According to tests we did (6+ months ago, so take with some salt) there was significantly less latency in deserializing Transit, which matters more than size on the wire in some cases. This is because Transit is just JSON with some semantics added, so in the simple case it's just a `JSON.parse` away from being a usable data structure in Javascript. Of course, if you actually want to make use of Transit features, you'll want a reader, but these can be made very performant (especially if you don't have to support all features) so latency can still be kept very low.

There are other reasons as well, but deserialization latency is a big one.

jzwinck · on Oct 14, 2015

One issue with Protobufs is they do not always enable optimal performance (latency, power). Better than JSON but worse than some others. Compared to MessagePack, Protobufs require a schema, which is not always convenient. Compared to Cap'n Proto, Protobufs are less CPU efficient. Trade offs all around.

zamalek · on Oct 14, 2015

It looks like Transit can be late-bound - like JSON: you don't need to know or negotiate the schema ahead of the serialization.

mateuszf · on Oct 14, 2015

One reason is compatibility - it can be encoded to json and thus is handled by http proxies, etc. It is also human readable in encoded form.

icebraining · on Oct 14, 2015

Protobuf is just a serialization format; you can send it over HTTP.

nly · on Oct 14, 2015

Not sure I like the idea of cramming ASCII type tags in to the encoded JSON.

I'm more partial to the way Avro does it, where the encoded JSON remains type-tag and cruft free, and a separate schema (also JSON) is used (and required) to interpret the types, or encode to the correct binary encoding.

Skinney · on Oct 14, 2015

Transit also allows you to use msgpack, but JSON is more performant on the web.

Another feature Transit has is that it caches identical keys (can also cache values with some additional code), giving you a smaller footprint.

nly · on Oct 14, 2015

Sure, but all that comes at the huge disadvantage of throwing away the ability for someone to just grab a JSON library and access your data adhoc. Now you have to fiddle with stripping away tildes and cache references and such.

At the same time, I don't see how you can do anything sensible with Transit except throw it in to a hashmap/dict alongside runtime type information. This is natural for dynamic languages, but, without a schema, you're left to write you're still left to write your own error-prone structural validation, and can't fully leverage the efficiency of static languages. MsgPack has this problem as well.

Optimising the JSON representation just doesn't seem worth it to me. Avro for example encodes enums by exposing the type names as JSON keys. It sucks, and it's something I'd change, but at the end of the day you're building on a poor transport. Caching reminds me a lot of DNS label compression, or a poor mans preset zlib dictionary.

Skinney · on Oct 14, 2015

The great thing about Transit is that I can extend the data layer with new types (like JodaTime). Even without that, I really like Transit's builtin support for sets, keywords, symbols, lists and vectors (which really helps in a clojure environment).

Another great thing (again, for someone using functional languages) is that Transit can ensure identity for equal values.

When it comes to structural validation, that is something I use prismatic.schema for. Of course, it helps that I can share the validation code between backend and server (using Clojure and ClojureScript).

When using plain JSON, I had to convert types myself after decoding/encoding, which was error-prone and gave a surprising amount of bugs (especially regarding dates).

Transit solves a big problem for me, while still retaining the easy to read syntax which JSON has, and also has great tooling in browsers.

Also, caching isn't only about smaller size. It allows for faster parsing of the initial data, allowing Transit to be just as fast as plain JSON for certain payloads, even considering that it has to expand from cache and decode values.

sandij · on Oct 14, 2015

These posts by the author of transit-js clear up some things and go in depth about performance:

http://swannodette.github.io/2014/07/23/a-closer-look-at-tra... http://swannodette.github.io/2014/07/26/transit--clojurescri... http://swannodette.github.io/2014/07/30/hijacking-json/

fnordsensei · on Oct 14, 2015

I recently used this in a project where I simply wanted a typing guarantee that JSON can't provide (i.e., that a timestamp really is a timestamp when it arrives on the other side, not a string like in JSON). It's very easy to use, more or less just a drop-in middleware.

kayamon · on Oct 14, 2015

I can't help but wonder if it isn't simpler just to use gzipped JSON. I'd be interested to see a size comparison of the two. It seems like they're going to an awful lot of work to hand-roll a suboptimal text compression scheme here.

easytiger · on Oct 14, 2015

It's the whole FAST/FIX clusterfuck all over again. Use a binary format and stop cocking around. You can't serialise/deserialise it faster and JSON is a string based mess. No low latency path can absorb it. Even influxdb have removed JSON support because it was their slowest part of their critical path that could not be optimised. Complete lack of mechanical sympathy + laziness is why JSON is popular outside of the web world

unimpressive · on Oct 14, 2015

>Complete lack of mechanical sympathy + laziness is why JSON is popular outside of the web world

I generally prefer JSON as a serialization format for a variety of reasons beyond what you cited:

1. Simplicity/plain text. Even a total newb could read a JSON file and probably puzzle out how the data works. This is important if I'm providing an export function or otherwise expect my users might want to shop somewhere else to deal with this data.

2. Forward compatibility/future proofing. If somebody finds my JSON file in twenty years with no documentation, (1) ensures that it's not going to be lost to them and they'll be able to extract the data with minimal difficulty.

3. Cross platform. It might not be the best format, but it's going to be supported in pretty much any language I would ever want to add support for the data to in a new system.

moomin · on Oct 14, 2015

Well, I think you need to look at the intended use cases for transit. On the browser, transit serialisation is faster than binary precisely because it uses the fast paths. For server to server communication, it can use msgpack.

There's also a) it's fully schemaless and b) you can add new primitives. Which you may or may not need. But if you're looking for something that does a) and b) while being fast for browser communication, transit's currently your best option.

zapov · on Oct 14, 2015

Easy there tiger ;)

What does it even mean that something is string based?

Here: https://github.com/eishay/jvm-serializers/wiki a JVM bench where JSON is pretty competitive with binary formats [and results would be event better if project owners actually merged PRs ;( ]

waxjar · on Oct 14, 2015

I don't think I would enjoy debugging a service that talks in a binary format.

to3m · on Oct 14, 2015

I've never done it with JS. It's not really a problem in C, though. If there's a lot of data, you're no better off with text; if there's not much, reading binary data isn't too bad. You've usually got other problems on top of decoding the actual data - like, why is this data coming in the first place? Why isn't the code accepting it properly? Why isn't the other end listening? The data encoding is just the tip of the iceberg.

Either way, the advantages of binary often outweigh the difficulty of having a human read the data. As the saying goes, code is executed far more often than it's read.

(Most binary formats have some fairly obvious possible text representations, at least for key fields (they're just encoding ints, floats, strings, bitfields, etc.). When you're having difficulty, or you're simply curious, you can print them out. If the format is any good, malformed messages are easy to check for in code - this code is not harder to debug than any other. If anything it's actually easier than with some kind of text format.)

waxjar · on Oct 14, 2015

Sure, but it's not very convenient.

I regularly MITM a connection between a mobile app and a HTTP server, when something isn't going quite right. A look at the JSON they exchange exposes the problem in under a minute more often than not. If I want to test something out quickly, I simply modify the incoming / outgoing JSON by hand. It's rare to get the syntax wrong and involves no context switches. I can then go back to the code, find the relevant section and make the changes I need to make. I find this a very convenient way of debugging and I don't think it would be as nice to do this with a binary format.

falcolas · on Oct 14, 2015

That ship is at the edge of the harbor, sails raised, and helmed by HTTP2.0. Binary formats are becoming the default, not the exception.

waxjar · on Oct 14, 2015

True. Fortunately, we can be reasonably sure there will be multiple good tools to debug something talking over HTTP 2.0. With something like Transit, I'm not so sure.

djKianoosh · on Oct 14, 2015

debugging the data, no. debugging the code, depending on how the representation looks like, probably wouldn't be too bad...

dantiberian · on Oct 14, 2015

From the announcement blog post:

Transit is self describing using tags, and encourages transmitting information using maps, which have named keys/fields which will repeat often in data. These overheads, typical of self-describing formats, are mitigated in Transit by an integrated cache code system, which replaces repetitive data with small codes. This yields not only a reduction in size, but also an increase in performance and memory utilization. Contrast this with gzipping, which, while it may reduce size on the wire, takes time and doesn't reduce the amount of text to be parsed or the number of objects generated in memory after parsing.

- http://blog.cognitect.com/blog/2014/7/22/transit

hyperpallium · on Oct 14, 2015

> The extension mechanism is open, allowing programs using Transit to add new elements specific to their needs. Users of data formats without such facilities must rely on either schemas, convention, or context to convey elements not included in the base set,

The extension mechanism is writing handlers in all languages communicated with, since its stated purpose is cross language value conveyance.

In contrast, a schema language allows extensions to be described once, in one language.

I was expecting this to be a sort of macros for data notation (an inline schema language), but it seems more like an extendible serialization library.

dragonwriter · on Oct 14, 2015

> In contrast, a schema language allows extensions to be described once, in one language.

No, it doesn't. A schema language allows that to be done for the syntax, but requires the semantics to be implemented for each language. Schema languages often include, as part of their specifications, core types which must be supported; when an application restricts itself to these core types and types whose only important semantics are derived from them (e.g., restricted subsets of core types in most cases), then the fact that every full implementation that supports the core types will already have this work done for every language means no additional work is necessary. But that's not a product of a schema language preventing types to be implemented for all host languages, that's a result of the fact that a predefined set of core types accompanying the schema language means that all implementations are required to have already done the work of implementing the core types for all languages.

Skinney · on Oct 14, 2015

Do you have an example of such a schema language?

Perceptes · on Oct 14, 2015

Previous HN discussion from when it was announced last year: https://news.ycombinator.com/item?id=8069346

agentgt · on Oct 14, 2015

The main advantage it seems for Transit is that JSON is fast for many different clients. Otherwise I'm not sure I would ever use it for internal services considering there are so many other probably faster options (Protobuf,Avro,SBE ... etc).

Are folks using it for internal services?

nablaone · on Oct 14, 2015

I think i saw it before:

http://www.lispworks.com/documentation/HyperSpec/Body/02_dh....

agopaul · on Oct 14, 2015

So it's basically a set of libraries used to marshal/unmarshal objects without using a schema or can it also be used as an RPC library?

maweki · on Oct 14, 2015

So the python implementation is 2.7 only...

latenightcoding · on Oct 14, 2015

Very interesting (just commenting to read it later)

icebraining · on Oct 14, 2015

You can just upvote it, then go to https://news.ycombinator.com/saved?id=latenightcoding

Or use bookmarks :)