Transit: JSON Data Interchange Format (github.com/cognitect)
164 points by dedalus on Oct 14, 2015 | 118 comments



I couldn't get the why of the project from the Github page alone. Rich Hickey's post introducing it a year ago is clearer:

http://blog.cognitect.com/blog/2014/7/22/transit

JSON has become the go-to schemaless format between languages, despite being verbose and having problems with typing. Transit aims to be a general-purpose successor to JSON here.

Stronger typing allows the optional use of a more compact binary format (using MessagePack). Otherwise it too uses JSON on the wire.

Anyone who knows more, please correct me.


> JSON has become the go-to schemaless format between languages, despite being verbose and having problems with typing. Transit aims to be a general-purpose successor to JSON here.

XML has become the go-to schemaless format between languages, despite being verbose and having problems with typing. JSON aims to be a general-purpose successor to XML here.


Yes, except that XML has a schema (XSD).



Quoting http://json-schema.org/latest/json-schema-core.html: "This Internet-Draft will expire on August 3, 2013."


... which, by the way, does not mean that JSON Schema is bad or unusable. I actually quite like it; it has a certain elegance. It's just not really a standard.


That's an exterior thing. It's perfectly reasonable to talk about XML the schemaless markup language.


History repeats itself, after all.


XML is not schemaless. It was designed to be extensible through schemas. It's in the name, after all: eXtensible Markup Language.

XML also doesn't suffer the problems of a limited set of pre-defined types that JSON has. Its type system is extensible through schemas.


> XML is not schemaless.

Yes, it is, though there are a variety of schema schemes that can be used with XML.

> It was designed to be extensible through schemas.

It was designed to be extensible through custom tags (in fact, all tags are custom tags); it can optionally be restricted through the use of schemas, not extended.

> XML also doesn't suffer the problems of a limited set of pre-defined types that JSON has.

Yes, it does, with an even smaller number of types. (Strings, with a couple different representations.)

> Its type system is extensible through schemas.

No, it isn't. Its structure can be constrained through schemas (XML Schema, Relax NG, etc.), which can provide a means of modeling a different type system on top of the data structure supported by XML, but that isn't really "extending" XML's type system any more than using a Haskell compiler to generate x86 machine code is "extending" the type system of x86 machine code; it's just providing a completely separate type system on a different layer.

Also, no matter how you view this, it's not a distinction between XML and JSON, since you can do the same thing with JSON, and just as there are a variety of different schema standards that do this for XML, there are a variety that do the same thing for JSON.


Another important feature is stream-oriented caching of values. Essentially some range of opcodes in the stream language are reserved for referring to a cache of recently seen values. It's a "dynamic" encoding feature that's super beneficial for common payloads, like JSON API responses with lots of similar keys. This is something a protobuf/thrift-like should steal, as it can often be more efficient than vanilla field-number + value encoding.
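
A minimal sketch of that caching idea, not Transit's actual wire format: the writer replaces a repeated key with a short reference into a table that both sides build in the same order. The `^N` reference syntax and the function names are assumptions made for illustration only.

```python
def encode_keys(keys):
    """Replace repeated keys with short back-references (hypothetical "^N" codes)."""
    seen = {}  # key -> index in the cache, in first-seen order
    out = []
    for key in keys:
        if key in seen:
            out.append("^" + str(seen[key]))
        else:
            seen[key] = len(seen)
            out.append(key)
    return out


def decode_keys(tokens):
    """Rebuild the original keys by replaying the same cache on the reader side."""
    cache = []
    out = []
    for tok in tokens:
        if tok.startswith("^"):
            out.append(cache[int(tok[1:])])
        else:
            cache.append(tok)
            out.append(tok)
    return out


stream = ["first_name", "last_name", "first_name", "last_name", "first_name"]
encoded = encode_keys(stream)
print(encoded)  # ['first_name', 'last_name', '^0', '^1', '^0']
assert decode_keys(encoded) == stream
```

A real format also has to escape keys that happen to start with the reference character and bound the cache size; this sketch does neither.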


> maps (with arbitrary scalar keys, not just strings)

Floating point keys just seem like a terrible idea.


> Floating point keys just seem like a terrible idea.

I don't see why floating point (with a defined precision) keys are any worse fundamentally than any finite-domain scalar keys. The applications for which they are useful may be limited, and they may be the wrong choice for some uses, but that's true of anything.


> floating point (with a defined precision)

If it's fixed-precision it's not floating point, it's basically an integer we interpret with a decimal point somewhere when we display it. If you didn't mean fixed-precision, then I'm not sure how actual floating points could sanely act as a hash key.


> If it's fixed-precision it's not floating point

I'm not referring to fixed point, but fixed-size representation.

> If you didn't mean fixed-precision, then I'm not sure how actual floating points could sanely act as a hash key.

I don't see what the problem is. Certainly, there are problems if you do certain floating point calculations and blindly look for the result as a key in a mapping, but that's not a problem with floats as keys; it's a problem with that particular usage pattern.
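
A small Python illustration of that usage pattern, assuming IEEE 754 doubles (which is what CPython uses for `float`). The dictionary contents are made up.

```python
prices = {0.3: "thirty cents"}

# A literal key works: the same double hashes and compares equal to itself.
literal_hit = prices.get(0.3)

# A computed key that "should" be 0.3 misses, because 0.1 + 0.2
# rounds to a different double than the literal 0.3.
computed_hit = prices.get(0.1 + 0.2)

print(literal_hit)   # thirty cents
print(computed_hit)  # None
```

The float key itself behaves like any other hashable value; the miss comes from looking up the result of arithmetic.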


> I'm not referring to fixed point, but fixed-size representation.

I'm still not sure what you mean, as any IEEE floating point (which in reality is all anyone uses) is fixed-size. If you decide to truncate, round, or do whatever to this value, you may still not get exactly the same answer for similar computations or even the same computation on different machines. There is a reason you can't (well, shouldn't) do float_val == 0.0: it won't work in all circumstances.

Please explain your method of making sure float_val == 0.0 while retaining the essence of being floating point (i.e. not using fixed-point).


Even with a defined precision they're weird in that they "ought to" be modeling the reals and therefore lack computable equality.

Which makes it hard to write a well-defined hash.


Am I the only one amazed by what the Clojure community and core team are conjuring up?

Doing client-side programming with things like CLJS, Figwheel, Reagent and core.async feels miles ahead of what we have in modern-js-land (es6/7, babel, webpack, React, promises).

If you were to start a startup today, would you be comfortable going with something like Clojure/script?


Absolutely happy with my choice of Clojure as a startup (doing data science, analytics, some front-end work).

Biggest wins for us are:

- ClojureScript / figwheel as an awesome front end development combination.

- The combination of functional programming with immutable data structures

- Lisp "magical powers" (macros, interactive REPL etc.)

- Ability to exploit the Java library ecosystem whenever you need it

Biggest downside = Lack of types.


I run my startup on Clojure and cljs.

Everything you've mentioned has been awesome for us. The arrows we take as pioneers are things like having to roll libraries for email-as-a-service, stripe, etc. Granted it's usually easy but can be time consuming.


The Clojurescript community are ahead but not by that much. To list specifics: cljs -> babel + an immutable lib, figwheel -> react-hot-loader, reagent -> react 0.14 pure components, core.async -> js-csp (or async/await).

In terms of non-component organization, I believe re-frame is a significant improvement over redux. Reactions are good when you don't control the endpoints, splitting out the reducers into pure functions is good, but adding middleware on them is the real win. On the other hand, it wouldn't be that hard to adapt the model to redux.

The next phase of organization is integrating Relay/Falcor concepts. David Nolen gave a talk about this in Om Next at NYC Clojure last month and there is a video. Om Next as presented is very compelling if you're on Datomic and less so otherwise.

As to your startup question:

I've been a full time cljs dev on a b2b app at Reuters for the past 9 months. I took up the job specifically because I wanted to write cljs. I had been involved with the Clojure community (I care about state) but only working on toy projects for the previous ~4 years.

My experience with Clojurescript is that it was less of an improvement on modern js than I was expecting. The biggest advantages are protocols and the standard library being both rich and standard. Nice to haves but non-critical are native syntax for immutable maps and multimethods. I guesstimate I write ~10% less code in cljs versus js but you're ultimately writing the same stuff.

Problems I've run into:

Full build time for this app is long. Our app is in the 15k LoC range across ~150 files and cold compile is 140s on a 2012 MBA. It's annoying but incremental compilation times are sub-second after some build config tweaking.

We have one component in particular that tends to get lost when switching branches and the missing namespace forces a fresh compile. Our cljs version is from June so this may be fixed. I've also spent a number of hours debugging problems that turned out to be stale build issues.

I tried a couple times unsuccessfully to get Emacs (cider) to connect to a figwheel repl. After a few evals things simply become unresponsive. Just using figwheel is good enough but I miss the in-editor repl. Haven't tracked down the reason, could be my lack of emacs knowledge.

If you're using core.async, the main loop has a try/catch/rethrow. This causes Chrome Dev Tools to break in the outer loop instead of actually at the problem. You have to explicitly err.stack in the console (which is not source mapped) and don't have access to the locals unless you manually set a breakpoint at the error and reload. You also get to learn to read the JS representation of Clojure literals. None of this is impossible and if you're working in a tight loop you tend to have a pretty good idea of what the error is without jumping through debugger hoops but if you're doing something like switching branches or refactoring it's annoying.

I like Reagent but I've had a number of times where its behavior doesn't match my expectations. In particular, figuring out what part of the vdom is invalidated on a ratom change caused me problems. There's a gotcha that sequences must be forced with doall or you'll get weird behavior. At the moment I have a very expensive reaction (list processing ~6k items) that's getting run 8 times in response to a single key change in the source ratom so I'll be tracking that down tomorrow.

I don't consider this list a reason to not adopt Clojurescript. I can make a similar list for the Babel stack.

As for the question of would I be comfortable, I like writing clojurescript but I'd only really recommend it if you're committed to full stack Clojure. It takes a number of weeks for a new frontend hire to ramp up on the language. I've discussed this with the other frontend specialist on the team and our consensus is that cljs is a better language but we're not that much more productive in the language compared to ES2015 so I'm not really convinced the weeks of ramp up time are worth it. Our experience with hiring has been that we've had very few candidates but they've all been skilled.


> The Clojurescript community are ahead but not by that much. To list specifics: cljs -> babel + an immutable lib, figwheel -> react-hot-loader, reagent -> react 0.14 pure components, core.async -> js-csp (or async/await).

Immutable, js-csp, etc... were all inspired by Clojurescript, I guess that's the point the grandparent was trying to make.


> If you're using core.async, the main loop has a try/catch/rethrow. This causes Chrome Dev Tools to break in the outer loop instead of actually at the problem.

OMG! Is that what is causing that behaviour?! It's frustrating (though not a show stopper) and I was unsure of the cause. Thanks for pointing that out. I will have a deeper look.


Reinventing XML, one data type at a time.


My thoughts exactly. JSON is great for Javascript clients, but if you're dealing with clients written in multiple languages then there is already a good language-neutral serialisation format: XML. Just because it's not fashionable (with some) doesn't mean it doesn't work.

Edit: So why the downvotes? How about a conversation instead?


The downvotes are because it comes across as a shallow, middlebrow dismissal.

An interesting and useful criticism would first engage with the strongest arguments of Rich Hickey (creator of Transit and edn). If you find something in his Language of the System talk (https://www.youtube.com/watch?v=ROor6_NGIWU) that you either (a) disagree with or (b) think XML already solves, I and many others would certainly be interested in having that conversation.

Note that this doesn't mean the exposition of "Why Transit" can't be better, but that calls for constructive criticism on how to explain the ideas better, or a question asked in good faith. What it doesn't call for is a hostile reply saying in effect "pff, it's been done already, stop reinventing the wheel".



I used to agree, but I've lately come over to the JSON side, I think. It's easier to read by a human, and it doesn't have this weird "should it be a node or an attribute" thing. Single things are properties, many things are arrays.

And now with Schemas and editor support for them, I think it is an acceptable replacement personally.


Also whitespace handling. XML is still based on SGML and its idea that a document is basically a single string of text with certain substrings "marked up" with metadata. This assumption is of course completely false for the overwhelming majority of XML use cases. Yet, it still influences many design decisions and processing steps in the XML toolchain, the most prominent being that simply formatting an XML document may change its contents.
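
A short illustration of the formatting point using Python's standard `xml.etree.ElementTree`. This shows one parser's data model; other toolchains may expose the whitespace differently.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<a><b>x</b></a>")
pretty = ET.fromstring("<a>\n  <b>x</b>\n</a>")

# The indentation a formatter adds becomes text content in the tree.
print(repr(compact.text))    # None
print(repr(pretty.text))     # '\n  '
print(repr(pretty[0].tail))  # '\n'
```

Both documents describe the same data to a human reader, but the parsed trees are not identical.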

JSON is a lot more "honest" in this respect, in that its core data model is already useful for many applications, even without additional standards bolted on top of it. (Though those exist, this article being one of them.)


Explicit arrays are what I consider the most important advantage of JSON over XML. In XML, every element is repeatable by default and only out-of-band specification can restrict cardinality. Two bytes extra per array, that convey so much structural self-documentation that is just not there in an XML document.

So JSON succeeded because XML is not verbose enough? Really did not see that one coming.


Nodes for data, attributes for metadata.

I think you made a good argument, although I personally prefer the more mature XML tooling and metadata support for versioning.


It hardly answers the question, as strictly speaking there's no such thing as "metadata". Everything you can pass to another person can be described as "data", whether it is the color of your coat or the statement that you are describing the color of your coat. Basically, you can think of any attribute of an object as "metadata" as long as there can be only one per object and it doesn't have any attributes itself, and both things can change easily and depend on the point of view.

On the contrary, I don't remember one single case when the object was described easily using "node/attribute" separation, but couldn't be described as easily using JSON. In fact, I don't even think it's possible, as you can always make two children for each object: "attributes" and "nodes".
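
A minimal sketch of the "two children per object" mapping described above, using Python's standard library. The key names `tag`, `attributes`, `text` and `nodes` and the sample document are assumptions for illustration; text that follows a child element (tail text) is ignored here.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map an XML element to a JSON-ready dict with "attributes" and "nodes" children."""
    return {
        "tag": elem.tag,
        "attributes": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "nodes": [element_to_dict(child) for child in elem],
    }


doc = ET.fromstring('<coat color="red"><button size="2">brass</button></coat>')
as_dict = element_to_dict(doc)
print(json.dumps(as_dict, indent=2))
```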

So I guess it actually is unnecessary complication and not the benefit of using XML over JSON.


That's fine in theory, but it's not a universal heuristic, and even then, different smart well-intentioned people independently applying it won't necessarily come to the same result.


> Nodes for data, attributes for metadata.

It's a nice soundbite, but it ends up being less than useful in practice, because all metadata is data, and almost any data can be viewed as metadata, a distinction which is both subjective and strongly influenced by the use to which a consumer is putting the data rather than being determined on the basis solely of the inherent nature of the data.


It's a rule of thumb (heuristic). Judgement is still required.

(A soundbite is something altogether different.)


http://programmers.stackexchange.com/questions/61198/if-xml-...

http://blog.codinghorror.com/xml-the-angle-bracket-tax/

http://nothing-more.blogspot.co.uk/2004/10/where-xml-goes-as...

SOAP's serialisation of RPC predated the popularity of JSON, and has largely been replaced by JSON for web-RPC through REST APIs. Why? Because it's needlessly verbose and complicated.

(I do worry that new serialisation formats are being developed in a vacuum, and we'll reinvent ASN1 or something)


Every XML parser I've seen is garbage on the performance side of things.

Personally I'm a fan of YAML for both being fast and human readable.


JSON is pretty great to parse in a variety of (web-relevant) languages.

And it's a good data exchange format if you're okay with its loose typing.

But if I need a strongly-typed, extensible markup language, I'd think really hard about inventing my own…


Sure, XML may be decent as an extensible markup language, but what if you need a strongly-typed data exchange format?


> what if you need a strongly-typed data exchange format

I'm not convinced that "strongly-typed" is an attribute that can meaningfully be possessed by a data exchange format. "Strongly-typed" is about allowed actions in processing data, not about the inherently-static format of data exchange or serialization.


I'm going to risk sounding thick here, but why would you need a strongly-typed data exchange format? I always thought the beauty of JSON was that it forced the sender to organize the data in a generic way, which allowed the receiver to interpret it however needed.


Without a strongly-typed data exchange format, you constantly have to write code that makes assumptions you hope the sender followed. Timestamps are ISO formatted. No, timestamps are seconds since 1970. No, they're floats including milliseconds. Blobs are base64. No, hex. Money is a string. No, it's a float. No, it's an int in micros. An empty array is different than an omitted field. No, they're the same. No, multiple values are comma-delimited strings.
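
A sketch of what that guessing looks like on the receiving side. The function name and the branch choices are hypothetical; each branch encodes an assumption about the sender that nothing on the wire confirms.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Guess what the sender meant; every branch is an assumption."""
    if isinstance(value, str):
        # Assumes ISO 8601 text with an explicit offset.
        return datetime.fromisoformat(value)
    if isinstance(value, (int, float)):
        # Assumes seconds since 1970. If the sender meant milliseconds,
        # this is silently wrong or raises, depending on the platform.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    raise TypeError("unsupported timestamp representation: %r" % (value,))


from_iso = parse_timestamp("2015-10-14T12:00:00+00:00")
from_seconds = parse_timestamp(1444824000)
from_float = parse_timestamp(1444824000.0)
print(from_iso == from_seconds == from_float)  # True
```

The three inputs agree only because the sketch was written knowing what each one meant, which is exactly the knowledge the wire format does not carry.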


> Money is a string. No, it's a float. No, it's an int in micros

No, it's a decimal number with attribute-dependent accuracy settings, with at least three different accuracy levels used in the same object for different purposes.

(Just something I came across today at work. Thankfully, my SOAP library handles that automatically…)


But no matter what format any bit of data is in, it should still be documented. Then any consumers of the data are going to have to convert it into a format that suits their language and application.


> But no matter what format any bit of data is in, it should still be documented.

That's what types are: enforced documentation.


> That's what types are: enforced documentation.

That's an interesting way to think about it, but I'm not sure it applies here. From the example given about money being a string or a float. A type would just enforce either or, but the consumer of the data would still need to know what it is, and convert it to whatever is necessary for them.


Turtle!

Well, I can dream..


XML is eXtensible (via DTD or XS), but JSON is a better Object Notation than any Markup Language.


Hopefully something better than XML comes of it before the enterprise gets its grubby hands on it!


Enterprise here. We like XML because it's explicit.

   { number: 1.0000000000000000000000000000001 }
Try parsing the above JSON consistently in several languages without a consistent schema definition.


True, any problem domain that requires accuracy to 31 decimal places might not be best suited to JSON. I wonder how many decimal places would be required to represent the percentage of projects that would actually be affected by that...


That's not the problem I'm outlining. It is that semantics around type handling are implied rather than specified by schema.


I don't understand your point. The semantics around type-handling are very clearly defined in the JSON spec: a number is simply a series of digits (plus decimal point, - sign, etc.). If you choose to interpret that JSON in a language with less than infinite precision, then you need to accept that some data loss can occur. That will happen if you're mushing an XML string into a number type, too.
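
The same loss shown with XML text, as a small Python example using only the standard library. The element name mirrors the one used elsewhere in this thread.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

elem = ET.fromstring("<number>1.0000000000000000000000000000001</number>")

as_float = float(elem.text)      # lossy: nearest IEEE 754 double
as_decimal = Decimal(elem.text)  # exact: keeps every digit of the text

print(as_float)    # 1.0
print(as_decimal)  # 1.0000000000000000000000000000001
```

The markup carries the digits either way; what survives depends on the type the consumer chooses.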


I'm also not sure I'm getting you. In JSON, types can be unambiguously inferred from syntax via the grammar. You can't mistake JSONString for JSONNumber/JSONNullLiteral/JSONBooleanLiteral; the type is absolutely clear from the syntax. So I don't see how a schema would help here. What would the schema express? Surely not "this is a number" as it is clear without the schema. What then?

And what exactly do you mean by "semantics around type handling are implied", especially in `{ number: 1.0000000000000000000000000000001 }` case?


It's not valid JSON to start with.


I know. This was intentional and I'm glad someone noticed it finally.

Run it through JSON lint ( http://jsonlint.com/ ), then fix the error (unquoted number text), then run it again and watch the data loss occur due to my original point...


Then use this instead:

{ number: "1.0000000000000000000000000000001" }

How is it worse than XML?


As soon as you are making a schema for JSON you've eliminated XML with no good justification. More and more, people are re-inventing XML on top of JSON: schemas, namespaces, the works. JSON is great because when you don't need all the things that XML solves it's extremely succinct and readable. Use any single copied feature from XML and it turns out being more verbose and less readable than XML.

Use the right tool for the job.


"As soon as you are making a schema for JSON you've eliminated XML with no good justification"

Sure, if you don't consider "human-readable" a good justification. Some of us do.


People who moan about XML readability somehow forget that it uses approximately the same syntax as HTML. Imagine HTML in JSON. Nope...


> People moan about XML readability somehow forget that it uses approximately the same syntax as HTML

No, they don't.

> Imagine HTML in JSON.

Yes, sure, JSON is a crappy text markup language, and would be much less readable than HTML for that purpose.

OTOH, readability when used as a markup language for content consisting largely of prose text and readability when used as a structured serialization format for data that doesn't mostly consist of large blocks of annotated prose isn't necessarily the same thing.


The point is, XML is quite good for unstructured, semi-structured and strongly-structured data and is more versatile from this point of view.


> The point is, XML is quite good for unstructured, semi-structured and strongly-structured data

That's a highly-subjective and controversial point.

(To me, XML is the Java of data languages -- it's a lot worse than the best alternative considered on its own for almost any purpose -- though the best alternative will vary by purpose -- but it has a fairly wide range of uses for which it's not intolerably bad, and it's often a better choice than its inherent features would suggest because of the strength and maturity of the ecosystem around it.)


> best alternative

For 100% of the XML feature-set I don't actually know of a viable alternative. If you are using XML for the right reasons and the right way (rare) there is currently little or nothing that can replace it. That being said, there are a vanishingly small number of problems that actually require XML - namespaces and extensibility are two of them.


> For 100% of the XML feature-set I don't actually know of a viable alternative.

Real problems rarely need 100% of the XML feature set to solve. The breadth of the feature set is why there are lots of problems for which XML is a tolerable solution based on its inherent features (which in turn is a big factor in why it has such a large ecosystem), but they often don't make it the best solution (especially before considering the ecosystem, which is important in choosing a tool, but not a reason to avoid developing a new alternative, since any new alternative is going to start with an ecosystem disadvantage, but with adequate inherent value should be able over time to gather an ecosystem of tools.)


I should have probably said "equally good".



Ok, now I see that you are complaining that the JSON Schema language is not strong enough. In this case I think people should create better new schema languages for JSON, as I find that the JSON syntax is better suited to data than XML, which is better suited to marked up text. (Maps, arrays and atomic literals are fundamental data concepts, while tags, attributes and free text are closer to mark-up.)


The thing with XML is it's actually much lower level. There are no types inferred by an XML document. It's literally just chunks of data. JSON defines strings, boolean, maps, arrays, numbers.

It's conceptually easier to think about JSON with simple data sets but it's terribly inflexible and you have to think about how things are represented inside strings. A couple of thought exercises on JSON:

1. How do you represent an image inside JSON?

2. How do you represent a reference to another part of the data in JSON (consider a DAG for example)?

3. How do you represent an ordered set or an unordered set in JSON?

4. How do you represent an unsigned value in JSON?
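
One possible set of answers to these exercises, sketched in Python. Every convention here (base64 for binary, `id` fields for references, arrays standing in for sets, a documented range for unsigned values) is an agreement between sender and receiver, not something JSON itself provides; the field names are made up.

```python
import base64
import json

image_bytes = b"\x89PNG\r\n\x1a\n"  # the PNG signature, standing in for a real image

payload = {
    # 1. Binary data: base64 text, by convention only.
    "image": base64.b64encode(image_bytes).decode("ascii"),
    # 2. References: nodes carry ids and edges name those ids.
    "nodes": [
        {"id": "a", "next": ["b", "c"]},
        {"id": "b", "next": ["c"]},
        {"id": "c", "next": []},
    ],
    # 3. Sets: an array both sides agree to treat as unordered and unique.
    "tags": sorted({"json", "xml"}),
    # 4. Unsigned values: a plain number plus an out-of-band range rule.
    "count": 4294967295,
}

decoded = json.loads(json.dumps(payload))
assert base64.b64decode(decoded["image"]) == image_bytes
assert set(decoded["tags"]) == {"json", "xml"}
assert 0 <= decoded["count"] <= 2**32 - 1
```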


Try creating that number in several languages.

  #!/usr/bin/python
  n = 1.0000000000000000000000000000001
  print(n)  # 1.0

  // JavaScript
  n = 1.0000000000000000000000000000001;
  console.log(n); // 1


  from decimal import Decimal
  n = Decimal("1.0000000000000000000000000000001")
just because float is the default, doesn't mean you have to use it. JSON doesn't have anything else, though.


JSON does not have float (nor double) either. JSON has JSONNumber.


Feel free to write a schema definition for json data (or use the existing). Why should this not be possible?


Well, in this case, JSON schema doesn't specify any more fundamental types other than "number". On the receiving end of a wire or network contract, how do we pick a storage type for "number"? We can't because the constraints of the type are undefined. Ergo JSON schema isn't a strong schema language. Being explicit is really important when defining contracts.

Type this in your address bar for an illustration:

    javascript:alert(1.0000000000000000000000000000001);
So where can we go here. Yep:

   { number: "1.0000000000000000000000000000001" }
which means we then break the encapsulation boundary of the metadata. Then we have a wire contract that says "this is a string" and a separate semantic contract that says "this is a decimal".

XML:

   <number>1.0000000000000000000000000000001</number>
Schema:

   <xs:element name="number" type="xs:decimal"/>
This is just one example. We can also serialize and deserialize complex self-relational composite types transparently at both ends of the channel.

This is a real world problem we encounter in the financial sector every day.


Otoh, what's the difference between

    { "number": "1.0000000000000000000000000000001" }
along with

    { "number": { "type": "ModelReal", "precision": 64 } }
versus

    <number>1.0000000000000000000000000000001</number>
along with

    <xs:element name="number" type="xs:decimal"/>
In each case you have text data given meaning by an external semantics enforced via a schema. The real and unavoidable downside is that JSON actually contains a really lousy primitive. It wouldn't be bad except practically every implementation of JSON automatically performs a lossy coercion to IEEE floats.
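
The coercion is a parser default rather than a property of the text. As one example, Python's standard `json` module accepts a `parse_float` hook:

```python
import json
from decimal import Decimal

text = '{"number": 1.0000000000000000000000000000001}'

default = json.loads(text)                     # number becomes a float
exact = json.loads(text, parse_float=Decimal)  # number becomes a Decimal

print(default["number"])  # 1.0
print(exact["number"])    # 1.0000000000000000000000000000001
```

Whether other JSON libraries offer an equivalent hook varies, so this should not be read as a general guarantee.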


Well, the `xs:decimal` example is quite interesting. In the XML Schema specification on xs:decimal (http://www.w3.org/TR/xmlschema-2/#decimal), you'll see the following note:

All minimally conforming processors must support decimal numbers with a minimum of 18 decimal digits (i.e., with a totalDigits of 18).

You have 32 decimal digits in 1.0000000000000000000000000000001, so "minimally conforming processors" are, according to spec, actually free to drop everything after the 17th place after the decimal point in this case.
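
To make the arithmetic concrete, here is a Python model of a processor that keeps exactly 18 significant digits. This is an assumption about how a minimal implementation might behave, not a statement about any particular XML Schema processor.

```python
from decimal import Context, Decimal

minimal = Context(prec=18)  # assumption: exactly 18 significant digits are kept
value = minimal.create_decimal("1.0000000000000000000000000000001")

print(value)                # 1.00000000000000000
print(value == Decimal(1))  # True
```

The trailing 1 sits in the 32nd significant digit, so it is rounded away.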

You say "constraints of the type are undefined". I don't see how "double-precision 64-bit format IEEE 754 value" from ECMAScript spec is lesser defined than "decimal numbers with a minimum of 18 decimal digits".


JSON doesn't define the number format or precision, which is a major issue. It's not IEEE754, it's a string of digits. What you have done is prove the point that it's implementation specific, in this case JavaScript.

You're right about decimal precision, however, so I concede there, but the precision and capability are defined.


I just see no big difference between xs:decimal and JSONNumber. None of them is defined precisely enough to guarantee unambiguous handling of numbers like 1.0000000000000000000000000000001.


On Firefox, the maximum number of digits that yield values different to 1 are:

javascript:alert(1.000000000000001);


If Transit is just a human readable XML that is easy to use, that's a great thing because people will actually use it!


Not really. Transit doesn't care about the underlying data-layer. It could be JSON, could be msgpack, could be XML. Currently, only JSON and msgpack backends exist. XML is probably ruled out due to performance.


Why the downvote? XML could also benefit from automated tag-shortening (in my last place of employment, we encoded XML-tags and attribute-keys manually).


The most novel use of Transit is in the Sente [1] library for clojure/script. It is an abstraction over long-polling / websockets that lets us treat it as a core.async channel (which is like a channel in Go).

It's worked awesome for updates, and using Transit to keep the transmissions minimal has let us focus on the API for a realtime system.

[1] - https://github.com/ptaoussanis/sente#sente-channel-sockets-f...


In a project I tried both chord and sente and settled on chord. Chord was much simpler.

It was a point to point system. Sente seemed to have better support for point to multipoint (like a chat app). It was overkill for what we were doing and chord fit the bill nicely.

https://github.com/jarohen/chord


Why choose this over Google's Protocol buffers? https://github.com/google/protobuf


One reason for using something like Transit in browser-land is performance, believe it or not. According to tests we did (6+ months ago, so take with some salt) there was significantly less latency in deserializing Transit, which matters more than size on the wire in some cases. This is because Transit is just JSON with some semantics added, so in the simple case it's just a `JSON.parse` away from being a usable data structure in Javascript. Of course, if you actually want to make use of Transit features, you'll want a reader, but these can be made very performant (especially if you don't have to support all features) so latency can still be kept very low.

There are other reasons as well, but deserialization latency is a big one.


One issue with Protobufs is they do not always enable optimal performance (latency, power). Better than JSON but worse than some others. Compared to MessagePack, Protobufs require a schema, which is not always convenient. Compared to Cap'n Proto, Protobufs are less CPU efficient. Trade offs all around.


It looks like Transit can be late-bound - like JSON: you don't need to know or negotiate the schema ahead of the serialization.


One reason is compatibility - it can be encoded to json and thus is handled by http proxies, etc. It is also human readable in encoded form.


Protobuf is just a serialization format; you can send it over HTTP.


Not sure I like the idea of cramming ASCII type tags into the encoded JSON.

I'm more partial to the way Avro does it, where the encoded JSON remains type-tag and cruft free, and a separate schema (also JSON) is used (and required) to interpret the types, or encode to the correct binary encoding.
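
A rough sketch of that out-of-band approach in Python. This is not Avro's schema language or API: the `SCHEMA` mapping, the `decode` helper and the field names are all hypothetical, and only show the shape of "plain JSON on the wire, types supplied separately".

```python
import json
from datetime import datetime
from decimal import Decimal

# Hypothetical schema: field name -> converter applied after parsing.
SCHEMA = {
    "id": int,
    "amount": Decimal,
    "created": datetime.fromisoformat,
}


def decode(text, schema):
    """Parse tag-free JSON, then apply the separately supplied schema."""
    raw = json.loads(text)
    return {name: convert(raw[name]) for name, convert in schema.items()}


wire = '{"id": 7, "amount": "19.99", "created": "2015-10-14T12:00:00"}'
record = decode(wire, SCHEMA)
print(record["amount"] + 1)  # 20.99
print(record["created"].year)  # 2015
```

The wire text stays readable by any JSON library, at the cost that the reader must already hold the schema.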


Transit also allows you to use msgpack, but JSON is more performant on the web.

Another feature Transit has is that it caches identical keys (can also cache values with some additional code), giving you a smaller footprint.


Sure, but all that comes at the huge disadvantage of throwing away the ability for someone to just grab a JSON library and access your data ad hoc. Now you have to fiddle with stripping away tildes and cache references and such.

At the same time, I don't see how you can do anything sensible with Transit except throw it into a hashmap/dict alongside runtime type information. This is natural for dynamic languages, but, without a schema, you're still left to write your own error-prone structural validation, and can't fully leverage the efficiency of static languages. MsgPack has this problem as well.

Optimising the JSON representation just doesn't seem worth it to me. Avro for example encodes enums by exposing the type names as JSON keys. It sucks, and it's something I'd change, but at the end of the day you're building on a poor transport. Caching reminds me a lot of DNS label compression, or a poor man's preset zlib dictionary.


The great thing about Transit is that I can extend the data layer with new types (like JodaTime). Even without that, I really like Transit's built-in support for sets, keywords, symbols, lists and vectors (which really helps in a Clojure environment).

Another great thing (again, for someone using functional languages) is that Transit can ensure identity for equal values.

When it comes to structural validation, that is something I use prismatic.schema for. Of course, it helps that I can share the validation code between frontend and backend (using Clojure and ClojureScript).

When using plain JSON, I had to convert types myself after decoding/encoding, which was error-prone and gave a surprising amount of bugs (especially regarding dates).

Transit solves a big problem for me, while still retaining the easy to read syntax which JSON has, and also has great tooling in browsers.

Also, caching isn't only about smaller size. It allows for faster parsing of the initial data, allowing Transit to be just as fast as plain JSON for certain payloads, even considering that it has to expand from cache and decode values.



I recently used this in a project where I simply wanted a typing guarantee that JSON can't provide (i.e., that a timestamp really is a timestamp when it arrives on the other side, not a string like in JSON). It's very easy to use, more or less just a drop-in middleware.


I can't help but wonder if it isn't simpler just to use gzipped JSON. I'd be interested to see a size comparison of the two. It seems like they're going to an awful lot of work to hand-roll a suboptimal text compression scheme here.
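
For anyone who wants to run that comparison, a minimal Python sketch of how it could be measured follows. The key-shortening function is a simplified stand-in for Transit's caching (not its real encoding), the sample records are made up, and `shorten_repeated_keys` and `measure` are hypothetical names; results will vary a lot with the shape of real payloads.

```python
import gzip
import json


def shorten_repeated_keys(records):
    """Replace repeated map keys with short codes (simplified stand-in for key caching)."""
    codes = {}
    out = []
    for record in records:
        encoded = {}
        for key, value in record.items():
            if key in codes:
                encoded[codes[key]] = value
            else:
                # First occurrence is written in full and assigned a code.
                codes[key] = "^" + str(len(codes))
                encoded[key] = value
        out.append(encoded)
    return out


def measure(records):
    """Return byte sizes for plain and key-shortened JSON, with and without gzip."""
    plain = json.dumps(records).encode("utf-8")
    cached = json.dumps(shorten_repeated_keys(records)).encode("utf-8")
    return {
        "plain_json": len(plain),
        "cached_json": len(cached),
        "gzip_plain_json": len(gzip.compress(plain)),
        "gzip_cached_json": len(gzip.compress(cached)),
    }


# Made-up sample data; real payloads will behave differently.
records = [
    {"customer_id": i, "customer_name": f"user{i}", "is_active": i % 2 == 0}
    for i in range(1000)
]
sizes = measure(records)
for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

This only compares bytes on the wire. It says nothing about parse time or the number of objects allocated after parsing, which would need a separate benchmark.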


It's the whole FAST/FIX clusterfuck all over again. Use a binary format and stop cocking around. You can't serialise/deserialise it faster and JSON is a string based mess. No low latency path can absorb it. Even InfluxDB has removed JSON support because it was the slowest part of their critical path and could not be optimised. Complete lack of mechanical sympathy + laziness is why JSON is popular outside of the web world.


>Complete lack of mechanical sympathy + laziness is why JSON is popular outside of the web world

I generally prefer JSON as a serialization format for a variety of reasons beyond what you cited:

1. Simplicity/plain text. Even a total newb could read a JSON file and probably puzzle out how the data works. This is important if I'm providing an export function or otherwise expect my users might want to shop somewhere else to deal with this data.

2. Forward compatibility/future proofing. If somebody finds my JSON file in twenty years with no documentation, (1) ensures that it's not going to be lost to them and they'll be able to extract the data with minimal difficulty.

3. Cross platform. It might not be the best format, but it's going to be supported in pretty much any language I would ever want to add support for the data to in a new system.


Well, I think you need to look at the intended use cases for transit. On the browser, transit serialisation is faster than binary precisely because it uses the fast paths. For server to server communication, it can use msgpack.

There's also a) it's fully schemaless and b) you can add new primitives. Which you may or may not need. But if you're looking for something that does a) and b) while being fast for browser communication, transit's currently your best option.


Easy there tiger ;)

What does it even mean that something is string based?

Here: https://github.com/eishay/jvm-serializers/wiki a JVM bench where JSON is pretty competitive with binary formats [and results would be even better if project owners actually merged PRs ;( ]


I don't think I would enjoy debugging a service that talks in a binary format.


I've never done it with JS. It's not really a problem in C, though. If there's a lot of data, you're no better off with text; if there's not much, reading binary data isn't too bad. You've usually got other problems on top of decoding the actual data - like, why is this data coming in the first place? Why isn't the code accepting it properly? Why isn't the other end listening? The data encoding is just the tip of the iceberg.

Either way, the advantages of binary often outweigh the difficulty of having a human read the data. As the saying goes, code is executed far more often than it's read.

(Most binary formats have some fairly obvious possible text representations, at least for key fields (they're just encoding ints, floats, strings, bitfields, etc.). When you're having difficulty, or you're simply curious, you can print them out. If the format is any good, malformed messages are easy to check for in code - this code is not harder to debug than any other. If anything it's actually easier than with some kind of text format.)


Sure, but it's not very convenient.

I regularly MITM a connection between a mobile app and an HTTP server when something isn't going quite right. A look at the JSON they exchange exposes the problem in under a minute more often than not. If I want to test something out quickly, I simply modify the incoming / outgoing JSON by hand. It's rare to get the syntax wrong and involves no context switches. I can then go back to the code, find the relevant section and make the changes I need to make. I find this a very convenient way of debugging and I don't think it would be as nice to do this with a binary format.


That ship is at the edge of the harbor, sails raised, and helmed by HTTP2.0. Binary formats are becoming the default, not the exception.


True. Fortunately, we can be reasonably sure there will be multiple good tools to debug something talking over HTTP 2.0. With something like Transit, I'm not so sure.


Debugging the data, no. Debugging the code, depending on what the representation looks like, probably wouldn't be too bad...


From the announcement blog post:

Transit is self describing using tags, and encourages transmitting information using maps, which have named keys/fields which will repeat often in data. These overheads, typical of self-describing formats, are mitigated in Transit by an integrated cache code system, which replaces repetitive data with small codes. This yields not only a reduction in size, but also an increase in performance and memory utilization. Contrast this with gzipping, which, while it may reduce size on the wire, takes time and doesn't reduce the amount of text to be parsed or the number of objects generated in memory after parsing.

- http://blog.cognitect.com/blog/2014/7/22/transit


> The extension mechanism is open, allowing programs using Transit to add new elements specific to their needs. Users of data formats without such facilities must rely on either schemas, convention, or context to convey elements not included in the base set,

The extension mechanism is writing handlers in all languages communicated with, since its stated purpose is cross language value conveyance.

In contrast, a schema language allows extensions to be described once, in one language.

I was expecting this to be a sort of macros for data notation (an inline schema language), but it seems more like an extendible serialization library.
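
A minimal Python sketch of what "writing handlers" amounts to, to make the point concrete. The registry, the function names, and the `ratio` tag are hypothetical and this is not the API of any Transit library; the two-element `["~#tag", rep]` array is used here simply as one plausible shape for a tagged value.

```python
import json
from fractions import Fraction

WRITE_HANDLERS = {}  # Python type -> (tag, function producing a JSON-friendly rep)
READ_HANDLERS = {}   # tag -> function rebuilding the value from its rep


def register(py_type, tag, to_rep, from_rep):
    """Register both directions for one extension type."""
    WRITE_HANDLERS[py_type] = (tag, to_rep)
    READ_HANDLERS[tag] = from_rep


def encode(value):
    handler = WRITE_HANDLERS.get(type(value))
    if handler is not None:
        tag, to_rep = handler
        return ["~#" + tag, encode(to_rep(value))]
    if isinstance(value, list):
        return [encode(v) for v in value]
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    return value


def decode(value):
    if isinstance(value, list):
        is_tagged = (
            len(value) == 2
            and isinstance(value[0], str)
            and value[0].startswith("~#")
        )
        if is_tagged:
            # Raises KeyError for a tag with no registered read handler.
            from_rep = READ_HANDLERS[value[0][2:]]
            return from_rep(decode(value[1]))
        return [decode(v) for v in value]
    if isinstance(value, dict):
        return {k: decode(v) for k, v in value.items()}
    return value


register(
    Fraction,
    "ratio",
    lambda f: [f.numerator, f.denominator],
    lambda rep: Fraction(rep[0], rep[1]),
)

wire = json.dumps(encode({"half": Fraction(1, 2)}))
roundtrip = decode(json.loads(wire))
```

The `register` call is the part that would have to be repeated, with equivalent semantics, in every language that reads or writes the `ratio` tag.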


> In contrast, a schema language allows extensions to be described once, in one language.

No, it doesn't. A schema language allows that to be done for the syntax, but requires the semantics to be implemented for each language. Schema languages often include, as part of their specifications, core types which must be supported; when an application restricts itself to these core types and types whose only important semantics are derived from them (e.g., restricted subsets of core types in most cases), then the fact that every full implementation that supports the core types will already have this work done for every language means no additional work is necessary. But that's not a product of a schema language removing the need to implement types for each host language; it's a result of the fact that a predefined set of core types accompanying the schema language means that all implementations are required to have already done the work of implementing the core types for all languages.


Do you have an example of such a schema language?


Previous HN discussion from when it was announced last year: https://news.ycombinator.com/item?id=8069346


The main advantage of Transit, it seems, is that JSON is fast for many different clients. Otherwise I'm not sure I would ever use it for internal services, considering there are so many other, probably faster, options (Protobuf, Avro, SBE, etc.).

Are folks using it for internal services?



So it's basically a set of libraries used to marshal/unmarshal objects without using a schema or can it also be used as an RPC library?


So the python implementation is 2.7 only...


Very interesting (just commenting to read it later)


You can just upvote it, then go to https://news.ycombinator.com/saved?id=latenightcoding

Or use bookmarks :)



