Jolt – JSON to JSON transformation (bazaarvoice.github.io)
108 points by bencevans on Nov 23, 2016 | 70 comments



I see this is a Java library, but if you're on the command line (or even if you're able to call an external process for the job), jq[1] is great.

[1]: https://stedolan.github.io/jq/manual/


I wanted to really use and love jq, but I always struggle with the syntax, and debugging is a huge pain. I feel that better documentation of complex examples and chaining would get me over the line.


I did XSLT years ago and had far less of a problem learning my way around stylesheet syntax, and that was that bad old XML, don't you know. I was looking into Jolt just a week or two ago and ended up having a really rough time. I think if there was a more deliberate tutorial, it might be easier to learn. But the Javadocs and the hairball of example transformations were something I found confusing.


I came here to say exactly the same thing. jq is a great tool, very fast as well... and easy to deploy (just an executable with no library dependencies).


I was hoping Jolt was a command-line tool in the style of jq. I don't believe jq does transformations of JSON, only queries.


> I was hoping Jolt was a command-line tool in the style of jq. I don't believe jq does transformations of JSON, only queries.

I guess that depends on one's definition of transformation but:

    echo '{"hello": "world"}' | jq '{brave:{"new": .hello}}'
It also doesn't even require JSON input, meaning one could use it strictly for composing valid output JSON:

    # where -n = null input
    jq -n --arg foo bar '{"meet me": ("at the "+$foo)}'
I use it for transforming AWS responses all the time. I put jq up there with vim as one of the "must have" tools. In fact, the keys `%!jq .` are burned into my fingers because I type them a lot
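For example, something like this (field names from memory, so treat it as a sketch):

    aws ec2 describe-instances \
      | jq '.Reservations[].Instances[] | {id: .InstanceId, state: .State.Name}'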

_ed: fixed a mobile character replacement, added in the --arg flag I couldn't remember at the time_


It does! And you can do wonders with it. It has functions that make it capable of doing almost anything.

Also, my project https://requesthub.xyz/ relies entirely on the transformation capabilities of jq. See real world examples of jq transformations in this context: https://gist.github.com/fiatjaf/1d57953aa285b9bf5b51712268cf...


You should look at App::RecordStream instead. Last time I compared them, jq was a poor cousin of App::RecordStream.


The idea of supplying a spec to transform arbitrary data is interesting to me. I did one in JavaScript called Reshaper[1], and then hooked it up to a library wrapper called Smolder[2].

The result was a sort of system whereby data going into a function would be automatically 'reshaped'. It worked well as a proof-of-concept, but obviously was too fragile for most uses (though it's used in the automatic graphs in Kajero[3]). The difference here seems to be that the spec defines the actual transformations, rather than just the desired final structure.

[1] https://github.com/joelotter/reshaper [2] https://github.com/joelotter/smolder [3] https://github.com/joelotter/kajero
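To make the contrast concrete, a Jolt shift spec looks roughly like this (adapted from its README, so details may be slightly off):

    [{
      "operation": "shift",
      "spec": {
        "rating": {
          "primary": { "value": "Rating" }
        }
      }
    }]

Applied to {"rating": {"primary": {"value": 3}}} it produces {"Rating": 3}: the spec mirrors the input's shape, and each leaf says where that value should land in the output.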


Are you familiar with XML and XSL/XSLT? They took this concept a long, long way.


And DSSSL before that. But I guess XML is just not hip enough anymore.

Actually, having done some work with XSLT again recently, I was joking to a colleague at work that one day someone will surely think what a great idea it would be to create a programming language in JSON to transform JSON into other JSON. And lo and behold, a few days later I stumble over this.

I have to admit, XSLT looks far nicer to me, but that might be familiarity, and having a proper specification to read to understand things instead of just a bunch of examples.


This seems like something that would lend itself to not being tied to any specific implementation, yet it seems to be entirely based on a particular Java implementation.

It would probably be more useful if it either tied itself more explicitly to the Java implementation (i.e. stopped pretending to be its own thing) or were abstract enough to be worth implementing in other languages.

In the latter case it'd be helpful if the operations were part of one unified DSL instead of there being a DSL for each transformation (with the implication that each transformation is applied individually?).

EDIT: But if you abstract this away from Java, why not just use JSON Patch: https://tools.ietf.org/html/rfc6902


The spec you linked to is dated Apr 2013. Jolt's first release was in Feb 2013.


sigh, can we please just stop re-implementing S-expressions and Lisp, and instead just use S-expressions and Lisp?

It's like I'm the only one using electric lighting while all the hipsters are upgrading their wax-dipped hemp brands to artisan whale oil …


I've used template libs to do this in the past, e.g. mustache.js, handlebars.js, Jinja2, etc., generating JSON as output instead of HTML. A template library's DSL is usually quick to learn, and I find templates easier to read than transforms.


Some food for thought:

The JSON format plus JSON.parse() make you lose the graph structure of the data you have on your server and send to the client, because JSON is basically a tree structure.

The Semantic Web defines a graph description language called N3. If your server can serialize and send the data in such a format, and if you use the function N3.parse() on your client, you eventually retrieve, on the client, a graph of in-memory objects that corresponds to the data graph on your server. You can then traverse that graph in any direction you want.

So basically, with N3, you never lose the graph structure of your data.

And you do not need to restructure your JSON.
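To illustrate (a sketch; prefixes shortened), two resources in Turtle/N3 can reference each other by IRI, so the parser hands you back an actual graph, cycles included:

    @prefix ex: <http://example.org/> .
    ex:alice ex:knows ex:bob .
    ex:bob   ex:knows ex:alice .

In plain JSON you would have to simulate the cycle with ids and resolve them by hand.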


I have some stuff I am working on that is semantic web based that has a similar conceptual layering to Jolt. Note that JSON-LD provides a path to convert JSON to an RDF graph, both in the sense that people can publish a JSON-LD document that has the mapping to RDF embedded in, or that you can "paint on" a mapping after the fact to turn ordinary JSON you find on the street to RDF.
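A minimal JSON-LD sketch (schema.org/name is a real term; the ids here are illustrative):

    {
      "@context": { "name": "http://schema.org/name" },
      "@id": "http://example.org/people/jane",
      "name": "Jane Doe"
    }

Without the @context it's ordinary JSON; with it, "name" becomes an IRI-identified RDF property.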


Interesting, I hadn't heard of N3. What clients support this? Punching N3.parse() into Chrome's console doesn't work.


Turtle is a lot more popular than N3:

https://www.w3.org/TeamSubmission/turtle/

Turtle is a subset of N3. N3 adds some syntactic sugar, plus syntax to describe first-order logic statements that include implication and the use of existential and universal quantifiers.

Tim Berners-Lee started work on this software:

http://infomesh.net/2001/cwm/

which implements reasoning on N3, but it is not particularly performant for a number of reasons -- one is that it does not implement any description logic optimizations, another is that it is written in Python, another is that it doesn't have a particularly sophisticated rules engine.

The closest thing to a standard in this area is ISO Common Logic

https://en.wikipedia.org/wiki/Common_Logic

which is based on KIF but uses RDF for the basic data structures, i.e. you can load RDF data and work on it with Common Logic.

There are plenty of people who use logical reasoning over RDF data but usually they use something specific to their tools, such as the Jena Rules Engine, or some kind of Prolog -- it also is not that hard to stuff RDF data into a SAT solver, theorem prover or similar tools.

It's a conspicuous absence that there is no common standard for production rules.


Fortunately, the N3.parse() I mentioned accepts any dialect of RDF (N3, Turtle, ...).

Oh, and btw, the package is here: https://www.npmjs.com/package/n3


I have often wished that RDF and Semantic Web technologies had caught on. I think the overly complicated specifications and poor tools doomed them, though.

I completely agree that trying to represent a graph structure of data properly in JSON can be inordinately difficult.


I paste a previous comment of mine, from an old HN thread: https://news.ycombinator.com/item?id=10947013 "RDF has no adoption. JSON is the winner. Let's try to understand why: because JSON.parse() is instantly available on the client. On the contrary, RDF.parse() (or more precisely N3.parse()) is available on the client only if you use that obscure library I mentioned above. No one knows the library, so no one uses N3.parse(). OK. End of discussion. Now my own feeling: RDF (and especially its N3 dialect) is the only description language I know of that serializes and deserializes graphs with no specific code, and has a good JS lib for client-side consumption. JSON (or XML, or any other tree description language) requires extra effort whenever you use it to serialize/deserialize/traverse a graph structure. That is why I think they suck. And that's why I prefer RDF/N3."


I think there is more to it than just lacking an RDF.parse() function. The entire web of technologies surrounding RDF is vastly more complicated, and the formats themselves feel unnatural to people (in a way that JSON does not).


It is a big mistake to say that RDF/semantic web has "failed".

What has happened is that people have tried a lot of things; some have succeeded and some have failed. For instance, OWL seems to have become an impediment to progress rather than a useful tool. On the other hand, people are starting to understand what JSON-LD is, and they like it.

See https://developers.google.com/schemas/formats/json-ld

Also note that simply by specifying a namespace to put the terms in, you can turn JSON into usable RDF, and even load multiple documents and see them as one comprehensive graph.


iterlib: https://github.com/facebookincubator/iterlib

is based on similar principles, but more concrete.

Store the graph in RocksDB, transform it using the iterator library (only 2-3 transforms are there currently, more coming), and then send the "json" over the wire without copying. We're looking at FlatBuffers.


In .NET-land, I've switched to doing this type of thing in LINQ. But if you like XSLT (I still do), you can do it in three lines of code: 1. convert JSON to XML, 2. run the XSLT, 3. convert the XML back to JSON.
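Spelled out in Java, the same recipe is roughly this (a sketch assuming the org.json library for the JSON<->XML steps and the JDK's built-in XSLT engine; untested):

    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import org.json.JSONObject;
    import org.json.XML;

    public class JsonXsltJson {
        static String transform(String json, String xslt) throws Exception {
            // 1. Convert JSON to XML (org.json needs a single root element)
            String xml = "<root>" + XML.toString(new JSONObject(json)) + "</root>";
            // 2. Run the XSLT stylesheet
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            // 3. Convert the resulting XML back to JSON
            return XML.toJSONObject(out.toString()).toString(2);
        }
    }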


That approach is listed on the site under alternatives.


Interesting, but there's no mention of JSON Patch in the alternatives. I'm not sure why this project would be better.


JSON Patch is verbose and limited. It's designed to patch documents, not to transform them.
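For comparison, an RFC 6902 patch that renames one nested field looks like this (op names and JSON Pointer paths per the RFC):

    [
      { "op": "remove", "path": "/rating/quality" },
      { "op": "move",   "from": "/rating/primary/value", "path": "/Rating" }
    ]

Every edit targets one concrete path; there are no wildcards, so restructuring a whole document means enumerating every change.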


Related/shameless plug: We've been working on a reimagining of JSONPath [1] which we hope will be useful to other people.

Unfortunately, it's not public yet — though we have working libraries in JavaScript and Go (a heavily modified fork of Kubernetes' JSONPath code) we intend to release — mostly because we want to publish a proper spec. In fact, we're looking for a good name for it, as we feel that releasing "JSONPath 2.0" would be a little presumptuous. I was thinking something like JSONMatch. If there's sufficient interest I could prioritize it (email me!).

Our own version of JSONPath is intended for both searching and patching documents in general. A simple search would be something like:

    friends[name == 'bob'].id
or

    shoppingItems[1:3][description, id]
We use this to declaratively extract data from documents, but also change it. The patching support lets you do things like:

    match(document, "..[role == 'owner'].role")
      .set('admin');
This will set "role". It gets rather magical and beautiful when you have multiple paths that involve unions, subindexed arrays and constraints.

We also have a separate patch notation, expressed in JSON, to declaratively transform documents. It uses these JSONPaths to select and set values. We might write a spec for that, too, although I'm not sure the utility outside our app is that great.

[1] http://goessner.net/articles/JsonPath/


The readme is more up-to-date than GitHub pages. It answers some of the questions in the comments. https://github.com/bazaarvoice/jolt
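For reference, driving Jolt from Java looks roughly like this (a sketch based on the README; exact helper names may differ):

    import com.bazaarvoice.jolt.Chainr;
    import com.bazaarvoice.jolt.JsonUtils;

    public class JoltDemo {
        public static void main(String[] args) {
            // The spec is a JSON array of operations, loaded from the classpath
            Chainr chainr = Chainr.fromSpec(JsonUtils.classpathToList("/spec.json"));
            Object input = JsonUtils.jsonToObject("{\"rating\":{\"primary\":{\"value\":3}}}");
            System.out.println(JsonUtils.toJsonString(chainr.transform(input)));
        }
    }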


This is an interesting take on schemas in JSON. Is the big advantage that it provides additional data validation (on top of type validation)? Can custom transforms be written in JavaScript?


Are we _really_ going to reinvent all the spectacularly bad ideas of XML, except this time in JSON?


I think you have to make a distinction between "bad ideas of XML" and "essential ideas of CS that were implemented in XML because it was the fad at the time (and SGML before that), are now being reimplemented in JSON, and will be reimplemented in the next big thing".

It's like complaining about regular expressions: "Are we _really_ going to reinvent all the spectacularly bad ideas of Perl, except this time in Python?"


The fact that we keep re-implementing ideas like this means that there are a lot of valid use-cases for this kind of tool. The same applies to (JSON|XML)RPC, xpath|jq, and schemas. There are even folks currently developing namespaces for JSON!

I think it's human nature, as often as I see it repeating throughout history (and not just in the tech industry). Just wait, there will be a new JSON just over the horizon which will close the circle and bring back a very simplistic serialization format with a minimal number of built-in structures.

Perhaps it would be better to create a C++ of serialization formats, all the features out of the gate, so you can pick and choose from the beginning. Again, I think human nature means we can't go back to XML (too much horror associated with SOAP and WSDL), or even continue with JSON (custom and subtly incompatible parsers being built around the core to handle missing features).


This cycle keeps recurring in multiple places every few years because the ideas are in fact useful, but the surrounding ecosystem falls out of favor.

Old solutions accumulate cruft and start to need more expertise (and sometimes, straight wisdom) to operate well. People complain, blog posts are written, inertia and inaction are challenged; babies are thrown out with the bathwater as clean-room rewrites commence with different stacks, different tradeoffs, and often, different bikeshedding.

The new, shiny tool is promoted on its merits; some of that excitement inspires hype, which takes on a life of its own. Through a combination of informed usage and less-informed codemashing, the tool accumulates users from all walks of life, who turn into willing or unwilling stakeholders in its future. Development pressures pull the product in different ways, some people leave, eventually only the really passionate or really locked-in remain. These people acquire domain knowledge, "sometimes straight wisdom", making the solution -- its merits, design rationale, tradeoffs, and lessons learned -- more opaque to newcomers. Those newcomers find that existing solutions don't appear to do what they need at the level of complexity that they can grok. The cycle repeats.


> a very simplistic serialization format with a minimal number of built-in structures.

It's already here: protobufs.

https://en.wikipedia.org/wiki/Protocol_Buffers
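For reference, a complete proto3 schema really is about this small:

    syntax = "proto3";

    message Person {
      string name = 1;
      int32 id = 2;
      repeated string emails = 3;
    }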


XSLT did have some nice bits - I still like XPath.

Edit: Just to be clear, some of the most terrifying code I've ever seen (and possibly written) was in XSLT.


If you like XPath and JSON too, you can use my just-updated Xidel (http://www.videlibri.de/xidel.html) to run XPath queries on JSON.

It can do transformations, where you write the output structure, and then pick the values from the input JSON. E.g. to transform from {"a": [{"b": 123}, {"b": 456}]} to {"c": [123,456]}, you could write an XPath expression {"c": a/b }


Increase the terror further by adding JSONx to the mix: http://www.ibm.com/support/knowledgecenter/SS9H2Y_7.5.0/com....


Isn't one of their examples invalid XML?

http://www.ibm.com/support/knowledgecenter/SS9H2Y_7.5.0/com....

Shouldn't

    <json:string name="ficoScore"> > 640</json:string>
be

    <json:string name="ficoScore"> &gt; 640</json:string>


It's one of the many peculiarities of XML: the "less-than" bracket is not allowed in text (because it could indicate the opening of a new XML tag), but the "greater-than" bracket does not need encoding when it's in text, because it's not ambiguous.


I'm pretty sure most XML-handling libraries would use &gt; -- I've never seen a raw > used in XML (though it's possible I just haven't noticed).

Edit: Forgot to say thanks for pointing that out!


Just check what the W3C says about writing a raw >:

> For characters such as > where XML defines a built-in entity but does not require its use in all circumstances, it is implementation-dependent whether the character is escaped.

(https://www.w3.org/TR/xslt-xquery-serialization-31/#serphase...)


But but but it's FUNCTIONAL, and Turing-complete!!!

// yes I understand the use cases of functional code


DSSSL [1], the SGML predecessor of XSLT and CSS, included a full-blown, built-in Scheme interpreter, per the standard. That turned out to be a bit too much raw power and complexity for most people, so it never really caught on outside of the Boeing Airplane Maintenance Manual Publication Department and other big enterprisy SGML-loving organizations like that.

Microsoft's non-standard implementation of XSLT [2] is "out of the tarpit" Turing-complete [3] because it lets you write handlers in JavaScript and other languages, to actually get some useful work done and call external libraries, without bending over backwards.

Of course Microsoft's non-standard extensions to XSLT aren't supported outside of Windows. But after using it, going back to plain XSLT was pretty frustrating to me, and in many cases it was easier just to forget about XSLT and write straight-up JavaScript code with XML parsing and XPath expressions.

There's nothing that XSLT can do that you can't easily implement in a JavaScript library. At this point it's better to start with JavaScript and forget about XSLT, instead of trying to use XSLT, hitting a wall, realizing you want to ship on some platform other than Windows, rewriting it all in Java, then realizing you want it to run in the browser and rewriting it all in JavaScript -- which you should have started with in the first place.

Another point for JavaScript: in the past decade, a whole hell of a lot more effort has been put into making JavaScript run fast than into making XSLT run fast.

[1] https://en.wikipedia.org/wiki/Document_Style_Semantics_and_S...

[2] https://msdn.microsoft.com/en-us/library/bb986124.aspx

[3] https://en.wikipedia.org/wiki/Turing_tarpit


Isn't standard XSLT Turing-complete anyway?

Even XPath 3 is Turing-complete.


The point is that XSLT is deeply stuck in the "Turing tar-pit": "Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy." -Alan Perlis https://en.wikipedia.org/wiki/Turing_tarpit

You would never want to write a JPEG decoder in XSLT, but the fact that it might be possible -- if you were willing to bend over backwards and write unmaintainable code that runs incredibly slowly and requires colossal amounts of memory -- misses the point.

For getting real work done in the real world, where big companies have mountains of data to process, and are required to pay real money for electrical power and computing equipment, Microsoft decided to add the ability to call JavaScript, Visual Basic, and other scripts from XSLT templates.

Because it's just too hard to get anything of interest done in XSLT, and it's trivial and convenient in JavaScript, and there's probably already a library to do it.


"Because it's just too hard to get anything of interest done in XSLT"

I thought the point of XSLT wasn't really arbitrary processing but to transform XML into other formats (mostly, but not always, other forms of XML such as XSL-FO) - and it was reasonably good at that.


But who defines the line between "transform" and "arbitrary processing"? One language's transformation is another language's arbitrary processing. If XSLT doesn't know how to format dates for Eskimos, and I have an off-the-shelf JavaScript or .NET date-formatting library that does, how does that make my requirement cross the line from "transformation" into "arbitrary processing"? It's a failure of XSLT not to offer Eskimo dates as a transformation (or a way to call a library that does), not of me for requiring an Eskimo date transformation.

DSSSL was extensible in Scheme, so all transforms had access to arbitrary processing, by the standard.

XSLT was not extensible in any language, so Microsoft went ahead and made a non-standard extension that integrated JavaScript, Visual Basic, C#, and all other IScriptingEngine/.NET compatible languages, much in the same way standard HTML web browsers like Internet Explorer integrate scripting languages.

"The languages that you can use differ between the MSXML and .NET implementations. In MSXML (from version 2.0 on), you can use either VBScript or JavaScript. In .NET, however, you can use any .NET-supported language including C#, Visual Basic, JScript.NET, and even JScript, as before. The ability to use strongly typed languages like C# and Visual Basic .NET makes this an even more attractive opportunity." https://msdn.microsoft.com/en-us/library/bb986124.aspx

Lots of people got lots of work done with Microsoft's non-standard scriptable XSLT, because it was just too hard to get anything of interest done in standard, non-scriptable XSLT, and worth paying the price of being locked into Microsoft's ecosystem.

I think it's better to forget about XSLT entirely, because it's a long-lost cause, weak and ineffective at what it was designed for, and to use JavaScript and libraries like XPath to transform XML (and that goes squared for JSON), to reap the advantages of the latest JavaScript engines and libraries.

If Jolt is meant to be a language-independent standard, there should be a definitive reference implementation in JavaScript.

(in reply to arethuza's reply:)

Yes, I'd rather have all the great things you can do in domain specific languages like XSLT (like XPath, CSS, RegEx, etc) packaged up into nice libraries I can call from JavaScript, instead of being big complex hybrid frameworks that call my little snippets of JavaScript, like Microsoft's extended XSLT.


Och yes - I was definitely not recommending preserving XSLT (or indeed a JSON-based equivalent). I always found that special-purpose languages like XSLT were quite good, but that for most tasks general-purpose programming languages were far better -- the times that recursive pattern-matched templates in XSLT made sense were few and far between.


But it is not deeply in the tar-pit.

Here is a raytracer written in XQuery: https://dev.w3.org/cvsweb/2011/QT3-test-suite/app/Demos/rayt... (imported modules are in the directory above); it looks quite maintainable to me.

XQuery is pretty much XPath-with-functions, and XPath 3 has anonymous functions, so you can directly translate it to XPath 3 by replacing `declare function foo(..` with `let $foo := function(...`

Once it is XPath, you can put everything in an xsl:value-of tag.
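E.g., something like this, assuming an XSLT 3.0 processor (a sketch):

    <xsl:value-of select="
        let $double := function($x) { $x * 2 }
        return $double(21)"/>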


I'd prefer to be on a mountain top far away from any tarpit, than only partially submerged in a tarpit.

The mountain tops are modern JavaScript engines. How much work have Mozilla, Google, Apple, IBM and many other big organizations and excellent engineers poured into making XSLT run fast and efficiently in the last decade, compared to how much has been applied to JavaScript? Can you cite any interesting SIGPLAN articles about XSLT optimization? Have any commercial games shipped using a ray tracer implemented in XSLT?


As if we're not already doing that. JSON configs are quite popular (e.g. package.json), and it's not really a format made for human consumption.

We all cried about the abuse of XML and its inherent foibles, but I seriously doubt that replacing that with JSON and Markdown was the best choice.


Seems so, just like how we are turning JavaScript into Java.


But without a standard library


XPath is fantastic, although not all of it is relevant to non-XML documents.

But in general, JSON, XML, S-expressions, YAML, HTML, and many data interchange formats are all 95% the same hierarchical data structure.

I wanted to make a cross-data-format XPath subset that would work on all of them, and even allow a format that mixed them as needed in a MIME-esque document format, but the parser... oh god, the parser.

This guy's project was like the initial steps of mine: use whatever XML tools exist, but that kills streaming processing and doesn't pass cursory smell tests...

I like this though, it is food for thought.


> But in general, JSON, XML, S-expressions, YAML, HTML, and many data interchange formats are all 95% the same hierarchical data structure.

S-expressions can also represent graphs. I have a vague recollection that YAML can as well.

An S-expression example:

    (nodes #1=(a #2=(b #1# #3=(c #2# #3#)) #3#) #2# #3#)


Firefox's SpiderMonkey used to support a similar notation for JS objects. I can't remember when it was removed, but it was pretty obscure:

http://blog.notdot.net/2006/9/Serializing-JavaScript-objects...

http://philogb.github.io/blog/2009/03/24/sharp-variables/

https://developer.mozilla.org/en-US/docs/Archive/Web/Sharp_v...


Worth noting that you can both read and print such forms.


> Worth noting that you can both read and print such forms.

That's actually how I checked that I had the syntax right: I wrote:

    (let ((*print-circle* t) (*print-case* :downcase))
      (print '(nodes #1=(a #2=(b #1# #3=(c #2# #3#)) #3#) #2# #3#))
      nil)
Thus proving that reading & printing work well.

Seriously, Lisp is an awesome language.


Seems enormously complicated. Is this "cool" in Java-land?


Couldn't agree more. If you really think about it, JavaScript is the transform language for JSON.


Yup. There's an insane difference between how straightforward it is to transform data structures in JavaScript and how cumbersome it is in Java.


This is much easier to do with Groovy.


Let's be real though.

This type of thing is targeted towards enterprise and the usual large and well-established environments.

Environments where replacing a 20+ year old app written in COBOL isn't a question.

No one is ever going to say "Let's write this new app in Groovy!". On top of management saying "WTF is Groovy?", it'll never get approved because it's a huge amount of technical risk to take on.

In big enterprise, you want to have your stuff written in the most supportable language you can find resources for. This is why Java is so popular - it's a lot easier to find a Java developer than it is someone who knows Groovy, Rust, Scala, etc.


> it's a lot easier to find a Java developer than it is someone who knows Groovy, Rust, Scala, etc.

By mentioning Apache Groovy in the same breath as COBOL, Java, Rust, and Scala, you're comparing apples and oranges. Groovy is to the JVM what bash is to Linux: a dynamically-typed language, best used for writing glue code, tests, build control, and whatnot, whereas the others are statically compiled and used for creating actual systems. Although static compilation was added to Groovy later, virtually no one uses it.


Or JavaScript. JSON is just a JS object literal.


The website stutters when scrolling here in Safari, making reading the site a bad experience.


Well, why not go all the way? It's just a matter of time. Here's the manual:

http://www.wrox.com/WileyCDA/WroxTitle/XSLT-2-0-and-XPath-2-...



