I wanted to really use and love jq but I always struggle with the syntax and debugging is a huge pain. I feel that better documentation of complex examples and chaining would get me over the line.
I did XSLT years ago and had far less of a problem learning my way around stylesheet syntax, and that was that bad old XML, don't you know. I was looking into Jolt just a week or two ago and ended up having a really rough time. I think if there was a more deliberate tutorial, it might be easier to learn. But the Javadocs and the hairball of example transformations were something I found confusing.
I came here to say exactly the same thing.
jq is a great tool, very fast as well... and easy to deploy (just an executable with no library dependencies).
it also doesn't even require JSON input, meaning one could use it strictly for composing valid output JSON:
# where -n = null input
jq -n --arg foo bar '{"meet me": ("at the "+$foo)}'
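# which, with jq's default pretty-printing, should produce something like
{
  "meet me": "at the bar"
}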
I use it for transforming AWS responses all the time. I put jq up there with vim as one of the "must have" tools. In fact, the keys `%!jq .` are burned into my fingers because I type them so often.
_ed: fixed a mobile character replacement, added in the --arg flag I couldn't remember at the time_
The idea of supplying a spec to transform arbitrary data is interesting to me. I did one in JavaScript called Reshaper[1], and then hooked it up to a library wrapper called Smolder[2].
The result was a sort of system whereby data going into a function would be automatically 'reshaped'. It worked well as a proof-of-concept, but obviously was too fragile for most uses (though it's used in the automatic graphs in Kajero[3]).
The difference here seems to be that the spec defines the actual transformations, rather than just the desired final structure.
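For instance, going by my (possibly shaky) reading of the Jolt docs, a minimal "shift" spec looks something like this, where the right-hand side of the spec is the output path for whatever the left-hand side matched in the input:

# input
{"a": {"b": 123}}
# spec
[{"operation": "shift", "spec": {"a": {"b": "c"}}}]
# output
{"c": 123}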
And DSSSL before that. But I guess XML is just not hip enough anymore.
Actually, having done some work with XSLT again recently, I was joking to a colleague at work that one day someone will surely think what a great idea it would be to create a programming language in JSON to transform JSON into other JSON. And lo and behold, a few days later I stumble over this.
I have to admit, XSLT looks far nicer to me, but that might be familiarity and a proper specification to read to understand things instead of just a bunch of examples.
This seems like something that shouldn't be tied to any specific implementation, yet it appears to be entirely based on a particular Java implementation.
It would probably be more useful if it either tied itself explicitly to the Java implementation (i.e. stopped pretending to be its own thing) or were abstract enough to be worth implementing in other languages.
In the latter case it'd be helpful if the operations were part of the actual unified DSL instead of having a DSL for each transformation (with the implication that each transformation is applied individually?).
I've used template libs to do this in the past, e.g. mustache.js, handlebars.js, Jinja2, etc., generating JSON as output instead of HTML. A template library's DSL is usually quick to learn, and I find templates easier to read than transforms.
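For example, a rough sketch with mustache.js (the template and field names here are made up, not taken from any particular project):

const Mustache = require('mustache');

// A template that emits JSON text instead of HTML.
const template = '{"greeting": "Hello, {{name}}!", "id": {{id}}}';

const text = Mustache.render(template, { name: 'bob', id: 42 });
console.log(JSON.parse(text)); // { greeting: 'Hello, bob!', id: 42 }

// Caveat: {{...}} HTML-escapes by default, so values containing quotes or
// ampersands need {{{...}}} (plus proper JSON escaping) to keep the output valid.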
The JSON format plus JSON.parse() makes you lose the graph structure of the data that you have on your server and send to the client, because JSON is basically a tree structure.
The Semantic Web defines a graph description language called N3.
If your server can serialize and send the data in such a format, and if you use the function N3.parse() on your client, you eventually retrieve, on the client, a graph of in-memory objects that corresponds to the data graph on your server. You can then traverse that graph in any direction you want.
So basically, with N3, you never lose the graph structure of your data.
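A quick plain-JavaScript illustration of the point (the names are made up):

const alice = { name: 'Alice' };
const bob = { name: 'Bob', knows: alice };
const carol = { name: 'Carol', knows: alice }; // both reference the SAME object

const wire = JSON.stringify([bob, carol]); // alice gets serialized twice
const back = JSON.parse(wire);
console.log(back[0].knows === back[1].knows); // false: the shared node came back duplicated
// (a genuinely cyclic structure would make JSON.stringify throw outright)

// A triple-based serialization keeps the sharing explicit as edges instead:
//   :bob   :knows :alice .
//   :carol :knows :alice .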
I have some stuff I am working on that is semantic web based that has a similar conceptual layering to Jolt. Note that JSON-LD provides a path to convert JSON to an RDF graph, both in the sense that people can publish a JSON-LD document that has the mapping to RDF embedded in, or that you can "paint on" a mapping after the fact to turn ordinary JSON you find on the street to RDF.
Turtle is a subset of N3. N3 adds some syntactic sugar, plus syntax to describe first-order logic statements, including implication and existential and universal quantifiers.
There is at least one library that implements reasoning on N3, but it is not particularly performant for a number of reasons -- one is that it does not implement any description logic optimizations, another is that it is written in Python, another is that it doesn't have a particularly sophisticated rules engine.
The closest thing to a standard in this area is ISO Common Logic, which is based on KIF but uses RDF for the basic data structures, i.e. you can load RDF data and work on it with Common Logic.
There are plenty of people who use logical reasoning over RDF data but usually they use something specific to their tools, such as the Jena Rules Engine, or some kind of Prolog -- it also is not that hard to stuff RDF data into a SAT solver, theorem prover or similar tools.
The lack of a common standard for production rules is a conspicuous absence.
I have often wished that RDF and semantic web technologies had caught on. I think the overly complicated specifications and poor tools doomed them, though.
I completely agree that trying to represent a graph structure of data properly into JSON can be inordinately difficult.
I paste a previous comment of mine, from an old HN thread:
https://news.ycombinator.com/item?id=10947013
"RDF has no adoption. JSON is the winner. Let's try to understand why: Because JSON.parse() is instantly available on the client. On the contrary, RDF.parse() (or more precisely N3.parse()) is available on the client only if you use that obscure library I mentionned above. Noone knows the library, so noone uses the N3.parse(). Ok. End of discussion.
Now my own feeling: RDF (and especially its N3 dialect) is the only description language I know of that serializes and deserializes graphs with no specific code, and has a good JS lib for client-side consumption.
JSON (or XML, or any other tree description language) requires extra effort whenever you use it to serialize/deserialize/traverse a graph structure. That is why I think they suck. And that's why I prefer RDF/N3."
I think that there is more to it than just lacking a RDF.parse() function. The entire web of technologies surrounding RDF is just vastly more complicated and the formats themselves feel unnatural to people (in a way that JSON does not).
It is a big mistake to say that RDF/semantic web has "failed".
What has happened is that people have tried a lot of things and some have succeeded and some have failed. For instance, OWL seems to have become an impediment to progress rather than a useful tool. On the other hand, people are starting to understand what JSON-LD is and they like it.
Also note that, by simply specifying a namespace to put the terms in, you can transform JSON into usable RDF, and even put multiple documents in and see them as one comprehensive graph.
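To make that concrete, something like the following (the vocabulary and IRIs are made up) is all the mapping a JSON-LD processor needs to read the keys as IRIs and the document as a graph - "name" becomes http://example.com/terms/name, "knows" becomes http://example.com/terms/knows, and two documents that mention the same @id merge into one graph:

{
  "@context": { "@vocab": "http://example.com/terms/" },
  "@id": "http://example.com/people/alice",
  "name": "Alice",
  "knows": { "@id": "http://example.com/people/bob" }
}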
What we're building is based on similar principles, but more concrete.
Store the graph in rocksdb, transform using the iterator library (only 2-3 transforms currently there, more coming) and then send the "json" over the wire without copy. We're looking at flatbuffers.
In .NET-land, I've switched to doing this type of thing in LINQ. But if you like XSLT (I still do), you can do it in three lines of code.
1. Convert JSON to XML
2. Run XSLT
3. Convert XML to JSON
Related/shameless plug: We've been working on a reimagining of JSONPath [1] which we hope will be useful to other people.
Unfortunately, it's not public yet — though we have working libraries in JavaScript and Go (a heavily modified fork of Kubernetes' JSONPath code) that we intend to release — mostly because we want to publish a proper spec first. In fact, we're looking for a good name for it, as we feel that releasing "JSONPath 2.0" would be a little presumptuous. I was thinking something like JSONMatch. If there's sufficient interest I could prioritize it (email me!).
Our own version of JSONPath is intended for both searching and patching documents in general. A simple search would be something like:
friends[name == 'bob'].id
or
shoppingItems[1:3][description, id]
We use this to declaratively extract data from documents, but also change it. The patching support lets you do things like:
This will set "role". It gets rather magical and beautiful when you have multiple paths that involve unions, subindexed arrays and constraints.
We also have a separate patch notation, expressed in JSON, to declaratively transform documents. It uses these JSONPaths to select and set values. We might write a spec for that, too, although I'm not sure the utility outside our app is that great.
This is an interesting take on schemas in JSON. Is the big advantage that this provides additional data validation (on top of type validation)? Can custom transforms be written in JavaScript?
I think you have to make a distinction between "bad ideas of XML" and "essential ideas of CS that were implemented in XML because it was the fad at the time (and SGML before that), are now being reimplemented in JSON, and will be reimplemented in the next big thing".
It's like complaining about regular expressions: "Are we _really_ going to reinvent all the spectacularly bad ideas of Perl, except this time in Python?"
The fact that we keep re-implementing ideas like this means that there are a lot of valid use-cases for this kind of tool. The same applies to (JSON|XML)RPC, xpath|jq, and schemas. There are even folks currently developing namespaces for JSON!
I think it's human nature, as often as I see it repeating throughout history (and not just in the tech industry). Just wait, there will be a new JSON just over the horizon which will close the circle and bring back a very simplistic serialization format with a minimal number of built-in structures.
Perhaps it would be better to create a C++ of serialization formats, with all the features out of the gate, so you can pick and choose from the beginning. Again, I think human nature means we can't go back to XML (too much horror associated with SOAP and WSDL), or even continue with JSON (custom and subtly incompatible parsers being built around the core to handle missing features).
This cycle keeps re-occurring in multiple places every few years because the ideas are in fact useful, but the surrounding ecosystem falls out of favor.
Old solutions accumulate cruft and start to need more expertise (and sometimes, straight wisdom) to operate well. People complain, blog posts are written, inertia and inaction are challenged; babies are thrown out with the bathwater as clean-room rewrites commence with different stacks, different tradeoffs, and often, different bikeshedding.
The new, shiny tool is promoted on its merits; some of that excitement inspires hype, which takes on a life of its own. Through a combination of informed usage and less-informed codemashing, the tool accumulates users from all walks of life, who turn into willing or unwilling stakeholders in its future. Development pressures pull the product in different ways, some people leave, eventually only the really passionate or really locked-in remain. These people acquire domain knowledge, "sometimes straight wisdom", making the solution -- its merits, design rationale, tradeoffs, and lessons learned -- more opaque to newcomers. Those newcomers find that existing solutions don't appear to do what they need at the level of complexity that they can grok. The cycle repeats.
It can do transformations, where you write the output structure, and then pick the values from the input JSON. E.g. to transform from {"a": [{"b": 123}, {"b": 456}]} to {"c": [123,456]}, you could write an XPath expression {"c": a/b }
It's one of the many peculiarities of XML: the "less-than" bracket is not allowed in text (because it could indicate the opening of a new XML tag), but the "greater-than" bracket does not need encoding when it's in text, because it's not ambiguous.
Just check what the W3C says about writing a raw >:
> For characters such as > where XML defines a built-in entity but does not require its use in all circumstances, it is implementation-dependent whether the character is escaped.
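To make it concrete: a (made-up) <when>3 > 2</when> is perfectly well-formed as-is, while <when>2 < 3</when> is not - the < in the text has to be written as &lt;.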
DSSSL [1], the SGML predecessor of XSLT and CSS, included a full-blown, built-in Scheme interpreter, by the standard. That turned out to be a bit too much raw power and complexity for most people, so it never really caught on outside of the Boeing Airplane Maintenance Manual Publication Department and other big enterprisy SGML-loving organizations like that.
Microsoft's non-standard implementation of XSLT [2] is "out of the tarpit" Turing complete [3] because it lets you write handlers in JavaScript and other languages, to actually get some useful work done and call external libraries, without bending over backwards.
Of course Microsoft's non-standard extensions to XSLT aren't supported outside of Windows. But after using it, going back to plain XSLT was pretty frustrating to me, and in many cases it was easier just to forget about XSLT and write straight-up JavaScript code with XML parsing and XPath expressions.
There's nothing that XSLT can do that you can't easily implement in a JavaScript library. At this point it's better to start with JavaScript and forget about XSLT, instead of trying to use XSLT, hitting a wall, realizing you want to ship on some platform other than Windows, rewriting it all in Java, then realizing you want it to run in the browser, and rewriting it all in JavaScript, which you should have started with in the first place.
Another point for JavaScript: In the past decade, a whole hell of a lot more effort has been put into making JavaScript run fast, than making XSLT run fast.
The point is that XSLT is deeply stuck in the Turing tar-pit: "Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy." -Alan Perlis https://en.wikipedia.org/wiki/Turing_tarpit
You would never want to write a JPEG decoder in XSLT, but the fact that it might be possible if you were willing to bend over backwards, write unmaintainable code that runs incredibly slowly and requires colossal amounts of memory, misses the point.
For getting real work done in the real world, where big companies have mountains of data to process, and are required to pay real money for electrical power and computing equipment, Microsoft decided to add the ability to call JavaScript, Visual Basic, and other scripts from XSLT templates.
Because it's just too hard to get anything of interest done in XSLT, and it's trivial and convenient in JavaScript, and there's probably already a library to do it.
"Because it's just too hard to get anything of interest done in XSLT"
I thought the point of XSLT wasn't really arbitrary processing but to transform XML into other formats (mostly, but not always, other forms of XML such as XSL-FO) - and it was reasonably good at that.
But who defines the line between "transform" and "arbitrary processing"? One language's transformation is another language's arbitrary processing. If XSLT doesn't know how to format dates for Eskimos, and I have an off-the-shelf JavaScript or .NET date formatting library that does, how does that make my requirements cross the line between "transformation" and "arbitrary processing"? It's a failure of XSLT to not offer Eskimo dates as a transformation (or a way to call a library that does that), not of me for requiring an Eskimo date transformation.
DSSSL was extensible in Scheme, so all transforms had access to arbitrary processing, by the standard.
XSLT was not extensible in any language, so Microsoft went ahead and made a non-standard extension that integrated JavaScript, Visual Basic, C#, and all other IScriptingEngine/.NET compatible languages, much in the same way standard HTML web browsers like Internet Explorer integrate scripting languages.
"The languages that you can use differ between the MSXML and .NET implementations. In MSXML (from version 2.0 on), you can use either VBScript or JavaScript. In .NET, however, you can use any .NET-supported language including C#, Visual Basic, JScript.NET, and even JScript, as before. The ability to use strongly typed languages like C# and Visual Basic .NET makes this an even more attractive opportunity." https://msdn.microsoft.com/en-us/library/bb986124.aspx
Lots of people got lots of work done with Microsoft's non-standard scriptable XSLT, because they believed it was just too hard to get anything of interest done in standard non-scriptable XSLT, but worth paying the price of being locked into Microsoft's ecosystem.
I think it's better to forget about XSLT entirely, because it's a long lost cause, weak and ineffective at what it was designed for, and to use JavaScript and XPath libraries to transform XML (and that goes squared for JSON), reaping the advantages of the latest JavaScript engines and libraries.
If Jolt is meant to be a language-independent standard, there should be a definitive reference implementation in JavaScript.
(in reply to arethuza's reply:)
Yes, I'd rather have all the great things you can do in domain specific languages like XSLT (like XPath, CSS, RegEx, etc) packaged up into nice libraries I can call from JavaScript, instead of being big complex hybrid frameworks that call my little snippets of JavaScript, like Microsoft's extended XSLT.
Och yes - I was definitely not recommending preserving XSLT (or indeed a JSON-based equivalent). I always found that special-purpose languages like XSLT were quite good, but that for most tasks general-purpose programming languages were far better - the times that recursive pattern-matched templates in XSLT made sense were few and far between.
XQuery is pretty much XPath-with-functions, and XPath 3 has anonymous functions, so you can directly translate it to XPath 3 by replacing `declare function foo(..` with `let $foo := function(...`
Once it is XPath, you can put everything in an xsl:value-of tag.
I'd prefer to be on a mountain top far away from any tarpit, than only partially submerged in a tarpit.
The mountain tops are modern JavaScript engines. How much work have Mozilla, Google, Apple, IBM and many other big organizations and excellent engineers poured into making XSLT run fast and efficiently in the last decade, compared to how much has been applied to JavaScript? Can you cite any interesting SIGPLAN articles about XSLT optimization? Have any commercial games shipped using a ray tracer implemented in XSLT?
XPath is fantastic, although not everything is relevant to non-XML docs.
But in general, JSON, XML, S-expressions, YAML, HTML, and many other data interchange formats are all 95% the same hierarchical data structure.
I wanted to make a cross-dataformat XPath subset that would work on all of them, and even allow a format that mixed the formats as needed in a MIME-esque document format, but the parser... oh god the parser.
This guy's project was like the initial steps of mine: use whatever XML tools exist, but that kills streaming processing and doesn't pass cursory smell tests...
This type of thing is targeted towards enterprise and the usual large and well-established environments.
Environments where replacing a 20+ year old app written in COBOL isn't a question.
No one is ever going to say "Let's write this new app in Groovy!". On top of management saying "WTF is Groovy?", it'll never get approved because it's a huge amount of technical risk to take on.
In big enterprise, you want to have your stuff written in the most supportable language you can find resources for. This is why Java is so popular - it's a lot easier to find a Java developer than it is someone who knows Groovy, Rust, Scala, etc.
> it's a lot easier to find a Java developer than it is someone who knows Groovy, Rust, Scala, etc.
By mentioning Apache Groovy in the same breath as COBOL, Java, Rust, and Scala, you're comparing apples and oranges. Groovy is to the JVM what bash is to Linux: a dynamically-typed language best used for writing scripts for glue code, testing, build control, and whatnot, whereas those others are statically compiled and used for creating actual systems. Although static compilation was added to Groovy later, virtually no one uses it.
[1]: https://stedolan.github.io/jq/manual/