The original implementations go back to SYSTAT and SPSS GPL (Graphics Production Language).
GPL especially, with its statement-based approach, has arguably better ergonomics for interactively and iteratively producing plots compared to function-based approaches.
Rather expensive by my standards these days. When I was a starving student, I would not have hesitated to buy it from that corner in the back of Cody's Books.
Springer runs occasional sales of up to 40% off about once a year, but I don't recall if "The Grammar of Graphics" was eligible last time.
Leland Wilkinson (GoG inventor) and I designed it together a couple of years back.
The function for creating marks (a layer) tries to be as "flat" as possible, in the sense that it should be possible to render most common kinds of plots without having to pass nested/hierarchical options: https://wave.h2o.ai/docs/api/ui#mark
I’ve been building data visualizations for the web for almost ten years. Most of the time it was some kind of dashboard with custom charts, interactivity, and of course a branded look.
The grammar of graphics has always been a North Star for me. It's very helpful to go through the papers and books for inspiration on how to organize your system. But direct implementations are finicky to work with. And in my hubris I attempted to write yet another implementation of the grammar of graphics, and it resulted in exactly the same problems!
With complex marks it is ambiguous what is a data point and what is a series. Tuning the look requires configuration objects scattered around the chart definition, and composition sometimes requires injecting something into two different parts of the definition.
Now I treat the grammar of graphics as a collection of patterns and good practices, but surrender to pragmatic solutions when necessary.
Anyway, I think I owe a big part of my career to the works of Wickham and Wilkinson.
> With complex marks it is ambiguous what is a data point and what is a series.
I sort of agree with you. I've implemented the Grammar of Graphics from the ground up four times professionally (!), twice in collaboration with Leland, all in different products.
The main reason it might be finicky to work with directly is that the point vs. series vs. series-of-series distinction can run arbitrarily deep, so there's some mental gymnastics involved on the part of the library's user to figure out how to reshape the data and present it correctly so that the library can do its thing.
Tableau, which is also a GoG system, sort of deals with this by having slots for "Dimensions", "Pages", "Color", etc. as proxies for multi-level aggregation ("group/slice/dice" in BI terms). So even though it's not immediately apparent to new users how to present data correctly to the rendering system to get the kind of vis they want, at least it's pretty low-friction UX to shuffle variables between those slots till satisfied.
With programmatic use, that shuffling-around gets cumbersome because now you have to write code to munge data into submission.
Tableau introduced the "Show Me" feature precisely for this reason - most new users would rather get stuff done quickly than figure out how the GoG can best solve their vis problem.
I'm really curious how Observable will fare in the VC bear market. They seem to be a fantastic team of conceptual thinkers, but I'm not sure what the cash flow looks like for this kind of tooling.
I’ve never wanted to use a product more, yet not been able to use it.
Javascript data products fall into a weird gap of having the best visualization tooling and the worst data manipulation tooling. I know there are efforts like arquero and DuckDB which make data more accessible, but there’s no really strong scipy/numpy/statsmodels/scikit-learn equivalent.
I am ever more a fan of just manually generating SVG. Creating a DSL abstraction layer on top is fine, I guess. But ultimately I know what I want, and it’s easier to just make it directly instead of fighting a tool to try to induce it to make the thing I want.
That said, I’m weird. My blog is artisanally hand-crafted HTML, JS, and CSS for the same reason.
This is definitely possible, and gives you a high degree of control over the visual design of visualizations.
I built an experimental system to do just that - design the layout and all data visualizations in a single Sketch/Figma document, export to SVG, then map data to the SVG elements in the browser. It's all 100% declarative.
Kind of. Except with SVG data viz you likely don’t need a layout engine. It doesn’t need to be “responsive” and render differently on different devices.
There are a few primitives that d3 uses, and once you implement those, producing SVG gets easy. `Scale` is the most important, for mapping your x,y plane into SVG pixel coordinates. Then some of the tick helpers can be handy too.
Writing the SVG directly is just the fastest way to get what you need.
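To make the scale idea concrete, here's a minimal sketch of that primitive: a linear mapping from data coordinates to SVG pixel coordinates, plus a tiny scatter-plot emitter. All the names here (`linearScale`, `scatterSVG`) are illustrative, not from d3 or any other library.

```javascript
// Map a value from a data domain [d0, d1] to a pixel range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Emit a bare-bones SVG scatter plot as a string.
function scatterSVG(points, width = 200, height = 100) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const x = linearScale([Math.min(...xs), Math.max(...xs)], [0, width]);
  // Flip y: SVG's origin is the top-left corner, so larger data values
  // should map to smaller pixel y.
  const y = linearScale([Math.min(...ys), Math.max(...ys)], [height, 0]);
  const circles = points
    .map((p) => `<circle cx="${x(p.x)}" cy="${y(p.y)}" r="2"/>`)
    .join("");
  return `<svg width="${width}" height="${height}">${circles}</svg>`;
}
```

From here, axis ticks are just a few `<line>` and `<text>` elements placed with the same scale functions.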
The thing I hate about all of these GoG approaches is the wastefulness of the translation of data + style into visual representations. For example, if you have a dashboard with 8+ charts visible, the scaffolding of the charting library starts to weigh down the system in both performance and memory usage. VegaLite, especially, seems to make a copy of the data being passed in. Looking at the examples of ObservablePlot, I can see more wasteful processing in the form of dataset.map(d => d.property) sprinkled in several places.
This applies to any charting library that forces you to provide both spec and unaggregated data to memory/cpu constrained clients (e.g. Javascript in the browser). This is done for implementation-simplicity (Vega, for example), but obviously doesn't scale to larger datasets.
I've implemented a system where the data part of the spec is munged in-database, and aggregated data is provided to the browser, along with hints for axes, scales, legends, etc. It requires a part of the GoG interpreter to be resident on the server-side.
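A hedged sketch of the split described above, with all names invented for illustration: the heavy lifting happens as an in-database `GROUP BY`, and the browser receives only the reduced rows plus rendering hints, from which scales can be derived locally.

```javascript
// Server side (conceptually, run in the database):
//   SELECT region, SUM(sales) AS total FROM orders GROUP BY region;
// The browser then receives a compact payload like this:
const payload = {
  rows: [
    { region: "EMEA", total: 1200 },
    { region: "APAC", total: 950 },
  ],
  hints: {
    x: { field: "region", type: "nominal" },
    y: { field: "total", type: "quantitative" },
  },
};

// The client-side part of the interpreter only needs the aggregated rows
// to finish the job, e.g. deriving the y-axis domain:
const yDomain = [0, Math.max(...payload.rows.map((r) => r.total))];
```

The point is that the raw `orders` table never leaves the database; the client sees two rows, not two million.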
That sounds very similar to VizQL (the visualization/data query language underlying Tableau). That has been my big complaint about most visualization libraries - there is no sharing of the underlying data set for multiple projections across the same large data set. Grid/table libraries have the same issues.
Yes. Tableau would have to separate rendering from data select/filter/aggregation, especially because integrating with customer databases live is a key use case. Hence the built-in buffet of connectors/drivers.
It looks like with later versions they switched to kind of a hybrid approach (part-remote, part-local) with Hyper to reduce latency for interactivity.
> there is no sharing of the underlying data set for multiple projections across the same large data set
But that would require some kind of open standard for portability, no?
>But that would require some kind of open standard for portability, no?
I like the approach AGGrid uses - they provide a viewport based interface that the grid uses to display data, and you can implement that interface on top of your data model - https://www.ag-grid.com/javascript-data-grid/viewport/. Unfortunately it's only available in their enterprise version, but this approach scales to both grid and chart based UIs. D3 has a bit of that flavor as well, since you can map visual attributes into your underlying data any way you'd like.
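The viewport pattern can be sketched generically (this is modeled loosely on AG Grid's idea, not its exact interface; the names below are made up): the view asks for a row range, and the datasource materializes only that window from the underlying model.

```javascript
// A viewport-style datasource: the UI never holds the full data set,
// only the rows currently in view.
class ViewportDatasource {
  // fetchRange: (first, last) => rows — could hit a server or a local model.
  constructor(fetchRange, totalRows) {
    this.fetchRange = fetchRange;
    this.totalRows = totalRows;
  }
  // Called by the grid/chart whenever the visible window changes.
  setViewportRange(first, last, onRows) {
    onRows(this.fetchRange(first, last));
  }
}
```

The same interface works for a chart: a zoomed-in line chart is just a viewport over the x axis.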
I didn't know GoG existed when it came to writing up a couple of tutorials[1][2] on how to go about building a (very, very simple) charting tool[3] on top of my canvas library. I'm going to have to re-assess those lessons, and add some links to other guides, now that I know about them.
Luckily for me, the main purpose of the lessons was not so much about how to build a charting tool, but rather concentrated on how to break the code into modules in the hope that some of the modules could be reused in other, similar projects.
If I'm making obvious mistakes in the approach, or code, that I set out in the lessons then feedback is always welcome so corrections/improvements can be made to them!
As the other commenter mentioned, this has nothing to do with GoG. A good data language or library should provide the user (and plotting libraries) with copyless, cheap, immutable slices of the data being handled. JavaScript just doesn't really have one. It shouldn't be the concern of the plotting library, however.
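For what it's worth, typed arrays get partway there: `subarray()` returns a zero-copy view over the same `ArrayBuffer`, which is the kind of cheap slice a plotting library could consume - though nothing makes it immutable, so the "immutable" half of the wish is still missing. A quick sketch:

```javascript
const column = new Float64Array([1, 2, 3, 4, 5, 6]);

// Views, not copies: no new backing memory is allocated.
const firstHalf = column.subarray(0, 3);  // elements 0..2
const secondHalf = column.subarray(3);    // elements 3..5

// Both views alias the same buffer, so writes are visible through them —
// copyless, but decidedly not immutable.
column[0] = 42;
console.log(firstHalf[0]); // 42
```

Columnar layers like Arquero's backing tables or Arrow buffers build on exactly this kind of view.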
Start with the platform you're already using for data munging/transformation. I think R with its ggplot2 library is very good. Python's Matplotlib is also not bad. ObservableHQ is also good when the data is closer to its visual representation. Overall, I find that data transformation does 80% of the work in data visualization.
You can also generate Vega(lite) JSON: https://vega.github.io/vega-lite/
And then pick your favourite language/library that generates it: https://vega.github.io/vega-lite/ecosystem.html
If needed, you can switch to a different library/language while keeping the end result the same (or use different libraries/languages for different parts of your visualization, depending on which is best at a particular task).
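Since a Vega-Lite spec is just JSON, "generating" one can be as simple as building a plain object and serializing it. The spec below follows the documented Vega-Lite v5 structure; the sample data values are made up.

```javascript
// A minimal Vega-Lite bar-chart spec as a plain object.
const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      { category: "a", amount: 28 },
      { category: "b", amount: 55 },
    ],
  },
  mark: "bar",
  encoding: {
    x: { field: "category", type: "nominal" },
    y: { field: "amount", type: "quantitative" },
  },
};

// Serialize for embedding, storage, or handing to vega-embed.
const json = JSON.stringify(spec, null, 2);
```

Any language that can emit this JSON - R, Python, Scala, Elm, etc. - can drive the same renderer, which is what makes the ecosystem page above possible.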
https://byrneslab.net/classes/biol607/readings/wickham_layer...
Wickham is the Chief Scientist at RStudio and created R packages such as ggplot2 and the tidyverse.