Hi Hacker News! One of the Vega-Lite authors here. I'm excited to see you all here checking out our declarative visualization system.
If you want to learn more about the academic origins, check out the paper: https://idl.cs.washington.edu/papers/vega-lite. Vega-Lite is also available as the default plotting library in JupyterLab.
I'd like to second the thanks; Vega-Lite is awesome.
I've two pieces of feedback.
The first is I started by using Vega-Lite-API thinking it would be easier to understand. As I tried to do what I needed I found most examples in the manual are JSON only, and the JS API docs weren't helpful.
It took a while to work out how to make a log-scale axis, or how to customise the tooltip content (I could see in the JSON how to do it easily) and I gave up trying to work out how to provide a custom colour scale via JS - or to change to another pre-defined one.
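For anyone landing here with the same questions, here is a minimal sketch of the JSON-level settings mentioned above (log scale, tooltip content, colour scheme), built as a plain Python dict so it can be dumped to JSON. The data values and the field names `size`, `cost` and `series` are made up for illustration, and the v5 schema URL is an assumption about which Vega-Lite version is in use.

```python
import json

# Hypothetical data and field names, for illustration only.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [
        {"size": 1, "cost": 10, "series": "a"},
        {"size": 10, "cost": 1000, "series": "b"},
    ]},
    "mark": "point",
    "encoding": {
        "x": {"field": "size", "type": "quantitative"},
        # Log-scale axis: set the scale type on the channel.
        "y": {"field": "cost", "type": "quantitative",
              "scale": {"type": "log"}},
        # Pre-defined colour scheme. For a custom scale, replace "scheme"
        # with an explicit "range" list of colours.
        "color": {"field": "series", "type": "nominal",
                  "scale": {"scheme": "category10"}},
        # Tooltip content: list the fields to show.
        "tooltip": [
            {"field": "series", "type": "nominal"},
            {"field": "cost", "type": "quantitative"},
        ],
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be pasted into the online editor to check the result.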
The second is around toggling on/off the data in the chart. I had ~20 series on one scatter plot, and being able to highlight other members of a series, or turn series on/off would make it easier to explore the data.
I found a JSON example of doing this, but not one with the JS API. It was also more complicated than I had hoped: I wanted to use Vega-Lite as a better Excel to explore this data, not to write my own chart app, if that makes sense (different purpose).
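One way to get the on/off behaviour at the JSON level is a selection bound to the legend. The sketch below assumes Vega-Lite v5 `params` syntax (v4 used a `selection` block instead); the parameter name `series_pick`, the helper name and the sample field names are hypothetical.

```python
import copy


def with_legend_toggle(spec, field, dimmed=0.1):
    """Return a copy of `spec` where clicking a legend entry highlights
    the matching series and dims the others."""
    out = copy.deepcopy(spec)
    out["params"] = [{
        "name": "series_pick",  # hypothetical parameter name
        "select": {"type": "point", "fields": [field]},
        "bind": "legend",
    }]
    # Selected points stay opaque; everything else is dimmed.
    out.setdefault("encoding", {})["opacity"] = {
        "condition": {"param": "series_pick", "value": 1},
        "value": dimmed,
    }
    return out


base = {
    "mark": "point",
    "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"},
        "color": {"field": "series", "type": "nominal"},
    },
}
toggled = with_legend_toggle(base, "series")
```

The original spec is left untouched, so the same base chart can be reused with or without the interaction.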
I’d just like to say thanks for working on Vega-Lite. I haven’t used it much, but it will make the world a slightly better place.
Dear sw devs who are doing visualization: please do have a look at vega-lite, even if you can’t use it for $PROJECT for $REASON, you can learn so much about visualization by reading the docs and playing with the online editor. The separation of concerns was an eye opener for me.
JupyterLab outputs have a MIME type. Depending on the MIME type, a different renderer is used. For example, text is shown as plain text and an image is binary with the image type (e.g. png). There is also a MIME type for Vega and Vega-Lite JSON.
What this means for you as a user is that you don't need to install the renderer as it comes with the JL frontend.
Vega-Lite is client side so it's not running in the kernel.
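A minimal sketch of what that looks like from the kernel side: the kernel only ships a MIME bundle, and the frontend picks the renderer. The MIME string below follows the `application/vnd.vegalite.v<N>+json` pattern; which major versions a given JupyterLab release can render is an assumption you would need to check. The helper names are hypothetical.

```python
def vegalite_bundle(spec, major_version=5):
    """Wrap a Vega-Lite spec (a dict) in a MIME bundle for a notebook frontend."""
    mime = f"application/vnd.vegalite.v{major_version}+json"
    # The text/plain entry is a fallback for frontends without a renderer.
    return {mime: spec, "text/plain": "<Vega-Lite chart>"}


def show(spec, major_version=5):
    """Send the bundle to the frontend; only useful inside a notebook."""
    from IPython.display import display  # imported lazily, needs IPython
    display(vegalite_bundle(spec, major_version), raw=True)


bundle = vegalite_bundle({"mark": "bar"})
```

Nothing is rendered in the kernel here; `show` just hands the JSON to the client.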
We use it for our web-based visualizations of genetic data.
We love its ability to save these as SVG/PNG out of the box.
Vega-Lite has a lot of sane defaults, which is great, but sometimes it takes a little bit to get what you want. The examples are good but sometimes simple. The docs are comprehensive though.
I have just completed a number of Vega-Lite graphics for a University assignment focused on creating editorial data visualisations (due in a few days! https://crcorbett.github.io/FIT3179/)
As someone quite fresh to data analysis, I found Vega-Lite fun to work with. I have some experience with ‘Grammar of Graphics’ approaches, having taken a few classes focussed on R and Tidyverse.
The docs are comprehensive, but I found the example charts often too basic for the ideas I was trying to implement. This might be more reflective of my greenness in the area than any deficiency in the library.
I understand it’s relatively new to the scene so it’s not expected to have full functionality just yet. This assignment made me appreciate how simple ggplot() is, in particular when it comes to faceting graphics.
I personally don't like packages that combine aggregation or other data processing with display. There is clearly a community that likes this paradigm (plotly is similar) but I favor doing the analysis and getting the view I want as a data structure, and then plotting as is.
I would love to hear the advantages of combining aggregation and plotting the way Vega does.
Vega-Lite author here. The reason why we integrated the two is that we wanted to provide support for interactions with a fully reactive runtime system (https://idl.cs.washington.edu/papers/reactive-vega-architect...). It's difficult to do that while being agnostic to different implementations of these transformations.
I think the Vega team would agree with you since it seems to me like they are doing exactly that with Arquero (https://github.com/uwdata/arquero).
From my experience it's the right architecture.
You cannot have a great data visualization library without a great data transformation library, but the data transformation functions should be at lower level; not provided by the visualization layer.
Vega author here. We actually disagree with the last part.
For visual analysis tools (not just UI charting libraries), having the ability to quickly summarize data is very useful for analyses (esp exploratory ones).
We are not alone in this design choice. ggplot2 in R and GUIs like Tableau also include data aggregation as a first-class citizen in their tools.
I think I misread the OP: I assumed they were talking about separating the implementation code at the vis and data layers, but rereading, I think they were talking about the user's experience. There I disagree, and think for a user it's great to have your visualizations and transformations work seamlessly.
My memory of Vega was that it mixed data transform code with vis code. So I didn't have like a dplyr + ggplot2 combo. I thought design wise that was a mistake, because nailing the transforms from an edge case and performance perspective is hard and doing it 2x doesn't make sense to me. So decoupling the vis code from the data engine I think is better.
But I took a fresh look at the Vega repo and it is indeed nicely decoupled, and the transforms look usable standalone. So maybe it sort of already has the Dplyr+GGPlot2 style decoupled architecture that I thought Arquero would bring.
I had thought of Vega as "a monolith for datavis", but now it looks like there are lots of smaller usable packages in there.
I've only used ggplot2 and found it fantastic compared to standard plotting tools. What has been improved about the design of these types of visualization tools since then?
I would actually turn this question around: why would you want to implement your own aggregation functionality?
It's super convenient to be able to just make a histogram. And I'm sure my hexbin aggregation function would have errors in it the first time I wrote it.
But these are all opt-in - if I need to make a histogram of 10 trillion datapoints and performance is critical, sure, I'll do the aggregation myself and just call the barplot function instead of the histogram function. What did bundling a histogram function take away?
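To make the "just make a histogram" case concrete, here is a minimal sketch of a Vega-Lite spec where binning and counting are declared in the encoding rather than done beforehand. It is built as a Python dict; the helper name and the field name `weight` are hypothetical, and the v5 schema URL is an assumption.

```python
def histogram_spec(rows, field, maxbins=20):
    """Build a Vega-Lite histogram spec; binning and counting happen
    inside the chart runtime, not in this function."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            # Bin the raw values along x.
            "x": {"field": field, "type": "quantitative",
                  "bin": {"maxbins": maxbins}},
            # Count the rows that fall into each bin.
            "y": {"aggregate": "count", "type": "quantitative"},
        },
    }


rows = [{"weight": w} for w in (1.0, 1.5, 2.2, 2.4, 7.9)]
hist = histogram_spec(rows, "weight", maxbins=10)
```

Opting out means dropping `bin` and `aggregate` and passing already-aggregated rows to a plain bar chart.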
Generally being able to interactively adjust the granularity of the data, drill down, and also to interactively filter the data and see it properly re-aggregated.
I agree that some libraries do this but in general Plotly does not: we mostly visualize the data given, and lots of users wish we would onboard more transformation/aggregation/processing :)
I like Vega-Lite and I used it in various small projects over the past few years. It's really easy to work with.
That said, I hit performance issues very quickly when trying to do interactive data visualization, and by interactive I mean changing the data, not chart styling. That may be because every time the data changes, I have to repackage it into a Vega-Lite JSON description and rerender the chart. I wonder if there's a better way of doing it? A partial update? I couldn't find anything in the docs last time I tried.
Related: what would be the best alternative in JS, short of writing my own d3/canvas code, for cases where I have a structurally fixed (but possibly complex) chart, but I need to hit it with 100k or 1M data points and have it redraw under 100ms?
Vega-Lite is built on Vega, which is fully reactive and can do partial updates. In order to use it, you need to update the data via the Vega view api. Check out https://vega.github.io/vega/docs/api/view/#view_data.
One other piece of advice I have is to reduce the number of marks you need to draw with aggregation or sampling. I used this approach in https://github.com/uwdata/falcon to visualize billions of points and interact with them in real time.
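As a rough illustration of reducing marks by aggregation (a generic pre-binning sketch, not how Falcon itself is implemented), the function below collapses raw (x, y) points into one row per non-empty grid cell, so the chart draws cells instead of points. The function name and the output keys are hypothetical.

```python
import math
from collections import Counter


def bin_points(points, x_step, y_step):
    """Collapse (x, y) points into one record per non-empty grid cell."""
    counts = Counter()
    for x, y in points:
        # Integer cell index along each axis.
        cell = (math.floor(x / x_step), math.floor(y / y_step))
        counts[cell] += 1
    # One row per cell, keyed by the cell's lower-left corner.
    return [
        {"x": ix * x_step, "y": iy * y_step, "count": n}
        for (ix, iy), n in sorted(counts.items())
    ]


cells = bin_points([(0.5, 0.5), (0.7, 0.2), (3.1, 0.1)], x_step=1, y_step=1)
```

The number of marks is then bounded by the number of grid cells, regardless of how many points come in.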
Thank you! And thanks for linking to Falcon, I'll check it out. The demos are similar to the thing I was trying to achieve, and show the performance I dreamed of.
I like the idea of the whole plot being described as json. I was looking for a way to automatically generate plots from a project written in Prolog (https://github.com/stassa/louise). Until now I was composing some R plotting scripts with Prolog, which can get a bit clunky. Swi-Prolog has a solid library to convert from Prolog terms to json so it should be straightforward to translate between program output in Prolog and Vega-Lite json. I will definitely give this a try.
Just a question to @domoritz - I noticed that in this example plot, that shows relative numbers of different types of animal produced in the US and UK rendered as emoji:
- the data is encoded as hard-coded values that tell the engine where to place each icon of a sheep, pig or cow. Isn't it possible to derive these positions from numerical data, automatically?
For instance, instead of enumerating each instance of a "pig":
We just finished integrating Vega-Lite into our Xamarin mobile app - with support for UWP, Android and iOS. It's great.
The charts are self-contained in a blob of JSON, so you can ship them around easily.
It supports both Canvas and SVG rendering, which is great if you need to export to a .png since Canvas is better suited to this. SVG is good for crystal clear vector graphics though. So we use SVG mode on the mobile app, and Canvas mode on the back-end.
We tried numerous other JS charting libraries (c3, chartist, chart.js, frappe, and a couple others I forget) but none of these were as professionally and comprehensively put together as Vega-Lite in our experience.
I am starting to view Vega-Lite as "the SQLite" of the charting world - I hope this view holds for the long-term.
> Looks like this library generates static graphs.
It can create interactive charts [0].
> Is there any specific reason to use this, as opposed to say, Apache Echarts
I've never used it, so my initial impressions may be mistaken, but ECharts looks much less declarative.
It's interesting to compare the specifications for a bubble chart in both systems [1, 2].
The ECharts example [1] first specifies a chart-type ("scatter"), which seems to be hard-coded to use the first two elements of each entry in the data array as the x and y positions. This is then customised by writing JavaScript functions to set the symbol size and color.
In contrast, the Vega-Lite example [2] defines the chart fully declaratively - you first set the mark type to circle, then specify the data encoding which defines how each variable maps to each attribute. This mapping is properly declarative - it isn't just a manually-defined function.
If you had a multidimensional dataset and wanted to change which variables you want to plot, it looks like you'd need to reshape the data array if you were using ECharts, whereas you could just change the "field" attributes in the encoding part of a Vega-Lite specification. This makes Vega-Lite more convenient for exploratory data analysis.
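A small sketch of that point: with the data left untouched, plotting different variables is just a matter of rewriting the `field` entries in the encoding. The spec is a Python dict here; the helper name `remap` and the field names are made up for illustration.

```python
import copy


def remap(spec, **channel_fields):
    """Return a copy of `spec` with the given channels pointed at new fields."""
    out = copy.deepcopy(spec)
    for channel, field in channel_fields.items():
        out["encoding"][channel]["field"] = field
    return out


bubble = {
    "mark": "circle",
    "encoding": {
        "x": {"field": "income", "type": "quantitative"},
        "y": {"field": "health", "type": "quantitative"},
        "size": {"field": "population", "type": "quantitative"},
    },
}

# Same data, different view: only the encoding changes.
swapped = remap(bubble, x="population", size="income")
```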
The way Vega-Lite represents these encodings is convenient - a recently created library by Krist Wongsuphasawat tries to expose a similar interface to other visualisation components [3].
I like Vega, but I hate that it is not very efficient. It uses array-of-structs which is not efficient in JavaScript. For example to create a heat map you have an object for every single point. Also there does not seem to be a way to do proper heat maps at all (i.e. an interpolated image), only a grid of squares.
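To show what "an object for every single point" means in practice, here is a sketch that turns column-oriented data into the row objects used for inline `data.values`, plus a `rect` heat map over them (the grid-of-squares form mentioned above). The function name and the field names `x`, `y`, `v` are hypothetical.

```python
def columns_to_rows(columns):
    """Convert {"name": [values...]} columns into a list of row dicts."""
    names = list(columns)
    if len({len(columns[n]) for n in names}) > 1:
        raise ValueError("all columns must have the same length")
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = {"x": [0, 1], "y": [0, 0], "v": [0.5, 0.9]}
rows = columns_to_rows(columns)

heatmap = {
    "data": {"values": rows},  # one object per cell
    "mark": "rect",
    "encoding": {
        "x": {"field": "x", "type": "ordinal"},
        "y": {"field": "y", "type": "ordinal"},
        "color": {"field": "v", "type": "quantitative"},
    },
}
```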
Finally I'm not really sure of the point of being declarative if it only supports JavaScript anyway. Maybe they plan a WASM implementation?
> Finally I'm not really sure of the point of being declarative if it only supports JavaScript anyway. Maybe they plan a WASM implementation?
Vega schemas are JSON and JSON is quite portable, despite its JS origins.
Vega has actually become fairly popular in the Clojure ecosystem (see e.g. https://github.com/metasoarous/oz) due to how well the data-orientation fits the Clojure philosophy. I also think Altair for Python is quite popular (https://github.com/altair-viz/altair) and that is a Vega-lite library.
You just declare what you want drawn and you get back an svg tree which you can either modify further or transform to xml to get an actual .svg format string
It's very unopinionated and can be easily inserted in both Clojure and Clojurescript applications. You can render with the browser, webview, batik, svgsalamander or I even quickly wrote a converter to Javafx bc I have the svg tree I can directly traverse
> that still needs to call JS though... It's not like they rewrote the renderer in Clojure
But why exactly is that an issue...? Some people also think the JVM is icky so they won't touch Clojure. I don't really care much myself what underlying technology is used, I just like to get things done.
> A simpler and easy to grok alternative with minimal dependencies is thing-geom
Thanks for reminding me of thi.ng. I am both in awe at how prolific he is, while at the same time frustrated with the non-standard org-mode-driven development style.
> You just declare what you want drawn and you get back an svg tree which you can either modify further or transform to xml to get an actual .svg format string
Probably worth mentioning that Vega can also give you an SVG.
> I even quickly wrote a converter to Javafx bc I have the svg tree I can directly traverse
Can you link your code? I would like to explore this some more.
It's not an issue as such, it's just that you are doing extra work to avoid Javascript by having a fully declarative system (there's no Javascript in the Vega specs - it even has a custom language for expressions).
That's great if you intend to rewrite the renderer in Clojure. But if you don't do that - if you just call into Javascript anyway then what's the point?
It's very future-looking, but feels quite YAGNI at the moment.
And every language is equivalent to machine code which is represented as electricity in a circuit. Who cares what's further down the layers? It's about what abstractions you work with, not what libraries your code calls into.
Apparently org-mode no longer drives it, from what I read in the link above!
> Originally developed in a Literate Programming style using Emacs & Org-mode, it has recently (May 1) decided to revert to a traditional Clojure project setup to encourage more contributions from other interested parties. The original ORG source files are kept for reference in the ./org/ directory until further notice.
Well, the core "issue" is that Vega, as far as I understand it, isn't a format in the same sense that SVG is.
So you're creating a hard dependency on having to embed a JS run time. There is no other rendering backend. If you have your problem space all speced out then maybe that's alright, but that generally leaves me a bit uncomfortable. The dependency graph is huge vs geom.
With geom I started my project using batik, then moved to svgsalamander when I needed to draw updates a bit faster, and then when I had a bit more time to write my own renderer I changed to cljfx/javafx.
If I'm not happy with the Vega renderer then I'm kinda stuck, while geom is all a digestible size.
If you think that the JVM is icky, GraalVM may satisfy some of your concerns.
Just getting access to Clojure on Windows through a single graal executable saved me some heartache this week, and I can't wait to start packaging apps with it.
The main use case where I've wanted something like that is that you want to pair an interactive chart with a data table. For example you create a crossfilter and want the table to list the observations that pass the filter.
I actually contributed an example to the Altair documentation that links a table to a scatter plot.
I agree though that the tables are hard to make and not very nice looking. I think it's just not really an intended use of Vega.
If you like, also check out Altair (https://github.com/altair-viz/altair), a Python API for generating Vega-Lite, and Vega-Lite-API (https://observablehq.com/@uwdata/introduction-to-vega-lite), a JavaScript API to generate Vega-Lite.