Hi folks, although I did not post this link to HN (and have no idea who did), I am the author of the book. FYI, the printed and ebook versions will be available from No Starch Press early in 2015. The online version will remain online even after the other versions are available.
The current content is just a draft of the final version, but I'll be updating it as the book gets finalized. Significant updates will be noted via the @jsdatavis Twitter feed.
Comments, suggestions, and criticism are all welcome. You can reach me directly at stephen@sathomas.me.
Thread hijack meta discussion re: affiliate links -
If you are providing something of value, it may be perfectly fine to include an affiliate link, provided it is disclosed as such. (Of course, you may wind up driving more traffic with it specifically excluded.) I've seen HN responses go both ways on this, but generally people are amenable to allowing positive contributions to be rewarded.
I very much like the aim and contents of this book. There are loads of data visualization basics that this tutorial gets right, like: "In bar charts, however, it’s almost always best to make zero the y-axis minimum."
Or, regarding pie charts: "humans are not particularly good at judging the relative size of areas."
Or, about bubble charts: "We should never use the extra bubble chart dimensions to convey critical data or precise quantities, therefore. Rather, they work best in examples such as this example—neither the exact wind speed nor the specific classification need be as precise as the location."
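To make the zero-baseline point concrete, here is a minimal TypeScript sketch. The helpers `barChartDomain` and `barHeight` are hypothetical names for illustration, not part of Flotr2 or D3: the domain is forced to include zero so that bar lengths stay proportional to the values they encode.

```typescript
// Hypothetical helpers for illustration; not Flotr2's or D3's API.

// Compute a y-axis domain that always contains zero, so every bar
// starts at the baseline and bar lengths stay proportional to values.
function barChartDomain(values: number[]): [number, number] {
  if (values.length === 0) {
    return [0, 1]; // arbitrary fallback for an empty series
  }
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  // Positive data starts at 0, negative data ends at 0,
  // and mixed data spans both sides of the baseline.
  return [Math.min(0, lo), Math.max(0, hi)];
}

// Pixel length of a bar measured from the zero baseline.
function barHeight(value: number, domain: [number, number], plotHeight: number): number {
  const [lo, hi] = domain;
  const span = hi - lo || 1; // guard against an all-zero series
  return (Math.abs(value) / span) * plotHeight;
}

// Usage: three values where the largest is 1.5x the smallest.
const values = [12, 15, 18];
const domain = barChartDomain(values); // [0, 18]
const ratio = barHeight(18, domain, 300) / barHeight(12, domain, 300);
console.log(ratio.toFixed(2)); // prints 1.50
```

With the axis starting at zero, the 18 bar is 1.5 times as tall as the 12 bar, matching the data. If the axis instead started at the data minimum of 12, the 12 bar would have zero length and the 18 bar would fill the plot, turning a 1.5x difference into an apparently enormous one.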
The author does a great job of covering many different types of charts and graphs. There's also a good focus on getting rid of chartjunk, though the author doesn't go far enough, leaving in extraneous shadows and other elements (and I have to question the use of a library that requires so much effort to remove the default chartjunk).
Unfortunately, there are a number of wider issues that obscure these valuable insights. Many of the data visualization concerns are buried beneath a mound of introductory web development information that vacillates between explaining the nature of foundational technologies like AJAX, SVG and CDNs and assuming that the reader is an avid jQuery user. The last chapter is a treatise on MVC application development. I'm not quite sure who the target audience really is.
But the worst offense committed by this tutorial is the emphasis on overly simplistic charts. From the introduction: "Effective visualizations clarify; they transform collections of abstract artifacts (otherwise known as numbers) into shapes and forms that viewers quickly grasp and understand. The best visualizations, in fact, impart this understanding subconsciously." And later (in defense of pie charts), the author advocates for a graph that epitomizes the idea of chartjunk: a 400x400-pixel circle that gives no more information than the number 22.4%.
This, I think, is where the true challenge in data visualization lies: not in producing a pretty chart to display whatever information is at hand, but in knowing when to DROP the chart because it doesn't truly add value. Waste no ink producing a chart that is better left as a table of numbers (e.g. a reference), and certainly do not waste the viewer's time with a chart showing something that's more effectively communicated in prose. The answer to the question of how much of the world lives on $1.25 a day is 22.4%. A chart simply cannot illuminate a single data point.
Where data visualization gets really interesting is when you maximize the amount of information conveyed. Don't waste the reader's time producing these USA Today (or The Onion) style bar charts. A handful of numbers is best presented as numbers. Seven data points is TINY, not a moderate size appropriate for a bar chart, as the author would have you believe.
Effective data visualizations demand the viewer's careful study. If a reader can completely understand a chart subconsciously, there's probably no need for a visualization at all. Effective data visualizations are information-rich: they have a data density far exceeding that of prose. Good charts are exceedingly multivariate; small multiples are probably the best example. If your charts don't meet this standard, you're wasting your time producing them, and you're wasting your reader's time by forcing them to parse a visual representation of what should be a simpler prose explanation. If your headline contains as much information as your chart, drop the chart.
A great example of this issue comes in the section on scatter plots. After charting the relationship between health care spending and life expectancy, the author glibly declares: "In this example, we can see how life expectancy relates to health care spending. In aggregate, more spending yields longer life." However, that's only the least interesting factoid (as in, plausible-sounding inaccuracy) you could glean from this chart. There are many interesting questions that could be asked, but of the three the author poses, only one is answered: who the heck is that outlier that spends 50% more on health care than the pack yet has a lower than average life expectancy (spoiler alert: it's the United States). After demonstrating how to highlight this one data point, the author doesn't bother to explain why it's worth plotting all the others, or how this graph explains the situation uniquely. So once again we have produced a chart with basically one piece of information: that the US healthcare system sucks. This somewhat obvious insight simply isn't worth the ink spilled.
This emphasis on overly simplistic visualizations is entrenched in the choice of libraries. This is not to pick on Flotr et al.; they're good for what they do, but what they don't do is far more important. Like essentially every dedicated charting library, they restrict you to the handful of options the developer allows you to have. You can choose from a few preselected chart types and customize them in a few preselected ways. If you have a novel dataset that requires anything more customized or unique, you are up a creek without a paddle. Such charting libraries require that you contort your data until it fits their own assumptions. Nowhere is this better illustrated than in this tutorial series, where the differences between iterations are frequently just a reorganization of the data structure (i.e. a waste of developer time).
Far better, then, to use a generalized data library that allows you to manipulate your visual tools with endless freedom. D3 is a great example of such a library. If you're writing a data visualization tutorial in JavaScript that uses anything but D3 you have a burden of proof to demonstrate why. This is not because D3 is the end-all-be-all of charting libraries, but rather because any tutorial based on the limited selection of possibilities afforded by anything else is simply not a data visualization tutorial. It's a charting options tutorial (also valuable, but far narrower in scope).
The author finally gets around to talking about D3 near the end of the book, but misses the opportunity to demonstrate how simple it actually is to replicate the early examples with D3. It's also worth noting how understanding these composable techniques from the start gives you significantly more power. I really wish this book started with the tutorial on D3; everything prior just seems anachronistic.
For more information on information-rich data visualization, check out the work of Edward Tufte, in particular his book The Visual Display of Quantitative Information [0] (which the author cites but seems not to have read). For more information about JavaScript data visualization with D3, read everything you can find by Mike Bostock [1].
P.S. Please don't make a map that assumes latitude/longitude == x/y, even a really small one. A decent geographic charting library exists and is free, so just don't hack it.
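To illustrate why the lat/long == x/y shortcut matters, here is a minimal TypeScript sketch comparing that shortcut (which is itself an equirectangular projection) with spherical Mercator, the projection most web tile maps use. Points plotted with one will not line up with a basemap drawn in the other. The helper names `naiveXY` and `mercatorXY` are hypothetical; in practice a geographic library such as D3's geo projections handles this, including clipping and many other projection choices.

```typescript
// Hypothetical helpers for illustration only; a real project should use
// a geographic library rather than hand-rolled projection code.

const MAX_LAT = 85.05112878; // Web Mercator's usual latitude cutoff

// The shortcut: treat degrees as plain x/y (equirectangular).
function naiveXY(lonDeg: number, latDeg: number, width: number): [number, number] {
  const scale = width / 360; // pixels per degree
  return [scale * (lonDeg + 180), scale * (90 - latDeg)];
}

// Spherical Mercator: longitude stays linear, latitude is stretched
// increasingly toward the poles. Output fits a width x width square.
function mercatorXY(lonDeg: number, latDeg: number, width: number): [number, number] {
  // y diverges at the poles, so clamp latitude first
  const lat = Math.max(-MAX_LAT, Math.min(MAX_LAT, latDeg));
  const lambda = (lonDeg * Math.PI) / 180;
  const phi = (lat * Math.PI) / 180;
  const scale = width / (2 * Math.PI);
  const x = scale * (lambda + Math.PI);
  const y = scale * (Math.PI - Math.log(Math.tan(Math.PI / 4 + phi / 2)));
  return [x, y];
}

// Pixel height of the band between two latitudes on a 512-pixel-wide map.
const band = (f: typeof mercatorXY, lat0: number, lat1: number): number =>
  f(0, lat0, 512)[1] - f(0, lat1, 512)[1];

// Equal 10-degree bands are equally tall with the shortcut...
console.log(band(naiveXY, 0, 10), band(naiveXY, 60, 70));
// ...but under Mercator the northern band is more than twice as tall.
console.log(band(mercatorXY, 0, 10), band(mercatorXY, 60, 70));
```

The mismatch grows with latitude, which is why even a small map drawn with the shortcut can visibly misplace points against a Mercator basemap.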
Thank you very much for sharing your thoughts on the book. It's really gratifying when someone takes the time to seriously consider an author's work and then takes the additional time to compose a thoughtful critique. It seems pretty clear that the book you wanted to read was not the book that I've written, but that's actually a pretty good thing. Your comments will definitely help me clarify the goals and approaches of the book when I flesh out the Introduction. (As most everyone probably knows, the Introduction is the last section that's written. What's there now is mostly just a placeholder; I'll write the real Introduction once the editorial and technical review are complete.) If you're interested, you can check the online version in a month or so to see how effectively the book meets its own goals.
For those folks that do want a book mostly devoted to D3.js, you probably won't be satisfied with my book. (In fact, the publisher and I had quite a bit of discussion about including any material on D3 at all. We finally concluded that any book on JavaScript data visualization couldn't ignore D3, so there is a chapter dedicated to the library.) The good news, though, is that you have lots of other options. Amazon lists at least nine books dedicated to D3. As an author myself, I don't feel comfortable making specific recommendations publicly (as that might imply a negative opinion of books not recommended), but anyone is welcome to contact me privately for my thoughts. (Contact info is in a comment below.)
I think this is a great comment on data visualization. And there is one part that I want to highlight as relevant to me right now:
> Far better, then, to use a generalized data library that allows you to manipulate your visual tools with endless freedom. D3 is a great example of such a library. If you're writing a data visualization tutorial in JavaScript that uses anything but D3 you have a burden of proof to demonstrate why. This is not because D3 is the end-all-be-all of charting libraries, but rather because any tutorial based on the limited selection of possibilities afforded by anything else is simply not a data visualization tutorial. It's a charting options tutorial (also valuable, but far narrower in scope).
Yes! When I saw this article was based on flotr2, I immediately checked to see if it was based on D3. It was not (correct me if I'm wrong) and I was a bit disappointed because flotr2 appears to be all about charting, but data visualization is much more than just charting.
I'm looking for a good charting package right now, but my requirements are that it's based on D3 so that I don't have to introduce two different libraries when I need some data visualization that goes beyond mere charting. So NVD3 appears to be my choice at the moment.
Nice work! I'm becoming increasingly interested in data visualisation, so this couldn't be timed better for me.
I made a side project recently using the google charts API (http://texas.joetannorella.com). This flotr library seems good though so I'll be using it on my next project.
A nice topic, but it seems that along with advising some good practices and ideas, there are a few problematic things. For example:
- usage of sequential color scales for categorical data (compare with http://colorbrewer2.org/),
- lack of interactivity (one of the main advantages of JS plots is the ability to inspect data on mouseover, for example to see which country a particular dot represents),
- in general some aesthetic considerations (e.g. http://d3js.org/ is not only a library, but a whole philosophy of doing nice visualizations; including small, but visually important things, like the choice of colors).
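On the interactivity point: when a chart is drawn to a canvas rather than as individual SVG elements, the dots cannot carry their own event handlers, so a mouseover needs a hit test. Below is a minimal TypeScript sketch of nearest-point hit testing; `Dot`, `dotUnderCursor`, and the sample positions are hypothetical placeholders, not any library's API or real data.

```typescript
// Hypothetical types and helper for illustration only.
interface Dot {
  label: string; // e.g. a country name to show in a tooltip
  x: number;     // pixel position on the plot
  y: number;
}

// Return the dot closest to the cursor, or undefined if none lies within
// `radius` pixels. Comparing squared distances avoids Math.sqrt.
function dotUnderCursor(dots: Dot[], mouseX: number, mouseY: number, radius = 6): Dot | undefined {
  let best: Dot | undefined;
  let bestDist = radius * radius;
  for (const d of dots) {
    const dx = d.x - mouseX;
    const dy = d.y - mouseY;
    const dist = dx * dx + dy * dy;
    if (dist <= bestDist) {
      best = d;
      bestDist = dist;
    }
  }
  return best;
}

// Usage with made-up pixel positions (placeholders, not real data):
const dots: Dot[] = [
  { label: "A", x: 100, y: 50 },
  { label: "B", x: 104, y: 52 },
  { label: "C", x: 300, y: 200 },
];
console.log(dotUnderCursor(dots, 103, 52)?.label); // prints B
console.log(dotUnderCursor(dots, 200, 120)); // prints undefined
```

In a page, the mouse coordinates would come from a `mousemove` listener on the canvas, converted to canvas-relative pixels, and the returned label would feed a tooltip. A linear scan is fine for a few hundred points; much larger datasets would call for a spatial index.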
As couchand already advised, after Edward Tufte and Mike Bostock there is a high baseline for information beauty and clarity.
If you don't need all the features of Flotr, you could be quite a bit better off using Morris.js[1]. I've been using it for my work and have been really impressed.
PS. I'm one of the collaborators now, since I needed some features that weren't implemented.
There is also d3.js (http://d3js.org/), which I've used in the past. Based on the examples pages for the two libraries, it looks like d3.js has much more functionality.
Still working on that. It should be complete before the printed/ebook is available in January.
FWIW, the code for the visualizations (but not the libraries) is neither minified nor concatenated, so you can access it directly from a web inspector.
Amazon preorder at http://www.amazon.com/Data-Visualization-JavaScript-Stephen-... (that's NOT an affiliate link)