As someone with too many GIS degrees, I feel a level of cathartic release in reading this and thinking that laypersons might be able to improve their map-making skills, avoiding some of the more serious cartographic gotchas. It was well-written. The beauty of the ubiquity and greatly-improved UX of modern GIS tools is that everyone can dive into doing geospatial analysis and building static and dynamic maps. It also means people can accidentally author very misleading visualizations.
Despite this ESRI-backed article on the subject, I think the popular ESRI-driven map dashboard for Coronavirus[1] has a major flaw that violates the crux of this article. Dot density maps _MUST_ be set to scale relative to your map scale, or else you get nightmare scenarios like this one[2]. This is doubly true if the dots are varying in size (which I also think is a fundamentally terrible representation, because people suck at mentally comparing areas). If I were to modify it, I would probably use a choropleth-like representation. Keep the dots equally sized and colour them different shades of red. That way nobody's brain will mislead them into thinking "this larger circle means a larger area is all infected."
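To make the area problem concrete, here is a minimal Python sketch. It is not how the ESRI dashboard sizes its symbols (I don't know that); the function name, the 40 px maximum radius, and the case counts are all made up for illustration. It compares sizing circles so that area tracks the value against the common mistake of scaling the radius directly.

```python
import math

MAX_RADIUS_PX = 40.0  # hypothetical radius for the largest symbol


def radius_for_value(value, max_value, max_radius_px=MAX_RADIUS_PX):
    """Radius chosen so that the circle's AREA is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius_px * math.sqrt(value / max_value)


def circle_area(radius):
    return math.pi * radius * radius


# Hypothetical counts: one region has 100x the cases of the other.
big, small = 10_000, 100

# Area-proportional sizing: the drawn areas differ by the same 100x.
r_big = radius_for_value(big, big)
r_small = radius_for_value(small, big)
area_ratio_sqrt = circle_area(r_big) / circle_area(r_small)

# Radius-proportional sizing: the drawn areas differ by 100 squared.
r_small_linear = MAX_RADIUS_PX * small / big
area_ratio_linear = circle_area(MAX_RADIUS_PX) / circle_area(r_small_linear)

print(f"area-scaled:   {area_ratio_sqrt:,.0f}x")
print(f"radius-scaled: {area_ratio_linear:,.0f}x")
```

Even with the square-root scaling, readers still have to judge areas by eye, which is the part people are bad at.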
As another commenter pointed out, geographic projections are well-known in the infoviz literature to be problematic, for a number of reasons.
Further, two-dimensional area (circles) is a particularly terrible dimension to marry to geography, because it’s an additional (and therefore competing) spatial dimension. Color is better, but still has problems (compare Rhode Island to Texas, or populous New Jersey to unpopulated Wyoming/Alaska). And color forces you to bin, which can be misleading. Choropleths are still harder to read than a bar graph or histogram (compare California to Maine: they don’t share an axis and are irregular shapes, making it hard to compare their areas).
IMO a logarithmic bar graph is the most reasonable choice. If you want to include population density, I’d encode it with opacity and one-dimensional space (a shaded-in bar representing infection, a dark bar representing mortality, and a transparent bar representing total population). If geographic projections are that important to you, you can superimpose those bars on countries. It sucks but it gives you geographic scale. If anyone wants to build this graph, please include an adjacent, different-hued bar encoding the number of tests performed thus far.
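If anyone does want to try it, here's a rough, text-only Python sketch of the log-scaled bars. The region, every number, and the helper name `log_bar` are hypothetical placeholders, not real data; a real version would use a plotting library plus the opacity and hue encodings described above.

```python
import math


def log_bar(value, width_per_decade=8, symbol="#"):
    """Text bar whose length grows with log10(value); values below 1 get no bar."""
    if value < 1:
        return ""
    return symbol * round(width_per_decade * math.log10(value))


# Hypothetical figures for a single made-up region.
rows = [
    ("population", 10_000_000),
    ("tests", 50_000),
    ("infected", 1_200),
    ("deaths", 30),
]

for label, value in rows:
    print(f"{label:<11}{log_bar(value)} {value:,}")
```

Each factor of ten adds the same number of characters, so the small numbers stay visible next to the population bar. The catch is that readers have to know the axis is logarithmic.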
Yep. Problems everywhere. But maps are by definition lies (aka abstractions). You decide which lies are more important to minimize depending on your goals.
Totally in agreement. The fundamental lesson of infoviz, in my view, is that visualization is an act of creative distortion: graphs are literally distortion (much like how diaries, photographs, and films are distortions of experience).
There’s no such thing as a neutral graph. Making certain things easier to compare might not be the goal. This is why journalistic graph makers must be extra purposeful, and grounded in theory.
I don’t believe that laypeople are using geographic distortions poorly with malicious intent, but more likely with ignorance, as the article suggests.
Personally I'd use an equal-population cartogram like https://go-cart.io/cartogram instead of a geographic projection, and a dot map or solid colors based on density.
And per the terminology in the article, that ESRI map uses proportionally scaled symbols, not dots.
That equal population map looks decent enough for the US, but it's pretty poor for Canada. Arbitrary political boundaries vastly distort the size of regions. Northern Alberta, for example, is barely more populated than the Northwest Territories. However, southern Alberta has a bunch of people, so northern Alberta is huge while the Northwest Territories are squished flat.
That pattern holds true for nearly every province, making them all rather misleading.
> Dot density maps _MUST_ be set to scale relative to your map scale, or else you get nightmare scenarios like this one[2]
There are a lot of flaws with the visualizations of the infections. But wouldn't using choropleth representations need a population reference? I'm genuinely curious: should the range reflect the population, with the infections as the series?
It could then be enhanced with deaths per infection in certain regions, which could be further enhanced with distance to hospitals.
I sound like an ass right now, but the data is here and we should use it properly to help. With people like yourself, maybe it would be better than what's being given right now.
You'd need to normalize the data against area but not necessarily a population reference. You're just showing how prevalent an attribute is relative to other areas.
Honestly, it's tempting. I should dust off QGIS or Leaflet and try a few examples. But to speak candidly, I'd probably much rather play Lego with my kid tonight.
I was surprised when you said that. Recently I was doing some georeferencing of historical maps on top of current maps, and I was very disappointed with the choices.
That was my first experience with GIS toolkits. I tried ArcGIS, QGIS and a few lesser-known ones. I was looking for a good UI/UX, partly because the goal was to teach a non-technical acquaintance to do it. There was a shareware toolkit that was exclusively for georeferencing that had a very satisfactory UI, but it had a blocking bug that would cause it to crash.
I used to work for a GIS company. All of these companies' tools are based on whatever software development platform was current when the project was initiated. The mapping tool may grow to be 8, 10, 15, even 20+ years old and new features are continuously added, but never is the underlying software platform upgraded or the tool rewritten.
I'm not sure why this is or even if it is peculiar to GIS, or just more visible compared to the many slow-to-upgrade software fields which don't have a UI or which do more server-side. Also I think some of it is also driven by having one or more big customers who themselves refuse to upgrade.
The effect was particularly visible as an outside team was developing a greenfield iOS app for our data at the same time as our team maintained their old-new Windows app. The iOS team was able to, as they say, "move fast and break things" and gain accolades for whizbang features. It was interesting to watch them accomplish more with less computing power and a more primitive (IMO) development language (their Objective-C to our C#).
Edit: I called it the "old-new" Windows app as there had been an even older app (predating C#) which the C# app replaced. In the circle of life the once-new C# app itself became ossified and stuck with whatever short-sighted design legacy decisions were baked in. There was a lot of technical debt in the codebase.
I should say partially replaced, as they were never (while I was there) able to convince all of the customers to upgrade from the original app, which is why I say that big customers who can simply refuse to change might be a factor.
Yeah, that's true. If you walk through a standard GIS toolbox you'll notice that it covers a _lot_ of ground. It's difficult to make all these tools equally generic and straightforward. If you are running ArcGIS or QGIS you're already in "advanced user mode", in my opinion.
I'm not sure how common your use case is, but maybe it does need some simpler option (does it exist in Google Earth maybe?). I think a lot of the improvements are in the form of data authoring and presentation. Even just, "hey here's a public participation map of jogging/biking routes, all measured out and hand-optimized for avoiding traffic lights. Check it out and add your own!" has gone from impossible to pretty easy since I began in the field.
I found this article to be very informative. Most of us don't have an understanding of how nuanced (or not) using maps to display data can be.
I wonder if someone with the proper credentials could contact the creator of this website [0] with advice. It seems like a good idea and resource but the map bothered me the very first time I saw it. The color coding is simply wrong and it communicates something that does not align well with reality.
I can't count the number of times I've looked at a data visualization and wished I could sit down with the person who made it and read an Edward Tufte book to them. There's just so few good examples out there of data visualizations that respect basic principles of visual communication, like the ones outlined in this article. They generally seem to aim more for visual impact (like the useless 3D display in the article, which you've gotta admit is striking) than for clarity, which I guess is understandable but is still too bad.
(And as long as I'm griping, don't get me started on all the people who think a wall of text slapped into a PNG constitutes an "infographic.")
I think Tufte is very overrated. People who are reasonably comfortable with data would rather have it in a basic format. Tufte-style often takes a lot of effort to produce and the payoff isn’t there. Consultants love it though because they can bill their clients for playing around for hours with charts.
Where Tufte (and others like him) is concerned, I try to remember the maxim Do not follow in the footsteps of the sages. Seek what they sought.
Slavishly reproducing his methodology ignores everything we've learned since then. On the other hand, for those new to the field, reading about his work and understanding what he was trying to accomplish with the tools available at the time can open our eyes to new ways of thinking.
(As an aside, this same maxim has also helped me with things like programming tools. We don't need to use Lisp or Smalltalk for everything, but we can learn a lot from these languages, and especially from what their creators and proponents were trying to achieve with them.)
Furthermore, there are different styles for different purposes.
The famous graphic about Napoleon's army, which is closely associated with Tufte, is an example of a graphic that crams a lot of data into an illustration that rewards careful study. It's actually not a graphic that makes the data obvious at a quick glance.
Sometimes an illustration that best serves as a background for a knowledgeable person spending 30 minutes explaining it is a good approach. Other times you want to capture the contrast between a few numbers in a compelling way.
I have and admire this graphic, and I think it highlights a few top-level things you mention, like the right representation for the right audience. It takes some time and concentration to understand the Napoleon graphic compared with a simpler presentation where a few key points jump out at you with ease, but what makes it unique is that it manages to encode so many dimensions into a 2-d format. Tufte generally pushes information density over other, equally valid, goals.
But, as the linked article discusses, this is actually a really good diagram if you have someone up there explaining it in extreme detail. But it looks like a mockable graphic to the casual observer.
1. Tufte-style is fairly meaningless as a term. His books cover a vast array of graphics.
2. Almost every chart and graph I see would be better if its creator understood and applied Tufte’s principles.
3. His books are truly delightful to read and are not overrated.
4. If he’s so overrated, name a person or resource that would better impart a set of useful principles to create effective and accurate visual representations of information.
Tufte, as another user correctly pointed out, is massively overrated. He is also incredibly thin-skinned and often blocks people, many of whom work in data visualization, if they criticize or critique his work or ideas.
Alberto Cairo is just as overrated, but he is more receptive to feedback.
See my rebuttal to that other comment. And please elevate your discussion from the realm of feelings about people. Let’s say he is thin-skinned. So? That doesn’t change how helpful and eye-opening his delightful books are.
My pet peeve is electoral maps. They show some interesting trends, but their overuse makes us seem a lot more divided along geographic lines than we actually are.
Maps are often abused or misused, but I'd be curious to know why you believe map viz in general to be the worst. Done responsibly, they can and often are extremely insightful, serving purposes that no other viz can.
The fact that most maps are terrible means we need to encourage better maps, not dismiss them entirely.
That argument can be used for any visualization. Used correctly, they are usually good.
That said, I am being dramatic on my claim. It doesn't help that I don't have an internal map. I'm oddly good with directions, but I do not visualize getting from here to there in anything resembling a map in my mind.
So, to that end, most maps that someone uses to show me something that it is best at, a simple time series or scatter plot would have done as well. Often better.
It seems like you are complaining about graphics in the article, but I'm not sure. If you read the article, it specifically talks about why those are not good visualizations and gives pointers on developing good ones.
For the 3D one specifically, right under the graphic, the article says:
"3D has a time and a place. It can be a really useful way to encode thematic data on the z-axis and make something useful. But extruding Hubei compared to the rest of the areas just doesn’t work. It’s gratuitous and adds nothing. It’s really hard to make any sense of relative amounts and that’s before we even deal with foreshortening and occlusion."
I read the article, thank you. It's you who have misread my comment. I was praising the article for illustrating good principles of visual communication, and lamenting how there are so many people making data visualizations out there that don't understand this stuff.
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith... Please don't comment on whether someone read an article.
Hence my first sentence, "It seems like you are complaining about graphics in the article, but I'm not sure."
First you said, "I can't count the number of times I've looked at a data visualization and wished I could sit down with the person who made it and read an Edward Tufte book to them."
I was and am 100% on board with this comment. I think the same thing often.
Then you said "There's just so few good examples out there of data visualizations that respect basic principles of visual communication, like the ones outlined in this article."
I agree, the article does a pretty good job.
Then, "They generally seem to aim more for visual impact (like the useless 3D display in the article, which you've gotta admit is striking) than for clarity, which I guess is understandable but is still too bad."
I was uncertain about this statement. You start the previous sentence by stating "There's just so few good examples..." and end with "...like the ones outlined in this article", which made it a little unclear if the ones in the article were good or not, but as I was reading it I was leaning to the good side. Then this sentence started with "They generally seem...", and since the previous sentence ended talking about the "ones outlined in the article", I associated "They" with "the ones in the article". And this sentence that started with "They generally" was negative.
Then I contributed some miscommunication. When I used "you" in the sentence I was thinking in general terms (including myself) and not you personally. I think that might have been better stated as "If one reads the article...".
Anyway, I was initially confused by your statement. Now I see what you were going for.
As someone who read your comment before reading the article, I took your comment to mean that the article was poor because it had bad graphics. That's not a criticism against you on my part btw, only an observation. So it might be that more people read your comment that same way due to how you phrased it.
The article has bad graphics. The question whether GGGP criticizes the article or not is only resolvable if you know both the comment and the article. If you do, the answer is quite obvious. If not, it is hard to predict. A wonderful example of entropy.
Like several others, I was also confused by your initial comment. At first I thought you were criticizing the article as an example of bad graphics and useless 3D.
I am no master of communication, but there is one thing that stuck in my mind from a class I took many years ago: If I am talking to someone or writing something they read, and they seem to be misinterpreting or misunderstanding me, who is responsible for that? Is it the reader or listener, or is it me?
The lesson was that I, the person doing the communicating, am responsible, not the person receiving the communication. It's usually not helpful to blame them for misunderstanding. Instead I should realize that I was probably unclear in some way, and do what I can to clear it up.
Of course there are exceptions. Sometimes people are willfully misunderstanding and don't give you a chance to clarify. I remember one friend who delighted in pouncing on me if we were casually brainstorming and I said something that wasn't exactly what I really meant. When I would correct myself they would say "Oh no, you already said XYZ and you can't take it back now!"
But I think those cases are unusual, and I've found it very helpful to avoid blaming the listener and just see how I can be more clear.
> ... lamenting how there are so many people making data visualizations out there that don't understand this stuff.
This point was clear in your top comment.
> I was praising the article for illustrating good principles of visual communication...
This point was completely unclear in your top comment.
I read your top comment three times, and each time made me feel more certain you were complaining about the site as an example of failing to implement good visualizations (until I read this comment).
Don't get so defensive about a communication mistake that you made while talking about communicating effectively. Can't you accept it with grace that your comment could be misinterpreted the way you wrote it? I also juggled in my mind what you meant.
Very nice and well-written writeup. Here's one graf that randomly provoked some thoughts:
> But looks can be deceptive. The fact that it looks okay is hiding a dark secret that, if you’re not aware of the fact, won’t even get noticed. The map is using totals (absolute values). There are very very few golden rules in cartography but this is one of them: you cannot map totals using a choropleth thematic mapping technique. The reason is simple. Each of our areas on the map is a different size, and has a different number of people in it. These two innate characteristics of all thematic maps means you simply cannot compare like for like across the map.
> The label tells us that Hubei region has over 65,000 cases of coronavirus. It sounds a lot. But does Hubei have 100,000 people, or possibly 100,000,000 people living there?
I definitely agree with the author: that there are very few "golden rules" in visualization, and that not depicting absolute numbers in a choropleth map is one of them. However, the author does an excellent job (with a bar chart and revised map) showing how this anti-pattern severely obfuscates how much the Hubei region is an extreme outlier.
IMO, during the first half of an epidemic, when a small portion of the population is infected and infections are growing exponentially, it makes sense to use the absolute number of infections. Then later when infection is widespread and the curve looks logistic, it makes sense to give the proportion of infected. I think we are clearly in the first half when it comes to coronavirus.
> during the first half of an epidemic, when a small portion of the population is infected and infections are growing exponentially, it makes sense to use the absolute number of infections.
That's even a bit of an understatement, because three infected in a city of millions can easily be an early-stage pandemic, whereas three infected in the middle of nowhere would just be three very unlucky people.
I guess I'm having a hard time thinking of reasons why absolute value is more important than rate, especially in this scenario, when we're measuring the impact of an infectious disease that spreads person-to-person. I suppose in the hypothetical situation, where there are 10,000 cases in Wyoming and 10,000 cases in New Jersey. A choropleth map by rate would shade Wyoming 15-18 times darker than New Jersey. And this would obfuscate the likely reality that 10,000 cases in New Jersey is imminently a far bigger story – because N.J. is not only 15-18 times more populous, but ~25 times more dense. (The fact that Wyoming is ~15x bigger by land mass would make the issue worse, in terms of visual distraction)
But I'm not sure how shading this by absolute totals – in which case, Wyoming and N.J. would be the same shade – would provide significantly more value? Sure, N.J.'s situation wouldn't be effectively invisible in the totals map, compared to the rates map. But now we have to imagine a scenario in which a person-to-person virus managed to sicken so many people (proportionally speaking) in such a large rural state compared to an extremely urban state. It's very hard to imagine a scenario in which we don't want to focus attention on Wyoming. For 10,000 Wyoming people to be infected – and only 10,000 affected in New Jersey – would almost certainly mean that the infection's original epicenter is Wyoming, and that someone from Wyoming had direct contact via travel with a New Jersey resident, (especially if Wyoming and N.J. are outliers in terms of absolute totals by state).
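For what it's worth, the arithmetic in the hypothetical above can be sketched in a few lines of Python. The populations are rounded approximations of 2019 estimates (an assumption on my part, so check them before reusing), and the 10,000-case figures are the made-up numbers from the hypothetical, not real data.

```python
# Rounded approximations of 2019 state populations (assumed, not exact).
POPULATION = {"Wyoming": 580_000, "New Jersey": 8_900_000}

# The hypothetical case counts from the comment above.
CASES = {"Wyoming": 10_000, "New Jersey": 10_000}


def per_100k(cases, population):
    """Cases per 100,000 residents."""
    return cases / population * 100_000


rates = {state: per_100k(CASES[state], POPULATION[state]) for state in POPULATION}
ratio = rates["Wyoming"] / rates["New Jersey"]

for state, rate in rates.items():
    print(f"{state:<11}{rate:8.0f} per 100k")
print(f"Wyoming's rate is about {ratio:.1f}x New Jersey's")
```

With identical totals the rate ratio is just the population ratio, which is why a rate map would shade Wyoming so much darker while a totals map would shade the two states the same.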
The argument isn't that you never want to present absolute values.
The argument is that you never want to present absolute values on a choropleth map, explicitly because it always obfuscates the data in a way that is misleading.
The data in question should probably be in a table, if the idea is to be able to compare Hubei Province to the others. The bar chart makes the difference very clear, but it also makes it difficult to interpret the data for the other provinces.
What for though? What can you tell from that value, other than the value itself?
You cannot tell whether it's common or rare, you cannot tell the risk of anyone in a certain area to be affected, you will have a hard time showing trends because people will react to the phenomena and avoid a certain high-risk area which will then result in fewer cases in that area.
Is a virus going to spread equally fast in a sparsely populated area vs a densely populated area?
Isn't there implicit knowledge going into two of those? 100 cases in the Yukon territory feel less likely to infect other areas than 100 cases in NYC. For the global economy, you'll also want additional data, e.g. how relevant to the global economy is that area, how many people are working there, is it the working population that's affected or is it primarily old people etc.
When the presence of the contagion is the risk, absolute numbers communicate a lot. Relative counts are less meaningful right now.
The majority of meaningful information received from such a chart right now is the presence or absence of the virus. Secondary is the number of cases to indicate the stage of spread (e.g. 1 suggests maybe an outlier, 2-10 suggests early stages of contact spreading, etc).
Communicating information with an inherently exponential growth rate is just an entirely different beast.
Why is "has province reported any cases?" the most meaningful information? Ignoring the current reality of every province having reported cases since January, a simple boolean shading would obfuscate nearly every vital insight realistically conceivable. If it were the case that 3 months after the Hubei outbreak, Hubei had 100,000 reported cases, and all bordering provinces had 1-100, that is an extremely important distinction to make when assessing the effectiveness of containment policies (and/or the trustworthiness of official government numbers).
I agree with what you're saying about reporting cases in China. My point was in the context of reporting elsewhere in the world where most areas have zero cases and so having cases or not is the most important information, followed by the number of cases. I should have been clearer.
> Secondary is the number of cases to indicate the stage of spread (e.g. 1 suggests maybe an outlier, 2-10 suggests early stages of contact spreading, etc).
if you want to indicate that info, why not indicate that info, e.g. by marking current cases of unknown, known local transmission, imports only etc? the absolute count becomes irrelevant.
why use a proxy to speculate about what you can say directly?
But how do you do that in a way that is superior to simply just showing the numbers? The point of the OP was to argue that absolute numbers aren't meaningful in topological maps. But how do you communicate these more specific facts in topologically useful ways aside from just showing the numbers?
> this anti-pattern severely obfuscates how much the Hubei region is an extreme outlier.
For this reason I think the 3D projection graph is actually not as bad as made out. Sure, it's hard to tell anything about any of the other provinces compared to Hubei, but it really highlights the difference between there and other provinces.
The difficulty I always thrash around with is: proportional by area or proportional by population? I used to do some crime maps and some areas would look quite crime-ridden ... because they were areas with very little population, as the census counts it, like parks and such, so the crime would look rather high. So dividing by population isn't the cure-all, but it beats nothing. For giggles, I would do crimes in a given region, crimes in a region divided by that area, and crimes in a region divided by the population in that region. Very different-looking results.
I have often considered dividing by some kind of combination of area and population, but even that seems not quite right. Disregarding "victimless crimes," much crime is interactive: two or more parties must be involved, therefore the population ought to have some kind of exponent attached to it, like particles bouncing against one another in a container.
I never did puzzle this out, I am sure brighter minds than I would have come to some conclusions.
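The three versions described above (raw counts, counts per area, counts per population) can be sketched in Python like this. The region names and every number are invented purely to show how the rankings flip; none of it comes from real crime data, and the `Region` class and helper names are just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    incidents: int
    area_km2: float
    population: int


def per_km2(region):
    """Incidents per square kilometre."""
    return region.incidents / region.area_km2


def per_100k(region):
    """Incidents per 100,000 residents; None when nobody lives there."""
    if region.population == 0:
        return None
    return region.incidents / region.population * 100_000


def ranked(regions, measure):
    """Region names from highest to lowest; undefined values sort last."""
    def key(region):
        value = measure(region)
        return -1.0 if value is None else value
    return [r.name for r in sorted(regions, key=key, reverse=True)]


# Hypothetical numbers chosen only to show how the rankings change.
regions = [
    Region("Downtown", incidents=400, area_km2=5.0, population=50_000),
    Region("Suburb", incidents=300, area_km2=60.0, population=120_000),
    Region("City park", incidents=12, area_km2=3.0, population=20),
]

by_count = ranked(regions, lambda r: r.incidents)
by_area = ranked(regions, per_km2)
by_population = ranked(regions, per_100k)

for label, order in [("raw count", by_count), ("per km2", by_area), ("per 100k", by_population)]:
    print(f"{label:<10}{order}")
```

The park with almost no census population jumps to the top of the per-capita ranking, which is the artifact described above. Returning None for a zero population is one way to avoid drawing a meaningless rate at all.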
The answer ends up being: Why not both? Both are useful but tell us different things.
For COVID-19, the number of infections has real meaning regardless of its proportion of the population: each one is an instance of viral infection that can spread the infection further. But the proportion infected can give information about virulence, etc.
That was my first thought, yes, but then I think area ought to be involved somewhere; if the area is large enough, even a medium population will not have people ("particles") interacting ("colliding") as often.
In the rather clumsy taxonomy of crime I created from the UCR, most violent crime -- excepting suicide -- would be collision-based. Some drug crimes like possession would not be collision-based (although it could be argued that possession involves buying which involves another person) while drug sales would be. Crimes against property are interesting -- is that another person by proxy, or should that merely be collision-less?
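One crude way to read the "particles in a container" analogy, sketched in Python: count the possible pairs of people and spread them over the area. This is only my assumption about what such a denominator could look like, not an established criminology measure, and the function names and numbers are hypothetical.

```python
def potential_pairs(population):
    """Number of distinct pairs of people who could interact: n(n-1)/2."""
    return population * (population - 1) // 2


def pair_density(population, area_km2):
    """Potential pairs per square kilometre (a stand-in for 'collision' opportunity)."""
    return potential_pairs(population) / area_km2


# Hypothetical regions: same population, very different areas.
dense = pair_density(1_000, area_km2=1.0)
sparse = pair_density(1_000, area_km2=100.0)

# Doubling the population roughly quadruples the potential pairs.
growth = potential_pairs(2_000) / potential_pairs(1_000)

print(f"dense:  {dense:,.0f} pairs per km2")
print(f"sparse: {sparse:,.0f} pairs per km2")
print(f"pairs grow by {growth:.3f}x when population doubles")
```

Under this reading the exponent on population is about 2 for collision-based crimes and 1 for collision-less ones, with area in the denominator either way. Whether real interactions scale that way is exactly the open question.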
What about something like property crimes/100,000 people and violent crimes/100,000 people? Disease rates are often reported in such ways like in the article posted the best map had rates/100,000 and not strictly population or area.
The number of cases per people statistic is silly. It might make sense when the virus is common around the world, but when it's just spreading the number of cases itself is more important. For example if you detected 100 people infected with a virus in some region, does it matter if it has 200 million people (Uttar Pradesh) or 10 million (Lombardy)? These political divisions are arbitrary anyway.
I initially thought the same thing, but revised my thinking after I kept reading. I made a comment here [0], but the author is correct that in this case (as in almost every conceivable visualization case), mapping the cases-per-person value is necessary. The mapping of absolute counts almost completely hides the severity of the impact in Hubei province, and how the severity in other provinces has a direct relationship to their geographic distance from Hubei.
> These political divisions are arbitrary anyway.
Not sure how interstate activity and travel is regulated/limited in China (in normal times), but in the U.S., state borders are not just some imaginary political construct. Laws and services – and therefore, impact to respective populations – can drastically differ by state lines, and ignoring that is a huge mistake.
I think what they mean by the political divisions being arbitrary is that you could use a different division system, for example city boundaries instead of province, and have something completely different, with even more contrast in densities.
I concur. If you are concerned about catching it during your travels, people who don't have it are just as irrelevant as the number of automobiles who don't have it; thus, cases per people is just as irrelevant as cases per (people + automobiles).
You should be concerned about the fraction of land area on which you are at high risk. If 100 people have it, and each person creates a high risk across A area (and the areas don't overlap), that is 100*A / country-area, which is proportional to cases/area (presuming A is constant), the first statistic he used.
EDIT: if you know you are going to interact with N people, the cases per population figure is relevant again.
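Both measures above fit in a short Python sketch. The risk area per case, region size, population, and contact count are all hypothetical, and the second function assumes each contact is an independent random draw from the population, which real contacts are not.

```python
def fraction_of_area_at_risk(cases, risk_area_per_case_km2, region_area_km2):
    """Share of the region covered by non-overlapping risk zones: cases * A / area."""
    covered = cases * risk_area_per_case_km2
    return min(covered / region_area_km2, 1.0)


def chance_of_meeting_a_case(cases, population, contacts):
    """P(at least one of N random, independent contacts is a case)."""
    prevalence = cases / population
    return 1.0 - (1.0 - prevalence) ** contacts


# Hypothetical: 100 cases, each making 0.5 km2 risky, in a 10,000 km2 region.
area_risk = fraction_of_area_at_risk(100, 0.5, 10_000)

# Hypothetical: the same 100 cases among 1,000,000 people, and you meet 50 of them.
contact_risk = chance_of_meeting_a_case(100, 1_000_000, 50)

print(f"area at risk:            {area_risk:.4f}")
print(f"chance of meeting a case: {contact_risk:.4f}")
```

The first number depends only on cases per area, the second only on cases per person and N, which is the distinction between the two positions.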
Is that the case anywhere? Even city centre / suburbs will have different values, much less a whole province where A may contain an empty field or a group of residential buildings with 50 floors and shared lifts.
I could see how it would be useful for a map of a city, or maybe even at a scale of some regions... but not for comparing totals between regions.
Of course you are always going to interact with N people. Your family consists of n₀ people, there are n₁ people in your office, there are n₂ people around you on the train, etc. Area calculations would matter if each infected person somehow densely contaminated an entire circle of a large diameter.
This is interesting but sadly their service is not accessible in Iran, one of the hardest hit countries by Covid-19, not due to censorship by the Iranian government, but due to server-side blocking of IP addresses originating from Iran. The reason: US sanctions!
Email me if you need a rehost in EU - We can do a stream of the content from JH and feed it through, it might be ~2 seconds behind but if you guys are stuck for info it might be better than nothing.
Also worth considering whether you really need to aggregate all cases in the same province. If you can get higher-resolution data, use it. (E.g. for each prefecture in Hubei province: https://news.sina.cn/project/fy2020/yq_province.shtml?provin... Their visualization isn't great, but someone else could use their data to do a better job.)
OP wants to give a lesson about mapping yet includes Taiwan in a map of China. Seriously? He should learn about geography and country borders first before writing any blog post.
How about something constructive? Please write an uncontroversial blog post explaining "country borders". It should be easy since there's never once been disagreement on the subject in all of human history.
How about opening Wikipedia and reading about the subject? I don't owe you any explanation. There is no ambiguity or "disagreement" (except on the Chinese side) on the matter: the fact is Taiwan has never ever been under the sovereignty of the People's Republic of China, period. The preceding entity that had both control over mainland China and Taiwan was the Republic of China, which is now ruling Taiwan.
If, for you, the fact that a country issues internationally valid passports, prints its own money, and has a government and an army is not enough to make it a real state, then you're living in a province of China too.
That is a BIG problem with this article - it looks like Chinese efforts to infiltrate all maps with their idea of Taiwan are pervasive and this author has unwittingly used their version of map data.
A map seems a terrible base layer for any information that isn’t trying to show proximity or proportional landmass. Seems ridiculous to mess around with talking about a projection when it’s still showing provinces related to their shape and size, which is worthless information here. Why not just use a population cartogram as the base?
If anyone from Esri ever reads these comments, please for the love of maps stop using scroll wheel and pinch to move maps north-south. Nobody, literally nobody, has ever wanted that, literally never.
I was thinking today that mapping the route of the virus would be a good use of the map data that's recorded from our phones' GPS.
Hypothetically: If all infected submitted their map data for the last few days (anonymously - no need to identify people) and all of that data was plotted over maps, you could identify the routes and direction of infections.
I don't know if it would be anything more than an interesting visualisation of the data already collected, but the comment mentioning Edward Tufte really got me thinking how to visualise the data we have properly.
We haven't seen something spreading like this in my lifetime anyway and at the same time, we've never had so much data on ourselves in my lifetime either, might be a good time to put it to good use for once.
Incidentally, I made a demo app with proportionally sized circles like they suggest, and it lets you step day-by-day to see the progression. https://coronaprogress.com/
Could you allow the viewer to select the color of the case indicator? Or maybe just add a contrasting outline on the circles? I'm a not-at-all-uncommon type of colorblind, and I find it very, very difficult to make out red dots on a green satellite image.
The specific type of coronavirus is clear from context. Language would be very tiresome if we had to give a full taxonomy for every term we use.
For instance, I say "I saw a fox outside my house yesterday". I don't need to specify the exact species of fox, and anybody who knows that I live in Europe will know that I mean Vulpes vulpes.
I'm unclear as to whether we should be seriously concerned about Coronavirus in the US at this point. Are there preparations I should be making or precautions I should be taking? People have been WhatsApping me articles about face mask shortages, but I don't know if this is just scaremongering.
(Last: if you're sick, if the outbreak is local, if you don't absolutely need to be somewhere.)
Ready: Pandemic preparations: Community mitigation guidelines to prevent pandemic influenza https://www.ready.gov/pandemic
Before a Pandemic
- Store a two week supply of water and food.
- Periodically check your regular prescription drugs to ensure a continuous supply in your home.
- Have any nonprescription drugs and other health supplies on hand, including pain relievers, stomach remedies, cough and cold medicines, anti-diarrhoeal medication, fluids with electrolytes, and vitamins.
- Get copies and maintain electronic versions of health records from doctors, hospitals, pharmacies and other sources and store them, for personal reference.
- Talk with family members, loved ones, neighbours, co-workers, and other frequent contacts, about how they would be cared for if they got sick, or what will be needed to care for them in your home.
During a Pandemic
Limit the Spread of Germs and Prevent Infection:
- Avoid close contact with people who are sick.
- When you are sick, keep your distance from others to protect them from getting sick too.
- Cover your mouth and nose with a tissue when coughing or sneezing. It may prevent those around you from getting sick.
- Wash your hands frequently to help protect you from germs.
- Avoid touching your eyes, nose or mouth.
- Practice other good health habits. Get plenty of sleep, be physically active, manage your stress, drink plenty of fluids, and eat nutritious food.
(Most of the preparatory advice will be familiar to Bay Area residents as typical earthquake preparedness. Elsewhere it's standard preparation for major winter storms or hurricanes. Be prepared to sit tight for a few weeks.)
As someone living here in the Bay Area, I have little to no interaction with the current ongoing Coronavirus outbreak. Why are Asian countries taking such dramatic measures right now?
To be willing to take on such an economic drain in order to do so makes it seem like they're treating the virus like a potential pandemic. Are the death rates for the current coronavirus outbreak substantially higher than the regular flu? What else am I missing here?
Yes, the death rates are somewhere around 20x higher, and it is much more transmissible. The nytimes has a great graph showing the range of possible values for death rate and transmissibility of the virus as compared to other historical viruses. https://www.nytimes.com/2020/02/18/learning/whats-going-on-i...
My only quibble is that the shape of the uncertainty shouldn't be a box, it should be oriented around a downward sloping line.
The true death rate is unknown; the currently measured case fatality rate is 20X higher than the flu's. This should be taken with a big grain of salt, because early case fatality rates are generally highly overestimated when testing is mostly limited to people with severe complications. The seasonal flu is much better researched.
The passengers on the Diamond Princess cruise ship [1] could give us a better estimate of the true death rate, because they all got tested regardless of symptoms. So far 4 out of 700 people have died (0.5%). The death rate for people 65+ with the seasonal flu is 0.9% [2]. If the coronavirus is as deadly as the flu, we would expect 7 people to die; if it is 20X as deadly as the flu, we would expect 140 people to die.
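To make the arithmetic in the comment above explicit, here is a minimal sketch in Python. The figures are the ones quoted (4 deaths, 700 people, a 0.9% flu death rate for ages 65+); the variable names are mine, and applying the 65+ rate to every passenger is the comment's implicit assumption rather than an established fact. Unrounded, the expected counts come out nearer 6 and 126 than 7 and 140, which leaves the conclusion unchanged.

```python
# Figures quoted in the comment; names are illustrative only.
tested_passengers = 700          # people counted in the comment
observed_deaths = 4              # deaths reported so far
flu_death_rate_65_plus = 0.009   # 0.9% seasonal-flu death rate, ages 65+

# Observed share of deaths among the tested group.
observed_rate = observed_deaths / tested_passengers

# Expected deaths if the virus were exactly as deadly as the flu
# (assumes every passenger is in the 65+ bracket, as the comment does).
expected_if_like_flu = tested_passengers * flu_death_rate_65_plus

# Expected deaths if it were 20 times as deadly as the flu.
expected_if_20x_flu = 20 * expected_if_like_flu

print(f"observed rate: {observed_rate:.2%}")
print(f"expected deaths at flu rate: {expected_if_like_flu:.1f}")
print(f"expected deaths at 20x flu rate: {expected_if_20x_flu:.0f}")
```

The observed count sits close to the "as deadly as the flu" figure and far below the 20X figure, which is the point the comment is making.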
This isn't the best article, it appears to be a student activity guide. I was just intending to link to the graphic that has appeared throughout the nytimes' coverage.
I don't think that is an accurate characterization at all, what they were saying about color choice was more nuanced and allowed that things might get much worse. Why start with the most evocative "danger" mapping when there are a lot of unknowns?
Hospitalization rates and mortality rates are up to 20x those of the flu. This upper bound will probably decrease because the denominator is undercounted.
The case mortality rate seems to be much higher than the flu. It's also extremely infectious because you can infect people while asymptomatic, having no symptoms. You can be asymptomatic up to 24 days. They're also saying that you can still infect others after recovery.
Yes, the death rates are substantially higher than the flu. Flu is about .1% while coronavirus is 2% and remember that we have flu vaccines and somewhat effective drugs to combat influenza and nothing (yet proven) comparable for this new disease. High rates of hospitalization and injury.
In the case of China, chaos resulting from the pandemic has the potential to undo Xi Jinping's reign, so his cadre has decided to take the hit on the economy and go full war mode to combat it.
That being said, it does pose a critical danger and greater mortality rate if healthcare infrastructure is overwhelmed. The drastic lockdowns do help control the spread to a degree that mitigates this possibility and allows for the ramping up of response capacity. The US should be responding with comparable force (and probably will be forced to in the coming weeks), but there's a lot going against taking action at the moment, from poor national coordination, Trump administration cuts and malfeasance, bureaucratic impediments around mass testing, and outsourced supply chains.
No vaccine. No way to know that someone is infected and spreading the disease. More deadly than the flu. More devastating even to people who survive, something like 10% of those infected require weeks of intensive hospitalization.
Because it’s spreading silently and is so impactful to its victims it’s a really big deal.
>We’re mapping a human health tragedy that may get way worse before it subsides. Do we really want the map to be screaming bright red? Red [...] can connotates [sic] danger, and death, which is still statistically extremely rare for coronavirus.
This really seems like a case of "it's not a bug, it's a feature". It may be rare (so far, anyway), but few would argue "danger and death" is an inaccurate characterization.
What caught my eye is that in Hubei province, which we all imagine as a zombie apocalypse now, only ~110 out of every 100,000 got infected, and at this point the new infection rate is going down.
Which is quite amazing, given that the virus was spreading there for weeks (at least) without anyone being aware of anything before all the mess was uncovered and announced.
I don't think they have done any random sampling in the testing. If you only have mild symptoms (could be it or something else) it's best to stay home rather than risk catching it at the hospital.
If you enjoyed this post, then I'd really recommend "The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures" by Dona Wong. http://www.donawong.com/
Thanks for posting this, and not just because it's immediately relevant. ESRI goes over some really good guidelines for visualizations as well as interpreting them that can be applied to anything you see in e.g. the New York Times.
A proportional symbol map doesn't seem very useful. It's meant to scale by area of the symbol, but at least personally I find myself comparing the diameter of the symbol.
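A rough sketch of why area scaling and diameter reading disagree, in Python. The function name `symbol_radius` and the 40-pixel maximum are assumptions for illustration, not anything taken from the article or from a particular GIS tool.

```python
import math

def symbol_radius(value, max_value, max_radius=40.0):
    """Radius (in pixels) chosen so the circle's AREA is proportional to value.

    max_radius is a hypothetical radius for the largest value on the map.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area grows with radius squared, so the radius grows with sqrt(value).
    return max_radius * math.sqrt(value / max_value)

r_small = symbol_radius(100, 400)   # a region with 100 cases
r_large = symbol_radius(400, 400)   # a region with 400 cases

area_ratio = (math.pi * r_large ** 2) / (math.pi * r_small ** 2)
diameter_ratio = (2 * r_large) / (2 * r_small)

print(f"area ratio: {area_ratio:.1f}, diameter ratio: {diameter_ratio:.1f}")
```

With four times the cases, the larger circle has four times the area but only twice the diameter, so a reader who compares diameters will underestimate the difference.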
It's funny that, for a software company, ESRI basically owns the GIS market. I really liked how this article goes over processing a projection and communicating reality w/o panic and sky-is-falling insanity.
I'm not convinced the 3D map isn't the most illustrative and compelling graph of them all. It says way more than the sprinkle of hard-to-parse dots does.
In what data visualisation situation would you use anything other than an equal-area map? What advantage does the web projection offer over equal-area?
Conformal map projections are important for navigation. Equal area projections can be conformal over small areas, but not over large areas.
Web Mercator Auxiliary Sphere is a good default for a software application to use. It is global, meaning that regardless of what data you dump onto it, it will show up on the map. It is conformal, which means if you zoom in, shapes will be preserved. If you zoom in on a town square that is actually square, it will be square on the map, too. North is up in all locations.
That being said, it's only a good default because if your users aren't knowledgeable enough to select the right projection, web Mercator aux sphere is the least bad, lowest common denominator option. When you as a user choose what projection to use to visualize your data, it's usually wrong to select web Mercator aux sphere. But if you were never going to make the effort to select the right projection anyway, it's not a completely terrible default.
Note that web Mercator is different from web Mercator aux sphere. Web Mercator is not conformal, which makes it pretty useless. Many people use the terms web Mercator and web Mercator aux sphere interchangeably, which they shouldn't.
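For anyone wondering what the trade-off looks like in numbers, here is a small Python sketch using the textbook spherical Mercator formulas. It is not Esri's implementation and does not try to settle the web Mercator vs. aux sphere naming question above; the function names are made up for illustration, and the radius is the 6378137 m value commonly used for web maps.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius commonly used for web maps

def spherical_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to spherical Mercator metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def area_inflation(lat_deg):
    """How many times larger an area is drawn at this latitude than at the equator.

    Mercator stretches both axes by 1/cos(latitude), which keeps shapes
    intact locally but inflates areas by 1/cos(latitude)^2.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

for lat in (0, 30, 60):
    print(f"latitude {lat}: areas drawn {area_inflation(lat):.2f}x too large")
```

Equal stretching in both directions is what keeps the town square square; the growing area factor is why this kind of projection is a poor base for comparing quantities by area.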
As a point of reference for how deadly the coronavirus is, you are more likely to die by murder in New York City than you are to die by coronavirus in Hubei, assuming a death rate of 2.5 percent. The murder rate in NYC is 5 per 100K whereas the infection rate in Hubei is 111 per 100K.
Edit: This isn't really a fair assessment. See the comments below.
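The comparison rests on one multiplication, sketched here in Python with the numbers quoted above (the variable names are mine). As the edit concedes, it is not a fair assessment; the sketch only shows where the claim comes from.

```python
# Numbers as quoted in the comment; names are illustrative only.
infections_per_100k_hubei = 111   # reported infections per 100K in Hubei
assumed_death_rate = 0.025        # 2.5% of infections assumed fatal
murders_per_100k_nyc = 5          # murder rate per 100K quoted for NYC

# Deaths per 100K implied by the infection rate and the assumed death rate.
deaths_per_100k_hubei = infections_per_100k_hubei * assumed_death_rate

print(f"Hubei coronavirus deaths per 100K: {deaths_per_100k_hubei:.2f}")
print(f"NYC murders per 100K: {murders_per_100k_nyc}")
```

The result only describes infections reported so far; it says nothing about how far the infection rate could still rise.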
The infection rate is still unknown, so spreading misinformation like this is not helpful.
It took just 1 person to infect 600 people on a 2,700 passenger cruise ship, with many of those infections happening even after quarantine and medical staff were introduced.
That means that 1.9 million NYC people can be infected and 38,000 people can be killed from just one person.
> It can easily connotate danger, and death, which is still statistically extremely rare for coronavirus.
No, no, no. 2.2% (conservatively) is not extremely rare for a virus that we have been helpless to stop from spreading to every continent except Antarctica.
Really interesting article.
Ironically, though, an article about mapping responsibly is using a map of China with Taiwan on it.. That's a pretty huge oversight.
Edit: Downvote all you like guys - but Taiwan is an independent nation. :)
Lots of people in the west unfortunately don't know or care about the situation and just grab whatever map of "China" they can find on Google Images, unaware it's got another country in it.
What's odd though is the author even plotted data for Taiwan, so they must have seen what they were doing..
The article raises several good points, but inexplicably includes Taiwan in a map of coronavirus in China. Might as well include North/South Korea as well.
This is obviously a tangent to the author's main point, but it is interesting because it makes me curious if the author purposely included Taiwan so that his blog post would not be banned from dissemination in China based on Chinese government rules on "One China": https://www.scmp.com/economy/china-economy/article/3033331/d...
Please let's not go off topic into that one. A plausible interpretation is that someone made a mistake. Even if it wasn't a mistake, there's no new information here that could support a discussion, so we'd end up with a generic Taiwan/China flamewar. Such threads are bad because they're repetitive and predictable, and of course get nasty.
It's not off topic. Millions of people live in Taiwan, including yours truly.
Taiwan's exclusion from the WHO and other geo-political bullying notwithstanding, the situation regarding this outbreak is dramatically different here than it is in China.
The impacts of travel restrictions on people who have recently visited Taiwan are also dramatically different than those who have recently visited China.
Indeed, it is not off topic when the topic is "Mapping Coronavirus, Responsibly."
One practical implication of grouping Taiwan with China is that Italy banned travel from Taiwan along with China, even though the situation in Taiwan is basically fine.
It's also not plausible that the author, a professional cartographer, just made a mistake about Taiwan.
> One practical implication of grouping Taiwan with China is that Italy banned travel from Taiwan along with China, even though the situation in Taiwan is basically fine.
It is interesting that Russia acted differently from Italy: it banned travel from China but not from Taiwan.
Listen, there are comments on the post itself raising the idea that "responsible" includes proper borders. Let's take the party there and let poor HN off the hook for once.
And I agree the implications are far-reaching, as are the chilling effects, and the fact that the inclusion of Taiwan on this map is more relevant than the rest of the article in many people's eyes cannot be discounted.
It's clearly done on purpose: the third graph is explained as "Here’s a bar chart of the number of cases by Chinese Province." and it includes Taiwan in the list. So it's clearly not a mistake. This is of course unacceptable, and people (including myself) are within their rights to point it out and complain about it.
It seems unlikely that someone posting on the blog of ArcGIS, a geographic information system, made a "mistake" by being unaware of the geopolitics of Taiwan vs the People's Republic of China.
I don't think it's at all likely that this author is consciously making an explosive political claim about Taiwan in a post about epidemiological cartography. If you or anyone thinks otherwise, find some evidence and make a separate submission about it. You could begin by asking the author.
In the absence of additional information, all we're going to get is the same old predictable geopolitical talking points. That's what happens when there's nothing but a single bit of provocation to talk about. That sort of high-indignation, low-information discussion is off-topic on HN.
Within your goals to keep HN as clean as you have (incredibly) done, this makes sense.
I don't know how to reconcile your goals with another person's rejection of any kind of normalization of Chinese threatening of Taiwanese sovereignty. Perhaps the only option is for those unwilling to let articles get away with glossing over Taiwanese sovereignty unchallenged to get banned / downvoted to oblivion. I think that's acceptable, though sad, because I like it here.
Good data visualization is hard, and most mapped data that isn't geographical in nature is poorly done (cue xkcd cartoon).
Some of the points in this discussion are pretty good, but the thing I missed is a good commentary on the temporal nature of anything like virus spread.
Despite this ESRI-backed article on the subject, I think the popular ESRI-driven map dashboard for Coronavirus[1] has a major flaw that violates the crux of this article. Dot density maps _MUST_ be set to scale relative to your map scale, or else you get nightmare scenarios like this one[2]. This is doubly true if the dots are varying in size (which I also think is a fundamentally terrible representation, because people suck at mentally comparing areas). If I were to modify it, I would probably use a choropleth-like representation. Keep the dots equally sized and colour them different shades of red. That way nobody's brain will mislead them into thinking "this larger circle means a larger area is all infected."
[1] https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.h...
[2] https://imgur.com/NPhEzk7