Inequality and Mass Transit in the Bay Area (dangrover.github.io)
184 points by dangrover on May 7, 2013 | 133 comments



"The results show the Bay Area's economic inequality and its relationship with transit and urban form."

I'm not sure what they are supposed to show. It seems to show me that there's very little relationship between income and mass transit service.


Yeah, the Bay Area has too many rich outer areas to make any sense of it. The one about New York from about a month ago is a lot more clear - rich center, poor periphery. http://www.newyorker.com/sandbox/business/subway.html


In both NY and SF, if you keep going further out (beyond the base mass transit system) incomes start to rise again. There's basically the rich central city, the poorer surrounding "urban" neighborhoods, and then the rich suburbs. Chicago too.


Indianapolis too


I guess that was the point. Both places have inequality, but the Bay Area is wacky, and not so evenly covered with mass transit. And the mass transit that exists is fragmented between different agencies.


Agreed: I assumed the chart would show that low-income areas are less well served by transit stops, but that doesn't appear to be the case. It just shows that income varies wildly by stop?

I'm confused as to what I'm meant to glean from this data.


Data aren't supposed to show anything. That's only if you're trying to push some kind of agenda, publish your work, or get funding.


That doesn't really mean anything. Of course you can draw conclusions from data.


Sure, but you're not "supposed" to. Or maybe you are, which is the problem. "Inconclusive" or "negative" results are just as meaningful (i.e. no conclusion is still a conclusion), but we're biased towards "positive" results showing identifiable correlations.


That's because there is no value in gathering and publishing data for its own sake. Conclusive results are valuable because you can act on them to achieve something. And there's nothing wrong with that.

Ignoring inconclusive/negative data is only problematic if it contradicts the data you choose to act on.


What about genome sequencing? Or census taking? Or benchmark characterization? There is quite a bit of data publishing that is valuable to other researchers.

Talking about what didn't work is good because it stops other people from repeating failed experiments.

Finally, in this case, it does contradict the NY study.


i dont understand what you are trying to say.

data certainly do not have preconceived notions, but i dont think that is your point.

people are not supposed to draw conclusions from data? that sounds like exactly what people are supposed to do, if they are to survive.

is your point that humans are biased towards certain results? that does not follow from your earlier two points ("Data aren't supposed to..." and "Sure, but you're not 'supposed' to...").

why do you think humans are biased towards "positive" results?


You're missing Throwaway's point entirely. The context from higher in the thread:

> I'm not sure what they are supposed to show.

In other words, "in the context of what argument are you presenting this data?" Throwaway is saying that the person presenting the data needn't have an intended conclusion; it doesn't need to be part of an argument at all. It's OK, in fact preferred, for it just to be data from which you can draw conclusions, or decide there are no interesting conclusions to draw.


Except that they do need to have a conclusion. Otherwise they're wasting everyone's time showing off "this data is meaningless". The only time you want to show 'no pattern' is if someone asked for data for/against a pattern.


There is no such thing as meaningless data. Data are just observations. Meaning is constructed from the interpretation of data.

The primary questions of good science are what, where, when, and who. These are the questions you answer when you collect data. Once you've answered them you can address secondary questions of why and how. Asking why and how without giving priority to what, where, when, and who is putting the cart before the horse.

When you are unable to answer why and how for a given set of data, it is not meaningless. Rather, the lack of correlation or explanations just says that perhaps we need to look into this more deeply. "I've looked at the data and I don't know" is a profound statement, and it can be inspiring.

Science also has to be falsifiable, and effectively that's what these graphs do, at least as far as extrapolating from the NY study goes.

I agree it would have been more helpful if the author had presented conclusions about what the data mean or don't mean, but they aren't a priori meaningless simply because there isn't a visible correlation. No correlation, which is the rather obvious conclusion, is just as meaningful. I hope this is more clear.


interesting. the authors made the statement "The results show the Bay Area's economic inequality and its relationship with transit and urban form."

you are saying that a fair conclusion is "there is no clear relationship between transit and income in the bay area", which could be the conclusions the authors were drawing (i think it's not clear exactly what conclusion if any they drew).

i actually was not thinking about it correctly in retrospect. i saw the nyc graph and i thought "manhattan = rich, everywhere else = poor". i saw these graphs and i thought "don't see anything". i assumed the conclusion from the nyc graphs was superior, because there was a clear positive relationship there. you are saying that the "don't see anything" conclusion is just as valid, even though it is not a positive relationship.

i (now) agree with you.


Yeah! Well, another point comes up here too. Do we really need to look at subway lines to find out if rich people live in Manhattan? It seems like you only need income tax returns to answer that one. It's more like, are the stops on subway lines segregated into rich and poor clusters the same way that physical neighborhoods of rich and poor people are segregated on a map? And then speculation as to why or why not is interesting. For SF it's hard to answer, but for New York, well, to build a train track that leaves Manhattan is expensive unless you're going to the Bronx, so you're going to put all the Manhattan stops together.


Data that answers a question nobody wants answered is effectively meaningless. If you are showing a lack of correlation in a situation where a correlation might be expected then good job. If you are showing a lack of correlation between giraffe migration and cactus branch count then you're wasting everyone's time by bringing it up.


There are multiple questions here:

What do the transit vs. income graphs for SF look like?

How do the SF graphs compare to the NY ones?

Is there a clustering of rich and poor stops in SF like there is in NY?

And then finally, what are the possible explanations?

Sure, they didn't answer the last question, and you have to inspect the data to answer the second and third ones, but it's okay to provide data for other people to look at.

Surely if the first question was worth answering for NY, it's worth answering for SF. You don't answer questions simply because you expect to find something, you answer them because you're curious.

Do you know the story about Richard Feynman and the wobbling plate in the cafeteria? It's another question that "nobody wanted to answer".

https://www.youtube.com/watch?v=x98SEQUo48c


The wobbling plate is an explanation, not a yes/no answer about correlation. If you can explain something then by definition there is causality in there somewhere.

Look, I'm wording myself badly today, let me try again. The question of 'is there a correlation' is worth answering but ONLY because a correlation is plausible. If you are graphing data to apply to a PLAUSIBLE hypothesis, then your work is reasonable. But if you are instead graphing random junk without any reason, you are wasting everyone's time. Data needs to cause some kind of mental connection in the viewer. That is a VERY low bar to meet. This article meets it. But not all theoretical articles do.

No conclusion is not necessarily a conclusion. Data doesn't have meaning but showing people data should have meaning.


Ok ok, the thing about the wobbling plate is that Feynman says the freedom to investigate a question that was only interesting to him was what led him to get the Nobel prize. Here's an excerpt from his book that's faster to process than the video I linked to:

http://www.physics.ohio-state.edu/~kilcup/262/feynman.html

But otherwise, I think we basically agree. Mostly I thought you were talking about the SF vs. NY thing and this article, not about inane investigations into arbitrary correlations (e.g., Is there a relationship between the number of steps someone takes per day and the number of spoons in their apartment? - well, actually, there probably is, especially if you start taking away knives and forks too). The most important thing I guess is to have a question that the researcher is interested in answering.


I believe this is the reason. The accompanying article highlights socio-economic issues, not mass transportation problems.


Could we please get username.github.io subdomain support on HN? Unlike the old github.com, these are all user content so it makes sense to distinguish them just like for wordpress.com.


I added this to the Feature Requests thread, but I have no clue if pg actually looks at that.


Ok.


Income and wealth are not the same thing. Wealth is a measure of total assets, while income is annual wages and investment income. A lot of high income people carry huge student loan and housing debt and are not wealthy.


There are few student loan totals large enough to make what is otherwise considered "high income" not wealthy.

The difference in lifetime earning between two people, both of whom have a high five figure or low six figure income and good growth potential, one of whom has $150,000 in student loan debt, is not very large.

Housing debt isn't relevant to wealth unless the house in question is underwater.


Don't forget about lost earnings and interest. Med school followed by a 250k job vs. 120k without med school is not as clear a win as you might think.


So I did a back of the envelope.

If you start at a $120,000/yr job at 22 and retire at 65, with increases of 2% a year, you will earn ~$8.3 million nominal over the course of your career.

If you start at a $250,000/yr job at age 30 and retire at 65, with increases of 2% a year, you will earn ~$13 million nominal over the course of your career. That doesn't include the (modest) pay during the second half of medical training.

The estimated cost of attendance for the most expensive medical schools top out at around $60,000 a year for four years. Assuming you meet the requirements, the federal government will lend you $40,000 of that at 6.8% and the rest at 8.5%. If nothing is paid off until after one's residency, that means our doctor would be starting his career with $330,500 in debt and a $250,000 salary.

Meanwhile our petroleum engineer(?) would be earning $140,600 at that point. If the doctor put away the difference every year until his debt was paid off, it would take less than four years.
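
A quick sketch of the salary arithmetic above, for anyone who wants to check it. The assumption that every year from the starting age through 65 is counted (inclusive) is mine; it is the reading that lands on roughly $8.3 million and $13 million. The function names are made up for illustration, and the $330,500 debt figure is taken as given rather than re-derived. The payoff estimate ignores interest accruing during repayment, so it is a lower bound.

```python
def career_earnings(start_salary, start_age, retire_age=65, raise_rate=0.02):
    """Nominal lifetime earnings with a fixed annual raise.

    Assumption: every year from start_age through retire_age is a paid year.
    """
    years = retire_age - start_age + 1
    return sum(start_salary * (1 + raise_rate) ** k for k in range(years))


def salary_at(start_salary, start_age, age, raise_rate=0.02):
    """Salary in the year the person turns `age`."""
    return start_salary * (1 + raise_rate) ** (age - start_age)


engineer_total = career_earnings(120_000, 22)   # roughly $8.3 million
doctor_total = career_earnings(250_000, 30)     # roughly $13 million
engineer_at_30 = salary_at(120_000, 22, 30)     # roughly $140,600

# Years to clear the quoted $330,500 if the doctor banks the salary gap,
# ignoring interest that accrues while repaying (so a lower bound).
years_to_repay_no_interest = 330_500 / (250_000 - engineer_at_30)

print(f"engineer: ${engineer_total:,.0f}, doctor: ${doctor_total:,.0f}")
print(f"engineer salary at 30: ${engineer_at_30:,.0f}")
print(f"payoff, no interest: {years_to_repay_no_interest:.1f} years")
```

With interest included the payoff takes somewhat longer than this bound, which is consistent with the "less than four years" estimate.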


So at 34 your doctor has zero debt. Let's assume the 120k job is saving 10% at 8% ROI; over those 12 years that's worth ~250k at 34 and adds an effective 20k a year in interest. On top of that they're paying significantly more in taxes, and that lifetime earnings gap is far less meaningful.
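
For what it's worth, here is one way to reproduce the ~250k figure. The comment doesn't say how deposits are timed, so this sketch assumes 10% of a salary growing 2% a year, deposited at the end of each year, with the balance earning 8% annually. Those timing choices and the function name are assumptions, not something stated above.

```python
def savings_balance(start_salary=120_000, save_rate=0.10, roi=0.08,
                    raise_rate=0.02, years=12):
    """Balance after `years` of end-of-year deposits (assumed timing)."""
    balance = 0.0
    salary = start_salary
    for _ in range(years):
        balance *= 1 + roi              # existing balance earns a year of return
        balance += salary * save_rate   # then this year's savings are deposited
        salary *= 1 + raise_rate        # next year's raise
    return balance


balance_at_34 = savings_balance()
annual_return = balance_at_34 * 0.08

print(f"balance at 34: ${balance_at_34:,.0f}")
print(f"return per year at 8%: ${annual_return:,.0f}")
```

Under these assumptions the balance comes out close to 250k and the yearly return close to 20k. Depositing at the start of each year instead would push the balance higher.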


Fair enough. At that age, the doctor will be making $270k debt free and the engineer $152k + $20k. In any event, the non-financial trade-offs swamp the financial ones, I would think.

My basic point in the great-great grandparent post is that bellyaching of the form "I'm not really rich because of X" usually doesn't stand up to scrutiny.


Housing debt won't drag your total assets down unless you happened to buy a house that's now underwater. The usual situation is for the difference of house value and mortgage amount to be significantly positive.

Income and wealth aren't the same thing, but they are highly correlated.


>Income and wealth aren't the same thing, but they are highly correlated.

No they are not.

According to the US Treasury:

"These low realized rates of return call into serious question the use of realized income from capital as part of any measure of well-being or ability-to-pay. For owners of capital, economic income may have little relationship to realized income, and rates of realization may vary according to the assets they hold."

According to the Federal Reserve:

"...very wealthy people try quite hard to minimize their income."

[1] http://www.treasury.gov/resource-center/tax-policy/tax-analy...

[2] http://www.federalreserve.gov/econresdata/scf/files/wealthin...


They are actually quite correlated. Your links argue that there are cases where wealthy people don't have high income and vice-versa, but that doesn't disprove the quite strong general trend.

See table 5 (pg. 36) here, which shows both net worth by income percentile, and income by net worth percentile, to see the strong relationship between being high-net-worth and high-income: http://www.federalreserve.gov/pubs/feds/2009/200913/200913pa...

Example numbers: the top 1% of 2007 income-earners held 26% of U.S. wealth, and the top 1% of households by net worth in 2006 earned 16% of the year's income. Meanwhile, the bottom 50% of income-earners held only 14% of U.S. wealth, and the bottom 50% by net worth earned only 22% of U.S. income. The general trend holds in the in-between categories as well: the 95-99th percentile of incomes hold about twice as much wealth as the 90-95th percentile, etc.

So the relationship is not perfect, but taking the groups in aggregate, higher-income-earners control considerably more wealth than lower-income-earners. Some of the later figures in the document explicitly plot some ratios.


The correlation exists, but is not extremely strong, as the GP stated, which is what my point was. I never said there was no correlation.


How is it "not extremely strong"? If there were no correlation, you would expect the top 1% of income earners to hold 1% of net worth, because there's no expected relationship between being high in income and high in net worth. If there were a correlation but a weak one, you might expect them to own a few times the uncorrelated amount. Maybe they'd own 2% of U.S. assets (twice the otherwise expected amount), maybe even 5% (five times the expected amount). Either of those would be enough to establish a clear relationship, but a weak one.

But they actually own 26% of American assets, twenty-six times the amount you'd expect in the uncorrelated case! The top 1% of Americans by income own a full quarter of all the country's assets— stocks, bonds, real-estate, etc. That seems like a pretty strong relationship.
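
To make the "times the expected amount" comparison concrete, here is a small sketch using only the shares quoted earlier in the thread from the Fed paper. The ratio is just share held divided by share of population; a value of 1.0 is what you'd see if the two measures were unrelated. The function and dictionary names are mine.

```python
def concentration_ratio(share_held, population_fraction):
    """How many times its 'uncorrelated' share a group holds (1.0 = no relationship)."""
    return share_held / population_fraction


# (share held, fraction of population), as quoted upthread
figures = {
    "top 1% by income, share of wealth": (0.26, 0.01),
    "bottom 50% by income, share of wealth": (0.14, 0.50),
    "top 1% by net worth, share of income": (0.16, 0.01),
    "bottom 50% by net worth, share of income": (0.22, 0.50),
}

ratios = {label: concentration_ratio(*pair) for label, pair in figures.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x")
```

Note this measures how concentrated the aggregates are, not a correlation coefficient over individuals, which is roughly the objection raised in the replies.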


This just isn't good statistics. The fact that the vast majority of hysterectomies are performed on women doesn't tell us anything about the probability that a randomly selected woman has had a hysterectomy; the fact that the vast majority of wealth is held by the rich doesn't tell us anything about the probability that a randomly selected rich person has a lot of wealth.


Err, the fact that the vast majority of hysterectomies are performed on women does tell us that hysterectomies are strongly correlated with sex. That was the initial dispute: whether wealth and income are strongly correlated or not. The distribution is a separate argument, although I'll note that the PDF I linked has some data on the distributions as well, and it does not support the "weak link between them" argument. The proportion of very-wealthy people with low incomes, and very-high-income people with low wealth, is actually quite small.


I took a quick crack at the correlation implied by table 5 in that paper you linked. Assuming wealth was constant across the ranges specified, I came up with a correlation of 0.45. So meaningful but not high. I personally suspect that it's actually much more highly correlated and that fact would come out if I didn't have to assume wealth was constant for 0-50 and 50-90, but it would be conclusory for me to incorporate that into my numbers.


There's a correlation in the truest sense between income and wealth - your chances of creating wealth increase with the level of your income.

This doesn't mean that having a high income means you are wealthy, as you suggest, but that they are correlated.


GP suggested they were highly correlated, I disagreed. You are stating that they are correlated but not highly so.

So you agree with me then?


I think overall it is rather highly correlated, yes. In fact, I'd submit that the number one predictor of having a high net worth would be to also have (or have had) a high income.

The easiest formula to build wealth is to have a positive result in the income - expenses equation. If you accept that, income and expenses should be the two strongest correlations to large amounts of wealth overall.

Considering expenses need to be something north of zero in the best case and income has no upper bound, I'd give income the edge over the two in terms of overall strongest factor.


You're looking at one year of wealth creation here, and dancing around the obvious third variable: time. Old people are much more wealthy than young people[1] despite having comparable or lower incomes.

[1] http://www.pewsocialtrends.org/2011/11/07/the-rising-age-gap...


Sure, they are.

I'm not sure how that translates into an argument either for or against high income correlating to high wealth.

We're all using the same variables for time.


Just to add: Income to wealth correlation though is a matter of one being used to create another. The level of one's income has an impact on how easy it is to create wealth.

Aside from that though, the two should never be conflated. Many people are able to create wealth with small to moderate incomes, while higher income doesn't mean higher wealth if you also have high expenses.


I think it's also important to note that the causal relationship goes in both directions. Higher income increases the ability to create wealth, but also higher wealth increases the ability to have a higher income.

Certainly, they are very different things, and individual variance can be high, but they are definitely strongly correlated in the population at large.


> but also higher wealth increases the ability to have a higher income.

This I don't quite agree with. Overall it makes sense but it is not as direct a correlation.

Wealth gives you leverage, but only by the weakest of factors can it give you higher income directly. There would always be some sort of intermediate step. There are few ways you can transform wealth directly into income without significant work. (At that point it becomes less a factor of the wealth directly and more you earning an income.)

E.g., you are born wealthy and because of that your father knows a lot of people willing to give you an opportunity that a person not born wealthy wouldn't have. This only works if you aren't an idiot, or an alcoholic, etc. If you are, your chances of turning your own wealth into income, which you can then convert back into more wealth, are significantly hampered compared to someone without these issues but with less starting wealth. If you aren't putting work and effort into exploiting the leverage that wealth provides, you will not create income.


I don't see this as counter to income and wealth inequality. Certainly there are those with medium to high income whose debts put significant pressure on their lives, but having access to homes and higher education is a big part of the wealth and income inequality in the US, so even having access to those things already means you have a certain level of wealth or privilege that many poor people do not.


Person A starts with $0. Person B starts with $10,000 and spends it. Now they both have $0.

They are not equal.


That would depend on what person B spent the $10,000 on.


No, it wouldn't. People with high incomes like to pretend that worthwhile uses of that money -- student loans, school for their kids, a house in a safe neighborhood -- somehow shouldn't count toward their spending. But that's the whole point of money! You get to spend it on the things you think are worthwhile!

"I'm not wealthy! I spend my entire (large) income every month!" is a profoundly silly thing to say.


I don't think you understand the concept of wealth and income from an economics perspective.

When person B has $10,000 he is wealthier than person A.

If person B spends $10,000 he is losing wealth. Depending on how he spends his 10,000, this may result in an overall net gain in his wealth (i.e., an investment, like buying a house) or an overall net loss to his wealth (i.e. a purchase, like buying a car, since cars depreciate).

Your wealth is an overall assessment of what you have accumulated. Your wealth is the sum of your current position + (revenue - expenses). Note that your current position can be negative regardless of how many cars, boats and homes you are entitled to use at the moment. This is the situation of many, many Americans at this time.

Your income, on the other hand, is simply the revenue. A relatively tiny part of the equation.

So it is entirely true that you can have a high income and little to no wealth. While it might seem silly to you from the perspective of someone that is looking at the use of possessions, depreciating or overly-leveraged positions make the statement a perfectly normal thing to say.
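
A toy version of the equation above, wealth = current position + (revenue - expenses), applied year after year. The two households and all dollar figures are invented purely for illustration, and the sketch ignores investment returns, depreciation, and taxes.

```python
def wealth_after(position, revenue, expenses, years):
    """Apply position += revenue - expenses once per year."""
    for _ in range(years):
        position += revenue - expenses
    return position


# Hypothetical households, both starting from zero
high_earner = wealth_after(0, revenue=200_000, expenses=205_000, years=10)
modest_saver = wealth_after(0, revenue=40_000, expenses=36_000, years=10)

print(high_earner)    # -50000
print(modest_saver)   # 40000
```

The high earner ends up with negative net worth despite five times the income, which is the distinction being drawn here. It says nothing about who has the more comfortable life in the meantime.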

You might envy the house, but if that house is unable to be leveraged because more is owed to the bank than it is worth, it is actually a net loss when trying to determine wealth. A renter is actually better off than this homeowner.

Once you understand this, you'll realize that even with a low income, maximizing wealth creation opportunities is what pays off in the end.


Remember that we're talking about recurring income streams. You get to start over with a new $10k each month. It's silly to argue about how the $10k/month is grown or spent. It's still several thousand dollars a month more than someone below the poverty line! We don't need to dig much deeper than that -- the differences are pretty clear -- but for reasons that I think are obvious, a lot of people making that much want to focus on how poor they feel after they've spent the money.

"Once you understand this, you'll realize that even with a low income, maximizing wealth creation opportunities is what pays off in the end."

This is another one of those silly things that people with large incomes say a lot. People making minimum wage will never, ever, ever be able to save at a rate high enough to make any sort of difference. Parables about hard work and austerity are fun, but they miss the point.

And besides, you're just completely missing the point. Of course you'll be poor if you blow all your money. Who said otherwise? The point is that it's pretty silly to say at the beginning of each month, "I'm struggling just as much as everybody else" based on the fact that you're about to spend all that you've earned on luxury goods and services.


Your attitude is both ignorant and defeatist.

Let's keep in mind here that you were the one that introduced a "good/bad" evaluation into the discussion with your comparison of people against each other. I made no such comparison. I'm not choosing to critique a person's life choices, I'm simply explaining how wealth and income, although correlated, are not the same thing.

You are equating both "lots of stuff" and "lots of income" with wealth, and that is an improper view of the matter.

> It's silly to argue about how the $10k/month is grown or spent.

Except that this balance between the products of the equation will make all the difference in the world to the end result - wealth. What is irrelevant is the dollar amount. What matters is how the equation is balanced.

> It's still several thousand dollars a month more than someone below the poverty line!

While true, this is largely irrelevant to this overall picture. The equation remains. We can address a minimum expense expectation overall as a society, but that's a different topic, and we have already established some hard lines. The basic fact is that if your income is less than your expenses, you are becoming less wealthy. What the person next door is doing is not important. Wealth is not a zero sum game! The neighbour having a ferrari does not detract from your ability to earn an extra $10K. It doesn't matter to you at all, really. If he is complaining because he cannot afford that expense, he has the same issue as the person that is complaining they can't buy an xbox.

> People making minimum wage will never, ever, ever be able to save at a rate high enough to make any sort of difference.

I hate this bullshit. Why? Because I did it, and I came from far worse circumstances than 70-80% of the people that have this defeatist attitude.

That's that bit about investment I was mentioning. I won't lie, it isn't easy to live on minimum wage. It's even harder to work at that level where you are deemed "taxable" but can't really afford anything. That's why you invest. You look to change the numbers in the equation to create a situation where you can build wealth. You put yourself in a position where you earn more and spend less, and then you do that for a decent period of time.

> you're about to spend all that you've earned on luxury goods and services.

What's a luxury? Your shoes? Your beer? Your cigarettes? Your fast food lunch? Your Honda? Your cable tv? Your boat? Your house? Your Ferrari? Your private jet? Your island? This is entirely dependent on perspective, and thus, for the most part, irrelevant.

In reality, the poor overspend just as foolishly as the middle class and rich do. Only the nature and amount of the luxuries change. Not being able to afford a pack of smokes or cable TV is exactly the same as not being able to afford a boat or a second car. Either way, you can't afford it, so if you want to enjoy these things, you need to get yourself into a position where you can afford it.

It's meaningless and unproductive to compare a person overextending themselves on their McMansion to a person overextending themselves on McNuggets. The solution is the same - adjust the equation. EVERYONE can adjust the equation. EVERYONE can live cheaper than they do right now. EVERYONE can earn more than they do right now. The only difference between us all is that some of us will sacrifice to achieve these goals and some of us won't. Regrettably, one thing the poor do share is that most of them fall into the latter.

And that's where I remind you that I'm not a total douche and won't be following Ron Paul off the ideological cliff. Social programs and safety nets are tremendously important. It is essential that as many people as possible are allowed to find themselves in a position where effort and sacrifice are rewarded. It's essential that as many people as possible understand what I just said to be a viable path. Personally, if you are American, I feel there is work to be done in your nation in this respect. For me as a Canadian, I think we've more than covered off this level of support. Anything further is up to the people in the mirrors.


I appreciate your lengthy response, but this is all hand-waving. I don't know why you're trying so hard to prove that there's no difference between someone with a high income and someone with a low one. But here we are.


I wish more people understood this, especially in regard to tax policy.


Exactly this.

Likewise, many of us would be unable to pick out many/most of the Millionaires Next Door in a pinch.


Nice visualization, I particularly like the link between the income graph and the map on the upper right. Built using D3, TopoJSON, Bootstrap, Angular, and jQuery. (That's a lot of frameworks!)


Slightly more interesting to me would be the median income of a person getting off at a particular stop, rather than the folk who live there. Plenty of people work in the poorer parts of the city and live in relatively expensive neighborhoods, and plenty of people also do the inverse.


Wish I could find that data. Some stops are weird because they're all hotels (Powell) or all businesses (Montgomery).


It's cute that someone made this in response to the New Yorker piece. San Francisco always wants to feel like it's one of the big guys like NYC.

If anything, this transit/income data shows how little of a correlation there is between the two. This doesn't surprise me: most people in SF have cars, and Muni/BART are embarrassingly awful compared to the big the cities in the US.


compared to the big the cities in the US

cities ==> city

The only big city in the USA with decent transit is NYC. San Francisco is comparable to Boston, DC, and Chicago and far ahead of LA, Dallas, and Miami.

By world standards, NYC is barely average and the rest of the country has no transit system at all to speak of.


London, Paris, Seoul, and Tokyo all shut down most of their mass transit system overnight. NYC runs it all night long (albeit with fewer trains and some diversions). By the important metric of availability that puts it among the top.


Indeed, 24-hour rail service, even though it runs at infrequent and irregular intervals, is a big point bringing NYC up to barely average in its peer group. Other measures like coverage (suburbs count), travel times, intermodal operations (awful airport connections), connectivity (think Jersey), and jitney service and taxi availability drag NYC's score down.

I can count NYC as barely average only because the peer group includes not only London, Paris, Seoul, Tokyo, and Osaka but also Rio de Janeiro, Moscow, Istanbul, Mexico City, and Buenos Aires. If I'd just used your list, NYC would be dead last in almost every category.


If you're going to use taxi service to judge how good public transit is, your methodology is probably flawed.


Chicago ought not to be lumped in with Boston and DC. DC and Boston are both quite nice (comparatively).

The CTA is not. It's barely tolerable (and by tolerable, I mean that people are even capable of using it to get around and/or commute at all. So, the system might technically be a functioning transit apparatus, but barely.) I've used it daily for a sizable portion of my life, and it fails with remarkable consistency.


by which metric is nyc barely average?


I keep hearing that Muni and BART are awful, but how so? I've been on public transit in NYC, Boston, Berlin, and other major cities and I don't notice any significant disparities.


...what? Since when is the Bay Area not a "big" metro area?


Don't let the guys from Boston troll you.


One thing the graph doesn't represent is how often a particular train stop is serviced.

As an example, since (August?) of 2005, Atherton has had only weekend and special-event Caltrain service. You could argue that shouldn't be surprising given the graph (median income of ~$193K), but it's still an important piece of missing information.

It would have been nice to see a graph based on the total number of stops scheduled for a station overlaid with the median household income as it is currently graphed for Caltrain.


I've lived here for a few years and the median income in Redwood City still surprised me. Wow.

Click the Caltrain 'Local' route and see for yourself. I had to double-check it since it seemed so implausible to me, but it seems to be roughly accurate.


I am not sure where their data is pulled from, but the numbers are not correct, at least for Redwood City. http://www.city-data.com/city/Redwood-City-California.html Redwood City median income in 2009 was $67,611.


Data is for the census tracts where the stops are located, which admittedly makes more sense for subway/streetcar/bus stops than commuter rail stops (where people drive from other tracts and park at the stations). The BART is particularly troublesome because it takes on characteristics of both types of transit.

A data science friend of mine said we should do a "watershed" model where we define an area where people flow into a stop, but I'm not sure how best to do that! Maybe someone smart can fork the project and improve on our methods.


Probably you could just bunch census tracts within a certain distance of any stop, count them as being part of their nearest stop, and then average them (weighting correctly for population differences). If you wanted to do it better, you could have the weight for each census tract fall off with distance from the stop.
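
A minimal sketch of that idea in Python, not the project's actual method. The tuple layout, `max_km`, `falloff_km`, and the exponential falloff are all assumptions made for illustration, and a weighted average of tract medians is only a proxy for a true median over the pooled households.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stop_income_estimates(stops, tracts, max_km=1.5, falloff_km=None):
    """Population-weighted average of tract median incomes per stop.

    stops:  {name: (lat, lon)}
    tracts: iterable of (lat, lon, population, median_income) using tract centroids
    Each tract is assigned to its nearest stop if within max_km.
    If falloff_km is given, the weight also decays as exp(-distance / falloff_km).
    """
    totals = {name: [0.0, 0.0] for name in stops}  # [weighted income sum, weight sum]
    for lat, lon, population, income in tracts:
        name, dist = min(
            ((n, haversine_km(lat, lon, slat, slon)) for n, (slat, slon) in stops.items()),
            key=lambda pair: pair[1],
        )
        if dist > max_km:
            continue  # too far from every stop to count
        weight = float(population)
        if falloff_km is not None:
            weight *= math.exp(-dist / falloff_km)
        totals[name][0] += weight * income
        totals[name][1] += weight
    return {n: s / w for n, (s, w) in totals.items() if w > 0}


# Made-up coordinates and incomes, purely to exercise the function.
stops = {"A": (0.0, 0.0), "B": (0.0, 1.0)}
tracts = [(0.0, 0.0, 1000, 30_000), (0.0, 0.001, 3000, 70_000)]
flat = stop_income_estimates(stops, tracts)
decayed = stop_income_estimates(stops, tracts, falloff_km=0.1)
print(flat, decayed)
```

With the distance falloff switched on, the tract sitting right on top of the stop counts for more, so the estimate for that stop moves toward its income.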


I also looked at the Redwood City data point and it's pretty clear that they are using incorrect data. Since I have been to Redwood City and am going to be living there permanently starting a few months from now, I was a little confused. Using this tool on ArcGIS [1] I was not able to find a single corridor that has such a low median household income anywhere near Redwood City. Note that they don't seem to use the most granular data level, but zoomed out I get the number as $45,568 for tract #060816102.02 as opposed to their $30,800. $45k is still low compared to the surrounding areas, but it's far away from $30k.

[1] http://www.arcgis.com/home/webmap/viewer.html?services=da76d...


Hm, looking into that. I noticed that the lat/longs in the GTFS feeds can occasionally be off/inconsistent.


If you follow many blogs of people in the data visualization community, you'll see right away how often they talk about an iterative process in drawing out a story.

There's a big exploratory component, where they investigate possible approaches and try and find something interesting they can show.

On the one hand, you want some significant features and relationships that exist in the data to be apparent to any intelligent reader who spends a little time studying your visualization.

On the other hand, you don't want to distort the data, or impose an interpretation on it that isn't warranted.

For a well-done visualization, there is definitely a lot more work going into the final product than simply plotting some dimensions against some other dimensions.

As an example, the New York Times pours a lot of money (and therefore talent and person-hours) into their visualization work. Some of the behind-the-scenes of that operation is blogged about here: http://chartsnthings.tumblr.com/


I visited San Fran last week and with each visit I'm shocked by the plight of the homeless there.

I've lived in (NYC, Philly & more) and visited many cities across the US and never witnessed this on such a scale.

It made me wonder what the government is doing there to help with what I see as an epidemic?


Giving people money to be homeless probably doesn't help.

"The city of San Francisco, California, due to its mild climate and its social programs that have provided cash payments for homeless individuals, is often considered the homelessness capital of the United States"

https://en.wikipedia.org/wiki/Homelessness_in_the_United_Sta...


That wikipedia page is a train wreck. It's pretty clear there's an edit war going on between two political factions that are uninterested in keeping things factual. Wikipedia is great but in some situations the quality of the content goes way downhill at the hands of people with alternate agendas.


A lot of those people have mental illness and don't want to live in a home. And a lot of people go there with the intention of being homeless because they are treated fairly well by the city and there are a lot of people to panhandle from or perform for.


SF actually has a tiny homeless population compared to NY or LA, at least in number. But they're much more concentrated around the high-traffic Union Sq / Tenderloin area than most cities, and you also tend to cross paths with them more if you drive less. LA, by comparison, has a much bigger homeless population but you rarely see them when you live there, because they're spread out and you tend to drive by them rather than walking.


It's just warmer in SF compared to NYC/Philly, so to me it comes as no surprise.


Ever been to Harvard Square in Cambridge, MA?


Well done (I'd say better than the original).

One of the things I find most surprising is the data on the bus lines. I'd assumed, as a recent LA transplant, that the bus lines would generally serve worse off neighborhoods (it is always thus in LA; minority and lower income neighborhoods get mediocre bus service, while wealthier neighborhoods get expresses and light rail).

Of course, it would be interesting to overlay the stops of the various corporate buses on top of this information. My guess is all those high points have private alternatives serving them.

Final point: this might be best for the questions it raises. How does service compare across lines? How many people does a line move and how fast? How much is the line getting subsidized (BART, I'm guessing, crushes the others in that regard).


The bus routes are indeed IMO the most interesting bits of the data.

The trick with San Francisco is that because of the buckshot nature of public housing developments in the city, poor areas are mixed in surprisingly evenly with wealthy areas. This creates a lot of negative effects for residents - the expensive and trendy Hayes Valley, for example, is right next door to an extremely high-crime area, the Western Addition. Keep going a bit further and you hit the Fillmore, which is again a wealthy, trendy area.

SF does this at micro-scale. In a given neighborhood there can be extremely good blocks that are directly next to extremely bad blocks. It's not hard to walk 300 feet and end up in a completely different-seeming universe.

One thing that's interesting to note is that SF buses stop very often, so the highs and lows aren't really spread across a large geographic distance; they are often separated only by a block or two. The "cliffs" in the graph really are that steep when you project it onto a map.


Having crime spread into rich neighborhoods instead of being concentrated in only poor neighborhoods seems like a positive for residents overall since the more affluent neighborhoods have more resources to deal with it. Unless when you say "residents" you mean only the rich ones like yourself.


It's an interesting question of how that shakes out! Here's the map of police districts: http://sf-police.org/index.aspx?page=868

You'll notice the Tenderloin (a high crime area) has its own station partitioned off from the rest of the districts. The others are also interesting. The Mission station for instance, serves a really broad community which includes both the Mission District (currently undergoing gentrification) and the Castro (gentrification complete). On the other hand, districts like Bayview, Ingleside, Taraval and Richmond are just larger, generally more residential and less in the center of everything. The Park and Northern districts on the other hand, are mostly affluent with a few outliers.


Hayes Valley is arguably _in_ the Western Addition which, in turn, is not an extremely high-crime area, even in relation to the rest of SF.

The visualization does show drastic transitions but the majority of them seem to be the buses crossing the seedier areas of downtown which are, indeed, quite seedy.


The Western Addition is not an extremely high crime area.


I'm a little disappointed, insofar as this data tells us nothing we don't already know - the area around Fremont BART is more affluent than Fruitvale, OMG! But it is a nice bit of data visualization, and I applaud them for that. I live in the Bay Area, develop software, and rely on public transit every day. I definitely see improvements in what data should be presented.

For example, for Caltrain, break each graph down further. Find the number of passengers who board at Baby Bullets, NB and SB, for the weekday commutes. What you need is more demographic data about the riders of each system, at specific times. Anyone who's been on Caltrain can tell you that the weekday commute is very much a white collar commute. Finance, Law and Tech going north to SF, mostly tech going south to MV (by the time baby bullets get to SJ Diridon, they are very empty). Palo Alto is an outlier: a lot of people commute from points south to Stanford, which obviously isn't a tech company.

Having said that, since I really enjoy developing transit software, I'm really going to take a look at the code and see what I can do. It's a really good start, and I'm happy they posted it to HN.


I don't see what this has to do with mass transit. It seems to be a gimmick. Why not just plot this information with a (two-dimensional) heatmap?


It is a gimmick, but if you're not a car person, your mental model of the place you live in is probably closer to a topological transit map, since you largely get around by hopping between nodes. It's easier to connect with data displayed that way. And it's also interesting to see what the areas are like near the nodes you pass by every day on your commute but don't get off at.


A heatmap superimposed on a topological transit map would also have been fine. I just don't see what the one-dimensional version buys.


The data seems a little suspect. For example, the L line shows Montgomery Station as having low income. AFAICT (a) there's almost zero housing at Montgomery Station and (b) if there is any, it's most likely super rich, as it's directly in the Financial District.

My only point is before I can even try to get any meaning from this I need to know the data makes sense. Maybe there are a bunch of low income apartments near Montgomery Station but if so they sure are well hidden.

edit: Checking the Fremont line, it shows the median income at Montgomery Station as $112k, whereas the L line shows the same station as $23k. Something seems wrong or else I don't understand what it's showing.


I also highly doubt that the median income for Union City is $138k/year.


Given the volume of Peninsula-shuttled tech workers living within a mile of 16th and 24th st bart stations, I'm surprised to see those so low. Garbage in, garbage out from census data for transient, high-rental areas like the Mission, Noe, Castro, Bernal, Potrero. Indexing to rental rates in these areas is likely more reflective of wealth (via affordance as a proxy) http://sfist.com/2013/03/07/map_average_rent_for_1br_in_san_.... Awesome visualization though.


Within a mile doesn't matter - the OP is measuring by census tracts, which are quite small. For example, if you zoom in on this [http://projects.nytimes.com/census/2010/map], you'll see that the Mission BARTs' census tracts (201 and 209) extend to only within a couple of blocks from the stations, which seem from my daily commute to be the poorest and most run-down parts of that area. (The tracts are even small enough that there's a separate tract - 208 - for the stretch of Mission between the two BARTs.) While some of the tech workers I know do indeed live between Van Ness, Valencia, Cesar Chavez, and Market, a lot live a few blocks east or west of that line, e.g. on the west side of Valencia, or in the area between Folsom and Potrero.

A side question, though - is there a specific year when the shift of tech workers to SF picked up steam? I'm finding a wave of articles complaining about the Google shuttles in 2012-13 (about when I moved out to SF), but I don't have a good feel for how far along the process was at that point.


Fascinating.

Based on rapidly increasing rents over the past 2 years in these neighborhoods, tech worker density has increased a lot. Anyone living in these areas in 2009-2010 will tell you that they'll never break their lease because market rents are 50-150% more than what they're paying due to strict rent controls.


People have been complaining about tech workers in SF for at least 15 years now (since the first boom). It's a standard (if not especially factually supported) gripe at this point.

It's possible that the proliferation of private busing has made it more visible to residents, however. It used to just be everyone drove.


Somewhat related: https://vimeo.com/63147860 a 24 hour visualization of SF public transport ridership. Each circle represents a stop.


I've been wondering a lot recently how long it will be until the Tenderloin is completely gentrified. If SF expansion remains unchecked, it has to be just a short time, as it's literally become a pocket of poverty surrounded by yuppies


It's been a pit of crime and poverty surrounded by opulence for twenty years now. The City's government works hard to preserve the Tenderloin as it is by blocking development and gentrification with building codes, tenant rights measures, permitting, policing strategies, homelessness subsidies, and more.


The Mission has only fairly recently become young, white hipster-ville though. Soma seems to go up and down with the tech scene, so assuming the current growth isn't the same kind of bubble (I don't think it is), that is just going to continue to climb as well. Certainly if all those people gentrifying the surrounding areas stay put for any length of time, that kind of collective monetary influence is going to change any city policies that may keep the Tenderloin how it is


A lot of the people that live near the stops in downtown San Francisco may be pretty poor, but I think most of the people using BART at those stops don't live there, they just work there.


I thought it was interesting how the CalTrain curve flattened out when you look at the bullet or limited service graphs instead of the local graph.


That's because Baby Bullets hit stations that tend to be in more affluent areas. Palo Alto, Mountain View, Redwood City, San Mateo, etc. Certainly Bayshore (Visitacion Valley) isn't dragging those down.


Wow, the most striking is the sudden jump between Redwood City and Atherton on the Local Caltrain route. It jumps from ~30K straight to ~193K.


That's partly true, though the area right around the Redwood City Caltrain (this uses census tracts, not the whole city) makes the difference much larger than if data for Redwood City as a whole was juxtaposed with Atherton as a whole. The area right around the Redwood City Caltrain is a bit sketchy, particularly on El Camino (the downtown on the other side is nicer). Not too sure why it hasn't gotten nicer, since it's so convenient to train service (30 mins from SoMA on the baby bullet Caltrain).


See my comment above, the number they quote for Redwood City is $30k, the census data says $45k for the same tract number.


heh, you should try L.A.

the buses that go to the poor neighborhoods don't even run on the main streets when in the good parts of the city. Also, the poor people buses have tinted windows!

if you can find maps, compare Metro ($1.25, short routes) routes with Dart ($0.75?, long routes) ones.


Is this median personal income, or median household income? The page doesn't make that clear.


Median household.


There are people on the Bayshore Express line living below the poverty line. Wow.


This isn't that surprising, Bayshore is not like most SF neighborhoods and has some interesting patterns.


Yes, Oakland is poor. Thanks for sharing.


typo in the first sentence:

"both extreme povery and wealth, ..."


One thing I don't understand though - how can the people in Atherton afford their houses with "only" $200k income? (yea $200k is a lot, but those houses cost a freakin' lot more!)


I have never understood how people with median household incomes afford anything in the city.

For instance, the image someone else already linked shows even the lowest rents at almost $2k: http://sfist.com/attachments/SFist_Brock/SF-Infographic.png

Yet the OP shows several downtown stops, like Powell St. and Civic Center as having less than $24k of income. That is, the average income is less than the average annual rent.

I have no idea how that possibly works.


It possibly doesn't: those people may move in, stay a while, start failing to pay the rent, and then move out/get evicted.

Alternately, they might be getting unmarked income from relatives.


I wonder if it would look different if you excluded long-time residents who bought in decades ago, when real-estate prices were much lower. E.g. what's the median income of a household that moved to Atherton in the past 20 years?


With a $200K income, you can easily afford to allocate 25% of that to mortgage interest (plus 10% for principal, taxes, etc.): $50K/year

3.5% mortgage -> $50K/yr / 3.5%/yr = ~$1.43M house price, call it $1.5M

interest rates are so low these days, you can spend 50% more on principal for the same interest, compared to 2006 when rates were ~6.25%.
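
Spelling that arithmetic out as a quick Python sketch. This is an interest-only simplification (price = annual interest budget / rate), the function name is made up for illustration, and taxes and principal are ignored:

```python
def interest_only_price(annual_income, interest_share, annual_rate):
    """House price whose yearly interest equals the given share of income.

    Interest-only simplification: ignores principal, taxes, and insurance.
    """
    return annual_income * interest_share / annual_rate


price_now = interest_only_price(200_000, 0.25, 0.035)    # about $1.43M
price_2006 = interest_only_price(200_000, 0.25, 0.0625)  # $800K
print(round(price_now), round(price_2006), round(price_now / price_2006, 2))
```

On this simplified model the gap between 3.5% and 6.25% comes out nearer 79% than 50%; a fully amortized loan would give a different ratio.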


$1.5M doesn't really buy a lot of house in Atherton, at least more than a block or two from El Camino Real.


In going from $200K*0.25 = $50K, you forgot about personal income tax.

And houses here cost more than $1.5M.

(Side note: I was surprised to learn that private schools around here are $30K+/year!)


Another horrible repeating of the "income inequality" trope, and then showing graphs of median income that don't get near the top 1% of US earners, while citing a stat that the top 1% of earners are gaining.


Found some data on the US Census website:

  Table H-1.  Income Limits for Each Fifth and Top 5 Percent of
  All Households:  1967 to 2011

  Year         Lowest   Second   Third    Fourth    Top 5 percent
  2011         20,262   38,520   62,434   101,582   186,000
  1967 (adj)   19,931   38,866   55,164    78,663   126,232
  1967          3,000    5,850    8,303    11,840    19,000
  ---------------------------------------------------------------------

  Table H-2.  Share of Aggregate Income Received by Each Fifth and
  Top 5 Percent of Households, All Races:  1967 to 2011

  Year   Lowest   Second   Third   Fourth   Highest   Top 5 percent
  2011   3.2       8.4     14.3    23.0     51.1      22.3
  1967   4.0      10.8     17.3    24.2     43.6      17.2
  ---------------------------------------------------------------------

  Table H-3.  Mean Household Income Received by Each Fifth and
  Top 5 Percent, All Races:  1967 to 2011

  Year         Lowest   Second   Third    Fourth   Highest   Top 5 percent
  2011         11,239   29,204   49,842   80,080   178,020   311,444
  1967 (adj)   10,630   29,452   47,018   65,787   118,393   186,758
  1967          1,600    4,433    7,077    9,902    17,820    28,110
  ---------------------------------------------------------------------

  Table H-6.  Regions--All Races by Median and Mean Income: 1975 to 2011

         Median income         Mean income
  Year   Current $   2011 $    Current $   2011 $
  2011   50,054      50,054    69,677      69,677
  1975   11,800      44,851    13,779      52,373
  ---------------------------------------------------------------------

This data indicates to me that the bottom 3/5ths of income earners earn about what they did 45 years ago if adjusted for inflation, while the top income earners have increased. The fact that the top 20% have increased their earnings drastically does not necessarily mean to me that there's a problem.

It simply means if you can break out of the slump and make your way into the top 20% of income earners that you will be more rewarded than you were 45 years ago.
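
For anyone who wants the actual percentages, here is a quick Python sketch using the Table H-3 rows quoted above. It assumes the "(adj)" row is the 1967 figure restated in 2011 dollars, which the pasted table doesn't say explicitly; the variable names are mine.

```python
GROUPS = ["Lowest", "Second", "Third", "Fourth", "Highest", "Top 5 percent"]
MEAN_1967_ADJ = [10_630, 29_452, 47_018, 65_787, 118_393, 186_758]  # Table H-3, 1967 (adj)
MEAN_2011 = [11_239, 29_204, 49_842, 80_080, 178_020, 311_444]      # Table H-3, 2011


def real_change_pct(old, new):
    """Percentage change from old to new, both already in the same dollars."""
    return (new - old) / old * 100


changes = {
    group: real_change_pct(old, new)
    for group, old, new in zip(GROUPS, MEAN_1967_ADJ, MEAN_2011)
}

for group, pct in changes.items():
    print(f"{group:>14}: {pct:+.1f}%")
```

On those rows the bottom three fifths each move by single digits, while the top fifth and the top 5 percent are up by more than half.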


Do you honestly expect whole census tracts to have median incomes at the level of the top percentile?

A factor of ten difference between census tracts in the same city is worth examining.


Isn't that what you'd expect? There's a factor of ten difference these days between the poverty line and a fairly typical white-collar income (call it $120K.) Is it really surprising, or telling, that at the tract-level you'd find a factor of ten difference in a major city? Is there a major city where you don't see that?


You usually see a gradient where the tract-to-tract variance is less pronounced. A city will almost always contain tracts where there is a 10x difference between the highest and lowest numbers, but it is not as frequent that those tracts will be located right next to each other. There are several other cities which are like this, Delhi and to some extent Philadelphia both come to mind and I think it is notable there as well.

And, for whatever it's worth, I also think the variance even when the tracts are not next to each other is equally worthy of examination. Just because it is common does not mean it is something we shouldn't talk about. In many ways, when the neighborhoods are next to each other it is a good thing for visibility.


New York has this... I don't think it's atypical. Tracts are so small that it only takes a single development to raise or depress median income grossly. Take a look at the middle of Manhattan: there are tracts with $10K next to tracts with $120K, but they might have as few as 20 households in tract, which means that a single building with 11 low-income tenants would pull the median down to the poverty line.

http://www.wnyc.org/blogs/wnyc-news-blog/2011/dec/08/census-...
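
The median arithmetic is easy to check. A small Python sketch with made-up incomes (11 households at $10K and 9 at $120K, chosen to match the example above, not taken from any real tract):

```python
from statistics import median

LOW, HIGH = 10_000, 120_000

# Hypothetical 20-household tract: one building with 11 low-income tenants
# plus 9 higher-income households.
tract = [LOW] * 11 + [HIGH] * 9
tract_median = median(tract)

# The same tract without that one building.
without_building = [HIGH] * 9
median_without = median(without_building)

print(tract_median, median_without)
```

Because the median only looks at the middle of the sorted list, 11 of 20 households is enough to set it, no matter how high the other 9 incomes are.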


$120K is typical? Even if you only isolate the executive/management positions from SF, which is one of the most inflated markets in the world, the median is still only about $100k. Obviously, the average of all white-collar jobs will be far lower. In more representative parts of the country, I'd imagine that it's below $50k (which is only two to four times the poverty level, not ten.)

Honestly, where do people get the idea that the average American makes $120k. That salary is astronomical.

http://www.bls.gov/ncs/ocs/sp/ncbl1627.pdf


I specifically called it out as "white-collar" income, not overall median income.

It's household income, not per capita. The median household income for SF is $73K. I don't think it's out of bounds to assume that for a white collar worker (call it a 75th percentile income) that number is well over $120K.

(The median household income for a number of cities in this area exceeds $110K. They're not all execs. http://citylab.news21.com/data/types/19/)

About 30% of the US gets a bachelor's degree, which seems like a reasonable proxy for the number of people working in white collar office jobs. So a median white collar worker is probably about 80th percentile income. For the overall US, that works out to $105K to $110K. http://en.wikipedia.org/wiki/Household_income_in_the_United_...



