Amazing. If you said to someone "Hey, I wanted to know where you went after the cab picked you up last year, so I called up the cab company and asked them where they dropped you off and they told me", they would be outraged at (your behavior and) the breach of privacy shown by the cab company. But the city released a dataset that allows exactly this query. What were they thinking?
Something else that could be mentioned in the linked article: if someone you were with got in a cab in 2013, and they told you where they were going, and you remember the approximate time and location, you can tell whether it was their true destination regardless of how many other people were being picked up at the time, because you don't have to find the exact ride they took, you only have to see whether any rides went to the place they told you.
This search is also highly resistant to the differential privacy suggested by the post's authors. I'd be much happier simply stating that location data is not de-identifiable, and no one should use a cab in a city that logs location data if they aren't happy with an adversary knowing where they went.
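To make the "any ride, not the exact ride" check concrete, here is a rough sketch. The record layout (pickup time, pickup point, dropoff point), the planar kilometre coordinates, and the `window` and `radius` tolerances are all assumptions for illustration, not the actual schema of the released dataset.

```python
from datetime import datetime, timedelta
from math import hypot

def any_ride_matches(trips, pickup_time, pickup_xy, claimed_xy,
                     window=timedelta(minutes=10), radius=0.2):
    """Return True if at least one trip started near pickup_xy around
    pickup_time and ended near claimed_xy.

    trips: iterable of (pickup_time, (x, y) pickup, (x, y) dropoff),
    with coordinates in km on a flat grid (hypothetical layout).
    """
    for t, pickup, dropoff in trips:
        near_in_time = abs(t - pickup_time) <= window
        near_pickup = hypot(pickup[0] - pickup_xy[0],
                            pickup[1] - pickup_xy[1]) <= radius
        near_claimed = hypot(dropoff[0] - claimed_xy[0],
                             dropoff[1] - claimed_xy[1]) <= radius
        if near_in_time and near_pickup and near_claimed:
            return True
    return False

# Two made-up rides leaving roughly the same corner at roughly the same time.
trips = [
    (datetime(2013, 5, 1, 22, 0), (0.0, 0.0), (3.0, 4.0)),
    (datetime(2013, 5, 1, 22, 5), (0.1, 0.0), (10.0, 10.0)),
]
remembered = datetime(2013, 5, 1, 22, 3)
print(any_ride_matches(trips, remembered, (0.0, 0.0), (3.0, 4.1)))
print(any_ride_matches(trips, remembered, (0.0, 0.0), (7.0, 7.0)))
```

The asymmetry is the point: a False result rules the claimed destination out no matter how busy the corner was, while a True result only says the claim is consistent with the data.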
What I wondered about that data set is: if two people living or working at two different locations consistently take taxis to meet at various other locations at the same times, could that pattern be identified in the data?
That is, are there locations A and B such that there are matching trips to locations M1,M2,... at times T1,T2,... i.e. (A,M1,T1),(B,M1,T1),(A,M2,T2),(B,M2,T2) and perhaps reverse trips (M1,A,T1+x) etc?
Further classification of M* -- hotels, for example -- could classify the nature of the meetings. You might be able to identify the addresses of people having affairs, or other deliberately secret rendezvous.
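The (A,M1,T1),(B,M1,T1) matching above could be sketched roughly like this. It assumes trips have already been snapped to coarse location cells and time buckets (say, a block and a 15-minute slot); those cell and bucket labels, and the `min_meetings` threshold, are hypothetical choices rather than anything in the released data.

```python
from collections import defaultdict
from itertools import combinations

def co_travel_pairs(trips, min_meetings=2):
    """Find origin pairs (A, B) that repeatedly send trips to the same
    destination in the same time bucket.

    trips: iterable of (origin_cell, dest_cell, time_bucket), all
    pre-discretised by the caller (an assumption of this sketch).
    Returns {(A, B): sorted list of (dest_cell, time_bucket)}.
    """
    # Group origins by the (destination, time bucket) they arrived at.
    arrivals = defaultdict(set)
    for origin, dest, bucket in trips:
        arrivals[(dest, bucket)].add(origin)

    # Every pair of distinct origins sharing an arrival is a candidate meeting.
    meetings = defaultdict(set)
    for key, origins in arrivals.items():
        for a, b in combinations(sorted(origins), 2):
            meetings[(a, b)].add(key)

    # Keep only pairs that coincide often enough to look like a pattern.
    return {pair: sorted(keys) for pair, keys in meetings.items()
            if len(keys) >= min_meetings}

trips = [
    ("A", "M1", 1), ("B", "M1", 1),   # A and B both arrive at M1 in slot 1
    ("A", "M2", 2), ("B", "M2", 2),   # and again at M2 in slot 2
    ("C", "M1", 1),                   # C coincides with them only once
    ("D", "M3", 5),
]
print(co_travel_pairs(trips))
# {('A', 'B'): [('M1', 1), ('M2', 2)]}
```

Hard bucket edges will miss two cabs that arrive a minute apart on either side of a boundary, and busy destinations will throw up plenty of coincidental pairs, so a real attempt would need a sliding window and a much higher threshold.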
This would be relatively difficult for Manhattan; some parts of the outer boroughs, though, are a different matter.
I was concerned when this first hit HN because I have a friend who lives in a fairly sparsely populated part of town, and his (now ex) girlfriend has a possessive ex-husband who doesn't like her seeing other men. He isn't going to be able to make sense of the data himself, but if someone weaponizes it the way you are talking about, it could be a real problem for people with stalkers/psycho-exes.
I wonder if you could build a small, money-generating business from answering exactly that. If you could get more recent data dumps at some interval, you could even provide an email alert system.
That's almost certainly true. For any longtime NYC resident who knows things[1] it's obvious from that map that the most prominent visible destination from the Hustler Club data is Sin City in the South Bronx.
[1] Specifically things like where notable strip clubs are
I just messaged a friend who used to work in the field, and she confirmed that a lot of her coworkers took cabs to work -- she clarified that there's often nowhere for the dancers to park at the venue (I imagine this would be particularly true in Manhattan) and taking the bus with your makeup on can be an unpleasant experience.
In fairness to the celebrities accused of being cheapskates, I thought it was the case that the trip record in the dataset didn't include a tip amount if it was paid in cash.
I don't know how the data is obtained, but it's probably more likely that the driver was tipped and the amount went unreported than that the tip was actually zero.
As a general fan of open data like this, I've been a little worried these analyses would lead to the data not being released in the future. Hopefully, if they change anything, it will still be useful and interesting.