On top of being an excellent researcher, Hannah Bast is one of the best professors I've had the pleasure of studying under and working with, so I'm happy to see her getting credit for this.
I'd love to play around with this. Do you have an estimate for the RAM and CPU requirements for the Toronto dataset? I skimmed the project site and didn't see this discussed.
The robustness experiments are highly parallelized and maintain a copy of the (modified) network per thread. I vaguely remember the NYC test would take around a day on 16 cores (if I remember correctly) and use up to 64GB of RAM.
You can find the query time evaluation in the performance recap on the results page I linked. For NYC it's around 2.2s for Dijkstra (the baseline) and 27ms for the TP-based search.
For single-threaded pre-computation and shortest-path queries, I would expect you to need around 8GB for NYC, less for Toronto and you can get the Honolulu feeds to run on <2GB (which was my local test set).
That's really cool. I've always been interested in pathfinding, but I'd naively assumed that the restricted world of mass transit would be simpler than free-form pathfinding.
Relatedly but somewhat off topic, I wonder why Google doesn't provide route maps too? While the transit route planning is neat, I often want a larger view of the route or system, and many municipalities just feed you a PDF from Hell.
Google has the rapid transit lines in many cities, see for example London. In some of the Bay Area, they hint at bus lines as well, but it's a bit of a mess (https://goo.gl/maps/rVPgGSBRD7G2). These maps seem to be auto-generated, so there are some issues - check out LA Union station (https://goo.gl/maps/GeW28w9MPpQ2).
Where the underlying transit data does not contain the exact shapes of routes, they interpolate the data with something that looks like splines. You can see that in London, especially for the Docklands Light Railway (https://goo.gl/maps/uGmyBZZwDhy).
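The interpolation step could look something like this in spirit: a minimal sketch that densifies a sparse list of (lat, lon) shape points with Catmull-Rom splines. This is just my guess at the family of curve; Google's actual method isn't public. Pure Python, no dependencies.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate a Catmull-Rom spline segment between p1 and p2 at t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (-a + c) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def smooth_route(points, samples_per_segment=8):
    """Interpolate a smooth curve through sparse (lat, lon) shape points."""
    if len(points) < 2:
        return list(points)
    # Pad the endpoints so the first and last segments have four control points.
    pts = [points[0]] + list(points) + [points[-1]]
    out = []
    for i in range(1, len(pts) - 2):
        for s in range(samples_per_segment):
            out.append(catmull_rom(pts[i - 1], pts[i], pts[i + 1], pts[i + 2],
                                   s / samples_per_segment))
    out.append(points[-1])
    return out
```

The curve passes exactly through every known shape point, which matches the behaviour visible on the map: accurate at stations, smoothly made-up in between.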
Apple maps makes much prettier transit maps, although I hear rumors they create them by hand.
Creating them by hand isn't necessarily a bad thing.
The post describes that there are on the order of 20,000 cities and towns covered; I expect that many of these are municipalities that are part of a larger group of cities. WolframAlpha says that the 20,000th largest city coincidentally has a population of about 20,000 people, barely large enough to have its own public transit. And many states, companies, and countries will have common infrastructure for getting maps of all the public transit within the country. Taking a wild guess, there are probably significantly fewer than 500 separate transit systems to be mapped; a team of just 10 people could cover them in a year with a full person-week given to each location.
A mix of automation and manual entry is ideal for many tasks.
I imagine that's not enough, because changes to bus routes tend to be bursty. For example, I'd imagine thousands of those transit systems suddenly add routes in September to coincide with the start of the school year, and then remove those routes in April or June to coincide with the end of the school year. With a team of ten, some of those systems' maps wouldn't get updated until Christmas, which would be too late to be useful.
Yes, it's not a bad thing. You get more accurate data (crucial for estimating fares for distance-based services).
We're a team of 2 people at Moving Gauteng [0], and we maintain transit data for 7 (and growing) transit services. We have 1100 routes mapped, with detailed schedules and other info.
The problem with drawing simple lines to approximate what route the service takes is that you become limited by the inaccuracy of your data.
There are many fare calculating rules, and a good deal are based on distance covered. Having a polyline that's > 95% accurate helps greatly in calculating the correct result for the user.
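As a rough sketch of what that calculation looks like: sum great-circle distances along the route polyline, then look the total up in a fare table. The band boundaries and prices below are invented for illustration; real fare rules vary per service.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))

def polyline_km(points):
    """Total length of a route polyline in km."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))

# Hypothetical distance bands: (max_km, fare). Numbers are made up.
FARE_BANDS = [(5, 10.0), (10, 14.0), (20, 18.0), (float("inf"), 25.0)]

def fare_for(points):
    """Look up the fare band for the distance covered by the polyline."""
    km = polyline_km(points)
    return next(fare for max_km, fare in FARE_BANDS if km <= max_km)
```

This is also where polyline accuracy bites: an inaccurate shape that cuts corners can underestimate the distance enough to land the trip in the wrong fare band.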
I agree with your point that a mix of automation and manual entry is good. In some instances it's better to track the bus by GPS and snap the GPS points to a route using GIS software like ArcGIS (or even Mapbox, as they have such an API).
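Stripped of the GIS tooling, the core of that snapping step is nearest-point-on-polyline projection. A planar, dependency-free sketch (real map-matching in ArcGIS or Mapbox is considerably smarter, using the road network and the sequence of fixes):

```python
def snap_to_polyline(pt, polyline):
    """Return the point on the polyline closest to pt (planar approximation,
    fine over the short distances of a single GPS fix)."""
    px, py = pt
    best, best_d2 = None, float("inf")
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Parameter of the projection of pt onto segment a-b, clamped to [0, 1].
        if seg_len2 == 0:
            t = 0.0
        else:
            t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        cx, cy = ax + t * dx, ay + t * dy
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 < best_d2:
            best, best_d2 = (cx, cy), d2
    return best
```

Snapping each noisy GPS fix to the candidate route like this, then accepting the route whose total snap error is smallest, is one simple way to recover a clean shape from a tracked trip.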
We at Transit App have about 15000 routes, which until relatively recently were managed by just one or two people. Adding manual mapping on top of that would not be feasible.
You also have to consider that just having good shapes for routes (which is often in the data anyway) doesn't mean you automagically get good transit maps. You have to decide when lines are travelling together, which Google apparently does by considering the stops the lines serve. You can see that lines stay together as long as they serve the same stops, and branch immediately at a stop when their stop sequences diverge, even though the physical branch happens later.
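That stop-based grouping heuristic can be sketched in a few lines: treat two lines as a shared trunk while their stop sequences agree, and draw the branch at the first stop where they differ, even if the physical split is between stops.

```python
def shared_trunk(stops_a, stops_b):
    """Return (common_trunk, branch_a, branch_b) for two stop sequences."""
    i = 0
    while i < min(len(stops_a), len(stops_b)) and stops_a[i] == stops_b[i]:
        i += 1
    return stops_a[:i], stops_a[i:], stops_b[i:]

# Two lines sharing stops A and B, then branching:
trunk, branch_1, branch_2 = shared_trunk(["A", "B", "C", "D"],
                                         ["A", "B", "X", "Y"])
```

A renderer using this would draw one bundled line along the trunk and split it at stop B, which is exactly the slightly-wrong-but-readable behaviour visible on the auto-generated maps.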
I presume you get data from GTFS feeds. If so, it'd be more feasible to build a service that converts a 'poorly mapped' route into something better almost automatically. Manual involvement would be necessary to only correct defects from the process.
Although we manually map, we reuse most route segments, especially trains where 2 similar lines could travel 80-90% on the same tracks. We just chop and screw where necessary. The whole thing takes a few minutes to draw a new route based on existing ones.
But if they get the stops right, does it matter if the line is a little vague? I mean, I just want to see all the routes; the lines don't matter. Big bulging stops with thin dotted splines connecting them would convey the important info even if it looks like the bus is flying.
I wonder if they consider how reliable the official schedule is. For example, if a bus is supposed to come once an hour on the hour, it would be a bad plan to arrive just 1 minute before the top of the hour, since the schedule is probably an approximation.
In my city the transit authority considers anywhere from five minutes early to one minute late to be on-time. I don't know how normal this is, but it's actually pretty infuriating to have transit apps seem to treat it as the inverse.
A lot of issues like this tend to get glossed over in transit planning apps. Another issue is transfer timings, where the more transfers you do getting to your destination the more likely one of the transfers goes awry and throws off the whole schedule. For that reason I, as a person who exclusively uses transit in a city where it's not terribly good, almost always avoid a trip with 2 or more transfers if there's a 1 or less transfer option with more walking. Convincing transit apps to give you these kinds of options can be difficult (though google does have some knobs for controlling these things).
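A toy model of why each extra transfer hurts: if every individual connection is made with probability p, independently (a big simplification, since delays correlate), a trip with k transfers goes to plan with probability p^k.

```python
def trip_reliability(p_connection, transfers):
    """Probability every transfer succeeds, assuming independence (a toy model)."""
    return p_connection ** transfers

# With 90%-reliable connections, reliability drops fast with transfer count:
for k in (1, 2, 3):
    print(k, "transfers ->", trip_reliability(0.9, k))
```

Even with quite reliable connections, a 2-transfer trip already fails noticeably more often than a 1-transfer one, which is why the more-walking option can be rational.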
What city is that? That is a rider-hostile definition of "on-time". In my city if the bus arrives at a stop early it will just stand there until the scheduled time.
There are stops (outside transit centres) where the bus will wait for the scheduled time, called "timing stops". There tend to be maybe 5 or 6 of them on a given cross-city route, but there's no public information about where they are. It's pretty easy to figure them out, though.
Note that at non-timing stops, most of the time they don't even wait if they're five minutes ahead of schedule.
That's good, because it means that your transit agency (or agencies) respects timing points.
It's difficult to always know when the bus will pass a certain place, even with Bus Rapid Transit systems that have dedicated lanes. Timing points are points on the schedule where the bus is guaranteed to leave on time or later, never early.
So I presume that in your case, if the bus is late, it doesn't wait, but if it's early it waits.
In the "Frequency-Based Search for Public Transit" paper they compare their algorithm's preprocessing time with RAPTOR's - but RAPTOR doesn't require preprocessing, AFAIK.
Transfer patterns also behave robustly given real-time updates on the network, which we have shown here: http://stromboli.informatik.uni-freiburg.de/student-projects...