Large-Scale Generation of Transit Maps from OpenStreetMap Data (tandfonline.com)
250 points by chippy 4 months ago | hide | past | favorite | 32 comments





Thanks for pointing out in the paper that a lot of Spain and Italy is untagged!

It's interesting to see that the first high-speed train line in Spain (Madrid-Seville) doesn't appear. I'll see if I can fix that.


Really cool, but what is happening in Taiwan in the long distance rail map?


Really cool! Very impressed by the results so far, but there is also some room for improvement, probably caused in part by parsing and in part by input data quality.

- Taiwan has an insane rail network of a few hundred lines
- half of the tram lines in Amsterdam are tram 7, apparently it takes you everywhere
- there seems to be an assumption that all lines stop at all stations they pass
- quite a few regional train lines are omitted in the Netherlands

And finally I would love a bus version too!

Keep up the good work


This includes some railways but not others. For example the Edinburgh tram is missing.


The whole East Coast Mainline is missing!


I was kinda expecting a reference to https://blog.transitapp.com/how-we-built-the-worlds-pretties... but I didn't find any while skimming it, even though the content seems to be very similar


A 2018 paper by the same authors [0] (referenced in this work) references an earlier blog post by the transitapp people on Medium, but the link is no longer working.

> One approach that seems to use a model similar to ours was described by Anton Dubreau in a blog post

[0] https://ad-publications.informatik.uni-freiburg.de/ACM_effic...


Download link for the PDF: https://www.tandfonline.com/doi/pdf/10.1080/00087041.2024.23...

Such a great piece of work, very interesting.


This is really cool. I'm fairly impressed with the maps of the London underground, although there are some oddities in Octilinear mode. It's also missing quite a few Overground services.

It's really nice to be able to switch seamlessly between a Geographic, Octilinear and Geo-Octilinear view of the maps, because each of them tells you something useful. I would use this if TFL added it to their maps app.

https://loom.cs.uni-freiburg.de/global#subway-lightrail/octi...


This is really cool, but I wonder why the UK rail (commuter or long distance) network isn't shown very completely. There are _lots_ more trains around Birmingham or up north that don't get displayed. Even the network in the SE is very partial. Is this data really not in OSM?


They discuss in the paper that those routes are less trivial to query. For that reason, and I guess to mimic traditional local transit maps, the online version only queries routes whose `route` tag is `light_rail`, `tram` or `subway`.
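
To make that filter concrete, here is a minimal sketch of selecting route relations by their `route` tag. The sample relations, their ids and refs, and the `select_routes` helper are all made up for illustration; this is not the project's actual query, just the same tag test applied to a small in-memory list.

```python
# Hypothetical sample of OSM route relations, reduced to their tags.
RELATIONS = [
    {"id": 1, "tags": {"type": "route", "route": "subway", "ref": "U1"}},
    {"id": 2, "tags": {"type": "route", "route": "train", "ref": "ICE 1"}},
    {"id": 3, "tags": {"type": "route", "route": "tram", "ref": "7"}},
    {"id": 4, "tags": {"type": "route", "route": "bus", "ref": "42"}},
    {"id": 5, "tags": {"type": "route", "route": "light_rail", "ref": "S1"}},
]

# The three values of the `route` tag mentioned above.
LOCAL_TRANSIT = {"light_rail", "tram", "subway"}


def select_routes(relations, wanted=LOCAL_TRANSIT):
    """Keep only route relations whose `route` tag is in `wanted`."""
    return [
        rel
        for rel in relations
        if rel["tags"].get("type") == "route"
        and rel["tags"].get("route") in wanted
    ]


selected_refs = [rel["tags"]["ref"] for rel in select_routes(RELATIONS)]
```

With this sample, the `train` and `bus` relations are dropped, which matches the gaps people are noticing for commuter and long-distance rail.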


What's novel/interesting about this? Speaking as an ignorant outsider, it seems like they're 'just' querying existing data and plotting it. Obviously this is a gross simplification, but I'd be really interested to hear what's hard about this problem.


From the introduction,

  Since the days of Harry Beck, transit maps have mostly been created manually by professional map designers (Garland 1994; Wu et al. 2020). The primary focus was on static maps, either distributed in print or electronically. These maps are typically schematic, and the classic octilinear design (network segment orientations are multiples of 45°) is still prevalent. In the late 1990s, the graph drawing community started to investigate the problem of drawing such maps automatically. The following questions were investigated: (1) How can graphs be drawn in an octilinear fashion? (2) Which hard criteria should a transit map fulfil? (3) Which soft criteria should be optimized? Several methods have since been proposed (see below). A set of soft and hard criteria, first described by Nöllenburg (2005), has since been generally accepted. The important sub-problem of finding an optimal line ordering of lines travelling through network segments has also been identified very early by Benkert et al. (2006).
It's not trivial to automatically generate an optimally understandable octilinear transit map, and this group have combined bits of 30 years of research to do it in one go for every* bit of public transit on the planet.

* every bit that's in OSM, I suppose
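
For anyone wondering what "octilinear" means in practice, here is a tiny sketch that snaps a single segment direction to the nearest multiple of 45°. The function name is my own, and this is deliberately the naive version, not the paper's method: snapping each segment on its own ignores the hard and soft criteria from the quote, which is presumably where the real difficulty lies.

```python
import math


def snap_to_octilinear(dx, dy):
    """Return the multiple of 45 degrees (0 to 315) closest to direction (dx, dy)."""
    # Direction of the segment in degrees, normalised to [0, 360).
    angle = math.degrees(math.atan2(dy, dx)) % 360
    # Round to the nearest of the eight allowed directions; 360 wraps to 0.
    return (round(angle / 45) % 8) * 45
```

A segment heading slightly north of east, such as `(10, 1)`, becomes a horizontal one, while `(1, 3)` becomes vertical.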


Thanks, seems like the devil is in the details. It's a cool piece of work, very impressive browsing the generated maps.


ok yes BUT .. a reason that transit maps were carefully composed is because people "who probably need assistance when using transit" plus "people who do not speak this human language" plus "people who depend on completeness and accuracy to a high degree" are all, at the same time, using one and only one map.

hurrah for computer science BUT this is also graphic design, with human factors, and simultaneously authoritative data that does matter to many real people. Easy tag-on criticism is "who needs all transit maps worldwide at all times" ? Isn't it obviously more important to have reliable, accurate, readable maps for the people who are using the system heavily in that area, instead of stretching all of those qualities to get a toy-prize for armchair readers and the world cloud servers on the Internet? common sense plays a role in gauging the accolades here IMHO


I think you're describing one of the core motivations for these decades of research.

In most places, changes to these maps still need to go through a lot of people and process. This implies they're slow to update, and therefore often not entirely accurate or optimally understandable. For example, consider service works or temporary outages.

The outcome of this research is not a toy-prize for armchair readers, but a generalized algorithmic approach to building necessary and important accessibility tools.


as research it is interesting and gets wide attention, yes.

the reasoning above is a basis for rational discussion?

difficult to say in a diplomatic way on a computer science forum, that "replacing" human graphic design using algorithms does not get unlimited upvotes from me for real reasons


It seems like it's missing most of Tokyo?

Granted, Tokyo has a blended commuter / subway through-service system (eg, Fukutoshin-line trains continue into the Toyoko-line), but those trains don't seem to show up in either Rail or Subway views.


It is missing most of Tokyo. All they did is look at the subways, specifically the Tokyo Metro system, and maybe a few other lines. The JR rail lines aren't on there, and no transit map of Tokyo is complete without those. The Yamanote line, in particular, is a crucial piece of Tokyo's transit infrastructure, and it's perfectly normal for people to transit between the underground Tokyo Metro and above-ground JR East lines.

Most likely, they just left out the JR Rail and other lines because the map is way too complicated with them, or perhaps it broke their algorithm.

They also didn't show the entire Tokyo metro area, which is much larger than this and includes Yokohama. It's understandable, though: the entire Tokyo metro area is enormous (though perhaps not compared to many American cities like Phoenix, if you just look at land area, but those cities have nowhere near the density of Tokyo), and a map of the entire thing is usually too complicated to bother with, so zoomed-in maps are more useful. This one is showing central Tokyo (the part encircled by the Yamanote line) and parts not too far from it.


More generally if you try to do the whole world in one go you are not going to reach useful quality because every place has something unique about it in the transit network itself and also the map is curated by different people who do things differently. The map might look similar to an outsider but the particular codes, conventions and methods will be different.

The answer for this, I think, is that either the OSM data (input) needs to adapt to fit what this system can read or patch rules and patch data can be applied to fix up the output.

Either way it is a distributed project, people in Tokyo or Hannover or any place where it is wrong are the people who would know what is right so they should be engaged in the solution.


Here's the official Tokyo Metro transit map: https://ontheworldmap.com/japan/city/tokyo/tokyo-subway-map....

It shows the subways, but also adds on the relevant JR rail lines and lines from other companies. Honestly, I don't see how this project could improve on this; the current map is already packed full of information.


This is really excellent work! It does seem to be missing commuter train/rail systems in Canada, such as the GO Train system in Ontario.


This work is so impressive


Amazing stuff at so many levels.

Would love to hear more about the motivation for using RDF/SPARQL in the technology stack, as these are frequently seen as arcane and here is a very intuitive use case.


If you can get the math right you can frequently develop a very good system for representing data in RDF and writing SPARQL queries against it. A week of high-quality thinking can save you six months of time developing an alternate query system; the custom query system might be better but it probably won't be. It's easy to make something that is faster for specialized queries but unlikely you can build something that will let you write complex and versatile queries better than SPARQL.

The key though is coining good identifiers, developing a good set of properties, and understanding how datatype properties work and using them well. It's very easy to develop a bad standard like Dublin Core that, unfortunately, perpetuates the bad stereotypes people have of the RDF world.

The SPARQL spec is dense reading

https://www.w3.org/TR/sparql12-query/

but it's a tiny spec. The SQL spec on the other hand is broken up into numerous $200 documents and if you did look at them you'd find it's much much messier. If you felt SPARQL needed something extra it's a good base to work from to develop some kind of SPARQL++ and the same is true with the RDF model. (e.g. add something to every triple to record provenance, for instance)

My two complaints with SPARQL are: (1) there are two official ways to represent ordered collections and a third unofficial one; if you are good at SPARQL you can write queries that can do the obvious things you want to do with ordered collections (like you'd see in JSON query languages like N1QL or AQL) but there ought to be built in functions that just do it, (2) you can write path queries like

   ?s (ex:motherOf|ex:fatherOf)+ ?o .
which will match ?s being an ancestor of ?o. Sometimes you need to capture the matching path, and SPARQL as it is doesn't provide a way to do that.
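
To illustrate what "capturing the matching path" would look like, here is a rough sketch outside SPARQL: a breadth-first walk over the edges that remembers how each node was reached. The edge list and the `reachable_paths` helper are invented for the example, with the `(subject, object)` pairs standing in for `ex:motherOf` / `ex:fatherOf` triples.

```python
from collections import deque

# Hypothetical (subject, object) pairs standing in for
# ex:motherOf / ex:fatherOf triples.
EDGES = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("erin", "carol"),
]


def reachable_paths(start, edges):
    """Map every node reachable from `start` to one shortest path of nodes."""
    successors = {}
    for subject, obj in edges:
        successors.setdefault(subject, []).append(obj)

    paths = {}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nxt in successors.get(path[-1], []):
            if nxt not in paths:  # first visit is a shortest path in BFS
                paths[nxt] = path + [nxt]
                queue.append(paths[nxt])
    return paths


paths = reachable_paths("alice", EDGES)
```

The keys of `paths` are what the `+` property path gives you; the values are the part you can't get back from the query.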


I'm just here to say that I fully agree with your comment. Hardcore RDF person myself but I wouldn't trade it for anything else. Once you master it it feels like cheating and/or magic sauce :)


Can you explain why you think Dublin Core is a bad standard?


(1) You can line it up side by side with the 1970 MARC standard

https://www.loc.gov/marc/

and, in terms of capabilities, MARC comes out way ahead. MARC is a standard for a university library, Dublin Core seems to be a standard that almost works for an elementary school library.

(2) Specifically, people who write a paper or a book will get prickly about the order that authors are listed in, but Dublin Core doesn't provide a good answer, particularly if you want to use authority records. I mean

   :Paper
       dcterms:creator "Alpher, Ralph" ;
       dcterms:creator "Bethe, Hans" ;
       dcterms:creator "Gamow, George" .
doesn't cut it because when you get the results back they could come back in any order. RDF has two different ways to represent ordered collections and they could have let you (required you to) write

    :Paper
       dcterms:creator ("Alpher, Ralph" "Bethe, Hans" "Gamow, George") .
which looks just like a Lisp list and internally is structured like one, but they didn't. In the XMP specification Larry Masinter specified that you do this

https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPS...

and boy there were a lot of good ideas in the XMP spec but Adobe wound up NERFing the implementation because Adobe was accused of throwing its weight around too much. Sure you could write

   :Paper dcterms:creator "Ralph Alpher, Hans Bethe, George Gamow" .
but that won't work if you want to use URIs that point to authority records like the DC spec advises you to do. People hear RDF and think "Nothing to see here, move on" because of standards like Dublin Core that seem simultaneously inadequate and overcomplicated.
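
As a sketch of why the list form keeps author order where repeated `dcterms:creator` triples don't, here is the Lisp-like structure spelled out with plain tuples. It assumes the usual `rdf:first` / `rdf:rest` / `rdf:nil` encoding of an RDF collection; the blank node labels and both helper functions are made up, and no RDF library is involved.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def to_rdf_list(items, prefix="_:b"):
    """Return (head, triples) encoding `items` as a chain of list cells."""
    triples = []
    head = RDF_NIL
    # Build from the tail so each cell can point at the one after it.
    for index in range(len(items) - 1, -1, -1):
        node = f"{prefix}{index}"  # hypothetical blank node label
        triples.append((node, RDF_FIRST, items[index]))
        triples.append((node, RDF_REST, head))
        head = node
    return head, triples


def from_rdf_list(head, triples):
    """Walk the rdf:rest chain from `head` and collect the rdf:first values."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


authors = ["Alpher, Ralph", "Bethe, Hans", "Gamow, George"]
head, triples = to_rdf_list(authors)
# A triple store returns triples in no particular order; simulate that.
shuffled = sorted(triples, reverse=True)
recovered = from_rdf_list(head, shuffled)
```

The order survives because it lives in the links between cells, not in the order the triples come back.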


Very cool, I've been working on getting QLever set up and using it as my main triple store. Was excited to see this hit my feed.


What an exciting project! How come there are no lines in Hanover, for example?



