Overture Maps Foundation releases open map dataset (overturemaps.org)
402 points by chippy on July 26, 2023 | 100 comments


For context, Overture Maps is a project that intends to enable large players in the geospatial space (TomTom, Amazon, Microsoft, but notably _not_ Google) to leverage open data sets (OpenStreetMap among them) alongside proprietary data and processes that they own.

The consortium intends to provide a framework for enhancing geospatial data based on open data sets (e.g. OSM) with their own proprietary processes, and to re-release the result under a permissive license (the Community Data License Agreement - CDLAv2), while keeping the data and processes required to create that dataset proprietary.

The project has created a lot of conversation in the OpenStreetMap community, but in general I think it's good to see so many resources put into the OSM-adjacent world.


Another bit of context: OSM mappers were quite discontented with the edits made by many Overture members, whether by low-paid contractors or by bulk imports of data that AI derived from satellite imagery.


> bulk importing stuff that AI derived from satellite imagery.

That's a thing? This must explain why Google Maps turned into shit over the last few years. Google seems obsessed with replacing humans with half-assed automation.


I don't understand why big corporations find it so hard to see that the real potential in AI is usually in augmenting, not _replacing_, human labor

Imagine how useful tools like that can be if they just had a human review process. That added step in the process not only improves the quality of the final data but provides important feedback to improve the tool itself


> I don't understand why big corporations find it so hard to see that the real potential in AI is usually in augmenting, not _replacing_, human labor

IMHO because now I have twice as much work: knowing the problem domain and reviewing someone else's contribution. It's like code review 24/7 from untrusted contributors: it requires more focus than just trying to understand and author the original contribution


Not entirely like that; it's mostly fixing small edge cases when the bulk of the work has been done correctly. Think about e.g. automatic background removal from photos: most of the time you only need to introduce small corrections, instead of painstakingly tracing shapes by hand. But without these corrections the results are often slightly but visibly incorrect.


> Not entirely like that; it's mostly fixing small edge cases when the bulk of the work has been done correctly

Well, I guess one can just continue to prompt any such AI over and over "have you done it correctly? No? Then try harder"

I recognize that my experiences and expectations from AI differ from seemingly the vast majority of users who get benefit from it. I'm glad for them, but I don't ever want to opt-in to having "AI assistance" because with the current state of the art it generates more work for me than value in my life


Yes, indeed. Google AI mapped skylights (roof windows) on a shopping mall near me as buildings.


In my city they made several buildings that overlap a river, and several more that overlap each other. It's been like that for at least 3 years.


Sounds like a fun city!


Do you know when these contributions began? Has the scene changed much since then?


The first known corporate contributor detected by https://piebro.github.io/openstreetmap-statistics/#b34d appeared in 2008.


How do they do that? I thought OSM was licensed in such a way that if you combine it with other data, the whole thing needed to be released with the same license?


You are mostly right, except that in addition to the same license, the Share-Alike provision in the OSM license (the ODbL [0]) also enables derivatives to be re-released with "A compatible license."

I am not a lawyer, but I would assume someone at Overture has deemed their license (the CDLAv2 [1]) to be "compatible" with ODbL. (edit: From the link in OP, the OSM-based data is being released with ODbL. I think I'm wrong about license compatibility. Again, not a lawyer.).

[0]: https://opendatacommons.org/licenses/odbl/1-0/ [1]: https://cdla.dev/permissive-2-0/


CDLA Permissive is not downstream compatible with ODbL. ODbL is a sharealike licence, CDLA Permissive (as the name suggests) isn’t. Overture can’t relicense OSM-derived content under CDLA Permissive.

From a brief skim of the landing page, it looks like the OSM-derived content is not being offered permissively. See https://overturemaps.org/download/.


You're right, I was mistaken.


These things don’t matter when one side has 100x the lawyers of the other side.


Not exactly.

Are they releasing OSM data or OSM derived data on CDLAv2 anywhere, right now?

There was an issue like this in the past, but it was resolved.


Are they releasing OSM data or OSM derived data on CDLAv2 anywhere?


It would be super helpful if they at least released the name of their data sources.

The places layer would be invaluable for (for example, stuff I'm working on now) geospatial healthcare access analytics if I could rely on the places being a reasonably accurate source of provider locations.

It's a lot of work to tie all the CMS data with health plan data etc.. and then to geocode it all with OSM tiger geocoders. If I could rely on that data already being present and scrubbed, it would be a godsend.


Your project is really interesting to me - are you just looking at the physical buildings (e.g. you have 5 urgent cares within X distance), or are you trying to cross-reference provider data to number doctors/specialties? Or something else entirely?

Totally understand if you can’t discuss (or don’t want to), but it’s in my domain and I’ve spent a fair amount of time thinking about how to do it without coming up with much, so I’m super curious about your thoughts.


I came to ask for this info so thanks for the succinct summary!


If you are looking for maps themselves (a la Google Maps) I highly recommend checking out Protomaps [0]. It provides a single file (PMTiles format) that contains all the map data and you can request ranges from the single file to get the required vector data you need for a given area/zoom.

So you can setup something like the following:

    S3 (host the myfile.pmtiles) <- Lambda (takes the x/y/z from the path and requests the correct range) <- Cloudfront cache tile response 
Then you can setup tiles.mydomain.com (or use the cloudfront domain directly) and then use Leaflet or similar on the frontend to fetch/render the tiles. For Leaflet you use the protomaps plugin/lib and give it a url like "https://tiles.yourdomain.com/20230408/{z}/{x}/{y}.mvt" where "20230408" maps to "20230408.pmtiles" in your S3 bucket. Now I can drop new pmtiles files into that bucket and update my clients to use the new source. And since the tiles are in vector format you can theme them however you want in the client which is neat. Lastly you don't have to use the 100+GB whole-earth tileset. You can use a tool [1] (provided by the same guy) to download a dataset for just a given geographical region.
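To make the range-request idea concrete, here's a rough TypeScript sketch of the only primitive involved (the archive URL and offsets are made up; the only assumption is a host that honours HTTP Range headers, like S3):

    // Hypothetical archive URL; any static host that supports Range requests works.
    const ARCHIVE_URL = "https://tiles.yourdomain.com/20230408.pmtiles";

    // Fetch one byte slice of the archive. PMTiles clients (and the Lambda in the
    // setup above) only ever read small slices like this, never the whole file.
    async function fetchRange(url: string, start: number, length: number): Promise<ArrayBuffer> {
      const resp = await fetch(url, {
        headers: { Range: `bytes=${start}-${start + length - 1}` },
      });
      if (resp.status !== 206) {
        throw new Error(`expected 206 Partial Content, got ${resp.status}`);
      }
      return resp.arrayBuffer();
    }

    // Example: read the fixed-size header at the start of the archive (127 bytes
    // in the PMTiles v3 spec, if I remember right), which says where the tile
    // directories live so later reads can be targeted.
    fetchRange(ARCHIVE_URL, 0, 127).then((buf) => console.log(`read ${buf.byteLength} header bytes`));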

The .pmtiles file is a little over 100GB but the whole setup took me only an hour or two max to get running and will cost way less than Google Maps to run.

[0] https://protomaps.com/

[1] https://app.protomaps.com/downloads/small_map


their demo isn't doing them any favors since it seems to stop at some random zoom level, eliding the kind of detail that I'd care about if I were evaluating that solution

digging into their GH repos surfaces https://protomaps.github.io/PMTiles/?url=https%3A%2F%2Fr2-pu... which is at least more detailed if not quite a GMaps killer


I agree; if it weren't for your comment, I would have ditched it as a candidate because it stopped zooming at a bad level. But your link shows that it contains a tremendous amount of data.

Edit: And I missed the parent's link to https://app.protomaps.com/downloads/small_map which is actually better.


I just updated https://protomaps.com to use the more detailed file you found, thanks. So instead of a zoom level of 12 it should work down to 20+ - the actual dataset contains up to zoom 15, which is ~1 billion tiles in a single 110GB file.

It's important to me that I showcase the system working as advertised - you are free to download the 110GB file from that static storage with no restrictions. Unfortunately, the demo is slow because I'm hosting it on Cloudflare R2 which has no outgoing bandwidth fees. Before, it was on DigitalOcean Spaces, which is low latency but would cost me $1-2 out of my own pocket each time someone clicked the link.

The Lambda/Workers integration as described in the other comment is the preferred solution to this for developers operating this at planet scale. Of course, the static storage solution is perfect for small to medium-sized areas, even on GitHub Pages.


I just wanted to say thank you for everything you've done for Protomaps. It was very easy to get started with and other than a small Vue/ES6-related issue I was able to get it integrated into my codebase very quickly.


> Unfortunately, the demo is slow because I'm hosting it on Cloudflare R2 which has no outgoing bandwidth fees.

Can you expand on this a little? Cloudflare is supposed to be pretty competent as CDN and with caching.


This is Cloudflare's new blob storage product R2. Putting the free Cloudflare CDN and cache in front of R2 doesn't work with HTTP Range headers, likely for product differentiation from their Video CDN. The latency of R2 is also much higher than S3 or GCP, but I know the R2 team is working hard on improving this.


FWIW you can add your own caching logic with Workers, if the CDN for some reason doesn't do Range. The Cache API exposed to Workers is quite useful.
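Rough sketch of that pattern in a module Worker -- not the actual Protomaps Worker; the origin lookup is abstracted away and the names are made up. It caches whole Z/X/Y tile responses (plain 200s) rather than raw byte ranges:

    // Illustrative module Worker: serve /z/x/y tiles, consulting the per-PoP cache
    // first and only asking the origin on a miss.
    export default {
      async fetch(request: Request, env: { ORIGIN: string }, ctx: ExecutionContext): Promise<Response> {
        const cache = caches.default;
        const cached = await cache.match(request);
        if (cached) return cached;

        // The origin is whatever turns /z/x/y into tile bytes (e.g. something doing
        // the PMTiles directory lookup + range read); left abstract here.
        const path = new URL(request.url).pathname;
        const originResp = await fetch(`${env.ORIGIN}${path}`);

        // Copy the response so headers are mutable, mark it cacheable, and store it
        // for later requests that hit this PoP.
        const resp = new Response(originResp.body, originResp);
        resp.headers.set("Cache-Control", "public, max-age=86400");
        ctx.waitUntil(cache.put(request, resp.clone()));
        return resp;
      },
    };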


That is exactly how the open source Workers integration works:

https://protomaps.com/docs/cdn/cloudflare

However, it is an optional acceleration layer on top of the PMTiles access pattern; and it doesn't make a compelling demo, it's just a Z/X/Y layer which is how map tiles have worked for the past 20 years.


Hmm, that's cool, but I need to have raster maps pushed to my endpoints. Do you know any easy way to use mvt maps and generate tiles from them on the fly (CGI backend + caching) and then use Leaflet to render those?

I have my own tiny personal project for maps renderer using Leaflet. For now I use 3 sources for tiles (Google, OSM and Esri) but having more sources would be very cool.


I don't personally know how to do that. It appears the PMTiles format does support raster maps but the dataset I use is only vector as far as I know.


So one Lambda execution per tile?


It shouldn’t be required if the range requests are right and you have a compatible renderer in the client.

To expand a bit -- There are adapters for mapbox(gl/libre), leaflet, and so on that will allow you to use a .pmtiles directly from the client with a server that supports range requests, like s3 or nginx.
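For the MapLibre route, the wiring looks roughly like this (a sketch only; the file URL and source-layer name are made up, and the exact registration call is whatever the pmtiles package documents):

    import maplibregl from "maplibre-gl";
    import { Protocol } from "pmtiles";

    // Register a pmtiles:// protocol so MapLibre issues Range requests against a
    // single archive on S3/nginx instead of a z/x/y tile server.
    const protocol = new Protocol();
    maplibregl.addProtocol("pmtiles", protocol.tile);

    const map = new maplibregl.Map({
      container: "map",
      center: [-8, 53], // lng/lat, somewhere over Ireland
      zoom: 6,
      style: {
        version: 8,
        sources: {
          basemap: {
            type: "vector",
            url: "pmtiles://https://example.com/ireland.pmtiles", // made-up URL
          },
        },
        layers: [
          // Single layer just to prove data is flowing; the source-layer name
          // depends on how the tiles were generated.
          { id: "roads", type: "line", source: "basemap", "source-layer": "roads" },
        ],
      },
    });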

I've used protomaps basemaps to generate Ireland and EU basemaps for a project from OSM data. I still haven't quite figured out the tuning for which features should be shown, combined, or excluded yet, and the generated layer names are sort of, but not quite, compatible with similar base layers + styles from mapbox (which uses the same base data, but a different, proprietary conversion). This part takes reasonable skill as a GIS person and some taste in the design of the feature symbology.

The downside of doing this on S3 directly is that you might wind up publishing a link to a 100 gig file that would cost $10 if someone just downloaded it. Lambda invocations seem cheap compared to that risk. OTOH, throwing it on a hetzner box is quite reasonable.


The transformation of OpenStreetMap features into the basemap is open source and available under a permissive license now at http://github.com/protomaps/basemaps

This is the focus of most development for the next few months - like you said, it's also a matter of aesthetics and context, like if the map is underlying another PMTiles data layer it should be lower contrast than if it's overlaid with pin markers. The end goal is to have a flexible basemap solution adaptable to most apps but you are free to roll your own proprietary style on top of the open source tiles if you'd like (there are already companies doing this)


Yes, that's what I was using for my basemaps, it's quite fast for Ireland on a relatively fast machine (minutes), but Europe as a whole was hours. (hetzner quad core 8 thread ~3yr old intel machine with 64G ram)

I've got a couple of goals, or at least evaluations for goals --

1) Have a stripped down basemap like Carto Light, but with different emphasis for the "bicycling friendly" roads, and selectively higher contrast/emphasis for those minor through roads that discourage car traffic but are perfect for a bike. (e.g. Irish boreens, those 1 lane or double track "paved" L roads that are everywhere outside of the major cities.) I find that I simply can't see the minor roads on apple/google maps when I'm out unless I'm zoomed in to the point that I can barely see a network. (fading eyesight and super low contrast).

2) Another lighter base map, but with transit focused features to complement some of the open data in Ireland around transit -- the routes and timetables and so on. I'm aiming for a site that can be 100% statically hosted but will show the routes in a more friendly manner than Bus Eireann's site.

3) If it goes well, I might be doing this in a more commercial context for some clients who are currently using mapbox for their basemaps but due to political concerns need to insert in different names/boundaries for disputed areas/features. We'd love to be able to self host, but quality is a major concern there.

The bugs that I'm seeing in the 1 and 2 cases are things like handling the look of freeway interchanges/flyovers at close zooms, and consistently getting rivers to show at appropriate zooms because they're split into two separate feature types depending on the width, with the narrow bits getting dropped. And the general mismatch between the styling I'm used to from the mapbox converted layer names/feature types and what's coming out of pmtiles/basemaps. The feature bits look like tippecanoe coalesce/drop-densest-features tuning, but I'm also looking at just dropping out entire feature sets to reduce the size of the tiles. That should help the coalesce/dropping behavior as well.


Thanks - for the issues related to freeway interchanges and rivers, it would be most helpful to report these on GitHub using screenshots and links to lat/lon positions, or even links to OSM nodes and ways.

OSM is a freeform dataset and not a cartographic product, so most of the basemap work now and in the future is on getting good results of this transformation for 200+ countries. The mismatch vs. existing maps must exist because we need to assure downstream commercial users that all end products of the map generation are openly licensed - that's why this is being pursued as an independent, self-funded project with support from GitHub Sponsors: http://github.com/sponsors/protomaps


Ok, I’ll dig a bit more to get some good test cases with the local data that I’m familiar with, and make sure that it’s not something that I’m doing wrong.


Library author here - yes, one execution happens per tile; if you have CloudFront in front, those will only happen when the PoP has a miss.

With 512MB RAM these generally complete in under 100ms so the unit costs of Lambda are within a few multiples of Cloudflare Workers (.5USD / million invocations)
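Back-of-the-envelope check of that claim, assuming the on-demand Lambda prices I remember (~$0.20 per million requests and ~$0.0000167 per GB-second for x86; check current pricing):

    // Assumed prices, not authoritative.
    const perMillionRequestsUSD = 0.20;
    const perGBSecondUSD = 0.0000166667;

    // 512 MB held for ~100 ms per tile render.
    const gbSecondsPerInvocation = 0.5 * 0.1;                                         // 0.05 GB-s
    const computePerMillionUSD = gbSecondsPerInvocation * perGBSecondUSD * 1_000_000; // ~$0.83
    const totalPerMillionUSD = perMillionRequestsUSD + computePerMillionUSD;          // ~$1.03

    console.log(totalPerMillionUSD.toFixed(2)); // roughly 2x the quoted 0.5 USD/million for Workers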


Hey, want to say I'm super excited by protomaps. I'm struggling a bit to convert existing OSM pbf files into the pmtiles format. Specifically, I'm trying to convert this guy https://daylightmap.org/

Do you have any tips for doing so without renting memory in the cloud?


The daylight PBFs are difficult to combine without a lot of RAM, there's no way around it right now. Either do that or hold out for potential newer releases of the data delivered as Parquet files.


Like the other comment says, I don't think it's quite that bad, but also CloudFront (or whatever CDN you use) is going to cache the response. In my case I'm rendering restaurant locations on a map. I only have X number of items that each have a map of the restaurants they are available at, so unless people pan/zoom around a lot I only expect a handful of Lambda calls, and the resulting tile data will be cached in CloudFront for everyone else.


> Administrative Boundaries: A global open dataset of national and regional administrative boundaries, this boundary data includes regional names which have been translated into over 40 different languages to support international use.

Oh that is going to be fun. If I recall correctly, Google Maps alters the boundaries it shows based on the location the map is being requested from, to avoid getting in the middle of disputes.

Not "correctly" showing boundaries is a crime in many countries.

Edit: here is a source https://qz.com/224821/see-how-borders-change-on-google-maps-...


Unless you are making a google maps rival, in 99% of cases you won't need those countries with controversial borders.

And unless you have millions and millions of users, no one will care even in those countries.


I like that they provide this data; however, when you try to actually retrieve it, it looks like they went out of their way to make it as convoluted as possible. You have to use DuckDB and then do the import using that? Why not just support MySQL dump files? Why require someone to have DuckDB? Is DuckDB that popular? Also, the links they provide don't work, so it doesn't look like any of it's available. How is somebody supposed to use this stuff? They also require you to have Amazon S3, with some non-standard query language to be able to talk to it. I don't get it. I'm sure I'm not the only one, but it needs to be more generic instead of the way they're doing it.


yeah, I thought about mentioning that it's kind of rude to lock that data behind AWS or Azure accounts, but realistically no one can be surprised that Big Cloud Providers want people to use more big cloud services

There's nothing stopping either AWS or Azure from granting `s3:Get` and `s3:List` on those buckets to enable unauthenticated reads, if they were thus interested


I think they did?

aws --no-sign-request s3 ls s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/

works fine for me.


Ah, my mistake for not trying it; I saw the s3:// and assumed incorrectly. Thanks so much for trying that and reporting back!


I have an interest in this topic as a contributor to AllThePlaces[1], an open source project collating Scrapy spiders (MIT license) that crawl websites of franchises/retail chains that you'd find listed in name-suggestion-index[2] to retrieve location data (CC-0 license). The project is just short of collecting 3 million points of interest from almost 1700 spiders.

Overture Maps appears to be quite a closed and proprietary project, with claims of openness limited to being able to download a data set and accompanying schema specification. Some issues that immediately come to mind:

1. There is no published description for how the data was generated. End users thus are given no assurance of how accurate and complete the data is.

a. As an example, administrative boundaries are frightfully complex and include disputed boundaries, significant ambiguity in definition of boundaries, and trade-off between precision of boundaries versus performance of algorithms using administrative boundary data. Which definition of a boundary does Overture Maps adhere to, or can it support multiple definitions?

b. It's probable that Microsoft have contributed ld+json/microdata geographic data from BingBot crawls of the Internet. This data is notoriously incorrect, including fields mixed up and invalidly repurposed, "CLOSED" in field names to denote closure of a place 5 years ago but the web page remains online, and much ambiguity in opening hours specifications. For AllThePlaces, many of the spiders developed require human consideration, sometimes of considerable complexity, to piece together horribly messy data that is published by shop and restaurant franchises, and other organisations providing location data via their websites.

c. For location information where +/- 1-5m accuracy and precision may be required (e.g. individual shops within a shopping centre[3]), source data is typically provided by the authoritative sources with 1mm precision and +/- 10-100m accuracy. AllThePlaces, Overture Maps, Google Maps and similar still need human editors (OpenStreetMap editors) to do on-the-ground surveys to pinpoint precise locations and to standardise the definition of a location (e.g. for a point, should it be the centroid of the largest regular polygon which could be placed in the overall irregular polygon, the center of mass of a planar lamina, the location of the main entrance, or some other definition? A small centroid sketch follows this list).

d. If Overture Maps is dependent on BingBot for place data, they'll miss an enormous number of points of interest that BingBot would never be able to find. For example, an undocumented REST/JSON/GraphQL API call or modification to parameters to an observed store locator API call may be necessary to return all locations and relevant fields of data. Website developers routinely do stupid things with robots.txt such as instruct a bot to crawl 10k pages (1GB+) from a sitemap last updated 5 years ago rather than make 10 fast API calls for up-to-date data (5MB). Overture Maps would be free to consume data from AllThePlaces as it is CC-0 licensed, and possibly correlate it with other data sources such as BingBot crawl data, a government database of licensed commercial premises or postal address geocoding data. However the messiness of data in various sources would be approaching impossible to reconcile, even for humans, and Overture Maps would possibly have to decide whether to err on the side of having duplicates, or lack completeness.

2. There is no published tooling for how someone else can reproduce the same data.

a. AllThePlaces users fairly frequently experience the wrath of Cloudflare, Imperva and other Internet-breaking third parties, as well as custom geographic blocking schemes and more rarely, overzealous rate limiting mechanisms. If Overture Maps is dependent on BingBot crawls, they'll have a slight advantage over AllThePlaces due to deliberate whitelisting of BingBot from the likes of Cloudflare, Imperva, customer firewalls, etc. However, no matter whether you're AllThePlaces or Overture Maps or anyone else, if you want to capture as many points of interest as possible across the world, use of residential ISP subnets and anti-bot-detection software is increasingly required. They'll need people in dozens of countries each crawling websites targeted to the same country, using residential ISP address space. Otherwise they end up with an American view of the world, or a European view of the world, or something else that isn't the full picture.

b. If Overture Maps has locations incorrect for a franchise/brand due to a data cleansing problem or sourcing data from a bad source (perhaps non-authoritative), there are no software repositories for the franchise/brand to raise an issue or submit a patch against.
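To make the "center of mass of a planar lamina" option from 1(c) concrete, here is a small sketch of the standard shoelace-based centroid for a footprint polygon (coordinates treated as planar, which is a simplification that only holds for small footprints):

    type Point = [number, number]; // [x, y] in projected metres, say

    // Centroid of a simple (non-self-intersecting) ring via the shoelace formula:
    // A = 1/2 * sum(x_i*y_{i+1} - x_{i+1}*y_i); Cx, Cy are weighted by the same cross terms.
    function polygonCentroid(ring: Point[]): Point {
      let area = 0, cx = 0, cy = 0;
      for (let i = 0; i < ring.length; i++) {
        const [x0, y0] = ring[i];
        const [x1, y1] = ring[(i + 1) % ring.length];
        const cross = x0 * y1 - x1 * y0;
        area += cross;
        cx += (x0 + x1) * cross;
        cy += (y0 + y1) * cross;
      }
      area /= 2;
      return [cx / (6 * area), cy / (6 * area)];
    }

    // Toy L-shaped footprint: the centroid (1.5, 1) lands inside the mass of the
    // shape, which is not necessarily where the main entrance is.
    console.log(polygonCentroid([[0, 0], [4, 0], [4, 1], [1, 1], [1, 3], [0, 3]]));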

[1] https://www.alltheplaces.xyz/

[2] http://nsi.guide/

[3] Example Australian shopping centre as captured by AllThePlaces: https://www.alltheplaces.xyz/map/#18.07/-33.834646/150.98952...


> 2. There is no published tooling for how someone else can reproduce the same data.

This is true, and is actually the whole point of Overture.

Overture was developed to enable private companies to leverage open data (like OpenStreetMap) but also combine it with their proprietary data and processes.

The intention is to share the result with a relatively permissive license (a new thing called the Community Data License Agreement) but keep the process and underlying data proprietary.


Yes, exactly. Overture is, for better or worse, looking to create a dataset that is more difficult to mess with / contribute to.

I'm big on OpenStreetMap, but I can't deny it's a bit of a liability for Facebook and other companies that display maps to their users. There is the occasional vandalism edit that simply can't be shown to an end-user. Facebook put significant effort into maintaining a moderated version of the OSM database that lags behind the real-time edits. Facebook, Microsoft, TomTom, etc. know this is a ton of work and want to pool their resources. Making it open also helps to openly compete with Google, the other big map data provider.

If you want to contribute to Overture as an end-user, AFAICT your best option is to edit OpenStreetMap and see if your changes eventually get pulled in. Overture has promised the OSM community that they'll make much of their data available to be contributed back, we'll see if that pans out.

When it comes to AllThePlaces -- as an OSM nerd it seems like there is an opportunity to build a better bridge between this and OpenStreetMap, to make it easier to quickly update businesses in an area. Recently there has been a pretty successful push to link OSM data with WikiData, using tools like the OSM ↔ Wikidata matcher [0]. For POIs, it's a lot of work to add a bunch of local businesses, even with tools like EveryDoor [1]. It would be so cool to see AllThePlaces integration into RapID for example, if there isn't already(?)

[0] https://osm.wikidata.link/

[1] https://every-door.app/


> There is the occasional vandalism edit that simply can't be shown to an end-user.

I remember the same sort of arguments being made about how web sites could not possibly ever accept user comments or submissions, or could not ever risk having users sending links to one another. Those all proved to be false.


> I remember the same sort of arguments being made about how web sites could not possibly ever accept user comments or submissions

The heyday of comments sections on news websites is now in the past. Not long ago, Lonely Planet took down its renowned Thorn Tree forums, which had been a big part of the travel internet since the 1990s. Friends who run a major website for a particular hobby told me that they canceled their plans to launch a forum, since their site is advertising-supported and a forum could damage their relationships with advertisers.

Reliable moderation costs money, and if you don’t moderate heavily enough, you’re going to get user comments that tarnish your brand (or at least scare execs into thinking that the brand will be tarnished).


Editing OpenStreetMap isn't quite equivalent to commenting on a post. OSM allows you to edit anything, which is fantastic but also allows for more serious vandalism. We have seen major cities renamed to racial slurs, for example. As with Wikipedia, the community is generally very good about correcting these issues quickly. It's an uncommon problem that a lot of people work to mitigate. But I stand by what I said: vandalism on OSM is in many cases unacceptable to show to end-users.

The OSM basemap is used in many official publications, in many social media applications, etc. I'd actually recommend using the basemap for most simple mapping cases, as long as it's being continuously updated from upstream (or, it is the upstream basemap). If you take a snapshot of that data, however, you risk capturing some bad stuff. That is a real risk for a company like Meta.


I used to work in the industry. A few things I’ve seen make it into production maps shown to largeNumberOfUsers are phallic objects drawn as lines on the map and series of what appear to be random test lines drawn as roads in the Arctic. Neither of these examples are great publicity if they are discovered on a finished and shipped product people are paying for.

Interestingly, they’re also a lot harder to catch than the slur naming example


> We have seen major cities renamed to racial slurs, for example.

Once in 19 years, I think?

It’s not great but let’s not pretend it’s more of a problem than it is.


No, it's not a huge issue within the OSM community. But surely you see why it's an issue for large companies looking to use the OSM basemap in their projects.


Perfection is an impossible standard to uphold. Even if you do everything in-house, and even if you are Disney, you cannot avoid the occasional scandal:

https://en.wikipedia.org/w/index.php?title=The_Rescuers&oldi...


> not possibly ever accept user comments or submissions, or could not ever risk having users sending links to one another. Those all proved to be false

Funny thing is that this part is most often the cause of a data breach, when looking at the majority of pentesting reports.

One thing is to expose something to a few people you know; another thing is the possibility to send things to millions of people on platforms that are constantly abused, like Twitter or YouTube.


Comments are not core parts of the content.

What this is talking about is closer to Wikipedia. And vandalism is a real thing there, so a portion of topics is moderated.


Did they? I've seen a lot of e.g. newspaper sites removing comment sections, or switching them to not display comments until they've been moderated.


There are many links to legal content that Instagram and FB messenger will not allow you to send to your friends, so I'm not sure how vindicated you are.


That’s more of an indictment of Meta than an argument against uncensored user-to-user communication.


> It would be so cool to see AllThePlaces integration into RapID for example, if there isn't already(?)

Part of the problem is the not-entirely-clear copyright/copyright-like status of this dataset, thanks to https://en.wikipedia.org/wiki/Database_right and similar things. (In general, OSM is really careful with the legal status of datasets being imported.)


Ah, great catch. After posting my comment I came across this GitHub issue that touches on this point as well [0]

As always, I will be vigilant about checking licenses before adding data to OSM!

[0] https://github.com/alltheplaces/alltheplaces/issues/5133


Sadly copyright and copyright-like restrictions are really complex. I am not 100% entirely sure whether concerns that I raised in this issue are really problematic, but...


Wouldn't the closed processes and underlying data severely limit communities such as OSM to using Overture Maps results only as a validation of what OSM already knows from other sources?

Perhaps Overture Maps has used impressively accurate satellite imagery tracing to detect the demolition and rebuild of a structure somewhere in Sudan, and can output a new polygon. No OSM mapper is setting foot in Sudan, and recent satellite imagery for the area is not available through companies that share such data for OSM use.

The issue for an OSM mapper who sees the conflict between OSM (with the old building) and Overture Maps (with the new building) is that they don't have any information to know which result is accurate. Is OSM just out of date? Has Overture Maps produced the result from outdated satellite imagery and OSM is more up-to-date? Is the result from Overture Maps a mistake in an automated tracing algorithm?


> Wouldn't the closed processes and underlying data severely limit communities such as OSM from using Overture Maps results for anything other than a validation of what OSM already knows from other sources?

Seems like a play at the old Microsoft "Embrace, Extend" approach. Whether or not there's an Extinguish after that is yet to be determined.


These are all valid questions, and commonly raised concerns, about the Overture Project.


Sounds like that's going to be a problem for proving/reproducing results independently then. :(


I wonder what the reasoning behind this could be. Is it that they don't want to disclose their data collection practices, or that they can't for legal reasons?

Things like this could come from Alexa, PCs, or any devices with forced opt-ins that keep scanning all your neighbours' wifi networks and MAC addresses.

I believe there was also an initiative where Amazon devices would provide ad hoc internet connectivity by piggybacking on other Amazon devices on different networks with connectivity.

So, all the openness but without any controls. There should already be a better term for things like this.


Strong agree -- additionally, there is clear public relations going on, announcing maps like this as having "trusted" or "official" backing, presumably for decisions over commerce, safety, insurance claims, or other non-security but valuable, data-oriented activity. What recourse does the public, or public oversight, have here?


The GitHub repo is here: https://github.com/OvertureMaps/data -- Licenses look super permissive. I'm not that familiar with the state of Open Mapping but the GERS idea looks great too.

https://docs.overturemaps.org/gers/


The CDLA license feels like NIH... but at least it looks genuinely permissive, as opposed to fake open source.

Curiously, two of the four layers use the ODbL.


> Curiously, two of the four layers use the ODbL.

This data is probably reformatted OpenStreetMap data or derived from it (so it is licensed as ODbL and requires attributing OpenStreetMap)


It's unclear to me how Overture manages to "license wash" OSM data here.

> Transportation: The OMF’s Transportation layer represents a worldwide road network derived from data in the OpenStreetMap project. This community-built data has been recast into the Overture data format which provides consistent segmentation of the data and a linear reference system to support additions of data such as speed limits or real-time traffic.

The OSM ODbL is crystal clear that OSM contributors have to be credited. I don't believe that CDLA Permissive v2.0 magically allows Overture to bypass it.

--EDIT: I missed that they're using different licenses per dataset; the transport theme is ODbL, which I'm sure will trip up users who are not careful.


Seems that any new data generated out of Overture Maps could be uploaded to OSM if they chose to do so? The CDLA license is permissive:

https://cdla.dev/permissive-2-0/

My read is that they're not going to avoid crediting OSM, but rather they're going to credit OSM / maintain the license for the parts they use from OSM and then the rest will be CDLA 2.0 licensed for anyone to use.


Overture could be seen as hostile to OpenStreetMap, but I do not think they are washing any licenses here. The article explicitly says that the CDLA license applies to data provided by Meta and Microsoft, not data from OSM. I think it's unfortunate that Meta and Microsoft aren't contributing their data to OSM instead of releasing it under a different license, but c'est la vie.


Great stuff, at some point I feel like Google is going to have to try to extract even more revenue out of maps than they currently get and it will really empower these sorts of collaborative map systems.

I wish there was a way for people to fund satellite imagery that got pushed into these systems after purchase. Sunnyvale, for example, paid for a lot of imagery of the city that they use/used in staff discussions about traffic, zoning, etc. It would be nice if they could then push those images into the open data set.


There's really no technical reason why imagery like that couldn't be contributed, and IIRC Google does use municipally-owned imagery in its products. A single city might not be a big enough source for them to bother, I don't know, but I know particularly in Europe I have seen copyright notices for national and regional governments when looking at imagery.

Though there are a lot of sources for satellite imagery right now; Google may not be that hard-up for new stuff. I suspect the commercial vendors they work with probably image the entire CONUS area every 12 months or so.

The imagery that would seem to be more in-demand would be the aerial photos used at very high zoom levels. I'm not a geospatial expert but I think these images are combined with some sort of LIDAR or multi-spectrum imagery to have height maps in tandem with the visuals. That strikes me as pretty expensive to obtain.


What's the difference between this and Open Street Maps?


OpenStreetMap is a community of people creating a fully open map database. From roads, shops and rivers to tourism attractions, hiking routes and hospitals, with all kinds of detail. Anyone can contribute (and as someone quite involved in OSM: if you want to map something, especially an area near you, you are welcome!).

Overture is a new group of companies releasing some datasets under open licenses, but the methods used to create them remain proprietary. Some of the released data is theirs; many datasets are repackaged OpenStreetMap data.


Ha, I was just wondering the same, and Overture has a FAQ that answers that exact question:

Overture is a data-centric map project, not a community of individual map editors. Therefore, Overture is intended to be complementary to OSM. We combine OSM with other sources to produce new open map data sets. Overture data will be available for use by the OpenStreetMap community under compatible open data licenses. Overture members are encouraged to contribute to OSM directly.


We've focused on designing a data schema that is easy for developers to quickly understand and use in building map products.

https://docs.overturemaps.org/

OSM focuses on open map editing, but due to its flexible schema, it can be hard to extract needed information from OSM.

Overture seems to focus on providing map data (from multiple sources) that can be used more easily.

EDIT: OSM is also trying to improve the OSM data model for easier processing. https://github.com/osmlab/osm-data-model


In some ways OSM is more impressive than Wikipedia.

There’s a gravel path near my house that maybe sees 20 people using it daily. Due to some work done nearby, the path was partially moved a few meters to the side. OSM reflected this new reality the day after.


This really depends on the density of contributors. In Paris every single tree is mapped on OSM, while in more remote areas it’s not rare to see entire roads or even villages missing.


> This really depends on the density of contributors. In Paris every single tree is mapped on OSM

Could also be that OSM leverages an open data set.

If my French doesn’t deceive me, https://opendata.paris.fr/explore/dataset/les-arbres/informa... has data on 207,688 trees, tagged with species, height, and circumference.


OSM is predominantly human-made. This is predominantly machine-made.


Machine-made …from OSM and other sources.


This license-washes Open Street Maps data into proprietary systems so big corps can use it without contributing back.


To start, the admin boundaries and POIs are given a more permissive license. Their mission is to make permissive map datasets.


Anyone loaded it to Bigquery yet?

I know this is "booo-google", but I just want to write some joins with other tables I happen to have in bigquery. I'm wondering if there is some "community" BQ rather than maintaining import of my own.


Has anyone been able to visually assess this dataset's accuracy against OSM / Google Maps etc for any given region?

Looks like it will be a while before that can be done, seeing as it uses a custom schema.


Between this, Natural Earth, OpenStreetMap, USGS, and others, the availability of map data today would be stunning to early cartographers.


I tried my hand at pulling data from opentopology recently to use in blender and thought it was an amazing source of data for my country - I later realised that it seems like only the US and NZ are well represented there though (guess I was just lucky - thank you LINZ).


Will OSM incorporate this information into their own map?


Does this add impedances to the OSM street segments?


POIs, buildings, transportation network, and admin boundaries layers... oh my!


Is there any quicksand on the map? What about Lil Terry, is he on the map?



