Hacker News

The tragedy is that good implementations are rare because correct implementations are really difficult. Everyone grossly underestimates how difficult it is to implement precise, correct behavior globally in software without weird emergent problems.

Odd edge cases are the default state of reality for geospatial unless you are extremely careful. Even if you are diligent and assume a global high-precision ellipsoid as your Earth model, you still have to ensure that the computational geometry is correct in that context for every possible use case. Operations on real surfaces are difficult to implement on discrete computers.

As a simple example, some geometric relationships on an ellipsoid can't be computed correctly even in quad precision, never mind double, yet most efforts at computational geometry limit themselves to double precision because that is what computers support natively. Manipulating relationships in real-world coordinate systems is not for amateurs.
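A minimal illustration of the precision point (synthetic numbers, not a real geodesic routine): when two nearby points are expressed in global coordinates, double precision spends almost all of its digits on the absolute position, and the tiny difference you actually care about inherits the rounding noise. Emulating higher precision with the stdlib `decimal` module keeps the difference intact.

```python
# Illustration only: subtracting two nearby global coordinates (meters from
# a planetary-scale origin) in double precision vs. emulated high precision.
from decimal import Decimal, getcontext

getcontext().prec = 34  # ~ quad precision (IEEE binary128 carries ~34 decimal digits)

# Two hypothetical points about 1 micrometer apart, near the WGS84
# semi-major axis length (6378137 m) from the origin.
x1 = 6378137.000000
x2 = 6378137.000001

# In double precision the operands are rounded before the subtraction,
# so the result is close to 1e-6 but carries rounding error in the inputs.
diff_double = x2 - x1

# Decimal at 34 digits represents both values exactly, so the difference is exact.
diff_quad = Decimal("6378137.000001") - Decimal("6378137.000000")

print(diff_double)  # close to 1e-6, but not exactly: input rounding dominates
print(diff_quad)    # exactly 0.000001
```

The point is not that `decimal` is a geometry engine, just that double precision leaves only a few meaningful digits once the magnitude of global coordinates has consumed the rest.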




My impression has been that ESRI is generally more correct, and the open source stuff has done the first 90% and left the second 90% on the table. What do you think of libraries like Shapely, if you've tried it? I make maps of the county where I live, and there are parcels that ESRI doesn't seem to mind but Shapely refuses to ponder. An example:

1: https://drive.google.com/file/d/1Fw5SdZ-aAiHrZERnIkVgll_OFnt...


ESRI is generally more complete than the open source implementations, but it is not usable for large-scale geospatial analytics. I have not used Shapely, but it appears to be based on the same geometry routines as PostGIS. If you look at the underlying implementations of most open source geospatial algorithms, of which there are only a few, they are usually missing code paths for complex numerical analysis edge cases, regardless of what the documentation says. (Sometimes there are even code comments to that effect.) At sufficient scale you will quickly find all the edge cases in real-world applications, though you may not notice if you have nothing to compare against.

When we developed the first “analytics grade” ellipsoid geospatial geometry engine several years ago (an enormous technical undertaking involving a team of mathematicians), it created distress at some companies because the visible discrepancies relative to the other geospatial analysis systems they were using were much larger than they had imagined.

A challenge of open source software is that it largely only gets created when the technical complexity and expertise barrier is low enough. Above that threshold, the small number of people capable of contributing combined with the large number of man-hours required for a correct implementation is beyond the resources typically available to such a project. Geospatial is a good example of this pattern (database engines are another).


The open source stuff is mostly really finicky and works correctly only for a rather narrow set of inputs. Making most of these libraries actually usable across a wide range of inputs is a tremendous amount of work.

At their core they mostly do the thing they claim to do, but feed them data that's just slightly off and they'll break in interesting ways.
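A concrete flavor of "slightly off" data (a toy sketch, not code from any of these libraries): a parcel ring whose vertices arrive in a bad order becomes a self-intersecting "bowtie", which many geometry routines silently assume can never happen. A naive simplicity check makes the failure mode visible:

```python
# Hypothetical sketch: detect whether a closed polygon ring is "simple"
# (no two non-adjacent edges cross). Real libraries use robust predicates;
# this O(n^2) version just illustrates the class of problem.

def orient(p, q, r):
    """Sign of cross product (q-p) x (r-p): >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect (general position only)."""
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    return (o1 > 0) != (o2 > 0) and (o3 > 0) != (o4 > 0) and 0 not in (o1, o2, o3, o4)

def ring_is_simple(ring):
    """Check that no two non-adjacent edges of the closed ring cross."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Skip adjacent edges: they legitimately share a vertex.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if segments_cross(*edges[i], *edges[j]):
                return False
    return True

print(ring_is_simple([(0, 0), (1, 0), (1, 1), (0, 1)]))  # square: True
print(ring_is_simple([(0, 0), (1, 1), (1, 0), (0, 1)]))  # bowtie: False
```

Note the caveat in `segments_cross`: the collinear and shared-endpoint cases are exactly the "interesting ways" such code breaks, and handling them robustly is where the real work lives.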

Speaking from a lot of experience[1]: nothing in the geospatial or environmental data world is easy, and there will always be an unexpected problem lurking around the corner, regardless of how many of them you've already fixed or worked around.

Still, the complexity of the problem space also makes it fascinating. As an example, defining and measuring something seemingly very simple, mean sea level, is actually tremendously complicated if you think about it. The Wikipedia article talks about averaging over decades of measurements to get it right, for example.
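A toy illustration of why the averaging window matters (synthetic signal with made-up amplitudes, not real tidal constituents): a gauge reading mixes the true mean with periodic tides, so a short average is biased by whatever tidal phase you happened to sample, while a multi-year average lets the cycles cancel.

```python
# Synthetic sea-level gauge: true mean plus two periodic "tide" terms.
# Amplitudes are invented; the periods loosely echo lunar tidal components.
import math

def reading(t_hours):
    true_mean = 2.0  # meters above a made-up datum
    semidiurnal = 1.2 * math.sin(2 * math.pi * t_hours / 12.42)
    diurnal = 0.3 * math.sin(2 * math.pi * t_hours / 25.82)
    return true_mean + semidiurnal + diurnal

def mean_over(hours):
    samples = [reading(t) for t in range(hours)]
    return sum(samples) / len(samples)

# A single day of hourly samples retains tidal residue; roughly 19 years
# of samples averages the cycles away almost entirely.
print(abs(mean_over(24) - 2.0))             # centimeter-scale bias
print(abs(mean_over(24 * 365 * 19) - 2.0))  # far smaller residual
```

Real mean-sea-level work adds weather, pressure, currents, and long-period constituents on top of this, which is why the standard practice is averaging over decades.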

[1] I'm a part of the team behind https://data.planetos.com/datasets


What do you mean by double and quad precision?


Refers to the number of "words", in this case 2 or 4. The term word is not used so much anymore because it has kind of lost its usefulness in modern computer architectures. It used to refer to the size of CPU registers and/or bus widths. I guess the last generation where this was still important was 32-bit, and that kind of stuck, hence "double word" floating point.

[Edit] More informed description: https://en.wikipedia.org/wiki/Word_(computer_architecture)


Yes, these days when the typical word is 64 bits, these names no longer fit.


I would guess 64-bit floating point (double) vs 128-bit floating point, e.g. f64 and f128.


Probably 64-bit and 128-bit floating point numbers.
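For the concrete numbers behind this subthread (easy to check from Python, whose `float` is IEEE 754 binary64, i.e. "double"): a double carries 53 significand bits, about 16 decimal digits, while quad (binary128) would carry 113 bits, about 34 digits, and most CPUs don't support it in hardware.

```python
# Inspect double precision's limits with the standard library.
import math
import sys

print(sys.float_info.mant_dig)    # 53: significand bits of a binary64 double
print(math.log10(2 ** 53))        # ~15.95 decimal digits for double
print(math.log10(2 ** 113))       # ~34.0 decimal digits for a binary128 quad

# One visible consequence: increments below double's resolution vanish.
print(1.0 + 1e-16 == 1.0)         # True: 1e-16 is under half an ulp of 1.0
```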



