Hacker News

The tragedy is that good implementations are rare because correct implementations are really difficult. Everyone grossly underestimates how difficult it is to implement precise, correct behavior globally in software without weird emergent problems.

Odd edge cases are the default state of reality for geospatial unless you are extremely careful. Even if you are diligent and assume a global high-precision ellipsoid as your Earth model, you still have to ensure that the computational geometry is correct in that context for every possible use case. Operations on real surfaces are difficult to implement on discrete computers.

As a simple example, some geometric relationships on an ellipsoid can't be computed correctly even in quad precision, never mind double, yet most efforts at computational geometry limit themselves to double precision because that is what computers support natively. Manipulating relationships in real-world coordinate systems is not for amateurs.
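A minimal illustration of the precision point (synthetic numbers, not a real geodesic routine): when two nearby points are expressed in global coordinates, double precision spends almost all of its digits on the absolute position, and the tiny difference you actually care about inherits the rounding noise. Emulating higher precision with the stdlib `decimal` module keeps the difference intact.

```python
# Illustration only: subtracting two nearby global coordinates (meters from
# a planetary-scale origin) in double precision vs. emulated high precision.
from decimal import Decimal, getcontext

getcontext().prec = 34  # ~ quad precision (IEEE binary128 carries ~34 decimal digits)

# Two hypothetical points about 1 micrometer apart, near the WGS84
# semi-major axis length (6378137 m) from the origin.
x1 = 6378137.000000
x2 = 6378137.000001

# In double precision the operands are rounded before the subtraction,
# so the result is close to 1e-6 but carries rounding error in the inputs.
diff_double = x2 - x1

# Decimal at 34 digits represents both values exactly, so the difference is exact.
diff_quad = Decimal("6378137.000001") - Decimal("6378137.000000")

print(diff_double)  # close to 1e-6, but not exactly: input rounding dominates
print(diff_quad)    # exactly 0.000001
```

The point is not that `decimal` is a geometry engine, just that double precision leaves only a few meaningful digits once the magnitude of global coordinates has consumed the rest.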




My impression has been that ESRI is generally more correct, and the open source stuff has done the first 90% and left the second 90% on the table. What do you think of libraries like Shapely, if you've tried it? I make maps of the county where I live, and there are parcels that ESRI doesn't seem to mind but Shapely refuses to ponder. An example:

1: https://drive.google.com/file/d/1Fw5SdZ-aAiHrZERnIkVgll_OFnt...


ESRI is generally more complete than the open source implementations, but it is not usable for large-scale geospatial analytics. I have not used Shapely, but it appears to be based on the same geometry routines as PostGIS. If you look at the underlying implementations of most open source geospatial algorithms, of which there are only a few, they are usually missing code paths for complex numerical analysis edge cases, regardless of what the documentation says. (Sometimes there are even code comments to that effect.) At sufficient scale you will quickly find all the edge cases in real-world applications, though you may not notice if you have nothing to compare against.

When we developed the first “analytics grade” ellipsoid geospatial geometry engine several years ago (an enormous technical undertaking involving a team of mathematicians), it created distress at some companies because the visible discrepancies relative to the other geospatial analysis systems they were using were much larger than they had imagined.

A challenge of open source software is that it largely only gets created when the technical complexity and expertise barrier is low enough. Above that threshold, the small number of people capable of contributing combined with the large number of man-hours required for a correct implementation is beyond the resources typically available to such a project. Geospatial is a good example of this pattern (database engines are another).


The open source stuff is mostly really finicky and works correctly only for a rather narrow set of inputs. Making most of these libraries actually usable across a wide range of inputs is a tremendous amount of work.

At their core they mostly do the thing they claim to do, but feed them data that's just slightly off and they'll break in interesting ways.
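A concrete flavor of "slightly off" data (a toy sketch, not code from any of these libraries): a parcel ring whose vertices arrive in a bad order becomes a self-intersecting "bowtie", which many geometry routines silently assume can never happen. A naive simplicity check makes the failure mode visible:

```python
# Hypothetical sketch: detect whether a closed polygon ring is "simple"
# (no two non-adjacent edges cross). Real libraries use robust predicates;
# this O(n^2) version just illustrates the class of problem.

def orient(p, q, r):
    """Sign of cross product (q-p) x (r-p): >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect (general position only)."""
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    return (o1 > 0) != (o2 > 0) and (o3 > 0) != (o4 > 0) and 0 not in (o1, o2, o3, o4)

def ring_is_simple(ring):
    """Check that no two non-adjacent edges of the closed ring cross."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Skip adjacent edges: they legitimately share a vertex.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if segments_cross(*edges[i], *edges[j]):
                return False
    return True

print(ring_is_simple([(0, 0), (1, 0), (1, 1), (0, 1)]))  # square: True
print(ring_is_simple([(0, 0), (1, 1), (1, 0), (0, 1)]))  # bowtie: False
```

Note the caveat in `segments_cross`: the collinear and shared-endpoint cases are exactly the "interesting ways" such code breaks, and handling them robustly is where the real work lives.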

Speaking from a lot of experience[1]: nothing in the geospatial or environmental data world is easy, and there will always be an unexpected problem lurking around the corner, regardless of how many of them you've already fixed or worked around.

Still, the complexity of the problem space also makes it fascinating. As an example, defining and measuring something seemingly very simple, mean sea level, is actually tremendously complicated if you think about it. The Wikipedia article talks about averaging over decades of measurements to get it right, for example.
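A toy illustration of why the averaging window matters (synthetic signal with made-up amplitudes, not real tidal constituents): a gauge reading mixes the true mean with periodic tides, so a short average is biased by whatever tidal phase you happened to sample, while a multi-year average lets the cycles cancel.

```python
# Synthetic sea-level gauge: true mean plus two periodic "tide" terms.
# Amplitudes are invented; the periods loosely echo lunar tidal components.
import math

def reading(t_hours):
    true_mean = 2.0  # meters above a made-up datum
    semidiurnal = 1.2 * math.sin(2 * math.pi * t_hours / 12.42)
    diurnal = 0.3 * math.sin(2 * math.pi * t_hours / 25.82)
    return true_mean + semidiurnal + diurnal

def mean_over(hours):
    samples = [reading(t) for t in range(hours)]
    return sum(samples) / len(samples)

# A single day of hourly samples retains tidal residue; roughly 19 years
# of samples averages the cycles away almost entirely.
print(abs(mean_over(24) - 2.0))             # centimeter-scale bias
print(abs(mean_over(24 * 365 * 19) - 2.0))  # far smaller residual
```

Real mean-sea-level work adds weather, pressure, currents, and long-period constituents on top of this, which is why the standard practice is averaging over decades.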

[1] I'm a part of the team behind https://data.planetos.com/datasets


What do you mean by double and quad precision?


Refers to the number of "words", in this case 2 or 4. The term word is not used so much anymore because it has kind of lost its usefulness in modern computer architectures. It used to refer to the size of CPU registers and/or bus widths. I guess the last generation where this was still important was 32-bit, and that kind of stuck, hence "double word" floating point.

[Edit] More informed description: https://en.wikipedia.org/wiki/Word_(computer_architecture)


Yes, these days when the typical word is 64 bits, these names no longer fit.


I would guess 64-bit floating point (double) vs 128-bit floating point, e.g. f64 and f128.


Probably 64-bit and 128-bit floating point numbers.
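For the concrete numbers behind this subthread (easy to check from Python, whose `float` is IEEE 754 binary64, i.e. "double"): a double carries 53 significand bits, about 16 decimal digits, while quad (binary128) would carry 113 bits, about 34 digits, and most CPUs don't support it in hardware.

```python
# Inspect double precision's limits with the standard library.
import math
import sys

print(sys.float_info.mant_dig)    # 53: significand bits of a binary64 double
print(math.log10(2 ** 53))        # ~15.95 decimal digits for double
print(math.log10(2 ** 113))       # ~34.0 decimal digits for a binary128 quad

# One visible consequence: increments below double's resolution vanish.
print(1.0 + 1e-16 == 1.0)         # True: 1e-16 is under half an ulp of 1.0
```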



