Gpsd bug may create a 1024 week time warp on October 23 (gitlab.com/gpsd)
137 points by oger on Aug 2, 2021 | 98 comments



From the gpsd homepage (https://gpsd.gitlab.io/gpsd/index.html): “GPSD is everywhere in mobile embedded systems. It underlies the map service on Android phones. It's ubiquitous in drones, robot submarines, and driverless cars. It's increasingly common in recent generations of manned aircraft, marine navigation systems, and military vehicles.”

FTA: “Will it be cherry-picked back to the 3.20, 3.21, and 3.22 branches?

gpsd does not have enough volunteers to maintain "branches". Some distros try to cherry pick, but usually make things worse.

This bug was announced on gpsd-dev and gpsd-users email lists. So the packagers for several distros already saw it. What they do is what they do.”

So, it seems gpsd is like the tz database, a few volunteers maintaining an essential part of our software infrastructure.


> So, it seems gpsd is like the tz database, a few volunteers maintaining an essential part of our software infrastructure.

More than that. Software is now running the economy and controlling the safety and security of the physical world.

gpsd, like the tz database, cURL, SQLite, and the Linux kernel, should be seen as critical planetary infrastructure, period. The safety of our economy and our physical well-being increasingly depends on them being operational.

And yes, it's worrying that we ended up in a situation where the building blocks of our technological society are maintained by underpaid volunteers.


>FTA: “Will it be cherry-picked back to the 3.20, 3.21, and 3.22 branches?

Just update.


I saw in another comment here by offmycloud [1] that this affects:

> Android phones and tablets. "In addition, the Android smartphone operating system (from version 4.0 onwards and possibly earlier; we don't know for sure when the change happened) uses GPSD to monitor the phone's on-board GPS, so every location-aware Android app is indirectly a GPSD client."

Can someone explain how the patch for this will reach all Android devices (especially the large number of devices running older versions of the OS and not getting any updates at all)? What exactly are the consequences for these users?

[1]: https://news.ycombinator.com/item?id=28045191


It seems like the bug was introduced in 2019, so it's presumably phones released since then. Still a problem, but not "all Android phones" bad.


Global warming is breaking our software.

From https://gitlab.com/gpsd/gpsd/-/issues/144#note_633612324

> Until last year, leap seconds had been very predictable. The effect of global warming on earth rotational speeds was only very recently seen, or even predicted. But, yes, going forward, that needs to change.


And _negative_ leap seconds!

My mind is blown.


Somewhat unrelated: Can someone explain the rationale for writing comparisons in the ordering they're using (e.g., 2180 < week)?

I've seen similar before and always thought it seemed error-prone to not write them the way they'd be spoken aloud, but happy to entertain other explanations.


This is probably just me, but I always tend to write the bigger number on the right, so that I can picture a number line in my head.

(Which means I rarely use > and >= operators. It's always < or <=.)

Now, come to think of it, this might have something to do with my native language (Japanese). In Japanese, where the verb always comes at the end of a sentence, you can say "a < b" and "b > a" using the same order and the same adjective.

   a < b ... a -than b -toward bigger (a よりも b のほうが 大きい)
   b > a ... b -toward a -than bigger (b のほうが a よりも 大きい)


I've seen style guides recommend that to avoid typos on == that accidentally result in assignment: write "123 == foo" instead of "foo == 123" so that you can't accidentally write "foo = 123".
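
A minimal sketch of the failure mode this guards against (made-up names; C/C++ semantics):

  void check(int foo) {
      if (foo = 123) {    // typo for ==: assigns 123, is always true,
                          // and still compiles (at most a warning)
      }
      // if (123 = foo) {}  // the same typo flipped: a hard compile
      //                    // error, since a literal is not assignable
  }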


Surely such code would be caught by some other tool, and isn’t worth the mental overhead to translate it back into the more natural form?


It's not only about not accidentally writing `if (x=0)`.

The `if (0==x)` style also makes it obvious that the check is correct when reviewing/reading code. Sure, a linter might catch this. But this way the reader doesn't need to rely on that. Besides, many codebases allow variable assignment as part of conditional/loop expressions, and sometimes, sadly, it's easier to write code this way than to get a team to use a linter.

Regarding it being unnatural... you get used to it, and especially in C one needs to take care to check the return code the right way (0!=, 0==, -1!=, 0<, !, etc.), whereas the other side of the check is often more straightforward (a function call, a variable etc.), so it's nice to have the constant up front. It takes very little extra space at the front. As a bonus all the constants will visually line up nicely that way.
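
For instance, a small sketch of that constant-first idiom around POSIX-style return codes (illustrative only; real code would report the errors):

  #include <fcntl.h>
  #include <unistd.h>

  void demo(const char *path) {
      int fd = open(path, O_RDONLY);
      if (-1 == fd)               // cannot be mistyped as fd = -1
          return;

      char buf[256];
      ssize_t n = read(fd, buf, sizeof buf);
      if (0 < n)                  // the constants line up on the left
          write(STDOUT_FILENO, buf, (size_t)n);
      close(fd);
  }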


Both GCC and Clang have -Wparentheses, enabled with -Wall, which will warn on constructs like `if(x=0)`. MSVC also has a warning for this.


"Natural" is both not universal, and re-trainable. `value [op] var` is quite common in many circles, it is natural to many of those coders.


> Surely such code would be caught by some other tool

This technique was invented back in the 1980s, when compilers had none of the static analysis capabilities we take for granted today. I think the reason people keep using it in 2020 is a matter of habit.


It's caught by the compiler since you can't assign to a constant. Why would you need another tool?


Or we can take advantage of our compilers instead of mountains of untested linting cruft.


I can get on board with this, as it scans the same in both directions.


When working with lots of timeseries information I've found it helpful to always order comparisons so that left-to-right is always increasing, so I would write 2180 < week, or even (not 2180 < week) for the negation of the condition. This becomes almost required when there are multiple values being tested: (a <= 10 && 10 <= b && b <= c) is much easier to read than (a <= 10 && b >= 10 && b <= c). As a perspective, it's more focussed on establishing an invariant of the resulting data than writing a single predicate.


I assume it's a habit formed from guarding against unintentional assignment.

Less so for most comparisons, but for equality, it'll throw an error if you're using the wrong operator instead of having unintended side effects.

week = 2180 will set the week to 2180 in a lot of languages.

2180 = week will always throw an error.

So if I want to compare, it's safer to use the form of 2180 == week because if I forget the second '=', the compiler will tell me before I make bigger problems.


This mistake also requires that the assignment operator has a result (and further, that the result can be silently coerced into a boolean for whatever reason).

In both Rust and Swift, this mistake doesn't compile, their assignment operators don't have a result and so it can't very well be true (or false) ‡

‡ Technically the result of Rust's assignment operators is the empty tuple, which is the closest to "doesn't have a result" in the type system. I don't know about Swift.


As far as I'm aware, only C and C++ a use for Yoda conditions have.


Big the C family is. Copied to many languages its quirks have been.


That is a stellar point, thank you for this.


Even more unrelated, but equally pedantic, I've always thought it's weird when people write "null != val" instead of "val != null" for the same reason.

When said out loud, "null is not val" just feels wrong.


Comes from the null==val idiom, and that comes from avoiding the popular "if (val=null)" bug in C/C++.

Sure, we could just admit that assignments in conditions are a permanently stupid idea. Instead, an entire industry backwards the conditions wrote.


I find initializing conditions like `if(Type variable = ...)` to be very nice in C++ to avoid excessive nesting while still keeping the variable scoped to the block. Of course, I also enable -Wparentheses for things like `if(val=null)`, which you get e.g. when using -Wall with both GCC and Clang.
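
A small sketch of that pattern, with hypothetical names (the second form is the C++17 variant with a separate init-statement):

  #include <memory>

  std::unique_ptr<int> lookup(bool ok) {   // hypothetical helper
      if (!ok) return nullptr;
      return std::make_unique<int>(42);
  }

  void demo() {
      if (auto p = lookup(true)) {   // branch runs only if p is non-null;
          *p += 1;                   // p is not visible after the block
      }

      if (auto q = lookup(false); q != nullptr) {  // C++17 init-statement
          *q += 1;
      }
  }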


Isn't that just Yoda notation? https://en.wikipedia.org/wiki/Yoda_conditions


In PowerShell you'll get a linter warning, because of things like this (note the - in the operators):

  if (@() -eq $null) { 'true' } else { 'false' }  # false

  if (@() -ne $null) { 'true' } else { 'false' }  # false


And there also exist expressions for the left-hand side that make both expressions truthy, like [0]:

  $v = $null, $null, 1

  ($v -eq $null) -and ($v -ne $null)  # True

In both of these cases, this is caused by the language having built-in binary operators that, when the left-hand side expression is a collection type, perform the operation elementwise and return an array.

Interestingly, it seems like PowerShell's operator overload resolution in general depends entirely on the type of the LHS. I say 'seems' because, when I looked into it a while ago, I couldn't find any sort of language specification like the ones C# and VB.NET have, and testing seemed to confirm that this was the case. Now, searching the PowerShell Core source, it seems from [1] that this is indeed the implementation.

This contrasts with C# [2] and VB.NET [3], where binary operator overload resolution is treated as if the candidate operator implementations were a two-parameter method group, making the resolution process 'commutative' (though not always commutative in practice as the operators themselves can still have different LHS and RHS types {Edit: example from the CLR [4]: +(Point, Size) but not +(Size, Point)}).

[0] https://blog.iisreset.me/schrodingers-argumentlist/

[1] https://github.com/PowerShell/PowerShell/blob/master/src/Sys...

[2] https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...

[3] https://docs.microsoft.com/en-us/dotnet/visual-basic/referen...

[4] https://source.dot.net/#System.Drawing.Primitives/System/Dra...


Can't speak for that particular example, but sometimes you might write a < x so that later you can naturally && it with x < b.


> Can someone explain the rationale for writing comparisons in the ordering they're using (e.g., 2180 < week)?

https://en.wikipedia.org/wiki/Yoda_conditions

> it seemed error-prone to not write them the way they'd be spoken aloud

It is written the way it'd be spoken aloud. If you're not speaking it that way then you need to change the way you think. Programming is another language after all.


I don't know about you, but "week is greater than 2180" sounds more natural than "2180 is less than week"


> I don't know about you, but "week is greater than 2180" sounds more natural than "2180 is less than week"

To you perhaps. To me, I think "if I put both sides on the number line, which way is being questioned? and is that question answered true or false?"

And therefore I almost always do an equals or less-than comparison because that's how I think about the number line: 0 in the center with negatives on the left and positives on the right.

So `if (week < 2048)` is just as valid and easy to think about as `if (!(2048 <= week))`. But then `if (!(2048 <= week))` provides an additional guarantee: that I won't accidentally assign to `week`.


I don’t understand if this is an actual bug.

I have heard that the timing on GPS is somehow delivered as weeks, and that the bit size of the variable keeping track of the weeks is too small. So every now and then the weeks reset, and this is managed through overrides in the clients. Is this bug not just referencing that thing, the override of the week rollover?


Yes and no. GPS has 10 bits for the week number (so 1024 weeks).

This code is using the number of leap seconds that have happened to sanity-check which group of 1024 weeks we are in. The assumption is that by December of 2022 we would have another leap second, so if we had fewer than 19 total leap seconds, then something has gone wrong. However, due to incorrect arithmetic, this sanity check is looking at October 2021.
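
A paraphrased reconstruction of the failing check, pieced together from the issue discussion (illustrative names, not the verbatim gpsd source):

  // GPS broadcasts only a 10-bit week number, so a receiver must infer
  // which 1024-week era it is in; the leap-second count was used as a
  // cross-check on that guess.
  int sanitize_week(int week, int leap_seconds) {
      if (2180 < week &&        // week 2180 ends 2021-10-23; the intended
                                // cutoff was near the end of 2022
          19 > leap_seconds) {  // a 19th leap second was assumed by then
          week -= 1024;         // "stale era": warps time back ~19.6 years
      }
      return week;
  }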

Further comments point out that the sanity check need not be in production code at all, but should be moved to test code.


GPS also has a field indicating when the previous or next leap second was or will be; this field is 8 bits of week number, so it covers only about 5 years. The last leap second was in 2016 and no future leap second has been announced. So GPS needs to mark a non-leap to keep the week offset in bounds.

This happened before in 2003 during the previous long gap in leap seconds (1998-2005).


Being unfamiliar with GPSD, what devices/services would this be likely to affect?


NTP servers.


Android phones and tablets.

"In addition, the Android smartphone operating system (from version 4.0 onwards and possibly earlier; we don't know for sure when the change happened) uses GPSD to monitor the phone's on-board GPS, so every location-aware Android app is indirectly a GPSD client."



Why do people use gpsd instead of just reading $GPGLL or $GPRMC from /dev/ttyACM0 or /dev/ttyUSB0 or whatever, which always seemed far more reliable to me?
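
For reference, the direct-read approach is roughly this (a minimal sketch: no termios baud setup, no NMEA checksum verification, and /dev/ttyACM0 is just the usual enumeration guess):

  #include <fstream>
  #include <iostream>
  #include <string>

  int main() {
      std::ifstream gps("/dev/ttyACM0");
      std::string line;
      while (std::getline(gps, line)) {
          // $GPRMC carries UTC time, fix status, lat/lon, speed, and date
          if (line.rfind("$GPRMC", 0) == 0 || line.rfind("$GPGLL", 0) == 0)
              std::cout << line << '\n';
      }
  }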


The FAQ answers this [1]. The issue is that GNSS vendors and standards need to clean up their act before you can do this reliably across different receivers.

[1] https://gpsd.gitlab.io/gpsd/faq.html#why_not_parse_nmea


I suppose. But I've had so many more problems with gpsd (especially when e.g. USB enumerates devices randomly) that on outdoor robots I've switched to parsing NMEA strings and binary data over serial directly, specifically for reliability reasons.

Also, I've had gpsd decide some other non-GPS serial device was a GPS and take up the port; I got frustrated at its incompetence and apt-get uninstalled it.


Yeah, if gpsd sees anything with the right serial chip it'll assume it's a receiver. The whole thing is a pretty big pile of hacks, but it has utility nonetheless.


Allows multiple applications to use a GPS source.


Okay, then why not just take the /dev/ttyACM0 output and redirect it to a standard TCP port with like 5 lines of code?

And can't multiple applications read a /dev/tty device read-only with just e.g. "cat /dev/ttyACM0"?


Technically you can (in that you won't usually get an error), but the result is not what you would want: each byte only gets delivered to one of the readers, in a somewhat unpredictable manner, either mangling the data or starving one of the readers.


No, multiple applications cannot read the same data from a tty device.


This brings back so many nightmares from my old job. These kinds of things would creep up on us all the time, and we'd spend a week or so scratching our heads over how the hell our systems were travelling in time.


Note: the heading is an error. From the comments of the bug:

"Ooops, 16 Oct, 2021, was supposed to be 31 Dec 2022. My calendar error. That needs to be fixed."


My understanding of the comments is that the week-2180 check was supposed to correspond to 2022-12-31, but actually corresponds to 2021-10-24 (the end of week 2180).

In other words, the heading is correct.


hah, yes, came back to correct my statement. Thanks!


It is surprising that this bug was found just three months before the day it takes effect.


As a business, on a scale of 1-10 how worried should I be?


Depends, how much does your business depend on GPS time?


Well, some other comments indicate that this might affect NTP servers? Would that indicate some kind of follow-on effect on database timestamps? Timezone localisation/conversions? Replication setups?


Most people using NTP do not use GPS units directly, but sync to public NTP servers - for example https://www.ntppool.org/en/. Red Hat, for example, ships these as the default upstream NTP servers. If the ntppool.org servers use GPS and gpsd, they are likely to have patched the issue well in advance.

I work in a business where we do use serial- and network-connected GPS devices as stratum 0 time sources for NTP, and yes, we have concerns about the implications of this bug for some of our remote devices. If gpsd starts sending an incorrect time/date to the local ntpd, it will probably be marked as a falseticker. We have multiple GPS-based NTP servers in our datacenters as fallbacks; however, we will probably need to check with the vendor for a firmware update for this issue.


Great, thanks! I'll continue to keep an eye on the situation at least. :)


It most likely does not. You can get accurate (down to ~10ms) time from other NTP sources. What you want from a GPS-based NTP server is the PPS output, which is accurate to a few ns.


We just replaced some old GPS time-servers that have a similar bug.


What did you replace them with?


Is global warming really affecting the Earth’s rotation?


Yes. It's the classic ice skater effect. Climate gets warmer => ice (on mountains) melts. Melted ice = water flows down into the ocean => the moment of inertia of the Earth is reduced => the Earth's rotation speeds up.

Just to put things into perspective: the elevation of the land surface at the Earth's south pole is over 2800m above sea level. The highest point in Greenland is over 3600m above sea level.
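
A back-of-envelope way to see it (my sketch, not from the thread): the Earth's angular momentum is conserved, so a fractional change in moment of inertia produces an equal and opposite fractional change in rotation rate, and hence in day length:

  % L = I\omega = const, and T_day \propto 1/\omega, so:
  \frac{\Delta\omega}{\omega} = -\frac{\Delta I}{I},
  \qquad
  \frac{\Delta T_{\mathrm{day}}}{T_{\mathrm{day}}} = \frac{\Delta I}{I}

Accumulating one leap second per year corresponds to \Delta T / T \approx 1/(365 \times 86400) \approx 3\times10^{-8}, so even part-per-hundred-million shifts in the mass distribution are enough to matter.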


oh cool, because that's my birthday


How do we not have a way to predict leap seconds?


The number of leap seconds required is determined by the Earth's rotation speed, which isn't constant. In the same way that an ice skater extending his arms slows down, shifts in mass on the Earth can alter its rotation speed. Earthquakes, ice melt, atmospheric warming, and even the filling of the Three Gorges Dam can have an effect at the scale required for GPS synchronization.


This isn't even really a GPS thing, it's UTC time that bounces around because it wants to stay synchronized to the day/night cycle. GPS time is continuous.


We decided to have leap seconds to match UTC (which otherwise needn't care, and could be based on TAI) against UT, which depends on the gentle spinning of the large rock we all live on.

The IERS https://www.iers.org/ is in charge of monitoring the spinning of the Earth. On the basis of their assessment a decision is made every six months whether to inject (or indeed remove) a leap second.

If we decided not to match UTC to UT and thus we did not care precisely how quickly or slowly the Earth is spinning, we could abolish leap seconds.

If you meant, "Why can't we precisely predict the motions of a vast rock floating in space years into the future" then I don't know what to tell you. We're not God?


> If you meant, "Why can't we precisely predict the motions of a vast rock floating in space years into the future" then I don't know what to tell you. We're not God?

IMO this feels like the more interesting one to explore: do leap seconds come wholly from variation that is within measurement error of the existing rotation, or is the Earth's rotational rate actually changing? If we had leap minutes as the smallest increment, could we predict them out centuries in advance? etc


No, they are not measurement error. The moment of inertia of the earth changes over time in irregular ways (e.g., melting glaciers). Even a major earthquake can show up in the earth's rotation.

Leap seconds account for the accumulated difference in the rotational period from time to time.


Why not just go back to the Julian calendar? You'd avoid a lot of software bugs.


You jest, but I like to use Julian days (fun fact: different Julian!) to calculate day ranges. It's built in to SQLite so it's really convenient.


Indeed, a lot of astronomers use them that way.


Leap seconds correct for the difference between the time as measured by atomic clocks and the time determined by solar observations.

Turns out the rotation speed of Earth varies. Things like tides, earthquakes, and climate change can affect it. There is no formula for that; the only thing you can do is measure and issue a leap second when required.


Quoting from that thread:

> I don't think gpsd has any reason to be predicting when a future leap-second is going to occur

Until last year, leap seconds had been very predictable. The effect of global warming on earth rotational speeds was only very recently seen, or even predicted. But, yes, going forward, that needs to change.

> And the code in question is clearly expecting only positive leap-seconds.

Yes, because until 2020, the thought of a negative leap second was unthinkable. I would welcome you testing that and seeing what falls out.


We're floating through space, that space doesn't have a uniform shape, and then there's the n-body problem.

Interestingly enough we're finding out that a lot of our solar cycles are probably due to massive things in very long orbits in our solar system.


It'd be cool if we could just invent a new standard time system that's independent of things like that, which add variation or unexpected randomness.

I mean, sure, it would be annoying, but it's a one-time change; our generation has to endure the pain of upgrading our systems, but it would be worth it, no?


I'd be interested to hear how you're going to bring it into alignment with GMT and the other Sun-and-Earth-based measuring systems which people are actually going to want to use. Note that the solar day does vary unpredictably in length (https://en.wikipedia.org/wiki/Day_length_fluctuations).


How hard is it to affect the Earth's rotation?

It looks like you would need to change the Earth's rotational energy by ~1.4*10^22 J to change the length of a mean solar day by 1/365 seconds (which would cause UTC to change by 1 second per year). If energy costs 1 cent per kilowatt hour, this is only around $40 trillion, which is much less than I was expecting.

If energy becomes a few orders of magnitude cheaper and someone knows a reasonable mechanism to put that energy into the earth's rotation, Google or someone similar might find it easier to keep days at 86400 seconds than to deal with leap seconds.
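
The arithmetic roughly checks out (my own back-of-envelope, taking the Earth's moment of inertia as about 8.0 x 10^37 kg m^2):

  % With I \approx 8.0\times10^{37}\,\mathrm{kg\,m^2},
  % \omega = 2\pi / 86400\,\mathrm{s} \approx 7.3\times10^{-5}\,\mathrm{s^{-1}}:
  E = \tfrac{1}{2} I \omega^{2} \approx 2.1\times10^{29}\,\mathrm{J}

  \Delta E = 2 E \frac{\Delta T}{T}
           = 2 \cdot 2.1\times10^{29} \cdot \frac{1/365}{86400}
           \approx 1.3\times10^{22}\,\mathrm{J}

At 3.6 x 10^6 J per kWh and $0.01 per kWh, that is about 3.7 x 10^15 kWh, i.e. on the order of $40 trillion, matching the parent's figure.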


You must work for SAP.

(The joke is, if your business workflow is different to their software, it's the business model that has to be adjusted.)


TAI is pretty well aligned.

It will drift slightly over the centuries. Is that a big deal compared to even the variation inside a time zone?


There's not really anything to invent or upgrade. Just never add any more leap seconds (the past ones can't be safely removed). We need to lobby the IERS harder.


We already have one of those. https://en.m.wikipedia.org/wiki/International_Atomic_Time TAI is just UTC before the leap second adjustments.


Switching everyone from UTC to TAI is too much work and if you try to use TAI while everyone else is on UTC you'll run into off-by-37 bugs everywhere. It's better to keep using UTC but add no more leap seconds to it.


If people didn't need leap seconds, they would already not be using them. There's absolutely no use case for adding leap seconds for a few decades and then stopping. Either put them in or don't.


At the time leap seconds were introduced, it was much rarer for anyone to be able to tell (much less care) that someone on a another continent had a clock a few seconds off from theirs (and those who cared most were probably astronomers, which is why we ended up with leap seconds). There's a reasonable argument that the number of bugs (and extra work for programmers) now caused by them is enough that we should just stop adding them (and perhaps change time zones every few millennia).

Of course, the 'correct' way to fix it would be to use TAI rather than UTC just about everywhere, but that change would be hard to implement compared to just not adding more leap seconds.


Leap seconds are the right thing to do but people didn't realize all the bugs and costs they would trigger. Now that we understand these costs we can and should change our minds.


Well, leap something is arguably the right thing to do, but I'm not convinced that seconds are the best size. It's very possible that a leap minute every century or two would cause less disruption. It's also possible that making leaps a lot more frequent would cause less disruption, because good code is well-tested and oblivious code is less impacted.


> It's very possible that a leap minute every century or two would cause less disruption.

Sounds like "make it a problem that will not happen in my lifetime".


Which actually makes the problem more disruptive: when it finally needs to be solved, the solution has to be rediscovered rather than retained, as it would be with more frequent problems.


> If people didn't need leap seconds, they would already not be using them.

Why do you believe that?

> Either put them in or don't.

Sure, if you have a time machine we can go tell them not to add leap seconds.


> If people didn't need leap seconds, they would already not be using them.

Google already don't. But I think they had to patch their kernels etc. to achieve that.

Leap seconds were a bad solution and we should remove them from general-purpose computer systems (some very specialised systems may need them). But it's a massive coordination problem and most people just don't care enough to change anything.


Leap Smear doesn't mean just ignoring leap seconds.

https://developers.google.com/time/smear
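
Roughly, per that page (my paraphrase; see the page for the exact window): the one-second correction is spread linearly over 24 hours, so for a positive leap second each smeared second is stretched by

  % each smeared second relative to an SI second:
  1 + \frac{1\,\mathrm{s}}{86400\,\mathrm{s}} \approx 1 + 1.16\times10^{-5}

i.e. clocks run about 11.6 ppm slow for a day and never step, and smeared time agrees with UTC again once the window closes.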


No, but it has most of the same advantages and disadvantages as just ignoring them. I would bet that the only reason they apply a smear rather than just ignoring the leap second entirely is to keep their clocks in sync with the outside world.


I believe we used to, but only recently have we begun to understand that the Earth's rotation is not slowing down in a linear/predictable fashion. A comment on the bug says this has something to do with global warming, but I don't really have much context here.


The Earth's rotation is complicated. See https://www.smithsonianmag.com/smart-news/global-warming-cha... for how global warming is theoretically speeding up the Earth's rotation (melting ice causes land to rise at the poles, and moving rock turns out to be more important than water migrating from pole to equator) and https://phys.org/news/2015-12-scientists-reveal-rotation-ear... for how interactions of the core, mantle and crust are currently slowing the day down.


In the same vein, there is no algorithm to precisely predict moon phases. A complete cycle takes approximately 29.5 days, but not exactly. To find out you have to look at the sky, which can also be a bit subjective.



