We had a unit test which only failed on Sundays (qntm.org)
43 points by ColinWright on Dec 2, 2015 | 40 comments


Once upon a time in a Fab far away...

We had received three reports over as many months of a system that failed mysteriously in a manufacturing setting. We sent a field tech out each time. He examined the system, examined logs, found that an error had happened, but found no explanation for it.

On the fourth report, while examining the system, the field tech noticed that a single thread of sunlight was passing through a high window and into the viewport of the system. He immediately realized that the optical sensor on the automated arm would have been spoiled by that beam had the system been operating.

We had the manufacturer cover the window, and filed it as a bug that only happens at a specific time of day in a specific season when the sun is shining.


This reminds me of the case of the "500 mile email" [1]

[1] http://www.ibiblio.org/harris/500milemail.html


Corner cases involving time are endless (you can use this joke for free):

- Can your code handle an arbitrary datetime when the logic expects times to land exactly on midnight?

- What happens if you have a duration that starts on November 1 and ends on November 2, and the user decides to change the end time by an hour? Daylight saving time? (See the sketch after this list.)

- If I fly from Denver to Arizona in December, how does it deal with the time zones? Oh great, you need a table.

- What's the next business day? Another table, this time with holidays.

- When does an all day event start? Midnight? Midnight where?

- Pre-Gregorian time? Warhammer 40K time?

Then there are platform-based problems:

- Is there an easy way to do time arithmetic?

- Is there an easy way to extract time parts? (iOS I'm looking at you!)

- Are time zone aware times different to UTC times?
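To make the November 1-2 item above concrete, here's a minimal sketch (Python 3.9+ with the stdlib zoneinfo module; the specific dates and the America/Denver zone are my own illustrative choices). A "one day" span that contains the fall-back changeover actually holds 25 hours, and same-zone datetime subtraction in Python is deliberately wall-clock:

  from datetime import datetime, timezone
  from zoneinfo import ZoneInfo

  tz = ZoneInfo("America/Denver")
  start = datetime(2015, 11, 1, tzinfo=tz)  # local midnight; DST ends at 2am this day
  end = datetime(2015, 11, 2, tzinfo=tz)    # local midnight the next day

  # Same-zone aware subtraction is wall-clock arithmetic in Python:
  print(end - start)                        # 1 day, 0:00:00

  # Converting to UTC first reveals the real elapsed time:
  print(end.astimezone(timezone.utc)
        - start.astimezone(timezone.utc))   # 1 day, 1:00:00 -- 25 hours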


Because of questions like these, the widest table in every dimensional model I've ever built has been the date dimension.


Yeeeep. Can be tens to hundreds of columns depending on the requirements. Fiscal years, UTC+14, holidays, etc, etc, etc.


Work days, index field for every granularity of time (for fiscal calendars that don't align to calendar granularities), day number of every time span, display fields stored in text ("I don't care about the intricacies of date formatting in locales, just make it always look like this"), abbreviations of display fields.

How about the retail requirement of capturing store open/close status for year on year comparisons? Date dimension becomes Store_Date dimension, with indicators for what to include/exclude in comparison totals?
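For readers who haven't built one: a minimal sketch of how a few of these columns might be populated (Python; every column name, the July fiscal-year start, and the holiday set are illustrative assumptions, not a schema anyone here described):

  from datetime import date, timedelta

  def date_dimension_row(d, fy_start_month=7, holidays=frozenset()):
      return {
          "date_key": int(d.strftime("%Y%m%d")),  # surrogate key, e.g. 20151202
          "iso_date": d.isoformat(),
          "day_of_week": d.strftime("%A"),
          "month_name": d.strftime("%B"),
          "calendar_year": d.year,
          # Dates from fy_start_month onward belong to the next fiscal year:
          "fiscal_year": d.year + (1 if d.month >= fy_start_month else 0),
          "is_weekend": d.weekday() >= 5,         # Python: Monday == 0
          "is_holiday": d in holidays,
          "display_mdy": d.strftime("%m/%d/%Y"),  # "always look like this"
      }

  # Populate roughly a decade of rows once; fact tables then join against it.
  first = date(2010, 1, 1)
  rows = [date_dimension_row(first + timedelta(days=i)) for i in range(3653)]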

I give trainings on dimensional modeling and query design/optimization. The vast majority of my examples are for dates, for two reasons.

1. Everyone needs some form of date dimension.

2. If you can solve date problems, you can apply any other logic you want trivially.

2 is not 100% true, but sometimes it seems that way.


I've come to simply expect that any code that deals with time is going to be hard to build abstractions on top of, whether it be test cases or workflows. The number of ways in which we use time is combinatorially large.

When introducing awareness of time into a system, I slow development down by an order of magnitude until I have workflows built up to deal with all the edge cases. The workflows have to be iterated on manually and slowly so the domain model can emerge.


I remember a similar bug in the early 1980s when a line-of-business system went haywire every Wednesday. Every other day it worked fine.

We eventually found a maintenance programmer had changed "Wensday" in a database to "Wednesday". Unfortunately this was passed to a C char day[9], and "Wednesday" is nine characters, so the string had no trailing '\0'. Hilarity ensued.


Related: OpenOffice won't print on Tuesdays: https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619


Is this really a bad thing? If there's a bug on only one day of the week, then writing your unit test to work on the current day will actually catch that sort of bug. Whereas if you hardcode a day then it will never catch that bug. So it seems there is at least one advantage to this sort of testing in practice.


A better way to test it would be to deterministically generate the days that you want to test with. That might be one date for each day of the week, or one date for each day of the year, or some set of dates that include some normal dates and edge cases like leap days.
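A minimal sketch of that approach (Python unittest; is_weekend and the chosen dates are hypothetical stand-ins for whatever the real code computes). The inputs are fixed dates covering each interesting case, so the test behaves identically no matter what day the build runs:

  import unittest
  from datetime import date

  def is_weekend(d):
      # Hypothetical code under test, standing in for logic that used
      # to read the current date internally.
      return d.weekday() >= 5  # Monday == 0, so 5 and 6 are Sat/Sun

  class WeekendTest(unittest.TestCase):
      CASES = [
          (date(2015, 11, 29), True),   # a Sunday
          (date(2015, 11, 30), False),  # a Monday
          (date(2016, 2, 29), False),   # a leap day (also a Monday)
          (date(2016, 1, 2), True),     # a Saturday just past a year boundary
      ]

      def test_known_dates(self):
          for d, expected in self.CASES:
              with self.subTest(d=d):
                  self.assertEqual(is_weekend(d), expected)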

The problem with having non-deterministic tests is that when they break you don't know if it's because the test broke or the code broke and even if it was the code you don't know when it broke.

What if someone checked in a change that broke how things were handled on Sundays, but the tests only happened to run on a Sunday once a year? You'd have to run each build through the test suite, one by one, and you'd have to do it on that day, just to figure out which check-in broke it.


Unless the tests are almost never run on the weekends (which is what I assumed the author was implying).


Much less interesting than I was expecting. The program under test behaved consistently; the test itself contained logic to perform a different assertion on Sundays.


But the discussion, with all the other, more interesting bugs, makes up for it.


Date/Time is a fine example of something that initially seems simple, but turns out to be exceedingly complex at times. For this reason, I usually use a proper library (like Joda Time for Java, Moment.js for JavaScript, though Java 8 has improved things a lot with the built-in APIs) when dealing with any date/time data, especially when manipulating it, e.g. finding the duration between two dates, adding a duration to a date, etc.

Also, it's helpful if you have some sort of DateService that you can mock rather than using something like "new Date()..." in your code. The article touches on this; in general you should not have tests that rely on or deal with the current datetime as that's just asking for non-deterministic behaviour. By having your code call out to a DateService to get the current date-time, rather than creating it on its own, unit testing becomes easier.
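A minimal sketch of that pattern in Python (the SystemClock/FixedClock names and the greeting function are my own illustration, not an API from the article):

  from datetime import datetime, timezone

  class SystemClock:
      def now(self):
          # Production: the only place the real time is read.
          return datetime.now(timezone.utc)

  class FixedClock:
      def __init__(self, instant):
          self._instant = instant

      def now(self):
          # Test double: always returns the same instant.
          return self._instant

  def greeting(clock):
      # Code under test receives the clock instead of calling now() itself.
      return "Good morning" if clock.now().hour < 12 else "Good afternoon"

  nine_am = datetime(2015, 12, 2, 9, 0, tzinfo=timezone.utc)
  assert greeting(FixedClock(nine_am)) == "Good morning"  # deterministic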

My favourite date/time story: http://stackoverflow.com/questions/6841333/why-is-subtractin...


Using a library is definitely the way to go when you can. For serialization, ALWAYS use ISO 8601.
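For anyone unfamiliar: that means serializing to strings like 2015-12-02T17:30:00+00:00. A quick sketch of the round trip in Python (stdlib only; fromisoformat requires Python 3.7+):

  from datetime import datetime, timezone

  stamp = datetime(2015, 12, 2, 17, 30, tzinfo=timezone.utc)
  wire = stamp.isoformat()                      # '2015-12-02T17:30:00+00:00'
  assert datetime.fromisoformat(wire) == stamp  # unambiguous round trip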


Time is a cross-cutting concern, so rather than a service it's better expressed as an effect or monad.


Date formatting is the source of so many of these headscratchers :(

I'm reminded of the difference between 'yyyy' and 'YYYY' https://news.ycombinator.com/item?id=8810157
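The same trap is reproducible in Python, where year attributes give the calendar year and isocalendar() gives the ISO week-based year; they disagree near year boundaries (the date below is my own example):

  from datetime import date

  # December 29, 2014 is a Monday that falls in ISO week 1 of 2015.
  d = date(2014, 12, 29)
  assert d.year == 2014              # calendar year ('yyyy')
  assert d.isocalendar()[0] == 2015  # ISO week-based year ('YYYY')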


If I'm parsing the article correctly (And I'm not sure I am), the test was written to care about the current day, because the code under test defaulted to the current day when no day was specified.

Someone please tell me what "should" have been done. While I recognize that unrepeatable tests can be bad, I don't see how one can test that defaulting code works correctly without messing with whatever that defaulting code relies on (in this case, the system date). I think I'd prefer to have some conditional logic in the test rather than screw with the system clock.

Then again, enough doesn't make sense here (why was this test failing?) that I'm probably misunderstanding the basis.


Our systems generally rely on a (dependency injected) 'currentDateTimeService'. In unit-tests, the currentDateTimeService is dependency-injected with a mock service, always returning the same date.

But here, I guess the better solution would be to fix the tested code: don't gracefully fall-back, but fail fast. I've seen too many places where graceful defaulting code ended up corrupting important data.


If I have an operation that is dependent on "now", I typically generate that time at the beginning of the operation and pass it as a parameter to everything else. So now all the important bits I can test with whatever timestamp I want. And if I really want to, I could test that the operation initialization generates the correct timestamp. But that's pretty close to testing whether "new Date()" works, which if it doesn't there's not a whole lot I can do about it anyway.
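A minimal sketch of that shape (Python; handle_request and process are hypothetical names): "now" is read exactly once at the boundary, and everything downstream takes it as an argument.

  from datetime import datetime, timezone, timedelta

  def handle_request(payload, now=None):
      if now is None:
          now = datetime.now(timezone.utc)  # the only nondeterministic line
      return process(payload, now)

  def process(payload, now):
      # Pure with respect to time: testable with any timestamp you like.
      return {
          "received_at": now.isoformat(),
          "expires_at": (now + timedelta(hours=1)).isoformat(),
          "payload": payload,
      }

  # Test the interesting bits with an explicit timestamp, no mocking:
  fixed = datetime(2016, 2, 29, 23, 59, tzinfo=timezone.utc)
  assert process("x", fixed)["expires_at"] == "2016-03-01T00:59:00+00:00"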


You usually just mock the date provider, so it just returns the same date every time.


The funner bug is when your testers find a bug that happens when they work late (after 6pm CST) that you cannot duplicate the next morning.


I've fixed a bug that broke a web app any time the user did something after 10pm. Turns out someone had written their own date/time parser and validated the time using a regex along the lines of "^[01]?\d:[0-5]\d:[0-5]\d".
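The commenter hedges with "along the lines of", and indeed the pattern as quoted cuts off at 8pm rather than 10pm, since [01]?\d can only produce hours 0-19; but the failure mode is exactly this (a quick check in Python):

  import re

  TIME_RE = re.compile(r"^[01]?\d:[0-5]\d:[0-5]\d")

  assert TIME_RE.match("09:30:00")      # accepted
  assert TIME_RE.match("19:59:59")      # accepted
  assert not TIME_RE.match("22:15:00")  # perfectly valid time, rejected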


Wouldn't this be an integration test if it was testing a dependency? I believe unit tests should have all dependencies mocked out to tightly control the scope of that which you are actually testing.


The two things I hate dealing with most while developing software: graphics and date/time management. Every time a new project starts, you have to deal with the same problems over and over again.


A fun one that I came across was a test (in an old Rails app I was maintaining) that failed only on the 29th of January in years preceding a leap year.

The test was:

  Subscription.new(starts_at: 1.month.from_now).ends_at.should == 13.months.from_now
On 29th Jan 2015, 13.months.from_now is 29th Feb 2016; but 1.month.from_now is 28th Feb 2015, and adding a year to that gives 28th Feb 2016.
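The same asymmetry can be reproduced outside Rails; here's a sketch using the third-party python-dateutil library, whose relativedelta clamps month arithmetic to the end of the month much as ActiveSupport does:

  from datetime import date
  from dateutil.relativedelta import relativedelta  # third-party: python-dateutil

  jan29 = date(2015, 1, 29)

  # 13 months ahead lands on a date that exists in the leap year:
  assert jan29 + relativedelta(months=13) == date(2016, 2, 29)

  # But 1 month ahead clamps to Feb 28 first, and a year on top of
  # that stays on the 28th -- hence the once-in-four-years failure:
  assert jan29 + relativedelta(months=1) + relativedelta(years=1) == date(2016, 2, 28)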

(Possible conclusion: when using activesupport-style magic date helpers, look up their exact semantics and be sure that what they're doing is what you mean them to do...)


Another possible conclusion: fix this several years from now, the first time it causes a test failure and still probably before it causes any production problems.


If you have time bugs that only manifest at the DST changeover then everyone knows why your bugs have happened and you look incompetent. I suspect this applies if subscriptions handle leap years poorly too.


My point is more that if it costs more to fix it now than it does to fix it later (including the opportunity cost), then there is little point in fixing an extreme edge case now. Especially if it means having to understand every intricacy of every library you work with and spending a long time thinking of permutations that have little effect on the business.

It depends on the situation you are in as to whether that cost calculation works out in your favour.

It is not always the case that everything has to work in 100% of edge cases, all thought about up-front, with no possible bugs arising from not understanding every line of code in every library. Sometimes it makes more sense to fix a bug that happens every four years when it causes a problem. In this case the bug may manifest as a user potentially seeing an off-by-one-day error on a subscription details page. In which case it may never make sense to fix it, as the user may never look at that page, and even if they did, they may never care.


> Sometimes it makes more sense to fix a bug that happens every four years when it causes a problem.

Sometimes yes, but you need to actually perform the risk assessment/cost-benefit analysis. Many bugs are costly enough that it's worth fixing them pre-emptively rather than always waiting for a problem to happen before you do anything about it.



The article was surprisingly complicated; I was expecting something like "yesterday is DAYOFWEEK(NOW()) - 1", which works every day except Sunday.
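For anyone who hasn't hit it: MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so subtracting 1 yields 0 (no valid day) on Sundays. A sketch of the bug and the modulo fix, using the same 1-7 convention in Python:

  from datetime import date

  def dayofweek(d):
      # MySQL convention: 1 = Sunday ... 7 = Saturday (Python: Monday == 0).
      return (d.weekday() + 1) % 7 + 1

  sunday = date(2015, 11, 29)
  assert dayofweek(sunday) == 1

  broken = dayofweek(sunday) - 1           # 0 -- not a valid day number
  fixed = (dayofweek(sunday) - 2) % 7 + 1  # wraps around to 7 (Saturday)
  assert fixed == 7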


tl;dr

Badly-written test, nothing to see here.


ISO weeks run Monday-Sunday, not Sunday-Saturday.


We had a test that calculated a total value of a counter (so, based on differences between counter values) for "yesterday" (as well as "today", "this week", etc). I committed code and within a few minutes got notified that my code had caused a failure in this test -- unexpected, as my change didn't even touch anything remotely related to this functionality.

As I started asking if anyone had any idea, someone else mentioned they'd seen it before too, many weeks ago, but then it "fixed itself". Today was the 1st, and when I looked at the previous failure for this test, it was also on the 1st, but a couple of months ago. Last month the tests were passing on the 1st. So I started to dig into it.

To put this in perspective, we had dozens of other tests that were checking the same calculation for various time spans across months (including explicitly for months with both 30 and 31 days, and February-to-March for both leap and non-leap years), and all sorts of combinations of different values, missing data, etc, that had mostly been there for well over a year. We had actually spent a fairly significant amount of time thinking about how to test this for all the different combinations of dates and data.

As it was the 1st, one of the other tests was actually running with the exact same start and end date/times as this "yesterday" test, but was passing. So I started looking at the mock data each was using.

Turns out it was in fact a legitimate bug, but only happened if it was currently the 1st, AND yesterday was not the 31st, AND there were no values at all for the current month (or anytime later).

The mock data for the "yesterday" test didn't have any values after whatever yesterday was. The explicit date test for 31st-to-1st happened to have a 0 value on the 1st, which meant this bug didn't happen.

Mostly out of curiosity, I looked further back in test history. There were in fact 3 or 4 separate times this test failed, all of which were on the 1st of either March, May, July, October or December. But not all of those dates -- because sometimes the 1st was on a weekend, or just no code was pushed, and no build was run.

This was also in production, but probably was never seen by any customers (none had reported it) because the "yesterday" value was only ever displayed in the UI, and most of the time data is added hourly, so by the time a user logged in on the 1st (say, 8 am), it was almost certain there was some piece of data added (even if it was 0).

We added an explicit test with the data for this situation, and of course fixed the bug.

However, this will live on as by far the most obscure time-based test failure I've ever had to deal with.


> I committed code and within a few minutes get notified that my code caused a failure in this test

Forgive me as I'm coming from a .NET background where everything is tightly integrated into Visual Studio, but are you not able to run your tests before committing? We are strongly encouraged to run the full test suite prior to committing any code for exactly this reason.


Yeah, and this is also .NET. I can't remember if I ran the full test suite or not (I have to admit, I don't always -- even though it's not best practice -- if it's a fairly isolated/minor change). But in this case, due to the nature of the problem, if I had made the code change on the 30th, it would have passed at that time anyway. The nightly build (in the morning hours of the 1st) would have failed.


We run all our tests nightly (as well on a per commit basis) as a matter of policy. It is incredibly useful when something like this pops up - and with a somewhat obtuse code base like ours, it does a bit too often.


So do we -- and it's how I noticed this.

We have a "debug" build that compiles everything and runs unit tests, which runs on every commit on every branch (we use gitflow, so all actual work happens in feature branches).

There's also a nightly (or manually triggered) "release" build that runs on the master and any release/* branches (if there are changes), and additionally does some i18n stuff (which includes the convoluted step of compiling a VB.NET app and then decompiling it into C# so we can run gettext on it), builds installers and does some other packaging tasks.

The problem is this bug was dependent on when the tests ran. Commit on November 30th, and the debug build would be fine, but release build would fail the next night (Dec 1). Commit on October 30th, both will be fine as the release build runs on October 31st. November 1st is a Sunday so chances are no one was working on Saturday, which means no build, which means this failure isn't visible.



