> The excellent book xUnit Test Patterns describes a test smell named Assertion Roulette. It refers to situations where it may be difficult to determine exactly which assertion caused a test failure.
How is that even possible in the first place?
The entire job of an assertion is to wave a flag saying "here! condition failed!". In the programming languages and test frameworks I've worked with, this typically includes providing, at minimum, the expression put in the assertion, verbatim, and the precise coordinates of the assertion - i.e. the name of the source file plus the line number.
I've never seen a case where it would be hard to tell which assertion failed. On the contrary, the most common problem I see is knowing which assertion failed, but not how the code got there, because someone helpfully stuffed it into a helper function that gets called by other helper functions in the test suite, and the testing framework doesn't report the call stack. But it's not that big of a deal anyway; the main problem I have with it is that I can't glean the exact source of failure from CI logs, and have to run the thing myself.
> I've never seen a case where it would be hard to tell which assertion failed.
There is a set of unit testing frameworks that do everything they can to hide test output (JUnit), or vomit multiple screens of binary control-code emoji soup to stdout (Ginkgo), or just hide the actual stdout behind an authwall in a UUID-named S3 object (CodeBuild).
Sadly, the people with the strongest opinions about using a "proper" unit test framework with lots of third party tooling integrations flock to such systems, then stack them.
I once saw a dozen-person team's productivity drop to zero for a quarter because junit broke backwards compatibility.
Instead of porting ~100,000 legacy (spaghetti) tests, I suggested forking + recompiling the old version for the new JDK. This was apparently heresy.
I was a TL on a project and I had two "eng" on the project who would write tests with a single method and then 120 lines of Tasmanian Devil test cases. One of those people liked to write 600 line cron jobs to do critical business functions.
> One of those people liked to write 600 line cron jobs to do critical business functions.
I was a long-time maintainer of Debian's cron, a fork of Vixie cron (all cron implementations I'm aware of are forks of Vixie cron, or its successor, ISC cron).
There are a ton of reasons why I wouldn't do this, the primary one being that cron really just executes jobs, period. It doesn't serialize them, it doesn't check for load, logging is really rudimentary, etc.
A few years ago somebody noticed that the cron daemon could be DoS'ed by a user submitting a huge crontab. I implemented a 1000-line limit on crontabs, thinking "nobody would ever have 1000-line crontabs". I was wrong, and quickly received bug reports.
I then increased it to 10K lines, but as far as I recall, users were hitting even that limit. Crazy.
Hadn't heard of it before, and it appears not to be.
There indeed exist a few non-Vixie-cron-derivative implementations but as far as I'm aware, all major Linux and BSD distributions use a Vixie cron derivative.
Edit: I see now where I caused confusion. In my original post, I should have said all default cron implementations.
I thought Dillon cron was the default cron in Slackware? Hard to be a more major Linux distribution than Slackware, in terms of historical impact if not current popularity.
I just confirmed with a Slackware user today, it still does use Dillon cron. I had a vague memory from before I switched from Slackware to Debian late last millennium.
JUnit is especially bad about this. I often wonder how many of these maxims are from people using substandard Java tools and confusing their workarounds with deeper insights.
Here are a few lessons from mistakes I've seen in other frameworks:
- Make it possible to disable timeouts. Otherwise, people will need a different runner for integration, long running (e.g., find slow leaks), and benchmark tests. At that point, your runner is automatically just tech debt.
- It is probably possible to nest befores and afters, and to have more than one nesting per process, either from multiple suites or due to class inheritance, etc. Now you have a tree of hooks. Document whether it is walked in breadth-first or depth-first order, then never change the decision (or disallow trees of hooks entirely, either by detecting them at runtime, or by picking a hook registration mechanism that makes them inexpressible).
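To make the second point concrete, here's a toy sketch (not any real framework's API; all names are made up) of a runner that commits to one answer: befores run outermost-first, afters innermost-first, and nested suites are walked depth-first.

    class Suite:
        def __init__(self, name, before=None, after=None, tests=(), children=()):
            self.name, self.before, self.after = name, before, after
            self.tests, self.children = list(tests), list(children)

    def run(suite, ancestry=()):
        chain = (*ancestry, suite)            # root-to-leaf chain of suites
        for test in suite.tests:
            for s in chain:                   # outermost "before" first
                if s.before: s.before()
            test()
            for s in reversed(chain):         # innermost "after" first
                if s.after: s.after()
        for child in suite.children:          # depth-first into nested suites
            run(child, chain)

Whichever traversal a framework picks matters less than writing it down and never changing it.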
We recently switched our system to a heartbeat system instead of a timeout system. The testing framework expects to see messages (printf, console.log, etc...) often. So a test testing a bunch of combinations might take 45 seconds to run but for each combination it's printing "PASS: Combination 1, 2, 3" every few ms.
This way the framework can kill the test if it doesn't see one of these messages in a short amount of time.
This fixed our timeout issues. We had tests that took too long, especially in debug builds, and we'd end up having to set too large a timeout. Now, though, we can keep the timeout for the heartbeat really short and our timeout issues have mostly gone away.
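A rough, Unix-only Python sketch of that heartbeat idea (the helper and limits here are made up, not the commenter's framework): the runner kills a test process that goes quiet for too long, instead of enforcing one large end-to-end timeout.

    import select, subprocess, sys

    def run_with_heartbeat(cmd, quiet_limit=5.0):
        """Fail the test if it prints nothing for quiet_limit seconds."""
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        while proc.poll() is None:
            ready, _, _ = select.select([proc.stdout], [], [], quiet_limit)
            if not ready:                     # no heartbeat inside the window
                proc.kill()
                raise TimeoutError(f"no output for {quiet_limit}s: {cmd}")
            sys.stdout.buffer.write(proc.stdout.readline())
        return proc.returncode

The total runtime can then be as long as it needs to be, as long as the test keeps talking.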
Never disable the timeouts. What you want is a way to set the timeouts once for an entire suite. Unit, functional, and integration tests all have a different threshold from each other. But in general, within one kind, your outliers almost always have something wrong with them. They're either written wrong or the code is. And once in a while it's okay to override the timeout on one test while you're busy working on something else.
The problems isn’t with breaking rules. The problem is with promising yourself or others that you will fix it “later” and then breaking that promise.
I'd add one more: clearly document what determines the order in which tests are run.
On the one hand, running tests in any order should produce the same result, and would in any decent test suite.
On the other hand, if the order is random or nondeterministic, it's really annoying when 2% of PRs randomly fail CI, not because of any change in the code, but because CI happened to run unrelated tests in an unexpected order.
Test order should be random, so that the ability to run them in parallel and distribute them across multiple hosts is not lost by missing enforcement of test isolation.
The tests are fatally broken. It means you can't even trust them to properly check new work.
The solution is to use random ordering and print the ordering seed with each run so it can be repeated when it triggers an error. Immediately halt all new work until randomly run tests don't have problems.
This isn't as bad as it sounds; generally a few classes of problems cause the interference, and fixing each one repairs many tests. It's unlikely that the code actually has a 2%+ density of global-variable use, for example.
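A minimal, framework-agnostic sketch of that (all names are made up): shuffle with a seed, print the seed on every run, and accept it back to reproduce a failing order exactly.

    import random, time

    def run_shuffled(tests, seed=None):
        seed = int(time.time()) if seed is None else seed
        print(f"test order seed: {seed}")     # re-run with this seed to reproduce
        order = list(tests)
        random.Random(seed).shuffle(order)
        for test in order:
            test()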
I sometimes run into issues not so much due to order dependencies specifically, but due to tests running in parallel sometimes causing failures due to races. It's almost always been way more work to convert a fully serial test suite into a parallel one than it is to just write it that way from the start, so I think there's some merit in having test frameworks default to non-deterministic ordering (or parallel execution if that's feasible) with the ability to disable that and run things serially. I'm not dogmatic enough to think that fully parallel/random order tests are the right choice for every possible use case, but I think there's value in having people first run into the ordering/race issues they're introducing before deciding to run things fully serially so that they hopefully will consider the potential future work needed if they ever decide to reverse that decision.
I’ll disagree with this. Every time I’ve seen that, the interference between tests was also possible between requests in production. I’d rather my test framework give me a 2% chance of noticing the bug than 0%.
What's annoying is not being able to reproduce the 2% cases so you can't fix it even when you've noticed them. Sensible test tools give you the random seed they used to order the tests so you can reproduce the sequence.
If you have to pick one or the other, then you're breaking the common flow (human debugging code before pushing) so that management can have better reports.
The right solution would be to add an environment variable or CLI parameter that told tap to produce machine-readable output, preferably with a separate tool that could convert the machine-readable junk to whatever TAP currently writes to stdout/stderr.
But unlike TAP, it's fairly Perl-specific as opposed to just being an output format. I imagine you could adapt the ideas in it to Node but it'd be more complex than simply implementing TAP in JS.
And yes, I think the idea of having different output formats makes sense. With Test2, the test _harness_ produces TAP from the underlying machine-readable format, rather than having the test code itself directly produce TAP. The harness is a separate program that executes the tests.
Nothing should have to be parsed. Write test results to SQLite, done. You can generate reports directly off those test databases using anything of your choice.
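A sketch of that idea in Python (the table layout is invented for illustration): the runner appends rows, and any reporting tool can query them with plain SQL instead of parsing logs.

    import sqlite3

    def record_results(db_path, rows):
        # rows: iterable of (suite, test, status, duration_ms, output)
        con = sqlite3.connect(db_path)
        con.execute("""CREATE TABLE IF NOT EXISTS results
                       (suite TEXT, test TEXT, status TEXT,
                        duration_ms REAL, output TEXT)""")
        con.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)
        con.commit()
        con.close()

    # later, from any tool:
    #   SELECT test, output FROM results WHERE status = 'fail';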
> There is a set of unit testing frameworks that do everything they can to hide test output (JUnit), or vomit multiple screens of binary control-code emoji soup to stdout (Ginkgo), or just hide the actual stdout behind an authwall in a UUID-named S3 object (CodeBuild).
The test runner in VS2019 does this too, and it's incredibly frustrating. I get to see debug output about DLLs loading and unloading (almost never useful), but not the test's stdout and stderr (always useful). Brilliant. At least their command line tool does it right.
I remember writing a small .NET test library for that exact problem: you could pass in a lambda with a complex condition, and it evaluated every piece of the expression separately and pretty-printed which part of the condition failed.
So essentially you could write
Assert(() => width > 0 && x + width < screenWidth)
And you would get:
Assertion failed:
    x is 1500
    width is 600
    screenWidth is 1920
It used Expression<T> to do the magic. Amazing debug messages. No moralizing required.
This was a huge boon for us as it was a legacy codebase and we ran tens of thousands of automated tests and it was really difficult to figure out why they failed.
Related to this, for anyone not fully up to date on recent C# features there is also the CallerArgumentExpression [1], [2] feature introduced in C# 10. While it is not a pretty printer for an expression, it does allow the full expression passed from the call site as an argument value to be captured and used within the method. This can be useful for custom assert extensions.
For example:
public void CheckIsTrue(bool value, [CallerArgumentExpression("value")] string? expression = null)
{
    if (!value)
    {
        Debug.WriteLine($"Failed: '{expression}'");
    }
}
So if you call it like this: CheckIsTrue(foo != bar && baz == true), then when the value is false it prints "Failed: 'foo != bar && baz == true'".
I love using Unquote[0] in F# for similar reasons; it uses F#'s code quotations. Assuming the variables have been defined with the values you state, the assertion is written as:
This is what the Python test framework Pytest does, among many other similar useful and magical things. I believe that the Python developer ecosystem as a whole would be substantially less productive without it.
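For illustration, the earlier .NET example written as a plain pytest test: pytest's assertion rewriting reports the values of the names and sub-expressions involved (x, width, screen_width, and the intermediate x + width) in the failure message, with no custom assert helper.

    def test_fits_on_screen():
        x, width, screen_width = 1500, 600, 1920
        # on failure, pytest shows the evaluated pieces of this expression
        assert width > 0 and x + width < screen_width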
Not quite the same, but available on NuGet: 'FluentAssertions' gives you something akin to this. I've had decent success with having our juniors use it vs less verbose assertion libraries. I don't know about evaluating individual expressions in a line separately, but it does give you clean syntax and similar error messages that are very readable-
I like to use Jest’s toMatchObject to combine multiple assertions in a single assertion. If the assertion fails, the full object on both sides is shown in logs.
You can easily debug tests that way.
The only way to make it even possible is to do some eval magic or to use a pre-processor like Babel or a TypeScript compiler plugin.
Well, Function.toString() should print the code of a lambda in JavaScript. So I think you could do it without a pre-processor: use Babel as a library to parse the body of the function; run each sub-expression separately and display the results.
You can just document a constraint that these functions must use the purely functional subset of JavaScript, which is enough for the sorts of assertions I typically write. Alternatively, you could constrain it to the subset that has no unbound variables or side effects.
> How is that even possible in the first place? The entire job of an assertion is to wave a flag saying "here! condition failed!".
I envy you for never having seen tests atrocious enough where this is not only possible, but the common case.
Depending on language, framework and obviously usage, assertions might not be informative at all, providing only the basic functionality of failing the test - and that's it.
Now imagine this barebones use of assertions in tests which are entirely too long, not isolating the test cases properly, or even completely irrelevant to what's (supposedly) being tested!
If that's not enough, imagine this nightmare failing not after it has been written, but, let's say, 18 months later, while being part of a massive test suite that has been running for a while. All you have is the name of the test that failed; you look into it to find a 630-line-long test "case" with 22 nondescript assertions along the way. You might know which line failed the test, but not always. And of course debugging the test function line by line doesn't work because the test depends on intricate timing for some reason. The person who wrote this might not be around and now this is your dragon to slay.
I think I should stop here before triggering myself any further. Therapy is expensive.
Even if the framework is fine, you can see something like an elaborate if-else tree, or even a try-catch block, and after it's all done, there's a condition check with `fail()`. So the reported point of failure can be detached from the actual point of failure.
Granted, this is not the way to do things. But it happens anyway.
I mean in your example it’s someone choosing not to use asserts. Which is a problem, don’t get me wrong, but it’s not the problem being talked about here.
The comment thread is about “Assertion Roulette” — having so many assertions you don’t know which went off. Which really seems like a test framework issue more than a test issue.
Not at all. It makes sense in some tests. I addressed the part asking how it's even possible to not know what happened.
As for multiple asserts, that is really meaningless. The test case should test one thing. If it requires several asserts, that's okay. But having a very long test function with a lot of assertions strongly indicates that you're testing more than one thing, and when the test fails it will be harder to know what actually happened.
I guess you might be talking about a different language / environment than I'm used to, but even in the 100 assertion test case you get useful tracebacks in python. Testing lots of things at the same time means strictly speaking you're writing an integration test rather than a unit test, but still I don't see how it's a bad test. It's easy and stops buggy PRs going into production.
The test failures I see that are actually hard to debug are ones where the failures are difficult to reproduce due to random input, tests running in parallel and sharing the same filesystem etc. I don't think I've ever not known what assert was failing (although I guess in theory you could make that happen by catching AssertionError).
> Testing lots of things at the same time means strictly speaking you're writing an integration test rather than a unit test
There's nothing wrong with integration tests, but they're not unit tests. It's fine to have both, but the requirements for a good unit test and those for a good integration test diverge. The title of this post, at least, was specific to unit tests.
A unit test tests one unit, and an integration test covers more than one unit. I think everyone agrees with that, but nobody has defined "unit".
The longer I program the more I am convinced that the larger your unit the better. A unit test is a statement that you will never refactor across this line, and that eliminates a lot of flexibility that I want.
It turns out that debugging failed integration tests is easy: the bug is in the last thing you changed. Sure the test covers hundreds of lines, but you only changed one.
I recently went to the effort of trying to work out where the term unit test came from in some desperate effort to find what a unit was meant to be.
After much googling and buying of ancient textbooks I hit a dead end. At this point I think "unit" is just noise that confuses people into making distinctions that don't exist.
As I recall the TDD mailing list has some background on the use of the word "unit", it goes WAY back, I believe it goes back to the mainframe/ punch card era. Regardless, I think it roughly translates to C's notion of the unit of compilation.
Which is obviously not what people really mean these days, but the phrase stuck. The early Xp'ers even found it an issue back then.
For a while people tried to push the term "micro tests", but that didn't really take off.
I agree with Gerard Meszaros and Martin Fowler and typically follow their (very mainstream) definitions on this stuff. Integration and functional testing have their own ambiguities too; it's definitely a frustrating situation to not have solidly defined foundational terms.
IIRC the "unit" in "unit test" was meant to mean "semantic unit" ("the access module", for example, should be distinct with a well-defined interface that all the tests go through), but very quickly turned into "syntactic units" ("a single function", for example, where the "well-defined interface" ends up just being function arguments/return value) because most people didn't understand what the original proponents meant.
> It turns out that debugging failed integration tests is easy: the bug is in the last thing you changed. Sure the test covers hundreds of lines, but you only changed one.
That’s not true.
A correct change might expose an existing bug which hadn’t been tested or expose flaky behavior which existed but hadn’t been exercised. In both cases the solution is not to revert the correct change, but to fix the buggy behavior.
Watched a bit of this... It's typical test-driven zealotry; the main criticism of integration tests seems to be that they don't force your hand in system design in the way that unit tests do? Which seems very silly, but then, I'm not a person who goes to conferences about testing philosophy.
Did you miss his follow-up? "Integration tests are a scam is a scam". For real. I like J.B., but I think he muddies the water too much and overall understanding suffers.
> A unit test is a statement that you will never refactor across this line, and that eliminates a lot of flexibility that I want.
I certainly don't see it as that. I see it as "this is the smallest thing I _can_ test usefully". Mind you, those do tend to correlate, but they're not the same thing.
> this is the smallest thing I _can_ test usefully
Then you're testing useless things.
Usefulness is when different parts of a program work together as a coherent whole. Testing DB access layer and service layer separately (as units are often defined) has no meaning (but is often enforced).
>> this is the smallest thing I _can_ test usefully
> Then you're testing useless things.
We'll have to agree to disagree then.
> Testing DB access layer and service layer separately (as units are often defined)
Not at all. For me, a unit is a small part of a layer; one method. Testing the various parts in one system/layer is another type of test. Testing that different systems work together is yet another.
I tend to think in terms of the following:
- Unit test = my code works
- Functional test = my design works
- Integration test = my code is using your 3rd party stuff correctly (databases, etc)
- Factory Acceptance Test = my system works
- Site Acceptance Test = your code sucks, this totally isn't what I asked for!?!
The "my code works" part is the smallest piece possible. Think "the sorting function" of a library that can return it's results sorted in a specific order.
That seems like a silly opinion to me. I use unit tests to make sure that individual units work like I expect them to. And I use them to test edge cases that can be tested separately from their caller. If I had to test all the use cases for each function, all combined together, the number of tests would grow by the multiplication of the partitions of each one, N x M x O x P, ... rather than the sum, plus a much smaller set of tests for how they work together (N + M + O + P + N_M + M_O + O_P, etc). It's much simpler to thoroughly test each unit. Then test how they work together.
> If I had to test all the use cases for each function, all combined together, the number of tests would grow by the multiplication of the partitions of each one
Why would they? Do these edge cases not appear when the caller is invoked? Do you not test these edge cases and the behavior when the caller is invoked?
As an example: you tested that your db layer doesn't fail when getting certain data and returns response X (or throws exception Y). But your service layer has no idea what to do with this, and so simply fails or falls back to some generic handler.
Does this represent how the app should behave? No. You have to write a functional or an integration test for that exact same data to test that the response is correct. So why write the same thing twice (or more)?
You can see this with Twitter: the backend always returns a proper error description for any situation (e.g. "File too large", or "Video aspect ratio is incorrect"). However, all you see is "Something went wrong, try again later".
> It's much simpler to thoroughly test each unit. Then test how they work together.
Me, telling you: test how they work together, unit tests are usually useless
You: no, this increases the number of tests. Instead, you have to... write at least double the amount of tests: first for the units, and then test the exact same scenarios for the combination of units.
----
Edit: what I'm writing is especially true for typical microservices. It's harder for monoliths, GUI apps etc. But even there: if you write a test for a unit, but then need to write the exact same test for the exact same scenarios to test a combination of units, then those unit tests are useless.
Unit one - returns a useful error for each type of error condition that can occur (N). Test that, for each type of error condition that can occur. One test for each error condition.
Unit two - calls unit one - test that, if unit one returns an error, it is treated appropriately. One test, covers all error conditions because they're all returned the same way from Unit one.
Unit three - same idea as unit one
If you were to test the behavior of unit one _through_ units 2 and 3, you'd need 2*N tests. If you were to test the behavior of unit one separately, you'd need N+2 tests.
You're missing the point that you don't need to test "the exact same scenarios for the combination of units", because the partitions of <inputs to outputs> is not the same as the partitions for <outputs>. And for each unit, you only need to test how it handles the partitions of <outputs> for the items, it calls; not that of <inputs to outputs>.
> If you were to test the behavior of unit one _through_ units 2 and 3, you'd need 2*N tests.
There are only two possible responses to that:
1. No, there are not 2*N tests because unit 3 does not cover, or need, all of the behavior and cases that flow through those units. Then unit testing unneeded behaviors is unnecessary.
> You're missing the point that you don't need to test "the exact same scenarios for the combination of units", because the partitions of <inputs to outputs>
This makes no sense at all. Yes, you've tested those "inputs/outputs" in isolation. Now, what tests the flow of data? That unit 1 outputs data required by unit 2? That unit 3 outputs data that is correctly propagated by unit 2 back to unit 1?
Once you start testing the actual flow... all your unit tests are immediately entirely unnecessary because you need to test all the same cases, and edge cases to ensure that everything fits together correctly.
So, where I would write a single functional test (and/or, hopefully, an integration test) that shows me how my system actually behaves, you will have multiple tests for each unit, and on top of that you will still need a functional test, at least, for the same scenarios.
> Once you start testing the actual flow... all your unit tests are immediately entirely unnecessary because you need to test all the same cases, and edge cases to ensure that everything fits together correctly.
You don't, but it's clear that I am unable to explain why to you. I apologize for not being better able to express what I mean.
If you don't, then you have no idea if your units fit together properly :)
I've been bitten by this when developing microservices. And as I said in an edit above, it becomes less clear what to test in more monolithic apps and in GUIs, but in general the idea still holds.
Imagine a typical simple microservice. It will have many units working together:
- the controller that accepts an HTTP request
- the service layer that orchestrates data retrieved from various sources
- the wrappers for various external services that let you get data with a single method call
- a db wrapper that also lets you get necessary data with one method call
So you write extensive unit tests for your DB wrapper. You think of and test every single edge case you can think of: invalid calls, incomplete data etc.
Then you write extensive unit tests for your service layer. You think of and test every single edge case you can think of: invalid calls, external services returning invalid data etc.
Then you write extensive unit tests for your controller. Repeat above.
So now you have three layers of extensive tests, and that's just unit tests.
You'll find that most (if not all) of those are unnecessary for one simple reason: you never tested how they actually behave. That is, when the microservice is actually invoked with an actual HTTP request.
And this is where it turns out that:
- those edge cases you so thoroughly tested for the DB layer? Unnecessary because invalid and incomplete data is actually handled at the controller layer, or service layer
- or that errors raised or returned by service wrappers, or the db layer, either don't get propagated up, or are handled by a generic catch-all so that the call returns nonsensical stuff like `HTTP 200: {error: "Server error"}`
- or that those edge cases actually exist, but since you tested them in isolation, and you didn't test the whole flow, the service just fails with a HTTP 500 error on invalid invocation
Or, instead, you can just write a single suite of functional tests that test all of that for the actual controller<->service<->wrappers flow covering the exact same scenarios.
And if the second assert fails, the error message will tell me exactly that, the line, and the value of both function calls if they are printable. What more do you want in order to know "what actually happened"?
> when the test fails it will be harder to know what actually happened
This should not ever be possible in any semi-sane test environment.
One could in theory write a single test function with thousands of asserts for all kinds of conditions and it still should be 100% obvious which one failed when something fails. Not that I'd suggest going to that extreme either, but it illustrates that it'll work fine.
> when the test fails it will be harder to know what actually happened.
Yeah, and if you write one assertion at a time, it will be harder to write the tests. Decreasing #assertions/test decreases the time spent debugging tests while increasing the time spent writing non-production code. It's a tradeoff. Declaring that the optimal number of assertions per test is 1 completely ignores the reality of this tradeoff.
That's true and it boils down to what's acceptable in your team (or just you). I worked in some places where coverage was the only metric and in places where every single function had to have all cases covered, and testing took longer than writing the code.
As for me, I tend to write reasonable tests and cover several cases that guard the intended behavior of each function (if someone decides the function should behave differently in the future, a test should fail). One emerging pattern is that sometimes during testing I realize I need to refactor something, which might have been lost on me if I had skimped on tests. It's both a sanity check and a guardrail for future readers.
The previous commenter was describing a (normal) scenario where a unit test is not precise. No need to follow up with an aggressive "so what you're saying is".
> So because some idiot somewhere wrote a 100 assertion unit test we should ban anyone from writing even 2 assertions in one test?
You're falling prey to slippery slope fallacy, which at best is specious reasoning.
The rationale is easy to understand. Running 100 assertions in a single test renders tests unusable. Running 10 assertions suffers from the same problem. Test sets are user-friendly if they dump a single specific error message for a single specific failed assertion, thus allowing developers to quickly pinpoint root causes by simply glancing through the test logs.
Arguing whether two or three or five assertions should be banned misses the whole point and completely ignores the root cause that led to this guideline.
>Test sets are user-friendly if they dump a single specific error message for a single specific failed assertion, thus allowing developers to quickly pinpoint root causes by simply glancing through the test logs.
As if this actually happens in practice, regardless of multiple or single asserts. Anything that isn't trivial will at most tell you what doesn't work, but it won't tell you why it doesn't work. Maybe allowing an educated guess when multiple tests fail to function.
You want test sets to be user friendly? Start by taking down all this dogmatism and listening to the people as to why they dislike writing tests. We're pushing "guidelines" (really more like rules) while individuals think to themselves "F this, Jake's going to complain about something trivial again, and we know these tests do jack-all because our code is a mess and doing anything beyond this simple algorithm is hell in a handbasket".
These discussions are beyond useless when all people do is talk while doing zero to actually tackle the issues of the majority not willing to write tests. "Laziness" is a cop-out.
> Depending on language, framework and obviously usage, assertions might not be informative at all, providing only the basic functionality of failing the test - and that's it.
Well, avoiding ecosystems where people act dumb is a sure way to improve one's life. For a start, you won't need to do stupid things in reaction of your tools.
Yes, it's not always possible. But the practices you create for surviving it are part of the dumb ecosystem survival kit, not part of any best practices BOK.
> ... to find a 630-line-long test "case" with 22 nondescript assertions along the way.
This is where tech team managers are abdicating their responsibility and job.
It's the job of the organization to set policy standards to outlaw things like this.
It's the job of the developer to cut as many corners of those policies as possible to ship code ASAP.
And it's the job of a tech team manager to set up a detailed but efficient process (code review sign offs!) that paper over the gap between the two in a sane way.
... none of which helps immediately with a legacy codebase that's @$&@'d, though.
> It's the job of the developer to cut as many corners of those policies as possible to ship code ASAP.
I can't tell if this is supposed to be humor, or if you actually believe it. It's certainly not my job as a developer to ship worse code so that I can release it ASAP. Rather, it's my job to push back against ASAP where it conflicts with writing better code.
And furthermore, you are not the developer most non-tech companies want.
Those sorts of companies want to lock the door to the development section, occasionally slide policy from memos under the door, and get software projects delivered on time, without wasting any more thought on how the sausage gets made.
With 22 separate tests you have the possibility of knowing that only a subset of them fail. Knowing which fail and which pass may help you debug.
In Go, in general, failed checks are reported and the test continues rather than stopping early, so you can tell which of those 22 checks failed. Other languages may have the option to do something similar.
xUnit is terrible. It has a horrible culture, the author seems to have a god complex. It is overly complex and opinionated in the worst way possible.
Many times I searched for 'how do I do something with xUnit' and found a github issue with people struggling with the same thing, and the author flat out refusing to incorporate the feature as it was against his principles.
Other times I found that what I needed to do was override some core xUnit class so it would do the thing I wanted it to do - sounds complex, all right, let's see the docs. Oh, there are none; "just read the source" according to the author.
Another thing that bit us in the ass is they refused to support .NET Standard, a common subset of .NET Framework and Core, making migration hell.
NUnit isn't much better. You'd think that they would have good test coverage and therefore high confidence to make changes, especially to fix actual bugs, but I gave up trying to get patches in because the core devs seem so afraid of breaking anything, even when it's an obviously-isolated private, hidden bug fix or performance improvement.
"We can't land this tiny fix because we have a release planned within three months" sort of thing.
Tbh, one of the differences between xUnit and nUnit is the way generated test cases work, like specifying test cases in an XML file.
nUnit has the TestCaseSource attribute for this, while xUnit has the Theory attribute with a data source.
One of the key differences is that when no test cases are generated, nUnit tests just won't run, while xUnit will throw.
Since it's completely legal and sensible for a certain kind of test to have no entries in an XML file, we needed to hack around this quirk. When I (and countless others) have mentioned this on the xUnit GitHub, the author berated us for daring to request this.
So nUnit might be buggy, but xUnit is fundamentally unfixable.
I'm curious what things you're trying to do that requires you to overload xunit classes? We use xunit for everything and haven't found any gaps so far.
If you're using pytest you just parametrize the tests and it tells you the exact failing case. Seems to be a basic feature I would be surprised to know doesn't exist across almost all commonly used frameworks.
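A small sketch of that (the function under test is a stand-in): each case gets its own id in the report, so the failing input is named directly.

    import pytest

    def word_count(text):                 # stand-in for the real code under test
        return len(text.split())

    @pytest.mark.parametrize("text,expected", [
        ("", 0),
        ("one", 1),
        ("hello world", 2),
    ])
    def test_word_count(text, expected):
        assert word_count(text) == expected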
Unless you need the library style in order to drive it, switch to pytest. Seriously.
- assert rewriting is stellar, so much more comfortable than having to find the right assert* method, and tell people they're using the wrong one in reviews
- runner is a lot more practical and flexible: nodeids, marks, -k, --lf, --sw, ...
- extensions further add flexibility e.g. timeouts, maxfail, xdist (though it has a few drawbacks)
- no need for classes (you can have them, mind, but if functions are sufficient, you can use functions)
This seems like a test design issue to me. Best practice is to avoid for-each loops with assertions within tests - using parametrized tests and feeding the looped values as input is almost always a better option. Figuring out which one failed and why is one advantage it gives you in comparison. Another one is that all inputs will always be tested - your example stops on the first one that fails, and does not evaluate the others after that.
not really. one thing this is useful for is extracting out various attributes in an object when you really don't want to compare the entire thing. Or comparing dict attributes, and figuring which one is the incorrect one.
for example,
expected_results = {...}
actual_obj = some_instance.method_call(...)
for key, val in expected_results.items():
    assert getattr(actual_obj, key) == val, f"Mismatch for {key} attribute"
You could shift this off to a parametrized test, but that means you're making N more calls to the method being tested, which can have its own issues with cost of test setup and teardown. With this method, you see which key breaks, and re-run after fixing.
Ok, in this case a parametrized test is not the best approach, I agree. But I would still want to avoid the for-each and "failing fast". One approach would be to gather the required attributes in an array or a struct of some sort, and then do a single assert comparison with an expected value, showing all the differences at once. However, this requires the assertion framework to be able to make such a comparison and return a nicely readable error message, ideally with a diff.
Right, and not many actually do. with python and pytest, you could leverage difflib, but that's an additional thing that adds unnecessary complexity. My approach is simple enough, good enough, and doesn't require additional fudging around with the basics of the language's test libs.
also,
>your example stops on the first one that fails, and does not evaluate the others after that.
I would argue this is desirable behavior. There are soft checks, e.g. https://pypi.org/project/pytest-check/, that basically replace assertions as raised exceptions and do your approach. But I do want my tests to raise errors at the point of failure when a change occurs. If there's a lot of changes occurring, that raises larger questions of "why" and "is the way we're executing this change a good one?"
Hm. I think my main issue there is not the speed, but rather seeing the whole picture at once. You mentioned you use this pattern to test regular expressions; say you modify the regexp in question with some new feature requirement, and now the very first of a dozen test inputs fails. You fix it, but then each one of the following keeps failing, and you can only find an elegant solution that works for all of them after seeing all the failures, having ran the test and modified the code a dozen times. Wouldn't it be nicer to see all fails right away and be able to find a solution to all of them, instead of fixing the inputs one-by-one?
In my experience, from doing some TDD Katas[0] and timing myself, I found coding slower and more difficult when focusing on multiple examples at once.
I usually even comment out all the failing tests but the first one, after translating a bunch of specifications into tests, so I see the "green" when an example starts working.
Maybe it would be easier to grok multiple regex examples than algorithmic ones, but at least for myself, I am skeptical, and I prefer taking them one at a time.
In my own experience, this has often been a good way of going in circles, where I end up undoing and redoing changes as fixing one thing breaks another, until I take a step back to find the proper algorithm by considering multiple inputs.
Of course, ymmv depending on how good your initial intuition is, and how tricky the problem is.
> With pytest you can use the -x flag to stop after the first test failure.
> Even better you can use that in combination with -lf to only run the last failed test.
Fwiw `--sw` is much better for that specific use-case.
`--lf` is more useful to run the entire test suite, then re-run just the failed tests (of the entire suite). IIRC it can have some odd interactions with `-x` or `--maxfail`, because the strange things happen to the cached "selected set".
Though it may also be because I use xdist a fair bit, and the interaction of xdist with early interruptions (x, maxfail, ...) seems less than perfect.
this has downsides if you're comparing attributes with a method result and checking whether said attrs match what you expect. Either you run each test N times for N attr comparisons, accepting the cost of setup/teardown, or do a loop and fire off an assert error with text on which comparison failed.
Since you already have the object right there, why not do the latter approach?
If the setup/teardown is expensive I would do it in reusable fixtures. The reason I wouldn't choose the latter approach is that it would usually be less convenient in the long run. You'd need to replace your asserts with expects to avoid it throwing on the first error (if this isn't what you want), you'll often need to manually add data to the assertion (as GP did) that you would otherwise get for free, and you'll need to look at the assertion error rather than the test case to know what actually failed. This can be quite inconvenient if you e.g. export your test results in a CI/CD pipeline.
Normally in a CI/CD pipeline, you'll see which asserts failed in the log output. GitHub Actions with pytest shows the context of the failed asserts in the log output. TBH, I thought this was standard behavior; do you have experience with a CI pipeline that differs?
All the other points you make as negatives are all positives for me. Biggest thing is, if you're making a change that alters things so drastically, is that really a good approach?
Also, fixtures aren't magic. If you can't scope the fixture to module or session, that means by default it runs in function scope, which would be the same thing as having expensive setup/teardown. And untangling fixtures can be a bigger PITA than untangling unexpected circular imports.
Think about the personality of someone who is so dissatisfied with the lack of verbosity in his test suites, that he needs a side project of writing a book about unit testing. Of course they will advocate testing one assertion per function, and make up nonsense to justify their recommendation.
Secretly, they would have the reader write 32 functions to separately test every bit of a uint32 calculation, only refraining from that advice due to the nagging suspicion that it might be loudly ridiculed.
I think the advantage of having an assertion per test is that it makes sure that all of your assertions are executed. In a lot of test frameworks (that use exceptions for assertions for example) the first assertion fail will stop the test.
That doesn't mean you have to duplicate code; you can deal with it in other ways. In JUnit I like to use @TestFactory [1] where I'll write most of the test in the factory, and then each assertion will be a Test the factory creates, and since they're lambdas they have access to the TestFactory closure.
I vaguely remember a test framework I saw a decade+ ago that had both "assert*" to fail immediately and something else ("expect*" maybe?) to check and continue.
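A framework-agnostic Python sketch of that assert*/expect* split (class and method names are invented): expect() records a failure and keeps going, and the test fails only at the end, listing everything that went wrong.

    class Expectations:
        def __init__(self):
            self.failures = []

        def expect(self, condition, message):
            if not condition:                 # record and continue
                self.failures.append(message)

        def verify(self):                     # fail once, listing everything
            assert not self.failures, \
                "expectations failed:\n  " + "\n  ".join(self.failures)

    # usage inside a test:
    #   e = Expectations()
    #   e.expect(resp.status == 200, f"status was {resp.status}")
    #   e.expect("id" in resp.body, "body missing 'id'")
    #   e.verify()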
I've noticed the opposite in a Java codebase I work in. Tests where the test is assertEquals(toJson(someObject), giantJsonBlobFromADifferentFile). Of course the test runner has no idea about formatting strings that happen to be json, so I end up having to copy these out into an editor, formatting them and eyeballing the difference, or for even larger ones having to save them out to files and diff them. And of course most of the fields in the mock aren't relevant to the class under test, so I'd trade them out for 5-6 targeted asserts for the relevant fields happily.
The problem is, since it's a legacy codebase, there's many fields which are only tested incidentally by this behaviour, by tests that actually aren't intending to test that functionality.
I had a similar case recently, in C++. I ended up spending a few hours writing a simple JSON differ - a bit of code that would parse two strings into DOM object graphs using rapidjson, and then walk down them simultaneously - basically, I implemented an operator== which, instead of terminating early, recorded every mismatch.
Then, I packaged it into a Google Test matcher, and from now on, the problem you describe is gone. I write:
Expected someObject to be structurally equivalent to someBlobFromADifferentFile; it is not;
- #/object/key - missing in expected, found in actual
- #/object/key2 - expected string, actual is integer
- #/object/key3/array1 - array lengths differ; expected: 3, actual: 42
- #/object/key4/array1/0/key3 - expected "foo" [string], actual "bar" [string]
Etc.
It was a rather simple exercise, and the payoff is immense. I think it's really important for programmers to learn to help themselves. If there's something that annoys you repeatedly, you owe it to yourself and others to fix it.
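The commenter's version is C++ on top of rapidjson and a Google Test matcher; a rough Python sketch of the same walk (recording every mismatch with its path instead of stopping at the first) looks something like this:

    def json_diff(expected, actual, path="#", out=None):
        out = [] if out is None else out
        if type(expected) is not type(actual):
            out.append(f"{path} - expected {type(expected).__name__}, "
                       f"actual {type(actual).__name__}")
        elif isinstance(expected, dict):
            for key in sorted(expected.keys() | actual.keys()):
                if key not in actual:
                    out.append(f"{path}/{key} - missing in actual")
                elif key not in expected:
                    out.append(f"{path}/{key} - missing in expected")
                else:
                    json_diff(expected[key], actual[key], f"{path}/{key}", out)
        elif isinstance(expected, list):
            if len(expected) != len(actual):
                out.append(f"{path} - array lengths differ; "
                           f"expected {len(expected)}, actual {len(actual)}")
            for i, (e, a) in enumerate(zip(expected, actual)):
                json_diff(e, a, f"{path}/{i}", out)
        elif expected != actual:
            out.append(f"{path} - expected {expected!r}, actual {actual!r}")
        return out

    # in a test:
    #   mismatches = json_diff(json.loads(expected_blob), json.loads(actual_blob))
    #   assert not mismatches, "\n".join(mismatches)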
> I think it's really important for programmers to learn to help themselves. If there's something that annoys you repeatedly, you owe it to yourself and others to fix it.
It's a cultural problem. _I_ can do that, but my colleagues will just continue to write minimum effort tests against huge json files or database dumps where you have no idea why something failed and why there are a bunch of assertions against undocumented magic numbers in the first place. It's like you're fighting against a hurricane with a leaf blower. A single person can only do so much. I end up looking bad in the daily standup because I take longer to work on my tickets but the code quality doesn't even improve in a measurable way.
From a "legacy code" perspective, you're better off picking an 'easy win' (Hamcrest). Initially, you're not going to convince a team to change their testing habits if it causes them pain. Your goal is to push a testing methodology which moves closer to the 'ideal' which saves them time.
Hamcrest is a drop-in replacement for `assertEquals`, and provides obvious benefits. Politically, it's easy to convince developers onboard once you show them:
* You just need to change the syntax of an assertion - no thought required
* You (Macha) will take responsibility for improving the formatting of the output, and developers have someone to reach out to to improve their assertions.
From this: you'll get a very small subset of missionaries who will understand the direction that you're pushing the test code in, and will support your efforts (by writing their own matchers and evangelising).
The larger subset of the developer population won't particularly care, but will see the improved output from what you're proposing, and will realise that it's a single line of code to change to reap the benefits.
EDIT: I've added a lint rule into a codebase to guide developers away from `assertEquals()`. Obviously this could backfire, and don't burn your political capital on this issue.
That test seems to be testing whether or not the library used to serialize JSON works. I don't think that's valid unless the code base you are working on is Gson or Jackson or the like.
Assuming that's not the case and you're interested in the state of two object graphs, then you just compare those, not the JSON strings they serialize to.
The majority of the testing I've written has been Jest tests and PHPUnit tests, and of the two, PHPUnit is my favourite. It's easy to build up custom assertions, and all of the built-in assertions have the ability to provide an additional failure message during a failure.
Assertions throw an exception and the test runner catches them along with any exceptions thrown by the code in test, marks the test as a failure, and reports the given error message and a full stack trace.
With the appropriate logging of each assertion in there.
Consider the situation of "I've got an object and I want to make sure it comes out in JSON correctly"
The "one assertion" way of doing it is to assertEqual the entire json blob to some predefined string. The test fails, you know it broke, but you don't know where.
The multiple assertions approach would tell you where in there it broke and the test fails.
The point is much more one of "test one thing in a test" but testing one thing can have multiple assertions or components to it.
You don't need to have testFirstNameCorrect() and testLastNameCorrect() and so on. You can do testJSONCorrect() and test one thing that has multiple parts to verify its correctness. This becomes easier when you've got the frameworks that support it such as the assertAll("message", () -> assertSomething, () -> assertSomethingElse(), ...)
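One way to get that shape in Python without a framework feature (the helper name is mine): verify one logical thing per test, but report every mismatched field at once.

    def assert_fields(actual: dict, expected: dict):
        wrong = {k: (v, actual.get(k))
                 for k, v in expected.items() if actual.get(k) != v}
        assert not wrong, f"field mismatches (expected, actual): {wrong}"

    def test_user_json():
        user = {"first_name": "Ada", "last_name": "Lovelace", "active": True}
        assert_fields(user, {"first_name": "Ada",
                             "last_name": "Lovelace",
                             "active": True})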
It's not about it being hard to tell which assertion failed. It's about being hard to tell what the cause of the failure was.
When every test calls ->run_base_tests() before running its own assertion, sometimes things fail before you get to the root cause assertion.
The other problem of stacking assertions is that you'll see the first failure only. There may be more failures that give you a better picture of what's happening.
Having each assertion fail separately gives you a clearer picture of what's going wrong.
Fwiw, the book doesn't suggest what the reader is saying. It says what I've said above more or less.
You don't need to always stick to the rule but it generally does improve things, to the point I now roll my eyes when I come across tests with stacked assertions and lots of test harness code that runs its own assertions. I just know I'm in for a fun time.
In PHPUnit you can send datasets through a test function. If you don't label them you will have a jolly time finding out which one of the sets caused the failure.
I've seen several test frameworks that don't abort a test case after the first failed assertion.
When you get many tests each emitting multiple failures because one basic thing broke, the output gets hard to sort through. It's easier when the failures are all eager.
Which ones? I’ve used at least a dozen at this point, across C++, C#, JavaScript, Rust — and all of them throw (the equivalent of) exceptions on assertion failures.
That seems like it'd easily get confusing when the assertions are dependent, which is often the case - e.g. if the list is empty, testing the properties of the first item makes no sense.
That's why the first check is a hard assertion (returning on error), and the others are soft (continuing on error).
If the list is empty, then the log will contain one error about the length being zero. If the list has one item but it has the wrong properties, the log will contain two errors.
> That's why the first check is a hard assertion (returning on error), and the others are soft (continuing on error).
See that's so completely unclear I utterly missed that there were two different calls there. Doesn't exactly help that the functions are the exact same length, and significantly overlap in naming.
That's just a matter of familiarity, though. And if you make a mistake, you'll discover it the first time the test fails - either you'll see too little output, or you'll see the test throw an exception or crash.
GoogleTest was the one we used. I forgot, but now that you mention it, I remember the expect variations. We had decided against them. It's a confusing feature in my opinion. If that's what people mean by "multiple assertions", then I at least understand where they're coming from.
I haven't heard of the single-assertion thing in at least 10 years, probably 15. In the early 2000s, when I was starting out and doing .NET, it used to be something you'd hear in the community as a very general guideline, more like "there's something to be said about very focused tests, and too many assertions might be a smell." At the time, I got the impression that the practice had come over from Java and converted from a rule to a guideline (hardly the only bad practice that the .NET community adopted from Java, but thankfully they largely did move the needle forward in most cases).
(I wrote Foundations of Programming for any 2000s .NET developer out there!)
To hear this is still a fight people are having...It really makes me appreciate the value of having deep experience in multiple languages/communities/frameworks. Some people are really stuck in the same year of their 10 (or 20, or 30) years of experience.
Like you I was surprised to hear this is a thing or is even controversial. Admittedly I've only been programming for about 10 years, but I haven't heard (or seen) this come up even one time. Every test I've ever seen has usually had multiple mutations and assertions, all of them testing the same premise.
> I wrote Foundations of Programming for any 2000s .NET developer out there!
Holy moly! Think I still have your book somewhere. So thank you for that.
In my last 10+ years of .NET development I haven't heard anything about single-assertion.
> Some people are really stuck in the same year of their 10 (or 20, or 30) years of experience.
I think this has manifested even more with the transition into .net core and now .net 5 and beyond. There are so many things changing all the time (not that I complain), which can make it difficult to pick up what's the current mantra for the language and framework.
What? People really would criticize that code because it has two assertions? How are they ever testing any state changes?
And to the author: Your bubble is significantly different from mine. Pretty much every competent developer I've worked with would laugh at you for the idea that the second test case would not be perfectly fine. (But that first iteration would never pass code review either because it does nothing and thus is a waste of effort.)
I'm convinced if you read Uncle Bob carefully and follow all his suggestions... you'll have completely incapacitated whatever organization you infiltrated.
That is, regardless of the absolute value of bar.value, I expect foo.call() to increment it by 2.
The point of the 1 assertion per test guideline is to end up with tests that are more focused. Given that you did not seem to think of the above technique, I'd say that this guideline might just have helped you discover a way to write better specs ;-)
Guidelines (that is, not rules) are of course allowed to be broken if you have a good reason to do so. But not knowing about common idioms is not a good reason.
You might argue that the above code is just sugar for 2 assertions, but that's beside the point: the test is more focused, there -appears- to be only one assertion, and that's what matters.
OP asked how any state change would be tested with a single 'assertion' and I provided an answer. Absolute rules are stupid, but our codebase has just short of 10k tests, and very few have more than one assertion.
The only reason I can really see to have more than one assertion would be to avoid having to run the setup/teardown multiple times. However, it's usually a desirable goal to write code that requires little setup/teardown to test anyway, because that comes with other benefits. Again, it might not be practical or even possible, but that goes for almost all programming "rules".
one assert per test seems... as you said, indicative of zealotry. if you already have your object there, why not test for the changes you expect?
So you have one test that indicates that a log error is output, then another that tests that property X in the return from the error is what you expect, then another test to determine that property Y in the return is what you expect?
That to me is wasteful, unclear, bloated. About the only useful result I can see is that it allows bragging about how many tests a project has.
Furthermore, if you have a one-assertion rule, some bright spark will realize he can write a single assertion that checks for the conjunction of all the individual postconditions.
That's one way to get dogma-driven assertion roulette, as you will not know which particular error occurred.
The amount of setup and teardown necessary to test something is a property of the system under test. It is not susceptible to one's opinion as to how things should be.
There are usually different ways to design a system. It's often the case that designing the system such that it is easy to test (with little setup/teardown) has other benefits too. E.g. it often indicates low coupling and a more simple design.
That being said, there can of course be other tradeoffs, e.g. performance, and even cases where simple test setups are downright impossible.
Interesting. Our (Rails) codebase is around 25,000 tests and less than half have a single assertion. Personally, there's some calculus in my head when I'm writing a test that determines if/when the scenario I'm testing needs multiple assertions.
rspec or minitest? ;-) Could rspecs 'expect change' idiom be the difference?
I find that reducing assertions per spec where I can is a good guideline. E.g. combining expect(foo['a']).to eq(1) and expect(foo['b']).to eq(2) into expect(foo).to include('a' => 1, 'b' => 2)
yields better error messages.
Please correct me if I'm wrong, but would a precondition not just be the postcondition of the setup?
Invariants would either have to be publicly available and thus easily testable with similar methods, or one would have to use assertions in the implementation.
I try to avoid the latter, as it mixes implementations and 'test/invariants'. Granted, there are situations (usually in code that implements something very 'algorithm'-ish) where inline assertions are so useful that it would be silly to avoid them. (But implementing algos from scratch is rare in commercial code.)
Unit tests should be cheap. Cheap to write, cheap to run, cheap to read, cheap to replace.
Near as I can tell, many people are made uncomfortable by this in practice because these tests feel childish and dare I say demeaning. So they try to do something “sophisticated” instead, which is a slow and lingering death where tests are concerned.
Lacking self-consciousness, you can whack out hundreds of unit tests in a couple of days, and rewrite ten of someone else’s for a feature or a bug fix. That’s fine and good.
But when your test looks like an integration test, rewriting it misses boundary conditions because the test isn't clear about what it’s doing. And then you have silent regressions in code with high coverage. What a mess.
I think you forgot at least one valid assertion and implied another one:
foo.call() might have a return value.
Also, the whole story invocation shouldn't throw an exception, if your language has them. This assertion is often implied (and that's fine), but it's still there.
Finally, the test case is a little bit stupid, because code very seldom has no input that changes the behavior/result. So your assertion would usually involve that input.
If you follow that thought through consistently, you end up with property-based tests very soon. But property-based tests should have as many assertions as possible for a single point of data. Say you test addition. When writing property-based tests you would end up with three specifications: one for a single number, testing the identity element and the relationship to increments; another for two numbers, testing commutativity and inversion via subtraction; and one for three numbers, testing associativity. In every case it would be very weird not to have all the n-ary assertions for the addition operation in the same spot.
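As a sketch of those three specifications in Python using the hypothesis library (the test names are invented):

from hypothesis import given, strategies as st

@given(st.integers())
def test_one_number(a):
    assert a + 0 == a              # identity element
    assert (a + 1) - 1 == a        # relationship to increments

@given(st.integers(), st.integers())
def test_two_numbers(a, b):
    assert a + b == b + a          # commutativity
    assert (a + b) - b == a        # inversion via subtraction

@given(st.integers(), st.integers(), st.integers())
def test_three_numbers(a, b, c):
    assert (a + b) + c == a + (b + c)  # associativity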
When you say I 'forgot' an assertion, are you implying that test should include all possible assertions on the code? That would perhaps cover more surface, but my goal (read zealot ideology) here is to have the tests help document the code:
test "pressing d key makes mario move 2 pixels right" {
I could test the value of the d() function, but I don't, because I don't care what it returns.
Didn't understand the "whole story invocation" and exception part; am I missing some context?
Sure, property-based testing can be invaluable in many situations. The only downside is if the tests become so complex to reason about that bugs become more likely in the tests than in the implementation.
I've sometimes made tests with a manual list of inputs and a list of expected outputs for each. I'd still call those 1-assertion tests (just run multiple times), so my definition of 1 assertion might be too broad.
When you get the suites nested and configured right, and the code decomposed properly to support it, each of these assertions is two lines of code, plus the description of each constraint. So you just write four or five tests covering each one, in descending likelihood of breakage.
An assert message says what went wrong, and on which code line. How on earth does it help to make just one? The arrange part might take seconds for a nontrivial test and that would need to be duplicated both in code and execution time to make two asserts.
If you painstakingly craft a scenario where you create a rectangle of a specific expected size, why wouldn’t it be acceptable to assert both the width and height of the rectangle after you have created it?
assert_equal(20, w, …
assert_equal(10, h, …
A dogmatic rule would just lead to an objectively worse test where you assert an expression containing both width and height in a single assert?
assert_true(w == 20 && h == 10,…)
So I can only assume the rule also prohibits any compound/Boolean expressions in the asserts then? Otherwise you can just combine any number of asserts into one (including mutating state within the expression itself to emulate multiple asserts with mutation between)!
I’ve seen people take a dogmatic approach to this in Ruby without really applying any critical thought, because one assertion per test means your test is ‘clean’.
The part that is glossed over is that the test suite takes several hours to run on your machine, so you delegate it to a CI pipeline and then fork out for parallel execution (pun intended) and complex layers of caching so your suite takes 15 minutes rather than 2 and a half hours. It’s monumentally wasteful and the tests aren’t any easier to follow because of it.
The suite doesn’t have to be that slow, but it’s inevitable when every single assertion requires application state to be rebuilt from scratch, even when no state is expected to change between assertions, especially when you’re just doing assertions like ‘assert http status is 201’ and ‘assert response body is someJson’.
Yes, you got us rubyists there. :-( It's the unfortunate result of trying to avoid premature optimization and strive for clarity instead. Something that's usually sound advice.
Engineering decisions have tradeoffs. When the test suite becomes too slow, it might be time to reconsider those tradeoffs.
Usually though, I find that the road to fast tests is to reduce/remove slow things (almost always some form of IO), not to combine 10 small tests into one big one.
I think it’s a sound strategy more often than not, it’s just that RSpec’s DSL can make those trade-offs unclear, especially if you use Rubocop and follow its default RSpec rules.
It just so happens that your tests become IO bound because those small tests in aggregate hammer your DB and the network purely to set up state. So if you only do it once by being more deliberate with your tests, you’re in a better place.
I'd argue that it's the unfortunate result of Ruby being at the center of the Agile and XP scene back when it first became prominent (the manifesto etc) - because that scene is also where the more cultish varieties of TDD originated.
> I’ve seen people take a dogmatic approach to this in Ruby without really applying any critical thought, because one assertion per test means your test is ‘clean’.
I can't speak for Ruby, but what I would call 'clean' and happily dogmatise is that assertions should come at the end, after setup and exercise.
I don't care how many there are, but they come last. I really hate tests that look like:
I think even with integration tests they should still be treated similarly - at the end of the day you are setting expectations on an output given a certain input, there’s just a lot more going on in between.
There’s no avoiding it though when you want something end-to-end, or a synthetic test. You’re piling up a whole succession of stateful actions and if you tested them in isolation you would fail to capture bugs that depend on state. In that sense, better to run a ‘signup, authenticate and onboard’ flow in one test instead of breaking it down.
In fact, this last version is worse, because if do_other() can fail if state wasn't blah, then what you'll get is the exception from that failure interrupting the test before the assert would have been reported.
Exactly because that's 'without any meaningful difference' is why I don't like that either. I'm not obsessing over purely the 'assert' keyword as you perhaps think I am, it's the structure I don't like.
> So I can only assume the rule also prohibits any compound/Boolean expressions in the asserts then? Otherwise you can just combine any number of asserts into one
That's what's bound to happen under that rule. People just start writing their complex tests in helper functions, and then write a single assertion that calls the helper.
The way it's been explained to me is that because one assert failing blocks the other assertions from running you don't get a "full" picture of what went wrong.
So instead of:
- error W doesn't equal 20
Fix that
Run test again
- error H doesn't equal 10
Fix that
Run test again
It's
- Error Width doesn't equal 20
- Error Height doesn't equal 10
Fix both
Run test
I think the time savings are negligible though. And it makes testing even more tedious, as if people needed any additional reasons to avoid writing tests.
Only a few programming languages have a facility to render that second assertion in a human-readable way (Python surprised me with this). Most C-influenced languages will just present you with “assertion failed” or “expected true to be false”, which means nothing. Test failure messages should be actionable, and that action is not “read the test to see what went wrong”.
That can be considered one logical assertion though. You're asserting the size of the rectangle. You can even extract an assert helper function AssertRectangleSize(20,10)
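A minimal Python sketch of such a helper; the signature is adapted to take the rectangle plus expected values, and the field names are assumed:

def assert_rectangle_size(rect, expected_width, expected_height):
    # one "logical" assertion about the rectangle's size, still reported field by field
    assert rect.width == expected_width, f"width: expected {expected_width}, got {rect.width}"
    assert rect.height == expected_height, f"height: expected {expected_height}, got {rect.height}"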
Exactly. But if I assert N properties of an object, is that then logically one assert for any N? At what point does it stop?
Applying any rule dogmatically is often bad, and this is no exception. The problem is that we don’t like lacking rules. It especially goes to hell when people start adding code analysis to enforce it, and then developers start writing poor code that passes the analysis.
One assert imo isn’t even a good starting point that might need occasional exceptions.
It shouldn't be dogmatic, but I think it should be something to think about when writing the test or reviewing one. A test should be single responsibility too, in order for it to not be brittle.
I disagree; it can be a good starting point for most cases. You should be able to condense your test into 3 steps (arrange, act, assert), each one a single line. Even if you don't end up doing it (because the setup is too complicated, it's not worth it for that single test, you want to assert more things, etc.), I think the mental exercise of asking "can this be made into a 3-line test?" is invaluable in writing maintainable tests.
> is that then one assert logically for any N? At what point does it stop?
This is one of the hard things about good tests, they are a little bit of art too. Maybe you can apply the single responsibility like I said before: the test should change for one reason only. By one reason I mean one "person/role": it should change if the CFO of our clients wants something different, or if Mark from IT wants some change.
I am not stressing or enforcing single asserts too much, I feel like tests allow a little bit of leeway in many ways, as long as the decision enhances expressiveness. If the extra lines are not making the test clearer, if a single assert would be clearer for the story that the test is telling, then it should go into a single assert. If I can break the story into multiple stories that still make sense, then I do that, such that each story has its own strong storyline.
I think two important requirements for good unit tests are that
1) If you look at a particular test, you can easily tell exactly what it's testing and what the expected behavior is, and
2) You are able to easily look at the tests in aggregate and determine whether they're covering all the behavior that you want to test.
To that end, I think a better guideline than "only have one assertion per test" would be "only test one behavior per test". So if you're writing a test for appending an element to a vector, it's probably fine to assert that the size increased by one AND assert that the last element in the vector is now the element that you inserted.
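Something like this, as a small Python sketch of that guideline (using a list as the vector):

def test_append_adds_element_to_end():
    # one behavior under test (append), verified with two related assertions
    v = [1, 2]
    v.append(3)
    assert len(v) == 3   # the size increased by one
    assert v[-1] == 3    # the appended element is now last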
The thing I see people do that's more problematic is to basically pile up assertions in a single test, so that the inputs and outputs for the behavior become unclear, and you have to keep track of the intermediate state of the object being tested in your head (assuming they're testing an object). For instance, they might use the same vector, which starts out empty, test that it's empty; then add an element, then test that its size is one; then remove the element, test that the size is 0 again; then resize it, etc. I think that's the kind of testing that the "one assertion per test" rule was designed to target.
With a vector it's easy enough to track what's going on, but it's much harder to see what the discrete behaviors being tested are. With a more complex object, tracking the internal state as the tests go along can be way more difficult. It's a lot better IMO to have a bunch of different tests with clear names for what they're testing that properly set up the state in a way that's explicit. It's then easier to satisfy the above two requirements I listed.
I want to be able to look at a test and know exactly what it's testing. I don't mind if a little bit of code is repeated - you can make helper functions if you need them for test setup and teardown.
Is it weird that not only have I never heard of the "rule" this post argues against, but I can't even conceive of a code structure where it would make sense?
How would a test suite with one assertion per test work? Do you have all the test logic in a shared fixture and then dozens of single-assertion tests? And does that rule completely rule out the common testing pattern of a "golden checkpoint"?
I tried googling for that rule and just came up with page after page of people arguing against it. Who is for it?
Looking at the Amazon listing and a third-party summary[0] it seems to be the sort of wool-brained code astrology that was popular twenty years ago when people were trying to push "extreme programming" and TDD.
Perhaps the author is better off not having heard of it then, and by implication, not having read "Clean Code" in the first place. The book is full of anti-patterns.
There's plenty of sensible advice in there; it's just that he argues for the sensible stuff and the idiotic stuff with equal levels of conviction, and if you are junior you aren't going to be able to distinguish them.
It would be easier if it were all terrible advice.
Where in that book is the rule stated? I ask because I have heard the author explicitly state that multiple assertions are fine (using essentially the same explanation as
TrianguloY did in this comment: https://news.ycombinator.com/item?id=33480120).
Chapter 9 talks about unit tests and there is a paragraph called 'Single Assert per Test', where he says it is a good concept but that he is not afraid to put more asserts in his tests.
That paragraph is followed by 'Single Concept per Test' where he starts with: 'Perhaps a better rule is that we want to test a single concept per test.'
That maps better with the lectures I've seen of him on YouTube, and I concur with it.
When I first wrote tests years ago, I would try to test everything in one test function. I think juniors have a tendency to do that in functions overall - it's par for the course to see 30-100+ line functions that might be doing just a little too much on their own, and test functions are no different.
It feels as if folks are splitting hairs where a haircut (or at least a half-hearted attempt at grooming) is required. Use good judgment, and do not blindly follow "rules" without understanding their effects.
As MikeDelta reports¹, the book doesn’t actually say that.
I’ve come to learn to completely disregard any non-specific criticism of that book (and its author). There is apparently a large group of people who hate everything he does and also, seemingly, him personally. Everywhere he (or any of his books) is mentioned, the haters come out, with their vague “it’s all bad” and the old standard “I don’t know where to begin”. Serious criticism can be found (if you look for it), and the author himself welcomes it, but the enormous hate parade is scary to see.
It's not unusual to spin up a local dev server in the same environment (or on the same machine, ie at localhost) as the tests. There's an argument to say these aren't "unit" tests but your definition of "unit" may vary.
Where I work, milestone and release builds cannot make HTTP calls other than to our maven repo. It's meant to further reproducible builds, but it also means your tests can't make such calls. I fire up an in-memory database to make my tests self-contained.
Heh, yeah, but it can be used to write tests that check your assumptions about a 3rd-party API. Granted, it'll only fail once the tests are rerun without the cache, but it can still be a valuable technique. It can be valuable to have a test suite that a) helps check assumptions when implementing the connection and b) helps locate what part of it later starts to behave unexpectedly.
First off, I do put more than 1 assertion in a test. But it definitely leads to situations where you have to investigate why a test failed, instead of it just being obvious. Like the article, I test 1 thing per test, but sometimes that means multiple assertions about the outcome of a test.
IMO there's no point in checking that you got a response in 1 test, and then checking the content/result of that response in another test. The useful portion of that test is the response bit.
IMO, the opposite also has to be considered. I've briefly worked with some code bases that absolutely did 1 assert per test. Essentially you'd have a helper method like "doCreateFooWithoutBarAttribute", and 3-4 tests around that - "check that response code is 400", "check that error message exists", and so on. Changes easily caused 4-5 tests to fail all at once, for example because the POST now returned a 404, but the 404 response also doesn't contain the error message and so on.
This also wasted time, because you always had to look at the tests, and eventually realized that they all failed from the same root cause. And sure, you can use test dependencies if your framework has that and do all manner of things... or you just put the asserts in the same test with a good message.
Even with multiple assertions, the test failure reason should be quite clear, as most testing frameworks allow you to specify a message which is then output in the testing summary.
E.g. `assertEqual(actual_return_code, 200, "bad status code")` should lead to output like `FAILED: test_when_delete_user_then_ok (bad status code, expected 200 got 404)`
Note it mentions the actual expression put in the assert. Which makes it almost always uniquely identifiable within the test.
That's the bare minimum I'd expect of a testing framework - if it can't do that, then what's the point of having it? It's probably better to just write your own executable and throw exceptions in conditionals.
What I expect from a testing framework is at least this: that it also identifies the file and the line containing the failing assertion.
If your testing framework doesn't do that, then again, what's even the point of using it? Throwing an exception or calling language's built-in assert() on a conditional will likely provide at least the file+line.
Maybe it’s different in other languages but in JS and .NET the failed assertion fails and you investigate the failed assertion. You wouldn’t ever have a situation that isn’t obvious.
If an assertion says “expected count to be 5 but got 4” you wouldn’t be looking at the not null check assertion getting confused why it’s not null…
> IMO there's no point in checking that you got a response in 1 test, and then checking the content/result of that response in another test. The useful portion of that test is the response bit.
If I understood this part correctly, you are making the dangerous assumption that your tests will run in a particular order.
No, I definitely am not making that assumption. With a bad response, but a good response code, 1 test would fail and the other would succeed, no matter the order. I just don't think that that valid response code is a useful test on its own. It's much better with both assertions in the same test, unless you have some reason to think that response code failure would signify something special on its own.
I wonder how much of this is the journeyman problem (aka, the expert beginner)
I believe writing test code is its own skill. Hence, like a coder learning SRP and dogmatically applying it, so does a person that is forced to write unit tests without deep understanding. (And of course, bad abstractions are worse than code duplication)
I think it's very possible to have a developer with 10 years of experience but effectively only 2 years of experience building automated test suites. (Particularly if they came from a time before automated testing, or if the testing and automated tests were someone else's job.)
I view it more as "only test one operation per unit test". If that needs multiple asserts (status code, response content, response mime type, etc.) to verify, that is fine.
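For example, a sketch along those lines in Python; `client` stands in for whatever HTTP test client your framework provides, and the route and payload are made up:

def test_create_reservation_returns_created_json():
    # one operation (the POST), several assertions about its single outcome
    response = client.post("/reservations", json={"name": "Alice", "seats": 2})
    assert response.status_code == 201
    assert response.headers["Content-Type"].startswith("application/json")
    assert response.json()["name"] == "Alice"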
IIUC, the guideline is so that when a test fails you know what the issue is. Therefore, if you are testing more than one condition (missing parameter, invalid value, negative number, etc.) it is harder to tell which of those conditions is failing, whereas if you have each condition as a separate test it is clear which is causing the failure.
Separate tests also mean that the other tests still run, so you don't have any hidden failures. You will get hidden failures if using multiple assertions for the condition, so you will need to re-run the tests multiple times to pick up and fix all the failures. If you are happy with that (e.g. your build-test cycle is fast) then having multiple assertions is fine.
Ultimately, structure your tests in a way that best conveys what is being tested and the conditions it is being tested under (e.g. partition class, failure condition, or logic/input variant).
I've come to the conclusion that none of this matters for most parts of a system. I worked in the most horrendous code and systems you can imagine but it turned into a multi-billion-dollar company. Then everyone starts talking about code quality and rewrites etc and new features stall as beautiful systems are written and high test coverage is met, and competing companies surpass us and take market share with new and better features. We've gotten religious over code and tests in the software industry and should probably shift back some.
I've worked on shitty code with shitty tests that ran the core of the business. Even while doing that, it was horrible to work with, held important features back and drove talented people away, leaving everything to stagnate in a "this works enough" state. When the wind changed, it was hard to turn the ship around, important people got nervous, and things got into a bad spiral.
None of this is the failure of code and tests alone; but both can be indicative of the structural health and resilience of the wider situation.
I've always operated on the principle that each test should be testing one logical concept. That usually translates into one concrete assertion in the test code itself, but if you need to assert multiple concrete conditions to test the logical concept it's not a bad thing. At the end of the day tests are just there to make you more confident in code changes and they are a tax you have to pay for that comfort. However you arrive at "I feel comfortable making changes", go with it.
Out of everything coming out of testing and the pain of testing old code, this seems like such a trivial thing to discuss. Then again, automated testing seems like a breeding ground for inane discussions wasting more time than picking a less-than-ideal solution and moving on.
I always have multiple assertions in my unit tests. I test around areas of functionality, as opposed to individual functions. It's a bit arbitrary, but there you have it...
I also use Apple's XCTest, which does a lot more than simple assertions.
If an assertion is thrown, I seldom take the assertion's word for it. I debug trace, and figure out what happened. The assertion is just a flag, to tell me where to look.
This is correct - multiple asserts are OK, but there are still good guidelines:
A good unit test has the phases Arrange, Act, Assert, end of unit test.
You can use multiple assert statements in the "assert" phase, to check the specifics of the single logical outcome of the test.
In fact, once I see the same group of asserts used 3 or more times, I usually extract a helper method, e.g. "AssertCacheIsPopulated" or "AssertHttpResponseIsSuccessContainingOrder". These might have method bodies that contain multiple assert statements, but whether this counts as a "single assert" or not is a matter of perspective and not all that important.
The thing to look out for is - does the test both assert that e.g. the response is an order, and that the cache is populated? Those should likely be separate tests as they are logically distinct outcomes.
The test ends after the asserts - You do not follow up the asserts with a second action. That should be a different test.
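A rough Python version of that shape; store_test_order and client are hypothetical helpers, and the assert helper mirrors the idea above rather than any particular library:

def assert_success_containing_order(response, expected_order_id):
    # several assert statements, one logical outcome
    assert response.status_code == 200
    assert response.json()["order"]["id"] == expected_order_id

def test_get_order_returns_the_stored_order():
    # Arrange
    order_id = store_test_order()
    # Act
    response = client.get(f"/orders/{order_id}")
    # Assert
    assert_success_containing_order(response, order_id)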
Multiple asserts are fine, multiple aspects in a single test are not. For example a transaction test that checks if the account balance is ok and if a user gets a notification. It boils down to a single responsibility, so when there are multiple aspects it means that the class/interface should be refactored.
* The primary goal of test automation is to prevent regression. A secondary goal can be performance tuning your product.
* Tests are tech debt, so don’t waste time with any kind of testing that doesn’t immediately save you time in the near term.
* Don’t waste your energy testing code unless you have an extremely good reason. Test the product and let the product prove the quality of your code. Code is better tested with various forms of static analysis.
* The speed with which an entire test campaign executes determines, more than all other factors combined, when and who executes the tests. If the test campaign takes hours nobody will touch it. Too painful. If it takes 10-30 minutes only your QA will touch it. When it takes less than 30 seconds to execute against a major percentage of all your business cases everybody will execute it several times a day.
Tests are not tech debt. You could have bad, brittle tests that you could consider debt but just having tests isn’t debt. Debt implies there is something you could do about it in the future to pay it down, which isn’t the case for a good test suite.
It’s debt. When you can’t add new features quickly because you have nightmarish tests to fix and you spend more time on the tests than the product, I’d say it’s debt. Especially with the insane mocking setups.
Tests can carry tech debt, just like any code. They certainly are not defined by it.
Tests are one of the ways you have to ensure your code is correct. Consequently, they are business-oriented code that exist to support your program usage, and subject to its requirements. How much assurance you need is completely defined by those requirements. (But how you achieve that assurance isn't, and tests are only one of the possible tools for that.)
They are still debt since they don't directly contribute to product value: you can delete all your tests and your software will keep functioning.
It doesn't mean it's a debt worth taking, though IME most companies are either taking way too much or way too little. Not treating tests as debt typically leads to over-testing, and it is way worse than under-testing.
Also, what you're talking about (business-oriented requirements) is more akin to higher level tests (integration/e2e), not unit tests.
You can also delete the source code after compiling it and your software will keep functioning. Does that mean the code doesn't directly contribute to product value?
Yes, those are the bad tests I was referring to. NOT having tests greatly increases the debt burden of your production code because you cannot refactor with any confidence and so you simply won’t.
This is the No True Scotsman issue with testing. When it fails, you just disregard the failure as "bad tests". But any company that has anything that resembles a testing culture will have a good amount of those "bad tests". And this amount is way higher than people are willing to admit.
> you cannot refactor with any confidence
Anecdotally, I've had way more cases where I wouldn't refactor because too many "bad tests" were breaking, not because I lacked confidence due to lack of tests.
There are many things beyond tests that allow you refactor with confidence: simple interfaces, clear dependency hierarchy, modular design, etc. They are way more important than tests.
Tests are often a last resort when all of the above is a disaster. When you're at a place where you need tests to keep your software stable you are probably already fucked, you're just not willing to recognize it.
You shouldn't have zero tests, but tests should be treated as debt. The fewer tests you need to keep your software stable, the better your architecture is. Huge number of tests in a codebase is typically a signal of shitty architecture that crumbles without those crutches.
NOT having tests is debt. When you can’t fearlessly add features quickly because you introduce regressions in another end of the product that you didn’t think of, and later have to spend all your time on firefighting because it got deployed to prod.
If your mocks and tests are in the way when introducing features or refactoring, they are likely not on the right level. Too much unit testing of moving internals, rather than public apis, usually being one of the culprits.
The one assertion per test doesn't mean you need to use only one assertion call but rather that you only need to do one assertion block. Checking everything after a response is considered 1 assertion, no matter how many assert calls you need.
The issue is when you use multiple assertions for multiple logic statements: do > assert > do > assert...
In that example imagine that you were also checking that the reservation was successful. That would be considered bad, you should create a different test that checks for that (testCreate + testDelete) and just have the precondition that the delete test has a valid thing to delete (usually added to the database on the setup).
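A pytest-style sketch of that split; seed_reservation and client are made-up helpers:

class TestReservations:
    def setup_method(self):
        # the delete test's precondition lives in setup, not in the test body
        self.existing_id = seed_reservation()

    def test_create(self):
        response = client.post("/reservations", json={"name": "Alice", "seats": 2})
        assert response.status_code == 201   # one assertion block about the create

    def test_delete(self):
        response = client.delete(f"/reservations/{self.existing_id}")
        assert response.status_code == 204   # one assertion block about the delete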
Sometimes, it takes a lot less code to test a particular piece of code by doing it in a long test with multiple assertions.
Migrate up -> assert ok -> rollback 1 -> assert ok -> rollback 2 -> assert ok
I don’t see much benefit to breaking it up, and you’re testing state changes between each transition, so the entire test is useful and simpler, shorter, and clearer than the alternative.
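In Python that flow might look something like this sketch; Migrator, fresh_database, has_table and the table names are assumptions, not any particular library:

def test_migrations_apply_and_roll_back_cleanly():
    db = fresh_database()
    migrator = Migrator(db)

    migrator.up()
    assert db.has_table("users")
    assert db.has_table("reservations")

    migrator.rollback(1)
    assert not db.has_table("reservations")   # state checked between each transition

    migrator.rollback(1)
    assert not db.has_table("users")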
Sometimes it's also more meaningful to the reader this way.
Imagine an object which sees numbers, tries to pair up identical numbers, and reports the set of unpaired numbers. A good test would be:
Create object
Assert it has no unpaired numbers
Show it 1, 2, and 3
Assert it has 1, 2, and 3 as the unpaired numbers
Show it 1 and 3
Assert it has 2 as the unpaired number
This test directly illustrates how the state changes over time. You could split it into three tests, but then someone reading the tests would have to read all three and infer what is going on. I consider that strictly worse.
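A sketch of that story as a single Python test; UnpairedTracker, see and unpaired are invented names for the object described above:

def test_unpaired_numbers_follow_the_story():
    tracker = UnpairedTracker()
    assert tracker.unpaired() == set()

    for n in (1, 2, 3):
        tracker.see(n)
    assert tracker.unpaired() == {1, 2, 3}

    tracker.see(1)
    tracker.see(3)
    assert tracker.unpaired() == {2}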
That test is asserting a flow, not features. How can you be sure that the second rollback fails because that rollback code is wrong and not because the previous two functions made some unexpected changes? Or rather, how can you be sure that the second rollback is ok if you don't know the state it was run from? Maybe the migration set an unexpected flag that made the rollbacks pass that test without working properly.
This is also the reason why tests should be run in arbitrary order, to avoid unexpected interactions due to order.
Flow tests can be useful in some situations, but they should never replace individual feature tests.
This can be a path where things do go bad. Let's say this test pattern is a success and then is replicated for many tests. Now, the schema or migration changes. A small change there now breaks the entire test suite. At this point the number of failing tests only indicates how many hours you will spend fixing assertions.
Another failure mode is when test scaffolding builds up. Imagine that migrate-up part becoming multiple schemas, or services. When it then fails, finding exactly where to fix the test scaffolding becomes a multi-hit exercise.
I'm not saying the example is bad, but it can put you on a path where, if you constantly build on top of it, it can go bad (e.g., developers that don't care about tests or test code quality, or just want to go home, so they just add a few assertions, add some scaffolding, copy-paste it all and mutate some assertions for a different table, and rinse-wash-repeat across 4 people, 40 hours a week, for 3 years...)
Sorry the example was vague. It was a test for the migration library itself— something I wrote recently when playing around with building a 0-dependency web framework for Bun.
I wouldn’t actually write tests for migrations themselves.
I think the issue is that you’ll always have one of those teammates who see this as an excuse to test the entire happy flow and all its effects in a single test case. I think what you want is reasonable, but how do you agree when it is no longer reasonable?
If you logic depends on that happy path, make a test for it. But as I explained in another comment that test should not justify the lack of individual feature tests, which should not only test the happy path but other corner cases too.
At my company, we developers usually create white-box unit/feature tests (we know how it was implemented, so we check components knowing that). But then we have an independent QA team that creates and runs black-box flow tests (they don't know how it was implemented, only what it should do and how it should interact).
Sounds like a fine approach, and I wasn’t criticizing. Mostly I was pondering out loud why people come up with blanket statements what good tests should look like.
The Go idiom is to use table-driven tests[1]. It's still an evolving practice, so you'll see different variants, but the essence is that you have a slice of your inputs and expected outputs, iterate through the slice, and run the assert(s) on each element.
var flagtests = []struct {
    in  string
    out string
}{
    {"%a", "[%a]"},
    {"%-a", "[%-a]"},
    {"%+a", "[%+a]"},
    // additional cases elided
    {"%-1.2abc", "[%-1.2a]bc"},
}

func TestFlagParser(t *testing.T) {
    var flagprinter flagPrinter
    for _, tt := range flagtests {
        t.Run(tt.in, func(t *testing.T) {
            s := Sprintf(tt.in, &flagprinter)
            if s != tt.out {
                t.Errorf("got %q, want %q", s, tt.out)
            }
        })
    }
}
Sometimes there will be an additional field in the test cases to give each one a name or description, in which case the failure message will include that name as well.
Another evolving practice is to use a map instead of a slice, with the map key being the name or description of the test case. This is nice because in Go, order is not specified in iterating over a map, so each time the test runs the cases will run in a different order, which can reveal any order-dependency in the tests.
This always rubbed me the wrong way. I think a better approach is to ensure the assertions are readable, as simple as can be. The worst case of this being broken was when the assertions were done in a helper method used in a base class of the test; navigating to it required multiple hops, and it took time to build up the context of the test as well.
The only downside of multiple assertions is that when the first one fails we won’t know if the subsequent ones are passing.
Totally agreed, I even think duplication is totally fine in tests if it brings more readability. You should be able to read the test to verify its correctness (after all you don't test the test code), and multiple helper functions and levels of abstraction hinder this goal.
The trouble is that the type of developer that blindly follows this sort of shaming/advice is exactly the type that blindly followed someone's rule of thumb to use one assertion per test.
There are no hard and fast rules.
I agree with keeping the number of assertions low, but it isn't the number that matters. Keeping the number of assertions low helps prevent the 'testItWorks()' syndrome.
Oh, it broke. I guess 'it does not work'; time to read a 2000-line test written 5 years ago.
I think part of the motivation for one assertion per test comes from the fact that you stop getting information from a test as soon as one assertion fails.
I think it was meant as guidance, like the SRP, where you should be testing one thing in each test case. I also think a growing number of assertions might be a sign your unit under test is carrying too many responsibilities.
Maybe it’s better to say “few assertions, all related to testing one thing”
Testing is hard when the code it tests is OOP, has mutations everywhere, and has big functions that do too many things.
It's practically impossible to thoroughly test such code with one assertion per test; it would mean having dozens of tests just for one object method. Correspondingly, the fixtures/factories/setup for tests would balloon in number and complexity as well to be able to setup the exact circumstance being tested.
But the example in TFA is, imo, bad because it is testing two entirely different (and unrelated) layers at once. It is testing that a business logic delete of a thing works correctly, and that a communication level response is correct. Those could be two separate tests, separating the concerns, and resulting in simpler code to reason about and less maintenance effort in the future.
We want to know if the DeleteAsync(address) behaves correctly. Actually we want to know if DeleteReservation() works, irrespective of the async requirement. Testing whether AnythingAsync() works is something that is already done at the library or framework level, and we probably don't need to prove that it works again.
Write a test for DeleteReservation() which tests if a valid reservation gets deleted. Write another related test to ensure that a non-existent or invalid reservation does not get deleted, but rather returns some appropriate error value. That's two, probably quite simple tests.
Now somewhere higher up, write the REST API tests. ApiDelete()... a few tests to establish that if an API delete is called, and the business logic function it calls internally returns a successful result, then does the ApiDelete() return an appropriate response? Likewise if the business logic fails, does the delete respond to the API caller correctly?
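A hedged sketch of that split in Python; FakeReservationRepo, delete_reservation, make_api and the routes are all made-up names used only to show the layering:

def test_delete_reservation_deletes_existing_reservation():
    repo = FakeReservationRepo(existing={"r-1"})
    assert delete_reservation(repo, "r-1") is True
    assert "r-1" not in repo.existing

def test_delete_reservation_rejects_unknown_id():
    repo = FakeReservationRepo(existing=set())
    assert delete_reservation(repo, "missing") is False

def test_api_delete_returns_204_when_business_logic_succeeds():
    api = make_api(delete_reservation=lambda repo, rid: True)   # stubbed business layer
    assert api.delete("/reservations/r-1").status_code == 204

def test_api_delete_returns_404_when_business_logic_fails():
    api = make_api(delete_reservation=lambda repo, rid: False)
    assert api.delete("/reservations/missing").status_code == 404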
In my experience, when code isn’t OOP, that means all static functions with static (i.e. global) data, which isn’t just hard to test, it’s actually impossible, because you can’t mock out the static data.
I didn't downvote you, but I have a hard time either understanding your meaning or imagining the scenario you describe. Can you give an example?
OOP functions are usually harder to test because they expect complete objects as arguments, and that tends to require a lot more mocking or fixtures/factories to setup for the test.
FP functions typically operate on less complex and more open data structures. You just construct the minimum thing necessary to satisfy the function, and the test is comparatively simple. None of this has anything to do with global data. Using global data from within any functions is generally a bad idea and has nothing to do with FP or OOP.
Pure functional, yeah, absolutely - that’s not what I usually see though. I see procedural/iterative static functions that connect to static data that connect to live databases and immediately start caching its contents locally.
At work, I put in a change which allows multiple assertions in a C testing framework. There is no exception handling or anything.
A macro like
EXPECT_ASSERT(whatever(NULL));
will succeed if whatever(NULL) asserts (e.g. that its argument isn't null). If whatever neglects to assert, then EXPECT_ASSERT will itself assert.
Under the hood it works with setjmp and longjmp. The assert handler is temporarily overridden to a function which performs a longjmp which changes some hidden local state to record that the assertion went off.
This will not work with APIs that leave things in a bad state, because there is no unwinding. However, the bulk of the assertions being tested are ones that validate inputs before changing any state.
It's quite convenient to cover half a dozen of these in one function, as a block of six one-liners.
Previously, assertions had to be written as individual tests, because the assert handler was overridden to go to a function which exits the process successfully. The old-style tests are then written to set up this handler, and also indicate failure if the bottom of the function is reached.
> You may be trying to simulate a ‘session’ where a client performs many steps in order to achieve a goal. As Gerard Meszaros writes regarding the test smell, this is appropriate for manual tests, but rarely for automated tests.
Integration tests are typically easier to write / maintain and thus are more valuable than small unit tests. Don’t know why the entire premise argues against that.
This is such a bad example because the level of testing is somewhat between unit and acceptance/functional testing. I can't tell at a glance if "api" is something that will hit a database or not.
The first part where he says you can check in the passing test of code that does nothing makes me twitch, but mainly because I see functional testing as the goal for a bit of functionality and unit tests as a way of verifying the parts of achieving that goal. Unit tests should verify the code without requiring integration and functional tests should confirm that the units integrate properly. I wouldn't recommend checking in a test that claims to verify that a deleted item no longer exists when it doesn't actually verify that.
Deciding on the granularity of actual unit tests is probably something that is best decided through trial and error. I think when you break down the "rules", like one assertion per test, you need to understand the goals. In unit testing an API I might have lots of tiny tests that confirm things like input validation, status codes, debugging information, permissions, etc. I don't want a test that's supposed to check the input validation code to fall because it's also checking the logged-in state of the user and their permission to reach that point in the code.
In unit tests, maybe you want to test the validation of the item key. You can have a "testItemKey" test that checks that the validation confirms that the key is not null, not an empty string, not longer than expected, valid base64, etc. Or you could break those into individual tests in a test case. It's all about the balance of ergonomics and the informativeness and robustness of the test suite.
In functional testing, however, you can certainly pepper the tests with lots of assertions along the way to confirm that the test is progressing and you know at what point it broke. In that case, the user being unable to log in would mean that testing deleting an item would not be worthwhile.
These aren't unit tests, they are integration tests (API calls to an external system). Integration tests have had multiple assertions forever (look at how Cypress works where each test step is an assertion), the idea of single assertions are for unit tests.
The issue with multiple assertions for unit tests is that they hide errors by failing early extending the time to fix (fail at Assert 1, fix, re-run, fail at Assert 2, fix...). If you want multiple assertions you should have a single assertion that can report multiple issues by returning an array of error strings or something. The speed increase of multiple assertions vs multiple tests is usually tiny, but if you do have a long unit test then that's basically the only reason I can think of to use multiple assertions.
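One way to do that "collect problems, assert once" style in Python, as a sketch (client is a hypothetical test client):

def test_delete_reservation_response():
    response = client.delete("/reservations/42")
    problems = []
    if response.status_code != 204:
        problems.append(f"status: expected 204, got {response.status_code}")
    if response.content:
        problems.append(f"body: expected empty, got {response.content!r}")
    # a single assertion that reports every issue at once instead of failing early
    assert not problems, "; ".join(problems)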
I think it depends on the test. If you are checking that e.g. context.currentUser() returns the current user's name, email, id and roles, I would probably write just 2 tests: one for the user attributes and one for checking roles.
jUnit provides a helpful assertAll(...) method that allows us to check multiple assertions without stopping at the first failed one.
In my tests I often use "thick" asserts like assertEqualsWithDiff(dtoA, dtoB) that compare 2 objects as a whole and print the property names and values that do not match. Not everyone likes this approach, but for me it is the best balance between time spent on the test and the benefit that I get from it.
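A Python analogue of that "thick" assert, as a sketch: with dataclasses, a single equality assertion plus pytest's failure report gives a field-by-field diff (ReservationDto and load_reservation are invented names):

from dataclasses import dataclass

@dataclass
class ReservationDto:
    name: str
    seats: int
    date: str

def test_reservation_round_trip():
    expected = ReservationDto(name="Alice", seats=2, date="2023-01-01")
    actual = load_reservation("alice")   # hypothetical function under test
    # one "thick" assertion; the failure output diffs the two objects property by property
    assert actual == expected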
I work to "assert one thing", and the assertions are added to help with debugging. I sometimes even assert starting conditions if they can at all change.
Without an "assert one thing" rule, what tends to happen is there will be cut-and-paste between tests, and the tests overlap in what they assert. This means that completely unrelated tests will have an assertion failure when the code goes wrong.
When you do a refactor, or change some behavior, you have to change _all_ of the tests. Not just the one or two that have the thing you're changing as their focus.
Think of tests that over-assert like screenshot or other diff-based tests, they are brittle.
It is interesting how far purism can go. I wouldn't have thought that some people obsess about having just one assertion in a test. This seems to be a case of being mentally enslaved by your own categories.
I mean, obviously. Obviously I'm not going to assert every header in their own unit test for a single request path in most cases. Assert what makes sense as a logical whole / unit.
Something I find awkward with unit tests (and it might just be me) is that I want to write a test like:
def testLotsOfWaysTofail():
    d = {"Handle Null": (None, foobar),
         "Don't allow under 13 to do it": (11, foobar),
         "or old age pensioner": (77, wobble)}
    for ...
        generate a unit test dynamically here
I have built metaclasses, I have tried many different options.
I am sure there is a neat solution.
@parametrize_cases(
    Case("handle null", age=None, x="foobar"),
    Case("don't allow under 13s", age=11, x="foobar"),
    Case("or old age pension", age=77, x="wobble"),
    ...  # as many as you want
)
def test_lots_of_ways_to_fail(age, x):
    with pytest.raises(ValueError):
        function_under_test(age, x)
Just wanted to pop in and say ckp95 actually mailed me this reply in case I missed it. The extra effort kinda restored my faith in humanity - giving a shit about strangers' problems matters these days. Nice one.
I've seen this pattern often with teams that are out of their normal wheelhouse (i.e. Python shop doing Go or JS shop doing Java) and think they cannot extend a bad test library. The other place you see this a lot is where developer hours are being billed. I can sympathize with devs who are shallow on a test library, but the billing hours one is a dark pattern for optimizing cost plus billing.
It took me a long time to feel okay with all the times I broke the single assertion “rule” after reading Clean Code. In fact I only recently stopped feeling bad at all when I went to reimplement a bunch of abseil’s flat hash map tests in C#. All of the tests assert multiple things. If a project as big as that can have multiple asserts on a basic data structure then I can too
The problem seems to be that assert is implemented as a regular function.
It must be implemented as a macro so that the line number and assertion expressions are printed, making it easy to identify the failed assertion.
If a language doesn't support such macros and has no ad-hoc mechanism for this case, it should not be used, or, if it must be used, the assert function must take a string parameter identifying the assertion.
Some languages allow a stack trace to be obtained in normal code, which enables position reporting without macros. Python and Go are good examples.
If you know the file and line of the assertion, plus the values that are being checked, there's not as much need for a stringified version of the expression.
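For instance, a minimal Python sketch of a check helper that reports the caller's position via the stack, no macros needed (the name check is invented):

import inspect

def check(condition, message="check failed"):
    # report the calling test's file and line when the condition does not hold
    if not condition:
        caller = inspect.stack()[1]
        raise AssertionError(f"{caller.filename}:{caller.lineno}: {message}")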
> If you know the file and line of the assertion, plus the values that are being checked, there's not as much need for a stringified version of the expression.
It does save time. With the actual condition reproduced, half the time I don't even need to check the source of the failed test to know what went bad and where to fix it. Consider the difference between:
FAILED: Expected 1, got 4
In /src/foo/ApiTest.cpp:123
vs.
FAILED: response.status evaluated to 4
expected: 1
In /src/foo/ApiTest.cpp:123
vs.
FAILED: response.status evaluated to Response::invalidArg (4)
expected: Response::noData (1)
In /src/foo/ApiTest.cpp:123
This is also why I insist on adding custom matchers and printers in Google Test for C++. Without it, 90% of the time a failed assertion/expectation just prints "binary objects differ" and spews a couple lines of hexadecimal digits. Adding a custom printer or matcher takes little work, but makes all such failures print meaningful information instead, allowing one to just eyeball the problem from test output (useful particularly with CI logs).
A test should have as many assertions as needed to test an interface. If you find you’re needing a lot of assertions to do that then I suspect either the interface under test is too large or you’re testing multiple stack frames. In my experience, it’s usually the latter; I call it accidental testing. Those delegate calls should have their own tests.
Use mutation testing as well, if it's available for your language. This evaluates the quality of your test suite by automatically switching up your app logic and running the same tests.
Nothing like removing one line, breezing through code review, and bringing down production. "But all tests passed!"
I didn't know you were only supposed to put one assertion per unit test. Some of mine have 5-10 assertions. Why? Because I don't have time to write tests all day, but I recognize there need to be tests. As long as it catches the bugs before it goes to prod and can be modified easily, who cares?
Unfortunately and from experience, far more seem to care about the how than the what when it comes to testing. This comment section alone is pretty telling: people spending their Saturday to write dogmatic rules.
Arrange: whatever you need for the setup.
Act: a single line of code that is under test.
Assert: whatever you want to assert, multiple statements allowed and usually required.
We put the 3A in code as comments as boundaries and that works more than perfect for the whole team.
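A small sketch of that layout in Python; Order, Item and the discount numbers are made up:

def test_discount_applied_to_large_orders():
    # Arrange
    order = Order(items=[Item(price=100)] * 5)
    # Act
    total = order.total_with_discount()
    # Assert
    assert order.discount_rate == 0.10
    assert total == 450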
I’ve seen code guidelines where this article’s asserts would be unacceptable.
If I saw this error in a test output, I’d be pretty miffed that it doesn’t tell me what the expected and actual values are.
Actual status code: NotFound.
Expected: True
Actual: False
also, only testing public interfaces is perfectly fine and may actually be preferable as it leaves you free to refactor internals freely without breaking tests.
tbh unit testing is a balancing act between reaping code quality benefits and bogging yourself down with too much test updating.
This is an interesting point IMO. We tend to focus at the API level far more, and implicitly test inner functionality (that is, the inner functionality must be correct for the outer tests to pass). Sometimes testing inner functionality explicitly is required when the outer tests are not complete, or when behaviour is defined by the inner code. We also as far as possible use defensive techniques and extensive type constraints (which is a joy in Rust).
I'm constantly thinking about where we need to put tests, though, and I'm still not fully convinced I get it right. My rule of thumb is that each test should map to a specification point, and that spec is a necessary documentation line for the test.
It's not that there's anything inherently wrong with multiple assertions in a unit test, it's that it's often a smell you're testing multiple behaviours in a single test - which is the thing you should avoid.
i need to send this to all my coworkers. i wrote some tests kind of similar to this, except i had a bunch of setup code then asserted something was true, changed 1 thing and then verified it became false so that it was super clear that it was the 1 thing that causes it to become false.
they were like no, copy paste the entire setup and change the field to false in the new setup.
im like how are you supposed to tell which of the dozen conditions triggered it now? you have to diff the 2 tests in your head and hope they don't get out of sync? ridiculous
I like the one assertion per unit test case rule. It makes me design cleaner code and write readable test cases. I also use object equality vs asserting on individual fields.
I think my dream test framework would have the following features:
1) Test suites are organized as trees, not as lists:
I found one of the most common reasons to have many assertions in a test was that you want to share some complicated setup/teardown logic - or that one testable action depends on another testable action having happened before. (i.e., adding an item - asserting it's there, then removing it, asserting it's gone).
The disadvantage is that you have to lower the granularity of your tests - if you want to debug a specific action, you still have to rerun the whole test.
I think a better way to solve this would be to organize tests as a tree, maybe something like this:
- A single unit test consists of a setup phase, a teardown phase, 0 or more assertions and 0 or more child tests. Each child test is organized the same way, i.e. it can have child tests of its own, etc.
- When running a test, first the setup phase and assertions are executed, then each child test recursively, then the teardown phase. Success/failure is tracked for each test separately, but child tests are run in the same process/context as the parent test.
- Each test can be started individually, including child tests. When a child test (or grandchild test, etc) is run individually, the test runner will first run the setup phases of all ancestors, then run the test, then run the teardown phases of the ancestors.
- Bonus: In the setup phase, a test can dynamically generate child tests (e.g. as lambdas/closures). Each test must have a unique ID with which it can be tracked across different test runs or started individually. This could be useful for parameterized tests or if you want to test a loop invariant across multiple iterations.
This would allow you to write your test script like one big multi-assert test, but still get fine-grained reports and control as if you'd have put each assert in a separate script.
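pytest's fixtures already approximate part of the tree idea in (1): each test can run on its own, and the runner re-executes the ancestor setup/teardown chain for it. A sketch, with Store as a made-up system under test:

import pytest

@pytest.fixture
def store():
    s = Store()          # parent "setup phase"
    yield s
    s.close()            # parent "teardown phase"

@pytest.fixture
def store_with_item(store):
    store.add("item-1")  # child setup builds on the parent's
    return store

def test_added_item_is_present(store_with_item):
    assert "item-1" in store_with_item

def test_removed_item_is_gone(store_with_item):
    store_with_item.remove("item-1")
    assert "item-1" not in store_with_item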
2) Provide "metrics" and "change detection" as an alternative to assertions:
I think one of the most involved parts of writing tests can often be to verify the results - think which particular state you want to assert, how you can access that state in your script, etc.
A way to make this easier would be to provide a second kind of "output" for the test script: The test script simply outputs a list of key/value pairs without any notion whether or not the value is "correct" or "incorrect". The test runner stores the list and compares the values with the list from a previous test run - e.g. the previous commit. Every value that was changed between the runs is shown to the user and can be marked as "correct" or "incorrect".
This way, you could sort of interactively "learn" which values are correct and which aren't instead of having to figure out all of it beforehand.
The runner could also implement more complex conditions instead of "changed"/"did not change", such as "value may only change in one direction" e.g. for quality measures or "value must stay the same within a certain confidence interval" for flaky tests.
This could also let you track more difficult to manage metrics in a test, such as runtime or memory consumption of particular method calls.
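As a rough sketch of that change-detection idea in Python (the file name and the approval step are assumptions; a real runner would ask you to mark each change as correct or incorrect rather than silently overwrite the snapshot):

import json
from pathlib import Path

def changed_metrics(current: dict, snapshot_file="metrics.json") -> dict:
    # compare this run's key/value pairs against the stored ones and report what moved
    path = Path(snapshot_file)
    previous = json.loads(path.read_text()) if path.exists() else {}
    diff = {key: (previous.get(key), value)
            for key, value in current.items()
            if previous.get(key) != value}
    path.write_text(json.dumps(current, indent=2, sort_keys=True))
    return diff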
"A foolish consistency is the hobgoblin of little minds"
- whoever
I definitely write tests with multiple assertions, the rule I try to follow is that the test is testing a single cause/effect. that is, a single set of inputs, run the inputs, then assert as many things as you want to ensure the end state is what's expected. there is no problem working this way.