
> How is that even possible in the first place? The entire job of an assertion is to wave a flag saying "here! condition failed!".

I envy you for never having seen tests atrocious enough that this is not only possible, but the common case.

Depending on the language, the framework and, obviously, the usage, assertions might not be informative at all; they might provide nothing beyond the basic functionality of failing the test - and that's it.

Now imagine this barebones use of assertions in tests that are entirely too long, don't isolate the test cases properly, or are even completely irrelevant to what's (supposedly) being tested!

If that's not enough, imagine this nightmare failing not right after it has been written, but, let's say, 18 months later, as part of a massive test suite that has been running for a while. All you have is the name of the test that failed; you look into it and find a 630-line-long test "case" with 22 nondescript assertions along the way. You might know which line failed the test, but not always. And of course debugging the test function line by line doesn't work, because the test depends on intricate timing for some reason. The person who wrote this might not be around anymore, and now this is your dragon to slay.

I think I should stop here before triggering myself any further. Therapy is expensive.




> You might know which line failed the test, but not always.

If that's the case, the test framework itself is severely flawed and needs fixing even more than the tests do.

There's no excuse to have an assert function that doesn't print out the location of the failure.
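As an illustration, Go's testing package gives you this for free: every t.Errorf failure is reported with the file and line of the call. A minimal sketch (Add here is just a hypothetical stand-in for the code under test):

    package calc

    import "testing"

    // Add is a hypothetical stand-in for the code under test.
    func Add(a, b int) int { return a + b }

    func TestAdd(t *testing.T) {
        // A failure here is logged as something like
        //     calc_test.go:<line>: Add(2, 2) = 5, want 4
        // so the failing line is never a mystery.
        if got := Add(2, 2); got != 4 {
            t.Errorf("Add(2, 2) = %d, want %d", got, 4)
        }
    }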


Even if the framework is fine, you can see something like an elaborate if-else tree, or even a try-catch block, and after it's all done, there's a condition check with `fail()`. So the reported point of failure can be manually detached from the actual point of failure.

Granted, this is not the way to do things. But it happens anyway.
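For what it's worth, a sketch of that shape in Go (the conditions are contrived, just to show the pattern): the branches only record a boolean, so the only location the framework can report is the final Fail call.

    package roulette

    import "testing"

    func TestDetachedFailure(t *testing.T) {
        ok := true
        // ...elaborate branching that only records a boolean...
        if 2+2 != 4 {
            ok = false
        }
        if len("abc") != 3 {
            ok = false
        }
        // The reported location is this line, not whichever check actually went wrong.
        if !ok {
            t.Fail()
        }
    }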


I mean in your example it’s someone choosing not to use asserts. Which is a problem, don’t get me wrong, but it’s not the problem being talked about here.

The comment thread is about “Assertion Roulette” — having so many assertions you don’t know which went off. Which really seems like a test framework issue more than a test issue.


So because some idiot somewhere wrote a 100 assertion unit test we should ban anyone from writing even 2 assertions in one test?


Not at all. It makes sense in some tests. I addressed the part asking how it's even possible to not know what happened.

As for multiple asserts, the number itself is really meaningless. The test case should test one thing. If it requires several asserts, that's okay. But having a very long test function with a lot of assertions strongly indicates that you're testing more than one thing, and when the test fails it will be harder to know what actually happened.


I guess you might be talking about a different language / environment than I'm used to, but even in the 100-assertion test case you get useful tracebacks in Python. Testing lots of things at the same time means that, strictly speaking, you're writing an integration test rather than a unit test, but I still don't see how it's a bad test. It's easy and stops buggy PRs going into production.

The test failures I see that are actually hard to debug are ones where the failures are difficult to reproduce due to random input, tests running in parallel and sharing the same filesystem etc. I don't think I've ever not known what assert was failing (although I guess in theory you could make that happen by catching AssertionError).


> Testing lots of things at the same time means strictly speaking you're writing an integration test rather than a unit test

There's nothing wrong with integration tests, but they're not unit tests. It's fine to have both, but the requirements for a good unit test and those for a good integration test diverge. The title of this post, at least, was specific to unit tests.


A unit test tests one unit. An integration test covers more than one unit. I think everyone agrees with that, but nobody has defined what a unit is.

The longer I program the more I am convinced that the larger your unit the better. A unit test is a statement that you will never refactor across this line, and that eliminates a lot of flexibility that I want.

It turns out that debugging failed integration tests is easy: the bug is in the last thing you changed. Sure, the test covers hundreds of lines, but you only changed one.


I recently went to the effort of trying to work out where the term unit test came from in some desperate effort to find what a unit was meant to be.

After much googling and buying of ancient textbooks I hit a dead end. At this point I think "unit" is just noise that confuses people into making distinctions that don't exist.


As I recall, the TDD mailing list has some background on the use of the word "unit"; it goes WAY back, I believe to the mainframe/punch card era. Regardless, I think it roughly translates to C's notion of the unit of compilation.

Which is obviously not what people really mean these days, but the phrase stuck. The early Xp'ers even found it an issue back then.

For a while people tried to push the term "micro tests", but that didn't really take off.

I agree with Gerard Meszaros and Martin Fowler and typically follow their (very mainstream) definitions on this stuff. Integration and functional testing have their own ambiguities too; it's definitely a frustrating situation to not have solidly defined foundational terms.


IIRC the "unit" in "unit test" was meant to mean "semantic unit" ("the access module", for example, should be distinct with a well-defined interface that all the tests go through), but very quickly turned into "syntactic units" ("a single function", for example, where the "well-defined interface" ends up just being function arguments/return value) because most people didn't understand what the original proponents meant.


I have a Web API which calls a DB API which calls a stored procedure which executes several SQL statements.

The Web API has a well-defined and documented interface. Is it a “unit”?


In the semantic sense? No, or at least probably not if that's where you jump to. You're still thinking in terms of syntax.

Think in terms of business rules, not the code structure: What's one thing your API does?


See "Software Engineering at Google" https://abseil.io/resources/swe-book/html/ch11.html, the definition of a "small test".

This is roughly my definition of unit test: "tests run in a single process"


> It turns out that debugging failed integration tests is easy,the bug is in the last thing you changed. Sure the test covers hundreds of lines, but you only changed one.

That’s not true.

A correct change might expose an existing bug which hadn’t been tested or expose flaky behavior which existed but hadn’t been exercised. In both cases the solution is not to revert the correct change, but to fix the buggy behavior.


Michael Pollan was right: Write tests, not too many, mostly integration.


Contrary viewpoint: Integrated Tests Are A Scam (J.B. Rainsberger):

https://www.youtube.com/watch?v=fhFa4tkFUFw


Watched a bit of this... It's typical test-driven zealotry; the main criticism of integration tests seems to be that they don't force your hand in system design in the way that unit tests do? Which seems very silly, but then, I'm not a person who goes to conferences about testing philosophy.


Did you miss his follow-up? "Integration tests are a scam is a scam". For real. I like J.B., but I think he muddies the water too much and overall understanding suffers.


> The unit tests is a statement that you will never refactor across this line, and that eliminates a lot of flexibility that I want.

I certainly don't see it as that. I see it as "this is the smallest thing I _can_ test usefully". Mind you, those do tend to correlate, but they're not the same thing.


> this is the smallest thing I _can_ test usefully

Then you're testing useless things.

Usefulness is when different parts of a program work together as a coherent whole. Testing DB access layer and service layer separately (as units are often defined) has no meaning (but is often enforced).

Cue the memes about "unit tests with 100% code coverage, no integration tests" https://mobile.twitter.com/thepracticaldev/status/6876720861...


>> this is the smallest thing I _can_ test usefully

> Then you're testing useless things.

We'll have to agree to disagree then.

> Testing DB access layer and service layer separately (as units are often defined)

Not at all. For me, a unit is a small part of a layer; one method. Testing the various parts in one system/layer is another type of test. Testing that different systems work together is yet another.

I tend to think in terms of the following

- Unit test = my code works

- Functional test = my design works

- Integration test = my code is using your 3rd party stuff correctly (databases, etc)

- Factory Acceptance Test = my system works

- Site Acceptance Test = your code sucks, this totally isn't what I asked for!?!

The "my code works" part is the smallest piece possible. Think "the sorting function" of a library that can return it's results sorted in a specific order.


And the only actually useful tests are functional (depending on how you write them) and above.

If those fail, it means that neither your design nor your code works.

The absolute vast majority of unit tests are meaningless because you just repeat them again in the higher level tests.


That seems like a silly opinion to me. I use unit tests to make sure that individual units work like I expect them to. And I use them to test edge cases that can be tested separately from their caller. If I had to test all the use cases for each function, all combined together, the number of tests would grow by the multiplication of the partitions of each one, N x M x O x P, ... rather than the sum, plus a much smaller set of tests for how they work together (N + M + O + P + N_M + M_O + O_P, etc). It's much simpler to thoroughly test each unit. Then test how they work together.


> If I had to test all the use cases for each function, all combined together, there number of tests would grow by the multiplication of the partitions of each one

Why would they? Do these edge cases not appear when the caller is invoked? Do you not test these edge cases and the behavior when the caller is invoked?

As an example: you tested that your db layer doesn't fail when getting certain data and returns response X (or throws exception Y). But your service layer has no idea what to do with this, and so simply fails or falls back to some generic handler.

Does this represent how the app should behave? No. You have to write a functional or an integration test for that exact same data to test that the response is correct. So why write the same thing twice (or more)?

You can see this with Twitter: the backend always returns a proper error description for any situation (e.g. "File too large", or "Video aspect ratio is incorrect"). However, all you see is "Something went wrong, try again later".

> It's much simpler to thoroughly test each unit. Then test how they work together.

Me, telling you: test how they work together, unit tests are usually useless

You: no, this increases the number of tests. Instead, you have to... write at least double the amount of tests: first for the units, and then test the exact same scenarios for the combination of units.

----

Edit: what I'm writing is especially true for typical microservices. It's harder for monoliths, GUI apps etc. But even there: if you write a test for a unit, but then need to write the exact same test for the exact same scenarios to test a combination of units, then those unit tests are useless.


Unit one - returns a useful error for each type of error condition that can occur (N). Test that, for each type of error condition that can occur. One test for each error condition.

Unit two - calls unit one - test that, if unit one returns an error, it is treated appropriately. One test, covers all error conditions because they're all returned the same way from Unit one.

Unit three - same idea as unit one

If you were to test the behavior of unit one _through_ units 2 and 3, you'd need 2*N tests. If you were to test the behavior of unit one separately, you'd need N+2 tests.

You're missing the point that you don't need to test "the exact same scenarios for the combination of units", because the partitions of <inputs to outputs> are not the same as the partitions for <outputs>. And for each unit, you only need to test how it handles the partitions of <outputs> for the items it calls; not that of <inputs to outputs>.


> If you were to test the behavior of unit one _through_ units 2 and 3, you'd need 2*N tests.

There are only two possible responses to that:

1. No, there are not 2*N tests because unit 3 does not cover, or need, all of the behavior and cases that flow through those units. Then unit testing unneeded behaviors is unnecessary.

2. Unit 3 actually goes through all those 2*N cases. So, by not testing them at the unit 3 level you have no idea that the system behaves as needed. Literally this https://twitter.com/ThePracticalDev/status/68767208615275315...

> You're missing the point that you don't need to test "the exact same scenarios for the combination of units", because the partitions of <inputs to outputs>

This makes no sense at all. Yes, you've tested those "inputs/outputs" in isolation. Now, what tests the flow of data? That unit 1 outputs data required by unit 2? That unit 3 outputs data that is correctly propagated by unit 2 back to unit 1?

Once you start testing the actual flow... all your unit tests are immediately entirely unnecessary because you need to test all the same cases, and edge cases to ensure that everything fits together correctly.

So, where I would write a single functional test (and/or, hopefully, an integration test) that shows me how my system actually behaves, you will have multiple tests for each unit, and on top of that you will still need a functional test, at least, for the same scenarios.


> Once you start testing the actual flow... all your unit tests are immediately entirely unnecessary because you need to test all the same cases, and edge cases to ensure that everything fits together correctly.

You don't, but it's clear that I am unable to explain why to you. I apologize for not being better able to express what I mean.


> You don't

If you don't, then you have no idea if your units fit together properly :)

I've been bitten by this when developing microservices. And as I said in an edit above, it becomes less clear what to test in more monolithic apps and in GUIs, but in general the idea still holds.

Imagine a typical simple microservice. It will have many units working together:

- the controller that accepts an HTTP request

- the service layer that orchestrates data retrieved from various sources

- the wrappers for various external services that let you get data with a single method call

- a db wrapper that also lets you get necessary data with one method call

So you write extensive unit tests for your DB wrapper. You think of and test every single edge case you can think of: invalid calls, incomplete data etc.

Then you write extensive unit tests for your service layer. You think of and test every single edge case you can think of: invalid calls, external services returning invalid data etc.

Then you write extensive unit tests for your controller. Repeat above.

So now you have three layers of extensive tests, and that's just unit tests.

You'll find that most (if not all) of those are unnecessary for one simple reason: you never tested how they actually behave. That is, when the microservice is actually invoked with an actual HTTP request.

And this is where it turns out that:

- those edge cases you so thoroughly tested for the DB layer? Unnecessary because invalid and incomplete data is actually handled at the controller layer, or service layer

- or that errors raised or returned by service wrappers or the db layer either don't get propagated up, or are handled by a generic catch-all so that the call returns nonsensical stuff like `HTTP 200: {error: "Server error"}`

- or that those edge cases actually exist, but since you tested them in isolation, and you didn't test the whole flow, the service just fails with an HTTP 500 error on invalid invocation

Or, instead, you can just write a single suite of functional tests that test all of that for the actual controller<->service<->wrappers flow covering the exact same scenarios.
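To make that concrete, a minimal sketch of such a functional test with Go's net/http/httptest (newRouter is a hypothetical stand-in for the production wiring of controller -> service -> wrappers; here it is stubbed only so the snippet runs):

    package svc

    import (
        "net/http"
        "net/http/httptest"
        "testing"
    )

    // newRouter stands in for the real constructor that wires
    // controller -> service -> db wrapper exactly as production does.
    func newRouter() http.Handler {
        mux := http.NewServeMux()
        mux.HandleFunc("/widgets/42", func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
        })
        return mux
    }

    func TestGetWidgetFlow(t *testing.T) {
        srv := httptest.NewServer(newRouter())
        defer srv.Close()

        resp, err := http.Get(srv.URL + "/widgets/42")
        if err != nil {
            t.Fatalf("GET /widgets/42: %v", err)
        }
        defer resp.Body.Close()

        // One test exercises the whole flow callers actually see,
        // instead of re-testing each layer's edge cases in isolation.
        if resp.StatusCode != http.StatusOK {
            t.Errorf("status = %d, want %d", resp.StatusCode, http.StatusOK)
        }
    }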


Why? In C++, if I have a test case with

    REQUIRE(v1::foo(0) == v2::foo(0));
    REQUIRE(v1::foo(1) == v2::foo(1));
And if the second assert fails, the error message will tell me exactly that: the line, and the values of both function calls if they are printable. What more do you need in order to know "what actually happened"?


> when the test fails it will be harder to know what actually happened

This should not ever be possible in any semi-sane test environment.

One could in theory write a single test function with thousands of asserts for all kinds of conditions and it still should be 100% obvious which one failed when something fails. Not that I'd suggest going to that extreme either, but it illustrates that it'll work fine.


> when the test fails it will be harder to know what actually happened.

Yeah, and if you write one assertion at a time, it will be harder to write the tests. Decreasing #assertions/test speeds up test debugging while increasing the time spent writing non-production code. It's a tradeoff. Declaring that the optimal number of assertions per test is 1 completely ignores the reality of this tradeoff.


That's true and it boils down to what's acceptable in your team (or just you). I worked in some places where coverage was the only metric and in places where every single function had to have all cases covered, and testing took longer than writing the code.

As for me, I tend to write reasonable tests and cover several cases that guard the intended behavior of each function (if someone decides the function should behave differently in the future, a test should fail). One emerging pattern is that sometimes during testing I realize I need to refactor something, which might have been lost on me if I had skimped on tests. It's both a sanity check and a guardrail for future readers.


Just use your best judgement and you will be fine.

Patterns are context specific advice about a solution and its trade-offs, not hard rules for every situation.

Notice how pattern books will often state the relevant Context, the Problem itself, the Forces influencing it, and a Solution.

(This formulaic approach is also why pattern books tend to be a dry read).


Devious commenter was describing a (normal) scenario where a unit test is not precise. No need to follow up with an aggressive "so what you're saying is".


Previous* hah.


> So because some idiot somewhere wrote a 100 assertion unit test we should ban anyone from writing even 2 assertions in one test?

You're falling prey to the slippery slope fallacy, which at best is specious reasoning.

The rationale is easy to understand. Running 100 assertions in a single test renders tests unusable. Running 10 assertions suffers from the same problem. Test sets are user-friendly if they dump a single specific error message for a single specific failed assertion, thus allowing developers to quickly pinpoint root causes by simply glancing through the test logs.

Arguing whether two or three or five assertions should be banned misses the whole point and completely ignores the root cause that led to this guideline.
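One common way to get that one-message-per-failure property without literally writing a separate test function for every check is table-driven subtests; a Go sketch (strconv.Atoi is just a stand-in for whatever the unit under test is):

    package parse

    import (
        "strconv"
        "testing"
    )

    func TestAtoi(t *testing.T) {
        cases := []struct {
            name string
            in   string
            want int
        }{
            {"simple", "7", 7},
            {"negative", "-3", -3},
            {"leading space", " 7", 0}, // Atoi rejects spaces: returns 0 and an error
        }
        for _, c := range cases {
            // Each case fails under its own name with its own message,
            // so the log pinpoints exactly which scenario broke.
            t.Run(c.name, func(t *testing.T) {
                if got, _ := strconv.Atoi(c.in); got != c.want {
                    t.Errorf("Atoi(%q) = %d, want %d", c.in, got, c.want)
                }
            })
        }
    }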


>Test sets are user-friendly if they dump a single specific error message for a single specific failed assertion, thus allowing developers to quickly pinpoint root causes by simply glancing through the test logs.

As if this actually happens in practice, regardless of multiple or single asserts. Anything that isn't non-trivial will at most tell you what doesn't work, but it won't tell you why it doesn't work. Maybe allowing an educated guess when multiple tests fail to function.

You want test sets to be user-friendly? Start by toning down all this dogmatism and listening to people about why they dislike writing tests. We're pushing 'guidelines' (really more like rules) while individuals think to themselves, 'F this, Jake's going to complain about something trivial again, and we know these tests do jack-all because our code is a mess and doing anything beyond this simple algorithm is hell in a handbasket.'

These discussions are beyond useless when all people do is talk while doing zero to actually tackle the issues of the majority not willing to write tests. "Laziness" is a cop-out.


> Depending on language, framework and obviously usage, assertions might not be as informative as providing the basic functionality of failing the test - and that's it.

Well, avoiding ecosystems where people act dumb is a sure way to improve one's life. For a start, you won't need to do stupid things in reaction to your tools.

Yes, it's not always possible. But the practices you create for surviving it are part of the dumb ecosystem survival kit, not part of any best practices BOK.


> ... where this is not only possible, but the common case.

Couldn't not read that in Peter Sellers' voice https://m.youtube.com/watch?v=2yfXgu37iyI&t=2m36s

> ... to find a 630 lines long test "case" with 22 nondescript assertions along the way.

This is where tech team managers are abdicating their responsibility and job.

It's the job of the organization to set policy standards to outlaw things like this.

It's the job of the developer to cut as many corners of those policies as possible to ship code ASAP.

And it's the job of a tech team manager to set up a detailed but efficient process (code review sign offs!) that paper over the gap between the two in a sane way.

... none of which helps immediately with a legacy codebase that's @$&@'d, though.


> It's the job of the developer to cut as many corners of those policies as possible to ship code ASAP.

I can't tell if this is supposed to be humor, or if you actually believe it. It's certainly not my job as a developer to ship worse code so that I can release it ASAP. Rather, it's my job to push back against ASAP where it conflicts with writing better code.


You are not most developers.

And furthermore, you are not the developer most non-tech companies want.

Those sorts of companies want to lock the door to the development section, occasionally slide policy from memos under the door, and get software projects delivered on time, without wasting any more thought on how the sausage gets made.


22 assertions in a test is a lot better than 22 separate tests that fail for the same reason.


With 22 separate tests you have the possibility of knowing that only a subset of them fail. Knowing which fail and which pass may help you debug.

In Go, in general, a failing check marks the test as failed and continues, rather than stopping the test early, so you can tell which of those 22 checks failed. Other languages may have the option to do something similar.

https://pkg.go.dev/testing#T.Fail
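A small sketch of the difference: t.Errorf calls Fail under the hood, so the test is marked failed but keeps running and every failing check gets its own file:line entry in the log, while t.Fatalf calls FailNow and stops the test at the first problem.

    package checks

    import "testing"

    func TestManyChecks(t *testing.T) {
        got := map[string]int{"a": 1, "b": 2, "c": 3}

        // Errorf marks the test failed and continues, so several broken
        // checks would each be reported with their own line.
        if got["a"] != 1 {
            t.Errorf("a = %d, want 1", got["a"])
        }
        if got["b"] != 2 {
            t.Errorf("b = %d, want 2", got["b"])
        }
        // Fatalf would stop the test right here instead of continuing.
        if got["c"] != 3 {
            t.Fatalf("c = %d, want 3", got["c"])
        }
    }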


They are the same. I don't care, as I'll fix them one at a time, and if the fix happens to fix more than one, great.


Fixing one at a time like that is a good way to get into an endless cycle. Zoom out a bit and make a plan before you start coding.


That should be understood without saying. Even with a plan I'm going to fix one at a time.


Tech debt can often be fixed by more tech debt instead of fixing the root problem.



