
I've heard a lot of people say this but I don't find it to be true. Realistic and loosely coupled tests that require minimal maintenance improve my productivity much, much more than a fast build time does.

J.B. Rainsberger's proscriptions in particular sound like a recipe for creating a massive amount of test code, a huge maintenance headache, and tests that are not at all realistic.




I'm not sure what you mean by "Realistic" tests. Perhaps you mean integrated tests that exercise, say, the entire stack from handling the HTTP request down to the database, if we're talking about a REST API.

If you don't care much about covering large portions of the state space and your tolerance for defects is high, then I'm sure your claim about improved productivity is true.

My tolerance for defects is low, which compels me to try and cover the state space more - happy paths, unhappy paths, etc. I've tried doing this with integrated tests, and found it to be very difficult to do in a timely manner. When I switched to using DI, coding to interfaces, and using mocks across architectural boundaries, my ability to cover the state space went up, and my time to deliver code with few defects went down.

I think this is because often the code I care about exercising is limited to a small section of the call stack. In an integrated test, I have to set up too much state and have the machine do too much irrelevant work just to verify this small portion of the code. With DI, interfaces, and mocks, I can isolate just this behavior much more quickly.
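A minimal sketch of that style, assuming Python and unittest.mock (OrderService and PaymentGateway are made-up names for illustration): the service depends on an interface injected through the constructor, so the unhappy path can be driven without standing up any real infrastructure:

    # Sketch of DI + coding to an interface + mocking at the boundary.
    # OrderService and PaymentGateway are illustrative names only.
    import unittest
    from unittest.mock import Mock

    class PaymentGateway:
        """Architectural boundary: the real implementation calls an external API."""
        def charge(self, order_id, amount):
            raise NotImplementedError

    class OrderService:
        def __init__(self, gateway):
            self.gateway = gateway  # injected, not constructed here

        def place_order(self, order_id, amount):
            if amount <= 0:
                return "rejected"
            return "paid" if self.gateway.charge(order_id, amount) else "declined"

    class OrderServiceTest(unittest.TestCase):
        def test_declined_charge_is_reported(self):
            gateway = Mock(spec=PaymentGateway)
            gateway.charge.return_value = False  # unhappy path, no real gateway needed
            self.assertEqual(OrderService(gateway).place_order("o-1", 500), "declined")
            # The mock also verifies the contract: exactly one charge was requested.
            gateway.charge.assert_called_once_with("o-1", 500)

    if __name__ == "__main__":
        unittest.main()

Covering the rejected and declined branches here takes milliseconds; driving the same branches through a real HTTP stack and payment provider would take far more setup.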

That leaves coupling and maintenance. This is a valid concern in the mockist style, but I haven't found it to be a limiting factor in maintaining my code. I think it comes down to testing the right things. If you're using mocks to verify a contract between two objects, and you want to change that contract, naturally your tests for the contract will break. In that case, I think it's fine to just delete them and write new tests that establish your new contract. This kind of approach isn't right in all situations, but mockist-style testing has helped me write some stable code that would otherwise have been a real pain to test due to external dependencies.

EDIT: That is not to say that your tolerance for defects is high, but rather that productivity is relative to your goals. Sometimes the time spent preventing defects and keeping maintainability high isn't worth it if the repercussions don't translate to dollars and cents.


>I'm not sure what you mean by "Realistic" tests.

* Where possible, use the real thing - that is, use an actual database in preference to, e.g., a mock of the object you use to interact with the database (see the sketch after these bullets).

* Where you need to use a mock (when the real thing is too expensive to use), make it a realistic mock.
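A rough sketch of the first bullet, assuming a Python codebase and using an in-memory sqlite3 database as a stand-in for whatever real database the project uses:

    # "Use the real thing": run real SQL against a real engine (sqlite3
    # in-memory here as a stand-in) instead of mocking the DB wrapper object.
    import sqlite3
    import unittest

    def save_user(conn, email):
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.commit()

    class UserStorageTest(unittest.TestCase):
        def setUp(self):
            self.conn = sqlite3.connect(":memory:")
            self.conn.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")

        def test_duplicate_emails_are_rejected_by_the_real_schema(self):
            save_user(self.conn, "a@example.com")
            # A hand-rolled mock would happily accept this; the real engine refuses.
            with self.assertRaises(sqlite3.IntegrityError):
                save_user(self.conn, "a@example.com")

    if __name__ == "__main__":
        unittest.main()

The unique-constraint failure is exactly the kind of behavior a fake tends to get wrong or omit.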

>Perhaps you mean integrated tests that exercise, say, the entire stack from handling the HTTP request down to the database, if we're talking about a REST API.

Exactly, but if, say, there were some algorithmic code in there that could be exercised realistically without going through the HTTP stack, then the HTTP stack isn't necessary.

>If you don't care much about covering large portions of the state space

Code coverage is entirely orthogonal to the type of test you are doing.

>your tolerance for defects is high, then I'm sure your claim about improved productivity is true.

You appear to be fairly confused about the distinction between test coverage and modes of testing.

>My tolerance for defects is low

If you put no emphasis on test realism then I suspect that your tolerance will naturally have to be quite high.

>I've tried doing this with integrated tests, and found it to be very difficult to do in a timely manner.

I found this too sometimes. It's a tooling issue. Integration test tooling is more expensive to build initially but it tends to be much more reusable than unit test tooling. SMTP client stubs may be quicker to build but relatively useless in future projects that use different libraries (sometimes even different versions of the same library) whereas a mock SMTP server is useful forever.
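To illustrate that kind of reusable tooling, here is a rough sketch of an in-process fake SMTP server (Python stdlib only, speaking just enough of the protocol to accept a message); because it sits behind a real socket, any client library in any project can be pointed at it:

    # Sketch of reusable integration tooling: a tiny fake SMTP server that
    # records what it receives. Port 2525 is an arbitrary choice.
    import socketserver
    import threading

    MESSAGES = []  # (sender, recipients, data) tuples recorded by the fake

    class FakeSMTPHandler(socketserver.StreamRequestHandler):
        def handle(self):
            self.wfile.write(b"220 fake-smtp ready\r\n")
            sender, recipients = None, []
            while True:
                line = self.rfile.readline().decode("ascii", "replace").strip()
                if not line:
                    return
                cmd = line.upper()
                if cmd.startswith(("EHLO", "HELO")):
                    self.wfile.write(b"250 ok\r\n")
                elif cmd.startswith("MAIL FROM"):
                    sender = line.split(":", 1)[1].strip()
                    self.wfile.write(b"250 ok\r\n")
                elif cmd.startswith("RCPT TO"):
                    recipients.append(line.split(":", 1)[1].strip())
                    self.wfile.write(b"250 ok\r\n")
                elif cmd.startswith("DATA"):
                    self.wfile.write(b"354 end with <CRLF>.<CRLF>\r\n")
                    body = []
                    while True:
                        data_line = self.rfile.readline().decode("ascii", "replace")
                        if data_line.rstrip("\r\n") == ".":
                            break
                        body.append(data_line)
                    MESSAGES.append((sender, recipients, "".join(body)))
                    self.wfile.write(b"250 accepted\r\n")
                elif cmd.startswith("QUIT"):
                    self.wfile.write(b"221 bye\r\n")
                    return
                else:
                    self.wfile.write(b"250 ok\r\n")

    def start_fake_smtp(port=2525):
        server = socketserver.ThreadingTCPServer(("127.0.0.1", port), FakeSMTPHandler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return server  # call server.shutdown() in test teardown

A test can then send mail through the application's real SMTP client (smtplib or anything else) against 127.0.0.1:2525 and assert on MESSAGES, regardless of which client library the project happens to use.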

>In an integrated test, I have to set up too much state and have the machine do too much irrelevant work just to verify this small portion of the code.

I have no problem with my tests chewing up 10,000% more CPU than yours and taking an extra 5 minutes if they've got even a 2% higher chance of finding bugs. CPU time is cheap, my time is expensive, bugs are expensive. Easy trade off.

>That leaves coupling and maintenance. This is a valid concern in the mockist style, but I haven't found it to be a limiting factor in maintaining my code.

Still means you're writing and maintaining more code. That means more bugs and more work.

>I think it comes down to testing the right things. If you're using mocks to verify a contract between two objects, and you want to change that contract, naturally your tests for the contract will break.

Yup, and since changing the contracts between different subsystems is the most critical and important part of refactoring code, you've just tossed out a large chunk of the main benefit of having tests - safe refactoring. This is the worst aspect of unit testing IMO - it cements technical debt, because refactoring changes contracts, which turns the tests red.


> Code coverage is entirely orthogonal to the type of test you are doing.

I did not say they were equivalent. Rather, I made a connection to the effort required to cover the state space in a given mode of testing. I've found it takes more integration tests to cover the state space than it does unit tests, given the multiplicative nature of code paths in each layer (e.g. three branches in one layer and four in the next means 3 x 4 = 12 integrated paths, versus 3 + 4 = 7 cases when each layer is tested in isolation). J.B. explains this in his talk, and I buy his argument, since I've experienced it myself.

For example, you could collapse several different types of failure at a lower layer into one code path in a higher layer using an exception, and verify that the HTTP layer returns the same thing when that exception happens. In this situation, you avoid having to set up the state necessary to trigger each of the errors that were collapsed into that exception, which may live 3 or 4 layers deeper. This means less test code and less effort on my part.
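A small sketch of that pattern (StorageError, UserRepository and handle_get_user are invented names): several distinct low-level failures are wrapped into one domain exception, and the HTTP layer needs only one test per mapping rather than one per underlying failure:

    # Sketch of collapsing several lower-layer failures into one code path.
    import socket
    import sqlite3

    class StorageError(Exception):
        """Single domain-level failure the lower layers collapse into."""

    class UserRepository:
        def __init__(self, conn):
            self.conn = conn

        def find(self, user_id):
            try:
                return self.conn.execute(
                    "SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchone()
            except (sqlite3.Error, socket.timeout) as exc:
                # Bad schema, lost connection, timeout... all become one thing.
                raise StorageError(str(exc)) from exc

    def handle_get_user(repo, user_id):
        """HTTP-ish handler: returns (status_code, body)."""
        try:
            row = repo.find(user_id)
        except StorageError:
            return 503, {"error": "storage unavailable"}
        if row is None:
            return 404, {"error": "not found"}
        return 200, {"id": row[0], "email": row[1]}

    # One test per mapping suffices at this layer; a stub that raises
    # StorageError stands in for every deep failure it represents.
    class FailingRepo:
        def find(self, user_id):
            raise StorageError("boom")

    def test_storage_failures_map_to_503():
        status, _ = handle_get_user(FailingRepo(), 42)
        assert status == 503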

> You appear to be fairly confused about the distinction between test coverage and modes of testing.

Nope, as explained above, the effort required to cover the state space is directly influenced by the mode of testing. For example, if you tested via typing on the keyboard with a pencil, you'd have to expend more effort to cover your state space. Of course, integration isn't as bad as pencil-testing, but the point is that one is easier than the other with respect to state space coverage.

Regarding reusable tooling - I've found this to be true as well. But tooling doesn't solve the whole problem. My main point is the amount of setup necessary to trigger a certain behavior, which tends to be too time-consuming for me to justify. You can use tooling here as well, perhaps with data builders and object mothers. But that is also too much setup effort for my tastes.

> I have no problem with my tests chewing up 10,000% more CPU than yours and taking an extra 5 minutes if they've got even a 2% higher chance of finding bugs. CPU time is cheap, my time is expensive, bugs are expensive. Easy trade off.

Here you seem to assert that integration tests are more likely to find bugs. This is highly dependent on your skill in designing testable code that can be exercised without irrelevant portions of the stack. However, you do have a point that integration tests can find classes of bugs that unit tests may not (environmental setup, connection issues, etc.). I don't unit test those things of course.

Regarding CPU and the 5 minute difference - try 25 minutes difference. I have seen this in practice, and at that scale it really slows down the team. You can ease the pain with more machines, but I'd rather have focused, fast tests that can run right on my machine in as short a time as possible. I value the fast feedback loop.

> Still means you're writing and maintaining more code. That means more bugs and more work.

I haven't found this to be a problem. Sometimes I view messages between objects as behavior, so if the tests are inspecting that then I would actually want them to break if I change messages between objects. I've often found that if this is a hindrance to refactoring, there are problems in the code itself, not just the tests.

>Yup, and since changing the contracts between different subsystems is the most critical and important part of refactoring code...

I don't think that's true. If I'm changing contracts so often that maintaining them is a burden, I've likely missed an abstraction opportunity. Instead of blaming unit tests, I just fix the problem. Further, refactoring is not equivalent to changing subsystems. It encompasses many more useful activities that improve your code. And if you take the stance that object interactions are behavior (sometimes, when useful), then changing them is not a refactor but a rework. My favorite article on refactor vs. rework: http://www.daedtech.com/rewrite-or-refactor/

Thanks for the spirited debate!


>I did not say they were equivalent. Rather, I instead made a connection to the effort required to cover the state space in a given mode of testing. I've found it takes more integration tests to cover the state space than it does with unit tests, given the multiplicative nature of code paths in each layer.

This is more about the level of testing than their nature. You can have high level unit tests (not common, but you can) and you can also have low level integration tests (I have many).

I agree that the "combinatorial explosion" Rainsberger talked about is real, sort of (not to integration test, but to automated end to end tests) but it comes with many caveats:

* If your application is, by and large, gluing together a lot of different libraries that are tested independently of your application, then you effectively have a test pyramid even if you only write tests for the top layer, because the tests for the layers beneath are all written by somebody else. I.e. if you write any kind of app on top of the Linux kernel, you are relying upon the kernel being well tested, but you're not the one doing that testing directly.

* Those components in the 'application pyramid' that you are stringing together do require testing independently, but IMO it's still better to use integration testing all the way down, because all of those components are integrating with something else.

* I've found great success mitigating the effects of the combinatorial explosion using randomized property-based testing (see the sketch after this list). Even with slow tests you can get very, very far with this.

* The combinatorial explosion can also be mitigated with stronger typing and more frequent and stringent sanity checking which shuts down invalid code paths.
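For what it's worth, a minimal sketch of the property-based approach mentioned above, using the Hypothesis library and a made-up normalize_whitespace function as the code under test; the library generates and shrinks inputs, so a couple of properties cover a large slice of the state space:

    # Sketch of randomized property-based testing with Hypothesis (run via pytest).
    # normalize_whitespace is a made-up example function under test.
    from hypothesis import given, strategies as st

    def normalize_whitespace(text):
        """Collapse runs of whitespace into single spaces and trim the ends."""
        return " ".join(text.split())

    @given(st.text())
    def test_normalization_is_idempotent(text):
        once = normalize_whitespace(text)
        # Property: applying the function a second time changes nothing.
        assert normalize_whitespace(once) == once

    @given(st.text())
    def test_result_has_no_leading_or_trailing_whitespace(text):
        result = normalize_whitespace(text)
        assert result == result.strip()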

>For example, you could collapse several different types of failure at a lower layer into one code path in a higher layer using an exception, and verify that the HTTP layer returns the same thing when that exception happens. In this situation, you avoid having to set up the state necessary to cause the exception that the errors have been collapsed to, which may be 3 or 4 layers deeper. This means less test code

I agree with this but I still see it as being orthogonal to the kind of test you use to test each layer.

>Nope, as explained above, the effort required to cover the state space is directly influenced by the mode of testing. For example, if you tested via typing on the keyboard with a pencil, you'd have to expend more effort to cover your state space.

>Of course, integration isn't as bad as pencil-testing, but the point is that one is easier than the other with respect to state space coverage.

I personally found that the up-front investment for integration tests is usually higher, since you have to put all the initial infrastructure in place - but once you've gotten past an inflexion point, integration tests are typically easier to write.

For unit tests the investment is roughly constant.

>Here you seem to assert that integration tests are more likely to find bugs. This is highly dependent on your skill in designing testable code that can be exercised without irrelevant portions of the stack

Not really. They find more bugs because they test more realistically. When you are testing against a fake model of a database with fake data that you built for a unit test you are going to inevitably miss stuff that you will pick up on if you use a real database with real data. That's just life.

I do think that the less your code is about linking together different pieces of code and systems and the more it is about 'pure algorithmic calculation', the less you have to face this problem. However, with most code in real life the hard part is typically about integrating things together - most business problems aren't facebook's spam filter.

>However, you do have a point that integration tests can find classes of bugs that unit tests may not (environmental setup, connection issues, etc.). I don't unit test those things of course.

Right. I actually track and classify bugs that I see day to day and these are typically way more common than logical bugs. Moreover, it's not like integration tests don't catch logical bugs as well. They do.

>Regarding CPU and the 5 minute difference - try 25 minutes difference. I have seen this in practice, and at that scale it really slows down the team. You can ease the pain with more machines, but I'd rather have focused, fast tests that can run right on my machine in as short a time as possible. I value the fast feedback loop.

I do as well, but I find that until individual tests take > 30 seconds, or regression test suite breakages become very common, the cumulative effect on my productivity is very minor.

If an entire regression test suite takes an hour it doesn't bother me in the slightest. Hell, I used to have a 32 hour regression test suite once (long story) and even that didn't bother me that much - a release schedule of once a day with a small (~10%) chance of skipping a day when there were failures was more than acceptable for the business.

If that project had unit tests they would have caked themselves around a bunch of really horrible API contracts between module layers and they would have required a rewrite pretty much every time we rewrote anything. Faster to run, yes, but vastly more work to maintain.

>I don't think that's true. If I'm changing contracts so often that it's a burden to maintain, I've likely missed an abstraction opportunity. Instead of blaming unit tests, I just fix the problem.

IME it likely means somebody who came before you missed an opportunity. Since I largely work on projects that existed years before I joined them, I don't see the solution of "just fix the API" as viable. One fix leads to another, which leads to another, and before you know it, it's been weeks or months and you haven't delivered anything. Refactoring needs to be staggered.

It's better to start out with the presumption that you will benefit more from not locking down your API contracts than you will from conserving CPU horsepower.

>Thanks for the spirited debate!

Likewise.



