Java Mutation Testing (pitest.org)
88 points by onderkalaci on May 6, 2015 | 30 comments



I can't believe I've never heard of this. It looks really useful.

Thanks for sharing. I will play with it today!

Relevant plugin for IntelliJ: https://plugins.jetbrains.com/plugin/7119?pr=idea


I'd recommend using one of the build tool integrations first - they're part of the core project. The IDE integrations still needed a bit of work last time I looked at them.


That's an interesting concept - testing the tests themselves.


I used to use a very convoluted coverage setup with my Rails apps to ensure that coverage was only counted for the parts directly under test. To clarify: it's pretty easy in a Rails app to write an integration test that hits one endpoint and uses every model.

Because end-to-end integration tests don't actually assert anything about model properties, it's incorrect to record model coverage during such tests. So each controller test only recorded coverage for the controller it was testing (and nothing else), each model test only recorded coverage for the model it was testing (and nothing else), etc.

Of course, in a more IoC/DI-type system (Spring, etc.) well-written tests don't interact with other objects in the first place: stubs or mocks are injected. In that case it's easier to ensure a test for a given component only exercises that one component, but you still have to verify that your assertions are meaningful for the coverage to ultimately mean anything.

So I guess what I'm saying is even with all those precautions and thought, a tool like this is extremely useful to tell you "uh, hey, this method you think exercises all your code isn't actually asserting anything meaningful about your code's behavior". More of this, please!


If you're working with Ruby, mutant is pretty good: https://github.com/mbj/mutant


Yes, writing tests is easy; however, writing rigorous tests... seems like one of those things where you can't really trust yourself not to say, 'Eh, it's rigorous enough'.


A problem with this is that sometimes the exact behaviour is arbitrary, and changing it is OK.

For instance, changing the constant to another prime in the "classic" hashcode implementation (repeatedly multiply by a prime and add the next field) will (probably) not trigger any (well-written) tests, and indeed generally won't be detrimental at all, but will be flagged by this sort of test.
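For reference, a minimal sketch of that classic pattern (the class and its fields are hypothetical):

    // A sketch of the "classic" hashcode pattern described above;
    // the Person class and its fields are made up.
    class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int hashCode() {
            int result = 17;                         // arbitrary non-zero seed
            result = 31 * result + name.hashCode();  // multiply by a prime, add next field
            result = 31 * result + age;
            return result;
            // A mutant that swaps 31 for another prime still satisfies the
            // hashCode contract (equal objects hash equally), so a
            // well-written test suite has no way to kill it.
        }
    }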


This is what is known as an "equivalent mutation".

Along with high computational cost, it's one of the things identified in academic research as preventing the widespread use of mutation testing. There is no general method for distinguishing an equivalent mutation from a normal surviving one except for getting a human to have a look at it.

Having built pitest to address the concerns around computational cost, I was pleasantly surprised to find that in practice equivalent mutations are not much of a problem.

This isn't entirely by accident - the default set of mutation operations is carefully designed to make equivalent mutations unlikely (it can't guarantee not to create them, but it makes them as unlikely as possible).

There's a trade-off here. Pitest has a smaller set of operators than a lot of research-focused systems. A larger set of operators would catch more issues, but would also create a larger proportion of equivalent mutants (and take longer to run).

There are more operators you can enable if you wish to change this balance - an operator that changes constants, as you describe, is one of them.
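With the Maven plugin it looks something along these lines (a sketch - the mutator names are from memory, so check them against the list at http://pitest.org/quickstart/mutators/):

    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <version>1.1.5</version>
      <configuration>
        <!-- keep the default operators, add the inline-constant mutator -->
        <mutators>
          <mutator>DEFAULTS</mutator>
          <mutator>INLINE_CONSTS</mutator>
        </mutators>
      </configuration>
    </plugin>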

I rarely encounter equivalent mutants using the default operators and I know of some rollouts of pitest where they break the build on anything less than 100% mutation coverage.

I have no figures to back this up, but I strongly suspect the % of equivalent mutants will be highly dependent on coding style and the domain in which the code operates.


> I have no figures to back this up, but I strongly suspect the % of equivalent mutants will be highly dependent on coding style and the domain in which the code operates.

Agreed. In everything I've written, at least, I can identify multiple places where such equivalent mutants do exist, even with just the default operators. (In particular, hashcode methods - there are very few mutations that break the contract of a hashcode method, though most of them remove entropy.) But I can easily see that not being the case with other coding styles.

Are the mutations done documented anywhere? I had to look through the source to see what's done.


The mutators are documented at http://pitest.org/quickstart/mutators/


I'm not an expert on this topic, but I don't think those are the kinds of changes generated by mutation testing (or at least, not the only ones). A more interesting example of a mutation is changing a "<" to a ">" in a condition. If your tests don't detect this, you're in trouble...
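For example (a hypothetical sketch of such a mutation and a test that kills it):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // A hypothetical sketch: pitest's conditional mutators can flip the
    // "<" below (e.g. to ">"), turning min into max.
    class MathUtil {
        static int min(int a, int b) {
            return a < b ? a : b; // mutant: a > b ? a : b
        }
    }

    public class MathUtilTest {
        @Test
        public void returnsTheSmallerValue() {
            // Passes against the original, fails against the mutant -
            // so the mutant is "killed".
            assertEquals(3, MathUtil.min(3, 7));
        }
    }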


There are plenty of places where inverting a condition won't matter. Anything that just requires a consistent ordering, for one.


Your tests should be written in a way that they test there is a consistent ordering, if that's a relevant property for you. If they don't, you have a problem. If they do, I guess the mutation framework will merely point out your tests don't detect some changes; it doesn't necessarily mean they are bad tests, it's merely a "smell" that deserves further investigation.

(By the way, I think you're overstating your case when you claim "there are plenty of places" where it doesn't matter. I think in the vast majority of cases, if you change a "<" to a ">", you introduce a bug!)


I agree that in the majority of cases it will cause bugs. But what I'm saying is that there are cases where it can legitimately not cause a bug.

The problem I have is this: this testing framework explicitly states the following:

> The quality of your tests can be gauged from the percentage of mutations killed

So you write your tests to check that there is a consistent ordering; the tool changes the "<" to a ">", finds that all tests still pass (because, as you said, a consistent ordering is the only thing that matters in this case), and considers your tests lower-quality because of that.
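To make that concrete (a hypothetical sketch):

    import java.util.Comparator;

    // A hypothetical sketch: this comparator only has to impose *some*
    // consistent total order (say, to canonicalise unordered pairs),
    // not any particular direction.
    class ArbitraryOrder implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            int ha = a.hashCode(), hb = b.hashCode();
            return ha < hb ? -1 : ha == hb ? 0 : 1;
            // Mutating "<" to ">" yields the exact reverse order, which is
            // still a consistent total order. Tests that (correctly) assert
            // only consistency - compare(a, b) == -compare(b, a), plus
            // transitivity - pass either way, so the mutant survives.
        }
    }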

This is a flaw that most mutation testing - and, for that matter, test-driven design - suffers from.

For another flaw: many methods end up structured as a series of shortcut "fast paths" at the start, with a slow path at the end that does slow but exhaustive checking.

For example, if you're checking that two objects are equal you may insert a check at the start to see if they are the same pointer, and if so return `true`. If you're doing division in software you may check if you're dividing by 1 instead of doing the computation. That sort of thing.

Any mutation that introduces false negatives into the shortcuts (but no false positives) won't be flagged by mutation testing.
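A hypothetical sketch of that pattern:

    import java.util.Arrays;

    // A hypothetical sketch of the fast-path/slow-path pattern
    // described above.
    class Matrix {
        private final double[][] cells;

        Matrix(double[][] cells) {
            this.cells = cells;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true; // fast path: same reference, skip the scan
            }
            if (!(other instanceof Matrix)) {
                return false;
            }
            // slow path: exhaustive element-by-element comparison
            return Arrays.deepEquals(cells, ((Matrix) other).cells);
            // A mutant that stops the fast path from ever firing (a false
            // negative for the shortcut) just falls through to the slow
            // path and returns the same answer, only slower - so no
            // behavioural test can kill it.
        }

        @Override
        public int hashCode() {
            return Arrays.deepHashCode(cells);
        }
    }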


I have used Pitest at work; it is very good, and on more than one occasion it unearthed deficient tests.

Unfortunately, we had to remove it from our build. Our CI pipeline uses VMs that were not provisioned with this type of testing in mind, and pitest ended up slowing the build enough to make it painful. If we could get past this, I would turn it on in the morning.


It's too bad that you had to stop using it.

I've never used one of these tools, but I knew they existed. Just today I was discussing "testing the tests" with a coworker. In my opinion, where I work we write lots of incomplete/illogical tests, sometimes bordering on cargo culting. Aside from code coverage, which is flawed, we have no real measure of whether the tests we write are effective or not.


There is a tool to detect duplicate tests which many have found useful: http://ortask.com/testless/

For other kinds of test quality, try Mutator: http://ortask.com/mutator/


Thanks for the links!

I hate to sound negative, but something about that website seems dodgy. I'm unconvinced duplicate/overlapping tests -- while obviously undesirable -- have a direct correlation with code quality. Unfortunately, in order to read their "papers" where they elaborate on this, I have to register :/

I'd rather use something open source like pitest.

(Again, thanks for the links! I don't want to sound too negative)


Did you try using the history file option?


We generate a new build environment for every check-in, so managing a history file would have been quite difficult.


What about checking the history file into its own repo? Clone it as part of the build-environment setup, and commit/push after validation. You can even use this across a group of people and share each other's mutations.

Or even simpler, why not have build actions copy it to/from a specific location outside the build target dir? You could even make it a shared network drive, if you didn't care about the possibility of losing someone's changes if two people run tests at the same time.
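With the Maven plugin that might look something like this (a sketch; the shared path is hypothetical, and the parameter names are worth double-checking against the pitest-maven docs):

    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <version>1.1.5</version>
      <configuration>
        <!-- read/write the incremental-analysis history from a
             location that survives across build environments -->
        <historyInputFile>/mnt/shared/pitest/history.bin</historyInputFile>
        <historyOutputFile>/mnt/shared/pitest/history.bin</historyOutputFile>
      </configuration>
    </plugin>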


There seems to be an sbt plugin by the original creator that has not been updated since last year: https://github.com/hcoles/sbt-pit

Does anyone know if the current version of pitest already works well with Scala?


So, if I read this correctly: "This filter will not work for tests that utilise classes via interfaces, reflection or other methods where the dependencies between classes cannot be determined from the byte code."

If you are using interfaces to inject and mock things, it cannot test your code?

This sounds really cool, but most of the stuff we do is done with interfaces...


No - pitest has no problems with the use of interfaces, reflection etc.

The sentence you quoted relates to an optional feature that allows the tests that will be run against a mutated class to be limited to those within a certain "distance" (i.e. number of method calls) from it.

The feature is little used, but is useful in some very specific circumstances.

Even if you do enable it, it would only cause a problem if the class under test was referred to within the test only by some interface that it implemented (which would be very unusual). The fact that the class's dependencies were declared as interfaces would cause no problems.


Unit tests should have exactly one system under test. That has to be a class, not an interface.

You may mock dependencies that may/may not be interfaces. This is safer than using concrete dependencies whose behaviors may change once the fuzzer does its thing.

The behaviors of the classes that implement the mocked dependencies have nothing to do with the system under test.

So if you're writing unit tests that test exactly one unit, you should be good to go.
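A hypothetical sketch of what that looks like (the class and interface names are made up):

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.Test;

    // A hypothetical sketch of "exactly one system under test":
    // PriceService is the SUT; its TaxRates dependency is an interface
    // that gets mocked, so mutations in any TaxRates implementation can
    // never leak into this test.
    interface TaxRates {
        double rateFor(String region);
    }

    class PriceService {
        private final TaxRates rates;

        PriceService(TaxRates rates) {
            this.rates = rates;
        }

        double gross(double net, String region) {
            return net * (1 + rates.rateFor(region));
        }
    }

    public class PriceServiceTest {
        @Test
        public void addsTaxToTheNetPrice() {
            TaxRates rates = mock(TaxRates.class);
            when(rates.rateFor("DE")).thenReturn(0.19);

            assertEquals(119.0, new PriceService(rates).gross(100.0, "DE"), 0.001);
        }
    }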


Regarding one system under test: is there a Java library (preferably with Maven support) that would check that a unit test performs tests on a single class, with all the rest mocked (e.g. with Mockito, or with custom anonymous classes)?


Depending on who you talk to, a unit isn't necessarily a single class.


Actually this is a really good talking point that's often the start of many interesting "discussions":

Namely, what happens when you're done with the test-code-repeat cycle for a system under test and you want to make it more architecturally sound / OO / etc.

You typically might end up doing a refactor in which you extract classes from the original system under test and colocate common functionality into new class(es).

In this way you still have the same coverage as before... But you're actually testing multiple classes as a unit.

Some folks would argue you need to split the tests out. Others would say "the coverage is there, what's the point?".

Unless I need to do something, I'm not going to do it.

In playing with pitest, it looks like the refactorings might introduce some fuzzing that needs to be considered in the original (and refactored) SUT depending on how much conditional logic you're moving around.

Arghh. There goes my evening. Will be playing with this after work now :)


If you are the site owner, please correct:

Its fast, ...

with

It's fast,...


There are also other great mutation testing tools for other languages, such as Mutator: http://ortask.com/mutator/



