Code Coverage vs Test Coverage; Subjectivity and Usefulness

It’s really surprising how many people believe that code coverage and test coverage are the same thing. I don’t know where this confusion stems from, but from scouring the internet, it seems common for people to use “code coverage” and “test coverage” interchangeably, probably subconsciously too. They are not the same. Let me use one of my son’s toys to explain…

My son, Angus, has a push/walker toy that he’s had for a while. It’s really helped him get his balance – he’s confidently running around at speed now at 17 months old, and no doubt this push toy has helped him with that.

Anyway… The push toy has some shaped holes cut out on the top and sides, and comes with blocks that fit in the holes. He loves this toy, and from watching him playing with this toy over the past few months, I’ve realized that it provides me with a perfect example for explaining the differences and subjectivity between code coverage and test coverage.

The toy

Here’s a picture of Angus’ toy:

Angus’ push toy with block shapes to insert into the holes

From the picture you can see the red rectangular block. That red block fits into the rectangular hole on the side of the toy.

Using this toy as an analogy, if we look at the rectangular hole as relating to code that represents a feature, and the red block representing data that a user can input relating to that specific feature, then pushing the block through the hole would essentially cover that code. It could be seen as 100% code coverage. We’ve pushed the block through the hole, therefore we’ve exercised 100% of that feature’s code.

But if we look closer at the block, you’ll see that there are actually 16 different ways that you could pass this block through the same hole.

16 different ways to insert the block

The 6 sides of the red block…

The block has 6 sides, which I’ve labelled in the image above. And the block can fit into its hole from each side, in multiple ways:

  • Top side facing up:
    • Edge 1 side being inserted into the hole
    • Edge 2 side being inserted into the hole
    • Edge 3 side being inserted into the hole
    • Edge 4 side being inserted into the hole
  • Bottom side facing up:
    • Edge 1 side being inserted into the hole
    • Edge 2 side being inserted into the hole
    • Edge 3 side being inserted into the hole
    • Edge 4 side being inserted into the hole

But in addition to this, the block can be passed in each of these ways in two directions too:

  • Outside in (meaning you push the block through the hole so it lands inside the toy)
  • Inside out (meaning that you put your hand inside the toy and push the block through the hole so it lands outside of the toy on the floor).

This gives us a total of 16 different ways that you can pass the block through the hole that it’s made for.

So in terms of tests, with 16 known tests now, our single initial test that potentially gave us 100% code coverage actually only has a test coverage of 6.25% (1 test out of 16).

And here’s another catch, this time with test coverage: test coverage percentages only relate to your known test ideas. There are 16 tests that we have thought of so far, so you might think running all 16 known tests gives us 100% test coverage… But there are inevitably more tests that we haven’t thought of yet.

Take the toy for example. It has other holes that have other shapes. And some of these other shapes also fit through the previous rectangular hole. From multiple sides too…

The cube hole at the top of the toy

See the square hole on the top of the toy? The cubed block that fits in that square hole also fits into the rectangular hole…
That cube block means that there are a further 48 possible ways to pass the second block through the rectangular hole.

You can see clearly that coverage percentage metrics are extremely subjective here, and relate purely to a snapshot in time, based on what is known at that moment. They don’t in fact tell you anything about the quality of your software, or the quality of your testing.
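To see how much the number depends on what is known, here is a rough sketch of the same 16 executed tests measured before and after the cube is discovered. The breakdown of the 48 cube tests (6 leading faces, 4 rotations, 2 directions) is my assumption about how that figure is reached, and `coverage` is a hypothetical helper.

```python
# Known ways for the rectangular block: faces up x leading edges x directions.
RECT_BLOCK_WAYS = 2 * 4 * 2   # 16

# Assumed breakdown for the cube: leading faces x rotations x directions.
CUBE_BLOCK_WAYS = 6 * 4 * 2   # 48


def coverage(tests_run, known_test_ideas):
    """Test coverage as a percentage of the test ideas known right now."""
    return 100 * tests_run / known_test_ideas


before = coverage(16, RECT_BLOCK_WAYS)                    # only the red block is known
after = coverage(16, RECT_BLOCK_WAYS + CUBE_BLOCK_WAYS)   # the cube has been discovered

print(before)  # 100.0
print(after)   # 25.0
```

Nothing about the testing changed between the two lines; only the list of known test ideas grew.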

If you and I were testing the exact same feature, at the same time, for the same duration, in isolation – our test coverage numbers would more than likely be different, because we’d think about and uncover different test ideas, and we’d both be unaware of other test ideas that we didn’t think about due to our different past experiences, beliefs and biases.

And if by fluke, we did end up with the same coverage number, it definitely doesn’t mean our testing is the same – our testing will be completely different. This situation would simply obfuscate the problem and the subjectivity further, and it nicely highlights again that coverage metrics don’t tell us about the quality of our application or the quality of our testing, or indeed what has been tested.
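As a small sketch of that fluke, assume two testers who each run a different half of the 16 known tests for the red block. The tester names and the `coverage` helper are hypothetical.

```python
from itertools import product

# The 16 known test ideas for the red block (illustrative labels).
known = set(product(["top", "bottom"], [1, 2, 3, 4], ["outside in", "inside out"]))

# Two hypothetical testers who each explored a different half of the ideas.
tester_a = {t for t in known if t[0] == "top"}      # only "top side facing up"
tester_b = {t for t in known if t[0] == "bottom"}   # only "bottom side facing up"


def coverage(run, known_ideas):
    """Percentage of known test ideas that were run."""
    return 100 * len(run & known_ideas) / len(known_ideas)


print(coverage(tester_a, known), coverage(tester_b, known))  # 50.0 50.0
print(len(tester_a & tester_b))                              # 0
```

Both testers report the same number, yet they have not run a single test in common.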
 

There is one thing that code coverage is useful for

Code coverage can be useful though – it tells you about areas of the application which are not being covered at all by any assertive tests. That is a risk, and it acts as an invitation to investigate those untested areas.

So the percentages might not matter, but the information regarding what code isn’t being executed at all is a useful heuristic.
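Here is a minimal sketch of that heuristic. It uses hand-rolled markers as a stand-in for what a real coverage tool records, and `block_fits` is a hypothetical piece of feature code invented for the example.

```python
# Hand-rolled markers standing in for what a real coverage tool records.
ALL_MARKERS = {"entry", "fits branch", "stuck branch"}
executed = set()


def block_fits(block_width, hole_width):
    """Hypothetical feature code: does the block pass through the hole?"""
    executed.add("entry")
    if block_width <= hole_width:
        executed.add("fits branch")
        return True
    executed.add("stuck branch")
    return False


# A "test" that runs the code but asserts nothing about the result.
block_fits(3, 5)

# The useful part: which code was never executed by any test?
never_executed = ALL_MARKERS - executed
print(never_executed)  # {'stuck branch'}
```

The list of markers that were never hit is the invitation to investigate. The markers that were hit say nothing about whether the result was checked, since the only "test" here contains no assertion.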

Do you use code coverage and test coverage metrics? How are you using them? I’d love to hear more about why or why not, and if you find them valuable. Leave a comment below!

24 thoughts on “Code Coverage vs Test Coverage; Subjectivity and Usefulness”

  1. Right. Now: what about the paint on the block. Is it safe? Is it the colour that the designer wanted? Was there supposed to be a sticker on the block? What happens when the paint chips? Is the block underneath made of toxic material? …

    1. Thanks for the comment, Michael. And the amazing lateral thinking to highlight further the point too!

      There are vastly more risks and variables to be uncovered and tested. And we’ll never think of them all either.

  2. A good explanation as well for the difference between structured testing separated from development and the unit testing that is expected to happen within development, where the latter is an expected obligation upon the guys in development, whereas the structured testing needs to take into account a range of other expected attributes – some of which were never explained to the developers, as there was no need for them to know.

  3. Thanks for the post, Dan. To make things even worse, you can achieve 100% code coverage without a single assert. Also, code coverage gives you no indication of the quality of the tests or the impact of a failure relative to another test, the team or the business.

    1. This is what I was thinking. Neither Test Coverage nor Code Coverage is very useful if the assertions the tests contain aren’t well thought out. What would this be called? Assertion Coverage? Specification Coverage? Story Coverage?

  4. Good analogy. Very well explained. Thank you for the post.
    On test coverage, again there are many ways to come up with different numbers (16, 48, etc.). However, on a tight timeline one can always take a risk-based approach with methods like pairwise tests (based on orthogonal arrays). There are many tools available on the market for this.

  5. Hi Dan, great post and the analogy really works – for far too long they have been seen as synonymous, and this really illustrates just how far apart they are. In terms of code coverage having any useful value, I’d agree with you that the primary one is knowing where you have NOT been. However, I think it can also have some value in evaluating basic coverage of logic paths by a specific set of tests – for instance, getting 100% statement coverage is easy, but 100% modified condition/decision or 100% pairwise coverage can be achieved with a little more design effort and the use of free tools, which can add more value to the test set and is useful in building and maintaining regression checks.

  6. But what about Code Coverage types? If you test a method in all possible scenarios and examine each component of your code from the aspect of branch/predicate/input coverage and so on, you will be able to safely declare the method “Covered”.
    Today’s tools allow you to measure each unit from different angles. One method can have 3 or more different coverage percentages, indicated pretty accurately. For example, one method can be 100% sequence covered but only 50% branch covered and 75% input covered; the moment you reach 100% in all of them you’ll be more likely to declare it “covered”.
    About the second part, I agree that it is useful to know which parts of your code are being tested at all but Code coverage is more useful than that!

    1. Yeah, you’re right that there are different kinds of things you can look at covering. I think the same still applies though.

      If you’re using “coverage” (of any type) as a metric for quality, what does it actually tell you about quality?

      It doesn’t tell you about “correctness” (only for the things that you are checking, but nothing beyond that). And it doesn’t tell you about “goodness” at all.
      It simply tells you that “this line, branch, statement, etc.” has been covered by running a check.
      Many times this could be a single check. Or it could be a couple of checks. But it’s very rare that it’s all the checks that you have relating to that single line, statement, branch, etc., as that defeats the point of simply aiming for “coverage”.

      So if its purpose is “quality” related, this is where I’d argue that it’s only useful for pointing out gaps in what hasn’t been tested/covered.
      But I suppose it could have more uses if you’re thinking about it from a perspective unrelated to quality?

  7. Wonderful explanation!
    I am writing a piece now on how to assure high quality, or rather high relevance, tests. I believe that the only tests you should write are those that “lock down” and document a behavior you care about. It does not matter that a method can’t handle null if, in the context of the application, it is always called after a null-check on the argument. You don’t have to write a test making sure the method throws an exception when the argument is null.

    By writing tests for the behaviors you care about you won’t spend time on maintaining irrelevant test code when you refactor the unit under test.
