Sunday, February 19, 2012

Are we there yet? - The metrics of destination


Consider these stories …


The developer's tale



You're taking your children on a long-distance journey. You know your route and your final destination. You're all packed up in the car, it's bright and you're wearing sunglasses ...

It’s going to be an exhausting drive. So imagine how you feel when you reach the bottom of your road ...

“Are we there yet?”
“How much longer?”
“When are we going to get there?”
“I think we need to go back.”

Frustrating, isn’t it? You’re trying to get on with your drive, but you’re being pestered for updates. And no amount of volume on your Cat Stevens CD is going to drown it out.


The manager’s tale



Where the hell is the bus? You’re at a bus stop, and it’s raining. You’ve been here for what seems like ages. You check the bus stop, but there’s no timetable and no indication of when the next bus is due. You try ringing the bus company, but after 15 minutes of being told your call is important to them, all the voice on the other end tells you is that buses are running today, and a bus will be with you at some point.

The minutes tick by and you’re sure you’ve been here for over an hour. You don’t know whether to give up and just get the car, or whether the bus will appear in a couple of minutes. It's frustrating, and you feel like an idiot no matter what you do.




These two stories are being played out in many software projects around the world, and they lead to friction. The source of all this strife? The need to balance monitoring progress against just getting on with the job, and the role metrics play in all this.


Developers and testers often feel like the parent driving their kids. They want to get on and “just drive”, but they’re pestered constantly for updates. “How far are we now?” / “Two more hours” / “You said that an hour ago”. They want to concentrate on the job at hand (driving to their destination), but they feel constantly harassed to stop at every gas station and check the distance and directions to their destination. They point out to the kids that stopping to ask for this information so regularly is actually slowing the journey down, which would be so much quicker if they just kept driving.


Managers feel more like the person stranded at the bus stop. They know a bus is coming, but they've waited a long time already, and they want to know whether it's worth continuing to wait or making other plans. They're given some information – “a bus is on its way” – but it's so vague it doesn't really help with their decision making. It could be minutes, but it could also be hours.


These are the different values that those in technical delivery and those managing that delivery can place on metrics. It's an emotive case on both sides of the fence. Look at those two stories: you most likely identified with both the parent being pestered and the person abandoned at the bus stop. In our office, do we have to take sides with one viewpoint or the other, or can we make it easier for both with a little compromise?


Why metrics matter


It's important to realise that metrics matter. I've learned this myself from working closely with project management. When I'm asked for estimates of testing time on a project I might say “ooh, 1 week for the best case with no major issues, 3 weeks for the most probable case, and possibly 5 and up for the worst case if we encounter huge issues”.


The project manager then has to secure a budget for testing based on that. They might only be able to get enough money for 3 weeks of testing. When you come to the end of week 2, if you look likely to need more than another week to test because of issues, how will they know? If you think it's now going to take 6 weeks, your manager will need to go to “the business” to get more funding for the projected overspend (unless they have enough contingency budget squirreled away). And “the business” will want some kind of indication to back up the manager's claim that it is going to take 6 weeks. This is where metrics can be needed to argue your corner. But which ones tell the most meaningful stories?
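
Purely to illustrate the arithmetic a manager might do with a three-point estimate like that one (the 4:1 weighting below is a PERT-style rule of thumb I'm assuming for illustration, not anything the project dictates), here's a rough sketch:

```python
# Rough sketch: turning the three-point testing estimate above into a single
# figure a manager could budget against. The weighting is a PERT-style rule
# of thumb, assumed purely for illustration.

def weighted_estimate(best, probable, worst):
    """Weighted average of a three-point estimate, in weeks."""
    return (best + 4 * probable + worst) / 6

best, probable, worst = 1, 3, 5   # the example figures from above
secured_budget_weeks = 3          # what the manager actually got funded

estimate = weighted_estimate(best, probable, worst)
print(f"Weighted estimate: {estimate:.1f} weeks, budget secured: {secured_budget_weeks} weeks")
# Weighted estimate: 3.0 weeks, budget secured: 3 weeks - no slack at all,
# which is exactly why mid-testing metrics end up mattering to the manager.
```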


Metrics that have value


As a tester, then, you need to be able to provide metrics that are meaningful. We also need to be able to provide them relatively painlessly, because any time we spend compiling metrics is time not spent “getting the job done”.


What about hours spent on the project? I know some people hate recording hours on a project. I personally think it's vital, because it helps a manager to determine spend on a project. And when I used to run my own testing company (Ultrastar Ltd) those hours on-project would become the hours I would bill for. And hence they were vital to the process of “getting paid” – suddenly this metric became important to me (funny that).


However, I don't really feel hours booked tell us “percentage completed”. They help us work out how much budget we've used up, and that's really important to managers, but they don't really measure our progress. It's a bit like trying to use the fuel gauge in your car to work out how far you've travelled. Your car manufacturer might have told you that it'll do up to 300 miles on a full tank, and you know your journey is going to take 200 miles. So when your tank is half full you must be 75% of the way to your destination? [Erm, remember mileage varies with car speed, road conditions, idling, age of the car ...]
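
Spelling out the fuel-gauge arithmetic with those throwaway numbers (nothing here comes from a real project):

```python
# Throwaway illustration: budget burned is not the same as progress.
# Uses the fuel-gauge figures from the paragraph above.

tank_range_miles = 300    # the manufacturer's "up to" figure for a full tank
journey_miles = 200       # how far we actually need to travel
fuel_used_fraction = 0.5  # half a tank gone

miles_covered_if_headline_holds = fuel_used_fraction * tank_range_miles  # 150
apparent_progress = miles_covered_if_headline_holds / journey_miles      # 0.75

print(f"Apparent progress: {apparent_progress:.0%}")
# 75% - but only if speed, road conditions, idling and the car's age haven't
# eaten into that 300-mile figure. Hours booked against budget has exactly
# the same weakness as a progress measure.
```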


What about the number of requirements tested and the number passed? Personally I like this metric, as it gives a good feel for how many paths and features we've tested, and I do think it's useful to keep track of this (as long as it's relatively painless). However, I often joke that it takes “90% of a tester's time to test 10% of requirements”. If you use requirements tracing you'll probably know that not all your tests cover the same number of requirements. Usually the first (happy day) test covers a whole lot of requirements in one sweep, whereas other test scripts will be just as long but only test a single requirement.
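
If you do keep a requirements trace, the coverage numbers themselves are cheap to derive; here's a minimal sketch with invented data (the test and requirement names are made up for illustration):

```python
# Minimal sketch of requirements coverage from a test-to-requirements trace.
# All names and results are invented for illustration.

trace = {
    "TC01_happy_day": ["REQ-1", "REQ-2", "REQ-3", "REQ-4", "REQ-5"],  # broad sweep
    "TC02_edge_case": ["REQ-6"],                                      # fiddly, single requirement
    "TC03_error_path": ["REQ-7"],
}
executed = {"TC01_happy_day": "pass", "TC02_edge_case": "fail"}  # TC03 not yet run

all_reqs = {r for reqs in trace.values() for r in reqs}
tested = {r for tc, reqs in trace.items() if tc in executed for r in reqs}
passed = {r for tc, reqs in trace.items() if executed.get(tc) == "pass" for r in reqs}

print(f"Requirements tested: {len(tested)}/{len(all_reqs)} ({len(tested)/len(all_reqs):.0%}), "
      f"passed: {len(passed)/len(all_reqs):.0%}")
# Requirements tested: 6/7 (86%), passed: 71% - one happy-day script does most
# of the work, which is exactly why this number can flatter early progress.
```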


In fact I've known runs of test scripts where we've had 3 busy days of testing: we test 90% of requirements on day one, 9% on day two, 1% on day three. And this has seemed consistent with every project I've been on since, with later tests in an iteration typically containing some of the more fiddly and complicated tests (you make sure a build can walk before you send it on a marathon).


Measuring requirements tested is going to tell your manager how thorough your testing is. But the brutal fact is you might be 100% tested, with 98% of requirements passed, and it's still not a piece of software you're happy to see in production.


Another metric I've seen is the simple number of test cases run, and the number passed. I'm not a huge fan of this: it does measure the velocity of testing (again assuming all tests are of a similar size), but I don't feel it tells us how many requirements and pieces of functionality we're checking. However, it's more than likely a lot easier to track this number than the number of requirements if you're running manual test scripts which are just written up in Word (unless you're an Excel wizard).


What about measuring defects encountered for each build? Makes sense, yes? As we test each build we should see fewer defects, which means more quality. So for build 1.019 you found 4 defects, and for build 1.024 you have 28 defects – so that means quality is going backwards, doesn't it?


Well, no – it turns out that build 1.019 had 4 defects, of which 3 were so catastrophic that not much testing really got done. Build 1.024 has those all resolved, and more testing is getting done – we only have 1 high-level defect now, 11 medium, 7 low and 9 cosmetic. So really things are looking much better. I like to track the number of open defects (in total, across all severities) as well as the number of open defects which we can't go live with (i.e. high or severe severity).
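
Here's a small sketch of the kind of per-build summary I mean, with the counts above plugged in (the severity labels, and which of them count as showstoppers, are my own assumptions for illustration):

```python
# Small sketch of the per-build defect summary described above.
# Severity names and the "showstopper" set are assumptions for illustration;
# build 1.019's fourth defect is arbitrarily marked medium.

from collections import Counter

builds = {
    "1.019": ["severe", "severe", "severe", "medium"],
    "1.024": ["high"] + ["medium"] * 11 + ["low"] * 7 + ["cosmetic"] * 9,
}
SHOWSTOPPERS = {"severe", "high"}   # severities we can't go live with

for build, defects in builds.items():
    by_severity = Counter(defects)
    blockers = sum(by_severity[s] for s in SHOWSTOPPERS)
    print(f"Build {build}: {len(defects)} open defects, {blockers} showstoppers "
          f"({dict(by_severity)})")
# Build 1.019: 4 open defects, 3 showstoppers
# Build 1.024: 28 open defects, 1 showstopper
# More defects in total, but a far healthier build - which is why severity
# matters more than the raw count.
```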


As subsequent builds get better you should see the number of defects decrease, but most importantly their severity decrease.


The best thing about modern testing tools, if you can get one in your department, is that they'll usually track all these numbers for you as you go through testing. It's like having a satnav telling your kids how many miles are left to your destination; it takes away a lot of the pain.


Regularity of metric updates …


A big, obvious part of getting the balance right is how frequently you need to provide updates on the numbers. Every month is obviously too infrequent (although I've known a good number of technical people who even complain about that).


On the other hand, every day can be draining for technical people – it's too frequent, and lots of tasks span a few days. Although if you've entered formal testing, maybe every day is about right.


Otherwise, every week is a good interval, usually on a Friday to sum up the progress of the week.


However, numbers aren't enough


As you've seen, there's no magic single metric which really does “do it all”. Often a few need to be juggled at once. Much like having a satnav which tells you there are 44 miles to go, and your current speed is 50 mph. It feels comfortable that you should be at your destination within the hour. But traffic lights, roadworks and urban areas ahead might well slow you down.


And so numbers give you some awareness of possible risk areas, but they're not the whole story. Much as there is no single right statistic for doctors to use – they use heart rate, blood pressure, body temperature – we need to use different readings to measure the health of our testing.


Looking through the metrics suggested, each one can tell a different story:
  • Hours booked on the project. Is it lower than expected because testers are being pulled off onto other projects? Is it higher because the progress (as slow as it may seem) is coming from testers working late and at weekends? Is it even accurate? If permanent staff aren't paid overtime, they'll often only book their core hours to a project to spare it the expense. And hence a manager might say “well, we can meet our targets by working evenings and weekends”, unaware that this is already happening.
  • Both requirements coverage tested and test scripts executed show us how well we're getting through our tests – whether we're capable of executing the scripts we have in the time we have for testing. If we can't achieve 100% coverage over at least a couple of builds (even if it hasn't all passed), then it shows we don't have enough capacity in our test team. Maybe we need another tester, or else we should look at reducing the number of tests we run, trying to simplify and merge where possible.
  • Requirements coverage and test scripts failed tell an obvious tale about the quality of the product, and give a rough indication of how much more ready this build is than the previous ones.
  • Defects in build and high-level defects in build help to show us whether our product is maturing, with the high-level defects disappearing, leaving us with the kinds of defects we could consider going live with.


We use metrics as part of our reporting. But our reporting should not be all about the metrics. If 10% of requirements failed in build 2.023, but only 5% failed in build 2.024, then surely build 2.025 should be a candidate for release, yes?


This is one of the problems with metrics, trends and graphs: we can't help drawing invisible lines through the numbers and sometimes seeing patterns that aren't there. Just cycling through the iterations doesn't make the software better build on build. Instead it's the management of the individual problems, especially the severe ones, together with action plans to get them addressed. It's only by managing individual problems and defects that you increase quality and make the numbers “look better”.
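
This is the kind of naive straight-line projection to be wary of, sketched with the made-up figures from above:

```python
# The naive straight-line projection to be wary of, using the invented
# failure rates from the paragraph above. It "predicts" release readiness
# purely from two data points.

failure_rate = {"2.023": 0.10, "2.024": 0.05}   # fraction of requirements failing

drop_per_build = failure_rate["2.023"] - failure_rate["2.024"]   # 5 points, apparently
predicted_2_025 = failure_rate["2.024"] - drop_per_build          # 0% failing - a release candidate!

print(f"Predicted failure rate for 2.025: {predicted_2_025:.0%}")
# 0% - but nothing in the build process makes that line real. The remaining 5%
# might be the hardest defects of all; only actually fixing them moves the number.
```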


Metrics help to identify areas for concern, but sometimes there are factors in those areas which mean the numbers can be misleading. Like having 44 miles to your destination and doing 50 mph, but knowing that in a few miles it'll be urban areas and 30 mph speed limits from then on … so you're going to take over an hour rather than under.


When I used to work as an automated tester on QARun, I had an assignment to create 3 scripts in 10 days for different functional areas, and I had to provide daily progress updates. After day 5 I had still not finished script 1. In fact after day 8 I still wasn't done with script 1. On day 10 all 3 of my scripts were completed.


My test manager pestered me regularly from day 5 onwards about my progress. I kept explaining the scenario to him, but it felt like he never listened to me, only to the progress numbers. You see, all three scripts were essentially very similar. I was creating script 1 very carefully, and once it was done it required only minor changes to produce the other two.


Yes, the numbers showed that my work was at potential risk, but my constant explanation of the nature of the assignment should have mitigated that concern and risk [my view]. Or should I have just fudged the numbers for progress on scripts 2 & 3 as I was working on them [my manager's view]?


At the end of the day though, numbers are only indicators. To please both the children in our car and the person waiting for the bus, we could tell them what they want to hear, and say “just 5 minutes away” to calm them down. But 10 minutes later we'll have serious issues. Something can be 99% done, but the 1% can be a big issue, whereas another project can comfortably go live with 80% done, because the missing 20% can be lived without.


Sometimes our metric indicators can cause us to stress about things which are well in hand. Sometimes they can make us feel comfortable right before a fall. Metrics can be great, but they only have meaning in context.

3 comments:

  1. I like the metaphor of waiting for the bus. And it can get into the Vietnam syndrome - we have invested so much in this effort, even though it looks like we will lose, we hate to pull out now.

    I have to confess my eyes glaze over when I try to read this much about metrics. Personally I'm for getting all our regression tests automated in a continuous build process and let that keep most of our metrics - number of tests (they should ALL pass), coverage and so on.

    My team likes to first set a goal, then figure out what metrics will help us track our progress towards that goal. If we want to get to zero defects released to production, maybe we first set a goal of no more than six high defects in production in the next six months, and check our actual results in our DTS. If we meet that goal, we can try a more challenging goal like 3 defects. Etc.

    1. That is useful. And yes, my eyes do glaze over myself.

      When I used to work in neural networks, the networks could learn bad patterns, seeing things in the data which aren't really there. We called it "training on the data noise", and I've seen some people look at graphs and extrapolate lines/trends which aren't really there.

      But overall we need to try and build a case for when we think we're likely to finish. We're engineers after all, not fortune tellers! ;-)

  2. Tracking these should be automated, so test leads do not have to waste half a day or more just collecting data on Thursday and preparing it for presentation on Friday.
    DB-based ALM/test management tools can assist in collecting the information and presenting it – when each tester reports to the same repository, and all data is visible to all managers, that is much more useful.

    Another point is that not all requirements and tests have the same weight – marking the weight may improve metrics resolution.

    Kobi @halperinko
