
Sunday, September 30, 2012

Ethics 3 - Ethics for testers at KWST2



Looking back at my two stories before going on to discuss KWST2 (Kiwi Workshop on Software Testing), it's interesting to see my different take on two similar tales ...

In Wernher, I'm overall sympathetic to Wernher von Braun as a man in a difficult place under the Nazis, just trying to get through the war. In The Man On The Cutting Room Floor, though, I feel there's an element of relish and karma in the fact that Nikolai Sevnik suffers the same fate as those he's helped to paint out of history.

There's probably something desperately unfair about that – in a way there are many parallels between the two stories, with both Wernher and Nikolai,
  • living under difficult and brutal regimes
  • both dreaming of something better – Wernher through rocket exploration, Nikolai through art
  • both knowing of the atrocities of the regimes they live under, yet both doing work that is complicit in supporting them

In the very act of reading through, much like reading Winston Smith's tale in Nineteen Eighty-Four, we realise how difficult their choices are and empathise with them to some extent, even though their choices are not ones we'd readily make in the society we live in.

That is the benefit of “experience reports” or “war stories” which a conference like KWST2 brings out. Much like my two stories, they allow you to walk in the difficult shoes of another person. This is why the choice of “ethical choices in the workplace” turned out to be such a great one.

Some of the stories we heard during our sessions were about

“A bad tester and a bad role model”

If someone is visibly associated with testing, but you feel the ideas and values they put out are to the detriment of the testing community – how do you challenge that? Do you seek to confront them, to educate them, or do you try and distance yourself?

This initially seemed a little judgemental. But when you introduce yourself to a new team member with “hi, I'm a tester” and they roll their eyes and go “oh, not one of those people”, then your profession, and by association you yourself, have been affected and corrupted by the expectations and press these “bad testers” can cause.


“My last tester gave me the metrics I wanted”

Following on from that was a tale about how these “bad testers” could set expectations which could be detrimental. One of the most contested metrics in use is “percentage tests completed”. The ISTQB and several “experts” like the one referred to in the story above will tell all and sundry that this is a good metric to monitor testing and to report regularly to management.

Most good testers will know there's a problem from the off. You can refer to “percentage test cases passed”, but the testers themselves know that some of those tests are 5 minutes long, and some of those tests are 3 hours long. It's like following the advice of “eat 5 pieces of fruit a day and stay healthy”, so you eat 5 currants, whilst your friend eats 5 apples.

For a number to be a metric, it must be just that, a measurement. Take this scenario … I have 10 tests to run:
  • 4 should take 15 minutes to run if there are no issues
  • 2 should take 30 minutes to run if there are no issues
  • 3 should take an hour
  • 1 should take 3 hours

At the end of day one, I report I am 50% through. So how long will it take me to run the other 50%? Will I be done by tomorrow?
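To make the arithmetic concrete, here's a minimal sketch in Python using the estimates above (which five tests got run on day one is my own assumption, purely for illustration):

```python
# Hypothetical test estimates in minutes, taken from the figures above.
test_estimates = [15, 15, 15, 15, 30, 30, 60, 60, 60, 180]

# Assume the five quickest tests are the ones run by the end of day one.
completed = [15, 15, 15, 15, 30]
remaining = [30, 60, 60, 60, 180]

pct_by_count = len(completed) / len(test_estimates) * 100
pct_by_effort = sum(completed) / sum(test_estimates) * 100

print(f"Tests run:               {pct_by_count:.0f}%")              # 50%
print(f"Estimated effort spent:  {pct_by_effort:.0f}%")             # ~19%
print(f"Effort still remaining:  {sum(remaining) / 60:.1f} hours")  # 6.5 hours
```

Both figures describe the same day's testing, but they tell very different stories about when I'll be done.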

This is the problem with metrics, and this metric in particular. It can be manipulated to tell almost any story. The peer conference was not against reporting progress to management; on the contrary, there was a feeling of obligation to tell project management what testing was achieving and the milestones it was reaching. But more through collected defect reports and progress on areas of functionality tested, not through vague and potentially misleading numbers.


“Who steers the test community?”

So there is a group of testers with certain practices, and maybe certificates to prove they belong together. And over in a different area there are some more testers who have a different set of practices. So who are the better testers, with the better practices, and the right to call themselves “the test community”?

What became apparent through conversation was that no single testing thought leader has the right, no matter what their credentials, to declare themselves “The Test King” and demand the community bow down to them. Instead, by building reputation, groups of testers should be able to cluster together, exchange ideas (much as was done over KWST2) and reach consensus. And that way the community would move forward as a group and not as a cult.

Senior testers and leaders needed to understand their ideas would have to be challenged – this was a form of empirical testing to work out the value of the ideas. This could feel a brutal system at times (I can vouch for that personally), however this is how the community advanced. There was a great quote that we're all on an airliner, and we have a habit of deferring to the Captain's authority (our thought leaders) at all times, but the truth is most of us have a pilot's license ourselves, and we won't learn from the Captain unless we occasionally question their actions. What's more, those Captains need us to question their actions to stop them from doing something really dumb when they're not paying attention – because even Captains are fallible.

In science, theories are often put forward which cause uproar because they challenge what is seen as the accepted theory and truth of the time. Both Galileo and Darwin put off releasing their ideas – of the Earth moving around the Sun, and of evolution – because of fear of ridicule and persecution. But overall we benefited from their perseverance.

The community cannot always move forward together. In an ideal world the whole world of testers would move forward as a mass movement. But given the nature of free will, and the way we see different values in different aspects of testing, there will always be some element of friction over ideas. And sadly, sometimes those fractures cannot be band-aided. However we can't stay stuck in the past thinking the Sun moves round the Earth. Some ideas are so powerful they have to be explored – nothing should be seen as untouchable or sacred in testing.


“I'm watching you … always watching you”

When a co-worker does something which you feel is wrong, what action should you take? Do you go straight up and report them to management? Do you give them the benefit of the doubt and just keep an eye on them? Do you try and discuss it directly with them?

On the topic of just reporting people – we started by talking about 1984-style regimes of monitoring. I lived for a while in East Germany as a student, and the stories of the Stasi secret police keeping notes and tabs on people were deeply unpleasant. Everyone distrusted everyone, and it created a climate of fear and suspicion. And so I find the idea of a project full of people reporting each other to management for the slightest infraction really the kind of place I would not want to work.

I'd like to think I'd always want to get involved and coach the individual back into line as best I could; in fact I've done that myself several times. Often the individual had been upset or just not thought something through, and what they'd done had a minimal and fairly unseen impact. When it comes to reporting someone to a manager, it just feels to me like I've failed (though sometimes it's a resort you sadly have to take). But as the stories of Wernher and Nikolai have shown, the difficulty with ethics is that it's sometimes all too easy to say “well, I wouldn't do that” and pass judgement on another's actions. It's all too easy to be judgemental, yet your co-worker may have laboured through a difficult ethical battle of their own before taking that action. This is the importance of “walking in another's shoes” first, and in my opinion engaging with that person wherever possible before escalation.

Yet in saying that, I've also known co-workers who were caught using their work accounts for the most horrific acts (two members of a past company were discovered using their work machines to distribute child pornography). There are some things which are inexcusable and criminal; in such cases, had I known, I'd have had to report them, as they'd crossed a line where trying to talk and engage with them would not have helped.


“Just do what you're told”

Variations on this theme came around again and again in experience reports. The scenarios involved a non-tester (often management) giving explicit instructions to a test team on what they should and shouldn't do. So in a powerful example we explored, a project manager tells the test team, “look we're going to ship this product in two weeks as is. We want you to continue testing – but any more defects, we don't want to hear about them”.

What do you do? The point of testing is to confirm behaviour, and where it deviates from expectations, report bugs. If you can't report a bug, what's the point of testing?

If you follow to the letter what you've been told, then what you're doing is acting unprofessionally – what James Bach calls “malicious compliance”, letting something fail because “I was only obeying orders”.

But what do you do? This is where the theme of “who does testing serve?” came about – do we serve the project management or do we champion customers and end users? In actual fact we're in many ways answerable to both. This was why many felt that the test team needed in some respects to be slightly independent of the rest of the project, so it could make its own calls on such doctrine. It's a good point, although the power of having the testing team under the project umbrella is the feeling that “we're all in this together”, and there's much more sharing with testers than the suspicion that testing's just coming in to audit the software.

Strategies for dealing with this request involved a level of disobedience which attempted to honour the intent of the request, but also to continue working in a professional manner. And so bugs encountered would still be noted, but use of the company's official bug tracking tool would be avoided.

If a defect was a high impact one, someone would try and talk to the manager and developers with “look, I know we're not supposed to find any more bugs … but we found a big one”. It might be that the project manager only made that comment about not reporting any more bugs in a moment of frustration – we've all heard variations of “this project would be fine if only testers didn't find any more bugs”, as if it's the testers who put the bugs in there (testers don't break code, it arrives to us already broken, to quote Jerry Weinberg). In truth, this project manager, despite saying “don't tell us about more bugs”, would actually be pretty peeved if people followed his instructions and failed to tell him something critical about his product.


To sum up …

An amazing couple of days. I originally felt the format was too confrontational on day one, but by day two, where consensus was coming about, it felt like a course of therapy where we'd made real progress, got to learn more about each other, and found a lot of common ground. Just as importantly, I felt the two days had taken a group of peers and forged some important friendships from our shared experiences.

It's with great pleasure I've heard that several people behind the event are launching a regular WeTest meetup in Wellington, which I'm already looking forward to …


A couple of snaps I've stolen from David Greenlees (who had the honour of having a variation of Greensleeves sung to him by James Bach during the event) ...




Sunday, February 19, 2012

Are we there yet? - The metrics of destination


Consider these stories …


The developer's tale



You're taking your children on a long distance journey. You know your route and your final destination. You’re all packed up in the car, it's bright and you're wearing sunglasses ...

It’s going to be an exhausting drive. So imagine how you feel when you reach the bottom of your road ...

“Are we there yet?”
“How much longer?”
“When are we going to get there?”
“I think we need to go back.”

Frustrating isn’t it? You’re trying to get on with your drive, but you’re being pestered for updates. And no amount of volume on your Cat Stevens CD is going to drown it out.


The manager’s tale



Where the hell is the bus? You’re at a bus stop, and it’s raining. You’ve been here what seems like ages. You check the bus stop, but there’s no timetable and no indication of when the next bus is due. You try ringing the bus company, but after 15 minutes of being told your call is important to them, all the voice on the other end tells you is that buses are running today, and a bus will be with you at some point.

The minutes tick by and you’re sure you’ve been here for over an hour. You don’t know whether to give up and just get the car, or if the bus will appear in a couple of minutes. It's frustrating, and you feel an idiot no matter what you do.




These two stories are being played out in many software projects around the world, and they lead to friction. The source of all this strife? The balance needed in a project between monitoring progress and just getting on with the job, and the role metrics play in all this.


Developers and testers often feel like the parent driving their kids. They want to get on and “just drive”, but they feel harassed constantly for updates. “How far are we now?” / “Two more hours” / “You said it an hour ago”. They want to concentrate on the job at hand (driving to their destination), but they feel constantly harassed to stop at every gas station and check the distance and directions to their destination. They point out to the kids that stopping to ask this information so regularly is actually slowing down their journey, which would be so much quicker if they just kept driving.


Managers feel more like the person stranded at the bus stop. They know a bus is coming, but they've waited a long time already, and they want to know if it's worth continuing to wait or making other plans. They're given some information – “a bus is on its way” – but it's so vague it's not really helping with their decision making. It could be minutes, but it could also be hours.


These are the different values and priorities that those in technical delivery and those managing that delivery bring when looking at metrics. It's an emotive case on both sides of the fence. Looking at those two stories, you most likely identified with both the parent being pestered and the man abandoned at the bus stop. In our office, do we have to take sides with one viewpoint or the other, or try and make it easier for both with a little compromise?


Why metrics matter


It's important to realise that metrics matter. I've learned this myself from working closely with project management. When I'm asked for estimates for testing times on a project I might say “ooh, 1 week for the best case with no major issues, 3 weeks for the most probable case, and possibly 5 weeks and up for the worst case if we encounter huge issues”.


The project manager then has to secure a budget for testing based on that. They might only be able to get enough money for 3 weeks of testing. When you come to the end of week 2, if you look likely to need more than another week to test because of issues, how will they know? If you think it's now going to take 6 weeks, your manager will need to go to “the business” to get more funding for the projected overspend (unless they have enough contingency budget squirreled away). And “the business” will want some kind of indication to back up the manager's claim that it is going to take 6 weeks. This is where some metrics can be needed to argue your corner. But which ones tell the most meaningful stories?


Metrics that have value


As a tester then, you need to be able to provide metrics that are meaningful. We also need to be able to provide them relatively painlessly, because any time we spend compiling metrics is time not spent “getting the job done”.


What about hours spent on project? I know some people hate recording hours on a project. I personally think it's vital, because it helps a manager to determine spend on a project. And when I used to run my own testing company (Ultrastar Ltd) those hours on-project would become the hours I would bill for. And hence they were vital to the process of “getting paid” - suddenly this metric became important to me (funny that).


However, I don't really feel hours booked tell us “percentage completed”. It helps us work out how much budget we've used up, and that's really important to managers, but it doesn't really measure our progress. It's a bit like trying to use the fuel gauge in our car to work out how far we've travelled. Your car manufacturer might have told you that it'll do up to 300 miles on a full tank, and you know your journey is going to take 200 miles. So when your tank is half full you must be 75% of the way to your destination? [Erm, remember mileage varies with car speed, road conditions, idling, age of car ...]
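Here's a minimal sketch of that distinction, with figures made up purely for illustration: hours booked answer “how much of the budget have we burned?”, which is not the same question as “how far through the testing are we?”

```python
# Made-up figures for illustration only.
budgeted_hours = 120       # the budget secured, e.g. 3 weeks of one tester
hours_booked = 60          # hours logged against the project so far
requirements_total = 80
requirements_tested = 20

budget_burn = hours_booked / budgeted_hours * 100           # 50%
coverage = requirements_tested / requirements_total * 100   # 25%

print(f"Budget consumed:      {budget_burn:.0f}%")
print(f"Requirements tested:  {coverage:.0f}%")
# Half the tank is gone, but we're only a quarter of the way there --
# the fuel gauge and the odometer are measuring different things.
```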


What about the number of test requirements tested and the number passed? Personally I like this metric, as it gives a good feel for how many paths and features we've tested, and I do think it's useful to keep track of this (as long as it's relatively painless). However, I often joke that it takes “90% of a tester's time to test 10% of requirements”. If you use requirements tracing you'll probably know that not all your tests cover the same number of requirements. Usually the first test (happy day) covers a whole lot of requirements in one sweep, whereas other test scripts will be just as long but only test a single requirement.


In fact I've known runs of test scripts where we've had 3 busy days of testing: we test 90% of requirements on day one, 9% on day two, 1% on day three. And this seems consistent with every project I've been on since – the later tests in an iteration are typically the more fiddly and complicated ones (you make sure a build can walk before you send it on a marathon).
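If you trace requirements against your scripts, that skew is easy to show. A small sketch – the script names and requirement IDs here are invented for illustration:

```python
# Hypothetical test scripts mapped to the requirement IDs they trace to.
script_coverage = {
    "TC01 happy day end-to-end": {"R01", "R02", "R03", "R04", "R05",
                                  "R06", "R07", "R08", "R09"},
    "TC02 invalid login":        {"R02"},
    "TC03 expired session":      {"R03"},
    "TC04 boundary amounts":     {"R10"},
}

all_requirements = set().union(*script_coverage.values())

covered = set()
for script, reqs in script_coverage.items():
    new = reqs - covered           # requirements this script covers for the first time
    covered |= reqs
    print(f"{script}: +{len(new)} new requirements, "
          f"{len(covered) / len(all_requirements):.0%} cumulative coverage")
```

The happy day script sweeps up 90% of the requirements in one go; the later scripts take just as long to run but barely move the coverage number.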


Measuring requirements tested is going to tell your manager how thorough your testing is. But the brutal fact is you might be 100% tested, with 98% of requirements passed, and it's still not a piece of software you're happy to see in production.


Another metric I've seen is the simple number of test cases run, and the number passed. I'm not a huge fan of this; it does measure the velocity of testing (although again assuming all tests are of similar size), but I don't feel it tells us how many requirements and pieces of functionality we're checking. However, it's more than likely a lot easier to track this number than the number of requirements if you're running manual test scripts which are just written up in Word (unless you're an Excel wizard).


What about measuring defects encountered for each build? Makes sense, yes? As we test each build we should see fewer defects, which means more quality. So for build 1.019 you found 4 defects, and for build 1.024 you have 28 defects – so that means quality is going backwards, doesn't it?


Well, no – it turns out that build 1.019 had 4 defects, of which 3 were so catastrophic that not much testing really got done. Build 1.024 has those all resolved, and more testing is getting done – we only have 1 high-level defect now, plus 11 medium, 7 low and 9 cosmetic. So really things are looking much better. I like to track the number of open defects (in total, across all severities) as well as the number of open defects which we can't go live with (i.e. high or severe severity).


As subsequent builds get better you should see the number of defects decrease, but most importantly their severity decrease.
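A minimal sketch of the kind of per-build summary I mean, using the counts from the story above (the severity labels themselves are just ones I've assumed):

```python
from collections import Counter

# Hypothetical open defects per build, using the counts from the story above.
builds = {
    "1.019": ["catastrophic"] * 3 + ["medium"],
    "1.024": ["high"] + ["medium"] * 11 + ["low"] * 7 + ["cosmetic"] * 9,
}

GO_LIVE_BLOCKERS = {"catastrophic", "high", "severe"}  # severities we can't release with

for build, defects in builds.items():
    counts = Counter(defects)
    blockers = sum(n for sev, n in counts.items() if sev in GO_LIVE_BLOCKERS)
    print(f"Build {build}: {len(defects)} open defects, "
          f"{blockers} we can't go live with -- {dict(counts)}")
```

Build 1.024 has seven times the defects of 1.019, yet only a third of the go-live blockers – the raw count alone tells the wrong story.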


The best thing about modern testing tools, if you can get one into your department, is that they'll usually track all these numbers for you as you go through testing. It's like having a satnav telling your kids how many miles are left to your destination – it takes away a lot of the pain.


Regularity of metric updates …


A big, obvious part of getting the balance right is the frequency with which you need to provide updates on the numbers. Every month is obviously too infrequent (although I've known a good number of technical people who even complain about that).


On the other hand, every day can be draining for technical people – it's too frequent, and lots of tasks span out over a few days. Although if you've entered into formal test execution, maybe every day is about right.


Otherwise, every week is a good interval, usually on a Friday, to sum up the progress of the week.


However numbers aren't enough


As you've seen, there's no magic single metric which really does “do it all”. Often a few need to be juggled. Much like having a satnav which tells you there are 44 miles to go, and your current speed is 50 mph. It feels comfortable that you should be at your destination in an hour. But traffic lights, roadworks and urban areas ahead might well slow you down.


And so numbers give you some awareness of possible risk areas, but they're not the whole story. Much as there is no single right statistic for doctors to use – they use heartbeat, blood pressure, body temperature – we need to use different readings to measure the health status of our testing.


Looking through the metrics suggested, each one can tell a different story:
  • Hours booked on project. Is it lower than expected because testers are being pulled off onto other projects? Is it higher because the progress (as slow as it may seem) is only coming with testers working late and weekends? Is it even accurate? If permanent staff aren't paid overtime, they'll often only book their core hours to a project to spare it the expense. And hence a manager might say “well, we can meet our targets by working evenings and weekends”, unaware that this is already happening.
  • Both requirements coverage tested and test scripts executed show us how well we're getting through our tests – whether we're capable of executing the scripts we have in the time we have for testing. If we can't achieve 100% coverage over at least a couple of builds (even if it's not passed) then it shows we don't have enough capacity in our test team. Maybe we need another tester, or else look at reducing the number of tests we run, trying to simplify and merge where possible.
  • Requirements coverage and test scripts failed tell an obvious tale about the quality of the product, and give a rough indication of how much more ready this build is than the previous ones.
  • Defects in build and high-level defects in build help to show us whether our product is maturing, the high-level defects are disappearing, and we're left with the kinds of defects we could consider going live with (a sketch pulling these together into a weekly snapshot follows below).
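Here's a rough sketch of a weekly snapshot combining those four – every figure in it is invented purely for illustration:

```python
# A made-up weekly status snapshot combining the metrics above.
weekly_snapshot = {
    "week ending":           "Friday",
    "hours booked":          "68 of 80 planned",
    "requirements covered":  "72 of 80",
    "test scripts executed": "45 of 60",
    "test scripts failed":   6,
    "open defects":          23,
    "go-live blockers":      2,   # high/severe defects we can't release with
}

for metric, value in weekly_snapshot.items():
    print(f"{metric:>22}: {value}")
```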


We use metrics as part of our reporting. But our reporting should not be all about the metrics. If 10% of requirements failed in build 2.023, but only 5% failed in build 2.024, then that should mean build 2.025 is a candidate for release, yes?


This is one of the problems with metrics, trends and graphs. We can't help trying to draw invisible lines through the numbers, and sometimes we see patterns that aren't there. Just cycling through the iterations doesn't make the software better build on build. Instead it's the management of the individual problems, especially the severe ones, together with any action plans to get them addressed, that does. It's only by managing individual problems and defects that you increase quality and make the numbers “look better”.


Metrics help to identify areas for concern, but sometimes there are factors in these areas which mean the numbers can be misleading. Like having 44 miles to your destination, and doing 50mph, but you know that in a few miles it'll be urban areas and 30mph speed limits from then on … so you're going to be over an hour rather than under.


When I used to work as an automated tester on QARun, I had an assignment to create 3 scripts in 10 days for different functional areas. I had to keep daily progress number updates. After day 5 I had still not finished script 1. In fact after day 8 I still wasn't done on script 1. On day 10 all 3 of my scripts were completed.


My test manager regularly pestered me from day 5 onwards about my progress. And I kept explaining the scenario to him, but it felt like he never listened to me, only the progress numbers. You see all three scripts were essentially very similar. I was creating script 1 very carefully, and once done it required minor changes to produce the other two scripts.


Yes, the numbers showed that my work was at potential risk, but my constant explanation of the nature of the assignment should have mitigated that concern and risk [my view]. Or should I have just fudged the numbers for progress on scripts 2 & 3 as I was working on them? [My manager's view]


At the end of the day though, numbers are only indicators. To please both the children in our car and the man waiting for the bus, we could tell them what they want to hear, and say “just 5 minutes away” to calm them down. But 10 minutes later we'll have serious issues. Something can be 99% done, but the 1% can be a big issue, whereas another project can comfortably go live with 80% done, because the 20% missing can be lived without.


Sometimes our metric indicators can cause us to stress about things which are in hand. Sometimes they can make us feel comfortable right before a fall. Metrics can be great, but they only have meaning in context.

Tuesday, October 4, 2011

Spinning Plates – The Test Manager's Stage Show

I've been a test manager for about 6 months now, all told ...  As a senior tester I used to scoff at what my old test manager got up to – but now I know!

If someone asked me what it's most like, to me it would be spinning plates …



We know the stage show.  Someone sets about 6 plates spinning, and keeps rushing between them to give each an extra bit of momentum, to keep them spinning and stop them falling off.

And that's pretty much what I do – our company has a whole host of projects in the pipeline.  I look at the future load for the next few months, and get involved in early meetings about them, review business requirements (if they exist), write the original master test plan, work out how much effort it should require to test, and try and organise test resources so we've got someone to do the actual testing.  And maybe a bit of sleight of hand to keep two projects from getting to testing at the same time ...



It means getting involved in a lot of projects.  Our department is part of customer delivery, and a lot of projects, as you can imagine, come through us for testing.  I always need to have a trick up my sleeve in case something comes in late, so we're not busting the budget.  Have a rabbit in the hat, just in case I need an extra resource because time's running out.



As a rough rule of thumb (and don't tell my project managers) I always plan for myself to do pure test management … so when things get tight, I can go “all hands on the pump” and magic almost an entire tester out of thin air for 30(ish) hours a week.  Of course that can only be a short term band-aid, and on some projects that's not enough.

But all of the above can become tiring!

In Agile it's said you become much less effective if you're always task switching during the day.  Something like 20% less effective, they estimate.

Yesterday I kept track, and I worked on 5 projects during the course of the day.  Ouch!

Although I was joking about it on Twitter this morning, I am starting to show some signs of fatigue.  I'm getting bits mixed up between projects, and when someone asks me a question, there's so much I'm working on that it takes me a while to straighten it all out mentally.

At the beginning of the year, after being “on the bench” (not working on site, and doing desk-based training), I was sharp, eager.  Now come October, with an unforgiving workload, we're just getting by week to week.  All the promise of trying to improve processes in March has been whittled down to “just get it out the door” - and not by management, but by me.  The delays that are part of any tester's lament have forced us into a level of technical testing debt, and we're having trouble getting things out on time because we're so close to delivery dates.

And I know I've talked about it before in this blog – but no-one wants to be the one to tell the business owner their delivery dates are unachievable, especially when everyone else is saying there's no problem.

I know people should have the courage to, but everyone quite rightly wants to give it their best shot at achieving it first.

So right now, I'm realising we're in a kind of testing death march.  There are things we still need to get out this year.  But we have a freeze from late November onwards – I'm hoping we'll be able to catch up on ourselves a bit then, and hopefully set 2012 up for a bit of an easier year.

Otherwise I'm sure we're all headed somewhere in a very large basket ...