Thursday, September 4, 2014

Metrics and the science of measurement ...

"Physics is the science of measuring things".  These words, as uttered by Mr Botham on an autumn day in my first week at Abbot Beyne Secondary School, changed my life.  From that moment on I was both fascinated by and hooked on the subject.

I had great teachers for physics at "The Beyne" as we called it.  Greatest of all was the friendship I developed with John Sneyd who taught me for 5 years, and put up with my many questions.  Physics held a fascination for me, and one element never left me, the fascination in measuring and observing.  It's probably the reason that testing eventually ended up feeling "the right profession" for me.

It's not a surprise then that I continued in the subject at The University Of Sheffield.  Looking back though, there were subtle things we learned year on year.  To be a good scientist you needed to know how to arrange a good experiment (preferably with repeatable tests), use the appropriate equipment to measure and, most of all, understand your data.

As the years went on, "understanding your data" evolved.  If any of my old log books remained, you'd notice that I always recorded answers to a ridiculous number of decimal places, "to be careful".  But any tool you measure with is only as good as its accuracy and suitability for the job at hand.  I may hand you a ruler, get you to acknowledge it measures distance, and then go "good, so measure me out accurately 1 km".  You'd be equally baffled if I gave it to you to measure the size of a grain of sand.

With a ruler, the longest length you can measure is probably 15cm, and to an accuracy of about 0.05cm (if you've got a keen eye and steady hand).  If you're measuring a length with it and come out with a value like 16.07cm, I have to figure you're measuring less accurately than you think.
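That sanity check can be written down as a rough sketch.  The function name and the idea of treating readings as multiples of 0.05cm are my own assumptions for illustration; the 15cm and 0.05cm figures are the ones from above.

```python
# A minimal sketch: does a reported reading fit what the ruler can deliver?
# The 15 cm range and 0.05 cm resolution come from the text; the function
# name and the "multiple of the resolution" rule are illustrative assumptions.

def is_plausible_reading(value_cm: float,
                         max_cm: float = 15.0,
                         resolution_cm: float = 0.05) -> bool:
    """Return True if the value is within range and on the ruler's grid."""
    in_range = 0 <= value_cm <= max_cm
    steps = value_cm / resolution_cm
    # Allow a tiny tolerance for floating point rounding in the division.
    on_grid = abs(steps - round(steps)) < 1e-6
    return in_range and on_grid


for reading in (12.35, 12.37, 16.07):
    print(reading, is_plausible_reading(reading))
```

A reading of 16.07cm fails on both counts: it is longer than the ruler, and it quotes more precision than the ruler can give.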

At University, this hands-on understanding of data was crucial.  One exercise I remember was at Astronomy Lab.  I was really excited to do our first Astro Lab, thinking it'd be telescopes and stargazing.  It wasn't.  We were given about 500 data points to draw in a graph.

It was boring.  And I mean really boring.  Little did I know, but we were charting our own Hertzsprung-Russell diagram (a key graph in Astronomy, as it identifies classifications of stars).

It took a few labs to put the graph together.  And I'm afraid to say at week two, I just flipped and asked our tutor (Prof David Hughes, pictured) why, "this being the far flung future of 1989", we weren't just putting all this data into a computer and having it process it for us.  I no doubt seemed impatient and dumb to my classmates, but I'm glad I did, because his answer has stuck with me to this day.  After 25 years, you'll forgive me if I don't get the words quite right, but it went a little like this ...

"You can feed a computer a string of numbers, and it can add them and divide them, and multiply them faster and more accurately than any human will.  It will even draw a mean graph line through them.  But it will never go "uh-oh, that piece of data looks out of place".  Only human beings can look at, and if need be, ignore data that could be erroneous.  If you just feed numbers into a computer without an intrinsic understanding of the data and the measurements you're using, you're essentially cutting out human judgement and intuition.  You're not here to learn how to enter numbers into a machine, but how to see those patterns for yourself, and trust to your own judgement over that of a computer."

It's actually a good point.  When you're measuring, recording and handling the data yourself you get a feel for it, an intuition if you like.  If you try and measure out 1 km using a 15 cm ruler, something in your brain goes "this isn't quite right".  Give that task to a machine, and it will never figure that out, it will just carry out the task.

That intuition in good scientists develops as an understanding of error.  This idea says that between an actual value of something and the value I measure, there is always going to be a discrepancy because of the tool I use to measure.  If I go back to the simple problem of measuring a short line with my ruler - the actual line might truly be 5.648cm long, but the best I can measure to is 0.05 cm (and then only with a steady hand and keen eye).  Most of the time that's accurate enough.

If I think of the "measure out 1 km using a 15 cm ruler" scenario, you can lay the ruler out, end on end, and measure out 6666.7 ruler lengths.  But you have to ensure each length is perfectly in alignment with the previous one, and starts exactly where the previous one finished.  In reality, each placement is going to be a small amount off, and repeating that 6666.7 times is going to magnify the error.
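To put rough numbers on that, here is a small sketch.  It assumes each placement of the ruler is off by about the same 0.05cm I quoted for reading it, which is my own simplification.  It shows two textbook ways of combining the errors: the worst case, where every placement is off in the same direction, and the case where the errors are independent and add in quadrature.

```python
import math

RULER_CM = 15.0            # ruler length from the text
PER_STEP_ERROR_CM = 0.05   # assumed error per placement (illustrative)
TARGET_CM = 100_000.0      # 1 km expressed in centimetres


def placements(target_cm: float, ruler_cm: float) -> float:
    """Number of ruler lengths needed to cover the target distance."""
    return target_cm / ruler_cm


def worst_case_error(n: float, per_step_cm: float) -> float:
    """Every placement is off in the same direction, so errors add up."""
    return n * per_step_cm


def independent_error(n: float, per_step_cm: float) -> float:
    """Independent random errors add in quadrature (square root of n)."""
    return math.sqrt(n) * per_step_cm


n = placements(TARGET_CM, RULER_CM)
print(f"ruler lengths:     {n:.1f}")
print(f"worst case error:  {worst_case_error(n, PER_STEP_ERROR_CM):.0f} cm")
print(f"independent error: {independent_error(n, PER_STEP_ERROR_CM):.1f} cm")
```

Under those assumptions the kilometre could be out by anything from a few centimetres to over three metres, which is a long way from the 0.05cm the ruler seems to promise.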

And what if I use it to measure a grain of sand?  The best I can measure with a ruler is to an accuracy of about 0.05cm, but the grain doesn't show very well against the ruler (sand from my local beach is finer than that shown).  And am I measuring lengthwise or widthwise?  In this case my ruler, although a method of measuring distance, is once again the wrong tool for the job.

This of course links back to my counting fruit challenge a few weeks ago.  I asked you to imagine items of fruit in a basket as if they were test cases: some were big melons, and some small grapes.  How could just counting them help you really know how you were progressing through your bowl of fruit?  If you said that this morning there were 8 items of fruit, and at the end of the day there were 6 left, I could tell you with some accuracy that you have less fruit left than this morning.  What you can't say with accuracy is that in 3 days you'll have finished your last piece of fruit.

The problem is, of course, that there are a good many test management tools out there which collect all forms of statistics on your project, and that's exactly what they DO say.  Often in pretty graphical form, and (what can kind of annoy me) to several decimal places.  The decimal places can really get my goat, because stating a number to that level implies an accuracy that, as we've said, just isn't there.

To me, with an understanding of the science of measurement and error, it feels like Prof Hughes' comments have come back to bite us in the ass, and hard.

Of course if you're a tester, you may well be contractually obliged to use these tools, and even provide this data on a regular basis.  What I would encourage you to do is to think and learn about the error in what your tool is measuring and quoting to a high level of detail, and advise caution.  Like Prof Hughes said, that computer tool is taking those numbers, counting daily, dividing, but with absolutely no understanding of the data it's using.  Its model expects a single line script to take the same effort as one with dozens of items.  It won't expect a "test the one hour automatic logout" to take any longer than "log in with user name and password.  You are now logged in".
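A rough sketch of how far a raw count can drift from reality.  The test names echo the examples above, but the list itself and the minutes against each test are invented purely for illustration, not taken from any real project or tool.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    minutes: float  # hypothetical effort estimate, invented for illustration
    done: bool


# Hypothetical test run: three quick scripts finished, one long one left.
cases = [
    Case("log in with user name and password", 1, True),
    Case("single line script A", 2, True),
    Case("single line script B", 2, True),
    Case("test the one hour automatic logout", 60, False),
]


def progress_by_count(cases: list[Case]) -> float:
    """What a counting tool reports: every test weighs the same."""
    return sum(1 for c in cases if c.done) / len(cases)


def progress_by_effort(cases: list[Case]) -> float:
    """Progress weighted by how long each test is expected to take."""
    total = sum(c.minutes for c in cases)
    done = sum(c.minutes for c in cases if c.done)
    return done / total


print(f"by count:  {progress_by_count(cases):.0%}")
print(f"by effort: {progress_by_effort(cases):.0%}")
```

By count this run looks three quarters finished; by estimated effort it has barely started.  The effort figures are only guesses too, of course, which is rather the point.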

If you have to use test case counts, I do encourage you to group together similar scenarios and create a form of dashboard.  A total pass rate of 94% might encourage someone to at least consider releasing into production (obviously depending on the defects), however there might be a clustering of tests which have failed or simply not been run, look below ...

Well that's worrying - why have no login scenarios been passed?  Turns out under inquiry that the team is using some form of stub to enable login, because the login page is still under development.  Do you even have a product without a login page?

A dashboard at least enables people to ask why you've focused on one certain scenario, and not on another.  It allows people to look at what you're focusing on.  The "total test case percentage" doesn't encourage that kind of engagement.  Whenever possible I use a dashboard to talk with management and business owners.  As ever, James Bach has a useful set of resources and slides on them here.
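For anyone who wants to knock up such a dashboard from raw results, here is a minimal sketch.  The area names and counts are made up to match the 94% example above; they are not from a real project, and real tools will export results in their own formats.

```python
from collections import Counter, defaultdict

# Hypothetical results as (area, status) pairs: 100 tests, 94 passed,
# and the 6 login tests never run.  All names and numbers are invented.
results = (
    [("login", "not run")] * 6
    + [("search", "pass")] * 40
    + [("checkout", "pass")] * 30
    + [("profile", "pass")] * 24
)


def overall_pass_rate(results: list[tuple[str, str]]) -> float:
    """The single headline number: passes divided by total tests."""
    return sum(1 for _, status in results if status == "pass") / len(results)


def dashboard(results: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Count statuses per area so gaps in coverage stay visible."""
    by_area: dict[str, Counter] = defaultdict(Counter)
    for area, status in results:
        by_area[area][status] += 1
    return {area: dict(counts) for area, counts in by_area.items()}


print(f"overall: {overall_pass_rate(results):.0%}")
for area, counts in dashboard(results).items():
    print(f"{area:10s} {counts}")
```

The headline figure and the grouped view come from exactly the same data, but only the grouped view prompts the question about login.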

But as I've mentioned, I do encourage you to think about the data you use in your reporting, and how accurate it really is.  We're often under duress to use it as "it may be inaccurate, but it's the best data we have", but I do encourage you to understand the error in those numbers, and use methods like the dashboards mentioned to give something more meaningful.

I'll just leave this here, because it's doing the rounds in my head ...
