The expected result was 42. Now what was the test?: July 2010

Wednesday 28 July 2010

Response to How many test cases by James Christie

James Christie wrote a great blog about his concern on using test cases to measure testing

http://clarotesting.wordpress.com/2010/07/21/but-how-many-test-cases/

and several people blogged a response

Simon Morley added his view here: http://testers-headache.blogspot.com/2010/07/test-case-counting-reflections.html

Jeroen Rosink added his here: http://testconsultant.blogspot.com/2010/07/repsonse-on-how-many-test-cases-by.html

and Abe Heward directed people to a similar blog he had written earlier: http://www.abeheward.com/?p=1

Each of these blogs makes very valid points about how unhelpful counting test cases is for indicating the progress or coverage of the testing effort.

The aim of this blog is to try to expand upon these posts and see if there are ways in which we could measure testing effort and progress without resorting to using numbers.

To start with we shall take a look at a made up situation.

You are testing on a project which has a two week testing cycle. Your manager has requested that you report the following each day:

How many test cases you have
How many have been run
How many have passed
How many have failed.

(Does this seem familiar to anyone?)

So before you start testing you report to your manager that you have 100 test cases to run over the two week cycle

At the end of day one you report the following

Test cases ran: 60
Test cases pass: 59
Test cases fail: 1
Defects raised: 1
Test cases still to run: 40

So management think: cool, we are ahead with testing, 60% done in one day.

At the end of day 2 you report:

Test cases ran: 62
Test cases pass: 61
Test cases fail: 1
Defects raised: 1
Test cases still to run: 138

Management now thinks how come you only ran two test cases today, why are you running slowly? WHAT!!!! Where did those other 100 test cases come from? Did you not do your job correctly to begin with?

However the two you ran today had lots of dependencies and very complex scripts.

Plus your testers noticed that there appeared to be new features that had not been documented or reported, so you have now had to add another 100 test cases. Also, your testers actually think when they are testing, and they thought of new edge cases and ways to test the product whilst they were testing.

Management starts to panic – you reported on day one that 60% of testing had been completed. Now you are saying only 30% of the testing has been completed. Stakeholders are not going to be happy when we report that we have only covered 30% when the day before I reported to them that 60% had been completed.

This continues; your testing team are really good testers and find more and more test ideas, which are turned into test cases. So at the end of day seven you report the following:

Test cases ran: 1200
Test cases pass: 1109
Test cases fail: 91
Defects raised: 99
Test cases still to run: 10000

So at the end of the first week you have only completed about 11% of all the test cases (1200 run out of 11200). You get fired for incompetence and the project never gets released.
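To make the arithmetic behind that panic concrete, here is a minimal Python sketch using just the day one and day two reports from the story. The DailyReport class and its field names are my own illustration, not part of any reporting tool, and it assumes "percent complete" means test cases ran divided by ran plus still to run.

```python
from dataclasses import dataclass


@dataclass
class DailyReport:
    """One day's test case counts, as reported to the manager."""
    day: int
    ran: int
    still_to_run: int

    @property
    def total(self) -> int:
        # The total is not fixed: it grows whenever new test cases are added.
        return self.ran + self.still_to_run

    @property
    def percent_complete(self) -> float:
        return 100.0 * self.ran / self.total


reports = [
    DailyReport(day=1, ran=60, still_to_run=40),
    DailyReport(day=2, ran=62, still_to_run=138),
]

for r in reports:
    print(f"Day {r.day}: {r.ran}/{r.total} run = {r.percent_complete:.0f}% complete")
```

More test cases were run on day two than on day one, yet the percentage halves, because the denominator doubled. The number says nothing about how the testing is actually going.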

Many people reading this may have experienced something similar to the above. What worries me is that there are still people stating that the best way to measure testing is by the use of test cases!

The question now is: if counting test cases is not a good way to measure testing, then what can we do?

The following suggestions are my own and are what I apply within my test approach. This does not mean they will work for everyone, nor am I saying it is the best approach to take. However, the purpose of my blog is to offer suggestions about testing that could be useful to some people.

I work in the following testing environment:

Agile based – 2 week iterations
Customer changing requirements frequently
Code delivered daily
Functions and features added without supporting documentation
Use a mixture of scripted and exploratory testing

If I tried to report the testing effort using the traditional test case scenario it would be of little (or zero) value, since the test case number would be constantly changing.

What we do is split functions, features etc. into test charters, as per the exploratory testing approach. These ‘Test Charters’ are known as the test focus areas of the software. If a new function or feature is discovered, a new charter is created.

We then use the Session Based Test Management approach (James and Jon Bach - http://www.satisfice.com/sbtm/) and implement sessions based upon mission statements and test ideas. During the testing session the testers are encouraged to come up with new test ideas or new areas to test, these are captured either during the session or during debrief.

The reporting of progress is done at the test charter (test focus area) level. The test manager reports in the following way.

Test focus area 1: -Testing has started – there are a few issues in this area:

Description of Issue x, issue y, issue z.
Which need to be resolved before there is confidence that this area is fit for its purpose.

Test focus area 2 – has been tested and is fit for its purpose

Test focus area 3 – test has started and some serious failures have been found defect 1, defect 2, defect 3

And so on.
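If you wanted to hold this kind of report as data rather than free text, a minimal Python sketch might look like the following. The FocusArea class, its fields and the wording of the summary are all my own assumptions for illustration; the point is only that each focus area carries a story (state, fitness for purpose, named issues) instead of a count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FocusArea:
    """Status of one test charter (test focus area). Names are illustrative."""
    name: str
    state: str  # free text such as "testing has started" or "has been tested"
    fit_for_purpose: bool = False
    issues: List[str] = field(default_factory=list)

    def summary(self) -> str:
        parts = [f"{self.name}: {self.state}"]
        if self.fit_for_purpose:
            parts.append("fit for its purpose")
        if self.issues:
            parts.append("open issues: " + ", ".join(self.issues))
        return "; ".join(parts)


areas = [
    FocusArea("Test focus area 1", "testing has started",
              issues=["issue x", "issue y", "issue z"]),
    FocusArea("Test focus area 2", "has been tested", fit_for_purpose=True),
    FocusArea("Test focus area 3", "testing has started",
              issues=["defect 1", "defect 2", "defect 3"]),
]

for area in areas:
    print(area.summary())
```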

Some people may ask: but how will this tell us if we will meet the deadline for testing? I am sure it will NOT tell you if you will finish ALL of your testing before the deadline, since testing is an infinite thing; we as testers will carry on testing until we meet a stop heuristic (see Michael Bolton's article on stopping heuristics: http://www.developsense.com/blog/2009/09/when-do-we-stop-test/).

The problem with testing is that there is no yes or no answer to the question of whether you have completed your testing yet. Every time a skilled tester looks at the software they can come up with more and more test areas and test ideas that they could carry out. These may or may not add to our knowledge of the suitability of the software and whether it is fit for its purpose. What is required is a test manager who talks to and listens to their test team, sees which test areas are the most important and MANAGES test sessions based upon what is critical – basically do some good old prioritizing. The test manager needs to ask the difficult questions of the stakeholders and project managers.

What features can you do without?
What are the critical areas that are required?
Function abc has many serious problems – it can cause problems x,y,z for your users. Do you need function abc?
We have tested all the key functions and found the following problems x,y,z. You want to release tomorrow, are you OK with these known issues?

In return the stakeholders and project managers must trust the test team and accept that when they report that an area has been ‘sufficiently’ tested they believe them.

To summarize – instead of reporting on a small area of testing such as test cases, move a couple of levels up and report on the progress for test areas/functions/features based upon the importance of the feature. This may not tell you if you will complete the testing before the deadline, but it will show you how well the testing is progressing in each functional area at a level that stakeholders can relate to and understand. The trust your stakeholders have in you should improve, since you are giving them a story about the progress of the testing effort without trying to hide things using numbers.

Tuesday 27 July 2010

DANGER - Confirmation Bias

In my previous blog I touched upon a term called Confirmation Bias and how as testers we should be aware of this. I stated that I would put a blog together on the subject so here it is.

I should start by defining what confirmation bias is.

Confirmation bias refers to a type of selective thinking whereby one tends to notice and to look for what confirms one's beliefs, and to ignore, not look for, or undervalue the relevance of what contradicts one's beliefs:- http://www.skepdic.com/confirmbias.html

The reason I started to look more into confirmation bias was due to the following article in Ars Technica - http://arstechnica.com/science/news/2010/07/confirmation-bias-how-to-avoid-it.ars

A good example of this is if you are thinking of buying a new car and all of a sudden you seem to notice lots and lots of the model of car you were thinking of purchasing. Your mind is conditioning itself to notice this make and model of car and making you notice them more. Even if there are no more than there were before, you appear to be seeing them everywhere.

Another example is if you start talking to a friend about a certain film and actor and then suddenly notice lots of coincidences: the actor is in an advert, the film is being shown again on TV, a supporting actor is in another film you just started to watch. The following gives a good example of this: http://youarenotsosmart.com/2010/06/23/confirmation-bias/

If there was no such thing as confirmation bias there would be no conspiracy theories. Conspiracy theories are based upon information which proves the theory correct; those who believe in the theory ignore the evidence that debunks that theory.

So why is there any concern for testers?

Let us start with an example.

You are working closely with the development team and you start to ask them questions about the release you are about to test. You ask their viewpoint on which areas they feel are the most risky and which they feel are the least, so you can adjust your priorities as required: a pretty standard exchange between developers and testers. You now start testing, beginning with the areas of high risk and working your way to the low risk areas.

You find a few serious bugs in the high risk areas (as expected) and you find no problems in the low risk areas.

After release a major bug is reported in the low risk area you tested. How did you miss the bug? Did you see the bug but your thinking was that everything was working alright? Did confirmation bias play a part? Did your subconscious hide the bug from you? Now this gets very scary, most people who work in software testing know that some bugs try to hide from you, we expect them to hide in the software. What happens if they decide to hide in your brain?

So how can we try and prevent confirmation bias?

The quick and easy way to try to prevent confirmation bias is to ensure that more than one tester tests the same feature. They may bring in their own confirmation bias, but hopefully it will be different from the previous tester's bias. There is more chance that it will be different if the testers have not discussed the area under test beforehand.

Another way to try to prevent confirmation bias is to do ‘paired testing’, either with a software engineer, another tester or a user. That way you can question each other with regards to what is true and what is false. There is a chance that you could cross contaminate each other with your own confirmation bias, but the risk should be less than if you are working on your own.

It is not easy to remove confirmation bias since it is infectious. The way of working on a software development project requires testers to communicate more and more with other areas of the business and at each stage and with each conversation confirmation bias could be introduced.

So should we lock ourselves away in a dark room with no communication with anyone else on the team? I think I would get out of testing as a career if that happened. The Social Tester (@Rob_Lambert) would now be the anti-social tester, time to get him an ASBO (for our non-UK readers: http://en.wikipedia.org/wiki/Anti-Social_Behaviour_Order).

My view is that there is no realistic way to prevent confirmation bias, due to the way software development projects work and the need for everyone to be able to communicate with each other. However, if testers are aware that there is such a thing as confirmation bias then they can try to take steps to ensure it does not creep into their testing. That is the whole concept and point of this blog – to help to raise awareness of confirmation bias and how it can affect your testing.

Monday 19 July 2010

The Emotional Tester (Part 2)

The first part of this blog looked at how our emotions could affect how we test. This second part will look at how we could capture our feelings when testing and whether this could provide us with any useful information about the product we are testing. Could it prove to be a useful oracle when testing?

On twitter @testsidestory said the following:

That is done regularly in usability labs: capture emotions and facial expressions of the users as they use the s/w

This was in response to a question that I posted on twitter:

…. - what I am thinking is that we need to capture our mood when #testing it could indicate a problem in the s/w…

The concern with this is that it would be very expensive to implement for the majority of people. I thought about how we could implement a system that could capture emotional state and be effective and inexpensive.

One idea I had was to use a concept from the book Blink by Malcolm Gladwell, in which Malcolm talks about how important our initial emotion/reaction is when we first encounter something. There is a discussion about how often our ‘gut reaction’ proves to be correct, and he uses an example of a statue that a gallery had bought after a lot of scientific experts, who had tested the statue, had said the statue was genuine. A couple of art experts who got to see the statue in private viewings before it was unveiled had a ‘feeling’ that there was something wrong about the statue; their initial gut reaction was telling them it was a fake. Several months later it was discovered to be a fake.

The above is a concise retelling of the story within the book. However, why did the scientific experts get it so wrong? Could it be that confirmation bias played a part? The scientific experts wanted so much to believe that it was real and not fake that they caused bias in the results or avoided obvious facts that pointed to it being a fake. I think confirmation bias is a great subject and one I will look at from a testing perspective sometime in the future.

So can we use this ‘gut reaction’ concept in testing?
Would it be of any value?

I should state that I have not tried any of the following ideas, and if anyone would love to volunteer within their organization to ‘trial’ the ideas I would be most interested. Due to circumstances I currently do not have the ability to try this out on a large scale.

The first problem we face is how we capture our initial reaction to what we are testing. The requirements for this are that it is:

Easy to capture
Simple
Quick

My thought is to use different smileys, which are simple and quick to create and capture, thus covering all the requirements.

My idea would be to use three different smileys:

Happy
Neutral
Unhappy

Why use smileys?

The reason for using smileys is that anyone can draw them, no matter how artistic they are, and from the perspective of measurement it is very easy to recognize and see patterns when using such well known symbols. The other, longer term thought was that it is easy to extend the set with sad, angry and extremely happy if you wish to widen the range of emotions and feelings.

Capturing the initial feeling/emotion.

If you are working in an environment in which you are carrying out exploratory testing and following mission statements (session based testing) then this is very simple to implement. The idea is that when the tester starts their mission (session) they should, within the first couple of minutes (5 at most), record their emotion/feeling about the software by the use of the smileys.

If this was done for every session being run and captured in such a way that it would be easy to see at a glance which areas (test charters) testers are unhappy with it could provide some useful information.

So you now have a whole set of data with regards to the testers' initial feelings about the software they are testing. What does this information tell you?

For example, if a certain test focus area shows that all the testers are unhappy in that area, would this indicate a problem? I feel it could indicate something wrong in that area, but you would need to talk to the testers and gather more information (obtain context). I think the great thing about capturing initial feelings towards the software is that it could help the development teams to focus on areas where there could be implied problems.

This approach could be taken a step further by getting the testers to add another smiley when they have finished the session, to see how they feel about the software afterwards. You now have two sets of data and can compare any discrepancies between the two.

What would you think if the majority of testers were happy about a certain test focus area but at the end of the session they were unhappy?

Does this indicate a problem?

Or what if it was the opposite mostly unhappy and at end of session they were happy?

Also if they were unhappy at the beginning and at the end, their gut reaction proves to be correct, does this give an indicator that there are some major issues within that area?

Could this indicate frustration with the system, lack of knowledge maybe?
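As I have not trialled this, the following is only a rough Python sketch of how the start and end smileys could be tallied per test charter. The charter names and the sample sessions are made up for illustration, and the majority helper is my own assumption about how you might summarise the feeling of a group of testers.

```python
from collections import Counter, defaultdict

MOODS = ("happy", "neutral", "unhappy")

# (test charter, smiley at start of session, smiley at end of session)
# Hypothetical sample data for illustration only.
sessions = [
    ("Login", "happy", "unhappy"),
    ("Login", "happy", "unhappy"),
    ("Login", "neutral", "unhappy"),
    ("Reports", "unhappy", "happy"),
    ("Reports", "unhappy", "unhappy"),
]


def tally(sessions):
    """Count start and end moods separately for each charter."""
    start = defaultdict(Counter)
    end = defaultdict(Counter)
    for charter, first, last in sessions:
        if first not in MOODS or last not in MOODS:
            raise ValueError(f"unknown mood in session for {charter}")
        start[charter][first] += 1
        end[charter][last] += 1
    return start, end


def majority(counter):
    """Most common mood, or None when there is a tie for first place."""
    ranked = counter.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


start, end = tally(sessions)
for charter in start:
    before, after = majority(start[charter]), majority(end[charter])
    flag = " <- talk to the testers" if before != after else ""
    print(f"{charter}: start={before}, end={after}{flag}")
```

A flagged charter is not a verdict on the software. It is only a prompt to go and gather context from the testers who ran those sessions.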

In my opinion this approach could prove to be a very useful oracle for the quality of the software.

What do you think?

Could this prove to be useful?

I would love some feedback on this idea - good or bad.

Friday 16 July 2010

The Emotional Tester (PART 1)

This blog is going to be in two parts. The first will focus on the question of whether emotions affect the quality of testing. The second will look at ways in which we can gather information about how we feel about the product we are testing, to see if there is any value in capturing this information.

I have an amateur interest in psychology and how some of the ideas and thoughts from this area of science can be used in software testing. I was reading ‘The Psychology of Problem Solving’ by Janet E. Davidson & Robert J. Sternberg and it had a section on how emotions affect the way we think and focus.

So I decided to tweet a question based on some of the information I had read:

Emotions and #Testing:-Do we find more bugs when we are in a bad mood? Psychology research shows we are more negative when in bad mood.

It would be interesting to have feedback from #testing community on this - Does this mean a good tester has to be a grumpy so and so... :o)

It was not long before I started to receive replies on this.

@Rob_Lambert: @steveo1967 I don't really attribute negativity to being good at finding bugs. Positive attitude, passion, inclination...not negativity

@nitinpurswani I @steveo1967 i cannot think when i am in bad mood and i guess sapient testers think

@ruudcox @steveo1967 This article might help: Mood Literally Affects How We See World. http://www.medicinenet.com/script/main/art.asp?articlekey=100974

This turned in to a lively debate on which mood is better for testing.

After reading various articles there appeared to be some common ground on how we think and see things based upon our emotions and mood.

The article suggested by @ruudcox indicates that when in a good mood we can see the whole picture and when in an unhappy mood we narrow our focus.

This appears to be backed up by research from Foless & Schwarz:

Individuals in a sad mood are likely to use a systematic, data driven bottom-up strategy of information processing, with considerable attention to detail. In contrast, individuals in a happy mood are likely to rely on pre-existing general knowledge structures, using a top-down heuristic strategy of information processing, with less attention to detail (Foless & Schwarz, 1999).

This now leads to some complex dilemmas, and the whole point of this blog.

Which mood is best for someone who is a professional tester?

Which mood is more likely to find more bugs when testing?

What other influences can affect our ability to test well?

From the information and research I have read, my thinking is that to be really good at testing and finding defects we need to be in a sad or unhappy mood.

The research suggests that when in a sad or unhappy mood we are more likely to focus in on the task and step through it in a data driven way. When happy we are more likely to see the whole of the picture and look at the task from a top down approach.

Now in my opinion both of these traits are needed to be excellent testers. So do testers need to have split personalities that they can switch on or off?

The point made by @nitinpurswani was that being in a bad mood stops him thinking, and that to be a sapient tester he needs to think. This got me thinking and I asked him a question back.

@nitinpurswani I like that idea. However if you're in a bad mood with what u are #testing would it make you want to break it more?

My thought behind this is that if something is annoying or irritating me, I feel I am more likely to work harder to find out why it is annoying me. I become deeply focused on the problem in front of me. Does this mean I am in a bad mood? Not necessarily so – it could be I am annoyed at what I am testing but not in a bad mood in general.

When in a happy mood when testing it is easy to just let things go. We unconsciously think: well, that is not too much of a problem, we can forget about it. This is a dangerous attitude to have as a tester because this simple little problem can come back as a huge problem. Someone in an unhappy mood is more likely to investigate why this thing is annoying and find the underlying cause.

@Rob_Lambert made a very valid point that there are environmental issues that could come into play. How many testers listen to music when testing? Rob suggested that the type or style of music you are listening to can influence the mood you are in and, as a side effect, the way you are thinking. I had not thought about this very much, but going deeper than this – if you are working in an open office and everyone around you is laughing and joking, would this make your testing better or worse? What if a tester and a developer are having a heated debate about something that has just been tested? Will this influence your testing?

Does any of this article back up my earlier tweet that testers need to be grumpy so and sos?

However I think this view is too simplistic. I am often asked about testers and how they are different from developers. (There is still a big drive within testing that developer and tester can be the same person and be able to switch between the different roles). I have a feeling that some of the best testers can switch between different psychological emotional states when testing. They have the best of both worlds. Able to remain focused when something is bugging them and then when they have solved what is bugging them able to switch to a whole picture view of the system they are testing.

When I started to write this article I thought it would be very simple to come to a conclusion about how emotions can affect our ability to test and what is the best mood to be in to get the best out of testing. It has proven more difficult than I thought and I still have not come to any firm conclusion about which is the best.

The one interesting point that should be made is that as professional testers we need to be aware of our emotions and how they can affect the quality of the testing we are doing. Part 2 of this blog will be looking at how we can capture our emotion and feelings about the product we are testing and see if this could provide useful information.

Sunday 11 July 2010

Managing Exploratory Testing with Mercury Quality Center

I thought I would write about my experiences of using Mercury Quality Center (MQC) to help manage my exploratory testing sessions.

When carrying out exploratory testing I use the James and Jon Bach approach of Session based testing (http://www.satisfice.com/sbtm/). What I found is that the tool provided did not match the needs of the company and was hard to sell to management since we already had commercial tools for capturing testing effort (MQC). I had to re-think how I could get buy-in from management on using the exploratory testing approach whilst making use of the tools we already had.

One of the first things I did was implement a structure within the test plan section of MQC. So I defined the following folder structure for each project:

Project Name

Test Charter

Mission Statement

Test Ideas

e.g.

Project Name -->

Test Charter -->

Mission Statement -->

Test idea(s)

So under the planning section testers can define a folder name for the test charter they are working on and then add a folder for each mission statement and then add their test ideas.

The thinking behind this was that at a glance anyone can see what has been covered under each test charter and see if there are any gaps. Reports can be pulled off and used during debrief sessions to act as focus points when discussing the testing that has been done.

I created a Test Plan Hierarchy using a standard numbering scheme for the folder and test idea names. This helped with traceability and navigation around the test plan.

e.g.

Project Name -->

01 – Test Charter 01 -->

01.01 – Mission Statement 01

01.01.01 – Test Idea 01

01.01.02 – Test Idea 02

01.02 – Mission Statement 02

01.02.01 – Test Idea 01

01.02.02 – Test Idea 02

02 – Test Charter 02 -->

02.01 – Mission Statement 01

02.01.01 – Test Idea 01

02.01.02 – Test Idea 02
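The numbering scheme is easy to get wrong by hand, so here is a minimal Python sketch of generating the names so that each child always extends its parent's prefix. The numbered_name function is my own hypothetical helper and has nothing to do with the MQC API; the two digit padding simply follows the examples above.

```python
def numbered_name(kind: str, *position: int) -> str:
    """Return a name such as '01.02.01 – Test Idea 01'.

    position is the path through the tree: charter, mission statement,
    test idea. Each level is zero padded to two digits.
    """
    if not position:
        raise ValueError("at least one index is required")
    prefix = ".".join(f"{index:02d}" for index in position)
    return f"{prefix} – {kind} {position[-1]:02d}"


# Hypothetical plan: 2 charters, each with 2 mission statements of 2 test ideas.
for charter in (1, 2):
    print(numbered_name("Test Charter", charter))
    for mission in (1, 2):
        print("  " + numbered_name("Mission Statement", charter, mission))
        for idea in (1, 2):
            print("    " + numbered_name("Test Idea", charter, mission, idea))
```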

MQC is set up for a formal, scripted form of testing based on test cases and test steps, and I have not found a way to get around this. However, instead of test cases I use test ideas, and I needed a quick way to create new test ideas without being bogged down in writing details about lots of steps. So I suggested that each test idea has ONLY the following information:

Test Idea Name
Test Idea description (This should be as descriptive as possible – include any models/heuristic thinking/problem solving ideas)
A single test step - This is required by MQC so that the user can run the test and record its status (Pass/Fail etc)

Since we use a different system for capturing defects (Don’t ask!) I also added a folder to each project called 99- Defects – so that I could trap any defects that needed testing.

The next step was to have a structure for the test lab (this is where the details of test runs are recorded).

I implemented the following structure:

Project Name -->

Project Release Version X.Y -->

01.01 - Mission Statement 01 -->

01.01.01 - Test idea 001

01.01.02 - Test idea 002

01.02 - Mission Statement 02 -->

01.02.01 - Test idea 001

01.02.02 - Test idea 002

It is recommended that the X.Y numbers in the Project Release Version name are given as multiple digit, left zero padded integers. This is to ease sorting by name. The rest of the structure was basically copied over from the test plan section.
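The reason for the zero padding is that folder names are sorted as text, not as numbers. A small Python sketch shows the difference; the release numbers are made up and padded_version is a hypothetical helper of my own.

```python
def padded_version(version: str, width: int = 2) -> str:
    """Zero pad each dot separated part, e.g. '1.2' becomes '01.02'."""
    return ".".join(part.zfill(width) for part in version.split("."))


versions = ["1.10", "1.2", "1.9"]  # hypothetical release numbers

# Sorted as plain text, 1.10 lands before 1.2.
print(sorted(versions))

# With padding, the text order matches the numeric order.
print(sorted(padded_version(v) for v in versions))
```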

For exploratory testing I suggested that as a minimum the following columns are included when recording the execution of the test idea.

Plan.Test Name
Result
Defect (For recording CQ defects raised within that test script)
Priority * (How important is this test idea? What risk is there to the project in not doing this test idea?)
Status
Execution Date

Once this had been setup it was then easy to run a session based upon a mission statement for the session I was running. Each mission statement had multiple test ideas. I found this very useful since it was very quick to create test focus areas based upon test charter names and mission statements. These could then very simply be turned into session sheets within MQC test lab.

One of the key elements of session based testing is to capture all the evidence of the exploratory testing session. I implemented the following to capture details of what went on in the testing session. Each test idea was run from within MQC and recorded as passed or failed. (I am aware this can be very subjective and depends on context; however, to ease the transition to ET it is necessary to have some familiar ways of recording progress.) I ensured that all session notes, log captures, screen prints, videos etc. were captured by attaching them to the test idea.

THIS was very IMPORTANT – since if anyone needs to follow your test idea in the future they now have a record of what and HOW you executed your test idea. There is an issue with biases here: people carrying out testing afterwards could just follow your notes and repeat what you did, which is not really exploratory testing, but that can be mitigated by mentoring.

You now have a tool in which you can capture what you have done during your exploratory testing sessions.

There are a few issues I find with MQC, and I am sure people out there in the testing community may have the answers. I want to use MQC to record the time spent on each session (as short, medium or long). I also wanted to capture how much of that time, as a percentage, was spent on:

Test execution
Bug reporting and investigation
Test environment set up
Test data setup

This would help in the telling of the story of what is stopping the testers actually testing. I am sure there is a way to do this in MQC and I just need to do some more investigation. I hope readers of this find it useful; I know it has helped me to persuade management to take exploratory testing seriously.

To finish: this is working for me. It is not perfect, and I am investigating other ways/tools that can make this more efficient. I am looking at using a Java application to create the session sheets and report back via the MQC API directly, but that is in the future. I am also investigating ways to customize MQC so that I can have the columns I wish to have. I will let you know if that works.
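Until I find a way to do this inside MQC, the calculation itself is simple enough to do outside the tool. The following Python sketch is only an illustration: the minutes are invented, and the 60 and 90 minute boundaries for short, medium and long are my own assumed thresholds, so adjust them to whatever your team agrees.

```python
def time_breakdown(minutes_by_activity: dict) -> dict:
    """Return each activity's share of the session as a whole percentage."""
    total = sum(minutes_by_activity.values())
    if total == 0:
        raise ValueError("session has no recorded time")
    return {
        activity: round(100 * minutes / total)
        for activity, minutes in minutes_by_activity.items()
    }


def session_length(total_minutes: int) -> str:
    """Classify a session; the 60 and 90 minute limits are assumptions."""
    if total_minutes <= 60:
        return "short"
    if total_minutes <= 90:
        return "medium"
    return "long"


# Hypothetical session, in minutes per activity.
session = {
    "Test execution": 45,
    "Bug reporting and investigation": 27,
    "Test environment set up": 9,
    "Test data setup": 9,
}

print(session_length(sum(session.values())))
for activity, percent in time_breakdown(session).items():
    print(f"{activity}: {percent}%")
```

In this made up session only half of the time went on test execution, which is exactly the kind of story the percentages are meant to tell.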