Agile Metrics Grid

Recently, Gerald Heller and I discussed metrics used in agile development, agile testing metrics in particular. We found quite a number of relevant metrics and looked for a way to structure them. Partly inspired by Brian Marick’s testing quadrants (see Lisa Crispin’s presentation), we ended up with a matrix spanned by two dichotomous axes.

The first axis (horizontal axis) distinguishes between coordinative and analytical metrics usage. Coordinative metrics are well-suited to directly support project activities, information needs, and decisions. Examples are tracking of work item status or burndown. Analytical metrics serve as input to investigations and analyses such as those conducted in iteration retrospectives. An example is story cycle time, which can be investigated at the end of a release in order to look for improvement opportunities for subsequent releases.
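To make the difference in usage concrete, here is a minimal sketch that computes a burndown value (coordinative) and a median story cycle time (analytical) from the same story data. The `Story` record, its field names, and the sample dates are assumptions made up for this illustration; they do not come from any particular tracking tool.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Story:
    # Hypothetical record; the field names are assumptions for this sketch.
    name: str
    points: int
    started: date
    finished: Optional[date] = None  # None while the story is still open


def remaining_points(stories: list[Story], day: date) -> int:
    """Burndown value: points of all stories not finished by the end of `day`."""
    return sum(s.points for s in stories if s.finished is None or s.finished > day)


def cycle_times_in_days(stories: list[Story]) -> list[int]:
    """Cycle time of each finished story, counted in calendar days."""
    return [(s.finished - s.started).days for s in stories if s.finished is not None]


stories = [
    Story("login", 3, date(2011, 3, 1), date(2011, 3, 4)),
    Story("search", 5, date(2011, 3, 2), date(2011, 3, 9)),
    Story("export", 2, date(2011, 3, 7)),  # still in progress
]

# Coordinative use: checked during the iteration to steer the ongoing work.
burndown_day_4 = remaining_points(stories, date(2011, 3, 4))

# Analytical use: examined afterwards to look for improvement opportunities.
median_cycle_time = median(cycle_times_in_days(stories))
```

The data is identical in both cases; what differs is when the number is looked at and what it is used for.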

The second axis (vertical axis) distinguishes between internal and external metrics target groups. Internal target groups are the members of an agile software development team. External target groups are other stakeholders such as development management and product management.

The following figure shows our proposed agile metrics grid along with a number of categorized agile metrics. The grid helps to guide metric definition and to clarify the role that a given metric plays in agile development.
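The grid can also be sketched as a small data structure: two enumerations for the axes and a mapping from each metric to one cell. In the sketch below, the usage classification of the three metrics follows the examples given in the text, while their assignment to the internal target group is only an assumption for illustration.

```python
from enum import Enum


class Usage(Enum):
    COORDINATIVE = "coordinative"  # directly supports ongoing project work
    ANALYTICAL = "analytical"      # input to retrospectives and analyses


class Audience(Enum):
    INTERNAL = "internal"  # members of the agile development team
    EXTERNAL = "external"  # other stakeholders, e.g. product management


# The Audience value of each metric is an illustrative assumption.
METRICS = {
    "work item status": (Usage.COORDINATIVE, Audience.INTERNAL),
    "burndown": (Usage.COORDINATIVE, Audience.INTERNAL),
    "story cycle time": (Usage.ANALYTICAL, Audience.INTERNAL),
}


def build_grid(metrics):
    """Group metric names into the four cells of the grid."""
    grid = {(usage, audience): [] for usage in Usage for audience in Audience}
    for name, cell in metrics.items():
        grid[cell].append(name)
    return grid


grid = build_grid(METRICS)
```

Empty cells are kept on purpose: they show at a glance for which combination of usage and target group no metric has been defined yet.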

The agile testing grid is described in more detail in issue 4 of Agile Record, a magazine for agile developers and agile testers.

http://www.agilerecord.com

Goal/Question/Metric (GQM)

Goal/Question/Metric (GQM) is an established and elaborated method for measurement in software engineering. I have been using it, and doing evaluation research and method development on it, since the 1990s. It can be particularly useful for the following purposes:

  • Systematically define measurement and reporting in software projects
  • Consolidate existing “ad hoc” measurement
  • Gain information needed for process improvement and agile retrospectives

Many applications of GQM have been reported. Unfortunately, some involve misinterpretations of the method. For this reason, I propose in this article a few references and hints that can provide you with guidance and examples on how to apply GQM effectively.

GQM includes a data structure, the so-called GQM tree, that helps to identify and interpret metrics for a given measurement goal. Specific types of questions clarify certain aspects of the measurement goal. Another data structure, the GQM Abstraction Sheet, facilitates handling of goal, questions, and metrics.
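A GQM tree can be sketched with three small record types: a goal holds questions, and each question holds the metrics that help to answer it. The class names and the example goal below are assumptions chosen for illustration; the GQM literature does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str


@dataclass
class Question:
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

    def all_metrics(self) -> list[str]:
        """Collect metric names across all questions, without duplicates, in order."""
        seen: list[str] = []
        for question in self.questions:
            for metric in question.metrics:
                if metric.name not in seen:
                    seen.append(metric.name)
        return seen


# Hypothetical example tree; the same metric may serve more than one question.
tree = Goal(
    "Understand how long stories take in our iterations",
    [
        Question(
            "How long does a story take from start to finish?",
            [Metric("story cycle time")],
        ),
        Question(
            "Does parallel work slow stories down?",
            [Metric("stories in progress per day"), Metric("story cycle time")],
        ),
    ],
)
```

Reading the tree top-down gives the data to collect (`tree.all_metrics()`); reading it bottom-up tells you which question each number is meant to answer when interpreting it.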

Whenever I need metrics or reporting information about a piece of software or a software project, I find it highly useful to follow the GQM method in a pragmatic manner, to clarify what information exactly is required, how it can be obtained, and what needs to be considered when interpreting the data. Important GQM principles that provide guidance are:

  • Use the GQM goal template to clarify what shall be found out, and to what target group the information shall be directed.
  • Use only one or very few levels of Questions in order to keep the GQM tree concise.
  • Use exactly one level of metrics. If metrics tend to become complex, ensure that their definitions are clear and precise.
  • Focus on defining appropriate graphical representations of metrics information.
  • Be aware that the metrics data is only the raw material of measurement. Most important is the interactive data interpretation together with the GQM goal’s target group. This collaborative data interpretation is guided by the previously defined GQM questions.
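The goal template mentioned in the first principle is commonly given in the GQM literature as a sentence with five slots: object, purpose, quality focus, viewpoint, and context. The sketch below renders such a sentence; the class name, field names, and example values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GqmGoal:
    obj: str            # what is analyzed
    purpose: str        # why it is analyzed
    quality_focus: str  # which property is of interest
    viewpoint: str      # target group the information is directed to
    context: str        # environment in which the measurement takes place

    def render(self) -> str:
        """Fill the five slots into the goal template sentence."""
        return (
            f"Analyze {self.obj} for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} "
            f"from the viewpoint of {self.viewpoint} "
            f"in the context of {self.context}."
        )


# Hypothetical example values.
goal = GqmGoal(
    obj="the story workflow",
    purpose="improvement",
    quality_focus="cycle time",
    viewpoint="the development team",
    context="release retrospectives of project X",
)
goal_sentence = goal.render()
```

Making the viewpoint a mandatory field is a deliberate choice here: it forces the target group to be named before any metric is defined.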

The following literature gives a good overview of GQM. The article by van Latum et al. illustrates practical GQM application. The book by van Solingen and Berghout gives rich additional practical advice. The articles by Basili, Caldiera, and Rombach can be regarded as the original source of GQM, although the method goes back to earlier work by Basili and Weiss from the 1980s.

Frank van Latum, Rini van Solingen, Markku Oivo, Barbara Hoisl, Dieter Rombach, Günther Ruhe. Adopting GQM-Based Measurement in an Industrial Environment. IEEE Software, pp. 78-86, January/February, 1998. (http://www.computer.org/portal/web/csdl/doi/10.1109/52.646887)

Rini van Solingen, Egon Berghout. The Goal/Question/Metric Method. McGraw-Hill Education, 1999. (http://www.iteva.rug.nl/gqm/GQM%20Guide%20non%20printable.pdf)

Victor R. Basili, Gianluigi Caldiera and H. D. Rombach. The Goal Question Metric Approach. In Encyclopedia of Software Engineering (John J. Marciniak, Ed.), John Wiley & Sons, Inc., 1994, Vol. 1, pp.528–532.

V. R. Basili, G. Caldiera, and H. Dieter Rombach. The Goal Question Metric Approach. NASA GSFC Software Engineering Laboratory, 1994. (ftp://ftp.cs.umd.edu/pub/sel/papers/gqm.pdf)

Testing Classification for Focused Test Planning

Testing can mean very different things, depending on the software to be tested, the organization in which the testing takes place, and several other factors. Since testing can be so different, it is useful to have a classification system for testing at hand. It can guide test management during planning and preparation of the various test activities.

A notable classification of testing has been proposed by Robert L. Glass, along with recommendations for test planning. I came across it in late 2008, when I read one of his columns in IEEE Software magazine. Let’s have a look at Glass’s testing classification and discuss what it means for test management practice. At the end of the article, you will find references to additional information.

Glass classifies testing along two dimensions. The first dimension is the goal of testing, i.e., the main principle that drives identification of test cases. There are four goal-driven approaches to testing: (1) Requirements-driven testing derives test cases from defined requirements. (2) Structure-driven testing derives test cases from the software’s implementation structure. (3) Statistics-driven testing derives test cases from typical and frequently conducted usage scenarios of the software. (4) Risk-driven testing derives test cases from the most critical software-related risks.

The second dimension of Glass’s model refers to the phase of the software development lifecycle in which the testing takes place: (1) Unit testing involves the lowest-level components of the software product. (2) Integration testing involves the intermediate level of the software product and lifecycle. (3) System testing involves the final level of software development.

Glass argues that both dimensions must be combined. For each combination, he recommends the degree to which the respective testing type should be executed. The following table summarizes the recommendations.

                    | Testing phase
Testing approach    | Unit testing           | Integration testing       | System testing
--------------------|------------------------|---------------------------|--------------------------------------
Requirements-driven | 100% unit requirements | 100% product requirements | 100% system requirements
Structure-driven    | 85% logic paths        | 100% modules              | 100% components
Statistics-driven   | -                      | -                         | 90-100% of usage profiles if required
Risk-driven         | As required            | As required               | 100% if required
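For use in a test plan, the recommendations can be held in a simple lookup structure, as in the following sketch. The cell texts are transcribed from the table. Placing the single statistics-driven entry under system testing is my reading of the table and therefore an assumption, as are all function and variable names.

```python
from typing import Optional

PHASES = ("unit", "integration", "system")

# One tuple per approach, ordered like PHASES; None marks a cell without an entry.
# Assumption: the statistics-driven entry belongs to the system testing column.
RECOMMENDED = {
    "requirements-driven": (
        "100% unit requirements",
        "100% product requirements",
        "100% system requirements",
    ),
    "structure-driven": ("85% logic paths", "100% modules", "100% components"),
    "statistics-driven": (None, None, "90-100% of usage profiles if required"),
    "risk-driven": ("As required", "As required", "100% if required"),
}


def recommendation(approach: str, phase: str) -> Optional[str]:
    """Recommended degree of testing for one approach in one phase."""
    return RECOMMENDED[approach][PHASES.index(phase)]


def plan_for_phase(phase: str) -> dict[str, str]:
    """All approaches that carry a recommendation for the given phase."""
    plan = {}
    for approach in RECOMMENDED:
        degree = recommendation(approach, phase)
        if degree is not None:
            plan[approach] = degree
    return plan


unit_plan = plan_for_phase("unit")
system_plan = plan_for_phase("system")
```

Querying by phase rather than by approach mirrors how a test manager would typically read the table: what should be covered now, at this level of aggregation?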

How can we use this classification system and its recommendations? First, it tells us that the testing focus should differ between lifecycle phases or stages of product aggregation. Second, it recommends aiming for complete requirements coverage in every testing phase, while the other testing approaches should be emphasized mainly during the later phases of testing. This gives us guidance for focused and more efficient testing.

While I value the essence of Glass’s classification very much, I partly question the first dimension of testing approaches, and I am particularly sceptical of the recommended 100% testing degree for the requirements-driven approach. In my opinion, only the first two testing approaches are basic and fundamentally different categories: requirements-driven and structure-driven testing. The other two approaches, statistics-driven and risk-driven, are variants of the basic approaches. Statistics-driven usage scenarios and risks can only be determined based on requirements or structure. So, those latter approaches are means for focusing requirements-driven and structure-driven testing.

Why am I sceptical of the 100% degree for requirements-driven testing? I find it impractical for several reasons: First, most testing that I have encountered suffered from severe time and resource constraints, which clearly demanded “less than 100% testing!” Second, requirements are often vague and incomplete. So, 100% of something vague is just an illusion of 100%. Third, hardly any project has explicitly stated unit and product requirements. As a consequence, there is no basis for stipulating any kind of test coverage for those requirements types.

However, the basic messages from Glass’s testing classification remain valid and important: Distinguish between requirements-driven and structure-driven testing, and apply different kinds of testing at different phases of system aggregation. Also use statistics-driven and risk-driven approaches for focusing testing. In an earlier article, I have proposed a pragmatic approach for establishing good test coverage based on those principles: Two Essential Stages of Testing.
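The idea of using statistics and risk as a means of focusing can be sketched as a simple prioritization: test cases come from one of the two basic approaches, and usage share and risk rating only decide which of them are executed within a limited budget. The scoring scheme, the greedy selection, and all names and numbers below are assumptions for illustration; they are not taken from Glass’s columns.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    basis: str           # "requirements" or "structure": the basic approach
    usage_share: float   # assumed share of real usage the scenario covers, 0..1
    risk: int            # assumed risk rating, 1 (low) to 5 (critical)
    effort_hours: float  # assumed effort to run the test


def focus(cases: list[Candidate], budget_hours: float) -> list[str]:
    """Greedy selection: highest risk first, then highest usage share,
    skipping any case that no longer fits into the remaining budget."""
    chosen: list[str] = []
    spent = 0.0
    for case in sorted(cases, key=lambda c: (-c.risk, -c.usage_share)):
        if spent + case.effort_hours <= budget_hours:
            chosen.append(case.name)
            spent += case.effort_hours
    return chosen


cases = [
    Candidate("checkout happy path", "requirements", 0.60, 5, 4.0),
    Candidate("report export", "requirements", 0.05, 2, 3.0),
    Candidate("payment error branch", "structure", 0.10, 5, 2.0),
    Candidate("profile edit", "requirements", 0.25, 3, 2.0),
]

selected = focus(cases, budget_hours=8.0)
```

With a budget of eight hours, the rarely used and low-risk report export is the case that gets dropped, which is exactly the kind of deliberate “less than 100%” decision that time constraints force in practice.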

Further information about Robert L. Glass’s classification can be found in two of his columns in IEEE Software magazine and in a publicly available excerpt from one of his latest books: An Ancient (but Still Valid?) Look at the Classification of Testing (IEEE Software magazine Nov/Dec 2008), A Classification System for Testing, Part 2 (IEEE Software magazine Jan/Feb 2009), and The Many Flavors of Testing (Book Excerpt).

http://makingofsoftware.com/archives/358

Two Essential Stages of Testing

Good test coverage is important and sometimes not easy to achieve. A simple principle can lay a solid foundation for test coverage: Distinguishing two essential stages of testing.

The initial low-level stage tests basic development artifacts immediately or soon after they have become available. This is usually called component, module or unit testing.

The later high-level stage tests the entire system or its higher-level aggregates, starting as early as possible and continuing until shortly before product delivery. This is usually called acceptance testing.

The concepts of low-level and high-level tests are not new. What matters is to relate them to different phases of the development cycle (or, likewise, to activities within agile iterations; levels become stages) and to plan the associated activities systematically. This way, good test coverage can be achieved very efficiently.

Low-level testing stage | High-level testing stage
------------------------|-------------------------
Tests cover implementation of each basic development artifact from an implementation point of view (white or grey box testing) | Tests cover design of the entire system or major system part from a business or usage point of view (black box testing)
Test cases defined by developers; derived from requirements and design | Test cases defined by testers, domain experts from the software team, and/or customers; derived from explicit requirements or tacit domain knowledge
Tests conducted by developers | Tests conducted by testers and/or customers or users
Defects usually fixed immediately or otherwise entered into defect database | Defects usually entered into defect database, fixed, and being re-tested
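The difference between the two stages can be illustrated with a deliberately tiny example. The order calculation and both tests below are hypothetical: the low-level test knows about the implementation (including its guard clause), while the high-level test only checks a result that a user of the system would observe.

```python
def line_total(unit_price_cents: int, quantity: int) -> int:
    """Basic development artifact: price of one order line."""
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return unit_price_cents * quantity


def order_total(lines: list[tuple[int, int]]) -> int:
    """System-level behavior visible to a user: total of a whole order."""
    return sum(line_total(price, quantity) for price, quantity in lines)


def low_level_test() -> None:
    # Written by the developer with knowledge of the implementation,
    # including the guard clause for negative quantities.
    assert line_total(250, 2) == 500
    assert line_total(250, 0) == 0
    try:
        line_total(250, -1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative quantity")


def high_level_test() -> None:
    # Written from a usage point of view: only the inputs and the
    # observable result matter, not how the total is computed.
    assert order_total([(250, 2), (1000, 1)]) == 1500


low_level_test()
high_level_test()
```

In a real project the high-level test would of course run against the deployed system or a major part of it rather than a single function; the sketch only shows the change in point of view.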

I have seen many projects in trouble, because they did not properly address these test stages. Sometimes, low-level testing was replaced by pure faith (“My programs always run well”). Sometimes, high-level testing was shallow and ineffective (“We don’t have any time for more tests”). Often, the relation between both stages was not managed well, lowering product quality and limiting the efficiency of testing.

Making sure that both stages of testing are addressed is a first and important step towards improved testing. The two stages complement each other well, so that higher test coverage can be achieved with very little planning and qualification effort. This is also a good basis for subsequent improvement activities.

Additional details on the concepts of low-level and high-level testing are described in the testing literature, although the relation between the levels and the phases of the development lifecycle is often not explored in much depth. A very instructive book is TMap® Next by Koomen et al. (2006). Another elaboration on the two testing levels is Brian Marick’s agile testing quadrants. Lisa Crispin’s presentation on agile test planning provides a detailed explanation of those quadrants. Finally, an inspiring reflection on testing is Robert L. Glass’s text on The Many Flavors of Testing.