Agile Metrics Grid

Recently, Gerald Heller and I discussed the metrics used in agile development, agile testing metrics in particular. We found quite a number of relevant metrics and sought a way to structure them. Partly inspired by Brian Marick’s testing quadrants (see Lisa Crispin’s presentation), we ended up with a matrix spanned by two dichotomous axes.

The first axis (horizontal axis) distinguishes between coordinative and analytical metrics usage. Coordinative metrics directly support project activities, information needs, and decisions; examples are the tracking of work item status or burndown. Analytical metrics serve as input to investigations and analyses such as those conducted in iteration retrospectives. An example is story cycle time, which can be analyzed at the end of a release to identify improvement opportunities for subsequent releases.

The second axis (vertical axis) distinguishes between internal and external metrics target groups. Internal target groups are the members of an agile software development team. External target groups are other stakeholders such as development management and product management.

The following figure shows our proposed agile metrics grid along with a number of categorized agile metrics. The grid helps guide metric definition and clarify the role that a given metric plays in agile development.
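
To make the two axes tangible, here is a minimal Python sketch that files metrics into the four cells of the grid. Only work item status, burndown, and story cycle time come from the text above; the metrics placed in the two external cells are illustrative assumptions, not the content of the figure.

    from dataclasses import dataclass
    from typing import List, Optional

    # Minimal sketch of the agile metrics grid as a lookup structure.
    @dataclass(frozen=True)
    class GridCell:
        usage: str         # "coordinative" or "analytical"
        target_group: str  # "internal" or "external"

    METRICS_GRID = {
        GridCell("coordinative", "internal"): ["work item status", "burndown"],
        GridCell("analytical", "internal"): ["story cycle time"],
        GridCell("coordinative", "external"): ["release burnup"],       # assumed example
        GridCell("analytical", "external"): ["escaped defect trend"],   # assumed example
    }

    def classify(metric: str) -> Optional[GridCell]:
        """Return the grid cell a metric is filed under, or None if unknown."""
        for cell, metrics in METRICS_GRID.items():
            if metric in metrics:
                return cell
        return None

    print(classify("story cycle time"))  # GridCell(usage='analytical', target_group='internal')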

The agile metrics grid is described in more detail in issue 4 of Agile Record, a magazine for agile developers and agile testers.

http://www.agilerecord.com

Goal/Question/Metric (GQM)

Goal/Question/Metric (GQM) is an established and elaborated method for measurement in software engineering. I have been using it, and doing evaluation research and method development on it, since the 1990s. It can be useful in particular for the following purposes:

  • Systematically define measurement and reporting in software projects
  • Consolidate existing “ad hoc” measurement
  • Gain information needed for process improvement and agile retrospectives

Many applications of GQM have been reported. Unfortunately, some involve misinterpretations of the method. For this reason, this article offers a few references and hints that can provide you with guidance and examples on how to apply GQM effectively.

GQM includes a data structure, the so-called GQM tree, that helps identify and interpret metrics for a given measurement goal. Specific types of questions clarify certain aspects of the measurement goal. Another data structure, the GQM Abstraction Sheet, facilitates handling of goals, questions, and metrics.
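
As a rough sketch, the GQM tree can be represented as a simple nested structure in which a goal owns questions and each question owns metrics. The class names and example values below are my own illustration, not a normative GQM schema.

    from dataclasses import dataclass, field
    from typing import List

    # Minimal sketch of a GQM tree: a goal owns questions, each question owns metrics.
    @dataclass
    class Metric:
        name: str
        definition: str

    @dataclass
    class Question:
        text: str
        metrics: List[Metric] = field(default_factory=list)

    @dataclass
    class Goal:
        statement: str
        questions: List[Question] = field(default_factory=list)

    goal = Goal(
        statement="Improve defect detection effectiveness of system testing",
        questions=[
            Question(
                "How many defects slip past system testing?",
                [Metric("escaped defects", "defects reported after release")],
            )
        ],
    )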

Whenever I need metrics or reporting information about a piece of software or a software project, I find it highly useful to follow the GQM method in a pragmatic manner: to clarify exactly what information is required, how it can be obtained, and what needs to be considered when interpreting the data. Important GQM principles that provide guidance are:

  • Use the GQM goal template to clarify what shall be found out and at which target group the information is directed (see the example sketched after this list).
  • Use only one or very few levels of questions in order to keep the GQM tree concise.
  • Use exactly one level of metrics. If metrics tend to become complex, ensure that their definitions are clear and precise.
  • Focus on defining appropriate graphical representations of metrics information.
  • Be aware that the metrics data is only the raw material of measurement. Most important is the interactive data interpretation together with the GQM goal’s target group. This collaborative data interpretation is guided by the previously defined GQM questions.
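
To illustrate the goal template mentioned in the first principle, here is a small sketch. The five facets (object, purpose, quality focus, viewpoint, context) follow the commonly cited GQM goal template; the example values are my own assumptions.

    # Sketch of the GQM goal template as a format string; example values are illustrative.
    GOAL_TEMPLATE = (
        "Analyze {object} for the purpose of {purpose} "
        "with respect to {quality_focus} from the viewpoint of {viewpoint} "
        "in the context of {context}."
    )

    example_goal = GOAL_TEMPLATE.format(
        object="the system test process",
        purpose="improvement",
        quality_focus="defect detection effectiveness",
        viewpoint="the test manager",
        context="an agile release project",
    )
    print(example_goal)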

The following literature gives a good overview of GQM. The article by van Latum et al. illustrates practical GQM application. The book by van Solingen and Berghout gives rich additional practical advice. The articles by Basili, Caldiera, and Rombach can be regarded as the original source of GQM, although the method goes back to earlier work by Basili and Weiss from the 1980s.

Frank van Latum, Rini van Solingen, Markku Oivo, Barbara Hoisl, Dieter Rombach, Günther Ruhe. Adopting GQM-Based Measurement in an Industrial Environment. IEEE Software, pp. 78-86, January/February, 1998. (http://www.computer.org/portal/web/csdl/doi/10.1109/52.646887)

Rini van Solingen, Egon Berghout. The Goal/Question/Metric Method. McGraw-Hill Education, 1999. (http://www.iteva.rug.nl/gqm/GQM%20Guide%20non%20printable.pdf)

Victor R. Basili, Gianluigi Caldiera, H. Dieter Rombach. The Goal Question Metric Approach. In Encyclopedia of Software Engineering (John J. Marciniak, Ed.), John Wiley & Sons, Inc., 1994, Vol. 1, pp. 528–532.

Victor R. Basili, Gianluigi Caldiera, H. Dieter Rombach. The Goal Question Metric Approach. NASA GSFC Software Engineering Laboratory, 1994. (ftp://ftp.cs.umd.edu/pub/sel/papers/gqm.pdf)

Testing History

Recently I came across the website www.testingreferences.com, whose author Joris Meerts did a great job of collecting a history of testing. In addition, he provides pretty extensive listings of

  • Testing web sites
  • Testing blogs
  • Testing videos (educational)
  • Testing literature

I am sure you will love this site. As the list is pretty long, I will share my personal shortlist here:

Testing Classification for Focused Test Planning

Testing can mean very different things, depending on the software to be tested, the organization in which the testing takes place, and several other factors. Since testing can be so different, it is useful to have a classification system for testing at hand. It can guide test management during planning and preparation of the various test activities.

A notable classification of testing has been proposed by Robert L. Glass, along with recommendations for test planning. I came across it in late 2008, when I read one of his columns in IEEE Software magazine. Let’s have a look at Glass’s testing classification and discuss what it means for test management practice. At the end of the article, you will find references to additional information.

Glass classifies testing along two dimensions. The first dimension is the goal of testing, i.e., the main principle that drives identification of test cases. There are four goal-driven approaches to testing: (1) Requirements-driven testing derives test cases from defined requirements. (2) Structure-driven testing derives test cases from the software’s implementation structure. (3) Statistics-driven testing derives test cases from typical and frequently conducted usage scenarios of the software. (4) Risk-driven testing derives test cases from the most critical software-related risks.

The second dimension of Glass’s model refers to the phase of the software development lifecycle in which the testing takes place: (1) Unit testing involves the lowest-level components of the software product. (2) Integration testing involves the intermediate levels of the software product and lifecycle. (3) System testing involves the final level of software development.

Glass argues that both dimensions must be combined. For each combination, he recommends the degree to which the respective testing type shall be executed. The following table summarizes the recommendations.

Testing approach    | Unit testing           | Integration testing       | System testing
Requirements-driven | 100% unit requirements | 100% product requirements | 100% system requirements
Structure-driven    | 85% logic paths        | 100% modules              | 100% components
Statistics-driven   | —                      | —                         | 90-100% of usage profiles if required
Risk-driven         | As required            | As required               | 100% if required
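
One way to make the table operational during test planning is to encode it as a small lookup structure. The sketch below is my own encoding of the table above, not code from Glass; the strings mirror the table cells.

    # Glass's recommendations as a lookup: (approach, phase) -> recommended testing degree.
    RECOMMENDATIONS = {
        ("requirements-driven", "unit"): "100% unit requirements",
        ("requirements-driven", "integration"): "100% product requirements",
        ("requirements-driven", "system"): "100% system requirements",
        ("structure-driven", "unit"): "85% logic paths",
        ("structure-driven", "integration"): "100% modules",
        ("structure-driven", "system"): "100% components",
        ("statistics-driven", "system"): "90-100% of usage profiles if required",
        ("risk-driven", "unit"): "as required",
        ("risk-driven", "integration"): "as required",
        ("risk-driven", "system"): "100% if required",
    }

    def recommended_degree(approach: str, phase: str) -> str:
        """Look up the recommended testing degree, defaulting to 'not emphasized'."""
        return RECOMMENDATIONS.get((approach, phase), "not emphasized")

    print(recommended_degree("statistics-driven", "unit"))   # not emphasized
    print(recommended_degree("structure-driven", "system"))  # 100% components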

How can we use this classification system and its recommendations? First, it tells us that testing focus should differ across lifecycle phases, or stages of product aggregation. Second, the classification system recommends that we aim for complete requirements coverage in every testing phase, while the other testing approaches should be emphasized mainly during the later phases of testing. This way, we receive guidance for focused and more efficient testing.

While I value the essence of Glass’s classification very much, I partly question the first dimension of testing approaches, and I am particularly sceptical about the recommended 100% testing degree for the requirements-driven approach. In my opinion, only the first two testing approaches, requirements-driven and structure-driven testing, are basic and fundamentally different categories. The other two approaches, statistics-driven and risk-driven, are variants of the basic ones: statistics-driven usage scenarios and risks can only be determined based on requirements or structure. So, those latter approaches are means for focusing requirements-driven and structure-driven testing.

Why am I sceptical about the 100% degree for requirements-driven testing? I find it impractical for several reasons: First, most testing that I have encountered suffered from severe time and resource constraints, which clearly demanded “less than 100% testing!” Second, requirements are often vague and incomplete, so 100% of something vague is just an illusion of 100%. Third, hardly any project has explicitly stated unit and product requirements. As a consequence, there is no basis for stipulating any kind of test coverage for those requirements types.

However, the basic messages from Glass’s testing classification remain valid and important: Distinguish between requirements-driven and structure-driven testing, and apply different kinds of testing at different phases of system aggregation. Also use statistics-driven and risk-driven approaches for focusing testing. In an earlier article, I have proposed a pragmatic approach for establishing good test coverage based on those principles: Two Essential Stages of Testing.

Further information about Robert L. Glass’s classification can be found in two of his columns in IEEE Software magazine and in a publicly available excerpt from one of his latest books: An Ancient (but Still Valid?) Look at the Classification of Testing (IEEE Software magazine, Nov/Dec 2008), A Classification System for Testing, Part 2 (IEEE Software magazine, Jan/Feb 2009), and The Many Flavors of Testing (book excerpt).

Two Essential Stages of Testing: https://makingofsoftware.com/archives/358

Requirements Prioritization

Beyond the Limits of One-Dimensional Lists

Prioritizing requirements for a software release is an activity which frequently crosses the border between science and psychology. The goal is to determine the right set of things to do for a release. For many IT projects this turns out to be a moving target.

In software product development, requirement priorities are set by the product manager. Typically a product manager focuses exclusively on the market need of a requirement and its selling potential. Excellent software companies look at additional factors as well:

  • Cost
  • Risk
  • Fit to product strategy and architecture
  • Ability to deliver

Creating estimates for these factors cannot be done by a single person. Experts from different domains are needed.

There are a couple of techniques available to cope with this issue. Most of them are based on “cost-value” approaches. In my projects, I have applied a variety of them, ranging from simple to complex ones. Often we started with something similar to Karl Wiegers’s requirements prioritization approach (see www.processimpact.com).
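
As an illustration of the cost-value idea, here is a simplified sketch in the spirit of such approaches, not Wiegers’s exact model: relative priority is computed from estimated value against weighted cost and risk, and the weights and example requirements are illustrative assumptions.

    from dataclasses import dataclass

    # Simplified cost-value prioritization: rank requirements by value per unit of weighted cost and risk.
    @dataclass
    class Requirement:
        name: str
        value: float  # relative business value, e.g. 1-9
        cost: float   # relative implementation cost, e.g. 1-9
        risk: float   # relative technical risk, e.g. 1-9

    def priority(req: Requirement, cost_weight: float = 1.0, risk_weight: float = 0.5) -> float:
        """Higher is better: value divided by weighted cost and risk."""
        return req.value / (cost_weight * req.cost + risk_weight * req.risk)

    backlog = [
        Requirement("export to PDF", value=8, cost=3, risk=2),
        Requirement("single sign-on", value=9, cost=7, risk=6),
        Requirement("dark mode", value=4, cost=2, risk=1),
    ]

    for req in sorted(backlog, key=priority, reverse=True):
        print(f"{req.name}: {priority(req):.2f}")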

Ultimately, these techniques yield a list of requirements ordered by their relative priority. Such a list is charming because you can always pick the most important requirement next. It is so charming that this approach has made it into today’s most prominent development approaches: agile methodologies. In agile projects, the term “product backlog” describes a priority-ordered list of work items, which are addressed from top to bottom.

However, there are some shortcomings with these approaches, which need to be overcome in industrial practice. The key problem is the underlying assumption that requirements can be prioritized independently of each other.

Experience shows that this is seldom the case. Most requirements depend on other requirements.

Across many industry domains, including software and IT, the fundamental approach to building a system is always the same: a high-level plan is decomposed until the units of work are manageable. Elements of such units share some characteristics: they belong to the same domain and have similar complexity. Rating such elements in relative order works well. However, if the elements come from different domains, you may run into problems: they might have architectural dependencies which cannot be addressed in isolation. Many times, I have seen projects get stuck because one team was waiting for something that another team had decided to lower in priority.

This problem of deadlock situations can be addressed by explicit dependency management. In requirements engineering, the concept of traceability is used to manage dependent work items: directional links express dependency relationships. These techniques exist and are available in most modern requirements management tools. However, in practice, such solutions can be very hard to put to work, because they require more discipline than most projects are able to bring up.

So I recommend deploying lightweight traceability using tagging mechanisms on requirements. Sometimes multiple backlogs are used, each holding items from the same domain. I have seen such approaches work pretty well in many project situations, so they might be the optimal solution, with the right mix of rigor and flexibility.
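
To make this concrete, here is a minimal sketch of what lightweight tagging and “depends on” links on backlog items could look like, together with an ordering that respects dependencies. The item names, tags, and fields are hypothetical illustrations, not a prescribed tool format.

    from dataclasses import dataclass, field
    from graphlib import TopologicalSorter  # Python 3.9+
    from typing import List

    # Backlog items carry tags (lightweight traceability) and explicit dependency links.
    @dataclass
    class BacklogItem:
        key: str
        priority: int                  # lower number = higher priority
        tags: List[str] = field(default_factory=list)
        depends_on: List[str] = field(default_factory=list)

    backlog = [
        BacklogItem("UI-12", priority=1, tags=["frontend"], depends_on=["API-7"]),
        BacklogItem("API-7", priority=3, tags=["backend"]),
        BacklogItem("UI-15", priority=2, tags=["frontend"]),
    ]

    # Order items by priority, but never schedule an item before its dependencies.
    graph = {item.key: set(item.depends_on) for item in backlog}
    by_key = {item.key: item for item in backlog}

    ts = TopologicalSorter(graph)
    ts.prepare()
    ordered = []
    while ts.is_active():
        for key in sorted(ts.get_ready(), key=lambda k: by_key[k].priority):
            ordered.append(key)
            ts.done(key)

    print(ordered)  # ['UI-15', 'API-7', 'UI-12'] -- UI-12 waits for its dependency despite top priority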

And don’t forget the importance of communication: Whatever approach one uses, it will be successful only if it is accompanied by good communication structures in the organization.