This article examines potential validity threats in a controlled software engineering experiment, outlining risks to conclusion, internal, construct, and external validity.

Assessing Validity Threats in Controlled Software Engineering Experiments

Abstract

1 Introduction

2 Original Study: Research Questions and Methodology

3 Original Study: Validity Threats

4 Original Study: Results

5 Replicated Study: Research Questions and Methodology

6 Replicated Study: Validity Threats

7 Replicated Study: Results

8 Discussion

9 Related Work

10 Conclusions and References


3 Original Study: Validity Threats

Based on the checklist provided by Wohlin et al. [52], we next describe the threats relevant to our study.

3.1 Conclusion Validity

1. Random heterogeneity of participants. Using a within-subjects experimental design ruled out the risk that variation due to individual differences among participants would be larger than variation due to the treatment.
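The reasoning behind the within-subjects design can be illustrated with a toy additive model. This is a minimal sketch under an assumption the text does not state: that each score is the sum of a participant's individual ability and a technique effect. All names and numbers are hypothetical and are not data from the study. Under that assumption, comparing techniques within each participant cancels out individual differences, whereas comparing scores across participants mixes them with the treatment effect.

```python
from statistics import pvariance

# Hypothetical additive model (illustration only, not data from the study):
# observed score = participant's individual ability + technique effect.
ABILITIES = {"P1": 10.0, "P2": 40.0, "P3": 70.0}  # large individual differences
EFFECTS = {"CR": 0.0, "BT": 5.0, "EP": 8.0}        # small treatment differences


def simulate_scores(abilities, effects):
    """Within-subjects: every participant applies every technique."""
    return {(p, t): a + e for p, a in abilities.items() for t, e in effects.items()}


def within_subject_differences(scores, participants, first, second):
    """Per-participant difference between two techniques; ability cancels out."""
    return [scores[(p, second)] - scores[(p, first)] for p in participants]


scores = simulate_scores(ABILITIES, EFFECTS)
diffs = within_subject_differences(scores, ABILITIES, "CR", "BT")
print(diffs)  # [5.0, 5.0, 5.0]
# Spread of BT scores across participants is dominated by ability, not technique.
print(pvariance([scores[(p, "BT")] for p in ABILITIES]))  # 600.0
```

If the effects are not purely additive (for example, if some participants benefit more from one technique than others), the within-subject differences will vary, and the design then reduces rather than eliminates this source of variation.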

3.2 Internal Validity

  1. History and maturation:

    – Since participants apply different techniques on different artefacts, learning effects should not be a major concern.
    – Experimental sessions take place on different days. Because grades are associated with performance in the experiment, we expect students to try to do better on each subsequent day, which would make the technique applied on the last day appear more effective. To avoid this, different participants apply the techniques in different orders. This cancels out the threat due to order of application (preventing any given technique from benefiting from the maturation effect). In any case, we analyse the techniques chosen per day to study the maturation effect.


  2. Interactions with selection. Randomly assigning participants to groups rules out differences in behaviour between technique application groups. Nevertheless, we check this by analysing the behaviour of the groups.


  3. Hypothesis guessing. Before filling in the questionnaire, participants were only partially informed about the goal of the study. We told them that we wanted to know their preferences and opinions, but they were not aware of our research questions. In any case, if this threat did occur, our results for perceptions would be the best possible ones and would therefore set an upper bound.


  4. Mortality. Several participants did not give consent to participate in the study, which affected the balance of the experiment.

  5. Order of training. Techniques are presented in the following order: CR, BT and EP. If this threat had occurred, CR would be the most effective technique (or the participants' favourite).
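The counterbalancing of application orders and the random assignment of participants to groups described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes full counterbalancing (all six orders of CR, BT and EP), a random shuffle followed by round-robin dealing to keep group sizes equal, and hypothetical participant identifiers; the function names are also hypothetical. The last lines show how removing participants, as happens when some withhold consent, leaves the groups unbalanced.

```python
import itertools
import random
from collections import Counter

# Technique acronyms as used in the text.
TECHNIQUES = ("CR", "BT", "EP")


def counterbalanced_orders(techniques):
    """All possible application orders (full counterbalancing, an assumption)."""
    return list(itertools.permutations(techniques))


def assign_participants(participants, techniques, seed=None):
    """Randomly assign participants to order groups of (near-)equal size.

    Participants are shuffled and then dealt round-robin across the orders,
    so group sizes differ by at most one before any dropouts.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    orders = counterbalanced_orders(techniques)
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


def group_sizes(assignment):
    """Number of participants following each application order."""
    return Counter(assignment.values())


def technique_per_day(assignment):
    """How many participants apply each technique on each day (1, 2, 3)."""
    counts = Counter()
    for order in assignment.values():
        for day, technique in enumerate(order, start=1):
            counts[(day, technique)] += 1
    return counts


# Twelve hypothetical participants: two per order, four per technique per day.
participants = [f"P{i:02d}" for i in range(1, 13)]
assignment = assign_participants(participants, TECHNIQUES, seed=1)
print(sorted(group_sizes(assignment).values()))  # [2, 2, 2, 2, 2, 2]
print(technique_per_day(assignment)[(1, "CR")])  # 4

# Dropping participants (e.g. those who withhold consent) unbalances the groups.
consenting = {p: o for p, o in assignment.items() if p not in {"P01", "P02"}}
print(sum(group_sizes(consenting).values()))  # 10
```

Because every technique appears in every position equally often, a day-to-day improvement would be spread evenly across techniques rather than favouring the one applied last; this only holds while groups stay balanced.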

3.3 Construct Validity

  1. Inadequate preoperational explanation of cause constructs. Cause constructs are clearly defined thanks to the extensive training participants received on the techniques under study.
  2. Inadequate preoperational explanation of effect constructs. The question asked is clear and should not be open to misinterpretation. However, since perception is subjective, different participants may interpret the question differently, and their perceptions may therefore depend on how they interpret it. This issue should be investigated further in future studies.

3.4 External Validity

  1. Interaction of setting and treatment. We tried to make the faults seeded in the programs as representative of real faults as possible.
  2. Generalisation to other subject types. As already mentioned, our sample represents developers with little or no previous experience in testing techniques, and junior programmers. The extent to which the results of this study can be generalised to other subject types needs to be investigated. Of all the threats listed, generalisation to other subject types is the only one that could affect the validity of this study's results in an industrial context.

:::info Authors:

  1. Sira Vegas
  2. Patricia Riofrío
  3. Esperanza Marcos
  4. Natalia Juristo

:::

:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 license.

:::

