Internal and External Validity


Internal Validity

Internal validity refers both to how well a study was run (research design, operational definitions used, how variables were measured, what was/wasn't measured, etc.) and to how confidently one can conclude that the observed effect(s) were produced solely by the independent variable and not by extraneous ones. In experimental research, internal validity answers the question, "Was it really the treatment that caused the difference between the subjects in the control and experimental groups?" In descriptive studies (correlational, etc.), internal validity refers only to the accuracy/quality of the study (i.e., how well the study was run).

In their classic book on experimental research, Campbell and Stanley (1966) identify and discuss 8 types of extraneous variables that can, if not controlled, jeopardize an experiment's internal validity.

  1. History-- History refers to the effect external events have on subjects between the various measurements done in an experiment. These experiences function like extra, unplanned independent variables. Compounding this, the experiences are likely to vary across subjects, which has a differential effect on the subjects' responses. Studies that take repeated measures on subjects over time are more likely to be affected by history variables than those that collect data in shorter time periods or that do not use repeated measures.

  2. Maturation-- Maturation refers to how subjects can change naturally with the passage of time (rather than due to the treatment). For example, the more time that passes in a study, the more likely subjects are to become tired and bored, more or less motivated as a function of hunger or thirst, older, etc. As Isaac and Michael (1971) point out, subjects may perform better or worse on a dependent variable not as a result of the independent variable but because they are older, more/less motivated, etc.

  3. Testing-- Testing refers to how a pretest can affect subjects' performance on a post-test. Many experiments pretest subjects to establish that all the subjects are starting the study at approximately the same level. A consequence of pretesting programs/protocols is that they can contaminate/change the subjects' performance on later tests of the same domain (e.g., those used as dependent variables), over and above any effects caused by the treatment itself.

  4. Instrumentation-- Instrumentation refers to the objectivity, reliability and validity of the research measurements. Data that is biased (nonobjective) or unreliable threatens a study's internal validity. In addition, changing the measurement methods (or their method of administration) during a study can affect what is measured. For more detail on this topic see Variable Measurement in Research.

  5. Statistical Regression-- Statistical regression is the phenomenon whereby retest results tend to regress toward the mean. When subjects in a study are selected as participants because they scored extremely high or extremely low on some measure of performance (e.g., a test), retesting of the subjects will almost always produce a different distribution of scores, and the average of this new distribution will be closer to the population mean. For example, if the chosen subjects all had high scores initially, the group's average on the retest will tend to be lower (i.e., less extreme) than it was originally. Conversely, if the group's mean was originally low, their retest mean will tend to be higher.

  6. Selection-- Selection refers to the effect of nonequivalent groups on a study's validity. The subjects in the comparison groups (e.g., the control and experimental groups) should be functionally equivalent at the beginning of a study. If they are, then differences between the groups observed at the end of the study, as measured by the dependent variable(s), are more likely to be caused only by the independent variable rather than by organismic ones. If the comparison groups differ from one another at the beginning of the study, then the observed effect(s) may be due to these differences rather than to the experimental treatment.

  7. Experimental Mortality/Attrition-- Attrition refers to the potential bias that arises depending on who stays in or drops out of a study. Subjects frequently drop out of studies. If one comparison group experiences a higher level of subject attrition than other groups, then observed differences between groups become questionable. Were the observed differences produced by the independent variable or by the different dropout rates? (Mortality is also a threat when dropout rates are similar across comparison groups but high.)

  8. Selection Interactions-- In some studies the selection method can interact with maturation, history or instrumentation, also biasing the study's results.
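
Statistical regression (item 5 above) is easy to see in a small simulation. The sketch below is illustrative only and is not taken from Campbell and Stanley: it assumes each subject's score is a stable "true ability" plus independent random noise on each testing occasion, and the function name, sample size, and score scale are all hypothetical choices. No treatment is applied between the two tests, yet the group selected for extremely high first-test scores still scores lower on the retest.

```python
import random
import statistics


def simulate_regression_to_mean(n_subjects=10_000, top_fraction=0.10, seed=0):
    """Select top scorers on test 1 and compare their means on test 1 and test 2.

    Assumption (hypothetical model): score = stable ability + independent noise.
    """
    rng = random.Random(seed)

    # Stable trait for each subject, plus fresh measurement noise on each test.
    ability = [rng.gauss(100, 10) for _ in range(n_subjects)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]

    # Choose the subjects with the most extreme (highest) first-test scores.
    n_selected = int(n_subjects * top_fraction)
    ranked = sorted(range(n_subjects), key=lambda i: test1[i], reverse=True)
    selected = ranked[:n_selected]

    return {
        "population_mean": statistics.mean(test1),
        "selected_test1_mean": statistics.mean(test1[i] for i in selected),
        "selected_test2_mean": statistics.mean(test2[i] for i in selected),
    }


result = simulate_regression_to_mean()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the selected group's retest mean falls between its first-test mean and the population mean. A researcher who gave this group a treatment between the two tests could easily mistake that drop for a treatment effect.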


External Validity

External validity represents the extent to which a study's results can be generalized or applied to other people or settings. Campbell and Stanley (cited in Isaac & Michael, 1971) have identified 4 factors that can adversely affect a study's external validity.

  1. An interaction between how the subjects were selected and the treatment (e.g., the independent variable) can occur. If subjects are not randomly selected from a population, then their particular characteristics may bias their performance and the study's results may not be applicable to the population or to another group that more accurately represents the characteristics of the population.

  2. Pretesting subjects in a study may cause them to react more/less strongly to the treatment than they would have had they not experienced the pretest. In such situations the researcher(s) cannot conclude that members of the population who were not pretested would perform in a similar manner to the participants in the study. Restated, to generalize the results of the study the researcher would have to specify that a particular type of pretesting also be done because the pretesting could be serving as an extra, unintentional independent variable.

  3. The performance of subjects in some studies is more a product of, or reaction to, the experimental setting (e.g., the situation where the study is conducted) than it is of the independent variable. For example, subjects who know they are participants in a study, or who are aware of being observed, may react differently to the treatment than subjects who experience the treatment but are not aware of being observed (the Hawthorne effect).

  4. Studies that use multiple treatments/interventions may have limited generalizability because the early treatments may have a cumulative effect on the subjects' performance. If a group experienced treatment X1, and that first treatment was followed by a second (X2), their measured performance after X2 will be affected by both treatments, not just by X2, because the effects of X1 cannot be erased.
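
The selection-treatment interaction in item 1 above can also be sketched as a simulation. Everything here is an assumption made for illustration: the "motivation" trait, the made-up rule that the treatment effect grows with motivation, and the sample sizes. The point is only that a randomly selected sample recovers the population's average effect, while a sample made up of the most motivated subjects does not.

```python
import random
import statistics


def mean_treatment_effect(motivations):
    """Average effect under a hypothetical rule: effect = 2 + 3 * motivation."""
    return statistics.mean(2.0 + 3.0 * m for m in motivations)


rng = random.Random(1)

# Hypothetical population: motivation spread uniformly between 0 and 1.
population = [rng.random() for _ in range(20_000)]

# Random selection: every member of the population is equally likely to be chosen.
random_sample = rng.sample(population, 500)

# Non-random selection: only the 500 most motivated people take part.
volunteer_sample = sorted(population, reverse=True)[:500]

population_effect = mean_treatment_effect(population)
random_effect = mean_treatment_effect(random_sample)
volunteer_effect = mean_treatment_effect(volunteer_sample)

print(f"population: {population_effect:.2f}")
print(f"random sample: {random_effect:.2f}")
print(f"volunteer sample: {volunteer_effect:.2f}")
```

In this sketch the volunteer sample overstates the average effect because the selected subjects' characteristics interact with the treatment, so a result obtained from that sample would not generalize to the population it was drawn from.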

References