Seeking Answers:
Questions Health Professionals Ask and the Evidence Needed to Answer Them
© Craig L. Scanlan, 2004 (Version 2, Sep 2005)
After completion of this module, you will be able to:
Health professionals commonly ask two types of questions: background and foreground questions. The following table distinguishes between these two types of questions and provides an example of each:
Background vs. Foreground Questions
Background questions: | Foreground questions:
(Adapted from Sackett, D.L., and others. (2000). Evidence-based Medicine: How to Practice and Teach EBM. 2nd ed. Edinburgh: Churchill Livingstone.)
As indicated in Figure 1, students and novice practitioners ask primarily background questions. As one progresses from novice to expert clinician, more and more of the pertinent questions that need answering are of the foreground type. Of course, one generally needs to fully understand the background questions and answers before asking and answering foreground questions.
Figure 1. Questioning: The progression from novice to expert. Adapted from Guyatt, G., & Rennie, D. (2002). Users' guides to the medical literature: Essentials of evidence-based clinical practice. Chicago: AMA Press.
Therapy. Therapy questions ask which treatment is 'best' for a patient, and/or what immediate or short-term outcomes one can expect from different treatment options. The goal is to select those treatments that do more good than harm and that are worth the efforts and costs of using them.
Diagnosis. Diagnosis questions ask to what degree a particular test is a valid and reliable predictor of a clinical condition. The goal is to decide whether a patient would get enough benefit from the test, on average, to justify having it done.
Prognosis. Prognosis questions ask what effect a particular treatment option will have on a patient's future health, life span, and/or quality of life. The goal is to estimate the patient's likely clinical course over time and anticipate likely complications of disease and/or factors affecting long-term outcomes.
Harm. Harm questions ask about the relationship between a disease and possible causes or risk factors. The goal is to reduce the chance of disease by identifying and modifying or eliminating risk factors.
http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html
As shown in Figure 2, this page helps you find articles in the Medline database on a particular subject area that focus on one of the major categories of foreground questions described above. To obtain a list of citations for a subject area of interest to you that is based primarily on studies of therapy, diagnosis, harm (etiology) or prognosis:
Figure 2. PubMed Clinical Queries Page
For example, assume that my question involves identifying the best screening test for the diagnosis of occupational asthma (asthma associated with exposure to allergens in the work setting). I type occupational asthma into the Search text box, click on diagnosis for my Category and select narrow, specific search for my Scope. This search results in 94 citations, of which the following examples are representative of the many good 'hits':
9: Anees W. Use of pulmonary function tests in the diagnosis of occupational asthma. Ann Allergy Asthma Immunol. 2003 May;90(5 Suppl 2):47-51.
29: Weytjens K, Malo JL, Cartier A, Ghezzo H, Delwiche JP, Vandenplas O. Comparison of peak expiratory flows and FEV1 in assessing immediate asthmatic reactions due to occupational agents. Allergy. 1999 Jun;54(6):621-5.
33: Leroyer C, Perfetti L, Trudeau C, L'Archeveque J, Chan-Yeung M, Malo JL. Comparison of serial monitoring of peak expiratory flow and FEV1 in the diagnosis of occupational asthma. Am J Respir Crit Care Med. 1998 Sep;158(3):827-32.
50: Paggiaro PL, Giannini D, Moscato G, Bacci E, Bancalari L, Carrara M, Dente FL, Di Franco A, Di Pede F, Petrozzino M, et al. Peak expiratory flow monitoring in diagnosis and management of occupational asthma. Monaldi Arch Chest Dis. 1994 Dec;49(5):425-31.
68: Perrin B, Lagier F, L'Archeveque J, Cartier A, Boulet LP, Cote J, Malo JL. Occupational asthma: validity of monitoring of peak expiratory flow rates and non-allergic bronchial responsiveness as compared to specific inhalation challenge. Eur Respir J. 1992 Jan;5(1):40-8.
All in all, a good start on answering my question. Try out this tool and see what results you get!
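The same kind of search can also be written out as a query string. The sketch below only builds the search string and a request URL; it does not send anything. It assumes that PubMed accepts filter names of the form Diagnosis/Narrow[filter] and that the NCBI E-utilities ESearch endpoint is the target, neither of which is described in this module. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not part of the original module).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The four question categories offered on the Clinical Queries page.
# "Harm" questions are searched under the etiology filter.
CATEGORY_FILTERS = {
    "therapy": "Therapy",
    "diagnosis": "Diagnosis",
    "harm": "Etiology",
    "prognosis": "Prognosis",
}

# Scope: "narrow" = specific search, "broad" = sensitive search.
SCOPE_FILTERS = {"narrow": "Narrow", "broad": "Broad"}


def clinical_query_term(subject, category, scope="narrow"):
    """Combine a subject with a category/scope filter (hypothetical helper)."""
    filter_name = f"{CATEGORY_FILTERS[category]}/{SCOPE_FILTERS[scope]}[filter]"
    return f"({subject}) AND ({filter_name})"


def clinical_query_url(subject, category, scope="narrow", max_results=20):
    """Build, but do not send, an ESearch request URL for the query."""
    params = {
        "db": "pubmed",
        "term": clinical_query_term(subject, category, scope),
        "retmax": max_results,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(clinical_query_term("occupational asthma", "diagnosis", "narrow"))
    print(clinical_query_url("occupational asthma", "diagnosis", "narrow"))
```

Because the database grows continually, a search run today will not return the same 94 citations reported above.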
Two important notes related to using these research methodology filters:
Not all questions of importance to health-related professionals focus on patients and their therapy, diagnosis, prognosis or harm. Typically, these 'other' types of questions focus on units of analysis broader than an individual patient or client, i.e., on policies/procedures, programs, systems, or organizations. A supplementary help document entitled 'Other' Types of Questions is available to assist those intent on pursuing these broader questions at:
http://www.umdnj.edu/idsweb/idst6400/other_questions.htm
A careful review of all the pertinent articles I retrieved related to my foreground question on diagnosis of occupational asthma reveals several different types of articles and studies. For example, citation #29 is a clinical trial, citation #68 is a prospective cohort study and citations #9, #33 and #50 are review articles. Although they are all related to the diagnosis of occupational asthma, they clearly represent different types of evidence. Which type of study or evidence provides the best answer?
Types of Studies
Before you can determine which type of evidence provides the best answer to your clinical questions, it is essential that you understand the full scope of study types typically encountered in the literature. Then you will need to relate this knowledge back to the category of question being asked (i.e., therapy, diagnosis, harm/etiology or prognosis).
Health science studies commonly are divided into two major types: original research and secondary or integrative studies (reviews). Original research studies report research first-hand. Secondary, integrative studies or reviews summarize and draw conclusions from primary studies.
Original research can further be divided into two subcategories: experimental studies and observational studies (Gay, 1999). Generally, experimental studies are assumed to provide stronger empirical evidence than that resulting from observational studies. Two other unique types of original research, methodological studies and evaluation research, are discussed separately.
Experimental Studies. Experimental studies are prospective in nature, meaning that data collection and the events of interest occur after subjects are enrolled. Moreover, in these types of studies, the investigator manipulates the exposure or treatment by controlling the assignment or allocation of subjects (Gay, 1999). In relative order from those providing the strongest to those providing the weakest empirical evidence for clinical practice are the following types of experiments (some study type definitions based on National Library of Medicine Medical Subject Headings):
In discussing clinical trials, one also may encounter terminology used by the FDA in its approval process for new drugs, devices or procedures. The FDA defines four phases of clinical trials, each with a different purpose (definitions based on National Library of Medicine Medical Subject Headings):
Observational Studies. The second subcategory of original research is the observational study. In an observational study, the investigator cannot and does not manipulate the exposure or treatment. Instead, subjects are identified and compared (by historical record or survey) as either having or not having the exposure, treatment or attributes of interest, or are followed forward in time to observe what occurs in the future (e.g., morbidity, mortality or contracting a disease). In this regard, observational studies are essentially "experiments in nature." In general, the level of evidence provided by observational studies is weaker than that provided by experimental studies due to the many confounding factors that cannot be fully controlled (Gay, 1999). However, for some clinical questions, observational studies are not only appropriate but also represent the best available evidence. In relative order from those providing the strongest to those providing the weakest empirical evidence for clinical practice are the following types of observational studies:
Other Types of Original Research. There are two other types of original research that do not fit neatly into either the experimental or observational category: methodological studies and evaluation research. What distinguishes these studies from the subject and treatment/exposure-oriented research methods described above is their focus. Methodological studies focus on assessing the tools or methods of research, whereas evaluation research involves drawing summary judgments about the impact of a process or program.
Secondary studies summarize and draw conclusions from primary studies. Secondary or integrative studies found in professional journals include systematic reviews and meta-analyses, decision analyses, practice guidelines, consensus statements and journalistic reviews/overviews. Outside the journal literature, books and book chapters are the most common secondary source materials. In relative order from those providing the strongest to those providing the weakest empirical evidence for clinical practice are the following types of secondary studies:
Systematic Reviews and Meta-Analyses. Systematic reviews provide summaries of related primary studies that have been searched for, evaluated, selected and reported according to a rigorous and predefined methodology. The most common methodology employed to conduct a systematic review is that developed by the Cochrane Collaboration. Often, a systematic review will employ meta-analysis to statistically combine the numeric results of several separate studies addressing the same question into a single estimate of their combined effect (commonly referred to as 'pooling data'). Typically, the results are presented in graphic form using a 'forest plot' (Figure 3). In this example of a forest plot, the individual odds ratios (with confidence intervals) of five studies (i to v) are depicted by the black blobs and lines, with the pooled odds ratio (and its confidence interval) represented by the open diamond.
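To make 'pooling data' concrete, the following sketch combines the odds ratios of five studies into a single estimate using fixed-effect, inverse-variance weighting on the log odds ratio scale. This is one common pooling method, chosen here for illustration; it is not necessarily the method behind Figure 3. The five 2x2 tables are hypothetical and the function names are invented for this example.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (log OR, standard error) for one study's 2x2 table.

    a = events in the intervention group, b = non-events in that group,
    c = events in the control group,      d = non-events in that group.
    All four counts are assumed to be greater than zero.
    """
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, std_err


def pool_fixed_effect(tables, z=1.96):
    """Pool several studies; return (OR, lower, upper) with a 95% CI."""
    total_weight = 0.0
    weighted_sum = 0.0
    for table in tables:
        log_or, std_err = log_odds_ratio(*table)
        weight = 1 / std_err ** 2  # more precise studies count for more
        total_weight += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


# Hypothetical data for five studies (i to v); not taken from Figure 3.
STUDIES = [
    (12, 88, 20, 80),
    (8, 42, 14, 36),
    (30, 170, 41, 159),
    (5, 45, 9, 41),
    (22, 128, 30, 120),
]

if __name__ == "__main__":
    for table in STUDIES:
        log_or, std_err = log_odds_ratio(*table)
        print(f"study OR = {math.exp(log_or):.2f}, SE(log OR) = {std_err:.2f}")
    pooled_or, lower, upper = pool_fixed_effect(STUDIES)
    print(f"pooled OR = {pooled_or:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The pooled confidence interval is narrower than that of any single study, which is why the diamond in a forest plot is typically narrower than the individual study lines.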
Figure 3. Forest plot summarizing 5 studies (center line: OR = 1.0). Adapted from University of Sheffield, School of Health and Related Research (n.d.). Appraisal of reviews. Retrieved May 15, 2004 from http://www.shef.ac.uk/scharr/ir/units/critapp/apprev.htm
Decision Analyses. Decision analyses use the results of primary studies to develop quantitative measures or estimates of the risks and benefits of alternative diagnostic or therapeutic options. As described in Figure 4, these estimates are then used to develop a probability tree that can help health professionals and/or patients make informed choices about clinical management. Decision analysis also can be used to help develop clinical guidelines or consensus statements.
Figure 4. A decision analysis tree for the diagnosis and treatment of sore throat. Time flows from left to right. Blue squares represent decisions, green circles are event probabilities and the red triangles depict terminal or outcome 'utilities' (global estimate of the preference for the outcome) with a utility of 1.0 being optimal. In the above analysis, the least preferred outcome (0.6) is a patient who has strep, is tested for strep via the rapid antigen test, but for whom a false negative (FN) test result is obtained. Optimal (1.0) outcomes in this analysis include (a) empirical treatment for patients having strep, (b) a true positive (TP) diagnosis of strep (treatment assumed), and (c) a true negative test result for a patient not having strep. Figure and explanation adapted from Michigan State University - College of Human Medicine, Department of Family Practice. Reading and doing decision analyses (Module 6 in An Introduction to Information Mastery). Retrieved May 17, 2005 from http://www.poems.msu.edu/InfoMastery/DecisionAnalysis/DA.htm
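The arithmetic behind such a tree can be sketched as follows. Each strategy is 'rolled back' by weighting the outcome utilities by their probabilities. Only the utilities of 1.0 and 0.6 come from the Figure 4 caption; the prevalence, sensitivity, specificity and the utility of treating a patient who does not have strep are hypothetical values chosen for illustration, and the tree is a simplified two-strategy version, not a reproduction of the figure.

```python
# Utilities stated in the Figure 4 caption.
U_OPTIMAL = 1.0         # treated strep, true positive, or true negative
U_FALSE_NEGATIVE = 0.6  # strep missed by the rapid antigen test

# Hypothetical values, chosen only for illustration.
U_UNNEEDED_TREATMENT = 0.9  # patient without strep who is treated anyway
P_STREP = 0.30              # probability that the patient has strep
SENSITIVITY = 0.85          # P(test positive | strep)
SPECIFICITY = 0.95          # P(test negative | no strep)


def expected_utility(node):
    """Roll back a tree.

    A leaf is a utility (a number). A chance node is a list of
    (probability, child) pairs whose probabilities sum to 1.
    """
    if isinstance(node, (int, float)):
        return float(node)
    if abs(sum(prob for prob, _ in node) - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(prob * expected_utility(child) for prob, child in node)


# Strategy 1: treat everyone with a sore throat without testing.
treat_empirically = [
    (P_STREP, U_OPTIMAL),
    (1 - P_STREP, U_UNNEEDED_TREATMENT),
]

# Strategy 2: test first, then treat only those who test positive.
test_first = [
    (P_STREP, [(SENSITIVITY, U_OPTIMAL), (1 - SENSITIVITY, U_FALSE_NEGATIVE)]),
    (1 - P_STREP, [(SPECIFICITY, U_OPTIMAL), (1 - SPECIFICITY, U_UNNEEDED_TREATMENT)]),
]

STRATEGIES = {"treat empirically": treat_empirically, "test first": test_first}

if __name__ == "__main__":
    for name, tree in STRATEGIES.items():
        print(f"{name}: expected utility = {expected_utility(tree):.4f}")
    best = max(STRATEGIES, key=lambda name: expected_utility(STRATEGIES[name]))
    print("preferred strategy under these assumptions:", best)
```

Changing the hypothetical inputs, for example raising the prevalence of strep, can change which strategy comes out ahead, which is the point of a decision analysis.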
Practice Guidelines (Evidence-Based). An evidence-based practice guideline is a statement based on systematic review of primary studies that is designed to help clinicians make appropriate decisions about health care in specific clinical circumstances. The guideline is evidence-based to the extent that it employs the same methods and procedures used in developing systematic reviews and/or meta-analyses. Evidence-based guidelines may also involve quantitative assessment of alternative risks and benefits, as with decision analysis.
Practice Guidelines (Non-Systematic). Like its evidence-based counterpart, a non-systematic practice guideline is a statement based on the review of primary studies that is designed to help clinicians make appropriate clinical decisions. However, a non-systematic practice guideline does not employ reproducible methods for evaluating, selecting and reporting the primary studies on which it is based.
Consensus Statements. A consensus statement is a statement developed by professionals via a group consensus process that is intended to advance health professional and/or public understanding of a targeted health problem, practice or issue. Typically, consensus statements are developed before the research literature reaches the 'critical mass' needed to conduct systematic reviews or write evidence-based practice guidelines. Notwithstanding this limitation, consensus statements should still be based on evaluation of the available scientific evidence, no matter how lean.
Journalistic Reviews/Overviews. Non-systematic or journalistic reviews provide a summary of evidence derived from primary studies that have been selected and synthesized according to the author's personal and professional perspective. Non-systematic reviews can cover a wide range of subject matter at various levels of completeness and comprehensiveness. Most books and book chapters provide a similar perspective and together with journal overviews generally represent the lowest level of secondary or integrative evidence.
The prior discussion of study types provided a general hierarchy of evidence for experimental, observational and integrative studies. Now it's time to determine which type of study is best for answering each specific type of question, i.e., therapy, diagnosis, harm/etiology or prognosis.
Although the literature provides many systems for defining levels of evidence, the one most commonly cited and used is that developed by the Oxford Centre for Evidence-based Medicine. Below is an adapted version of the Oxford Levels of Evidence table with accompanying notes:
Levels of Evidence Based on Category of Question Being Asked
(adapted from Levels of Evidence and Grades of Recommendation, Oxford Centre for Evidence-based Medicine, available online at: http://www.cebm.net/?o=1025)
Level | Category of Question: Therapy/Prevention | Diagnosis | Etiology/Harm | Prognosis
1a | SR (with homogeneity*) of RCTs | SR (with homogeneity*) of Level 1 diagnostic studies; CDR with 1b studies from different clinical centres | SR (with homogeneity*) of RCTs | SR (with homogeneity*) of inception cohort studies; CDR validated in different populations
1b | Individual RCT (with narrow Confidence Interval) | Validating** cohort study with good reference standards; or CDR tested within one clinical centre | Individual RCT (with narrow Confidence Interval) | Individual inception cohort study with >80% follow-up; CDR validated in a single population
1c | All or none case-series | | |
2a | SR (with homogeneity*) of cohort studies | SR (with homogeneity*) of Level >2 diagnostic studies | SR (with homogeneity*) of cohort studies | SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
2b | Individual cohort study (including low quality RCT; e.g., <80% follow-up) | Exploratory** cohort study with good reference standards; CDR after derivation, or validated only on split-sample§§§ or databases | Individual cohort study (including low quality RCT; e.g., <80% follow-up) | Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR or validated on split-sample§§§ only
2c | "Outcomes" Research; Ecological studies | | "Outcomes" Research; Ecological studies | "Outcomes" Research
3a | SR (with homogeneity*) of case-control studies | SR (with homogeneity*) of 3b and better studies | SR (with homogeneity*) of case-control studies |
3b | Individual Case-Control Study | Non-consecutive study; or a study without consistently applied reference standards | Individual Case-Control Study |
4 | Case-series (and poor quality cohort and case-control studies§§) | Case-control study, poor or non-independent reference standard | Case-series (and poor quality cohort and case-control studies§§) | Case-series (and poor quality prognostic cohort studies***)
5 | Expert opinion without explicit critical appraisal, or based on physiology, bench research or pathophysiological principles | Expert opinion without explicit critical appraisal, or based on physiology, bench research or pathophysiological principles | Expert opinion without explicit critical appraisal, or based on physiology, bench research or pathophysiological principles | Expert opinion without explicit critical appraisal, or based on physiology, bench research or pathophysiological principles
Notes
SR = systematic review; RCT = randomized controlled trial; CDR = clinical decision rule
* | Homogeneity indicates that a systematic review is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies.
Clinical Decision Rule. (These are algorithms or scoring systems that lead to a prognostic estimation or a diagnostic category.)
The width of a confidence interval indicates the precision of the inferences made from it and thus the power or probability of making a correct conclusion from the data. A narrow confidence interval indicates high precision and power, whereas a wide interval indicates low precision and power and raises the possibility of overlooking important benefits or harms.
§ | Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it.
§§ | A poor quality cohort study is one that fails to clearly define comparison groups and/or fails to measure exposures and outcomes in the same (preferably blinded), objective way in both exposed and non-exposed individuals and/or fails to identify or appropriately control known confounders and/or fails to carry out a sufficiently long and complete follow-up of patients. A poor quality case-control study is one that fails to clearly define comparison groups and/or fails to measure exposures and outcomes in the same (preferably blinded), objective way in both cases and controls and/or fails to identify or appropriately control known confounders.
§§§ | Split-sample validation is achieved by collecting all the information in a single block, then artificially dividing this into "derivation" and "validation" samples.
An "Absolute SpPin" is a diagnostic finding whose Specificity is so high that a Positive result rules-in the diagnosis. An "Absolute SnNout" is a diagnostic finding whose Sensitivity is so high that a Negative result rules-out the diagnosis.
Good, better, bad and worse refer to the comparisons between treatments in terms of their clinical risks and benefits.
Good reference standards are independent of the test, and applied blindly or objectively to all patients. Poor reference standards are haphazardly applied, but still independent of the test. Use of a non-independent reference standard (where the 'test' is included in the 'reference', or where the 'testing' affects the 'reference') implies a level 4 study.
Better-value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse-value treatments are as good and more expensive, or worse and equally or more expensive.
** | Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and trawls the data (e.g., using a regression analysis) to find which factors are 'significant'.
*** | By poor quality prognostic cohort study we mean one in which sampling was biased in favor of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded, non-objective way, or there was no correction for confounding factors.
**** | Good follow-up in a differential diagnosis study is >80%, with adequate time for alternative diagnoses to emerge (e.g., 1-6 months acute, 1-5 years chronic).
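The SpPin and SnNout notes above rest on two quantities, sensitivity and specificity, that are computed from a 2x2 table comparing a test against its reference standard. The sketch below shows the calculation; the counts are hypothetical, the function name is invented for this example, and no numeric threshold for "absolute" is implied because the notes do not give one.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Summarize a test against its reference standard.

    tp = true positives, fp = false positives,
    fn = false negatives, tn = true negatives.
    Every row and column total is assumed to be greater than zero.
    """
    return {
        # Of patients WITH the condition, the share the test detects.
        "sensitivity": tp / (tp + fn),
        # Of patients WITHOUT the condition, the share the test clears.
        "specificity": tn / (tn + fp),
        # Of positive results, the share that are correct.
        "positive_predictive_value": tp / (tp + fp),
        # Of negative results, the share that are correct.
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical finding with very high specificity but modest sensitivity.
HIGH_SPECIFICITY_FINDING = {"tp": 45, "fp": 1, "fn": 55, "tn": 99}

if __name__ == "__main__":
    for name, value in diagnostic_accuracy(**HIGH_SPECIFICITY_FINDING).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, false positives are rare, so a positive result strongly supports the diagnosis (the SpPin pattern), while a negative result rules out very little because sensitivity is low.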
The Oxford table is the most-cited resource for assessing levels of evidence, but (as suggested by the notes) is a bit complex. A simplified 3-level table of evidence levels, adapted from the Oxford version, is provided below. This simplified table is acceptable for use in Evidence-Based Literature Review (IDST6400). Note the use of Roman numerals instead of Arabic numbers to designate the levels of evidence (I = highest; III = lowest).
Design Level | Therapy Studies | Diagnostic Studies | Prognosis/Harm Studies | Other Studies
I | Large RCT or systematic reviews or meta-analysis of RCTs | Prospective cohort study based on a gold standard or systematic reviews or meta-analysis of comparable cohort studies | Population prospective cohort or systematic reviews or meta-analysis of comparable cohort studies | Randomized/matched group comparison of intervention or meta-analysis of comparable studies
II | Quasi-experiments (e.g., nonrandomized) and prospective cohort studies | Prospective cohort lacking gold standard or cross-sectional or retrospective study | Retrospective cohort study or | Prospective cohort studies or cross-sectional survey
III | Case series/reports; other (e.g., consensus statements, nonsystematic reviews) | Case series/reports; other (e.g., consensus statements, nonsystematic reviews) | Case series/reports; other (e.g., consensus statements, nonsystematic reviews) | Retrospective studies, including case studies
Of course, levels of evidence speak only to the validity of studies and not their clinical applicability. Clinicians normally take other factors into account (such as cost, ease of implementation, and impact of the disease) when making choices for their patients. To address the broader issue of clinical applicability, the Oxford Centre also provides four grades of recommendations that can be derived from the literature and can be made by or for clinicians considering a diagnostic test, specific therapy, or long-term treatment plan for a patient:
A | consistent level 1 studies
B | consistent level 2 or 3 studies or extrapolations from level 1 studies
C | level 4 studies or extrapolations from level 2 or 3 studies
D | level 5 evidence or troublingly inconsistent or inconclusive studies of any level
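The four grades can be read as a simple decision rule. The sketch below is one possible encoding, not an official Oxford Centre algorithm: it assumes that a mixed body of evidence is graded by its weakest level, and that level 4 evidence stays at grade C whether or not it is extrapolated, since the table does not address either case. The function name is hypothetical.

```python
def grade_of_recommendation(levels, consistent=True, extrapolated=False):
    """Map evidence levels (e.g. ["1a", "1b"]) to a grade A-D.

    consistent   : False if the studies are troublingly inconsistent
                   or inconclusive.
    extrapolated : True if the evidence is being applied to a situation
                   that differs from the one the studies examined.
    """
    if not levels:
        raise ValueError("at least one evidence level is required")
    major_levels = [int(level[0]) for level in levels]  # "2b" -> 2
    if any(major < 1 or major > 5 for major in major_levels):
        raise ValueError("levels must start with a digit from 1 to 5")

    weakest = max(major_levels)  # assumption: weakest study sets the grade
    if not consistent or weakest == 5:
        return "D"
    if weakest == 4:
        return "C"
    if weakest in (2, 3):
        return "C" if extrapolated else "B"
    return "B" if extrapolated else "A"  # weakest == 1


if __name__ == "__main__":
    print(grade_of_recommendation(["1a", "1b"]))
    print(grade_of_recommendation(["2b", "3b"]))
    print(grade_of_recommendation(["2a"], extrapolated=True))
    print(grade_of_recommendation(["1b", "1b"], consistent=False))
```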
More recently, the GRADE Working Group (Atkins, Best, Briss and others, 2004) has developed a system for judging the strength of a recommendation that includes not only the quality of the evidence but also the potential benefits and harms, the costs/resource utilization, the translation of the evidence into specific circumstances, and the baseline risk.
Atkins, D., Best, D., Briss, P.A., Eccles, M., Falck-Ytter, Y., Flottorp, S., et al. (2004). GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ, 328, 1490.
Duke University Medical Center Library (n.d.). Study design. Retrieved August 4, 2005 from http://www.mclibrary.duke.edu/subject/ebm/studies.html
Gay, J. (1999). Clinical epidemiology & evidence-based medicine glossary: Clinical study design and methods terminology. Available online at: http://www.vetmed.wsu.edu/courses-jmgay/GlossClinStudy.htm
Greer, N., Mosser, G., Logan, G., Halaas, G.W. (2000). A practical approach to evidence grading. Jt Comm J Qual Improv, 26, 700-12.
Guyatt, G., & Rennie, D. (Eds.) (2002). Users' guides to the medical literature. Chicago, IL: AMA Press.
Guyatt, G.H., Keller, J.L., Jaeschke, R., Rosenbloom, D., Adachi, J.D., & Newhouse, M.T. (1990). The n of 1 randomized controlled trial: Clinical usefulness. Ann Intern Med, 112, 293-9.
Guyatt, G.H., Sackett D.L., Sinclair, J.C., Hayward, R., Cook, D.J., Cook, R.J., et al. (1995). Users' guide to the medical literature IX: a method for grading health care recommendations. JAMA, 274, 1800-4.
Mann, C.J. (2003). Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emerg Med J, 20, 54-60.
Michigan State University - College of Human Medicine, Department of Family Practice (n.d.). Reading and doing decision analyses (Module 6 in An Introduction to Information Mastery). Retrieved May 17, 2005 from http://www.poems.msu.edu/InfoMastery/DecisionAnalysis/DA.htm
National Health and Medical Research Council. (2000). How to use the evidence: assessment and application of scientific evidence. Canberra: AusInfo. Available online at:
http://www.nhmrc.gov.au/_files_nhmrc/file/publications/synopses/cp69.pdf
Oxford Centre for Evidence-based Medicine (2001). Levels of evidence and grades of recommendation. Available online at: http://www.cebm.net/?o=1025
Richardson, W.S., Wilson, M.C., Nishikawa, J., Hayward, R.S. (1995). The well-built clinical question: A key to evidence-based decisions. ACP Journal Club, 123, A-12. Retrieved June 23, 2005 from http://www.mclibrary.duke.edu/training/pdaformat/articlepda.html
Sackett, D.L. (1993). Rules of evidence and clinical recommendations for the management of patients. Can J Cardiol, 9, 487-9.
University of Sheffield, School of Health and Related Research (n.d.). Appraisal of reviews. Retrieved November 15, 2004 from http://www.shef.ac.uk/scharr/ir/units/critapp/apprev.htm
West, S., King, V., Carey, T.S., Lohr, K.N., McKoy, N., Sutton, S.F., et al. (2002). Systems to rate the strength of scientific evidence. Rockville, MD: Agency for Healthcare Research and Quality. (AHRQ publication No 02-E016.) Retrieved July 13, 2005 from http://www.ahrq.gov/clinic/epcsums/strengthsum.htm