The International Society for Pharmacoeconomics and Outcomes Research (ISPOR), the Academy of Managed Care Pharmacy (AMCP), and the National Pharmaceutical Council (NPC) have developed questionnaires to help health care decision makers assess and effectively use available comparative effectiveness studies in health technology assessments and/or formulary decisions.
The goal of these questionnaires is to provide greater uniformity and transparency in the use and evaluation of information to improve evidence-based health care decision making for better patient care.
To enhance the functionality and accessibility of these questionnaires, ISPOR has developed an interactive, user-friendly version, ‘Assessing the Evidence for Health Care Decision Makers’, covering the types of studies discussed below (modeling studies, indirect treatment comparisons/network meta-analyses, and observational studies).
‘Assessing the Evidence for Health Care Decision Makers’* provides users with:
1) Interactive questionnaires to help determine if the study is: a) relevant to the setting/decision in question and b) credible enough to include in the overall body of evidence.
2) A personalized web-based database to: a) store their assessments and b) access them anywhere at any time.
General structure of the questionnaire
Each questionnaire includes two main sections: a) relevance and b) credibility. Credibility is further divided into several domains. Upon completing the questionnaire, the user will be able to make a more substantiated judgment regarding the relevance and credibility of a study to inform a decision. No summary score is provided for the overall questionnaire or for the domains of the credibility section. This was an explicit choice in the design of these questionnaires, since individuals may place greater or lesser weight on the response to any individual question. However, some credibility questions are considered critical; a negative answer to one of these suggests the presence of a “fatal flaw”. It is up to each user to decide how these answers affect the overall credibility of the study.
*To realize the full potential of the assessment, you are encouraged to answer as many questions as possible.
The purpose of a model in the health care field is to provide decision makers with information about what outcomes they can expect if they implement the intervention(s) addressed in the analysis. Decision makers can then use that information to determine whether they should implement the interventions. A major dilemma facing decision makers is the degree of confidence they should place in the results of the model. Decision makers may understand that there is always some uncertainty about the results of a modeling analysis, but they would like to know the extent to which they should trust them. Thus, it should be helpful to have a tool for assessing the degree of confidence that should be placed in a particular modeling analysis.
When evaluating a modeling analysis, there are two main issues to consider. One has to do with the extent to which the problem specified by the modelers is applicable to the problem faced by the decision maker. The other has to do with the accuracy of the modeling analysis for the problem the modelers specified. Every analysis is conducted for a reference setting – a specific population, interventions, comparators, outcomes, and time horizon (the model setting). In order for the modeling results to accurately predict the outcomes that will occur in the setting of interest to the decision maker (the decision setting), two conditions must be met: the model setting must match the decision setting, and the analysis must be accurate for the setting the modelers specified.
This leads to two main categories of questions. They address what we are calling "relevance" and "credibility." The questionnaire consists of 8 domains with a total of 15 top-level questions related to the relevance and credibility of a modeling study.
For some top-level questions, further explanations and definitions of terms are provided, as well as helper questions.
Relevance addresses the extent to which the results of the study apply to the setting of interest to the decision maker. Stated another way, relevance asks how closely the model setting matches the decision setting. For example, a highly credible modeling study of the effects of a drug in Sweden in the 1990s may have little relevance to the assessment of a related drug (i.e., same class but different chemical structure) given at a different dose to Hispanic Americans in California today. There is no single rating for relevance. As each decision maker is interested in his or her own setting, the relevance of the model analyses will vary for each setting. Relevance has to be determined by each decision maker, and the relevance determined by one decision maker will not necessarily apply to others. The “Relevance” domain has four main questions related to the population, intervention, comparators, endpoints, timeframe, and policy-relevant differences. Additional helper questions are provided to identify specific items the evaluator should think about in responding to the questions.
Credibility is the extent to which the model analyses accurately answer the question posed. For example, if the question is what effect a particular drug has on particular outcomes in a particular population, in a particular care delivery setting, over a particular time (collectively, the "setting" of the model), then credibility addresses how accurately the model estimates that effect. Credibility is determined by the design of the model, the data used as inputs, the construction of the model, and the conduct of the analyses; it does not depend on how the study will be used to make decisions. If one user has done a good job assessing the credibility of a study, that user’s conclusions should apply equally well to all users.
The credibility of a modeling study is assessed at several levels (7 domains). Each domain includes top-level questions the evaluator should address, as well as helper questions to identify specific items to think about in responding:
a) “Validation” assesses how well the model has been tested to ensure it accords with available data – do these tests support the model’s credibility? This domain is key in assessing credibility.
Validation is a set of methods for judging a model’s accuracy in making relevant predictions. That information can be used by decision makers to determine to what extent they should trust the results. While transparency can help readers understand what a model does and how it does it, validation is the only way for readers to determine how well it does it. It is very difficult for anyone other than the modelers to assess validity fully. Models use mathematical structures to make their predictions, and directly making the validity assessments requires technical expertise and full access to the model and external data. A more feasible task for users of models is to evaluate the extent to which the modelers have assessed validity (bearing in mind that validity pertains to a particular application of the model, not to the model per se, and that the degree of validity required depends on the question posed). It should be noted that peer review, as currently practiced, is insufficient to determine that a model’s results are credible. Just because a modeling analysis has been published does not mean that it is credible. If a model has not been sufficiently validated, or the decision maker cannot determine whether it has been, then the results of the model should not be trusted (i.e., this is a fatal flaw). A small sketch of one such validation check is given after this list of domains.
b) “Design” addresses whether the design of the model is credible, in the sense that it accords with what is known about the decision problem. This is often known as “face validity”.
c) “Data” addresses whether the data used in building the model were suitable for the purpose, properly analyzed, and appropriately incorporated in the model.
d) “Analysis” has to do with whether the analyses carried out with the model provide the information required to support the decision maker.
e) “Reporting”, though not specifically pertaining to a model’s credibility, is important because the model reports need to be sufficient to assess relevance and credibility.
f) “Interpretation” of the model’s results is important. Although users are free to ignore the modelers’ interpretations and reach their own conclusions, it is helpful if the interpretations provided are balanced in accordance with the results and limitations of the model and data.
g) “Conflict of interest” addresses the extent to which conflicts of interest exist and how they might have influenced the modeling study.
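As promised above, here is a minimal sketch of what one external-validation check can look like: comparing a model’s predicted event rates against rates observed in an independent data source. All numbers are hypothetical, and the summary measures (calibration ratio, absolute error) are our own illustrative choices, not part of the questionnaire.

```python
# Minimal external-validation sketch: compare model-predicted event rates
# with rates observed in an independent cohort. All numbers are hypothetical.

predicted = {"year 1": 0.042, "year 3": 0.118, "year 5": 0.201}  # model outputs
observed = {"year 1": 0.039, "year 3": 0.131, "year 5": 0.188}   # external data

for horizon, p in predicted.items():
    o = observed[horizon]
    ratio = p / o          # calibration ratio; 1.0 indicates perfect agreement
    abs_err = abs(p - o)   # absolute difference in event rates
    print(f"{horizon}: predicted={p:.3f}, observed={o:.3f}, "
          f"ratio={ratio:.2f}, abs. error={abs_err:.3f}")
```

A full validation exercise would, of course, examine many more endpoints and data sources; the point is simply that validation asks how well the model’s predictions match data it was not built on.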
Sound, comprehensive decision making requires comparisons of all relevant competing interventions. Ideally, robustly designed randomized controlled trials (RCTs) would simultaneously compare all interventions of interest. Unfortunately, such studies are almost never available, thereby complicating decision making. New drugs are often compared with placebo or standard care, but not against each other.
In the absence of trials involving a direct comparison of the treatments of interest, an indirect comparison can provide useful evidence of the difference in treatment effects between competing interventions. For example, if two particular treatments have never been compared against each other head-to-head, but both have been compared against a common comparator, then an indirect treatment comparison (ITC) can use the relative effects of the two treatments versus the common comparator.
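To make the arithmetic concrete: on a relative-effect scale (e.g., log odds ratios), the standard adjusted indirect comparison (the Bucher method) subtracts the two trial results and adds their variances. The sketch below is a minimal illustration with entirely made-up numbers.

```python
import math

# Adjusted indirect comparison (Bucher method), a minimal sketch.
# Treatments A and B were each compared with a common comparator C in
# separate randomized trials; all effect estimates below are hypothetical
# log odds ratios (logORs) with their standard errors.

d_AC, se_AC = -0.45, 0.15   # A vs C
d_BC, se_BC = -0.20, 0.18   # B vs C

d_AB = d_AC - d_BC                       # indirect estimate of A vs B
se_AB = math.sqrt(se_AC**2 + se_BC**2)   # uncertainty from both trials accumulates

lo, hi = d_AB - 1.96 * se_AB, d_AB + 1.96 * se_AB
print(f"indirect logOR, A vs B: {d_AB:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"as an odds ratio: {math.exp(d_AB):.2f} "
      f"({math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

Note that the indirect estimate is less precise than either direct comparison, since the variances add; this is one reason direct evidence, where available, remains valuable.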
Although it is often argued that indirect comparisons are needed when direct comparisons are not available, it is important to realize that both direct and indirect evidence contribute to the total body of evidence. The results from indirect evidence, in combination with the direct evidence, may strengthen the assessment of treatments that have been compared directly.
When the available RCTs of interest do not all compare the same interventions – each trial comparing only a subset of the interventions of interest – it may be possible to represent the evidence base as a network in which every trial has at least one intervention in common with another. Such a network of trials, involving treatments compared directly, indirectly, or both, can be synthesized by means of network meta-analysis (NMA). In traditional meta-analysis, all included studies compare the same intervention with the same comparator. NMA extends this concept by including multiple pair-wise comparisons across a range of interventions, providing estimates of relative treatment effect for multiple treatment comparisons.
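In notation (ours, for illustration only): choosing treatment A as the reference, an NMA estimates a set of basic parameters and, under the consistency assumption, derives every other contrast from them.

```latex
% Consistency relations in a network meta-analysis (illustrative notation).
% With A as the reference treatment, d_{AB} and d_{AC} are basic parameters;
% the remaining contrast follows from them:
\[
  d_{BC} = d_{AC} - d_{AB} .
\]
% Each trial i comparing treatments X and Y contributes an observed
% relative effect, modeled (fixed effect) as
\[
  \hat{\delta}_{i,XY} \sim N\left(d_{XY},\, \sigma_i^{2}\right),
\]
% with d_{XY} replaced by a trial-specific random effect when
% between-trial heterogeneity is assumed.
```

Direct and indirect information about the same contrast can then be pooled, which is how NMA combines both sources of evidence.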
Despite the great value of ITC and NMA in informing health care decision making, and their increasing acceptance [e.g., by the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia, the Canadian Agency for Drugs and Technologies in Health (CADTH), the National Institute for Health and Clinical Excellence (NICE) in the United Kingdom, the Institute for Quality and Efficiency in Health Care (IQWiG) in Germany, and the Haute Autorité de Santé (HAS) in France], many decision makers may not be very familiar with these kinds of studies. A major dilemma they face is the degree of confidence they should place in the results obtained with an ITC or NMA. Decision makers may understand that there is always some uncertainty about the findings, but they would like to know the extent to which they should trust them. Thus, it would be helpful to have a tool for assessing the degree of confidence that should be placed in a particular ITC or NMA.
When evaluating an ITC or NMA, there are two main issues to consider. One has to do with the extent to which the ITC or NMA is applicable to the problem faced by the decision maker. The other has to do with the extent to which its findings are trustworthy. This leads to two main categories of questions, which address what we are calling "relevance" and "credibility." The questionnaire consists of 6 domains with a total of 22 questions related to the relevance and credibility of an ITC or NMA.
Additional information and explanations are provided for each question through the questionnaire explanation links, as well as definitions of some terms.
We use the terms ITC and NMA interchangeably. If a distinction needs to be made, one can arguably call an NMA an indirect comparison in which, for some of the available head-to-head (direct) comparisons, multiple studies are available. Furthermore, with an NMA there may be a combination of direct and indirect evidence for some treatment comparisons (i.e., treatment contrasts).
The term “relevance” addresses the extent to which the results of the study, if accurate, apply to the setting of interest to the decision maker. For example, a high-quality NMA of biologics in rheumatoid arthritis that includes only RCTs performed in an Asian population may have little relevance for a Caucasian population if the efficacy of the biologics is affected by race. There is no single answer for relevance; the relevance of the NMA may vary for each setting. Relevance has to be determined by each decision maker, and the relevance score determined by one decision maker will not necessarily apply to other decision makers. The “Relevance” domain addresses questions related to the population, comparators, endpoints, timeframe, and policy-relevant differences (four questions).
Credibility is defined as the extent to which the ITC or NMA accurately answers the question it is designed to answer. For this assessment tool we take the position that credibility is not limited to internal validity but also relates to reporting quality and transparency, interpretation, and conflict of interest. The internal validity of an ITC or NMA is compromised by bias in the identification and selection of studies, bias in the individual studies used for the NMA, and bias introduced by the statistical methods used. The credibility of the ITC or NMA is assessed with questions in 5 domains:
The choice of a study design follows from the research question, but the optimal design must also consider the expected value of the information to be generated, clinical equipoise, timing, feasibility, cost, ethics, and legality. Potential study designs to assess comparative effectiveness include retrospective observational studies, prospective observational studies, RCTs, and naturalistic (“pragmatic”) RCTs. If an observational study* is to be used to assess comparative effectiveness, the first choice to be made is whether the design should be retrospective or prospective.
*Observational Study: A study in which participants are identified according to current risk status or exposure, and followed forward through time to observe outcomes. Participants are not randomized or otherwise pre-assigned to an exposure. The choice of treatments is up to patients and their physicians (subject to any third-party payer constraints).
When evaluating an observational study design, it is important to determine whether there are confounding factors or biases that impact the interpretation of the results.
Prospective observational studies are those in which participants are not randomized or otherwise assigned to an exposure and in which the outcomes of interest occur after study commencement (including creation of a study protocol and analysis plan, and study initiation). They are often longitudinal in nature. Exposure to any of the interventions being studied may or may not have been recorded before study initiation, such as when a prospective observational study uses an existing registry cohort. Exposure may include a pharmaceutical intervention, surgery, a medical device, a prescription, or a decision to treat.
Retrospective observational studies are those that use existing data sources in which both exposure and outcomes have already occurred.
Study design and analytic approaches can be used to address potential biases,** including confounding,*** and are therefore critical to the credibility of the findings from observational studies. In the presence of strong treatment preferences, observational designs and analytic approaches may not be adequate to address confounding and other biases.
**Major types of bias include channeling bias, loss to follow-up, and misclassification of treatment and outcomes.
***Confounders are variables that are correlated with both the treatment and the outcome and that lie outside the causal pathway between treatment and outcome. Confounding may arise from common epidemiologic biases such as channeling or indication bias.
In assessing observational studies, there are two main categories of questions. They address what we are calling "relevance" and "credibility". The questionnaire consists of 7 domains with a total of 34 top-level questions related to the relevance and credibility of an observational study.
For some top-level questions, further explanations and definitions of terms are provided, as well as helper questions, whenever applicable.
The term “relevance” addresses the extent to which the results of the study, if accurate, apply to the setting of interest to the decision maker. It addresses issues of external validity (population, comparators, endpoints, timeframe) and policy-relevant differences. To illustrate, a highly credible study of the effect of a drug in Sweden in the 1990s may have little relevance to the assessment of a related drug (e.g., same class but different chemical structure) given at a different dose to Hispanic Americans in Los Angeles today. There is no single answer for relevance. Each decision maker is interested in applying study results to their own setting, and the relevance of a study may vary accordingly. Relevance has to be determined by each decision maker, and the relevance assessment made by one decision maker will not necessarily apply to other decision makers. The “Relevance” domain addresses questions related to the population, comparators, endpoints, timeframe, and policy-relevant differences (four questions).
Credibility is the extent to which a study’s findings accurately answer the study hypotheses or questions; it is akin to internal validity. For valid inferences to be drawn from the reported data, the study must account for potential biases. For example, new treatments are frequently reserved for sicker or more complicated patients. When outcomes in these patients are compared with those of less sick patients receiving other medications, the outcomes may not be as good; however, this may be due to the underlying severity of the disease rather than to differences in treatment effectiveness. Appropriate design and analytic approaches can permit assessment of how much treatment effectiveness contributes to the observed differences in outcomes. How well these approaches work will vary across studies, and the assessment of the credibility of observational studies rests on a critical assessment of the success of the design and analytic approaches.
Treatment inferences from observational studies are all potentially biased by imbalances across treatment groups on confounding variables, whether those variables are observed or not. Carefully identifying all potential confounders is a critical step in any observational comparative effectiveness research study. Confounding can be controlled statistically using a wide range of multivariate approaches; however, if a statistical model excludes a confounding variable, the estimates of treatment effects suffer from omitted-variable bias in nearly all analyses, the main exception being a valid instrumental-variables approach.
When assessing a research study, one should check whether the authors have considered all potential confounding factors and conducted a literature review to identify variables known to influence the outcome variable. Often the study data will not contain information on some confounding variables (e.g., race, income, exercise), which then become omitted variables in the analysis. When the analysis does not include key confounders, their potential impact should be discussed, including the direction and magnitude of the potential bias.
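The sketch below is a minimal simulation, with entirely made-up parameters, of the omitted-variable problem described above: disease severity drives both treatment choice and outcome, so a regression that omits it misestimates the treatment effect.

```python
import numpy as np

# Toy simulation of omitted-variable bias (all parameters hypothetical):
# severity (the confounder) makes treatment more likely AND worsens the
# outcome, so omitting it from the regression distorts the estimate.

rng = np.random.default_rng(0)
n = 100_000

severity = rng.normal(size=n)                  # confounder (here: unmeasured)
p_treat = 1 / (1 + np.exp(-2.0 * severity))    # sicker patients treated more often
treated = rng.binomial(1, p_treat).astype(float)
outcome = 1.0 * treated - 2.0 * severity + rng.normal(size=n)
# true treatment effect = +1.0; severity worsens the outcome

def treatment_coefficient(X, y):
    """Ordinary least squares; returns the coefficient on the treatment column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

ones = np.ones(n)
adjusted = treatment_coefficient(np.column_stack([ones, treated, severity]), outcome)
naive = treatment_coefficient(np.column_stack([ones, treated]), outcome)

print(f"adjusted estimate (severity included): {adjusted:+.2f}")  # close to +1.0
print(f"naive estimate (severity omitted):     {naive:+.2f}")     # badly biased
```

In this toy example the naive estimate even reverses sign, making an effective treatment look harmful, which is the pattern one would expect when sicker patients are preferentially treated.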
The credibility of the observational study is assessed with questions in 6 domains:
For some top-level questions, further explanations and definitions of terms are provided, as well as helper questions, whenever applicable.