September 12, 2016
Steven D. Pearson, MD, MSc, FRCP
President
Institute for Clinical and Economic Review
One State Street, Suite 1050
Boston, MA 02109 USA
RE: Call for Proposed Improvements to ICER’s Value Assessment Framework
Submitted electronically via: [email protected]
Dear Dr. Pearson:
The National Pharmaceutical Council (NPC) shares your interest in recognizing the many components of health care value, and in using evidence as the cornerstone for making the health care system more effective and efficient. With this view in mind, NPC appreciates ICER’s call for comprehensive and directed suggestions for improvements to the ICER Value Assessment Framework. NPC recognizes the changes you have made to the framework to date, including the two highlighted in the call for comments.
As you know, NPC is a health policy research organization dedicated to the advancement of good evidence and science, and to fostering an environment in the United States that supports medical innovation. NPC is supported by the major U.S. research-based biopharmaceutical companies. We focus on research development, information dissemination, education and communication of the critical issues of evidence, innovation and the value of medicines for patients. Our research helps inform critical health care policy debates and supports the achievement of the best patient outcomes in the most efficient way possible.
As stated in NPC’s Guiding Practices for Patient-Centered Value Assessment (Guiding Practices),[i] we believe value assessments can be an important tool for the complex decisions organizations and patients face when considering treatment options. Assessments that adhere to the Guiding Practices can support optimal value for patients. There are several key areas where changes to ICER’s current value assessment framework will create more alignment with the Guiding Practices.
The most critical of these areas is the assessment of budget impact and the way it is intertwined with value, most notably the calculation of a “value-based benchmark price.” The Guiding Practices state that budget impact assessments — which are measures of resource use, not of value — should remain completely separate from value assessments. Other key areas are highlighted below.
I. Care Value
ICER’s evaluation of “care value” is aligned with the Guiding Practices in several areas. For example, the time horizon is long-term, a broad array of factors that are important to patients and society are considered (albeit qualitatively, which does not give these important, patient-centered factors sufficient impact), and cost offsets are included. However, there are areas of misalignment in the care value evaluation, most notably the use of a single quality-adjusted life year (QALY) threshold across all populations and diseases (further detail below).
ICER does reach out proactively to manufacturers and provides high-level information. However, this level of information is not sufficient to enable reviewers to reproduce the results and provide meaningful, real-time input. Full transparency — down to the equation level — is needed to enable reproducible results and support fully informed stakeholder collaboration. NPC recommends releasing the model to all stakeholders along with the draft report, perhaps on a protected web-based platform. Furthermore, (expedited) peer review of the model before it is finalized is recommended.
NPC recommends the routine use of sensitivity analyses. Many judgments come into play when conducting meta-analysis and cost-effectiveness analysis; sensitivity analyses are therefore critical for examining whether the findings were influenced by decisions made in the analysis. Sensitivity analysis can also be used to explore heterogeneity of treatment effects, avoiding over-reliance on methods based on averaged estimates. The accuracy of a model's results is in question when no sensitivity analyses have been performed to test its assumptions.
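As a minimal illustration of the kind of one-way sensitivity analysis recommended above, the following Python sketch varies one input at a time and reports the resulting range of cost per QALY. All function names and input values are hypothetical and are not drawn from any ICER model.

```python
def cost_effectiveness_ratio(delta_cost, delta_qaly):
    """Incremental cost per incremental QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("incremental QALYs must be positive to form a ratio")
    return delta_cost / delta_qaly


def one_way_sensitivity(base, ranges):
    """Vary each input across (low, high) while holding the others at base case."""
    results = {}
    for name, (low, high) in ranges.items():
        outcomes = []
        for value in (low, high):
            inputs = dict(base, **{name: value})
            outcomes.append(
                cost_effectiveness_ratio(inputs["delta_cost"], inputs["delta_qaly"])
            )
        results[name] = (min(outcomes), max(outcomes))
    return results


# Hypothetical inputs, for illustration only.
base_case = {"delta_cost": 60_000.0, "delta_qaly": 0.5}
input_ranges = {
    "delta_cost": (40_000.0, 80_000.0),
    "delta_qaly": (0.3, 0.8),
}

print(f"base case: {cost_effectiveness_ratio(**base_case):,.0f} per QALY")
for name, (low, high) in one_way_sensitivity(base_case, input_ranges).items():
    print(f"{name}: {low:,.0f} to {high:,.0f} per QALY")
```

A wide range for any single input signals that the conclusion depends heavily on that assumption, which is the information reviewers would need in order to comment meaningfully.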
Heterogeneity occurs not just at the patient level, but at the payer level, too. The health care system in the U.S. includes quite diverse payers and payment systems. A framework and decision criteria similar to those used in a relatively homogeneous, centrally driven payer health care system do not capture the range of decision attributes relevant to U.S. stakeholders. Ideally, the value framework should present a transparent and modifiable output that allows the user to adjust any cost-effectiveness results according to user/health plan preferences and decision needs, and to include the range of factors of interest to that user.
I. A. Integrating “Additional Benefits or Disadvantages” and “Contextual Considerations”
ICER seeks: Methods to integrate patient and clinician perspectives on the value of interventions that might not be adequately reflected in the scientific literature, elements of value intended to fall in the current value framework within “additional benefits or disadvantages” and “contextual considerations.”
As noted above, ICER assessments attempt to include a broad array of factors that are important to patients and society. However, these factors are currently incorporated in a qualitative manner as “additional benefits and disadvantages” and “contextual considerations,” and it is incumbent upon the voting panel to recognize the value of these factors and reflect them in their care value vote. This qualitative inclusion does not allow these important, patient-centered factors to have a strong enough — or consistent — impact on an assessment. ICER is seeking methods for more formal integration, and NPC agrees it is important to include these factors in a more robust and representative manner.
ICER’s current approach leaves the consideration of these factors up to the discretion of the voting panel, which may not have the expertise or appropriate context to meaningfully evaluate them. Moreover, this valuation approach is heavily dependent upon the perspectives and values of a small group, and is not transparent. This approach is insufficient to incorporate the impact of these important patient-centered factors.
Previous work by ICER identifies many examples of these factors (see tables below),[ii] which can indeed be quantified and incorporated into a composite measure of benefit or effectiveness.
There are many precedents and good examples of cost-effectiveness in health care that optimize the assessment of clinical effectiveness with both individual and composite measures (rather than the sole use of the QALY).[iii],[iv],[v] Research — particularly in worksite health promotion — has shown how some aspects can be quantified and used to incorporate indirect costs into cost-effectiveness models. A notable example is quantification of productivity loss using the Work Productivity and Activity Impairment (WPAI) Questionnaire in the assessment of absenteeism, presenteeism, and daily activity impairment due to general health or due to a specific health condition; it has been widely used and validated in the study of many diseases.[vi]
In instances where it is feasible to do so, quantitative credit should be given for these factors. In instances where there is not yet a quantitative path forward, the qualitative inclusion of such factors in the models should be formalized as part of the framework so patient-centered concerns can be meaningfully considered in the care value assessment. Examples of transparent methods of structuring qualitative assessments come from both EMA and the FDA in how they conduct their benefit-risk assessments.[vii],[viii]
Recognizing the heterogeneity of payers in the U.S., the value framework should acknowledge that different users will have different preferences for including/excluding these factors, and allow for user customization of factors and weights for said factors.
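One way such user customization might work is a weighted composite score in which each user chooses which factors to include and how heavily to weight them. The Python sketch below is illustrative only: the factor names, the 0-to-1 scoring scale, and the weights are all assumptions, not elements of the ICER framework or the Guiding Practices.

```python
def composite_score(scores, weights):
    """Weighted average of factor scores (each on a 0-1 scale).

    Factors with zero weight are excluded; weights are normalized to sum to 1.
    """
    included = {name: w for name, w in weights.items() if w > 0}
    if not included:
        raise ValueError("at least one factor needs a positive weight")
    missing = set(included) - set(scores)
    if missing:
        raise KeyError(f"no score supplied for: {sorted(missing)}")
    total_weight = sum(included.values())
    return sum(scores[name] * w for name, w in included.items()) / total_weight


# Hypothetical factor scores for one treatment.
factor_scores = {"clinical_benefit": 0.6, "productivity": 0.8, "caregiver_burden": 0.9}

# Two hypothetical users with different preferences.
clinical_only = {"clinical_benefit": 1.0}
broad_view = {"clinical_benefit": 2.0, "productivity": 1.0, "caregiver_burden": 1.0}

print(f"clinical only: {composite_score(factor_scores, clinical_only):.3f}")
print(f"broad view:    {composite_score(factor_scores, broad_view):.3f}")
```

The same underlying evidence yields different composite results for the two users, which makes each user's preferences explicit rather than leaving them implicit in a panel vote.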
I. B. Incremental Cost-Effectiveness Ratios
ICER seeks: Incremental cost-effectiveness ratios: appropriate thresholds, best practice in capturing health outcomes through the QALY or other measures
ICER’s current assessments apply the threshold range of $100,000 to $150,000 per QALY. This approach is inconsistent with the Guiding Practices, which emphasize that no single threshold can or should be universally applicable, as thresholds are likely to vary by population and disease. In setting particular thresholds, the Guiding Practices also recommend a multi-stakeholder evaluation process reflecting societal values.
The thresholds are used by ICER to set a “value-based price,” on which stakeholders and the media focus. Using a threshold that is not applicable for a population or disease produces an invalid value-based price, yet decisions will be made based on this price. NPC recommends moving away from a cost-effectiveness-derived, value-based price as a single number to a discussion about the implications for various parameters/assumptions in the cost-effectiveness evaluation for a specific disease or condition. ICER could instead lead the conversation about what the best parameters and assumptions are when it comes to modeling cost-effectiveness for a specific disease. (NPC notes this recommendation is specific to the “care value” value-based price and not the “budget impact” version.)
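To illustrate how strongly a threshold-derived price depends on the threshold chosen, the sketch below solves a deliberately simplified cost-per-QALY equation for the treatment cost at two threshold values. The simplified formula, the function name, and every input other than the $100,000 and $150,000 thresholds are hypothetical assumptions.

```python
def threshold_price(threshold_per_qaly, qalys_gained, cost_offsets, comparator_cost):
    """Highest treatment cost at which cost per QALY gained equals the threshold.

    Solves (price - cost_offsets - comparator_cost) / qalys_gained = threshold
    for price. This is a simplification of a full cost-effectiveness model.
    """
    if qalys_gained <= 0:
        raise ValueError("QALYs gained must be positive")
    return threshold_per_qaly * qalys_gained + cost_offsets + comparator_cost


# Hypothetical treatment: 0.5 QALYs gained, $5,000 in offsets, $10,000 comparator.
for threshold in (100_000, 150_000):
    price = threshold_price(threshold, 0.5, 5_000, 10_000)
    print(f"threshold ${threshold:,}/QALY -> price ${price:,.0f}")
```

Under these assumptions, moving between the two ends of the threshold range changes the derived price substantially, which is why a single published number can mislead when the threshold does not fit the population or disease.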
I. B. 1. The QALY Has Serious Limitations and Caution Should be Exercised in its Use
A cost-effectiveness ratio is used to determine if a treatment provides good value (expressed as a health outcome) relative to its cost.[ix] Different treatments produce different health outcomes and it can be difficult to compare across such a wide variety of outcomes. The QALY is designed to express a variety of outcomes in a composite measure (that combines length and quality of life) so they can be compared more easily. However, the QALY has not proven fit for purpose for this goal.
Use of the QALY poses several serious limitations, primarily ethical considerations, methodologic issues, and disease-specific considerations.[x] The QALY is designed to maximize health, which excludes other treatment benefits that are of importance to patients (e.g., productivity). Maximizing health is not always the focus of treatment, particularly for drugs that treat the elderly, in which case health care and social care are inevitably intertwined.[xi]
The QALY traditionally includes physical domains (e.g., mobility), but does not effectively capture mental and social domains, which have been rated as more essential for inclusion in a health-related quality of life measure than physical domains.[xii] This is especially important for treatments for conditions that improve quality of life but do not extend life per se.
Other shortcomings of the QALY include:
- QALYs may undervalue survival benefits in populations presumed to have poor quality of life, e.g., oncology and congestive heart failure (CHF) patients;
- Cost per QALY approaches may under-incent development of orphan/rare disease products, which face a number of economic cost-effectiveness challenges;[xiii]
- Treatments for acute conditions may be undervalued;[xiv] and
- Age differences are inadequately managed, i.e., interventions in youth are inherently valued above those targeted for later in life.
The ECHOUTCOME project tested the validity of the assumptions underlying the use of the QALY and found it to be an invalid measure. The researchers note its use leads to inconsistent recommendations on access to innovative health technologies and medicines, and that HTA groups should use other methods.[xv]
ICER itself identifies some of the key problems with the QALY:
Whereas many international payer agencies have adopted the QALY as a universal metric of health outcomes by which to analyze comparative net health benefit across different types of medical interventions, very few payers in the US use the QALY in a systematic way. In part this is because of methodological concerns about whether the QALY adequately reflects the preferences of patients for different types of health outcomes. There are long-standing concerns that QALYs fail to capture important societal values favoring health benefits for patients with the most severe illnesses. And QALYs usually must be estimated from published literature through analyses that can be complex, time consuming, and ultimately lacking in the degree of transparency that is one of the most important goals of a value framework. The methodological concerns are most relevant when QALYs are used as part of analyses comparing the incremental cost-effectiveness of treatments for different conditions.[xvi]
ICER’s current approach of using the QALY as the sole measure of effectiveness in its evaluations is not addressing the very limitations stated above.
Taken together, these serious limitations should be acknowledged and alternative weighting/approaches included in ICER’s value framework. No single measure or set of measures will be optimal across evaluations. Measures of benefit and effectiveness can, and should, vary across evaluations.
Ideally, alternative approaches should be designed by the research community to incorporate the additional benefits and contextual considerations referenced in the previous section. This would mean that the QALY would be replaced with more appropriate and sensitive measures. Cost-effectiveness analyses do not require that the QALY be used; given that the QALY is often not fit for this purpose, it should be replaced with suitable alternatives. Caution should be exercised with its use, and the limitations of its use should be explicitly stated when it is used.
I. B. 2. Thresholds Will Vary Across Diseases and Populations
If the QALY is used (despite the limitations noted above), it should be recognized that no single threshold can or should be universally applicable, as thresholds are likely to vary by decision-maker, population, and disease. Neumann states:
…it is impossible to find a single threshold to represent society's willingness to pay for QALYs gained, because different approaches yield different values, each of which is based on different assumptions, inferences, and contexts. Searching for a single benchmark is at best a quixotic exercise because there is no threshold that is appropriate in all decision contexts.[xvii]
Evidence exists that willingness to pay for minor conditions is less than that for life-saving conditions.[xviii] Willingness to pay for oncology suggests thresholds as high as $300,000/QALY.[xix],[xx] QALY thresholds are ill-defined for acute, short-term treatments such as anesthesia, for which the benefit of reduction of pain and suffering is measured in literally minutes or hours. The resulting cost per QALY calculation can be stratospheric, but no one would realistically suggest operating without anesthesia because a QALY threshold has been crossed.
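The arithmetic behind the anesthesia example can be sketched as follows. The cost, utility gain, and duration are hypothetical values chosen only to show how a benefit measured in hours translates into a very small QALY gain.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_qaly_short_term(cost, utility_gain, duration_hours):
    """Cost per QALY for a benefit that lasts only a few hours."""
    if utility_gain <= 0 or duration_hours <= 0:
        raise ValueError("utility gain and duration must be positive")
    # QALYs = utility gain x duration expressed as a fraction of a year.
    qalys_gained = utility_gain * duration_hours / HOURS_PER_YEAR
    return cost / qalys_gained


# Hypothetical: $500 of anesthesia, utility gain of 0.5 for two hours.
example = cost_per_qaly_short_term(cost=500.0, utility_gain=0.5, duration_hours=2.0)
print(f"${example:,.0f} per QALY")
```

With these illustrative inputs the result is in the millions of dollars per QALY, far above any commonly cited threshold, even though the intervention is plainly worthwhile.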
It is very hard (and politically unpalatable) to declare one condition more important than another. However, just because it is hard does not mean that it should not be done. Given this evidence, thresholds should vary for different decision-makers, diseases, and populations, if they are used at all.
I. B. 3. Perspective
Cost-effectiveness should, at a minimum, take a societal perspective and not a payer perspective. The societal perspective can allow for many additional constructs to be considered, such as “value of innovation” and “value of scientific spillover.” A societal perspective will ensure that appropriate cost-offsets are included and not just those that will be accrued by the payer.
An even better approach would be to adopt the emerging “Patient Perspective,” which is usually consistent with the societal perspective but allows for constructs that matter significantly to patients (e.g., the “value of hope”) to be considered. Finally, ICER should clearly state the perspective it takes and be ready to address issues of inclusion and exclusion in both its assessment of benefit and cost.
I. B. 4. Indirect Treatment Comparisons
The limitations of indirect treatment comparisons are well known. ICER’s reliance upon this approach is problematic and can lead to significantly flawed conclusions. Use of indirect treatment comparisons in the absence of direct head-to-head comparative data suffers from an inability to fully adjust for differences in trial populations and protocols.
For example, for the review of relapsing-remitting MS (RRMS), comparisons across trials on annualized relapse rate (ARR) may be complicated by changes in relapse rates over time, suggesting changes in the natural history of disease. The time periods over which ARRs are calculated also differ from one trial to another. Differences in inclusion and exclusion criteria, baseline characteristics between trials, and even variability in definition of key outcomes can introduce bias in indirect treatment comparisons across MS trials.[xxi],[xxii]
Indirect treatment comparison is especially problematic for emerging or unapproved products and for evaluation of off-label uses, where available comparative information is very limited and the overall evidence base has not been finalized by regulatory organizations. ICER should focus on FDA-approved products and evaluate indications based on well-controlled clinical trials (rather than integrating earlier reviews of unapproved products and incorporating products used for off-label purposes), and should limit firm comparative conclusions to circumstances in which direct comparative data exist and heterogeneity between populations and studies is limited.
II. Budget Impact
ICER’s evaluation of “health system value” — which confounds budget impact and value — is not aligned at all with the Guiding Practices. Addressing this misalignment is of paramount importance to support informed health care decision-making.
Budget impact assessment is a measure of resource use, not a measure of value, and it has no role in value assessment. Eliminating budget impact assessment completely from ICER’s reviews is the most definitive way of keeping the concepts separate. An alternate (but less definitive) method of keeping the concepts separate is to refrain from moving beyond an estimate of budget impact into an assessment of affordability. NPC strongly recommends that ICER revise the framework in a manner that ensures this separation of budget impact and value.
Accordingly, “Health System Value” should be renamed “Short-Term Budget Impact.” The name “health system value” is misleading and suggests that the assessment is representative of health system benefit relative to health system cost. In fact, ICER’s assessment is simply an estimate of budget impact, and should be referred to as such.
II. A. Market Uptake and “Potential” Short-term Budget Impact
ICER seeks: Methods to estimate the market uptake and “potential” short-term budget impact of new interventions as part of judging whether the introduction of a new intervention may raise affordability concerns without heightened medical management, lower prices, or other measures.
An estimate of budget impact is a necessary but insufficient part of evaluating affordability. NPC offers recommendations on estimating uptake rates and short-term budget impact with the explicit caveat — as per the Guiding Practices — that it is not appropriate to hold a budget impact estimate up against an artificial affordability threshold.
Budget impact assessments are important to payers. They will be most relevant to payers if the assessments are realistic and representative of the varied scenarios individual payers may face. Since there is no single U.S. national health care budget holder, the current approach is neither realistic nor representative of these scenarios.
There are three key components to the budget impact assessment: utilization, price, and time horizon, all of which need improvement under ICER’s current approach. The current approach creates upwardly biased budget estimates, which can have unintended consequences for chronic diseases that impact large populations (e.g., Alzheimer’s).
II. A. 1. Use Realistic Estimates of Utilization and Include Sensitivity Analysis
The Guiding Practices recommend the use of realistic estimates regarding a treatment’s uptake rate. ICER assessments currently unrealistically assume unmanaged utilization and incorporate ICER predictions for the level of uptake after five years. A recent study found that the ICER one-year uptake rate estimates for PCSK9 inhibitors were significantly overestimated.[xxiii] More realistic utilization estimates would: (1) be based on the typical medication management that payers would use to impact utilization in the clinical area of interest, (2) incorporate uptake predictions from manufacturers, clinical experts, and/or analysis from claims database of recently launched products or similar analogues, (3) be limited to the populations in scope, and (4) use sensitivity analysis to capture uncertainty and the range of possible uptake rates.
II. A. 2. Use Realistic Estimate of Price
The Guiding Practices recommend use of costs that are representative of the net price most relevant to the user. The “list price” that ICER assessments currently utilize does not represent the actual discounted price that is relevant to, and negotiated by, payers. Using third-party data to obtain an industry-wide discount rate estimate and conducting sensitivity analysis around this rate (using a range of discount assumptions) would provide a more realistic price estimate. Following International Society for Pharmacoeconomics and Outcomes Research (ISPOR) good research practices for measuring drug costs is recommended.[xxiv]
The Guiding Practices recommend incorporating reductions in cost due to generic entry. For example, in the multiple sclerosis class review, there are three oral medications (dimethyl fumarate, fingolimod, and teriflunomide), which will have generic alternatives available during the assessment time horizon. Third-party data and existing research[xxv],[xxvi] can be used to provide estimates of the expected reduction in price due to generic entry, and this reduction should be included in the budget impact estimate.
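A minimal sketch of the price adjustments described above, assuming a simple model in which the net price is the list price less a discount and falls by a fixed fraction once generics enter. The function name, the discount range, the generic price reduction, and all dollar figures are hypothetical.

```python
def annual_drug_cost(list_price, discount, patients, year,
                     generic_entry_year=None, generic_price_cut=0.0):
    """Annual spending at net price, with an optional price cut after generic entry."""
    if not 0 <= discount < 1 or not 0 <= generic_price_cut < 1:
        raise ValueError("discount and generic_price_cut must be in [0, 1)")
    net_price = list_price * (1 - discount)
    if generic_entry_year is not None and year >= generic_entry_year:
        net_price *= 1 - generic_price_cut
    return net_price * patients


# Hypothetical: $50,000 list price, 1,000 patients, generics enter in year 3.
for discount in (0.0, 0.2, 0.4):  # sensitivity range for the discount assumption
    costs = [
        annual_drug_cost(50_000, discount, 1_000, year,
                         generic_entry_year=3, generic_price_cut=0.5)
        for year in range(1, 6)
    ]
    print(f"discount {discount:.0%}: 5-year total ${sum(costs):,.0f}")
```

Comparing the zero-discount row with the others shows how far a list-price estimate can sit above the spending a payer would actually face under these assumptions.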
II. A. 3. Use Multiple Time Horizons, Including Lifetime
For the time horizon, budget impact assessments face the tension between payer budget windows (1-2 years) and long-term horizons that matter to patients and capture more cost offsets. ICER assessments currently use a five-year window as a compromise. Projecting budget impact should include time horizons that are relevant for the specific assessment. Using multiple time horizons, including a lifetime horizon (when applicable), could better satisfy the needs of all stakeholders.
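The effect of the chosen horizon can be illustrated with a small sketch that reports cumulative net budget impact at several horizons. The constant annual drug cost and the schedule of growing cost offsets are hypothetical and undiscounted; they are chosen only to show how offsets that accrue later change the picture.

```python
def cumulative_net_impact(yearly_drug_cost, yearly_offsets, horizon_years):
    """Cumulative drug spending minus cost offsets over the first horizon_years."""
    if horizon_years > len(yearly_offsets):
        raise ValueError("offset schedule is shorter than the requested horizon")
    return sum(yearly_drug_cost - offset for offset in yearly_offsets[:horizon_years])


# Hypothetical figures in $ millions: offsets grow as avoided events accumulate.
drug_cost = 10
offsets = [0, 1, 2, 4, 6, 8, 10, 12, 12, 12]

for horizon in (1, 2, 5, 10):
    net = cumulative_net_impact(drug_cost, offsets, horizon)
    print(f"{horizon:>2}-year horizon: net impact ${net}M")
```

In this illustration the net impact at ten years is lower than at five years, a pattern that a single short window would not reveal.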
Using uptake rates and prices that are higher than the managed uptake rates and discounted prices that payers will actually face biases ICER’s budget impact estimates upwards. A recent analysis by the Partnership for Health Analytics Research (PHAR) estimated the difference in ICER’s one-year prediction of the cost for the PCSK9 inhibitors and actual spending.[xxvii] ICER predicted $7.2 billion, while actual sales are estimated to be $83 million. The magnitude of the difference between these two estimates is so large that it raises significant concerns over using the ICER budget impact estimates for decision-making.
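The size of this gap can be checked with simple arithmetic on the two figures cited above. The variable names are illustrative; the dollar values are the ones reported in the PHAR analysis as described in this letter.

```python
predicted_spending = 7.2e9   # ICER one-year prediction, in dollars
actual_spending = 83e6       # estimated actual sales, in dollars

ratio = predicted_spending / actual_spending
actual_share = actual_spending / predicted_spending

print(f"prediction is about {ratio:.0f} times actual spending")
print(f"actual spending is about {actual_share:.1%} of the prediction")
```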
II. B. Affordability
ICER seeks: Methods to set a threshold for potential short-term budget impact that can serve as a useful “alarm bell” for policymakers to signal consideration of whether affordability may need to be addressed through various measures in order to improve the impact of new interventions on overall health system value.
II. B. 1. Separate Budget Impact from Affordability
Short-term budget impact is a measure of resource use and should remain separate from affordability. Affordability is an important concept for society. Evaluating affordability involves making assessments and trade-offs at an overall health system level (i.e., a broad assessment of all investments in a health care system) and beyond the health system (i.e., spending on health care vs. other societal considerations such as education, police, and roads).
A comprehensive approach to affordability requires considerations of concepts such as disinvestment and willingness to pay, needs to be informed by cultural and societal values and by health and non-health needs, and requires broad stakeholder involvement. ICER’s current approach to assessing affordability — setting an “alarm bell” threshold — is not a comprehensive consideration of the health care system, does not consider societal values, and does not adequately measure affordability. In addition, it creates unnecessary fear and anxiety around the numbers.
Not only would an affordability assessment require decisions about health care spending vs. non-health care spending, it would also require societal decisions about intra-health care spending: tradeoffs regarding spending on the elderly versus the young, rare versus common disease, curative therapies versus prolonging life versus quality-of-life enhancement, as well as allocations between medications, surgery, hospital care, and physician services. ICER’s current framework and stakeholder input process do not incorporate these broader factors required to assess affordability; ICER’s focus should therefore not extend beyond an assessment of budget impact, and the assessment of affordability should be eliminated.
II. B. 2. Artificial Affordability Caps and Derived Prices are Inappropriate
The Guiding Practices state that an assessment of budget impact should not be judged against artificial affordability caps. As noted above, an affordability assessment needs to look broadly at all health care spending that is relevant to achieving a given health outcome. ICER looks more narrowly at a particular treatment and determines whether spending on that treatment might exceed a fixed portion of drug expenditures.
ICER’s current approach of setting a uniform “alarm bell” threshold based on a fixed portion of drug expenditures creates an artificial affordability cap that does not conform to historical drug spending patterns and could have negative, unintended consequences. A forthcoming analysis by IMS and NPC demonstrates this fact.[xxviii] It shows that substantial variability exists in new-drug spending across years, as well as for individual drugs within years. Setting a single spending cap at the individual product level as ICER does presupposes that drug spending is relatively uniform and predictable across and within years.
Furthermore, the IMS/NPC analysis shows that only a very small percentage of drugs each year exceed the artificial cap created by ICER. Because the large majority of products fall well below the threshold, there is headroom for the very few products that might have a larger expected budget impact. A single threshold applicable to all new drugs does not consider this empirical reality. Since substantial variation does exist, and very few products exceed the ICER-specified cap, setting a single budget threshold at the individual product level and using it as a revenue cap is inappropriate and has the potential for significant unintended consequences.
One such unintended consequence is the disincentivization of development of drugs for broad populations with unmet need. Predictions for budget impact will increase as the predicted number of patients increases, causing a treatment for a broad population to be more likely to trigger an “alarm bell” threshold. However, a comprehensive affordability assessment that considers societal values and the broader public health perspective would likely result in a higher spending allocation for such a treatment.
The ICER threshold equation assumes that the allocation of health care spending among drugs, hospital care, imaging, and physician care is the “correct” allocation across resources. Perhaps more should be spent on drugs and less on imaging for optimal resource allocation, or vice versa. The derived threshold assumes that the current allocation is optimal, an unproven assumption that is likely incorrect.
Additionally, the ICER threshold equation is benchmarked to the annual GDP growth rate plus one percent. This is counter to innovation patterns that may occur periodically rather than at a constant rate.
ICER could provide a ranged budget impact assessment based upon sensitivity analysis. Linking that assessment to “affordability” to derive “value-based prices,” however, would not be appropriate based upon the issues highlighted above. Identifying the potential range of budget impacts and raising the need for public dialogue among all stakeholders for high budget impact therapies is more appropriate.
III. Assessment Process
ICER’s assessment process includes advance notifications of assessment topics and an opportunity for all stakeholders to submit public comments on scoping documents and reports (albeit the time to do so is too short), which are in alignment with the Guiding Practices. There are, however, many areas of concern where the assessment process can be made more robust.
III. A. Bring Broader Stakeholder Representation into the Process
ICER has sought to improve stakeholder engagement over the past year. The introduction of engagement guides for manufacturers and patients has been helpful to those groups. Some manufacturers have expressed appreciation for the proactive outreach and earlier engagement that ICER has implemented in more recent reviews, but much more can be done to bring broad stakeholder representation into the assessment process. Although outreach is occurring, much greater engagement with, and feedback from, patient groups is needed. NPC recommends using “The Patient Voice in Value: The National Health Council Patient-Centered Value Model Rubric” as a guide to ensure the patient community is engaged throughout the process.[xxix]
III. B. Add Broader Perspectives and Clinical Expertise to Voting Panel
Although a variety of perspectives are represented at ICER meetings, comments made by panel participants during meetings often suggest they are approaching value assessment solely through a cost-focused lens. Panel members should have a broader view of value beyond cost, and should be more diverse in their views.
Providing a mechanism for stakeholder representatives (e.g., consumer, industry) to be nominated for inclusion on a panel, with nominations reviewed by a separate committee, could bring this broader perspective to the panels.
It also is important for multiple voting panel members to have expertise in the disease area under discussion to improve clinical accuracy of their assessments. Such expertise was lacking in the recent multiple myeloma panel.
Voting panel members should also receive some level of (independent) training on the fundamentals of c