Special Series on Comparative Effectiveness Research: Challenges to Real-World Solutions to Quality Improvement in Personalized Medicine
McShane and Hayes1 and Ginsburg and Kuderer2 in the special series issue of the Journal provide a comprehensive vision for improved conduct and reporting of predictive/prognostic biomarker research in personalized cancer medicine. Their visions are prompted by the need for more methodologic rigor in validation studies in this field, as highlighted by a critical appraisal of 37 published reports in a systematic review.3
They invoke three strategies for improving study quality: first, comparative effectiveness research (CER) that addresses population-level effects of alternative interventions; second, quality improvement (QI) tools as methodologic standards; and third, infrastructure development for informatics and data linkage, and institutional networking with data sharing.
CER was popularized with the passing in 2009 of the American Recovery and Reinvestment Act (ARRA). Among other objectives, CER aims to compensate for limitations in the inferences that can be made from randomized controlled trials (RCTs) about an intervention's performance when applied in the messier real world.4 CER blends the established fields of outcomes research informed by classical epidemiology and health technology assessment that includes economic evaluation. CER is a concept that bundles these various evaluation approaches for population-level decision making, although techniques like health technology assessment and economic analyses have been practiced at the policy level for decades in other countries where fears about so-called health care rationing are less part of the political landscape.5,6
An important aspect of CER is exploration of the development of informatics infrastructures, networks for data linkage, and biobanking standards to achieve the evaluation requirements for personalized medicine. Ginsburg and Kuderer briefly sketch seven active CER projects in genomic medicine under the National Cancer Institute ARRA-funded program. They share the common theme of infrastructure development (technical and social) for information capture, linking, and sharing—the third of the strategies listed above. Such infrastructure development may be the first necessary step in foundation building for CER to facilitate the subsequent analytic components. Even so, it is not yet clear from these brief descriptions where the comparisons of alternative interventions will be made in the analysis phases of CER and whether such comparisons on the basis of large data sets will necessarily provide more valid inferences than those from well-executed RCTs.
But the articles by McShane and Hayes1 and by Ginsburg and Kuderer2 raise the concept of clinical utility as the underlying rationale for the relevance of CER, claiming that analytic and clinical validity, although necessary, are not sufficient for adopting technological innovations into the population. CER claims to address the clinical utility issue. The term clinical utility as usually applied is restricted to the clinical (physical, emotional, and social) consequences of interventions, both good and harmful, to the individuals exposed to them.7 Issues such as financial or out-of-pocket costs to individuals and convenience can be considered part of clinical utility; but Ginsburg and Kuderer include costs to society as part of the concept.
It is worthwhile distinguishing between clinical utility and health systems utility, with the former applying to decisions or judgments by health care providers and patients about consequences to individuals and the latter to policy advisors and decision makers considering consequences for the population and the system as a whole. What individuals demand for themselves and what is appropriate for society as a whole can come into conflict; and it is not clear at this time that the public in the United States is ready for setting priorities, interpreted as health care rationing. For the CER agenda to achieve its objectives in informing policies about the affordability of new interventions, whether well or imperfectly evaluated, it will be important to overcome such attitudes. A poignant illustration of differences in societal attitudes related to this issue is the contrast between how the US Food and Drug Administration and the United Kingdom National Institute for Health and Clinical Excellence approached decisions about approval of bevacizumab for metastatic breast cancer, with economic evaluation missing from the former and an integral part of the latter evaluation.8,9 As a concept, personalized medicine does create the expectation of health care for individuals, which could undermine the policy-informing aspirations of CER for setting priorities for the population as a whole.
Proponents of CER also claim that it has the potential to address limitations in the inferences that can be made at a population level from the results of RCTs that are conducted in more restricted settings. But we need to avoid assuming that the failure to reproduce the results of RCTs for interventions when applied to the real world is necessarily a comment on the potential benefits of the intervention itself; discordance between RCT results and those in the real world may have more to do with the circumstances under which the intervention is delivered. Exploration of these circumstances, rather than rejection of an intervention that might benefit subpopulations when appropriately applied, is a potentially important agenda for CER.
Because of the variety of factors that can affect the performance of an intervention in the real world, it is not unreasonable to question CER results obtained from selected populations as necessarily generalizable to other populations or regions where there may be differences in health services delivery mechanisms, access to services, insurance arrangements, age structures, and socioeconomic factors among others. We already know from the pioneering work of Wennberg and Gittelsohn10 that there can be considerable variation in patterns of care and outcomes across geographic regions.11 We will need to be more explicit about how CER can most appropriately contribute, what inferences can be made from its comparisons, and what further explorations of the data might be needed before accepting them at face value.
A key contribution from these papers is resolution of the lingering doubts created by the political debates around ARRA, in which a coalition of influential partners opposed the $1.1 billion investment in evaluation science on the grounds that it would stymie progress in personalized medicine.12 Both articles confirm, through CER, the relevance of traditional evaluation approaches for personalized medicine interventions.
The two sets of authors set up a vision for improved conduct and reporting of biomarker validation studies for personalized medicine using tools (often presented as recommendations or statements) patterned after the evidence-based movement (eg, CONSORT; Assessment of Multiple Systematic Reviews [AMSTAR]; Grading of Recommendations Assessment, Development, and Evaluation [GRADE]; Appraisal of Guidelines for Research and Evaluation [AGREE]).13–16 But, just as McShane and Hayes1 and Ginsburg and Kuderer2 highlight the distinction between real-world circumstances requiring CER approaches and the controlled environments of RCTs, their own recommendations for use of QI tools (eg, Biospecimen Reporting for Improved Study Quality, Reporting Recommendations for Tumor Marker Prognostic Studies [REMARK], Strengthening the Reporting of Observational Genetic Association Studies, and the Evaluation of Genomic Applications in Practice and Prevention [EGAPP] initiative) will themselves face real-world challenges for adoption.17–20
Analogous to the gap identified between recommendations from practice guidelines and their use in the real world,21 the recommendations from these QI tools for genomic medicine will require active implementation. Lessons for promoting adoption of standards can be learned from the emerging fields of knowledge translation and implementation science22–24 that show dissemination of recommendations through the literature to be a relatively ineffective, and perhaps the least effective, strategy to promote behavior change. In this context, it is surprising how timid the language is in the publications of tools for improved reporting in genomic medicine, often couching recommendations as voluntary, awareness-raising, and intended for readers as opposed to doers of research. For example, the publication of QI standards in genomic medicine by the EGAPP Working Group is accompanied by disclaimers in almost apologetic terms about any authority (other than moral) for influence or enforcement of its recommendations.20 Although this may be appropriate given the positioning of the EGAPP Working Group, it does speak to the need for bolder approaches to stimulate adoption of these QI tools. On the other hand, it is encouraging that tools such as these have been successfully promoted and adopted in the evidence-based realm by influential journals and other organizations—from which lessons can be learned. It is encouraging that Journal of Clinical Oncology has adopted REMARK as a reporting standard. A recent proposal of a tool to facilitate adoption of the Biospecimen Reporting for Improved Study Quality recommendations by pathologists is a welcome action-oriented strategy for promoting their use.25 In addition, the recent Institute of Medicine report on Evolution of Translational Omics26 and the availability of ARRA-related funding could stimulate accelerated uptake.
As CER and QI tools seek to address the real-world circumstances that affect quality in the conduct and reporting of genomic research, so too do these strategies face their own real-world challenges, some of which I highlight here.
Steven Lewis27 advanced 10 propositions as a theory of indifference to research-based evidence to explain the unexpectedly sluggish adoption of evidence-based medicine and decision making. Not all 10 of Lewis's propositions can be recounted here, but especially relevant is his suggestion that “the first step is to reconceive evidence-based medicine and decision making as habits of mind rather than a toolbox and to recognize that the sociology of knowledge is as important as its technical content.”27(p166) The proposals in the two articles by McShane and Hayes1 and Ginsburg and Kuderer2 should be evaluated in these terms to inform socially relevant strategies of implementation for real-world circumstances. In this context, it would be useful to learn what strategies were used to persuade Journal of Clinical Oncology to adopt the REMARK standards and what progress is being made with other journals involving REMARK and other proposed genomic standards.
To be implemented, recommendations from QI tools in genomic medicine will require consistent adherence to rigorous standards not only by individual researchers but also by those responsible for managing biobanks, where standards ranging from tissue and data handling to privacy protection, ethics, and governance models have been proposed.28–33
Such strict standards would require a level of oversight and transparency achievable only through some voluntary mechanism (eg, accreditation and/or certification) or an official regulatory framework. Upgrading the standards of long-standing biobanks developed in less bureaucratic eras could be accomplished by empowering institutional research review boards to apply standards for new research proposals involving existing biobanks and facilitating upgrades when already approved studies come forward with annual renewals or amendments.
A large proportion of clinical research is sponsored by industry, where biobanking has become routine. Invoking standards of performance for academic institutions without similar requirements and oversight for industry-managed biobanks would undermine the entire quality movement. It is noted that publications of the QI tools proposed as standards have come from academically oriented groups and public agencies, with little apparent participation from industry.
The peer-review literature is not organized for decision making. Implementation of the proposed standards should not suppress original preliminary research intended as hypothesis generating or as proof of concept. Criteria could be developed and used by journals to distinguish between manuscripts intended to influence practice (eg, true clinical validation studies) versus those intended to raise awareness or to inform. In clinical medicine, American College of Physicians Journal Club, a regular supplement of the Annals of Internal Medicine, distills the broader literature to advise decision makers of what might be considered ready for application.34 Journals that report genomic advances could set up different review criteria depending on a manuscript's purpose, similar to the concept of proportionate review used by institutional review boards to guide the level and depth of the review effort depending on risks to subjects.35
The real-world practice of publishing incompletely evaluated interventions “with promise” in journals and their presentation at highly visible medical conferences accompanied by published abstracts attracts media and patient groups whose framing of a story is beyond the influence of the researcher. Insurers (including governments), health care delivery organizations, and practitioners are lobbied heavily by advocacy groups and industry to make these incompletely evaluated interventions available. Both industry and government are often forced to compromise with patient and public demand through provisional access of incompletely evaluated medicines in the form of compassionate access programs and programs like coverage with evidence development.36
The laudable visions of McShane and Hayes1 and Ginsburg and Kuderer2 for improvements in the quality and reporting of predictive/prognostic biomarker research—so crucial to the future of personalized medicine—set an important agenda, but more effort is required to address real-world challenges for implementation. The QI effort for genomic medicine is itself challenged by real-world barriers to adoption of standards. To accelerate the quality agenda in genomic medicine, we need to add to CER the insights and methodologies of health services research and the emerging fields of implementation science and knowledge translation. To this I would add the imperative of including policy analysis research to broaden our perspectives about how societal trends, attitudes, and changing values reflected in the notion of personalized medicine can influence the acceptability to all stakeholders of more rigorous evaluation standards and their consequences, especially if more rigorous evaluation is perceived by the public and by clinicians as counter to the availability of innovation earlier in the development cycle of an intervention.
The author(s) indicated no potential conflicts of interest.
1. McShane LM, Hayes DF: Publication of tumor marker research results: The necessity for complete and transparent reporting. J Clin Oncol 30:4223-4232, 2012
2. Ginsburg GS, Kuderer NM: Comparative effectiveness research, genomics-enabled personalized medicine, and rapid learning health care: A common bond. J Clin Oncol 30:4233-4242, 2012
3. Kuderer NM, Culakova E, Huang M, et al: Quality appraisal of clinical validation studies for multigene prediction assays of chemotherapy response in early-stage breast cancer. J Clin Oncol 29:214s, 2011 (suppl; abstr 3082)
4. Conway PH, Clancy C: Comparative-effectiveness research: Implications of the Federal Coordinating Council's report. N Engl J Med 361:328-330, 2009
5. Menon D, Stafinski T: Health technology assessment in Canada: 20 years strong? Value Health 12:S14-S19, 2009 (suppl 2)
6. Schwarzer R, Siebert U: Methods, procedures, and contextual characteristics of health technology assessment and health policy decision making: Comparison of health technology assessment agencies in Germany, United Kingdom, France, and Sweden. Int J Technol Assess Health Care 25:305-314, 2009
7. Smart A: A multi-dimensional model of clinical utility. Int J Qual Health Care 18:377-382, 2006
8. Department of Health and Human Services, Food and Drug Administration: Docket No. FDA-2010-N-0621: Proposal to withdraw approval for the breast cancer indication for AVASTIN (bevacizumab). http://www.fda.gov/downloads/NewsEvents/Newsroom/UCM280546.pdf
9. Burstein HJ: Bevacizumab for advanced breast cancer: All tied up with a RIBBON. J Clin Oncol 29:1232-1235, 2011
10. Wennberg J, Gittelsohn A: Small area variations in health care delivery. Science 182:1102-1108, 1973
11. Health Services Research Group: Small area variations: What are they and what do they mean? Can Med Assoc J 146:467-470, 1992
12. Garber AM, Tunis SR: Does comparative-effectiveness research threaten personalized medicine? N Engl J Med 360:1925-1927, 2009
13. Altman DG: Endorsement of the CONSORT statement by high impact medical journals: Survey of instructions for authors. BMJ 330:1056-1057, 2005
14. Shea BJ, Grimshaw JM, Wells GA, et al: Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 7:10, 2007
15. Guyatt GH, Oxman AD, Vist GE, et al: GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336:924-926, 2008
16. Brouwers MC, Kho ME, Browman GP, et al: Development of the AGREE II, part 2: Assessment of validity of items and tools to support application. CMAJ 182:e472-e478, 2010
17. Moore HM, Kelly AB, Jewell SD, et al: Biospecimen reporting for improved study quality (BRISQ). J Proteome Res 10:3429-3438, 2011
18. McShane LM, Altman DG, Sauerbrei W, et al: Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst 97:1180-1184, 2005
19. Little J, Higgins JP, Ioannidis JP, et al: Strengthening the reporting of genetic association studies (STREGA): An extension of the STROBE statement. Ann Intern Med 150:206-215, 2009
20. Teutsch SM, Bradley LA, Palomaki GE, et al: The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: Methods of the EGAPP Working Group. Genet Med 11:3-14, 2009
21. Evensen AE, Sanson-Fisher R, D'Este C, et al: Trends in publications regarding evidence-practice gaps: A literature review. Implement Sci 5:11, 2010
22. Graham ID, Logan J, Harrison MB, et al: Lost in knowledge translation: Time for a map. J Contin Educ Health Prof 26:13-24, 2006
23. Woolf SH: The meaning of translational research and why it matters. JAMA 299:211-213, 2008
24. Eccles MP, Armstrong D, Baker R, et al: An implementation research agenda. Implement Sci 4:18, 2009
25. Cheah S, Dee S, Cole A, et al: An online tool for improving biospecimen reporting. Biopreserv Biobank 10, 2012 (abstr 211)
26. Micheel CM, Nass SJ, Omenn GS: Evolution of Translational Omics: Lessons Learned and the Path Forward. Washington, DC, Institute of Medicine of the National Academies, 2012
27. Lewis S: Toward a general theory of indifference to research-based evidence. J Health Serv Res Policy 12:166-172, 2007
28. Office of Population Genetics, Government of Western Australia Department of Health: Guidelines for human biobanks, genetic research databases and associated data. http://www.genomics.health.wa.gov.au/publications/docs/guidelines_for_human_biobanks.pdf
29. Organisation for Economic Co-operation and Development (OECD): OECD guidelines on human biobanks and genetic research databases. http://www.oecd.org/dataoecd/41/47/44054609.pdf
30. National Cancer Institute: National Cancer Institute best practices for biospecimen resources. http://www.allirelandnci.com/pdf/NCI_Best_Practices_060507.pdf
31. UK Biobank: Ethics and governance framework: Version 3.0. http://www.ukbiobank.ac.uk/wp-content/uploads/2011/05/EGF20082.pdf
32. International Society for Biological and Environmental Repositories: 2012 best practices for repositories: Collection, storage, retrieval, and distribution of biological materials for research. http://online.liebertpub.com/doi/pdfplus/10.1089/bio.2012.1022
33. Canadian Tumour Repository Network: Standard operating procedures. http://www.ctrnet.ca/operating-procedures
34. American College of Physicians (ACP): ACP Journal Club: The best new evidence for patient care. http://www.annals.org/site/acpjc/index.xhtml
35. Government of Canada Interagency Advisory Panel on Research Ethics: Tri-Council policy statement: Ethical conduct for research involving humans. http://www.pre.ethics.gc.ca/eng/policy-politique/initiatives/tcps2-eptc2/chapter11-chapitre11/
36. Lexchin J: Coverage with evidence development for pharmaceuticals: A policy evolution? Int J Health Serv 41:337-354, 2011
I thank Peter Watson, MD, who provided a review of this article and made several valuable suggestions.