McShane and Hayes1 and Ginsburg and Kuderer2 in the special series issue of the Journal provide a comprehensive vision for improved conduct and reporting of predictive/prognostic biomarker research in personalized cancer medicine. Their visions are prompted by the need for more methodologic rigor for validation studies in this field as highlighted by a critical appraisal of 37 published reports in a systematic review.3

They invoke three strategies for improving study quality: first, comparative effectiveness research (CER) that addresses population-level effects of alternative interventions; second, quality improvement (QI) tools as methodologic standards; and third, infrastructure development for informatics and data linkage, and institutional networking with data sharing.

Comparative Effectiveness

CER was popularized with the passage in 2009 of the American Recovery and Reinvestment Act (ARRA). Among other objectives, CER aims to compensate for limitations in the inferences that can be made from randomized controlled trials (RCTs) about an intervention's performance when applied in the messier real world.4 CER blends the established fields of outcomes research informed by classical epidemiology and health technology assessment that includes economic evaluation. CER is a concept that bundles these various evaluation approaches for population-level decision making, although techniques like health technology assessment and economic analyses have been practiced at the policy level for decades in other countries where fears about so-called rationalized health care are less part of the political landscape.5,6

An important aspect of CER is exploration of the development of informatics infrastructures, networks for data linkage, and biobanking standards to achieve the evaluation requirements for personalized medicine. Ginsburg and Kuderer briefly sketch seven active CER projects in genomic medicine under the National Cancer Institute ARRA-funded program. They share the common theme of infrastructure development (technical and social) for information capture, linking, and sharing—the third strategy for QI listed. Such infrastructure development may be the first necessary step in foundation building for CER to facilitate the subsequent analytic components. So, it is not yet clear from these brief descriptions where the comparisons of alternative interventions will be made in the analysis phases of CER and whether such comparisons on the basis of large data sets will necessarily provide more valid inferences than those from well-executed RCTs.

But the articles by McShane and Hayes1 and by Ginsburg and Kuderer2 raise the concept of clinical utility as the underlying rationale for the relevance of CER, claiming that analytic and clinical validity, although necessary, are not sufficient for adopting technological innovations into the population. CER claims to address the clinical utility issue. The term clinical utility as usually applied is restricted to the clinical (physical, emotional, and social) consequences of interventions, both good and harmful, to the individuals exposed to them.7 Issues such as financial or out-of-pocket costs to individuals and convenience can be considered part of clinical utility; but Ginsburg and Kuderer include costs to society as part of the concept.

It is worthwhile distinguishing between clinical utility and health systems utility, with the former applying to decisions or judgments by health care providers and patients about consequences to individuals and the latter to policy advisors and decision makers considering consequences for the population and the system as a whole. What individuals demand for themselves and what is appropriate for society as a whole can come into conflict; and it is not clear at this time that the public in the United States is ready for setting priorities, interpreted as health care rationing. For the CER agenda to achieve its objectives in informing policies about the affordability of new interventions, whether well or imperfectly evaluated, it will be important to overcome such attitudes. A poignant illustration of differences in societal attitudes related to this issue is the contrast between how the US Food and Drug Administration and the United Kingdom National Institute for Health and Clinical Excellence approached decisions about approval of bevacizumab for metastatic breast cancer, with economic evaluation missing from the former and an integral part of the latter evaluation.8,9 As a concept, personalized medicine does create the expectation of health care for individuals, which could undermine the policy-informing aspirations of CER for setting priorities for the population as a whole.

Proponents of CER also claim that it has the potential to address limitations in the inferences that can be made at a population level from the results of RCTs that are conducted in more restricted settings. But we need to avoid assuming that the failure to reproduce the results of RCTs for interventions when applied to the real world is necessarily a comment on the potential benefits of the intervention itself; discordance between RCT results and those in the real world may have more to do with the circumstances under which the intervention is delivered. Exploration of these circumstances, rather than rejection of an intervention that might benefit subpopulations when appropriately applied, is a potentially important agenda for CER.

Because of the variety of factors that can affect the performance of an intervention in the real world, it is not unreasonable to question CER results obtained from selected populations as necessarily generalizable to other populations or regions where there may be differences in health services delivery mechanisms, access to services, insurance arrangements, age structures, and socioeconomic factors among others. We already know from the pioneering work of Wennberg and Gittelsohn10 that there can be considerable variation in patterns of care and outcomes across geographic regions.11 We will need to be more explicit about how CER can most appropriately contribute, what inferences can be made from its comparisons, and what further explorations of the data might be needed before accepting them at face value.

A key contribution from these papers is reconciliation of the lingering doubts created by the political debates around ARRA in which a coalition of influential partners opposed the $1.1 billion investment in evaluation science on the grounds that it would stymie progress in personalized medicine.12 Both articles confirm, through CER, the relevance of traditional evaluation approaches for personalized medicine interventions.

QI Tools

The two sets of authors set up a vision for improved conduct and reporting of biomarker validation studies for personalized medicine using tools (often presented as recommendations or statements) patterned after the evidence-based movement (eg, CONSORT; Assessment of Multiple Systematic Reviews [AMSTAR]; Grading of Recommendations Assessment, Development, and Evaluation [GRADE]; Appraisal of Guidelines for Research and Evaluation [AGREE]).13-16 But, just as McShane and Hayes1 and Ginsburg and Kuderer2 highlight the distinction between real-world circumstances requiring CER approaches and the controlled environments of RCTs, their own recommendations for use of QI tools (eg, Biospecimen Reporting for Improved Study Quality, Reporting Recommendations for Tumor Marker Prognostic Studies [REMARK], Strengthening the Reporting of Observational Genetic Association Studies, and the Evaluation of Genomic Applications in Practice and Prevention [EGAPP] initiative) will themselves face real-world challenges for adoption.17-20

Analogous to the gap identified between recommendations from practice guidelines and their use in the real world,21 the recommendations from these QI tools for genomic medicine will require active implementation. Lessons for promoting adoption of standards can be learned from the emerging fields of knowledge translation and implementation science22-24 that show dissemination of recommendations through the literature to be a relatively ineffective, and perhaps the least effective, strategy to promote behavior change. In this context, it is surprising how timid is the language in the publications of tools for improved reporting in genomic medicine, often couching recommendations as voluntary, awareness-raising, and intended for readers as opposed to doers of research. For example, the publication of QI standards in genomic medicine by the EGAPP Working Group is accompanied by disclaimers in almost apologetic terms about any authority (other than moral) for influence or enforcement of its recommendations.20 Although this may be appropriate given the positioning of the EGAPP Working Group, it does speak to the need for bolder approaches to stimulate adoption of these QI tools. On the other hand, it is encouraging that tools such as these have been successfully promoted and adopted in the evidence-based realm by influential journals and other organizations, from which lessons can be learned. It is encouraging that Journal of Clinical Oncology has adopted REMARK as a reporting standard. A recent proposal of a tool to facilitate adoption of the Biospecimen Reporting for Improved Study Quality recommendations by pathologists is a welcome action-oriented strategy for promoting their use.25 In addition, the recent Institute of Medicine Report on Evolution of Translational Omics26 and the availability of ARRA-related funding could stimulate accelerated uptake.

Real-World Challenges

As CER and QI tools seek to address the real-world circumstances that affect quality in the conduct and reporting of genomic research, so too do these strategies face their own real-world challenges, some of which I highlight here.


Steven Lewis27 advanced 10 propositions as a theory of indifference to research-based evidence to explain unexpectedly sluggish adoption of evidence-based medicine and decision making. Not all 10 of Lewis's propositions can be recounted here, but especially relevant is his suggestion that “the first step is to reconceive evidence-based medicine and decision making as habits of mind rather than a toolbox and to recognize that the sociology of knowledge is as important as its technical content.”27(p166) The proposals in the two articles by McShane and Hayes1 and Ginsburg and Kuderer2 should be evaluated in these terms to inform socially relevant strategies of implementation for real-world circumstances. In this context, it would be useful to learn what strategies were used to persuade Journal of Clinical Oncology to adopt the REMARK standards and what progress is being made with other journals involving REMARK and other proposed genomic standards.

Role of industry, biobanks, and the culture of regulation.

To be implemented consistently, recommendations from QI tools in genomic medicine will require adherence to rigorous standards not only by individual researchers but also by those responsible for managing biobanks, where standards ranging from tissue and data handling to privacy protection and ethics to governance models have been proposed.28-33

Such strict standards would require a level of oversight and transparency achievable only through some voluntary mechanism (eg, accreditation and/or certification) or an official regulatory framework. Upgrading the standards of long-standing biobanks developed in less bureaucratic eras could be accomplished by empowering institutional research review boards to apply standards for new research proposals involving existing biobanks and facilitating upgrades when already approved studies come forward with annual renewals or amendments.

A large proportion of clinical research is sponsored by industry, where biobanking has become routine. Invoking standards of performance for academic institutions without similar requirements and oversight for industry-managed biobanks would undermine the entire quality movement. It is noted that publications of the QI tools proposed as standards have come from academically oriented groups and public agencies, with little apparent participation from industry.

Purpose of peer-review literature.

The peer-review literature is not organized for decision making. Implementation of the proposed standards should not suppress original preliminary research intended as hypothesis generating or as proof of concept. Criteria could be developed and used by journals to distinguish between manuscripts intended to influence practice (eg, true clinical validation studies) versus those intended to raise awareness or to inform. In clinical medicine, American College of Physicians Journal Club, a regular supplement of the Annals of Internal Medicine, distills the broader literature to advise decision makers of what might be considered ready for application.34 Journals that report genomic advances could set up different review criteria depending on a manuscript's purpose, similar to the concept of proportionate review used by institutional review boards to guide the level and depth of the review effort depending on risks to subjects.35

Role of the media, societal values, and consumer demands.

The real-world practice of publishing incompletely evaluated interventions “with promise” in journals and their presentation at highly visible medical conferences accompanied by published abstracts attracts media and patient groups whose framing of a story is beyond the influence of the researcher. Insurers (including governments), health care delivery organizations, and practitioners are lobbied heavily by advocacy groups and industry to make these incompletely evaluated interventions available. Both industry and government are often forced to compromise with patient and public demand through provisional access to incompletely evaluated medicines in the form of compassionate access programs and programs like coverage with evidence development.36


The laudable visions of McShane and Hayes1 and Ginsburg and Kuderer2 for improvements in the quality and reporting of predictive/prognostic biomarker research, so crucial to the future of personalized medicine, set an important agenda, but more effort is required to address real-world challenges for implementation. The QI effort for genomic medicine is itself challenged by real-world barriers to adoption of standards. To accelerate the quality agenda in genomic medicine, we need to add to CER the insights and methodologies of health services research and the emerging fields of implementation science and knowledge translation. To this I would add the imperative of including policy analysis research to broaden our perspectives about how societal trends, attitudes, and changing values reflected in the notion of personalized medicine can influence the acceptability to all stakeholders of more rigorous evaluation standards and their consequences, especially if more rigorous evaluation is perceived by the public and by clinicians as counter to the availability of innovation earlier in the development cycle of an intervention.

© 2012 by American Society of Clinical Oncology

The author(s) indicated no potential conflicts of interest.

1. LM McShane, DF Hayes: Publication of tumor marker research results: The necessity for complete and transparent reporting. J Clin Oncol 30:4223-4232, 2012
2. GS Ginsburg, NM Kuderer: Comparative effectiveness research, genomics-enabled personalized medicine, and rapid learning health care: A common bond. J Clin Oncol 30:4233-4242, 2012
3. NM Kuderer, E Culakova, M Huang, et al: Quality appraisal of clinical validation studies for multigene prediction assays of chemotherapy response in early-stage breast cancer. J Clin Oncol 29:214s, 2011 (suppl; abstr 3082)
4. PH Conway, C Clancy: Comparative-effectiveness research: Implications of the Federal Coordinating Council's report. N Engl J Med 361:328-330, 2009
5. D Menon, T Stafinski: Health technology assessment in Canada: 20 years strong? Value in Health 12:S14-S19, 2009 (suppl 2)
6. R Schwarzer, U Siebert: Methods, procedures, and contextual characteristics of health technology assessment and health policy decision making: Comparison of health technology assessment agencies in Germany, United Kingdom, France, and Sweden. Int J Technol Assess Health Care 25:305-314, 2009
7. A Smart: A multi-dimensional model of clinical utility. Int J Qual Health Care 18:377-382, 2006
8. Department of Health and Human Services, Food and Drug Administration: Docket No. FDA-2010-N-0621: Proposal to withdraw approval for the breast cancer indication for AVASTIN (bevacizumab)
9. HJ Burstein: Bevacizumab for advanced breast cancer: All tied up with a RIBBON. J Clin Oncol 29:1232-1235, 2011
10. J Wennberg, A Gittelsohn: Small area variations in health care delivery. Science 182:1102-1108, 1973
11. Health Services Research Group: Small area variations: What are they and what do they mean? Can Med Assoc J 146:467-470, 1992
12. AM Garber, SR Tunis: Does comparative-effectiveness research threaten personalized medicine? N Engl J Med 360:1925-1927, 2009
13. DG Altman: Endorsement of the CONSORT statement by high impact medical journals: Survey of instructions for authors. BMJ 330:1056-1057, 2005
14. BJ Shea, JM Grimshaw, GA Wells, et al: Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 7:10, 2007
15. GH Guyatt, AD Oxman, GE Vist, et al: GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336:924-926, 2008
16. MC Brouwers, ME Kho, GP Browman, et al: Development of the AGREE II, part 2: Assessment of validity of items and tools to support application. CMAJ 182:e472-e478, 2010
17. HM Moore, AB Kelly, SD Jewell, et al: Biospecimen reporting for improved study quality (BRISQ). J Proteome Res 10:3429-3438, 2011
18. LM McShane, DG Altman, W Sauerbrei, et al: Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst 97:1180-1184, 2005
19. J Little, JP Higgins, JP Ioannidis, et al: Strengthening the reporting of genetic association studies (STREGA): An extension of the STROBE statement. Ann Intern Med 150:206-215, 2009
20. SM Teutsch, LA Bradley, GE Palomaki, et al: The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: Methods of the EGAPP Working Group. Genet Med 11:3-14, 2009
21. AE Evensen, R Sanson-Fisher, C D'Este, et al: Trends in publications regarding evidence-practice gaps: A literature review. Implement Sci 5:11, 2010
22. ID Graham, J Logan, MB Harrison, et al: Lost in knowledge translation: Time for a map. J Contin Educ Health Prof 26:13-24, 2006
23. SH Woolf: The meaning of translational research and why it matters. JAMA 299:211-213, 2008
24. MP Eccles, D Armstrong, R Baker, et al: An implementation research agenda. Implement Sci 4:18, 2009
25. S Cheah, S Dee, A Cole, et al: An online tool for improving biospecimen reporting. Biopreserv Biobank 10, 2012 (abstr 211)
26. CM Micheel, SJ Nass, GS Omenn: Evolution of Translational Omics: Lessons Learned and the Path Forward. Washington, DC, Institute of Medicine of the National Academies, 2012
27. S Lewis: Toward a general theory of indifference to research-based evidence. J Health Serv Res Policy 12:166-172, 2007
28. Office of Population Genetics, Government of Western Australia Department of Health: Guidelines for human biobanks, genetic research databases and associated data
29. Organization for Economic Co-Operation and Development (OECD): OECD guidelines on human biobanks and genetic research databases
30. National Cancer Institute: National Cancer Institute best practices for biospecimen resources
31. UK Biobank: Ethics and governance framework: version 3.0
32. International Society for Biological and Environmental Repositories: 2012 best practices for repositories: Collection, storage, retrieval, and distribution of biological materials for research
33. Canadian Tumour Repository Network: Standard operating procedures
34. American College of Physicians (ACP): ACP journal club: The best new evidence for patient care
35. Government of Canada Interagency Advisory Panel on Research Ethics: Tri-council policy statement: Ethical conduct for research involving humans
36. J Lexchin: Coverage with evidence development for pharmaceuticals: A policy evolution? Int J Health Serv 41:337-354, 2011


I thank Peter Watson, MD, who provided a review of this article and made several valuable suggestions.

DOI: 10.1200/JCO.2012.44.8225 Journal of Clinical Oncology 30, no. 34 (December 01, 2012) 4188-4191.

Published online October 15, 2012.

PMID: 23071250
