We spoke with Dr. Garrett-Mayer about concerns regarding statistical probability (P) values and “data dredging” in clinical research.
Elizabeth Garrett-Mayer, PhD
Professor of Biostatistics and Epidemiology

Elizabeth Garrett-Mayer, PhD, is Director of Biostatistics at the Medical University of South Carolina’s Hollings Cancer Center in Charleston, South Carolina. She is a member of the American Society of Clinical Oncology (ASCO) Cancer Research Committee.
-Interviewed by Bryant Furlow
OncoTherapy Network: Statistical P values and their interpretation have become somewhat controversial. What is their value in assessing clinical trial outcomes in oncology?
Dr. Garrett-Mayer: P values can have some value, but they should not be interpreted on their own. It is always important to consider the clinical effect size and sample size when interpreting a P value. P values are most useful in a setting where a trial was designed to have an appropriate sample size to address the primary objective of the study. If the study is much larger or smaller than required to answer the primary research question, then the P value alone can be misleading. ASCO’s 2014 perspective paper, “Raising the Bar for Clinical Trials by Defining Clinically Meaningful Outcomes,” addresses this problem by providing guidance on what would be considered “clinically meaningful” in a number of different patient populations, so that trials can be designed with an appropriate sample size.
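To make the sample-size point concrete, here is a minimal sketch of how a prespecified, clinically meaningful effect can be translated into a required sample size. The effect size, alpha, and power below are illustrative assumptions, not values from the ASCO paper; the calculation uses the statsmodels Python library.

```python
# Minimal sketch: sample size needed to detect a prespecified,
# clinically meaningful effect. The effect size (Cohen's d = 0.4),
# alpha, and power below are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(
    effect_size=0.4,          # standardized clinically meaningful difference (assumed)
    alpha=0.05,               # two-sided type I error rate
    power=0.80,               # probability of detecting the effect if it truly exists
    alternative="two-sided",
)
print(f"Required sample size per arm: {n_per_arm:.0f}")  # roughly 99 per arm
```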
OncoTherapy Network: Does statistical significance typically imply biological significance?
Dr. Garrett-Mayer: Typically? No. But there are so many settings in which P values are reported. In cases where studies are specifically designed to detect a clinically (or biologically) meaningful difference, statistical significance will imply clinical (or biological) significance. But the majority of reported P values do not fall into this category. Most trials are designed around a single primary objective, for example, to detect a clinically meaningful difference in survival between two treatments. However, when the trial is reported, numerous other comparisons are made, such as differences in toxicity rates, differences in progression-free survival, and so on. The P values from these comparisons should be interpreted cautiously because the sample size was not selected based on those other outcomes. In preclinical research, we are seeing articles in which hundreds of P values are reported from small studies. With so many P values reported, we expect quite a few to be significant by chance alone.
OncoTherapy Network: Is the role of P values different in hypothesis-generating data exploration settings versus confirmatory hypothesis testing?
Dr. Garrett-Mayer: Yes. There is an interesting history of how the P value came to be used as it is today, and it is not used as it was originally intended. R.A. Fisher proposed the P value (without a threshold) as a qualitative measure to be interpreted in conjunction with prior knowledge when evaluating new data. Neyman and Pearson proposed setting an “alpha threshold” and rejecting a hypothesis when the P value fell below that threshold, without concern for how small or large the P value was. But now we use a conflated approach: we apply the threshold (usually set at 0.05), and we also use the P value as a judge of the level of evidence (à la Fisher). The biggest problem, however, is that we too often ignore the other important facets of the analysis, such as the effect size, confidence intervals for the effect size, and the sample size, when interpreting our results.
Back to the specific question: The Neyman-Pearson approach is more consistent with the idea of confirmatory hypothesis testing. A specific study is designed to confirm (or refute) a specific hypothesis, and the alpha threshold is set in advance. At the conclusion of the study, the hypothesis is either rejected or not, and because the study was designed around a specific hypothesis, statistical and clinical significance are both supported by a P value less than the threshold. However, in hypothesis-generating settings, P values should be considered more qualitatively (per Fisher), where the magnitude of the P value is considered in conjunction with other factors. In these types of early studies, the sample size cannot be selected to accommodate all hypotheses of interest, so statistical significance (or the lack thereof) will not necessarily imply clinical (or biological) significance. In these cases, many statisticians, including myself, would encourage researchers to use graphical displays of data with summary statistics and confidence intervals.
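As a minimal sketch of the kind of graphical display described above, the following Python snippet plots group means with 95% confidence intervals; all data are simulated purely for illustration.

```python
# Minimal sketch: summary statistics with 95% confidence intervals,
# displayed graphically instead of relying on a P value alone.
# All data below are simulated for illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
groups = {"Treatment A": rng.normal(12.0, 4.0, 30),
          "Treatment B": rng.normal(14.5, 4.0, 30)}

labels, means, errors = [], [], []
for name, values in groups.items():
    sem = stats.sem(values)
    # Half-width of the 95% CI based on the t distribution
    half_width = sem * stats.t.ppf(0.975, df=len(values) - 1)
    labels.append(name)
    means.append(values.mean())
    errors.append(half_width)

plt.errorbar(labels, means, yerr=errors, fmt="o", capsize=5)
plt.ylabel("Outcome (simulated units)")
plt.title("Group means with 95% confidence intervals")
plt.show()
```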
Does it really make sense to use the same data analysis and interpretation approaches in basic science research as in confirmatory clinical trials? No, and this is why the notion that P values should be used and interpreted in the same way for all types and stages of research is silly.
OncoTherapy Network: Can you please describe concerns about P value “hacking” or “data dredging,” and comment on whether or not you share these concerns (and why or why not)?
Dr. Garrett-Mayer: I definitely have concerns about P value hacking and data dredging. For those not familiar with these practices, they bring to mind the quote, “If you torture the data long enough, it will confess” (often attributed to Ronald Coase, although his original wording differed slightly). With a large enough dataset (meaning a large enough set of measurements), one can eventually find a statistically significant association, or a subgroup of individuals for which there is a significant difference in an outcome of interest. As we know from basic frequentist principles, with an alpha level of 0.05, 5% of the time we will conclude that a significant association exists when none does. So, for example, if a researcher searches through 40 genes for an association with cancer, even if all 40 truly have no association with cancer, we would expect two of them to have P values less than 0.05.
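A short simulation makes that arithmetic concrete. Here, 40 hypothetical “genes” are tested when no true association exists; on average, about two reach significance at the 0.05 level.

```python
# Minimal sketch: testing 40 "genes" that truly have no association.
# On average, about 40 * 0.05 = 2 will come out significant at alpha = 0.05
# purely by chance. All data are simulated under the null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_genes, n_per_group = 40, 25

false_positives = 0
for gene in range(n_genes):
    # Both groups are drawn from the SAME distribution: no true association
    cases = rng.normal(0.0, 1.0, n_per_group)
    controls = rng.normal(0.0, 1.0, n_per_group)
    _, p_value = stats.ttest_ind(cases, controls)
    if p_value < 0.05:
        false_positives += 1

print(f"Significant 'associations' among 40 null genes: {false_positives}")
# Expected count: 40 * 0.05 = 2
```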
A major problem is that these reported and published results become part of the body of knowledge from which we continue our research pursuits. In most cases, results from “dredged” data lead people down the wrong path, and resources (both money and time) are wasted.
OncoTherapy Network: Are there contexts or applications for which P value concerns are more germane (such as high-throughput -omics studies) than others?
Dr. Garrett-Mayer: Yes, and in the very early days of high-throughput data analysis, these concerns were not fully appreciated. However, it did not take long for statisticians to jump in and raise concerns. In most high-throughput analysis approaches, one will see control of the false-discovery rate, which limits the expected proportion of identified genes that are false positives. In addition, we now much more commonly see validation approaches incorporated into high-throughput analyses to avoid reporting spurious findings.
OncoTherapy Network: Are there widely accepted ways to correct, statistically, for multiple comparisons?
Dr. Garrett-Mayer: The Bonferroni correction has been popular. It is simple to implement, but it has the drawback of being very conservative. Other approaches have gained in popularity, including the Benjamini-Hochberg correction, which is more powerful than the Bonferroni approach, and there are others as well. But when considering the need for correction, one must consider the context. If one is interrogating a dataset with hundreds or thousands of markers, then one must address the multiplicity issue to define a set of markers. And if one is performing a randomized phase III trial with two primary clinical outcomes (where the new treatment would be approved if either one yields a significant result), then one needs to correct for multiple comparisons as well. But in most other settings, corrections for multiple comparisons are less necessary. The controversy over P values will hopefully move us away from the heavy reliance we place on them in medical research, and toward better reporting of results, where clinical and biological significance is emphasized through easy-to-interpret graphical displays.
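As a minimal sketch of how these two corrections compare in practice, the following snippet applies both to an invented set of P values using the statsmodels library.

```python
# Minimal sketch comparing the Bonferroni and Benjamini-Hochberg
# corrections on a hypothetical set of P values. The values below
# are invented for illustration only.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.008, 0.020, 0.040, 0.045, 0.200, 0.600]

# Bonferroni: controls the family-wise error rate; very conservative
reject_bonf, p_bonf, _, _ = multipletests(p_values, alpha=0.05,
                                          method="bonferroni")

# Benjamini-Hochberg: controls the false-discovery rate; more powerful
reject_bh, p_bh, _, _ = multipletests(p_values, alpha=0.05,
                                      method="fdr_bh")

print("Bonferroni rejections:        ", list(reject_bonf))
print("Benjamini-Hochberg rejections:", list(reject_bh))
```

With these particular values, Bonferroni rejects only the smallest P value, while Benjamini-Hochberg rejects the three smallest, illustrating the power difference Dr. Garrett-Mayer describes.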
Finally, there are other hypothesis-testing approaches that do not rely on P values at all, such as Bayesian and evidential (ie, likelihood-based) approaches. These are gaining popularity in clinical research and will lead to less reliance on P values.
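For a flavor of the Bayesian alternative, here is a minimal sketch that reports a posterior probability rather than a P value; the trial counts, the uniform prior, and the 20% benchmark are all invented for illustration.

```python
# Minimal sketch of a Bayesian analysis: the posterior probability that a
# treatment's response rate exceeds a historical benchmark, with no P value
# involved. The trial counts and the 20% benchmark are invented examples.
from scipy import stats

responders, n_patients = 12, 30   # hypothetical single-arm trial results
benchmark = 0.20                  # hypothetical historical response rate

# Beta(1, 1) (uniform) prior on the response rate; with binomial data the
# posterior is Beta(1 + responders, 1 + non-responders) by conjugacy.
posterior = stats.beta(1 + responders, 1 + n_patients - responders)

prob_better = 1 - posterior.cdf(benchmark)
print(f"Posterior probability response rate > {benchmark:.0%}: {prob_better:.3f}")
```

The output is a direct probability statement about the quantity of interest, which is one reason such approaches are attractive, particularly in early-phase trials.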