How Statistics Can Capture the Unique Fingerprint of Cancer  

Not all cancer research takes place in a lab with test tubes and microscopes. Some research uses complex math, statistics and vast amounts of molecular data to identify the genomic variation that matters in the cause, progression and treatment response of cancer. Brooke L. Fridley, PhD, director of the Biostatistics and Informatics Shared Resource at The University of Kansas Cancer Center and associate professor in the Department of Biostatistics at The University of Kansas Medical Center, is using genomic, transcriptomic and epigenomic data to determine cancer molecular subtypes.

Fundamentally, she’s trying to figure out why patients with seemingly similar cancers respond differently to the same treatment. Instead of a microscope, Dr. Fridley uses computer data: she applies her extensive experience in biostatistics to dissect the data and find patterns that might explain differences in individual patients’ responses to treatment.

“Not all ovarian cancer tumors are the same, even if they are clinically denoted as being the same type of cancer.  Even women who may have similar types of ovarian cancer aren’t responding the same to the standard treatment and don’t have the same clinical outcomes,” said Dr. Fridley, a member of the Cancer Control and Population Program. “We can use the molecular information of the tumor to determine features that distinguish groups of patients that may or may not respond to a given therapy.  This information can point researchers towards potential drug targets and provide clinicians with information to enable individualized therapy decisions for patients.”

Tracking the small differences

Dr. Fridley is conducting research on epithelial ovarian cancer, the fifth-leading cause of cancer death among women. Even with surgery and chemotherapy, only about 45 percent of patients survive five years or more.

Because patient outcome and response to therapy are difficult to predict, using all available genomic and epigenomic information may help refine these classifications. Comprehensive genomic profiling that draws on all available information, such as DNA methylation, mutations and mRNA expression, is therefore crucial for identifying homogeneous cancer subtypes.

Below: A diagram of integrative clustering.

By analyzing several kinds of molecular data from hundreds of patient tumors, such as DNA methylation, gene expression and somatic mutations, Dr. Fridley may be able to find genomic features that distinguish tumor subgroups. These features are hard to detect when basic statistical models assess only one type of information about the tumor. Many factors influence how a person’s particular cancer will respond to treatment. Not all ovarian cancer patients have the same genetic mutations, and the tiniest differences could mean a drug won’t work for three patients but will work for five others.

Dr. Fridley is therefore developing new analytic methods that integrate all of a tumor’s features into its classification, using a Bayesian Integrative Molecular Clustering approach. The Bayesian integrative clustering model uses complex mathematical formulas both to model the correlation between features (for example, high methylation of a gene region would result in lower expression of the corresponding gene) and to incorporate biological knowledge about the cancer into the model.

“It’s important to take into account ‘gene silencing’ in that DNA methylation can repress the expression of some genes,” said Dr. Fridley.
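The gene-silencing relationship can be illustrated with simulated numbers: if methylation represses a gene, tumors with higher methylation of that gene’s region should show lower expression, which appears as a negative correlation across tumors. The sketch below is only an illustration under stated assumptions. It assumes a simple linear silencing effect with random noise, uses made-up data (not TCGA or Mayo Clinic data), and its function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate_tumors(n_tumors=200, seed=1):
    """Hypothetical data for one gene: methylation as a fraction in [0, 1]
    and expression on an arbitrary scale, one value per simulated tumor."""
    rng = random.Random(seed)
    methylation = [rng.uniform(0.0, 1.0) for _ in range(n_tumors)]
    # Assumed silencing effect: expression falls by 8 units from fully
    # unmethylated to fully methylated, plus Gaussian measurement noise.
    expression = [10.0 - 8.0 * m + rng.gauss(0.0, 1.0) for m in methylation]
    return methylation, expression


meth, expr = simulate_tumors()
r = pearson(meth, expr)
print(f"correlation between methylation and expression: {r:.2f}")
```

A strongly negative value is what the assumed silencing effect produces. Real data are noisier, and methylation does not silence every gene, which is one reason a model that learns these relationships is useful.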

The Bayesian method seeks to simplify and accelerate the process of sifting through and categorizing a tremendous amount of data. “Rather than taking individual analysis of data types from hundreds of different tumors and overlaying them on top of one another, the Bayesian model lets you analyze the DNA methylation, gene expression, and germline mutation data all at once in a comprehensive analysis framework,” said Dr. Fridley.
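The article does not give enough detail to reproduce the Bayesian integrative clustering model. As a much simpler point of comparison, the sketch below shows the basic idea of analyzing several data types at once. Each data type is standardized, the blocks are joined into one profile per tumor, and ordinary k-means, which is not a Bayesian method, groups the tumors. The simulated data, the two hypothetical subtypes, the equal weighting of data types and all function names are illustrative assumptions.

```python
import math
import random


def standardize_columns(rows):
    """Scale each feature (column) to mean 0 and standard deviation 1."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        out_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]


def integrate(*data_types):
    """Join standardized data types per tumor. Each block is divided by
    sqrt(number of features) so a data type with many features does not dominate."""
    blocks = []
    for rows in data_types:
        z = standardize_columns(rows)
        w = 1.0 / math.sqrt(len(z[0]))
        blocks.append([[w * v for v in r] for r in z])
    return [sum((b[i] for b in blocks), []) for i in range(len(blocks[0]))]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all current centers.
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each tumor joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def simulate(n_per_subtype=20, seed=42):
    """Two hypothetical subtypes: one highly methylated with low expression,
    one with low methylation and high expression (3 methylation, 4 expression features)."""
    rng = random.Random(seed)
    methylation, expression, truth = [], [], []
    for subtype, (meth_level, expr_level) in enumerate([(0.8, 2.0), (0.2, 6.0)]):
        for _ in range(n_per_subtype):
            methylation.append([meth_level + rng.gauss(0, 0.1) for _ in range(3)])
            expression.append([expr_level + rng.gauss(0, 1.0) for _ in range(4)])
            truth.append(subtype)
    return methylation, expression, truth


meth, expr, truth = simulate()
labels = kmeans(integrate(meth, expr), k=2)
# Cluster numbers are arbitrary, so score agreement up to a label swap.
matches = sum(lab == t for lab, t in zip(labels, truth))
accuracy = max(matches, len(truth) - matches) / len(truth)
print(f"agreement with simulated subtypes: {accuracy:.0%}")
```

Plain concatenation treats each data type as independent and ignores both the methylation-expression dependence and the prior biological knowledge that the Bayesian model is designed to capture. It is a baseline for comparison, not a substitute for that model.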

What to draw from data

For this research project, the tumor data come from two groups: The Cancer Genome Atlas (TCGA), a nationwide project that catalogues molecular data on many common cancers, and the Mayo Clinic research group of Ellen Goode, PhD. Dr. Fridley formerly worked at Mayo Clinic and currently collaborates with Dr. Goode’s group on ovarian cancer research.

By comparing the clustering results from the TCGA and Mayo studies, Dr. Fridley can check whether the same subgroups, and the same differences between them, appear in both large studies, which serves as validation. Once Dr. Fridley and colleagues have developed the Bayesian model for integrative clustering, they hope to create user-friendly software that allows other cancer researchers to apply the methodology to their own data.

Applying these methods early, at the time of diagnosis, would allow doctors to choose drugs that better fit a person’s very specific type of cancer. According to Dr. Fridley, it could also eliminate the waiting game of seeing whether a drug will work, which can affect quality of life. The work is another example of research pushing toward personalized approaches to diagnosing and treating cancer.

Despite significant gains in our knowledge of cancer genomics and recent advances in techniques that capture information across the genome, epigenome and transcriptome, numerous challenges still slow the discovery of markers and their translation from “bench to bedside.”

“This could ultimately improve the safety and efficacy of therapies used to treat cancer,” said Dr. Fridley. “The key is to catch the patterns. Cancer genomics and pharmacogenomics have the potential to reduce the cost of treatment by determining the correct treatment and dose for each individual; thus increasing efficiency in response to treatment while reducing side effects.”

Funding sources

  • NIH R21 CA 140879: "Integrative genomic models for analysis of pharmacogenomic studies"
  • NIH R21 GM 86689: "Bayesian hierarchical nonlinear models for pharmacogenomic cytotoxicity studies"
  • 2012-2013 Pilot Project, University of Kansas Cancer Center: "Integrative Analysis of the Ovarian TCGA Data"
  • NIH R21 CA182715: "Bayesian Integrative Clustering for Determining Molecular Based Cancer Subtypes"

Relevant publications

  • Fridley, BL, Abo, R, Tan, XL, Jenkins, GD, Batzler, A, Moyer, AM, Biernacka, JM, & Wang, L. (2014). Integrative gene set analysis: application to platinum pharmacogenomics. OMICS, 18(1), 34-41.
  • Fridley, BL, Koestler, DC, & Godwin, AK. (2014). Individualizing care for ovarian cancer patients using big data. J Natl Cancer Inst, 106(5).
  • Koestler, DC, Chalise, P, Cicek, MS, Cunningham, JM, Armasu, S, Larson, MC, Chien, J, Block, M, Kalli, KR, Sellers, TA, Fridley, BL, & Goode, EL. (2014). Integrative genomic analysis identifies epigenetic marks that mediate genetic risk for epithelial ovarian cancer. BMC Med Genomics, 7, 8.
  • Chalise, P, Koestler, DC, Bimali, M, Yu, Q, & Fridley, BL. (2014). Integrative clustering methods for high-dimensional molecular data. Transl Cancer Res, 3(3), 202-216.
  • Fridley, BL, Lund, S, Jenkins, GD, & Wang, L. (2012). A Bayesian integrative genomic model for pathway analysis of complex traits. Genet Epidemiol, 36(4), 352-359.
  • Chalise, P, Batzler, A, Abo, R, Wang, L, & Fridley, BL. (2012). Simultaneous analysis of multiple data types in pharmacogenomic studies using weighted sparse canonical correlation analysis. OMICS, 16(7-8), 363-373.

Types of Genomic Data

  • DNA methylation
    The addition of a methyl group (one carbon atom bonded to three hydrogen atoms) to the cytosine or adenine base of a DNA nucleotide. DNA methylation can gradually silence genes, particularly tumor suppressor genes.
  • Germline mutations
    A variation carried in sperm or egg cells that can be passed on to children. Examples include inherited BRCA1 and BRCA2 mutations.
  • Somatic mutations
    A variation in a gene that isn’t inherited from a parent and isn’t passed on to children. Somatic mutations are acquired over a person’s life; for example, exposure to UV rays can cause mutations that lead to skin cancer.
  • Gene expression
    The process by which a gene’s information is used to make RNA and proteins. Expression can become abnormal because of mutations within the gene or deletion or overexpression of certain gene sequences. If the affected genes regulate the expression of other genes, these changes can contribute to the cell becoming malignant.