Next generation sequencing in cancer research and clinical application
© Shyr and Liu; licensee BioMed Central Ltd. 2013
Received: 6 February 2013
Accepted: 9 February 2013
Published: 13 February 2013
The wide application of next-generation sequencing (NGS), mainly through whole genome, exome and transcriptome sequencing, provides a high-resolution and global view of the cancer genome. Coupled with powerful bioinformatics tools, NGS promises to revolutionize cancer research, diagnosis and therapy. In this paper, we review the recent advances in NGS-based cancer genomic research as well as clinical application, summarize the current integrative oncogenomic projects, resources and computational algorithms, and discuss the challenges and future directions in the research and clinical application of cancer genomic sequencing.
Sanger sequencing has dominated genomic research for the past two decades and achieved a number of significant accomplishments, including the completion of the human genome sequence, which made possible the identification of single-gene disorders and the detection of targeted somatic mutations for clinical molecular diagnostics [1, 2]. Despite these accomplishments, researchers have demanded faster and more economical sequencing, which has led to the emergence of “next-generation” sequencing (NGS) technologies. The ability of NGS to produce an enormous volume of data at a low price [3, 4] has allowed researchers to characterize the molecular landscape of diverse cancer types and has led to dramatic advances in cancer genomic studies.
Recent NGS-based studies in cancer
72 WES, 68 RNA-seq, 2 WGS
Identify multiple gene fusions such as RSPO2 and RSPO3 from RNA-seq that may function in tumorigenesis
65 WGS/WES, 80 RNA-seq
36% of the mutations found in the study were expressed. Identify the abundance of clonal frequencies in an epithelial tumor subtype
1 WGS, 1 WES
Identify TSC1 nonsense substitution in subpopulation of tumor cells, intra-tumor heterogeneity, several chromosomal rearrangements, and patterns in somatic substitutions
Identify two novel protein-expression-defined subgroups and novel subtype-associated mutations
Colon and rectal cancer
224 WES, 97 WGS
24 genes were found to be significantly mutated in both cancers. Similar patterns in genomic alterations were found in colon and rectum cancers
Squamous cell lung cancer
178 WES, 19 WGS, 178 RNA-seq, 158 miRNA-seq
Identify significantly altered pathways including NFE2L2 and KEAP1 and potential therapeutic targets
Discover that most high-grade serous ovarian cancer contain TP53 mutations and recurrent somatic mutations in 9 genes
Identify a significantly mutated gene, PREX2 and obtain a comprehensive genomic view of melanoma
Acute myeloid leukemia
Identify mutations in relapsed genome and compare it to primary tumor. Discover two major clonal evolution patterns
Highlights the diversity of somatic rearrangements and analyzes rearrangement patterns related to DNA maintenance
31 WES, 46 WGS
Identify eighteen significant mutated genes and correlate clinical features of oestrogen-receptor-positive breast cancer with somatic alterations
103 WES, 17 WGS
Identify recurrent mutation in CBFB transcription factor gene and deletion of RUNX1. Also found recurrent MAGI3-AKT3 fusion in triple-negative breast cancer
Identify somatic copy number changes and mutations in the coding exons. Found new driver mutations in a few cancer genes
Acute myeloid leukemia
Discover that most mutations in AML genomes are caused by random events in hematopoietic stem/progenitor cells and not by an initiating mutation
Depict the life history of breast cancer using algorithms and sequencing technologies to analyze subclonal diversification
Head and neck squamous cell carcinoma
Identify mutation in NOTCH1 that may function as an oncogene
Examine intra-tumor heterogeneity and reveal branched evolutionary tumor growth
Cancer is primarily caused by the accumulation of genetic alterations, which may be inherited in the germ line or acquired somatically during a cell’s life cycle. The effects of these alterations in oncogenes, tumor suppressor genes or DNA repair genes allow cells to escape growth and regulatory control mechanisms, leading to the development of a tumor. The progeny of the cancer cell may also undergo further mutations, resulting in clonal expansion. As clonal expansion continues, clones eventually invade the surrounding tissue and metastasize to areas distant from the primary tumor.
The sequencing of cancer genomes has revealed a number of novel cancer-related genes, especially in breast cancer. Recently, six papers reported findings on large breast cancer datasets: TCGA performed exome sequencing on 510 samples from 507 patients, Banerji et al. conducted exome sequencing on 103 samples and whole genome sequencing on 17 samples, Ellis et al. did exome sequencing on 31 samples and whole genome sequencing on 46 samples, Stephens et al. applied exome sequencing on 100 samples, Shah et al. performed whole genome/exome and RNA sequencing on 65 and 80 samples of triple-negative breast cancers, and Nik-Zainal et al. performed whole genome sequencing on 21 tumor/normal pairs. Besides confirming recurrent somatic mutations in TP53, GATA3 and PIK3CA, these studies discovered novel cancer-related mutations. Although the novel mutations occur at low frequency (less than 10%), mutations of specific genes are enriched in particular breast cancer subtypes and can be grouped into cancer-related pathways. For example, mutations of MAP3K1 frequently occur in the luminal A subtype [5, 7]. Pathways involving p53, chromatin remodeling and ERBB signaling are overrepresented among the mutated genes. Furthermore, some mutations indicate therapeutic opportunities, such as mutant GATA3, which might be a positive predictive marker for aromatase inhibitor response.
Genomic sequencing has also helped characterize the mutation profile of colorectal cancer. For example, exome sequencing performed on 72 tumor-normal pairs identified 36,303 protein-altering somatic mutations. Further analysis for significantly mutated genes led to 23 candidates that included expected cancer genes such as KRAS, TP53 and PIK3CA and novel genes such as ATM, which regulates the cell cycle checkpoint. RNA sequencing identified recurrent R-spondin fusions, which might potentiate Wnt signaling and induce tumorigenesis. In another example, exome sequencing was performed on 224 tumor-normal pairs. This study identified 15 highly mutated genes in the hypermutated cancers and 17 in the non-hypermutated cancers. Among the non-hypermutated cancers, novel frequent mutations in SOX9, ARID1A, ATM and FAM123B were detected besides the known APC, TP53 and KRAS mutations. Analysis of the mutations and functional roles of SOX9, ARID1A, ATM and FAM123B suggested that they are strong candidates for colorectal cancer-related genes. Non-hypermutated colon and rectum cancers were found to have similar patterns of genomic alteration. Whole genome sequencing of 97 tumors with matched normal samples identified the recurrent NAV2-TCF7L1 fusion.
What makes cancer a difficult disease to conquer has much to do with the evolution of cancer that results from the selection and genetic instability occurring in each clone, leading to heterogeneity in tumors. This idea was first proposed by Peter Nowell in 1976 as the clonal evolution model of cancer, which attempted to explain the increase in tumor aggressiveness over a period of time. Further work by other researchers in the 1980s supported this theory with studies of metastatic subclones from a mouse sarcoma cell line.
The wide application of NGS has revealed substantial insights into tumor heterogeneity and tumor evolution. Variations between tumors are referred to as intertumor heterogeneity, while variations within a single tumor are intratumor heterogeneity. Intertumor heterogeneity is recognized through differences in morphological phenotype, expression profiles, and mutation and copy number variation patterns, which categorize tumors into different subtypes [27–31]. The mRNA-expression subtype was found to be associated with somatic mutation landscapes in the recent TCGA and Ellis et al. studies [5, 7]. With the huge number of somatic mutations identified by NGS, the emerging picture is that each individual tumor is unique and contains a distinct mutation pattern. For instance, Stephens et al. found 73 different combinations of mutated cancer genes among the 100 breast cancers.
Intratumor heterogeneity can be recognized as non-identical cellular clones or subclones within a single tumor, which differ in histology, gene expression, and metastatic and proliferative potential. The ability to generate high-resolution data makes NGS a particularly useful tool for studying intratumor heterogeneity. A recent NGS-based study on renal cell carcinoma from four patients has successfully illuminated intratumor heterogeneity. For patient 1, the pre-treatment samples of the primary tumor and chest-wall metastasis underwent exon-capture multi-region DNA sequencing. Of the 128 validated mutations found in 9 regions of the primary tumor, 40 were ubiquitous, 59 were shared by some regions, and 29 were unique to specific regions, showing genetic heterogeneity within the tumor and an “ongoing regional clonal evolution”. Most importantly, the study showed that a single biopsy of a tumor reveals only a small part of the tumor’s mutational landscape; from a single biopsy, about 55% of all mutations were detected in this tumor and 34% were shared by most regions of the tumor.
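The ubiquitous/shared/unique split described above can be illustrated with a minimal Python sketch. It is not the method of the cited study: the function names, the region labels and the toy mutation table are hypothetical, and the only assumption is that each mutation has already been called as present or absent in every sampled region.

```python
from collections import Counter


def classify_mutation(regions_with_mutation, all_regions):
    """Label one mutation by how many sampled regions carry it."""
    sampled = set(all_regions)
    found = set(regions_with_mutation) & sampled
    if not found:
        return "absent"
    if found == sampled:
        return "ubiquitous"  # present in every sampled region
    if len(found) == 1:
        return "private"     # unique to a single region
    return "shared"          # present in some, but not all, regions


def summarize(presence, all_regions):
    """Count mutations per class; presence maps mutation id -> regions."""
    return Counter(classify_mutation(r, all_regions) for r in presence.values())


# Hypothetical toy data: four mutations across three tumor regions.
all_regions = ["R1", "R2", "R3"]
presence = {
    "mut_a": {"R1", "R2", "R3"},
    "mut_b": {"R1", "R2"},
    "mut_c": {"R3"},
    "mut_d": {"R1", "R2", "R3"},
}
counts = summarize(presence, all_regions)
print(dict(counts))
```

In a tree-based view of the tumor, the "ubiquitous" class would correspond to early events carried by all regions, while "shared" and "private" mutations mark later regional divergence.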
The ongoing and parallel evolution of cancer cells may establish and maintain intratumor heterogeneity. For example, the phylogenetic relationships of the tumor regions in patients 1 and 2 of the renal cell carcinoma study revealed a branching rather than linear evolution of the tumor. Studies have also shown branching structures of evolution in breast cancer. According to the “Trunk-Branch Model of Tumor Growth”, the somatic events that promote tumor growth in the early stage of tumor development represent the trunk of the tree. These somatic aberrations would most likely be ubiquitous at this stage. Over time, other somatic events, known as drivers, give rise to tumor heterogeneity, which causes branching to take place in tumors as well as in metastatic sites. Later, these branches evolve and become more isolated, resulting in a ‘Bottleneck Effect’ that can lead to chromosomal instability, allowing further expansion of tumor heterogeneity. This gives the tumor the ability to adapt and survive in changing environments, which affects the success of drug treatment. Therefore, it is important to examine tumor clonal structure and identify common mutations located in the trunk of the phylogenetic tree, which may help in understanding resistance to targeted therapy and in discovering more robust therapeutic approaches.
Besides allowing researchers to understand mutations in cancer, NGS has already been applied in the clinic in many areas, including prenatal diagnostics, pathogen detection and the detection of genetic mutations. Although genetic mutations have been identified with Sanger sequencing, PCR, and microarrays in clinical application, these three methods have limitations that do not apply to NGS. For example, although microarrays can detect single nucleotide variants (SNVs), they have trouble identifying larger DNA aberrations, e.g., large indels and structural rearrangements, which are common in cancer. In contrast, whole exome and whole genome sequencing can provide the clinician a comprehensive view of the DNA aberrations, genetic recombination, and other mutations [28, 32]. Therefore, NGS platforms serve as a good diagnostic and prognostic tool and help clinicians identify specific characteristics in each patient, paving the road towards personalized medicine.
NGS has already been applied in the clinic for cancer diagnosis and prognosis. For example, whole genome sequencing identified a novel insertional fusion that created a classic bcr3 PML-RARA fusion gene in a patient with acute myeloid leukemia, and the findings altered the treatment plan for the patient. By sequencing the tumor genome of a patient, clinicians are able to design patient-specific probes that use DNA in the patient’s blood serum to monitor the progress of treatment and detect any signs of relapse [27–31]. The discovery of more biomarkers and the development of targeted therapies will be essential in helping a clinician choose the best personalized treatment for his or her patients.
Active cancer studies using NGS as the primary outcome measure
NCT#/# Enrolled/Start Date
Tumor Specific Plasma DNA in Breast Cancer/Dartmouth-Hitchcock Medical Center
Analyze chromosomal rearrangements and genomic alterations
Whole genome sequencing
Whole Exon Sequencing of Down Syndrome Acute Myeloid Leukemia/Children’s Oncology Group
Examine DNA samples of patients with Leukemia and Down Syndrome and identify DNA alterations
Whole exome Sequencing
Studying Genes in Samples From Younger Patients with Adrenocortical Tumor/Children’s Oncology Group
Study genes from patients with adrenocortical tumor
Whole genome Sequencing
Feasibility Clinical Study of Targeted and Genome-Wide Sequencing/University Health Network, Toronto
Identify gene mutations in cancer patients
Whole genome sequencing
An Ancillary Pilot Trial Using Whole Genome Sequencing in Patients with Advanced Refractory Cancer/Scottsdale Healthcare
Investigate patients with cancer that are using Phase I drugs and its effect on the patient
Whole genome Sequencing
Cancer Genome Analysis/Seoul National University Hospital
Identify and analyze genetic alterations in tumors for therapeutic agents
Targeted Sequencing, whole exome sequencing and RNA-seq
RNA Biomarkers in Tissue Samples From Infants with Acute Myeloid Leukemia/Children’s Oncology Group
Analyze tissue samples and identify biomarkers from RNA
Molecular Analysis of Solid Tumors/St. Jude Children’s Research Hospital
Pediatric Solid Tumors
Analyze gene expression profiles of tumor and examine genetic alterations
Whole genome Sequencing
Deep Sequencing of the Breast Cancer Transcriptome/University of Arkansas
Examine transcriptional regulation and triple negative breast cancer
Computational tools for cancer genomics
Functional effect of mutation
Computational tools for cancer transcriptomics
Comprehensive cancer projects and resources
Comprehensive cancer projects
The Cancer Genome Atlas
A joint effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies
International Cancer Genome Consortium
International consortium with the goal of obtaining comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different cancer types and/or subtypes of clinical and societal importance across the globe
Cancer Genome Anatomy Project
Interdisciplinary program to determine the gene expression profiles of normal, precancer, and cancer cells, leading eventually to improved detection, diagnosis, and treatment for the patient
Cancer Genome Project
To identify somatically acquired sequence variants/mutations and hence identify genes critical in the development of human cancers
The Clinical Proteomic Tumor Analysis Consortium
A comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies
Catalogue of Somatic Mutations in Cancer
Copy number abnormalities in human cancer from CGH experiments
An information resource and analysis platform for study interplay of DNA methylation, gene expression and cancer
Integrates multidimensional OncoGenomics Data for the identification of genes and groups of genes involved in cancer development
A cancer microarray database and integrated data-mining platform
Provides visualization, analysis and download of large-scale cancer genomics data sets
Provides L3 data and L4 analyses packaged in a form amenable to immediate algorithmic analysis
UCSC Cancer Genomics Browser
A suite of web-based tools to visualize, integrate and analyze cancer genomics and its associated clinical data
Cancer Genome Workbench
Hosts mutation, copy number, expression, and methylation data from a number of projects, including TCGA, TARGET, COSMIC, GSK, NCI60. It has tools for visualizing sample-level genomic and transcription alterations in various cancers.
The data and the results from these projects are freely available to the research community (Table 5). A number of databases and frameworks have been developed to make the data and the results easily and directly accessible. For example, the results from CGP are collated and stored in COSMIC. The cBio Cancer Genomics Portal, containing datasets from TCGA and published papers, is specifically designed to interactively explore multidimensional cancer genomics data, including mutation, copy number variations, expression changes (microarray and RNA-seq), DNA methylation values, and protein and phosphoprotein levels. IntOGen is also a framework that facilitates the analysis and integration of multidimensional data for the identification of genes and biological modules critical in cancer development. The Broad GDAC Firehose, designed to coordinate the various tools utilized by TCGA, provides level 3 data and level 4 analyses and enables researchers to easily incorporate TCGA data into their projects. Table 5 also includes resources useful for cancer research but not built on NGS data, e.g., Progenetix.
Although NGS has already helped researchers discover a plethora of information in the field of cancer, challenges still lie ahead in translating the large amounts of oncogenomics data into information that is easily interpretable and accessible for cancer care. From a computational point of view, many technical and statistical issues remain unsolved. For example, repetitive DNA represents a major obstacle for the accuracy of read alignment and assembly, as well as structural variation detection. Furthermore, it is difficult to distinguish rare mutations in a tumor from sequencing and alignment artifacts, especially when the tumor has low purity. Despite new methods to comprehensively catalogue genomic variants, the prediction of their functional effect and the identification of disease-causal variants are still in an early phase. Current algorithms for quantifying isoform expression are not computationally trivial and are difficult to explain. Although the concept of integrative analysis is not new, predictive networks or pathway models that combine various omics data are still under development. Most importantly, since sequencing technologies and methodologies are both evolving rapidly, it is difficult to store, analyze and present the data in a way that is transparent and reproducible. On the other hand, tumor complexity and heterogeneity make the analysis and the interpretation of sequencing data even harder. Heterogeneity is dynamic and evolves over time. This challenges the simple notion of binning mutations as tumorigenesis ‘driver’ and neutral ‘passenger’, since some passengers are also drivers just waiting for the right context.
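Why low purity makes rare mutations hard to separate from artifacts can be seen from a standard back-of-the-envelope calculation, sketched here in Python. The formula and the function name `expected_vaf` are not taken from this article; the sketch assumes a clonal mutation, a known local copy number, and no sequencing error.

```python
def expected_vaf(purity, multiplicity=1, tumor_copy_number=2, normal_copy_number=2):
    """Expected variant allele fraction of a clonal somatic mutation.

    purity: fraction of cells in the sample that are tumor cells (0..1)
    multiplicity: number of mutated copies per tumor cell
    tumor_copy_number / normal_copy_number: total copies at the locus
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mutant_copies = purity * multiplicity
    total_copies = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mutant_copies / total_copies


# Heterozygous mutation in a diploid region at decreasing tumor purity.
for purity in (1.0, 0.5, 0.2):
    vaf = expected_vaf(purity)
    # Expected number of supporting reads at a hypothetical depth of 30x.
    print(f"purity={purity:.1f}  VAF={vaf:.3f}  reads at 30x={30 * vaf:.1f}")
```

Under these assumptions, a heterozygous clonal mutation in a sample with 20% purity is expected in only about one read in ten, and a subclonal mutation would appear in fewer still, which is the range where sequencing and alignment errors become hard to rule out.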
From a clinical point of view, a major challenge is to assess genomic variants as potential therapeutic targets. Although many diverse variants have been shown to converge on similar deregulated pathways, there is still a lack of pathway-targeted therapies. With the discovery of intra-tumor heterogeneity, questions have been raised about how well a glimpse of a tumor’s genomic landscape can steer the treatment. Currently, many clinicians decide on a treatment based on the genetic markers from a few biopsies. Whether these markers are over- or under-represented in the tumor is unknown, which makes the selection of treatment difficult. In addition to heterogeneity, the tumor’s ability to evolve gives it more opportunities to adapt to and survive various treatments. Some researchers hope that with current targeted therapies, intratumor heterogeneity will decrease to a certain point so that clinicians can then target the non-responsive clones before tumor regrowth and further mutations can occur; however, choosing an appropriate targeted therapy will be a challenge. A few researchers have already shown that certain treatments, such as cytotoxic therapies, increase genome instability and diversity, resulting in a faster tumor evolution rate and, thus, greater heterogeneity. The fact is that this area of cancer is understudied; however, one of the key challenges researchers must solve is identifying which branched subclones are resistant to which targeted therapies. More knowledge of network medicine and the interaction between the trunk and branch mutations may lead to appropriate targeted therapies and personalized therapeutic strategies that can prevent drug resistance and effectively eradicate cancer [26, 91].
To accelerate the translation of genomic data into clinical practice, sustained collaboration among multiple centers and effective communication among bioinformaticians, statistical geneticists, molecular biologists and physicians are required. Bioinformaticians and statistical geneticists are responsible for providing reproducible and accurate analysis, identifying ‘drivers’ in the unstable and evolving cancer genome, and building powerful and flexible integrative models that consider interactions among genomic, transcriptomic, metabolomic, proteomic and epigenomic alterations in the context of the tumor microenvironment. Biologists interpret and confirm the functional relevance of variants to cancer. Physicians assess relationships of variants to cancer prognosis and response to therapy. Appropriate infrastructure within each research institution that integrates the clinic for patient samples, the wet lab for sequencing, and bioinformatics for data analysis should allow the sequenced data to be processed efficiently, producing results that can create effective personalized therapies applicable to the clinic. In addition, easily accessible and understandable databases that connect genomic findings with clinical outcome are also required. With these efforts and developments, NGS will greatly potentiate genome-based cancer diagnosis and personalized treatment strategies.
This work was supported by National Cancer Institute grants U01 CA163056, P30 CA068485, P50 CA098131, and P50 CA090949 and QL’s work was partially supported by the State Key Program of National Natural Science of China (no. 31230058) and the National Natural Science Foundation of China (no. 31070746).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.