Research
Extracting oncogenes and oncogenic pathways from
insertional mutagenesis screens
An effective method for identifying candidate oncogenic
loci is retroviral insertional mutagenesis. Within a large set of
tumors induced by retroviral infection, a subset of viral
insertions from independent tumors that map to the same genomic
locus is referred to as a Common Insertion Site (CIS). CISs
often represent lesions that have been selected for during
tumorigenesis. To automatically detect CISs we have developed a
computational approach, the kernel convolution framework. This
approach finds CISs at a predefined significance level while
controlling the family-wise error rate, and it corrects for bias
stemming from preferential viral insertion sites. In contrast to
existing approaches, our method operates at any biologically
relevant scale, providing new insights into the behavior of CISs
across multiple scales. To find oncogenic lesions that are
collaborating events in tumorigenesis, we extended the kernel
convolution framework to detect the occurrence of multiple
independent insertions within one tumor at a higher frequency
than expected by chance. A novel biclustering algorithm that
finds subsets of tumors with similar insertion profiles will
also be employed to extract pathway structures from the
insertion data.
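
The CIS detection idea described above can be illustrated with a minimal sketch. It is not the published kernel convolution framework: it assumes a Gaussian kernel, a single linear genome, and a null model in which insertions are placed uniformly at random, so it does not correct for preferential insertion sites. All function names (`kernel_density`, `null_threshold`, `find_cis`) and the toy data are hypothetical.

```python
import math
import random


def kernel_density(positions, grid, scale):
    """Sum of Gaussian kernels (width `scale`, in bp) centred on each insertion."""
    return [
        sum(math.exp(-0.5 * ((g - p) / scale) ** 2) for p in positions)
        for g in grid
    ]


def null_threshold(n_insertions, genome_length, grid, scale,
                   alpha=0.05, n_perm=200, seed=0):
    """(1 - alpha) quantile of the maximum density under random placement.

    Comparing every peak with the null distribution of the genome-wide
    maximum controls the family-wise error. Uniform placement is an
    assumption of this sketch; it ignores preferential insertion sites.
    """
    rng = random.Random(seed)
    maxima = sorted(
        max(kernel_density(
            [rng.uniform(0, genome_length) for _ in range(n_insertions)],
            grid, scale))
        for _ in range(n_perm)
    )
    return maxima[min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)]


def find_cis(positions, genome_length, scale, **null_kwargs):
    """Return grid positions of density peaks that exceed the null threshold."""
    step = scale / 2
    grid = [i * step for i in range(int(genome_length / step) + 1)]
    density = kernel_density(positions, grid, scale)
    threshold = null_threshold(len(positions), genome_length, grid, scale,
                               **null_kwargs)
    return [
        grid[i] for i in range(1, len(grid) - 1)
        if density[i] > threshold
        and density[i] >= density[i - 1]
        and density[i] > density[i + 1]
    ]


# Toy data: 20 evenly spaced background insertions plus a cluster near 50 kb.
background = [i * 5000 for i in range(20)]
cluster = [49650 + 100 * i for i in range(8)]
peaks = find_cis(background + cluster, genome_length=100_000, scale=1000)
```

Calling `find_cis` again with a different `scale` value re-runs the same test at another kernel width, which is the sense in which such an approach can be applied across scales.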
Prioritizing candidate oncogenes based on genomic data
Gene expression data from tumor series have been used extensively to predict outcome and response to therapy in breast cancer. To uncover the underlying mechanisms that drive these expression signatures, we jointly analyze comparative genomic hybridization and microarray gene expression data from the same sample set. We developed SIRAC, a computational approach that detects genomic regions significantly enriched for BAC clones that are highly correlated with a particular variable of interest, such as molecular subtype or outcome. We also developed KC-SMART, an adaptation of the kernel convolution framework that detects genomic regions aberrated at a significant frequency across a set of tumors. In contrast to SIRAC, this approach is unsupervised, i.e. it does not require the tumor set to be labeled by subtype. Both approaches employ expression data to prioritize genes within a given aberrated region. Applying KC-SMART to a set of p53-deficient mouse tumors detected well-known oncogenes such as c-Met as well as promising new candidates.
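
A supervised region-enrichment test of the kind described for SIRAC can be sketched as follows. This is an illustration, not the SIRAC procedure itself: it assumes the clones have already been ordered by genomic position and flagged as highly correlated with the outcome variable, and it uses a sliding-window hypergeometric test with Bonferroni correction. The function names and the toy flags are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) for n clones drawn from N, of which K are flagged."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    ) / comb(N, n)


def enriched_windows(flags, window, alpha=0.05):
    """Scan clones (ordered by genomic position) for enriched windows.

    flags[i] is True when clone i is among those most correlated with the
    outcome variable. Bonferroni correction over windows is an assumption
    of this sketch.
    """
    N, K = len(flags), sum(flags)
    n_windows = N - window + 1
    hits = []
    for start in range(n_windows):
        k = sum(flags[start:start + window])
        p = hypergeom_tail(k, window, K, N)
        if p < alpha / n_windows:
            hits.append((start, p))
    return hits


# Toy data: 40 clones, 8 adjacent flagged clones plus 2 isolated ones.
flags = [False] * 40
for i in list(range(10, 18)) + [30, 35]:
    flags[i] = True
hits = enriched_windows(flags, window=8)
```

Each entry of `hits` is a `(start_index, p_value)` pair; overlapping significant windows would normally be merged into a single region before genes within it are prioritized.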
Knowledge-based approaches for outcome prediction
Algorithms that exploit biological knowledge, such as the functional grouping of genes, could lead to more accurate predictors and increase the chance of generating novel insights into the disease. Over-fitting can also be reduced by constructing larger datasets, collecting related gene expression datasets in a compendium. We employed an unsupervised approach to derive a set of modules (groups of functionally related genes with coordinated expression across a subset of tumors). By using module activity as input for training breast cancer outcome predictors, we revealed functional modules associated with breast cancer outcome. We studied modules extracted from several compendia and performed extensive validation of these classifiers on datasets originating from different institutions. Modules derived from a single breast cancer dataset and from a cancer-specific compendium perform better than those derived from a human cancer compendium. A functional analysis of the modules revealed general processes involved in cancer, as well as very specific modules that are predictive across multiple datasets.
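
The use of module activity as classifier input can be made concrete with a small sketch. It assumes the modules are already given (the unsupervised module derivation is not shown), that activity is the plain average expression of a module's genes, and that a nearest-centroid rule serves as the predictor; none of these choices is confirmed by the text above. Gene, module, and tumor names are placeholders.

```python
from statistics import mean


def module_activity(expression, modules):
    """Average expression of each module's genes, per tumor."""
    return {
        tumor: {name: mean(profile[g] for g in genes)
                for name, genes in modules.items()}
        for tumor, profile in expression.items()
    }


def train_centroids(activity, labels):
    """Mean module-activity vector for each outcome class."""
    centroids = {}
    for label in set(labels.values()):
        members = [activity[t] for t in activity if labels[t] == label]
        centroids[label] = {m: mean(a[m] for a in members) for m in members[0]}
    return centroids


def predict(centroids, sample_activity):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((sample_activity[m] - c[m]) ** 2 for m in c)
    return min(centroids, key=lambda label: dist(centroids[label]))


# Toy data with placeholder genes g1..g4 grouped into two modules.
expression = {
    "t1": {"g1": 2.0, "g2": 1.8, "g3": 0.1, "g4": -0.1},
    "t2": {"g1": 1.9, "g2": 2.1, "g3": 0.0, "g4": 0.2},
    "t3": {"g1": -0.2, "g2": 0.1, "g3": 0.1, "g4": 0.0},
    "t4": {"g1": 0.0, "g2": -0.1, "g3": -0.1, "g4": 0.1},
}
modules = {"module_A": ["g1", "g2"], "module_B": ["g3", "g4"]}
labels = {"t1": "poor", "t2": "poor", "t3": "good", "t4": "good"}

activity = module_activity(expression, modules)
centroids = train_centroids(activity, labels)
new_tumor = module_activity(
    {"new": {"g1": 1.7, "g2": 1.6, "g3": 0.0, "g4": 0.0}}, modules)["new"]
prediction = predict(centroids, new_tumor)
```

Working on a handful of module activities rather than thousands of individual genes reduces the number of input features, which is one way such an approach can limit over-fitting.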
Mass spectrometry-based response prediction
Poly(ADP-ribose) polymerase (PARP) inhibitors, in combination with the DNA cross-linker cisplatin, have recently been shown to have great potential as a treatment for patients carrying germline BRCA1 or BRCA2 mutations. We ask whether we can find markers for “BRCAness” and for response to PARP-inhibitor treatment, and we follow an incremental procedure to do so. First, we performed a comprehensive comparison of currently used and novel mass spectrometry normalization approaches, involving six mass spectrometry datasets, three classification approaches and 17 normalization techniques. Based on the results of this comparison, we constructed a computational workflow to process mass spectra. Next, using SELDI-TOF mass spectrometry, we study the proteomic contents of cell lysates and growth media of both BRCA(1,2)/p53-/- and p53-/- cell lines. Once appropriate biomarkers have been identified, we will also analyze cell lysates from spontaneous tumors from BRCA1- and BRCA2-deficient mouse models. Simultaneously, we will study sera from these tumor-bearing mice to investigate the relationship between the markers obtained from tumor tissue and the markers present in serum. If all these steps are completed successfully, the approach can be tested on human tissue from BRCA1 and BRCA2 carriers. In addition, we hope to gain a better understanding of the pathways affected by PARP inhibitors by integrating the proteomic data with transcriptomic data obtained by microarray expression profiling.
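
As an example of what a normalization step in such a workflow might look like, the sketch below applies total ion current (TIC) normalization, a commonly used technique. The text above does not say which 17 techniques were compared or which one the workflow uses, so the choice of TIC normalization, the function name `tic_normalize`, and the toy spectra are assumptions made for illustration.

```python
def tic_normalize(spectra):
    """Scale each spectrum so its total intensity equals the mean total.

    `spectra` is a list of intensity lists sampled on a shared m/z axis.
    Assumes every spectrum has a non-zero total intensity.
    """
    totals = [sum(s) for s in spectra]
    target = sum(totals) / len(totals)
    return [[x * target / t for x in s] for s, t in zip(spectra, totals)]


# Two toy spectra with the same shape but a two-fold intensity difference.
spectra = [[10.0, 30.0, 60.0], [5.0, 15.0, 30.0]]
normalized = tic_normalize(spectra)
```

After normalization the two toy spectra coincide, so remaining differences between real spectra would reflect relative peak composition rather than overall signal strength.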