Nguyen, Hai Chau (2015). Maximum entropy models in the analysis of genome-wide data in cancer research. PhD thesis, Universität zu Köln.
PDF: thesis.pdf (Published Version, 8MB)
Abstract
This thesis studies the maximum entropy principle in statistical modelling, with applications taken from the emerging field of cancer genomics. Chapter 1 gives a short introduction to the biology of cancer. Chapter 2 discusses general principles of statistical modelling and, in detail, the maximum entropy principle. In particular, we show that many statistical models can be placed in a unified framework based on the principle of maximum entropy, which maps them onto problems of statistical mechanics. Chapter 3 considers a particular maximum entropy model, the Ising model, in the context of the inverse Ising problem. We introduce a Bethe–Peierls approximation to the inverse Ising problem, and we propose a modification of the mean-field approximation that makes it work at low temperatures.
The following chapters apply maximum entropy models to different problems in cancer genomics. Chapter 4 describes a direct application of the inverse Ising problem to gene copy-number data of cancer cells. In chapter 5, we extend the concepts of indirect correlations and direct couplings from the inverse Ising problem to investigate the influence of gene copy-numbers on gene expression in cancer cells. We show that correlations in gene expression need not be due to regulatory interactions between genes. Instead, correlations in the gene expression of cancer cells can be induced by correlations in their copy-numbers, which arise from the geometrical organisation of the genome. We show that a simple maximum entropy model can disentangle copy-number-induced correlations from the so-called “bare correlations” in gene expression, which capture the effect of regulatory interactions alone. Chapter 6 is devoted to cancer classification. We introduce a simple semi-supervised learning algorithm that trains a mixture of paramagnetic models with Ising spins to classify cancer mutation profiles. Because it can both learn from unlabelled samples and correct mislabelled samples, this algorithm outperforms both supervised and unsupervised learning algorithms. The two appendices, A and B, summarise recent studies on the sensitivity and resistance of cancer cells to therapy.
The results of chapter 3 were published in H. C. Nguyen and J. Berg (2012a). “Bethe–Peierls approximation and the inverse Ising problem”. J. Stat. Mech. P03004; and H. C. Nguyen and J. Berg (2012b). “Mean-field theory for the inverse Ising problem at low temperatures”. Phys. Rev. Lett. 109, 050602. Some results of chapter 6 were published as part of The Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM) (2013). “A genomics-based classification of human lung tumors”. Science Transl. Med. 5.209, 209ra153.
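The inverse Ising problem of chapter 3 asks for the couplings J_ij and fields h_i of an Ising model that reproduce measured magnetisations m_i = ⟨s_i⟩ and connected correlations C_ij = ⟨s_i s_j⟩ − m_i m_j; the Ising model is the maximum entropy distribution constrained by exactly these first and second moments. As a point of reference, the sketch below implements the standard naive mean-field reconstruction, J_ij ≈ −(C⁻¹)_ij for i ≠ j and h_i ≈ atanh(m_i) − Σ_j J_ij m_j, in Python with NumPy. It is not the Bethe–Peierls approximation or the low-temperature modification developed in the thesis. All function names are hypothetical, and `exact_moments` enumerates all 2^n states, so it is only meant for checking tiny systems.

```python
import itertools

import numpy as np


def exact_moments(J, h):
    """Exact m_i and C_ij of a small Ising model by enumerating all 2^n states.

    Assumes p(s) ∝ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i),
    with J symmetric and zero on the diagonal.
    """
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    # For symmetric J with zero diagonal, 0.5 * s^T J s == sum_{i<j} J_ij s_i s_j
    log_w = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ h
    p = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    p /= p.sum()
    m = p @ states
    C = states.T @ (p[:, None] * states) - np.outer(m, m)
    return m, C


def moments_from_samples(samples):
    """Empirical magnetisations and connected correlations from ±1 samples (one per row)."""
    s = np.asarray(samples, dtype=float)
    m = s.mean(axis=0)
    C = s.T @ s / s.shape[0] - np.outer(m, m)
    return m, C


def mean_field_inverse_ising(m, C):
    """Naive mean-field estimates of couplings J and fields h from m and C."""
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)  # the diagonal of C^{-1} is not a coupling
    h = np.arctanh(m) - J @ m
    return J, h


# Usage: spins 0 and 1 coupled by J_01 = 0.1, spin 2 uncoupled, no external fields.
J_true = np.zeros((3, 3))
J_true[0, 1] = J_true[1, 0] = 0.1
m, C = exact_moments(J_true, np.zeros(3))
J_est, h_est = mean_field_inverse_ising(m, C)
print(round(J_est[0, 1], 4))  # prints 0.1007
```

Here the exact correlation is C_01 = tanh(0.1), and inverting the 2×2 block gives tanh(0.1)/(1 − tanh²(0.1)) ≈ 0.1007, close to the true coupling. The naive mean-field estimate becomes unreliable as couplings grow strong, which is the low-temperature regime the abstract refers to. With finite data, `moments_from_samples` can return a singular C (for example, if a spin never changes sign), in which case the inversion fails.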
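Chapter 6 trains a mixture of paramagnetic models, that is, models of non-interacting Ising spins, so each class c is described only by magnetisations m_ci, with p(s | c) = Π_i (1 + s_i m_ci)/2. A common way to train such a mixture semi-supervisedly is expectation–maximisation, in which labelled samples keep fixed class assignments while unlabelled samples receive posterior responsibilities. The Python sketch below assumes that scheme and is not necessarily the algorithm of the thesis. The function name, the `trust_labels` switch (when False, labels only initialise the fit, so a mislabelled sample can be reassigned), and the clipping constant `eps` are all hypothetical.

```python
import numpy as np


def fit_paramagnetic_mixture(S, labels, n_classes, n_iter=50, trust_labels=True, eps=1e-3):
    """Semi-supervised EM for a mixture of independent-spin (paramagnetic) models.

    S: (N, n) array of ±1 spins. labels: length-N ints in 0..n_classes-1, or -1 if unlabelled.
    Returns class weights pi, per-class magnetisations m (n_classes, n), responsibilities R (N, n_classes).
    """
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    N = S.shape[0]
    labelled = labels >= 0
    # Initial responsibilities: one-hot for labelled samples, uniform for unlabelled ones
    R = np.full((N, n_classes), 1.0 / n_classes)
    R[labelled] = np.eye(n_classes)[labels[labelled]]
    for _ in range(n_iter):
        # M-step: class weights and responsibility-weighted magnetisations
        Nc = R.sum(axis=0)
        pi = Nc / N
        m = np.clip((R.T @ S) / Nc[:, None], -1 + eps, 1 - eps)  # keep logs finite
        # E-step: log p(s | c) = sum_i log((1 + s_i m_ci) / 2)
        log_lik = np.log((1 + S[:, None, :] * m[None, :, :]) / 2).sum(axis=2)
        log_post = log_lik + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        R_new = np.exp(log_post)
        R_new /= R_new.sum(axis=1, keepdims=True)
        if trust_labels:
            R_new[labelled] = R[labelled]  # labelled samples stay clamped to their label
        R = R_new
    return pi, m, R


# Toy data (hypothetical): class 0 is all +1 with one spin flipped per sample,
# class 1 is the mirror image; only two samples per class carry a label.
n = 6
class0 = np.ones((n, n)) - 2 * np.eye(n)
S = np.vstack([class0, -class0])
labels = np.array([0, 0, -1, -1, -1, -1, 1, 1, -1, -1, -1, -1])
pi, m, R = fit_paramagnetic_mixture(S, labels, n_classes=2)
print(R.argmax(axis=1))  # expected: [0 0 0 0 0 0 1 1 1 1 1 1]
```

The unlabelled samples are assigned to the class whose magnetisation profile they match, even though only a third of the data is labelled. Clamping labelled samples is the textbook choice; setting `trust_labels=False` is one simple way to let the data override a label, in the spirit of the mislabel correction the abstract mentions.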
Item Type: Thesis (PhD thesis)
URN: urn:nbn:de:hbz:38-60289
Date: 16 January 2015
Language: English
Faculty: Faculty of Mathematics and Natural Sciences
Divisions: Faculty of Mathematics and Natural Sciences > Department of Physics > Institute for Theoretical Physics
Subjects: Data processing Computer science; Physics
Date of oral exam: 16 January 2015
Refereed: Yes
URI: http://kups.ub.uni-koeln.de/id/eprint/6028