Understanding complex traits by non-linear mixed models - Kölner UniversitätsPublikationsServer

Stephan, Johannes (2015). Understanding complex traits by non-linear mixed models. PhD thesis, Universität zu Köln. Open Access

PDF: master.pdf (Accepted Version), 1MB
Made available under the CC license: Creative Commons Attribution.

Abstract

Population structure and other nuisance factors represent a major challenge for the analysis of genomic data. Recent advances in statistical genetics have led to a new generation of methods for quantitative trait mapping that also account for spurious correlations, such as those caused by population structure. In particular, linear mixed models (LMMs) have gained considerable attention because they enable easy, black-box control for population structure in a wide range of genetic designs and analysis settings. The aim of this work is to transfer the advantages of LMMs into a random bagging framework in order to simultaneously address a second pressing challenge: the recovery of complex non-linear genetic effects. Existing methods that can identify such relationships, such as epistasis, typically do not provide any robust and interpretable means of controlling for population structure and other confounding effects. The method we present here is based on random forests, a bagged variant of the well-established decision trees. We show that the proposed method greatly improves over existing methods not only in identifying causal genetic markers but also in predicting held-out phenotypic data.
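The abstract describes carrying the random-effect correction of LMMs into a random bagging framework. As a hypothetical illustration of one generic way to combine the two (not the algorithm developed in this thesis), the sketch alternates between (1) fitting a bagged ensemble to the phenotype with the current random effects subtracted and (2) re-estimating the random effects from the ensemble's residuals. Simplifying assumptions, all of them ours: depth-1 trees (stumps) stand in for full decision trees, discrete population labels with a random intercept stand in for the kinship-based random effect an LMM would typically use, and the variance components `var_u` and `var_e` are fixed rather than estimated. All function names are illustrative.

```python
import math
import random
from statistics import mean

# Hypothetical sketch: bagged stumps plus a random intercept per population group.


def fit_stump(xs, ys, features):
    """Depth-1 regression tree: best single split x[f] <= t over the given features."""
    best = None
    for f in features:
        values = sorted(set(x[f] for x in xs))
        for t in values[:-1]:  # every candidate threshold leaves both sides non-empty
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no usable split: predict the sample mean everywhere
        m = mean(ys)
        return (features[0], math.inf, m, m)
    _, f, t, lm, rm = best
    return (f, t, lm, rm)


def predict(stumps, x):
    """Ensemble prediction: average of the stumps' leaf values."""
    return mean(lv if x[f] <= t else rv for f, t, lv, rv in stumps)


def fit_bagged_stumps(xs, ys, n_trees, mtry, rng):
    """Random-forest-style bagging: bootstrap the rows, random feature subset per tree."""
    n, p = len(xs), len(xs[0])
    stumps = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(p), mtry)
        stumps.append(fit_stump([xs[i] for i in rows], [ys[i] for i in rows], feats))
    return stumps


def blup_group_effects(resid, groups, var_u, var_e):
    """Random-intercept BLUP with known variances: shrunken group mean of residuals."""
    by_group = {}
    for r, g in zip(resid, groups):
        by_group.setdefault(g, []).append(r)
    return {g: len(rs) * var_u / (len(rs) * var_u + var_e) * mean(rs)
            for g, rs in by_group.items()}


def fit_mixed_bagging(xs, ys, groups, n_iter=5, n_trees=50,
                      var_u=1.0, var_e=1.0, seed=0):
    rng = random.Random(seed)
    mtry = max(1, len(xs[0]) // 3)  # p/3 is a common regression default for random forests
    u = {g: 0.0 for g in groups}
    stumps = []
    for _ in range(n_iter):
        # 1) Fixed part: fit the ensemble to y minus the current random effects.
        adjusted = [y - u[g] for y, g in zip(ys, groups)]
        stumps = fit_bagged_stumps(xs, adjusted, n_trees, mtry, rng)
        # 2) Random part: re-estimate the group effects from the ensemble's residuals.
        resid = [y - predict(stumps, x) for x, y in zip(xs, ys)]
        u = blup_group_effects(resid, groups, var_u, var_e)
    return stumps, u


# Toy data (hypothetical): x0 is causal, x1 is noise, population "B" adds an offset of 3.
xs, ys, groups = [], [], []
for i in range(40):
    g = "A" if i < 20 else "B"
    x = [i % 2, (i // 2) % 2]
    xs.append(x)
    groups.append(g)
    ys.append(2.0 * x[0] + (3.0 if g == "B" else 0.0))

stumps, u = fit_mixed_bagging(xs, ys, groups)
print(round(u["B"] - u["A"], 3))  # → 2.857, i.e. 20/21 of the true offset of 3
```

Because both populations carry the same genotype patterns in this toy example, the population offset ends up in the random effect, shrunk by the factor n_j·var_u / (n_j·var_u + var_e) = 20/21, while the ensemble's fixed part captures the effect of x0. A realistic analysis would estimate the variance components and use a genetic relatedness matrix rather than group labels.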

Item Type:	Thesis (PhD thesis)
Translated abstract:	(German) Population structure and other unwanted factors frequently complicate the analysis of genomic data. Owing to advances in statistical genetics, newer methods are able to correct for unwanted correlations arising, for example, from population structure. Linear mixed models in particular have become very popular. Thanks to their user-friendly control of population structure, they are applicable to many genetic structures and study designs. The aim of this thesis is to combine the advantages of linear mixed models with those of a random bagging procedure in order to facilitate the discovery of complex genetic effects. Existing methods that detect such signals, for example epistasis, have so far been unable to account for population structure and other confounding factors. The method presented here is an extension of random forests, a random bagging procedure based on decision trees. As with linear mixed models, it corrects for confounding factors through a random effect. Using simulated and real data, we show that this new method not only finds more causal genetic markers than existing approaches but also improves the prediction of unseen phenotypes.
Creators:	Stephan, Johannes (joh.stephan@gmail.com)
URN:	urn:nbn:de:hbz:38-63508
Date:	2015
Language:	English
Faculty:	Faculty of Mathematics and Natural Sciences
Divisions:	Faculty of Mathematics and Natural Sciences > Department of Biology > Institute for Genetics
Subjects:	Data processing Computer science; Life sciences
Uncontrolled Keywords:	association mapping; random forests; regression
Date of oral exam:	14 October 0008
Referee:	Prof. Dr. Andreas Beyer; Prof. Dr. Achim Tresch; Dr. Oliver Stegle
Refereed:	Yes
URI:	http://kups.ub.uni-koeln.de/id/eprint/6350
