Understanding complex traits by non-linear mixed models - Kölner UniversitätsPublikationsServer

Stephan, Johannes (2015). Understanding complex traits by non-linear mixed models. PhD thesis, Universität zu Köln. Open Access

PDF
master.pdf - Accepted Version
Available under the CC license: Creative Commons Attribution.

Abstract

Population structure and other nuisance factors represent a major challenge for the analysis of genomic data. Recent advances in statistical genetics have led to a new generation of methods for quantitative trait mapping that also account for spurious correlations caused by population structure. In particular, linear mixed models (LMMs) have gained considerable attention because they enable easy, black-box-like control for population structure in a wide range of genetic designs and analysis settings. The aim of this work is to transfer the advantages of LMMs into a random bagging framework in order to simultaneously address a second pressing challenge: the recovery of complex non-linear genetic effects. Existing methods that can identify such relationships, for example epistasis, typically do not provide any robust and interpretable means to control for population structure and other confounding effects. The method we present here is based on random forests, a bagged variant of the well-established decision trees. We show that the proposed method greatly improves on existing methods, not only in identifying causal genetic markers but also in predicting held-out phenotypic data.
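
As background to the abstract: LMMs usually control for population structure by adding a random effect whose covariance is a genetic relatedness matrix estimated from the markers themselves, as in y = Xβ + u + ε with u ~ N(0, σ_g² K). The following is a minimal sketch of how such a matrix K can be computed, assuming genotypes coded as 0/1/2 allele counts. The function name `relatedness_matrix` and the toy data are hypothetical, and this record does not state which estimator the thesis uses.

```python
import numpy as np


def relatedness_matrix(genotypes):
    """Return an n x n realized relationship matrix from an n x m genotype matrix.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    G = np.asarray(genotypes, dtype=float)
    # Drop monomorphic markers: they have zero variance and cannot be standardized.
    G = G[:, G.std(axis=0) > 0]
    # Standardize each marker to mean 0 and variance 1 across individuals.
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    # Average the per-marker outer products over all remaining markers.
    return Z @ Z.T / Z.shape[1]


# Toy data (assumed): 6 individuals, 200 markers coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
toy_genotypes = rng.binomial(2, 0.3, size=(6, 200))
K = relatedness_matrix(toy_genotypes)
```

Because every marker is standardized, K is symmetric with an average diagonal of 1, and large off-diagonal entries mark pairs of individuals that share more alleles than expected by chance. That shared ancestry is the structure the random effect absorbs.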

Item Type:

Thesis (PhD thesis)

Translated abstract (original language: German):

Population structure and other unwanted factors frequently complicate the analysis of genomic data. Owing to advances in statistical genetics, newer methods can correct for unwanted correlations that arise, for example, from population structure. Linear mixed models in particular have gained considerable popularity. Their user-friendly control for population structure makes them applicable to many genetic structures and study designs. The aim of this work is to combine the advantages of linear mixed models with those of a random bagging procedure in order to make complex genetic effects easier to find. Existing methods that detect signals such as epistasis have so far been unable to account for population structure and other confounding factors. The method presented here is an extension of random forests, a random bagging procedure based on decision trees. As in linear mixed models, it corrects for confounding factors through a random effect. Using simulated and real data, we show that this new method not only finds more causal genetic markers than existing approaches but also improves the prediction of unseen phenotypes.

Creators:

Creators	Email	ORCID	ORCID Put Code
Stephan, Johannes	joh.stephan@gmail.com	UNSPECIFIED	UNSPECIFIED

URN:

urn:nbn:de:hbz:38-63508

Date:

2015

Language:

English

Faculty:

Faculty of Mathematics and Natural Sciences

Divisions:

Faculty of Mathematics and Natural Sciences > Department of Biology > Institute for Genetics

Subjects:

Data processing, computer science
Life sciences

Uncontrolled Keywords:

Keywords	Language
association mapping	UNSPECIFIED
random forests	UNSPECIFIED
regression	UNSPECIFIED

Date of oral exam:

14 October 0008

Referee:

Name	Academic Title
Beyer, Andreas	Prof. Dr.
Tresch, Achim	Prof. Dr.
Stegle, Oliver	Dr.

Refereed:

Yes

URI:

http://kups.ub.uni-koeln.de/id/eprint/6350
