Genotyping by sequencing from sparse sequenced genomes representations from bi- and multi- parental mapping population using a HMM approach

Patel, Vipul Kumar (2016). Genotyping by sequencing from sparse sequenced genomes representations from bi- and multi- parental mapping population using a HMM approach. PhD thesis, Universität zu Köln. Open Access

Thesis.pdf - Accepted Version (PDF, 7MB)

Abstract

Genotyping is a key element of molecular breeding, gene network discovery and the assessment of genetic diversity. The advent of next-generation sequencing has enabled high-resolution genotyping of thousands or millions of markers per individual in a single analysis. Such dense information can be used to identify genetic loci associated with a trait of interest. Multiplexing allows whole populations to be sequenced in a single run, vastly reducing the time and cost per sample. This high-throughput approach is known as genotyping-by-sequencing (GBS). However, GBS involves a trade-off: the total number of reads per run must be distributed across all samples, which reduces coverage per sample. Moreover, reads are currently not distributed uniformly across samples, so some samples have only partial sequence coverage. This thesis presents a solution for handling such data by imputing missing markers with a Hidden Markov Model approach for bi- and multi-parental mapping populations. The developed methods were validated in simulation studies and applied to several real mapping population datasets. For bi-parental mapping populations, data were derived from three different taxa (Arabidopsis thaliana, Sorghum bicolor and Fragaria vesca); for the multi-parental case, the Arabidopsis multi-parental RIL (AMPRIL) population was genotyped. The successful high-resolution genotyping of such mapping populations from sparse sequencing data demonstrates the advantages of the developed method and its benefits for downstream analyses such as quantitative trait analysis or genome-wide association studies. This thesis additionally provides a theoretical approach and an implementation for hybrid correction of sequencing errors in third-generation sequencing data from Pacific Biosciences.
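
The abstract does not describe the model itself, so the following is only a minimal sketch of the general idea of HMM-based imputation, not the method developed in the thesis. It assumes a two-state HMM for one chromosome of a bi-parental line, where the hidden state at each marker is the parent of origin ("A" or "B"), an observed call is wrong with a fixed probability, and a missing marker carries no information. The function name `impute_genotypes` and the parameters `switch_prob` and `error_rate` are hypothetical and chosen purely for illustration.

```python
import math


def impute_genotypes(obs, switch_prob=0.01, error_rate=0.05):
    """Viterbi decoding for an illustrative two-state HMM.

    obs: list of marker calls along one chromosome, each "A", "B" or None
         (None marks a marker with no read coverage).
    switch_prob: assumed probability of a parental switch between
                 adjacent markers (hypothetical value).
    error_rate: assumed probability that an observed call is wrong
                (hypothetical value).
    Returns the most likely parent of origin for every marker.
    """
    if not obs:
        return []

    states = ("A", "B")
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    def log_emit(state, call):
        if call is None:
            return 0.0  # missing marker: equally likely under both states
        if call == state:
            return math.log(1.0 - error_rate)
        return math.log(error_rate)

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialisation: both parents equally likely at the first marker.
    score = {s: math.log(0.5) + log_emit(s, obs[0]) for s in states}
    backpointers = []

    # Recursion: keep the best-scoring predecessor for each state.
    for call in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = (score[best_prev] + log_trans(best_prev, s)
                            + log_emit(s, call))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Traceback from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Under these assumptions, a call such as `impute_genotypes(["A", None, "A", "B", None, "B"])` fills the uncovered markers from their flanking calls, and an isolated conflicting call inside a long run of one parent is treated as a sequencing error rather than as two recombination events. A multi-parental population would need more hidden states and population-specific transition probabilities, which this sketch does not attempt.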

Item Type:	Thesis (PhD thesis)
Translated abstract:	(German original, rendered in English) Genotyping is an important procedure, particularly for successful molecular breeding, for uncovering gene networks and for determining the genetic diversity of a population. The introduction of next-generation sequencing in particular has made it possible to genotype millions of new and unknown markers per individual. The information density gained in this way enables an effective analysis of the relationship between genes and their traits. Such complex analyses require sequencing several hundred individuals, which entails a high investment. With the introduction of multiplexing, it became possible to sequence and genotype individuals in parallel. This method is called genotyping by sequencing (GBS). Its drawback, however, is that not all individuals are sequenced evenly. Consequently, there are individuals whose genomes are only partially sequenced, which reduces the number of markers that can be genotyped. In this thesis we present a solution that uses a statistical model, the Hidden Markov Model, to predict the missing information. Two models were developed, for populations derived from two parents and from more than two parents. The developed methods were tested with simulated data and applied to real populations: for populations generated from two parents (Arabidopsis thaliana, Sorghum bicolor and Fragaria vesca) and, for multiple parents, the Arabidopsis multi-parental RIL population. Applying our methods to these populations helped to gain new insights and to find candidate genes. In addition to genotyping by sequencing, the thesis covers an algorithm suited to correcting long sequence reads generated by the Pacific Biosciences technology.
Creators:	Patel, Vipul Kumar (Email: patel.kumar.vipul@gmail.com; ORCID: unspecified; ORCID Put Code: unspecified)
Corporate Creators:	Max-Planck-Institut für Züchtungsforschung in Köln, Abteilung für Entwicklungsbiologie der Pflanzen
URN:	urn:nbn:de:hbz:38-70297
Date:	7 March 2016
Language:	English
Faculty:	Faculty of Mathematics and Natural Sciences
Divisions:	Faculty of Mathematics and Natural Sciences > Department of Biology > Institute for Genetics
Subjects:	Data processing Computer science; Life sciences
Uncontrolled Keywords:	Genotyping-by-sequencing, Imputing, Next-generation-sequencing, Sparse Sequencing, Hidden-Markov-Model (Language: English)
Date of oral exam:	18 April 2016
Referee:	Coupland, George (Prof. Dr.); Tresch, Achim (Prof. Dr.)
Refereed:	Yes
URI:	http://kups.ub.uni-koeln.de/id/eprint/7029
