Ebler, Jana, Ebert, Peter ORCID: 0000-0001-7441-532X, Clarke, Wayne E., Rausch, Tobias ORCID: 0000-0001-5773-5620, Audano, Peter A., Houwaart, Torsten ORCID: 0000-0002-4525-7593, Mao, Yafei ORCID: 0000-0002-9648-4278, Korbel, Jan O., Eichler, Evan E., Zody, Michael C., Dilthey, Alexander T. and Marschall, Tobias (2022). Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nature Genet., 54 (4). pp. 518 - 541. BERLIN: NATURE PORTFOLIO. ISSN 1546-1718

Full text not available from this repository.

Abstract

PanGenie is an alignment-free, k-mer-based tool that utilizes a haplotype-resolved pangenome reference to genotype a wide range of variants. Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
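To make the idea of using k-mer counts as genotyping evidence concrete, the following is a minimal illustrative sketch. It is not PanGenie's algorithm, which this record does not describe in detail. It only counts k-mers in a set of reads and sums the counts of k-mers that are specific to each of two candidate alleles. The function names (`count_kmers`, `unique_kmers`, `allele_support`) and the toy sequences are hypothetical, and reverse complements, sequencing errors, and the pangenome haplotype information are all ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unique_kmers(allele, other, k):
    """Return the k-mers that occur in `allele` but not in `other`."""
    a = {allele[i:i + k] for i in range(len(allele) - k + 1)}
    b = {other[i:i + k] for i in range(len(other) - k + 1)}
    return a - b


def allele_support(reads, ref_allele, alt_allele, k):
    """Sum read k-mer counts over the k-mers specific to each allele.

    Toy evidence score only (an assumption for illustration); a real
    genotyper would model coverage, errors, and haplotypes.
    """
    counts = count_kmers(reads, k)
    ref_only = unique_kmers(ref_allele, alt_allele, k)
    alt_only = unique_kmers(alt_allele, ref_allele, k)
    ref_support = sum(counts[kmer] for kmer in ref_only)
    alt_support = sum(counts[kmer] for kmer in alt_only)
    return ref_support, alt_support


# Toy example: both reads carry the alternative allele.
ref = "ACGTACGTTT"
alt = "ACGTAGGTTT"
reads = [alt, alt]
ref_support, alt_support = allele_support(reads, ref, alt, k=4)
```

In this toy case the reads contain no k-mers specific to the reference allele, so all support falls on the alternative allele. No read alignment is needed to obtain this evidence, which is the property the abstract emphasizes.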

Item Type: Journal Article
Creators:
Creators | Email | ORCID | ORCID Put Code
Ebler, Jana | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Ebert, Peter | UNSPECIFIED | orcid.org/0000-0001-7441-532X | UNSPECIFIED
Clarke, Wayne E. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Rausch, Tobias | UNSPECIFIED | orcid.org/0000-0001-5773-5620 | UNSPECIFIED
Audano, Peter A. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Houwaart, Torsten | UNSPECIFIED | orcid.org/0000-0002-4525-7593 | UNSPECIFIED
Mao, Yafei | UNSPECIFIED | orcid.org/0000-0002-9648-4278 | UNSPECIFIED
Korbel, Jan O. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Eichler, Evan E. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Zody, Michael C. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Dilthey, Alexander T. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Marschall, Tobias | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
URN: urn:nbn:de:hbz:38-685046
DOI: 10.1038/s41588-022-01043-w
Journal or Publication Title: Nature Genet.
Volume: 54
Number: 4
Page Range: pp. 518 - 541
Date: 2022
Publisher: NATURE PORTFOLIO
Place of Publication: BERLIN
ISSN: 1546-1718
Language: English
Faculty: Unspecified
Divisions: Unspecified
Subjects: no entry
Uncontrolled Keywords:
Keywords | Language
DE-NOVO CNVS; STRUCTURAL VARIATION; SEQUENCING READS; IMPUTATION; DISCOVERY; DUPLICATIONS; ASSOCIATION; THOUSANDS; ALIGNMENT; DISORDER | Multiple languages
Genetics & Heredity | Multiple languages
URI: http://kups.ub.uni-koeln.de/id/eprint/68504
