Perez-Palma, Eduardo ORCID: 0000-0003-0546-5141, May, Patrick ORCID: 0000-0001-8698-3770, Iqbal, Sumaiya, Niestroj, Lisa-Marie, Du, Juanjiangmeng, Heyne, Henrike O., Castrillon, Jessica A., O'Donnell-Luria, Anne ORCID: 0000-0001-6418-9592, Nuernberg, Peter, Palotie, Aarno, Daly, Mark and Lal, Dennis (2020). Identification of pathogenic variant enriched regions across genes and gene families. Genome Res., 30 (1). pp. 62-72. COLD SPRING HARBOR: COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. ISSN 1549-5469

Full text not available from this repository.

Abstract

Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10^-11). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10^-16). All pathogenic variant enriched regions (PERs) identified are available online through PER viewer, a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation.
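The odds ratios in the abstract are reported with confidence intervals whose upper bound is Inf, which is consistent with a one-sided enrichment test. The abstract itself does not name the test, so the following is only a minimal sketch that assumes a one-sided Fisher's exact test on a 2x2 table (inside or outside a region, pathogenic or benign). The counts and the function names `odds_ratio` and `fisher_exact_greater` are hypothetical and are not taken from the paper.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P-value (alternative: odds ratio > 1).

    Sums the hypergeometric probabilities of every table that has the
    same margins and at least `a` counts in the top-left cell.
    """
    row1 = a + b          # variants inside the region
    col1 = a + c          # pathogenic variants
    n = a + b + c + d     # all variants
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts, for illustration only (NOT from the study):
#                  pathogenic  benign
# inside region        30         2
# outside region       20        48
inside_path, inside_benign = 30, 2
outside_path, outside_benign = 20, 48

or_estimate = odds_ratio(inside_path, inside_benign, outside_path, outside_benign)
p_value = fisher_exact_greater(inside_path, inside_benign, outside_path, outside_benign)
print(f"OR = {or_estimate:.2f}, one-sided P = {p_value:.3g}")
```

An odds ratio above 1 with a small one-sided P-value would indicate that pathogenic classifications are over-represented inside the region relative to outside it, under the stated assumptions.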

Item Type: Journal Article
Creators:
Creators | Email | ORCID | ORCID Put Code
Perez-Palma, Eduardo | UNSPECIFIED | orcid.org/0000-0003-0546-5141 | UNSPECIFIED
May, Patrick | UNSPECIFIED | orcid.org/0000-0001-8698-3770 | UNSPECIFIED
Iqbal, Sumaiya | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Niestroj, Lisa-Marie | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Du, Juanjiangmeng | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Heyne, Henrike O. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Castrillon, Jessica A. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
O'Donnell-Luria, Anne | UNSPECIFIED | orcid.org/0000-0001-6418-9592 | UNSPECIFIED
Nuernberg, Peter | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Palotie, Aarno | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Daly, Mark | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Lal, Dennis | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
URN: urn:nbn:de:hbz:38-352225
DOI: 10.1101/gr.252601.119
Journal or Publication Title: Genome Res.
Volume: 30
Number: 1
Page Range: 62-72
Date: 2020
Publisher: COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
Place of Publication: COLD SPRING HARBOR
ISSN: 1549-5469
Language: English
Faculty: Unspecified
Divisions: Unspecified
Subjects: no entry
Uncontrolled Keywords:
Keywords | Language
SEQUENCE VARIANTS; PROTEIN; DATABASE; CONSEQUENCES; ANNOTATION; COMMON | Multiple languages
Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Genetics & Heredity | Multiple languages
URI: http://kups.ub.uni-koeln.de/id/eprint/35222
