Adesoji, Oluyomi Modupe ORCID: 0000-0002-8583-7170 (2023). Benchmarking of univariate pleiotropy detection methods, with an application to epilepsy phenotypes. PhD thesis, Universität zu Köln.

Full text: Oluyomi_Adesoji_Thesis.pdf (PDF, Accepted Version, 29MB)

Abstract

Over the past decades, various methods have been used to scan the human genome for genetic variations associated with diseases, in particular with common, complex disorders. One such approach is the genome-wide association study (GWAS), which compares genetic variation between affected and healthy individuals to find genomic variants in the DNA sequence associated with a trait. GWAS are usually conducted separately for individual traits, and the same single nucleotide polymorphisms (SNPs) or loci have been found to be associated with different traits in independent studies [7-10]. These findings support the view that most complex traits are correlated and have a shared genetic architecture, and therefore share the same heritable risk factors [11]. Knowledge of the genetic risk factors can directly or indirectly contribute to improvements in risk assessment, drug target development, and ultimately in providing effective therapies to affected individuals. Pleiotropy is the phenomenon of a hereditary unit affecting more than one trait. The earliest reported evidence was provided by Mendel, who noted that certain sets of features were always observed together in a plant. Although this example could have been purely due to linkage and would today be regarded as spurious pleiotropy, it opened up discussion of and research into pleiotropy, which has remained an active area of research ever since [12]. In this work, I focused on complex epilepsies and the overlap in the genetic factors impacting their phenotypes. Epilepsy is a brain disorder comprising monogenic and common/complex forms characterized by recurrent partial or generalized seizures. However, the extent to which genetic variants contribute to the disorder, and how much of the genetic contribution is shared between the different phenotypes, is not yet fully understood.
This motivated the present project, in which I benchmarked available pleiotropy detection approaches to select the best-performing method in terms of power and false-positive rate for detecting true pleiotropy. I then applied the selected method to summary statistics of focal epilepsy (FE) and genetic generalized epilepsy (GGE), provided by the International League Against Epilepsy (ILAE) Consortium on Complex Epilepsies and the EPI25 collaborative, to identify genetic factors shared by both epilepsy phenotypes. Identifying pleiotropic SNPs or genes is an active area of research with multiple proposed approaches, broadly categorized into univariate and multivariate methods. Multivariate approaches have the limitation that they require all phenotypes to be measured in the same individuals, together with the corresponding genotype data, which is often not the case since GWAS are usually performed per trait. However, various consortia studying complex traits readily share summary statistics (effect sizes and p-values) from genome-wide association studies, making it easier to apply univariate pleiotropy detection approaches, which combine these statistics to identify SNPs or loci with a concordant or discordant direction of effects. Therefore, in this project, I first compared the power and false-positive rate (FPR) of five univariate pleiotropy detection approaches (classical meta-analysis, cFDR, PLACO, ASSET, and CPBayes; see section 6.1) through simulation studies. After that, I applied the best-performing method to the analysis of epilepsy phenotypes using real data. The data simulation procedure was performed in three steps. First, a population of 1 million individuals of European ancestry was simulated via resampling using the HAPGEN2 software [13] and haplotypes of central Europeans from the 1000 Genomes Project [14].
In the second phase of the simulation, disease SNPs were randomly selected and used in the additive liability threshold model (ALTM) [15] to simulate multifactorial disease phenotypes from the simulated genetic data. As expected, the performance of the methods varied in terms of power and FPR. The variability between the methods was higher for FPR, while most methods were comparable in terms of power, especially for larger sample sizes and relative risks (RR). Although classical meta-analysis (MA) is very powerful, it also suffers from a very high false-positive rate, making it less suitable for identifying pleiotropic loci. While all the methods performed well in terms of power, the ASSET method gave a better trade-off between power and FPR across the different simulation approaches. Applying ASSET to the two epilepsy phenotypes, GGE and FE, identified a new putative locus, 17q21.32, while replicating locus 2q24.3, previously reported by the ILAE consortium [16]. Further, applying ASSET to summary statistics from larger samples of epilepsy phenotypes identified loci 2q24.3 and 9q21.13. These findings corroborate the results obtained by the ILAE consortium through mega- and meta-analysis. Based on the simulation study results, classical MA is not recommended for pleiotropy detection: although it demonstrated good power to detect pleiotropy, it also showed a high FPR across all simulation scenarios. In contrast, the ASSET method is highly recommended, as it kept the FPR low while demonstrating good power to detect pleiotropy. This study also contributed three pleiotropic loci (2q24.3, 17q21.32, and 9q21.13) to the understanding of the relationship between genetic variation and epilepsy phenotypes and of the inter-relationship between these phenotypes. Although locus 17q21.32 could not be replicated in the larger sample set, it is not necessarily a false-positive discovery.
The locus was genome-wide significant for GGE but only marginally significant for FE, consistent with the trend observed for the FE cases in the EPI25 collaborative dataset, where no genome-wide significant result was found. Therefore, replication in an independent sample is desirable. One limitation of univariate pleiotropy detection approaches, as seen with classical MA, is that a single trait with a very low p-value can drive the observed pleiotropic association. In addition, methods such as cFDR and PLACO can accommodate only two traits, though this was not a limitation in this project. Despite these limitations, the presented work establishes a benchmark of the relative performance of the assessed methods and can guide researchers in related fields in their future work. This study also contributes to the understanding of the genetic factors shared between GGE and FE, with the expectation that larger sample sizes will lead to more discoveries.
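
The limitation noted above, that one trait with a very low p-value can drive a combined association, can be illustrated with a minimal sketch. It assumes a standard inverse-variance-weighted fixed-effect meta-analysis as a stand-in for the "classical MA"; the thesis may use a different variant. The effect sizes and standard errors are invented for illustration and are not taken from the epilepsy data, and the function names are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect combination of per-trait effects.

    Assumed stand-in for 'classical meta-analysis'; not the thesis code.
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    return beta, se, z, two_sided_p(z)


# Hypothetical SNP: strong effect on trait 1, no effect at all on trait 2.
betas = [0.50, 0.00]
ses = [0.05, 0.05]

per_trait_p = [two_sided_p(b / s) for b, s in zip(betas, ses)]
beta, se, z, p = fixed_effect_meta(betas, ses)

print("per-trait p-values:", per_trait_p)
print("combined beta, z, p:", beta, z, p)
print("combined result genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

In this toy case the combined test passes the genome-wide threshold even though the second trait shows no association, so a significant meta-analysis result alone does not establish that the SNP affects both traits. This is in line with the high FPR the abstract reports for classical MA.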

Item Type: Thesis (PhD thesis)
Creators: Adesoji, Oluyomi Modupe
Email: oadesoji@uni-koeln.de
ORCID: orcid.org/0000-0002-8583-7170
ORCID Put Code: UNSPECIFIED
URN: urn:nbn:de:hbz:38-649019
Date: 13 February 2023
Place of Publication: Cologne
Language: English
Faculty: Faculty of Mathematics and Natural Sciences
Divisions: Cologne Center for Genomics
Subjects: General statistics; Natural sciences and mathematics; Mathematics; Life sciences
Uncontrolled Keywords: Pleiotropy; Meta-analysis; Epilepsy Phenotypes
Date of oral exam: 4 October 2022
Referee: Nothnagel, Michael (Professor)
Refereed: Yes
URI: http://kups.ub.uni-koeln.de/id/eprint/64901
