Identification of candidate genomic regions by integrating cluster analysis and genome-wide association studies

Abstract

By identifying genomic variants responsible for life-threatening complex disorders, genome-wide association studies (GWAS) have shown great potential for improving precision medicine. However, because GWAS perform a very large number of association tests, they are susceptible to inflated error rates and reduced statistical power. This study addresses these problems by integrating cluster analysis with GWAS to identify candidate genomic regions of possible relevance to a phenotype. The integrative approach reduces the number of tests by focusing on significant genetic loci and the variants residing within them. Using Hamming distance as the similarity measure, we performed cluster analysis on SNPs associated with HBsAg seroclearance and subjected each resulting SNP set to a Hamming-distance-based association test. The results showed that all SNP sets were significantly associated with HBsAg seroclearance. Furthermore, the set with the strongest association contains SNPs located in the 11p region, a locus that has previously been linked with HBsAg positivity.
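The abstract names the two ingredients (Hamming-distance clustering of SNPs, then a Hamming-distance-based test on each SNP set) but not the exact clustering algorithm or test statistic. The sketch below is one possible reading under stated assumptions, not the authors' method: SNPs are linked when their genotype columns differ in at most a hypothetical `max_dist` individuals and clusters are the connected components (single linkage); each SNP set is then tested with a generic permutation test comparing mean between-group and within-group Hamming distance between individuals. The function names, the 0/1/2 genotype coding, and the toy data are all illustrative assumptions.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotype vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_snps(geno, max_dist):
    """Group SNPs whose genotype columns are within max_dist (Hamming).

    geno[i][k] is the genotype (assumed 0/1/2 minor-allele count) of
    individual i at SNP k. Two SNPs are linked when their columns differ
    in at most max_dist individuals (max_dist is a hypothetical threshold);
    clusters are the connected components, found with a union-find.
    """
    n_snps = len(geno[0])
    cols = [[row[k] for row in geno] for k in range(n_snps)]
    parent = list(range(n_snps))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for s, t in combinations(range(n_snps), 2):
        if hamming(cols[s], cols[t]) <= max_dist:
            parent[find(s)] = find(t)

    groups = {}
    for k in range(n_snps):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


def set_statistic(geno, labels, snp_set):
    """Mean between-group minus mean within-group Hamming distance between
    individuals, using only the SNPs in snp_set (assumed statistic)."""
    profiles = [[row[k] for k in snp_set] for row in geno]
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = hamming(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(geno, labels, snp_set, n_perm=999, seed=0):
    """One-sided permutation p-value: fraction of label shuffles whose
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = set_statistic(geno, labels, snp_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(geno, shuffled, snp_set) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented toy data: 8 individuals x 4 SNPs; label 1 = seroclearance.
    geno = [
        [2, 2, 0, 0], [2, 2, 1, 1], [2, 2, 0, 0], [2, 1, 1, 1],
        [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    clusters = cluster_snps(geno, max_dist=1)
    print(clusters)  # [[0, 1], [2, 3]]
    for snp_set in clusters:
        stat, p = permutation_test(geno, labels, snp_set)
        print(snp_set, round(stat, 3), round(p, 3))
```

On the toy data, SNPs 0 and 1 separate the two phenotype groups and their set gets a small permutation p-value, while the set {2, 3} does not. The `(hits + 1) / (n_perm + 1)` form keeps the estimated p-value strictly positive; with a sample this small, all 70 balanced labelings could instead be enumerated exactly.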
