Raul Ivan Perez Martell
- MSc (University of Victoria, 2020)
- BSc (Monterrey Institute of Technology and Higher Education, Mexico, 2016)
Topic
Raul Ivan Perez Martell
Department of Computer Science
Date & location
- Monday, May 12, 2025
- 8:00 A.M.
- Virtual Defence
Reviewers
Supervisory Committee
- Dr. Ulrike Stege, Department of Computer Science, University of Victoria (Co-Supervisor)
- Dr. Hosna Jabbari, Department of Computer Science, UVic (Co-Supervisor)
- Dr. Julian Lum, Department of Biochemistry and Microbiology, UVic (Outside Member)
External Examiner
- Dr. Dan Tulpan, Department of Animal Biosciences, University of Guelph
Chair of Oral Examination
- Dr. Juergen Ehlting, Department of Biology, UVic
Abstract
Human diversity often manifests through single nucleotide polymorphisms (SNPs). Among these polymorphisms, SNPs that alter amino acids can modify a protein’s three-dimensional (3D) structure. Such SNPs are known as missense mutations; they can change the protein’s function and potentially cause disease or affect drug interactions. Understanding single point mutations in proteins is therefore crucial for precision medicine, as it helps tailor treatments to individual genetic variation.
Protein tertiary structure prediction models like AlphaFold2 have revolutionized the field with unprecedented accuracy, yet predicting structural changes arising from single amino acid mutations remains a challenge. The complexity introduced by these mutations calls for models that can incorporate mutational information into their predictions. Because atomic coordinates can shift in many ways that may or may not affect function, we focus on secondary structure to obtain concrete results on the structural deformations that missense mutations may cause.
We assess state-of-the-art structure prediction methods regarding backbone deformations caused by missense mutations. We categorize these deformations as local, distant, or global based on the proximity of structural changes to the mutation site. Our analysis utilizes a diverse dataset from the Protein Data Bank, comprising over 500 protein clusters with experimentally determined structures and documented mutations.
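The local, distant, and global distinction can be made concrete with a small sketch. This is an illustration only, not the procedure used in the thesis: the function names, the three-state secondary structure strings (H, E, C), the `local_radius` window, and the `global_fraction` threshold are all assumptions introduced here.

```python
def changed_positions(wild_ss, mutant_ss):
    """Return the indices where two aligned secondary structure strings differ."""
    if len(wild_ss) != len(mutant_ss):
        raise ValueError("secondary structure strings must have equal length")
    return [i for i, (w, m) in enumerate(zip(wild_ss, mutant_ss)) if w != m]


def categorize_deformation(wild_ss, mutant_ss, site,
                           local_radius=5, global_fraction=0.3):
    """Label a deformation by how far its changes lie from the mutation site.

    local_radius and global_fraction are hypothetical cut-offs chosen for
    illustration; the thesis may define the categories differently.
    """
    changed = changed_positions(wild_ss, mutant_ss)
    if not changed:
        return "none"
    # Widespread change across the chain is treated as global.
    if len(changed) / len(wild_ss) >= global_fraction:
        return "global"
    # Every change sits inside the window around the mutation site.
    if all(abs(i - site) <= local_radius for i in changed):
        return "local"
    # At least one change lies outside the window.
    return "distant"


# Toy 20-residue example: H = helix, E = strand, C = coil.
wild = "CCHHHHHHCCEEEEECCCCC"
local_mutant = "CCHHHHCCCCEEEEECCCCC"    # helix end unwinds at the site
distant_mutant = "CCHHHHHHCCEEECCCCCCC"  # strand shortens far from the site
global_mutant = "CCCCCCCCCCCCCCCCCCCC"   # all regular structure is lost

for label, mutant, site in [("local", local_mutant, 7),
                            ("distant", distant_mutant, 2),
                            ("global", global_mutant, 7)]:
    print(label, categorize_deformation(wild, mutant, site))
```

Applying the same function to experimentally determined and to predicted wild-type/mutant pairs would give one rough way to compare the category a predictor reports with the category actually observed.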
Our findings indicate that missense mutations can significantly affect the accuracy of structure prediction methods. These mutations often lead to predicted structural changes even when the actual secondary structures remain unchanged, suggesting that current methods overestimate the impact of missense mutations. This issue is particularly evident in advanced prediction algorithms, which struggle to accurately model proteins with stable mutations. We also found that the addition of low-performing prediction methods during structural analysis can positively impact the results on some proteins, particularly those with low homology. Furthermore, proteins that form complexes or bind ligands—such as membrane and transport proteins—are inaccurately predicted due to the absence of extra-molecular interaction data in the models, highlighting how missense mutations can complicate accurate structure prediction.
Based on these findings, we propose a novel refinement strategy for protein secondary structure prediction that leverages missense mutational data. As part of this strategy, we introduce Mut2Dens, a model that not only yields more consistent predictions for mutational data but also maintains robust predictive performance on non-mutational datasets. Refinement models of this kind take multiple predicted secondary structures as input and generate a single mutation-aware secondary structure.
In particular, Mut2Dens employs the extremely randomized trees (ExtraTree) algorithm to avoid overfitting and make effective use of the limited mutational data available from experimentally determined three-dimensional structures. By combining predictions from highly accurate structure prediction models, we create an ensemble that integrates their strengths while enhancing mutational capabilities. This refinement strategy also improves the non-mutational performance of state-of-the-art methods by addressing their most inaccurate and least confident predictions.
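The general idea of refining several predicted secondary structures with extremely randomized trees can be sketched as follows. This is a minimal illustration under stated assumptions, not the Mut2Dens implementation: it uses scikit-learn's `ExtraTreesClassifier`, a three-state alphabet, a simple per-residue one-hot encoding, and invented toy strings standing in for the outputs of three unnamed base predictors.

```python
from sklearn.ensemble import ExtraTreesClassifier

STATES = "HEC"  # helix, strand, coil; a three-state alphabet is an assumption


def one_hot(state):
    """Encode one secondary structure state as a 0/1 vector over STATES."""
    return [1 if state == s else 0 for s in STATES]


def build_features(predictions):
    """Turn per-predictor strings into one feature row per residue.

    predictions: list of equal-length strings, one per base predictor.
    Each row concatenates the one-hot state from every predictor.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have equal length")
    rows = []
    for i in range(length):
        row = []
        for p in predictions:
            row.extend(one_hot(p[i]))
        rows.append(row)
    return rows


# Toy data (invented): three base predictions and an experimental reference.
base_predictions = [
    "HHHHCCEEEECC",
    "HHHCCCEEEECC",
    "HHHHHCEEECCC",
]
experimental = "HHHHCCEEEECC"

X = build_features(base_predictions)
y = list(experimental)

# Extremely randomized trees: split thresholds are drawn at random, which
# adds variance reduction when training data are scarce.
model = ExtraTreesClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

refined = "".join(model.predict(X))
print(refined)
print(model.feature_importances_)  # one weight per (predictor, state) feature
```

In practice the model would be trained on residues from many proteins and evaluated on held-out ones; predicting on the training rows here only shows the data flow. The `feature_importances_` attribute gives a rough view of which base predictor and state the trees rely on, which is one simple form of the interpretability mentioned in the abstract.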
Moreover, our refinement strategy reduces improbable outcomes in mutated protein structures—such as transforming π-helices into β-sheets—that can still occur in current prediction models. Finally, by using interpretable machine learning algorithms, we can reveal the underlying biological knowledge from the refinement model; the insights gained from Mut2Dens can be corroborated with known mutational outcomes, helping users pinpoint discrepancies across structure prediction models and make more informed decisions regarding the predicted structures.