Tuesday, February 3, 2009
1:00 pm, MRB 100 Conference Room
Dr. Dong Xu
Computer Science Department, University of Missouri
MUFOLD: A New Protein Structure Prediction System
We have developed a system, MUFOLD, to predict the tertiary structure of a protein from its sequence. The system consists of three components: (1) coarse-grain model construction using the multidimensional scaling (MDS) method, (2) coarse-grain model evaluation and refinement through clustering and scoring functions, and (3) full-atom model evaluation and refinement through molecular dynamics simulations. MUFOLD constructs models in four key steps: (1) identifying compatible structural fragments of variable lengths in the PDB for a given query protein; (2) formulating pairwise spatial restraints derived from the alignments between the query sequence and its fragment hits of known structure; (3) applying MDS to generate initial 3-D structural models from the spatial restraints; and (4) refining the initial models iteratively using the refined spatial restraints. The coarse-grain model evaluation applies machine-learning methods based on various scoring functions, such as OPUS, Model Evaluator, Rapdf, Dfire, and Hopp. The scoring functions are normalized to z-scores so that they can be compared on a common scale. Full-atom model evaluation uses a gradual heating procedure in all-atom molecular dynamics simulations. Assuming that a more accurate protein structure is also more stable, we rank the quality of the predicted structures by comparing their unfolding rates during heating. The MUFOLD team achieved good prediction performance at the worldwide protein structure prediction contest CASP in 2008.
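The z-score normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the MUFOLD implementation: the abstract says the scores feed machine-learning methods, whereas the plain averaging below is a simplifying assumption, and the scorer names (`energy_a`, `quality_b`), the function names, and the numbers are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize one scoring function's raw values to zero mean, unit spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All models scored identically; this scorer carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus_rank(raw_scores, lower_is_better):
    """Average sign-corrected z-scores per model (assumed combination rule).

    raw_scores: {scorer_name: [score for model 0, model 1, ...]}
    lower_is_better: {scorer_name: True for energy-like scorers}
    Returns (model indices from best to worst, consensus value per model).
    """
    names = sorted(raw_scores)
    n_models = len(raw_scores[names[0]])
    totals = [0.0] * n_models
    for name in names:
        # Flip energy-like scores so that higher always means better.
        sign = -1.0 if lower_is_better[name] else 1.0
        for i, z in enumerate(z_scores(raw_scores[name])):
            totals[i] += sign * z
    consensus = [t / len(names) for t in totals]
    order = sorted(range(n_models), key=lambda i: consensus[i], reverse=True)
    return order, consensus


# Hypothetical scores for three candidate models (made-up numbers).
raw = {
    "energy_a": [-120.0, -100.0, -80.0],  # lower is better
    "quality_b": [0.9, 0.5, 0.1],         # higher is better
}
order, consensus = consensus_rank(raw, {"energy_a": True, "quality_b": False})
print(order)
```

Normalizing first keeps a scorer with a large numeric range, such as an energy, from dominating one that reports values between 0 and 1.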
Dr. Xu will also give a second seminar:
Wednesday, February 4, 2009
9:30 am, Nichols 246 (Executive Conference Room)
A New Machine Learning Approach for Protein Phosphorylation Site Prediction in Plants
Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are accumulating rapidly. We developed P3DB (http://www.p3db.org/), a comprehensive resource of protein phosphorylation data from multiple plants. Through its web-based user interface, the database can be browsed, downloaded, and searched by protein accession number, description, and sequence. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, because information on the substrate specificities of plant protein kinases is limited and current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. We propose a new approach to phosphorylation site prediction. It incorporates protein sequence information and information on protein disordered regions, and it integrates two machine-learning techniques, k-nearest neighbor and support vector machines, to predict phosphorylation sites. Test results show good performance for the proposed method, which allows us to assess phosphorylation patterns on the proteome scale in plants.
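As a rough illustration of the k-nearest-neighbor idea, the sketch below scores a candidate site by the labels of its most similar training windows. It is not the speaker's method: the window width, the identity-based similarity, the function names, and the toy training windows are assumptions made for illustration, and the disorder features and the support vector machine stage are left out.

```python
def extract_windows(sequence, half_width=3, residues="STY", pad="-"):
    """Return (position, window) for every S/T/Y residue in the sequence.

    Windows are padded with `pad` so that residues near the termini
    still yield a window of length 2 * half_width + 1.
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            # Position i in the sequence sits at i + half_width in `padded`.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def similarity(a, b):
    """Fraction of identical positions between two equal-length windows."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def knn_score(window, training, k=3):
    """Fraction of the k most similar training windows labeled as sites."""
    ranked = sorted(training, key=lambda item: similarity(window, item[0]),
                    reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Synthetic training windows, invented for this example (1 = site, 0 = not).
training = [
    ("RRASVAG", 1), ("KRLSLPE", 1), ("RKGSFDD", 1),
    ("GAGSGAG", 0), ("LLVTLLV", 0), ("AAGYGGA", 0),
]

for position, window in extract_windows("MRRASVAGK"):
    print(position, window, knn_score(window, training, k=3))
```

A score like this could in principle serve as one input feature to a second classifier, but how the two techniques are actually integrated is not described in the abstract.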