Unraveling the puzzle of genomic ORFans
Wed, May 4, 2005
1:30pm - 2:30pm, Apollo Room,
Genomic ORFans are ORFs (Open Reading Frames) that lack significant sequence similarity to any other ORF in the databases. Because little can be inferred about the function and structure of ORFans using standard bioinformatics tools, they have been referred to as "mysteries". Complete genome sequencing has shown that ORFans are standard companions of most newly sequenced genomes, where they account for between 20 and 30% of all ORFs. Thus, ORFans are accumulating in our databases at a rapid pace. Because very few ORFans have been studied experimentally, there has been much speculation about their roles and origins, including suggestions that most ORFans may not correspond to expressed or functional proteins.
Here, I will discuss computational studies carried out in my lab aimed at understanding the roles and origins of ORFans. I will present computational and experimental evidence that strongly suggests that most ORFans do correspond to real, expressed proteins. Then, I will show how structural biology is proving essential in unraveling the ORFan puzzle. I will also discuss the growing evidence that 3D protein space might be much more diverse than previously thought. One consequence is that even after the structures of tens of thousands of proteins are determined, many proteins will remain beyond the so-called "homology-modeling" distance to a known structure. Finally, I will address the urgent need for more accurate computational methods for modeling proteins that have no close homologues of known structure.
In the "post-genomic" era, further computational and experimental studies will be essential for understanding the functions and evolutionary origins of genomic ORFans, as well as the vast diversity observed in the genetic material.