Identification of protein-coding regions

Research Achievements

The identification of all protein-coding elements in a genome is a fundamental goal of gene annotation. In previous work, IGERT student Natalie Castellana demonstrated that tandem mass spectrometry (MS) can be used to identify novel protein-coding regions in genomes and improve gene annotation. Recently, the methods for identifying novel coding regions using MS and producing a refined gene annotation were developed into a fully automated pipeline that runs on a computer cluster. The pipeline is available via a web interface and is currently being used for an annotation project in Zea mays. So far, Natalie Castellana and her colleagues have confirmed 17,579 known proteins and identified 8,099 peptides in the genome, leading to the discovery of 90 novel protein-coding genes and over 700 other gene refinements. The entire annotation pipeline is generalizable to any organism. This work would not have been possible without IGERT support.