This is the home page of the Sequence Ontology Project (SO), a joint effort by genome annotation centres, including WormBase, FlyBase, the Mouse Genome Informatics group, and the Sanger Institute. We are a part of the Gene Ontology Project and the Open Biomedical Ontologies (OBO). Our aim is to develop an ontology suitable for describing biological sequences.


For questions, please send mail to the SO developers list at song-devel@lists.sourceforge.net

News

New Biosapiens protein feature coordinator Sandra Orchard (EBI) will replace Gabby Reeves at the end of July 08.
SO release (2.3): Version 2.3 of the Sequence Ontology was released in Jan 08.
SO software grant awarded August 2007

Current SO Ontology files

Ontology        CVS        Summary      Release
SO              so.obo     SO summary   so_2_3
SOFA            sofa.obo   SO summary   sofa_2_3
Cross-products  so-xp.obo  explanation  so_xp_2_3

File formats:

The Sequence Ontologies use OBO flat file format specification version 1.2, developed by the Gene Ontology Consortium.

The ontology is also available in OWL from http://www.berkeleybop.org/ontologies. This is updated nightly and may be slightly out of sync with the current obo file.
OWL is generated from the obo file using go-perl. The resolvable URI for the current version of SO is http://purl.org/obo/owl/SO. As of Jan 25 2007, the transform has been updated from the old lossy transform to the new non-lossy mapping.
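
For readers who want to inspect the OBO flat files directly, the following is a minimal Python sketch of reading [Term] stanzas. It is not the go-perl tooling mentioned above. The sample text, the EX: identifiers, and the function name parse_obo_terms are made up for illustration and are not copied from so.obo. The sketch handles only simple "tag: value" lines and treats " ! " as the start of a trailing comment, which is a simplification of the full format specification.

```python
# Hypothetical excerpt written in the style of an OBO 1.2 file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: transcript

[Term]
id: EX:0000002
name: mRNA
is_a: EX:0000001 ! transcript

[Typedef]
id: part_of
name: part_of
"""


def parse_obo_terms(text):
    """Return {term_id: {tag: [values]}} for every [Term] stanza in text."""
    stanzas = []
    current = None  # tag dict of the [Term] stanza being read, else None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or whole-line comment
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None:
            continue  # header line, or a line inside a non-Term stanza
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing comment
        current.setdefault(tag.strip(), []).append(value)
    return {s["id"][0]: s for s in stanzas if "id" in s}


terms = parse_obo_terms(SAMPLE_OBO)
for term_id, tags in terms.items():
    print(term_id, tags["name"][0], tags.get("is_a", []))
```

Each tag maps to a list because a term may carry the same tag more than once, for example several is_a lines.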

Summary

The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. It encompasses both "raw" features, such as nucleotide similarity hits, and interpretations such as gene models. It also provides a rich set of attributes to describe these features such as "polycistronic" and "maternally imprinted".

The Sequence Ontologies are provided as a resource to the biological community. They have the following obvious uses:

  • To provide a structured controlled vocabulary for the description of primary annotations of nucleic acid sequence, e.g. the annotations shared by a DAS server.
  • To provide a structured representation of these annotations within databases. If genes within model organism databases were annotated with these terms, it would be possible to query all of these databases for, say, all genes whose transcripts are edited, or trans-spliced, or bound by a particular protein.
  • To provide a structured controlled vocabulary for the description of mutations, at both the sequence level and a more gross level, in the context of genomic databases.
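
The cross-database query described in the second point can be sketched in Python as follows. This is an illustration under stated assumptions, not an existing SO tool: the is_a hierarchy is a simplified, hypothetical fragment, the database names and gene identifiers are invented, and the functions descendants and query are names chosen for this example. The idea shown is that a query for a general term also returns features annotated with any more specific term beneath it.

```python
from collections import defaultdict

# Hypothetical, simplified is_a edges (child -> parents).
IS_A = {
    "mRNA": ["transcript"],
    "edited_mRNA": ["mRNA"],
    "trans_spliced_mRNA": ["mRNA"],
    "tRNA": ["transcript"],
}

# Invented annotations: (database, gene, term used for its transcript).
ANNOTATIONS = [
    ("db_A", "geneA1", "edited_mRNA"),
    ("db_A", "geneA2", "tRNA"),
    ("db_B", "geneB1", "trans_spliced_mRNA"),
    ("db_B", "geneB2", "mRNA"),
]


def descendants(term, is_a):
    """Return the term itself plus every term reachable below it via is_a."""
    children = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query(term, annotations, is_a):
    """List (database, gene) pairs annotated with term or a more specific term."""
    wanted = descendants(term, is_a)
    return [(db, gene) for db, gene, used in annotations if used in wanted]


print(query("edited_mRNA", ANNOTATIONS, IS_A))
print(query("mRNA", ANNOTATIONS, IS_A))
```

Because both hypothetical databases use the same vocabulary, one query covers them both without any per-database mapping.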

References

The Sequence Ontology: A tool for the unification of genome annotations.
Eilbeck K., Lewis S., Mungall C.J., Yandell M., Stein L., Durbin R., Ashburner M. Genome Biology (2005) 6:R44

Relations in Biomedical Ontologies.
Barry Smith, Werner Ceusters, Bert Klagges, Jacob Kohler, Anand Kumar, Jane Lomax, Chris Mungall, Fabian Neuhaus, Alan L Rector, Cornelius Rosse. Genome Biology (2005) 6:R46

Sequence Ontology Annotation Guide.
Karen Eilbeck and Suzanna E. Lewis. Comparative and Functional Genomics (2004) 5:642-647


Interested in participating? See the project page for mailing lists, the CVS archive and the names of the developers. To become an SO contributor, just set up an account on SourceForge and drop a note to Karen Eilbeck.