OboInOwl:Main Page

From NCBO Wiki

OBO in OWL: Mapping and Tools

This wiki is for discussing the mapping between the OBO 1.2 format and OWL. The first version of this mapping is at http://www.godatabase.org/dev/doc/mapping-obo-to-owl.html. We have finished work on a newer version of this mapping. This page gives a brief background for the effort and links to the tools (plugins for OBOEdit and Protege) that implement the mapping.

The Gene Ontology and a significant number of other biomedical ontologies are in the OBO format. The OBO format, which originated along with the Gene Ontology, has evolved to support the needs of the biomedical ontologies that fall under the Open Biomedical Ontologies (OBO) umbrella. The format aims for 1) human readability, 2) ease of parsing, 3) extensibility and 4) minimal redundancy. It currently forms the backbone of most GO-based annotation and data analysis tools.

In parallel with the developments in bio-ontologies, ontologies in general have become more prevalent in information technology, with the most visible push coming from the W3C, which has recommended the Web Ontology Language (OWL) as an international standard for ontologies on the web. There has also been a corresponding increase in the number, diversity and quality of the tools available to construct, maintain and view ontologies in OWL.

As bio-ontologies become more popular and grow in size as well as complexity, they are becoming the focus of attention of the larger computer science research community. On one hand, there is significant interest in using the life sciences domain as a “focus” for W3C semantic web activity. In this light, biological data annotated using OBO ontologies is a prime resource, and the Semantic Web community is keenly interested in accessing the ontologies and the annotated data in OWL format. On the other hand, if bio-ontologies are to benefit from the rapid progress being made in computer science – especially the semantic web technologies – they need to interoperate with other ontologies that are in the OWL format. Relatively newer biomedical ontologies (such as BioPAX) are already in OWL. The NCI-thesaurus, being developed by the National Cancer Institute, is also in OWL.

As a result, there is a strong need to map the OBO-format to OWL and provide tools that enable the end user to perform the translation at the click of a button in a stable ontology editing environment without worrying about underlying formats.

Mail Lists

We have created the OBO to OWL mapping while remaining faithful to the (declared) semantics of the OBO format. In places where we found the format to be vague, we have tightened the semantics and have updated the documentation accordingly. We make the mapping tools available for other researchers to use and to evaluate the mapping. Please feel free to contact us on these mailing lists if you find anything lacking, have suggestions or have any kind of feedback.

Obo Format List

Also of interest: Obo Cross-Product List

The Mapping

We have made the mapping available as an online Google spreadsheet. You can view the sheet at http://spreadsheets.google.com/ccc?key=pWN_4sBrd9l1Umn1LN8WuQQ. If you want edit rights (or already have them) to leave comments, use the following link: http://spreadsheets.google.com/ccc?key=o06770842196506107736.4732937099693365844

Tools for the mapping

Protege plugins

OBO Converter Protege tab:

The OBO Converter is a Tab plugin for Protégé that converts OBO format files into OWL files and vice versa (keeping in mind that OWL to OBO conversion can lose information if the OWL encodes things that cannot be expressed in OBO). It is developed so that it can also work as a standalone conversion program.

Tab User Guide

The OBO Converter Tab reads OBO files (OBO 1.0 and 1.2) into Protégé OWL projects and saves those projects back as OBO 1.0 files. The Tab has two main panels, one to read OBO files and one to write (save) them.

The save operation is straightforward: the user chooses the file name and the conversion is done. The read operation works the same way but adds a set of options that can alter the way an OBO file is read (see figure).

OBOConverter.jpg


Read Options:

1 Class name generation

A Combo box allows users to choose the way the OWL class names will be generated from the OBO format file Terms. There are 3 options:

  • OBO id generates the name from the OBO term id. This is the option that generates the OWL id in the way described in the mapping.
  • Class name generates the name from the OBO term name. Be careful, as those names may not be unique; if they are not, there will be a parser error.
  • Class name + OBO id generates the name from the combination of the OBO term name and id.

In all cases, characters other than letters (a-z, A-Z) or numbers are converted to underscore characters (_). For example, the OBO term name “nurse cell” is converted to the OWL class name nurse_cell. By default, the Tab generates the OWL id in the way described in the mapping and uses the OBO id to generate class names. If the user wants to see the OBO names of the classes instead of their meaningless ids, Protégé has an option to display the OWL class label as the class identifier. As OWL labels can have language identifiers (such as en for English), converted OBO ontologies can now have names in different languages all pointing to the same entities. The other options are aimed at users who want to create their ontology using a specific OBO ontology as a starting point but need to name their entities differently. Those users are not interested in maintaining naming compatibility with OBO.
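The three naming options and the underscore rule can be sketched roughly as follows. This is a minimal illustration, not the Tab's actual code: the function name `owl_class_name`, the mode strings, the placeholder id `EX:0000001`, and the choice to join name and id with an underscore are all assumptions made for the example.

```python
import re


def owl_class_name(term_id: str, term_name: str, mode: str = "id") -> str:
    """Build an OWL class name from an OBO term (illustrative sketch only).

    mode is a hypothetical switch mirroring the three combo box options:
    "id", "name", or "name+id".
    """
    if mode == "id":
        raw = term_id
    elif mode == "name":
        raw = term_name
    elif mode == "name+id":
        # Assumption: name and id are joined with an underscore.
        raw = f"{term_name}_{term_id}"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Every character other than a-z, A-Z or 0-9 becomes an underscore.
    return re.sub(r"[^A-Za-z0-9]", "_", raw)


# EX:0000001 is a made-up id used only for this example.
print(owl_class_name("EX:0000001", "nurse cell", mode="name"))
print(owl_class_name("EX:0000001", "nurse cell", mode="id"))
```

With the "name" option, two terms that share a name would collapse to the same class name, which is the non-uniqueness problem described in the list above.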

2 Exclusion of namespaces

In OBO format, it is possible to define more than one ontology in each file using the OBO namespace: tag (note that OBO namespaces have no connection with OWL namespaces). For instance, the Gene Ontology is actually composed of three independent ontologies defined in the same file (gene_ontology.obo) using three different namespaces (biological_process, molecular_function and cellular_component). When the OBO Converter Tab reads such a file, it collapses all the ontologies into one and stores the namespace information as an annotation, which may not be what the user intends. The OBO Converter Tab has a panel where the user can specify which namespaces should not be read from a file. Example: in the gene_ontology.obo file, if the namespaces biological_process and molecular_function are not read, only the cellular_component ontology will be read.
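The effect of excluding namespaces can be sketched with a small stand-alone filter. This is only an approximation under stated assumptions: the function `filter_namespaces` and the sample terms are hypothetical, only [Term] stanzas with an explicit namespace: tag are handled, and header-level defaults such as default-namespace are ignored.

```python
def filter_namespaces(obo_text: str, excluded: set[str]) -> list[dict[str, list[str]]]:
    """Return the [Term] stanzas whose namespace is not in `excluded`."""
    terms: list[dict[str, list[str]]] = []
    current = None  # the stanza being filled, or None outside a [Term] stanza
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("!"):
            continue  # OBO comment line
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return [t for t in terms if t.get("namespace", [""])[0] not in excluded]


# Made-up sample; the ids and names are placeholders, not real GO terms.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: example process
namespace: biological_process

[Term]
id: EX:0000002
name: example component
namespace: cellular_component
"""

kept = filter_namespaces(SAMPLE, {"biological_process", "molecular_function"})
print([t["id"][0] for t in kept])
```

Here only the cellular_component term survives, matching the gene_ontology.obo example in the paragraph above.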

3 Default namespace URI

It is the author's responsibility to set the default namespace URI for the OWL ontology, because that information is not available in the OBO format file. The OBO Converter Tab has a panel where the author can enter this URI. If the author does not enter one, whatever default URI Protégé is using becomes the default namespace.

Downloads:

 Source code for OBO Converter Tab
 Binaries for OBO Converter Tab
 Page at Protege Site


OBO Explorer Protege tab:

The OWL format for OBO files uses anonymous nodes to represent definitions, synonyms, and DbxRefs, and the generic Protégé GUI components are not immediately suitable to display and edit them. The existing graphical components also do not allow the user to easily access or edit the lexical information associated with an OBO term. The OBO Explorer tab allows the user to do so in an interface that is similar to that of OBO-edit. This provides the user with the flexibility to edit these lexical features (such as synonyms and dbxrefs) in an intuitive manner.

 OBO Explorer Tab

OBOExplorer.jpg

OboEdit OWL plugin

We have also added the functionality to save an OBO format file as an OWL file from within OBOEdit. A development version of the OWL Export/Import plugin for OboEdit is available. Download the distribution file, unzip it and copy its contents to the <OboEdit>/extensions folder. Start OboEdit and you should find the option "OWL Adapter" both for loading ontologies (File->Load Terms...) and for saving them (File->Save as...).

OboEdit OWL plugin.

If you have problems with big ontologies, try increasing the amount of memory available to OboEdit.

obo2owl xslt

The oboxml_to_owl.xsl mapping now implements the OboInOwl mapping for obo-xml:

 XSLT

This will be bundled with go-perl-0.07 and higher

Sample ontologies exported from obo to owl are available here

URIs

Added a separate page on mapping OBO IDs to URIs:

OboInOwl:URIs

Overview of Other Mapping efforts

We are aware that there are several other groups that have created an OBO to OWL mapping to address immediate needs of their research groups. We have compiled a summary of the various OBO to OWL conversion efforts that we are aware of. Email me (nigam .AT. stanford.edu) with additions/deletions as you come across them.

 Spread sheet comparing the mappings

Progress Notes

I've overhauled the obo2owl mapping. I've pretty much followed Alan's recommendations (I made a lot of purely internal changes to the xslt too though which should make it much clearer). Hope these work for you Stuart. Sorry about the churn - but this will definitely be worth it in the end.

Example OWL file can be found here (also attached):

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/examples/gotest.owl

(note that this example includes a cross-product example)

The OWL is generated from either of the following:

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/examples/gotest.obo

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/examples/gotest.obo-xml

The XSL can be found here:

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/xsl/obo2owl.xsl

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/xsl/obo2owl_obo_in_owl_metamodel.xsl

The XSL actually serves as fairly reasonable documentation of what's going on, but we'll also come up with a friendlier description once it's finalised.

You can convert the obo-xml directly with the xslt. If you want to convert from obo, you'll need the latest version of go-perl (from cvs).

Here are the changes and things still pending:

      Adopted Alan Ruttenberg's metamodel changes (see obo-format list)
      split into 2 separate xsl files
      subset (ontology views) now more consistent with obo
      * the oboInOwl class is SubsetDef
      * this does not appear in the owl:Ontology section, it stands alone
        (subsets can be used across ontologies)
      namespace changes -
      * the metamodel is now called oboInOwl
        (the format is owned by GO, so this maps to a GO URI)
      * the default ontology content namespace is now bioont
        (the URI for this will be some bioontologies.org URI)
      * slashes not hashes or underscores
      - example: rdf:about="oboContent/GO/0000001"
      fixed rdf:about/resource/ID issues
      - ID is never used
      - about and resource now used in correct places
      CHECKED
      - validates as DL in http://phoebus.cs.man.ac.uk:9999/OWL/Validator
      - works in SWOOP
      - works in Protege-OWL (but looks odd)
      TODO
      do we need an equivalentClass for intersectionOf?
      SWOOP saves this without
      decide on final URI scheme
      - Can we make the URIs less verbose? Use entities - or is this frowned on?
      new obo tags for obsoletion
      handling obsoletes
      decide on whether the oboInOwl metamodel should be exported as
      part of the content export, or linked to separately;
      and if linked to separately, do we need an owl:imports?
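As a rough illustration of the "slashes not hashes or underscores" item in the notes above, the following sketch turns an OBO id into the kind of rdf:about value shown in the example. It rests on assumptions: the function name `obo_id_to_rdf_about` is made up, the `oboContent/` base is taken from the single example above, and the final URI scheme is still listed as undecided in the TODO items.

```python
def obo_id_to_rdf_about(obo_id: str, base: str = "oboContent/") -> str:
    """Map an OBO id such as GO:0000001 to a slash-separated rdf:about value.

    Hypothetical helper; the base string is an assumption taken from the
    single example in the notes, not a finalised URI scheme.
    """
    prefix, sep, local = obo_id.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"expected an id of the form PREFIX:LOCAL, got {obo_id!r}")
    # The id space and the local id are separated by a slash.
    return f"{base}{prefix}/{local}"


print(obo_id_to_rdf_about("GO:0000001"))
```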