Difference between revisions of "OboInOwl:Main Page"
Latest revision as of 10:47, 16 September 2008
Mapping OBO to OWL
This wiki is for discussing the mapping between the OBO 1.2 format and OWL. The first version of this mapping is at http://www.godatabase.org/dev/doc/mapping-obo-to-owl.html. We have finished work on a newer version of this mapping. This page gives a brief background for the effort and links to the tools (plugins for OBOEdit and Protege) that implement the mapping.
The Gene Ontology and a significant number of biomedical ontologies are in the OBO format. The OBO format, which originated along with the Gene Ontology, has evolved to support the needs of the biomedical ontologies that fall under the Open Biomedical Ontologies (OBO) umbrella. The OBO format aims for 1) human readability, 2) ease of parsing, 3) extensibility and 4) minimal redundancy. It currently forms the backbone of most GO-based annotation and data analysis tools.
In parallel with the developments in bio-ontologies, ontologies in general have become more prevalent in information technology, with the most visible push coming from the W3C recommendation of the Web Ontology Language (OWL) as an international standard for ontologies on the web. There has also been a corresponding increase in the number, diversity and quality of the tools available to construct, maintain and view ontologies in OWL.
As bio-ontologies become more popular and grow in size as well as complexity, they are becoming the focus of attention of the larger computer science research community. On one hand there is significant interest in using the life sciences domain as a “focus” for W3C semantic web activity. In this light, biological data annotated using OBO ontologies is a prime resource and there is great interest from the Semantic Web community to access the ontologies and the annotated data in OWL format. On the other hand, if bio-ontologies are to benefit from the rapid progress being made in computer science – especially the semantic web technologies – bio-ontologies need to interoperate with other ontologies which are in the OWL format. The relatively newer biomedical ontologies (such as BioPAX) are already in OWL. The NCI-thesaurus, being developed by the National Cancer Institute, is also in OWL.
As a result, there is a strong need to map the OBO-format to OWL and provide tools that enable the end user to perform the translation at the click of a button in a stable ontology editing environment without worrying about underlying formats.
Mail Lists
We created the OBO to OWL mapping to remain faithful to the (declared) semantics of the OBO format. Where we found the format to be vague, we have tightened the semantics and updated the documentation accordingly. We make the mapping tools available so that other researchers can use and evaluate the mapping. Please feel free to contact us on these mailing lists if you find anything lacking or have suggestions or any other feedback.
Obo Format List (https://lists.sourceforge.net/lists/listinfo/obo-format)
Also of interest: Obo Cross-Product List (https://lists.sourceforge.net/lists/listinfo/obo-crossproduct)
The Mapping
We have made the mapping available as an online Google spreadsheet. You can view the sheet at http://spreadsheets.google.com/ccc?key=pWN_4sBrd9l1Umn1LN8WuQQ. If you want edit rights (or already have them) so that you can leave comments, use the following link: http://spreadsheets.google.com/ccc?key=o06770842196506107736.4732937099693365844
Tools for the mapping
Protege plugins
OBO Converter Protege tab:
The OBO Converter is a Tab plugin for Protégé that converts OBO format files into OWL files and vice versa (keeping in mind that OWL to OBO conversion can lose information if the OWL encodes things that cannot be expressed in OBO). It is built so that it can also run as a standalone conversion program.
Tab User Guide
The OBO Converter Tab reads OBO files (OBO 1.0 and 1.2) into Protégé OWL projects and saves those projects back as OBO 1.0 files. The Tab has two main panels, one to read OBO files and one to write (save) them.
The save operation is straightforward: the user chooses the file name and the conversion is done. The read operation works the same way but adds a set of options that alter how an OBO file is read (see figure).
Read Options:
1 Class name generation
A combo box lets users choose how OWL class names are generated from the terms in the OBO format file. There are 3 options:
- OBO id generates the name from the OBO term id. This is the option that produces the OWL id in the way described in the mapping.
- Class name generates the name from the OBO term name. Be careful, as those names may not be unique; if they are not, there will be a parser error.
- Class name + OBO id generates the name from the combination of the OBO term name and id.
In all cases, characters other than letters (a-z, A-Z) or numbers will be converted to underscore characters (_). Ex: the OBO term name “nurse cell” will be converted to the OWL class name nurse_cell.
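The character rule above can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: the function names, the mode strings, the example id EX:0000001, and the underscore used to join name and id in the combined mode are all assumptions, since this page does not say how the two are combined.

```python
import re


def sanitize(text):
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", text)


def class_name(term_id, term_name, mode="id"):
    """Build an OWL class name for one OBO term.

    mode is a hypothetical stand-in for the three combo box options:
    "id", "name" or "name+id".
    """
    if mode == "id":
        return sanitize(term_id)
    if mode == "name":
        return sanitize(term_name)
    if mode == "name+id":
        # Assumption: name and id are joined with an underscore.
        return sanitize(term_name) + "_" + sanitize(term_id)
    raise ValueError("unknown mode: " + mode)


# EX:0000001 is a made-up id used only for this example.
print(class_name("EX:0000001", "nurse cell", mode="name"))  # nurse_cell
```

Because each disallowed character becomes exactly one underscore, two different term names can collapse to the same class name, which is one more reason the "Class name" option can run into uniqueness errors.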
By default, the Tab generates the OWL id in the way described in the mapping and uses the OBO id to generate class names. If the user wants to see the OBO names of the classes instead of their opaque ids, Protégé has an option to display the OWL class label as the class identifier. As OWL labels can carry language identifiers (such as en for English), converted OBO ontologies can now have names in different languages all pointing to the same entities.
The other options are aimed at users who want to build their own ontology using a specific OBO ontology as a starting point but need to name their entities differently. Those users are not interested in maintaining naming compatibility with OBO.
2 Exclusion of namespaces
In OBO format, it is possible to define more than one ontology in each file using the OBO namespace: tag (note that OBO namespaces have no connection with OWL namespaces). For instance, the Gene Ontology is actually composed of three independent ontologies defined in the same file (gene_ontology.obo) using three different namespaces (biological_process, molecular_function and cellular_component). When the OBO Converter Tab reads such a file, it collapses all the ontologies into one and stores the namespace information as an annotation, which may not be what the user intends. The OBO Converter Tab has a panel where the user can specify which namespaces should not be read from a file.
Example: In the gene_ontology.obo file, if the namespaces biological_process and molecular_function are selected (for not being read), only the cellular_component ontology will be read.
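The effect of excluding namespaces can be sketched with a small stand-alone filter. This is a simplified illustration in Python, not the Tab's implementation: the function name `read_terms` is made up, stanzas are assumed to be separated by blank lines, every term is assumed to carry its own namespace tag, and only the first value of each tag is kept.

```python
def read_terms(obo_text, excluded_namespaces=()):
    """Return the [Term] stanzas whose namespace is not excluded."""
    excluded = set(excluded_namespaces)
    terms = []
    for stanza in obo_text.strip().split("\n\n"):
        lines = [line.strip() for line in stanza.splitlines() if line.strip()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the file header and any non-term stanzas
        tags = {}
        for line in lines[1:]:
            key, sep, value = line.partition(":")
            if sep:
                # keep only the first value seen for each tag
                tags.setdefault(key.strip(), value.strip())
        if tags.get("namespace") in excluded:
            continue  # this term belongs to an excluded namespace
        terms.append(tags)
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0005575
name: cellular_component
namespace: cellular_component
"""

kept = read_terms(SAMPLE, excluded_namespaces=["biological_process", "molecular_function"])
print([term["id"] for term in kept])  # ['GO:0005575']
```

A real reader would also have to handle the default-namespace header tag and multi-valued tags, which this sketch ignores.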
3 Default namespace URI
It is the author's responsibility to set the default namespace URI for the OWL ontology, because that information is not available in the OBO format file. The OBO Converter Tab has a panel where the author can enter this URI. If the author does not enter one, whatever default URI Protégé is using becomes the default namespace URI for the OWL ontology.
Downloads:
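- Source code for OBO Converter Tab: http://smi-protege.stanford.edu/svn/obo-converter/trunk/
- Binaries for OBO Converter Tab: http://bioontology.org/tools/oboinowl/OBOConverter.zip
- Page at Protege Site: http://protege.cim3.net/cgi-bin/wiki.pl?OBOConverter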
OBO Explorer Protege tab:
The OWL format for OBO files uses anonymous nodes to represent definitions, synonyms, and DbxRefs, and the generic Protégé GUI components are not immediately suitable to display and edit them. The existing graphical components also do not allow the user to easily access or edit the lexical information associated with an OBO term. The OBO Explorer tab allows the user to do so in an interface that is similar to that of OBO-edit. This provides the user with the flexibility to edit these lexical features (such as synonyms and dbxrefs) in an intuitive manner.
OBO Explorer Tab: http://www.aiai.ed.ac.uk/project/cobra-ct/COBrA_downloads.htm
OboEdit OWL plugin
We have also added the ability to save an OBO format file as an OWL file from within OBOEdit. A development version of the OWL Export/Import plugin for OboEdit is available. Download the distribution file, unzip it and copy its contents to the <OboEdit>/extensions folder. Start OboEdit and you should find the option "OWL Adapter" both for loading ontologies (File->Load Terms...) and for saving them (File->Save as...).
OboEdit OWL plugin: http://smi-protege.stanford.edu/~nigam/OboEditPlugin.zip
If you have problems with big ontologies, try increasing the amount of memory available to OboEdit.
obo2owl xslt
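The oboxml_to_owl.xsl stylesheet now implements the OboInOwl mapping for obo-xml:
XSLT: http://geneontology.cvs.sourceforge.net/geneontology/go-dev/xml/xsl/oboxml_to_owl.xsl?view=log
This will be bundled with go-perl-0.07 (http://search.cpan.org/~cmungall/go-perl/) and higher.
Sample ontologies exported from obo to owl are available at http://www.berkeleybop.org/ontologies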
obo2owl and owl2obo using PERL
The onto-perl package implements the OboInOwl mapping. To run the Perl conversion scripts, first install the latest onto-perl distribution (http://search.cpan.org/~easr/) like any other CPAN module (its modules handle OBO and OWL ontology parsing, among other things). Then, to convert from OBO to OWL, run the obo2owl.pl script with the input OBO file as its argument. The output goes to standard output by default, so redirect it to the desired output file, as in the following line:
$ ./obo2owl.pl my_ontology.obo > my_ontology.owl
This conversion script has been tested with ontologies such as CCO (http://www.cellcycleontology.org) and GO (http://www.geneontology.org/). The resulting OWL files loaded successfully into Protege and were also checked with vowlidator (http://projects.semwebcentral.org/projects/vowlidator/) and SWeDE (http://owl-eclipse.projects.semwebcentral.org/).
Finally, to convert back from OWL to OBO, run the owl2obo.pl conversion script with the input OWL file as its argument. This script completes the conversion circuit (obo2owl <=> owl2obo):
$ ./owl2obo.pl my_ontology.owl > my_ontology.obo
N.B.: Input OBO files should follow the OBO spec 1.0 (http://www.geneontology.org/GO.format.obo-1_0.shtml) or 1.2 (http://www.geneontology.org/GO.format.obo-1_2.shtml). Output OBO files follow the 1.2 spec.
URIs
A separate page covers the mapping of OBO IDs to URIs: OboInOwl:URIs
Overview of Other Mapping efforts
We are aware that there are several other groups that have created an OBO to OWL mapping to address immediate needs of their research groups. We have compiled a summary of the various OBO to OWL conversion efforts that we are aware of. Email me (nigam .AT. stanford.edu) with additions/deletions as you come across them.
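Spreadsheet comparing the mappings: http://spreadsheets.google.com/ccc?key=pWN_4sBrd9l1Umn1LN8WuQQ
See also: Christine Golbreich and Ian Horrocks, "The OBO to OWL mapping, GO to OWL 1.1!", OWLED 2007, http://www.med.univ-rennes1.fr/lim/doc_175.pdf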
Progress Notes
I've overhauled the obo2owl mapping. I've pretty much followed Alan's recommendations (I made a lot of purely internal changes to the xslt too though which should make it much clearer). Hope these work for you Stuart. Sorry about the churn - but this will definitely be worth it in the end.
Example OWL file can be found here (also attached):
http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/examples/gotest.owl
(note that this example includes a cross-product example)
The OWL is generated from either of the following:
http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/examples/gotest.obo
http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/examples/gotest.obo-xml
The XSL can be found here:
http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/xsl/obo2owl.xsl
http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/xml/xsl/obo2owl_obo_in_owl_metamodel.xsl
The XSL actually serves as fairly reasonable documentation about what's going on - but we'll also come up with a friendlier description once it's finalised
You can convert the obo-xml directly with the xslt. If you want to convert from obo you'll need the latest version of go-perl (from cvs)
Here are the changes and things still pending:
- Adopted Alan Ruttenberg's metamodel changes (see obo-format list)
- split into 2 separate xsl files
- subset (ontology views) now more consistent with obo
  - the oboInOwl class is SubsetDef
  - this does not appear in the owl:Ontology section, it stands alone (subsets can be used across ontologies)
- namespace changes
  - the metamodel is now called oboInOwl (the format is owned by GO, so this maps to a GO URI)
  - the default ontology content namespace is now bioont (the URI for this will be some bioontologies.org URI)
  - slashes not hashes or underscores - example: rdf:about="oboContent/GO/0000001"
- fixed rdf:about/resource/ID issues - ID is never used - about and resource now used in correct places
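The "slashes not hashes or underscores" convention can be illustrated with a short Python sketch. It assumes that an OBO id always has the form PREFIX:LOCAL and that the relative base oboContent/ from the example is used; the function name and the `base` parameter are hypothetical, and the final URI scheme is still listed as undecided in the TODO items.

```python
def obo_id_to_uri(obo_id, base="oboContent/"):
    """Map an OBO id such as 'GO:0000001' to a slash-separated URI."""
    prefix, sep, local = obo_id.partition(":")
    if not sep or not prefix or not local:
        raise ValueError("expected an id of the form PREFIX:LOCAL, got %r" % obo_id)
    # Slashes separate base, id space and local id (no '#' or '_').
    return base + prefix + "/" + local


print(obo_id_to_uri("GO:0000001"))  # oboContent/GO/0000001
```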
CHECKED
- validates as DL in http://phoebus.cs.man.ac.uk:9999/OWL/Validator
- works in SWOOP
- works in Protege-OWL (but looks odd)
TODO
- do we need an equivalentClass for intersectionOf? SWOOP saves this without
- decide on final URI scheme
  - Can we make the URIs less verbose? Use entities - or is this frowned on?
- new obo tags for obsoletion handling obsoletes
- decide on whether the oboInOwl metamodel should be exported as part of the content export, or linked to separately; and if linked to separately, do we need an owl:imports?