LexGrid 2008 and OWL
General notes:
- We agreed that an OWL-LexGrid-OWL round trip is not required for BioPortal purposes. This decision simplifies some of the handling of imported ontologies.
- We understand that while the model seems to account for many OWL features, some critical ones have not been validated yet. We see validation as using either the API or the UI (not necessarily both) to access the stored information. Some of the information has not been stored at all yet. We all agree that validation through the API access (and, preferably, UI) of all features of the model is required to ensure that the information is represented faithfully.
Major features that the model does not seem to handle yet
Namespaces versus imported (composite) ontologies
This problem is probably the most serious one and the hardest to address in the current model. It has two parts:
- The model does not make a distinction between a namespace as a unit and an ontology as a unit. In reality, there is no one-to-one correspondence between an ontology and a namespace.
- The model assumes that a namespace prefix is unique across all the coding schemes and uses the namespace prefix as a coding scheme id. This approach will fail in BioPortal, as many ontologies import the same ontologies (but, perhaps, different versions of them) with the same namespace prefix. For example, BIRNLex and OBI import some of the same ontologies. It seems that the current model will not be able to handle such a case.
Example: ontology 1:
... xmlns:p1="http://www.owl-ontologies.com/Ontology1222730448.owl">
  <owl:Ontology rdf:about="http://www.owl-ontologies.com/ontology1.owl">
    <owl:imports rdf:resource="http://www.owl-ontologies.com/Ontology1222730448.owl"/>
  </owl:Ontology>
...
ontology 2:
... xmlns:p1="http://www.owl-ontologies.com/Ontologyabc.owl">
  <owl:Ontology rdf:about="http://www.owl-ontologies.com/ontology2.owl">
    <owl:imports rdf:resource="http://www.owl-ontologies.com/Ontologyabc.owl"/>
  </owl:Ontology>
...
Or the same example without the imports
--Hsolbrig 09:38, 15 October 2008 (PDT)
Namespace as a unit and ontology as a unit
Namespace prefix issues
LexGrid supports a local name to URI mapping mechanism that is similar to that used in XML. Using the example above, the resulting codingScheme declaration would be:
<codingSchemes dc="codingSchemes">
  <codingScheme codingScheme="ontology1" formalName="Ontology One" registeredName="http://www.owl-ontologies.com/ontology1.owl" defaultLanguage="en" representsVersion="UNK">
    <localName>ontology1</localName>
    <mappings dc="mappings">
      <supportedCodingScheme isImported="true" localId="p1" urn="http://www.owl-ontologies.com/Ontology1222730448.owl"/>
      <supportedCodingScheme isImported="false" localId="ontology1" urn="http://www.owl-ontologies.com/ontology1.owl"/>
    </mappings>
  </codingScheme>
  <codingScheme codingScheme="ontology2" formalName="Ontology Two" registeredName="http://www.owl-ontologies.com/ontology2.owl" defaultLanguage="en" representsVersion="UNK">
    <localName>ontology2</localName>
    <mappings dc="mappings">
      <supportedCodingScheme isImported="true" localId="p1" urn="http://www.owl-ontologies.com/Ontologyabc.owl"/>
      <supportedCodingScheme isImported="false" localId="ontology2" urn="http://www.owl-ontologies.com/ontology2.owl"/>
    </mappings>
  </codingScheme>
</codingSchemes>
The local identifiers only apply within the scope of the containing coding scheme. The identifier, "p1", has no significance outside of the scope of the containing coding scheme, and needs to be translated into the corresponding URI for external reference. The same approach applies to all local identifiers as well. As an example:
<codingScheme codingScheme="ontology1" formalName="Ontology One" registeredName="http://www.owl-ontologies.com/ontology1.owl" defaultLanguage="en" representsVersion="UNK">
  <localName>ontology1</localName>
  <mappings dc="mappings">
    <supportedLanguage localId="en" urn="urn:oid:2.16.840.1.113883.6.84:en"/>
  </mappings>
</codingScheme>
<codingScheme codingScheme="ontology2" formalName="Ontology Two" registeredName="http://www.owl-ontologies.com/ontology2.owl" defaultLanguage="en-US" representsVersion="UNK">
  <localName>ontology2</localName>
  <mappings dc="mappings">
    <supportedLanguage localId="en-US" urn="urn:oid:2.16.840.1.113883.6.84:en"/>
    <supportedLanguage localId="eng" urn="urn:oid:2.16.840.1.113883.6.84:en"/>
  </mappings>
</codingScheme>
ontology1 uses "en" to identify the English language, as defined by IETF RFC 1766. ontology2 uses both "en-US" and "eng" to refer to the same language.
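The scoping rule can be sketched as a lookup that is keyed by coding scheme first and by local identifier second. The Python below is only an illustration under that assumption, not the LexGrid API: the `MAPPINGS` table and the `resolve` function are hypothetical names, filled in with the supportedCodingScheme entries from the declaration above.

```python
# Hypothetical in-memory stand-in for the supportedCodingScheme mappings;
# this is NOT the LexGrid API, just an illustration of the scoping rule.
MAPPINGS = {
    "ontology1": {
        "p1": "http://www.owl-ontologies.com/Ontology1222730448.owl",
        "ontology1": "http://www.owl-ontologies.com/ontology1.owl",
    },
    "ontology2": {
        "p1": "http://www.owl-ontologies.com/Ontologyabc.owl",
        "ontology2": "http://www.owl-ontologies.com/ontology2.owl",
    },
}


def resolve(coding_scheme: str, local_id: str) -> str:
    """Translate a local identifier into its URI within one coding scheme."""
    try:
        return MAPPINGS[coding_scheme][local_id]
    except KeyError:
        raise KeyError(
            f"{local_id!r} is not declared in coding scheme {coding_scheme!r}"
        ) from None


# The same local identifier resolves differently in each containing scheme.
print(resolve("ontology1", "p1"))
print(resolve("ontology2", "p1"))
```

Because the lookup never consults a global prefix table, the two "p1" declarations cannot collide.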
There are a couple of specific issues that need to be addressed when it comes to coding scheme inclusion:
- Two or more ontologies are imported and each ontology refers to the same URI with a different local name.
- Two or more ontologies are imported and each ontology refers to a different URI with the same local name.
Same URI, different local name
Take, as an example, http://wiktology.com/rdf/2008/PR-loon-guide/BadFood#, which references the Food Ontology as "feed", and http://wiktology.com/rdf/2008/PR-loon-guide/GoodFood#, which references the same ontology as "fud". A question arises about how these resources should be referenced when they are imported by a third ontology (e.g. http://wiktology.com/rdf/2008/PR-loon-guide/AllFood#).
As near as we can determine, the current version of Protege 4 (Version 4.0 Build 64) does not use local identifiers within the editor itself: all references to classes, individuals and axioms are constructed in terms of the URI. The Ontology Imports tab appears to use the local identifier from the source ontology. We haven't determined what it does if there isn't a local name in the source. The RDF/XML Rendering tab uses the local name assigned to the resource in the immediately containing ontology.
The equivalent LexGrid rendering *should* follow the same pattern as above, although this still needs to be verified.
Same local name, different URI
The same approach as above applies, except that http://wiktology.com/rdf/2008/PR-loon-guide/BadBeer# uses the local name "thefood" to refer to http://wiktology.com/rdf/2008/PR-loon-guide/BadFood#, and http://wiktology.com/rdf/2008/PR-loon-guide/GoodBeer# uses the local name "thefood" to refer to http://wiktology.com/rdf/2008/PR-loon-guide/GoodFood#. The results are the same as above: Protege 4 appears to use the URI internally and resolves the local name when producing a rendering.
The equivalent LexGrid rendering *should* follow the same pattern as above, although this still needs to be verified.
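The pattern described in both cases (identify resources by URI internally, and apply local names only when rendering for a particular ontology) can be sketched as follows. This is a hedged illustration, not Protege or LexGrid code: `DECLARED`, `to_uri` and `to_qname` are hypothetical names, the class names in the usage lines are invented, and `FOOD_URI` is a placeholder because the URI of the Food Ontology is not given here.

```python
BASE = "http://wiktology.com/rdf/2008/PR-loon-guide/"
FOOD_URI = "http://example.org/FoodOntology#"  # placeholder, not from the source

# Local names declared by each ontology: {ontology URI: {local name: target URI}}
DECLARED = {
    BASE + "BadFood#": {"feed": FOOD_URI},
    BASE + "GoodFood#": {"fud": FOOD_URI},
    BASE + "BadBeer#": {"thefood": BASE + "BadFood#"},
    BASE + "GoodBeer#": {"thefood": BASE + "GoodFood#"},
}


def to_uri(ontology: str, qname: str) -> str:
    """Expand 'prefix:name' using only the local names declared by one ontology."""
    prefix, name = qname.split(":", 1)
    return DECLARED[ontology][prefix] + name


def to_qname(ontology: str, uri: str) -> str:
    """Render a URI with the local name declared by the containing ontology, if any."""
    for prefix, target in DECLARED[ontology].items():
        if uri.startswith(target):
            return f"{prefix}:{uri[len(target):]}"
    return uri  # no local name declared here: fall back to the full URI


# Same URI, different local names: both expand to one internal identifier.
print(to_uri(BASE + "BadFood#", "feed:Apple") == to_uri(BASE + "GoodFood#", "fud:Apple"))
# Same local name, different URIs: the expansions stay distinct.
print(to_uri(BASE + "BadBeer#", "thefood:Stale") != to_uri(BASE + "GoodBeer#", "thefood:Stale"))
```

Under this scheme an importing ontology never needs to reconcile the local names used by the ontologies it imports; it only needs the URIs.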
Note: We attempted to read the various "loon" files above with Protege 3.3.1, but weren't able to get anything useful to happen.
The equivalent LexGrid rendering of these examples should be:
Inheritance
The issue is not so much with the model but with the need to develop an API that handles inheritance. Currently, the LexGrid API cannot provide any inherited properties or restrictions for a class. There are two possible approaches:
- have LexGrid pre-compute all the inheritance relationships at loading time and store them;
- find the inherited relationships on the fly, when returning information about a class through the LexGrid API.
The first option is likely to increase the required disk space by an order of magnitude (from our experience with NCIT), but will allow using Protege to determine the inheritance relationships. The second option will require non-trivial processing on the API side, as our experience with Protege shows. Determining all the inheritance correctly is harder than it seems. Note that we need inheritance of restrictions, as well as properties for which a class is a domain (see the next item).
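To make the second option concrete, the sketch below walks the asserted superclass links at query time and collects restrictions along the way. It is a deliberately simplified illustration: the data, `ancestors` and `restrictions` are hypothetical, and it follows asserted subclass links only, so it does none of the reasoning that makes real OWL inheritance hard.

```python
# Hypothetical toy data; class names and restriction strings are illustrative only.
SUPERCLASSES = {
    "MargheritaPizza": ["NamedPizza"],
    "NamedPizza": ["Pizza"],
    "Pizza": ["Food"],
    "Food": [],
}
ASSERTED_RESTRICTIONS = {
    "Pizza": ["hasBase some PizzaBase"],
    "MargheritaPizza": ["hasTopping some TomatoTopping"],
}


def ancestors(cls):
    """Return all direct and indirect superclasses of cls.

    Tolerates multiple parents and cycles by tracking visited classes.
    """
    seen, stack = [], list(SUPERCLASSES.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(SUPERCLASSES.get(parent, []))
    return seen


def restrictions(cls):
    """Asserted restrictions on cls followed by those inherited from its ancestors."""
    result = list(ASSERTED_RESTRICTIONS.get(cls, []))
    for parent in ancestors(cls):
        for restriction in ASSERTED_RESTRICTIONS.get(parent, []):
            if restriction not in result:
                result.append(restriction)
    return result


print(restrictions("MargheritaPizza"))
```

The first option would run the same traversal once at load time and store the results, trading disk space for query-time work.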
--Hsolbrig 09:43, 15 October 2008 (PDT) An additional issue is determining which classification algorithm to use. Not only are there multiple possibilities within the OWL-DL space, but LexGrid can also represent things like the UMLS Semantic Net, which includes "blocking" and other entertaining features. That said, there is an issue that needs to be resolved if the pre-classified approach is to be taken - we need to know which axioms were asserted and which were inferred. This is important both from a browsing and editing perspective as well as necessary for export. The existing OWL specification has no way to differentiate this. Perhaps it would be worthwhile to provide some sort of classifier plug-in that could be invoked at load-time to address this problem.
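One way to keep asserted and inferred axioms apart in a pre-classified store is to carry a flag on every stored axiom. The sketch below assumes that approach; a plain transitive closure stands in for a real classifier, and `SubclassAxiom`, `materialize` and `export` are hypothetical names rather than part of LexGrid or OWL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubclassAxiom:
    child: str
    parent: str
    inferred: bool  # False = asserted in the source, True = added at load time


def materialize(asserted_pairs):
    """Return the asserted axioms plus their transitive closure, flagged as inferred."""
    asserted = set(asserted_pairs)
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return sorted(
        (SubclassAxiom(c, p, (c, p) not in asserted) for c, p in closure),
        key=lambda axiom: (axiom.child, axiom.parent),
    )


def export(axioms):
    """Write back only what the source asserted."""
    return [(axiom.child, axiom.parent) for axiom in axioms if not axiom.inferred]


stored = materialize([("A", "B"), ("B", "C")])
print(stored)
print(export(stored))
```

With the flag in place, a browser can show everything while an exporter or editor works from the asserted subset only.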
Properties with no domains and ranges
The LexGrid/OWL specification says:
- (OWLObjectProperty) "An association between two classes (hasDomain, hasRange)."
- (OWLDatatypeProperty) "An association between one class (domain) and one association (hasDomain and hasDataProperty). The conceptProperty defines the range."
However, both datatype properties and object properties can be defined without any domains and ranges (and often are).
--Hsolbrig 09:46, 15 October 2008 (PDT) The existence of domains, ranges or data properties is not a requirement for determining whether a Property is an ObjectProperty or a DatatypeProperty - the description in the mapping spreadsheet was misleading.
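Put differently, the kind of a property comes from its declared type, and its domains and ranges are optional extras. A minimal sketch of a record shaped that way follows; `PropertyKind`, `PropertyDecl`, `describe` and the property name `hasNote` are all hypothetical and do not reflect the LexGrid model's actual classes.

```python
from dataclasses import dataclass, field
from enum import Enum


class PropertyKind(Enum):
    OBJECT = "owl:ObjectProperty"
    DATATYPE = "owl:DatatypeProperty"


@dataclass
class PropertyDecl:
    name: str
    kind: PropertyKind  # taken from the declared type only
    domains: list = field(default_factory=list)  # may legitimately stay empty
    ranges: list = field(default_factory=list)   # may legitimately stay empty


def describe(prop):
    """Render a property without assuming that a domain or range exists."""
    domain = " or ".join(prop.domains) or "(unspecified)"
    range_ = " or ".join(prop.ranges) or "(unspecified)"
    return f"{prop.name} [{prop.kind.value}]: {domain} -> {range_}"


print(describe(PropertyDecl("hasNote", PropertyKind.DATATYPE)))
```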
Multiple domains and ranges
It was not clear from our discussion whether the model can handle multiple domains and ranges. Usually the required semantics is that of a union. Suppose a property P has two domains, A and B, and two ranges, C and D. When rendering details for the class A, BioPortal will need to display all the properties where A is in the domain, and show all the ranges for those properties. So, in this case, when asked for A's properties, we would expect to get the property P, with two ranges, C and D. Note that we would also want to get the same result for any subclass of A (see the inheritance point above).
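The expected behaviour can be written down as a small lookup. This is only a sketch of the requirement under the union reading described above, not of how LexGrid stores or returns the data; the property Q, the subclass A1, and the functions `self_and_ancestors` and `properties_for` are hypothetical additions for illustration.

```python
# Toy data mirroring the paragraph: P has domains A and B, and ranges C and D.
# Q and A1 are hypothetical additions used to exercise the lookup.
PROPERTIES = {
    "P": {"domains": {"A", "B"}, "ranges": {"C", "D"}},
    "Q": {"domains": {"B"}, "ranges": {"E"}},
}
SUPERCLASSES = {"A1": ["A"], "A": [], "B": []}  # A1 is a subclass of A


def self_and_ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, []))
    return seen


def properties_for(cls):
    """Map each property whose domain union covers cls to its sorted ranges.

    A class is covered if it, or any of its superclasses, is listed as a domain.
    """
    lineage = self_and_ancestors(cls)
    return {
        name: sorted(spec["ranges"])
        for name, spec in PROPERTIES.items()
        if spec["domains"] & lineage
    }


print(properties_for("A"))
print(properties_for("A1"))
```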
--Hsolbrig 09:47, 15 October 2008 (PDT) To the best of our knowledge, multiple domains and ranges work correctly. LexGrid doesn't add any semantics beyond what is expressed in the OWL itself. We will provide an example here in a few days.
--Jpathak 08:36, 16 October 2008 (CDT) Here is a small example which has been adapted from the Pizza ontology by introducing a new class Other_Spiciness and adding it to the range of the object property hasSpiciness. Here is a snippet from the OWL rendering:
<owl:Class rdf:ID="Other_Spiciness">
  <owl:equivalentClass>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:ID="Indian"/>
        <owl:Class rdf:ID="Mexican"/>
        <owl:Class rdf:ID="Thai"/>
      </owl:unionOf>
    </owl:Class>
  </owl:equivalentClass>
  <rdfs:subClassOf>
    <owl:Class rdf:about="#ValuePartition"/>
  </rdfs:subClassOf>
</owl:Class>
<owl:ObjectProperty rdf:about="#hasSpiciness">
  <rdfs:comment xml:lang="en">A property created to be used with the ValuePartition - Spiciness.</rdfs:comment>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Spiciness"/>
        <owl:Class rdf:about="#Other_Spiciness"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
</owl:ObjectProperty>
Then, Other_Spiciness, Mexican, Indian etc. become concepts in our codingScheme, and the domain and range of the property hasSpiciness are represented as follows in LexGrid:
Major features in the model that have not been validated yet
- Instances
- Subproperties
Major features that may already be in the API or the UI, but that we have not validated yet
- Distinction between annotation properties and restrictions or properties that have a class in the domain (these are different kinds of properties on a class and will need to be returned/rendered differently)
- Domains and ranges of properties (in the case where there is one domain and one range)
- Property values (for datatype properties in particular)
--Hsolbrig 09:50, 15 October 2008 (PDT) Item (1) above is of some concern. What we propose doing is to represent AnnotationProperties as "Properties" and DatatypeProperties as "associations". There are apparently some issues with this, however, when it comes to the API and its rendering. At the moment, we reflect all DatatypeProperties as both, which is not a good idea in the long run.
Features that are missing from the model but that are easy to add
- Distinction between necessary conditions and necessary-and-sufficient conditions for defined classes (an extra qualifier for the restriction)