Difference between revisions of "LexGrid 2008 and OWL"

From NCBO Wiki

Revision as of 11:16, 20 October 2008

General notes:

  • We agreed that an OWL-LexGrid-OWL round-trip is not required for BioPortal purposes. This decision simplifies some of the handling of imported ontologies.
  • We understand that while the model seems to account for many OWL features, some critical ones have not been validated yet. We see validation as using either the API or the UI (not necessarily both) to access the stored information. Some of the information has not been stored at all yet. We all agree that validation through the API access (and, preferably, UI) of all features of the model is required to ensure that the information is represented faithfully.


Major features that the model does not seem to handle yet

Namespaces versus imported (composite) ontologies

This problem is probably the most serious one and the hardest to address in the current model. It has two parts. The model does not make a distinction between a namespace as a unit and an ontology as a unit; in reality, there is no one-to-one correspondence between an ontology and a namespace. The model also assumes that a namespace prefix is unique across all the coding schemes and uses the namespace prefix as a coding scheme id. This approach will fail in BioPortal, as many ontologies import the same ontologies (but, perhaps, different versions of them) with the same namespace prefix. For example, BIRNLex and OBI import some of the same ontologies. It seems that the current model will not be able to handle such a case.

Example, ontology 1:

 ...
 xmlns:p1="http://www.owl-ontologies.com/Ontology1222730448.owl">
 <owl:Ontology rdf:about="http://www.owl-ontologies.com/ontology1.owl">
   <owl:imports rdf:resource="http://www.owl-ontologies.com/Ontology1222730448.owl"/>
 </owl:Ontology>
 ...

ontology 2:

 ...
 xmlns:p1="http://www.owl-ontologies.com/Ontologyabc.owl">
 <owl:Ontology rdf:about="http://www.owl-ontologies.com/ontology2.owl">
   <owl:imports rdf:resource="http://www.owl-ontologies.com/Ontologyabc.owl"/>
 </owl:Ontology>
 ...

Or the same example without the imports

--Hsolbrig 09:38, 15 October 2008 (PDT)

Namespace as a unit and ontology as a unit

Namespaces from the OWL 1.1 perspective

The OWL Ontology model (at least 1.1) has at least three namespaces associated with a given class, property or instance:

  1. The namespace of the entry identifier itself. The only official requirement for this is that it has to be a URI. Examples of valid URIs include:
  2. The ontology(s) that make assertions about the resource.
  3. The ontology(s) that import the resource - note that "import" is orthogonal to (2) above. An ontology can make assertions about any resource, whether or not it imports other resources.


Basic requirements

We can tease at least three requirements out of these namespaces:

  1. Requirement 1: a resource URI must be independent of the ontology(s) making assertions about the resource.
  2. Requirement 2: we need to be able to list all of the axioms that are known (about a given resource) from the perspective of a given ontology.
  3. Requirement 3: we need to be able to list all of the axioms that are directly asserted (about a given resource) by a given ontology.
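
The difference between Requirements 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how LexGrid stores anything: axioms are modelled as (subject, predicate, object) triples, "about a resource" is simplified to "has that resource as subject", and the ontology names A, B and C are hypothetical.

```python
# Hypothetical data: which ontology imports which, and which axioms each
# ontology asserts directly. None of these names come from LexGrid.
IMPORTS = {"A": ["B"], "B": ["C"]}
AXIOMS = {
    "A": [("x", "subClassOf", "y")],
    "B": [("x", "label", "X")],
    "C": [("z", "label", "Z"), ("x", "comment", "c")],
}


def import_closure(ontology, imports):
    """Return the ontology plus everything it imports, directly or indirectly."""
    seen, stack = set(), [ontology]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(imports.get(current, ()))
    return seen


def asserted_axioms(ontology, resource, axioms):
    """Requirement 3: axioms about the resource asserted directly by one ontology."""
    return [a for a in axioms.get(ontology, []) if a[0] == resource]


def known_axioms(ontology, resource, axioms, imports):
    """Requirement 2: axioms about the resource visible from one ontology,
    that is, asserted by it or by anything in its import closure."""
    result = []
    for member in sorted(import_closure(ontology, imports)):
        result.extend(asserted_axioms(member, resource, axioms))
    return result
```

With this data, asserted_axioms("A", "x", AXIOMS) returns only the subClassOf triple, while known_axioms("A", "x", AXIOMS, IMPORTS) also returns the label and comment triples contributed by B and C.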

LexGrid importing today

The LexGrid model was originally designed with the notion that the lexical characteristics of a class, property or individual would be asserted within the same resource that provides the namespace for the resource itself. The model did, however, allow a resource to make both ObjectProperty and DataProperty assertions about any resource, independent of where it was declared.

The latest LexGrid model added the notion of Entity.codingSchemeId, which partially addresses requirement 1 above. It still assumes that the URI can be partitioned into a namespace and an entry code, which may not always be the case. We believe, however, that a completely opaque resource identifier is something that we may declare to be out of scope.

The treatment of the semantics of CodingScheme inclusion is inconsistent, however.

At the moment, there are two different LexGrid Importers - an official one built into the LexBIG suite and a second one developed by Apelon for the UN Food and Agriculture Organization (FAO). The LexBIG importer currently renders every assertion and resource in a given ontology as a part of that ontology in a single coding scheme, which addresses Requirements 1 and 2, but ignores Requirement 3.

The importer that was written by Apelon only renders the assertions that are directly made about any resource, which addresses Requirement 3, but leaves Requirement 2 (partially) unaddressed.

Namespace prefix issues

LexGrid supports a local name to URI mapping mechanism that is similar to that used in XML. Using the example above, the resulting codingScheme declaration would be:

<codingSchemes dc="codingSchemes">
        <codingScheme codingScheme="ontology1" formalName="Ontology One" registeredName="http://www.owl-ontologies.com/ontology1.owl" defaultLanguage="en" representsVersion="UNK">
                <localName>ontology1</localName>
                <mappings dc="mappings">
                        <supportedCodingScheme isImported="true" localId="p1" urn="http://www.owl-ontologies.com/Ontology1222730448.owl"/>
                        <supportedCodingScheme isImported="false" localId="ontology1" urn="http://www.owl-ontologies.com/ontology1.owl"/>
                </mappings>
        </codingScheme>
        <codingScheme  codingScheme="ontology2" formalName="Ontology Two" registeredName="http://www.owl-ontologies.com/ontology2.owl" defaultLanguage="en" representsVersion="UNK">
                <localName>ontology2</localName>
                <mappings dc="mappings">
                        <supportedCodingScheme isImported="true" localId="p1" urn="http://www.owl-ontologies.com/Ontologyabc.owl"/>
                        <supportedCodingScheme isImported="false" localId="ontology2" urn="http://www.owl-ontologies.com/ontology2.owl"/>
                </mappings>
        </codingScheme>
</codingSchemes>

The local identifiers only apply within the scope of the containing coding scheme. The identifier, "p1", has no significance outside of the scope of the containing coding scheme, and needs to be translated into the corresponding URI for external reference. The same approach applies to all local identifiers as well. As an example:

        <codingScheme codingScheme="ontology1" formalName="Ontology One" registeredName="http://www.owl-ontologies.com/ontology1.owl" defaultLanguage="en" representsVersion="UNK">
                <localName>ontology1</localName>
                <mappings dc="mappings">
                        <supportedLanguage localId="en" urn="urn:oid:2.16.840.1.113883.6.84:en"/>
                </mappings>
        </codingScheme>
        <codingScheme  codingScheme="ontology2" formalName="Ontology Two" registeredName="http://www.owl-ontologies.com/ontology2.owl" defaultLanguage="en-US" representsVersion="UNK">
                <localName>ontology2</localName>
                <mappings dc="mappings">
                        <supportedLanguage localId="en-US" urn="urn:oid:2.16.840.1.113883.6.84:en"/>
                        <supportedLanguage localId="eng" urn="urn:oid:2.16.840.1.113883.6.84:en"/>
                </mappings>
        </codingScheme>

ontology1 uses "en" to identify the English language, as identified by IETF RFC 1766. ontology2 uses both "en-US" and "eng" to reference the same thing.
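
The scoping rule can be illustrated with a minimal Python sketch. It assumes the mappings sections above have been loaded into a plain dictionary keyed by coding scheme; the names MAPPINGS, resolve and same_resource are hypothetical and are not part of the LexGrid API.

```python
# Hypothetical in-memory form of the <mappings> sections shown above.
# Keys are coding scheme names; values map a local identifier to its URI.
MAPPINGS = {
    "ontology1": {
        "p1": "http://www.owl-ontologies.com/Ontology1222730448.owl",
        "ontology1": "http://www.owl-ontologies.com/ontology1.owl",
        "en": "urn:oid:2.16.840.1.113883.6.84:en",
    },
    "ontology2": {
        "p1": "http://www.owl-ontologies.com/Ontologyabc.owl",
        "ontology2": "http://www.owl-ontologies.com/ontology2.owl",
        "en-US": "urn:oid:2.16.840.1.113883.6.84:en",
        "eng": "urn:oid:2.16.840.1.113883.6.84:en",
    },
}


def resolve(coding_scheme, local_id):
    """Translate a local identifier into its URI within one coding scheme."""
    try:
        return MAPPINGS[coding_scheme][local_id]
    except KeyError:
        raise KeyError(
            f"{local_id!r} is not declared in {coding_scheme!r}"
        ) from None


def same_resource(scheme_a, id_a, scheme_b, id_b):
    """Two local identifiers denote the same thing only if their URIs match."""
    return resolve(scheme_a, id_a) == resolve(scheme_b, id_b)
```

Under these assumptions, "p1" resolves to a different URI in each coding scheme, while "en" in ontology1 and "eng" in ontology2 compare as the same resource.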

There are a couple of specific issues that need to be addressed when it comes to coding scheme inclusion:

  1. Two or more ontologies are imported and each ontology refers to the same URI with a different local name.
  2. Two or more ontologies are imported and each ontology refers to a different URI with the same local name.
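
Both situations can be detected mechanically once the prefix declarations of the imported ontologies are collected in one place. The following Python sketch shows one way to do that; the function name find_conflicts, the input shape and the example.org URIs are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical prefix declarations: {ontology URI: {local name: target URI}}.
EXAMPLE = {
    "http://example.org/BadFood#": {"feed": "http://example.org/Food#"},
    "http://example.org/GoodFood#": {"fud": "http://example.org/Food#"},
    "http://example.org/BadBeer#": {"thefood": "http://example.org/BadFood#"},
    "http://example.org/GoodBeer#": {"thefood": "http://example.org/GoodFood#"},
}


def find_conflicts(prefix_maps):
    """Return two dicts: URIs known under several local names, and
    local names bound to several URIs, across all given ontologies."""
    names_by_uri = defaultdict(set)
    uris_by_name = defaultdict(set)
    for mapping in prefix_maps.values():
        for local_name, uri in mapping.items():
            names_by_uri[uri].add(local_name)
            uris_by_name[local_name].add(uri)
    same_uri_different_name = {
        uri: sorted(names) for uri, names in names_by_uri.items() if len(names) > 1
    }
    same_name_different_uri = {
        name: sorted(uris) for name, uris in uris_by_name.items() if len(uris) > 1
    }
    return same_uri_different_name, same_name_different_uri
```

A loader could run such a check over the import closure and fall back to full URIs whenever either dictionary is non-empty.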

Same URI, different local name

Take, as an example, http://wiktology.com/rdf/2008/PR-loon-guide/BadFood#, which references the Food Ontology as "feed", and http://wiktology.com/rdf/2008/PR-loon-guide/GoodFood#, which references the same ontology as "fud". A question arises about how these resources should be referenced when they are imported by a third ontology (e.g. http://wiktology.com/rdf/2008/PR-loon-guide/AllFood#).

As near as we can determine, the current version of Protege 4 (Version 4.0 Build 64) doesn't use local identifiers within the editor itself: internally, all references to classes, individuals and axioms are constructed in terms of the URI itself. The Ontology Imports tab appears to use the local identifier from the source ontology. We haven't determined what it does if there isn't a local name in the source. The RDF/XML Rendering tab uses the local name assigned to the resource in the immediately containing ontology.

The equivalent LexGrid rendering *should* follow the same pattern as above, although this still needs to be verified.

Same local name, different URI

The same approach as above, except that http://wiktology.com/rdf/2008/PR-loon-guide/BadBeer# uses the local name "thefood" to refer to http://wiktology.com/rdf/2008/PR-loon-guide/BadFood#, and http://wiktology.com/rdf/2008/PR-loon-guide/GoodBeer# uses the local name "thefood" to refer to http://wiktology.com/rdf/2008/PR-loon-guide/GoodFood#. The results are the same as above - Protege 4 appears to use the URI internally and resolves the local name when producing a rendering.

The equivalent LexGrid rendering *should* follow the same pattern as above, although this still needs to be verified.

Note: We attempted to read the various "loon" files above with Protege 3.3.1, but weren't able to get anything useful to happen.

The equivalent LexGrid rendering of these examples should be:

Inheritance

The issue is not so much with the model but with the need to develop an API that handles inheritance. Currently, the LexGrid API cannot provide any inherited properties or restrictions for a class. There are two possible approaches:

  • have LexGrid pre-compute all the inheritance relationships at loading time and store them;
  • find the inherited relationships on the fly, when returning information about a class through the LexGrid API.

The first option is likely to increase the required disk space by an order of magnitude (from our experience with NCIT), but will allow using Protege to determine the inheritance relationships. The second option will require non-trivial processing on the API side, as our experience with Protege shows. Determining all the inheritance correctly is harder than it seems. Note that we need inheritance of restrictions, as well as of properties for which a class is a domain (see the next item).

--Hsolbrig 09:43, 15 October 2008 (PDT) An additional issue is determining which classification algorithm to use. Not only are there multiple possibilities within the OWL-DL space, but LexGrid can also represent things like the UMLS Semantic Net, which includes "blocking" and other entertaining features. That said, there is an issue that needs to be resolved if the pre-classified approach is to be taken - we need to know which axioms were asserted and which were inferred. This is important both from a browsing and editing perspective as well as necessary for export. The existing OWL specification has no way to differentiate this. Perhaps it would be worthwhile to provide some sort of classifier plug-in that could be invoked at load-time to address this problem.
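
To make the asserted-versus-inferred distinction concrete, here is a small Python sketch of the on-the-fly option. It is deliberately naive: it only walks asserted subclass links, so it is not an OWL-DL classifier and does not model UMLS-style "blocking". The function names, the data layout and the class hierarchy used in the example are illustrative assumptions.

```python
# Illustrative data only: asserted superclass links and asserted restrictions,
# each restriction written as a (property, qualifier, filler) tuple.
PARENTS = {
    "OnionTopping": ["VegetableTopping"],
    "VegetableTopping": ["PizzaTopping"],
}
RESTRICTIONS = {
    "OnionTopping": [("hasSpiciness", "someValuesFrom", "Medium")],
    "PizzaTopping": [("isToppingOf", "someValuesFrom", "Pizza")],
}


def ancestors(cls, parents):
    """All superclasses of cls reachable through asserted subclass links."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def restrictions_for(cls, parents, restrictions):
    """Return (restriction, origin) pairs, where origin is 'asserted' for
    restrictions stated on cls itself and 'inherited' for those that come
    from an ancestor. Each restriction is reported once."""
    result = [(r, "asserted") for r in restrictions.get(cls, ())]
    seen = {r for r, _ in result}
    for ancestor in sorted(ancestors(cls, parents)):
        for r in restrictions.get(ancestor, ()):
            if r not in seen:
                seen.add(r)
                result.append((r, "inherited"))
    return result
```

A pre-computing loader could store the same (restriction, origin) pairs at load time, which would keep the information needed for editing and export.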

Properties with no domains and ranges

The LexGrid/OWL specification says:

  • (OWLObjectProperty) "An association between two classes (hasDomain, hasRange)."
  • (OWLDatatypeProperty) "An association between one class (domain) and one association (hasDomain and hasDataProperty). The conceptProperty defines the range."

However, both datatype properties and object properties can be defined without any domains and ranges (and often are).

--Hsolbrig 09:46, 15 October 2008 (PDT) The existence of domains and ranges or data properties is not a requirement for determining whether a Property is an ObjectProperty or DatatypeProperty - the description in the mapping spreadsheet was misleading.

--Natasha Noy The question here was whether an association can exist without any source and target.

--Jpathak 15:20, 16 October 2008 (CDT) Absolutely, and here is an example where both the domain and range of the property hasIngredient are empty:

<owl:ObjectProperty rdf:ID="hasIngredient">
    <rdfs:comment xml:lang="en">NB Transitive - the ingredients of ingredients are ingredients of the whole</rdfs:comment>
    <owl:inverseOf>
      <owl:ObjectProperty rdf:about="#isIngredientOf"/>
    </owl:inverseOf>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#TransitiveProperty"/>
</owl:ObjectProperty>
<lgRel:association codingSchemeId="rdfs" id="hasDomain" forwardName="hasDomain" isReflexive="false" isSymmetric="false" isTransitive="true">
  <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasIngredient">
        <lgRel:targetEntity targetEntityType="concept" targetId=""/>
  </lgRel:sourceEntity>
</lgRel:association>

<lgRel:association codingSchemeId="rdfs" id="hasRange" forwardName="hasRange" isReflexive="false" isSymmetric="false" isTransitive="false">
  <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasIngredient">
        <lgRel:targetEntity targetEntityType="concept" targetId=""/>
   </lgRel:sourceEntity>
</lgRel:association>

Multiple domains and ranges

It was not clear from our discussion if the model can handle multiple domains and ranges. Usually the required semantics is that of a union. Suppose a property P has two domains, A and B, and two ranges, C and D. When rendering details for the class A, BioPortal will need to display all the properties where A is in the domain, and show all the ranges for those properties. So, in this case, when asked for A's properties, we would expect to get the property P, with two ranges, C and D. Note that we would also want to get that for any subclass of A (see the inheritance point above).
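
The expected behaviour can be written down as a short Python sketch. It encodes the union reading described above, which is a display convention assumed here rather than something taken from the LexGrid API, and the names properties_for, PARENTS, DOMAINS and RANGES are hypothetical.

```python
# Hypothetical data for the scenario above: P has domains A and B and
# ranges C and D; A1 is a subclass of A; Q applies only to B.
PARENTS = {"A1": ["A"]}
DOMAINS = {"P": ["A", "B"], "Q": ["B"]}
RANGES = {"P": ["D", "C"], "Q": ["E"]}


def ancestors(cls, parents):
    """All superclasses of cls reachable through asserted subclass links."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def properties_for(cls, parents, domains, ranges):
    """Map every property whose domain list contains cls or one of its
    ancestors to the sorted list of all ranges declared for that property."""
    applicable = {cls} | ancestors(cls, parents)
    return {
        prop: sorted(ranges.get(prop, ()))
        for prop, declared in sorted(domains.items())
        if applicable & set(declared)
    }
```

Here properties_for("A", PARENTS, DOMAINS, RANGES) yields P with ranges C and D, and the subclass A1 gets the same answer. Properties declared without any domain are not returned by this sketch; how they should be displayed is a separate decision.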

--Hsolbrig 09:47, 15 October 2008 (PDT) To the best of our knowledge, multiple domains and ranges work correctly. LexGrid doesn't add any semantics beyond what is expressed in the OWL itself. We will provide an example here in a few days.

--Jpathak 08:36, 16 October 2008 (CDT) Here is a small example which has been adapted from the Pizza ontology ([1]) by introducing a new class Other_Spiciness and adding it to the range of the object property hasSpiciness. Here is a snippet from the OWL rendering:


<owl:Class rdf:ID="Other_Spiciness">
    <owl:equivalentClass>
      <owl:Class>
        <owl:unionOf rdf:parseType="Collection">
          <owl:Class rdf:ID="Indian"/>
          <owl:Class rdf:ID="Mexican"/>
          <owl:Class rdf:ID="Thai"/>
       </owl:unionOf>
      </owl:Class>
    </owl:equivalentClass>
    <rdfs:subClassOf>
      <owl:Class rdf:about="#ValuePartition"/>
    </rdfs:subClassOf>
</owl:Class>

<owl:ObjectProperty rdf:about="#hasSpiciness">
    <rdfs:comment xml:lang="en">A property created to be used with the ValuePartition - Spiciness.</rdfs:comment>
    <rdfs:range>
      <owl:Class>
        <owl:unionOf rdf:parseType="Collection">
          <owl:Class rdf:about="#Spiciness"/>
          <owl:Class rdf:about="#Other_Spiciness"/>
        </owl:unionOf>
      </owl:Class>
    </rdfs:range>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
</owl:ObjectProperty>

Then, Other_Spiciness, Mexican, Indian etc. become concepts in our codingScheme, and the domain and range of the property hasSpiciness is represented via the associations hasDomain and hasRange, respectively, as follows in LexGrid:


<lgRel:association codingSchemeId="rdfs" id="hasDomain" forwardName="hasDomain" isReflexive="false" isSymmetric="false" isTransitive="true">
   <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasSpiciness">
        <lgRel:targetEntity targetEntityType="concept" targetId=""/>
   </lgRel:sourceEntity>
</lgRel:association>

<lgRel:association codingSchemeId="rdfs" id="hasRange" forwardName="hasRange" isReflexive="false" isSymmetric="false" isTransitive="false">
   <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasSpiciness">
        <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="concept" targetId="Spiciness"/>
        <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="concept" targetId="Other_Spiciness"/>
   </lgRel:sourceEntity>
</lgRel:association>


Furthermore, if there is an OWL class defined as: OnionTopping that (hasSpiciness some Medium), it will be represented as:

<lgRel:association codingSchemeId="pizza" id="hasSpiciness" forwardName="hasSpiciness" isFunctional="true" isReverseFunctional="false" isSymmetric="false" isTransitive="false">
    <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="concept" sourceId="OnionTopping">
       <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="concept" targetId="Medium">
         <lgRel:associationQualification associationQualifier="owl:someValuesFrom"/>
       </lgRel:targetEntity>
    </lgRel:sourceEntity>

  <lgRel:associationProperty propertyId="P0015" propertyName="isDatatypeProperty">
         <lgCommon:text>false</lgCommon:text>
  </lgRel:associationProperty>
  <lgRel:associationProperty propertyId="P0016" propertyName="isObjectProperty">
         <lgCommon:text>true</lgCommon:text>
  </lgRel:associationProperty>
</lgRel:association>

Major features in the model that have not been validated yet

  1. Instances
  2. Subproperties

--Jpathak 10:10, 16 October 2008 (CDT) We address the above items as follows:

  • For instances, assume that the class Country in the Pizza ontology has been modified as follows (by adding a new hasShortName datatype property):
<owl:Class rdf:ID="Country">
    <owl:equivalentClass>
      <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
          <owl:Class>
            <owl:oneOf rdf:parseType="Collection">
              <Country rdf:ID="America">
                <hasShortName xml:lang="en">USA</hasShortName>
                <hasCapital>
                  <Capital rdf:ID="Washington_DC"/>
                </hasCapital>
              </Country>
              <Country rdf:ID="England">
                <hasShortName xml:lang="en">UK</hasShortName>
                <hasCapital>
                  <Capital rdf:ID="London"/>
                </hasCapital>
              </Country>
              <Country rdf:ID="France">
                <hasShortName xml:lang="en">FR</hasShortName>
                <hasCapital>
                  <Capital rdf:ID="Paris"/>
                </hasCapital>
              </Country>
              <Country rdf:ID="Germany">
                <hasShortName xml:lang="en">GDR</hasShortName>
                <hasCapital>
                  <Capital rdf:ID="Berlin"/>
                </hasCapital>
              </Country>
              <Country rdf:about="#Italy"/>
            </owl:oneOf>
          </owl:Class>
          <owl:Class rdf:about="#DomainConcept"/>
        </owl:intersectionOf>
      </owl:Class>
    </owl:equivalentClass>
    <rdfs:label xml:lang="pt">Pais</rdfs:label>
    <rdfs:comment xml:lang="en">A class that is equivalent to the set of individuals that are described in the enumeration - ie Countries can only be either America, England, France, Germany or Italy and nothing else. Note that these individuals have been asserted to be allDifferent from each other.</rdfs:comment>
</owl:Class>

Then, the information about an individual, such as Germany, will be represented in LexGrid as follows:

<instances dc="instances">
 <lgCon:instance codingSchemeId="pizza" id="Germany">
      <lgCommon:entityDescription>Germany</lgCommon:entityDescription>
      <lgCon:instanceProperty propertyId="P0001" propertyName="isInstanceOf">
        <lgCommon:text>Country</lgCommon:text>
      </lgCon:instanceProperty>
      <lgCon:instanceProperty propertyId="P0002" propertyName="hasShortName">
        <lgCommon:text>GDR</lgCommon:text>
      </lgCon:instanceProperty>
      <lgCon:instanceProperty propertyId="P0003" propertyName="hasCapital">
        <lgCommon:text>Berlin</lgCommon:text>
      </lgCon:instanceProperty>
 </lgCon:instance>
</instances>
  • For sub-properties, again let's take an OWL snippet from the Pizza ontology:
  <owl:ObjectProperty rdf:about="#hasBase">
    <rdfs:range rdf:resource="#PizzaBase"/>
    <rdfs:domain rdf:resource="#Pizza"/>
    <owl:inverseOf>
      <owl:ObjectProperty rdf:ID="isBaseOf"/>
    </owl:inverseOf>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#InverseFunctionalProperty"/>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
    <rdfs:subPropertyOf rdf:resource="#hasIngredient"/>
  </owl:ObjectProperty>

This information about the property hasBase will be represented in LexGrid within the relations container as:

<relations dc="associations">
  <lgRel:association codingSchemeId="rdfs" id="subPropertyOf" forwardName="subPropertyOf" isFunctional="false" isReflexive="true" isSymmetric="false" isTransitive="true">
      <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasBase">
        <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="association" targetId="hasIngredient"/>
      </lgRel:sourceEntity>
   </lgRel:association>
   <lgRel:association codingSchemeId="rdfs" id="hasDomain" forwardName="hasDomain" isReflexive="false" isSymmetric="false" isTransitive="true">
      <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasBase">
        <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="concept" targetId="Pizza"/>
      </lgRel:sourceEntity>
   </lgRel:association>
   <lgRel:association codingSchemeId="rdfs" id="hasRange" forwardName="hasRange" isReflexive="false" isSymmetric="false" isTransitive="false">
      <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasBase">
        <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="concept" targetId="PizzaBase"/>
      </lgRel:sourceEntity>
   </lgRel:association>
    <lgRel:association codingSchemeId="owl" id="inverseOf" forwardName="inverseOf" isFunctional="false" isReflexive="true" isSymmetric="true" isTransitive="true" reverseName="inverseOf">
      <lgRel:sourceEntity sourceCodingScheme="pizza" sourceEntityType="association" sourceId="hasBase">
        <lgRel:targetEntity targetCodingScheme="pizza" targetEntityType="association" targetId="isBaseOf"/>
      </lgRel:sourceEntity>
    </lgRel:association>
</relations>

Note that additional information about hasBase (e.g., it is an InverseFunctionalProperty and a FunctionalProperty) is not displayed above (you can find this information from the XML if you download the ZIP file: [2].)

Major features that may already be in the API or the UI, but we did not validate yet

  1. Distinction between annotation properties and restrictions or properties that have a class in the domain (these are different properties of a class and will need to be returned/rendered differently)
  2. Domains and ranges of properties (in the case where there is one domain and one range)
  3. Property values (for datatype properties in particular)

--Hsolbrig 09:50, 15 October 2008 (PDT) Item (1) above is of some concern. What we propose doing is to represent AnnotationProperties as "Properties" and DatatypeProperties as "associations". There are apparently some issues with this, however, when it comes to the API and its rendering. At the moment, we reflect all DatatypeProperties as both, which is not a good idea in the long run.

Features that are missing from the model but that are easy to add

  • Distinction between necessary conditions and necessary-and-sufficient conditions for defined classes (an extra qualifier for the restriction)

--Jpathak 10:36, 16 October 2008 (CDT) I agree.

Resolved Items