Latest revision as of 01:13, 6 September 2012
Ontology Metrics in BioPortal
This page describes the metrics that BioPortal calculates for the ontologies in its repository. BioPortal calculates metrics when an ontology is uploaded and stores them as part of the Ontology Metadata. There are two groups of metrics:
- Statistical metrics
- Quality-control and quality-assurance metrics
Some metrics are meaningful only for ontologies in a specific representation language (e.g., ontologies in OBO format have no individuals to count).
Statistical Metrics
We calculate the following statistics on each ontology in BioPortal:
- number of classes: the number of named (not anonymous) classes in the ontology.
- number of properties: the number of properties or slots in the ontology.
- number of individuals: the number of individuals in the ontology.
- maximum depth: the maximum depth of the hierarchy tree.
  - For OWL, RDFS, Protege, and RRF ontologies, we consider only the is-a relationship as a hierarchical relationship.
  - For OBO format ontologies, we consider the following relationships as hierarchical relationships: is-a, has-part, inverse of develops-from.
- average number of siblings: the average number of siblings at one level in the tree.
- maximum number of siblings: the maximum number of siblings in the ontology.
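The depth and sibling statistics above can be illustrated with a minimal Python sketch. It is not BioPortal's implementation. It assumes the hierarchy is acyclic and is given as a hypothetical `children` dictionary that maps each class id to its direct subclasses, that depth counts the root as level 1, and that a "sibling group" is the set of direct subclasses of one class. The page does not state how BioPortal resolves any of these details.

```python
from statistics import mean


def max_depth(children, roots):
    """Longest root-to-leaf path, counting a root as depth 1 (an assumption).

    `children` maps a class id to a list of its direct subclass ids.
    The hierarchy is assumed to be acyclic.
    """
    best = 0
    stack = [(root, 1) for root in roots]
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return best


def sibling_stats(children):
    """Return (maximum, average) size of the sibling groups.

    A sibling group is taken to be the direct subclasses of one class;
    classes without subclasses contribute no group.
    """
    sizes = [len(kids) for kids in children.values() if kids]
    if not sizes:
        return 0, 0.0
    return max(sizes), mean(sizes)


# Toy hierarchy: A has subclasses B and C, and B has subclass D.
toy = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
depth = max_depth(toy, roots=["A"])
largest, average = sibling_stats(toy)
```

For OBO ontologies, the same functions could be applied after adding has-part and inverse develops-from links to the `children` map.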
Quality-Control and Quality-Assurance Metrics
We calculate the following metrics, which may give some indication of the quality of the ontology and help ontology authors improve it. To keep the lists useful, we limit the list for any one of these metrics to 200 classes. If more than 200 classes fall into a category, the total number of classes in that category is still counted, but no list is available.
- classes with only one subclass: a list of classes that have only one subclass in the is-a hierarchy. While technically there is no problem in having only one subclass, this situation often indicates that either the hierarchy is under-specified, or the distinction between the class and the subclass is not appropriate.
- classes with more than 25 subclasses: a list of classes that have more than 25 direct subclasses. A class that has more than 25 subclasses is a candidate for additional distinctions and categorization.
- classes with no definition: a list of classes that have no value for the definition property. For ontologies in OBO and RRF formats, the property for definition is part of the language. For OWL ontologies, the authors specify this property as part of the ontology metadata (the default is skos:definition).
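As a rough illustration of the three checks and the 200-class list limit, here is a minimal Python sketch. The `children` and `definitions` dictionaries, the function names, and the result layout are all hypothetical. Only the thresholds (25 subclasses, 200 listed classes) come from this page.

```python
LIST_LIMIT = 200      # maximum number of classes listed per metric (from this page)
SUBCLASS_LIMIT = 25   # threshold for "too many subclasses" (from this page)


def capped(class_ids, limit=LIST_LIMIT):
    """Return (count, list); the list is None when it would exceed the limit."""
    ids = sorted(class_ids)
    return len(ids), (ids if len(ids) <= limit else None)


def quality_metrics(children, definitions):
    """Compute the three quality metrics for a toy ontology.

    `children` maps each class id to its direct subclasses (is-a only);
    `definitions` maps a class id to its definition text, if it has one.
    """
    one_subclass = [c for c, kids in children.items() if len(kids) == 1]
    many_subclasses = [c for c, kids in children.items() if len(kids) > SUBCLASS_LIMIT]
    no_definition = [c for c in children if not definitions.get(c)]
    return {
        "classesWithOneSubclass": capped(one_subclass),
        "classesWithMoreThanXSubclasses": capped(many_subclasses),
        "classesWithNoDocumentation": capped(no_definition),
    }


# Toy data: A has one subclass, C has 26 subclasses, only A has a definition.
children = {"A": ["B"], "B": [], "C": [f"c{i}" for i in range(26)]}
report = quality_metrics(children, definitions={"A": "the root class"})
```

Each entry in `report` pairs the total count with the class list, so the count survives even when the list is dropped.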
Please contact us at support@bioontology.org if there are additional metrics of any kind that you would like to see.
Generating metrics
When BioPortal loads and parses a version of an ontology, it starts a process that calculates the metrics and stores the values as part of the metadata about that ontology version.
Metrics computation and ontology metadata
Metric generation relies on the following information being specified in the ontology metadata:
For OBO and RRF ontologies:
- property used to specify the authors of a class
For OWL ontologies:
- property used to specify preferred name (default rdfs:label, and if not specified, then rdf:ID)
- property used to specify synonyms (default skos:altLabel)
- property used to specify the definition of a class (default skos:definition)
- property used to specify the author of a class (default dc:creator)
Representing metrics in BioPortal
The metrics in BioPortal are stored as part of the ontology metadata. Specifically, each instance of the OMV:Ontology class, which describes the metadata about a single ontology version, has a number of properties that store the metrics. Some of these properties are imported from the Ontology Metadata Vocabulary, OMV (the omv: prefix) and the rest are defined locally in the BioPortal Metadata Ontology (the metrics: prefix). These properties include:
- OMV:numberOfClasses
- OMV:numberOfProperties
- OMV:numberOfAxioms
- OMV:numberOfIndividuals
- metrics:averageNumberOfSiblings
- metrics:maximumNumberOfSiblings
- metrics:classesWithNoDocumentation
- metrics:maximumDepth
- metrics:preferredMaximumSubclassLimit
- metrics:classesWithSingleSubclass
- metrics:classesWithNoAuthor
- metrics:classesWithMoreThanXSubclasses
- metrics:classesWithMoreThanOnePropertyValueForPropertyWithUniqueValue
Accessing metrics in BioPortal
BioPortal users will be able to access ontology metrics in two ways:
- through the BioPortal user interface, as part of the Ontology Details tab (not implemented yet)
- through a dedicated REST service that returns a MetricsBean
REST service access
Applications can use the following REST service to access metrics information for a specific ontology version. Note: there is not yet a version of this service that uses a virtual ontology id.
- Signature: ./ontologies/metrics/{ontologyVersionId}?apikey={YourAPIKey}
- Example: http://rest.bioontology.org/bioportal/ontologies/metrics/40133?apikey=YourAPIKey
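A client could assemble the request URL from the signature as in the following Python sketch. The helper name `metrics_url` is hypothetical. The base URL is taken from the example above and may no longer be served.

```python
from urllib.parse import urlencode

# Base URL taken from the example on this page; it may no longer be live.
BASE_URL = "http://rest.bioontology.org/bioportal"


def metrics_url(ontology_version_id, api_key, base_url=BASE_URL):
    """Build the metrics URL for one ontology version (hypothetical helper)."""
    query = urlencode({"apikey": api_key})
    return f"{base_url}/ontologies/metrics/{ontology_version_id}?{query}"


url = metrics_url(40133, "YourAPIKey")
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, provided the API key is valid and the service is reachable.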
The service returns an ontologyMetricsBean that contains the version id for the ontology and the values of the metrics that have been computed:
<success>
  <accessedResource>/bioportal/ontologies/metrics/42031</accessedResource>
  <accessDate>2010-01-25 15:15:36.742 PST</accessDate>
  <data>
    <ontologyMetricsBean>
      <id>42031</id>
      <numberOfAxioms>0</numberOfAxioms>
      <numberOfClasses>293</numberOfClasses>
      <numberOfIndividuals>0</numberOfIndividuals>
      <numberOfProperties>8</numberOfProperties>
      <maximumDepth>6</maximumDepth>
      <maximumNumberOfSiblings>19</maximumNumberOfSiblings>
      <averageNumberOfSiblings>1</averageNumberOfSiblings>
      <classesWithOneSubclass>
        <string>APO:0000003</string>
        <string>APO:0000039</string>
        <string>APO:0000064</string>
        <string>APO:0000084</string>
        <string>APO:0000101</string>
        <string>APO:0000105</string>
        <string>APO:0000210</string>
        <string>APO:0000213</string>
        <string>APO:0000215</string>
        <string>APO:0000255</string>
      </classesWithOneSubclass>
      <classesWithMoreThanXSubclasses/>
      <classesWithNoDocumentation/>
      <classesWithNoAuthor/>
      <classesWithMoreThanOnePropertyValue/>
    </ontologyMetricsBean>
  </data>
</success>
If a given ontology has more than 200 classes identified for any single metric, then that metric will provide a parsable error message in place of the class list. Two statuses are available:
- limitpassed: more than 200 classes matched this metric. The status is followed by the total number of classes in the ontology that match the metric. limitpassed can be presented in two ways:
  - <string>limitpassed:NNNN</string>
  - <string>limitpassed:</string> <string>NNNN</string>
- alltriggered: every class in the ontology matched this metric. alltriggered is presented like this:
  - <string>alltriggered</string>
<success>
  <accessedResource>/bioportal/ontologies/metrics/40644</accessedResource>
  <accessDate>2010-01-25 15:13:06.40 PST</accessDate>
  <data>
    <ontologyMetricsBean>
      <id>40644</id>
      <numberOfClasses>74646</numberOfClasses>
      <numberOfIndividuals>0</numberOfIndividuals>
      <numberOfProperties>191</numberOfProperties>
      <maximumDepth>14</maximumDepth>
      <maximumNumberOfSiblings>3215</maximumNumberOfSiblings>
      <averageNumberOfSiblings>139</averageNumberOfSiblings>
      <classesWithOneSubclass>
        <string>limitpassed:5153</string>
      </classesWithOneSubclass>
      <classesWithMoreThanXSubclasses>
        <entry>
          <string>limitpassed:</string>
          <string>22932</string>
        </entry>
      </classesWithMoreThanXSubclasses>
      <classesWithNoDocumentation>
        <string>limitpassed:37788</string>
      </classesWithNoDocumentation>
      <classesWithNoAuthor>
        <string>alltriggered</string>
      </classesWithNoAuthor>
      <classesWithMoreThanOnePropertyValue/>
    </ontologyMetricsBean>
  </data>
</success>
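A client has to handle class lists, both limitpassed forms, and alltriggered. The following Python sketch shows one way to do so with the standard library's `xml.etree.ElementTree`. The function names and the shape of the returned dictionaries are assumptions made for illustration, not part of the service. Element names are taken from the responses shown on this page.

```python
import xml.etree.ElementTree as ET


def parse_list_metric(element):
    """Interpret one list-valued metric element (hypothetical result layout)."""
    # iter() also finds <string> elements nested inside <entry>.
    values = [(s.text or "").strip() for s in element.iter("string")]
    if not values:
        return {"status": "empty", "count": 0, "classes": []}
    if values[0] == "alltriggered":
        # Every class matched; the response carries no separate count.
        return {"status": "alltriggered", "count": None, "classes": None}
    if values[0].startswith("limitpassed:"):
        tail = values[0].split(":", 1)[1]
        if not tail and len(values) > 1:
            tail = values[1]  # second form: count in the next <string>
        count = int(tail) if tail.isdigit() else None
        return {"status": "limitpassed", "count": count, "classes": None}
    return {"status": "ok", "count": len(values), "classes": values}


def parse_metrics(xml_text):
    """Turn a metrics response into a dict keyed by element name."""
    bean = ET.fromstring(xml_text).find(".//ontologyMetricsBean")
    if bean is None:
        raise ValueError("response contains no ontologyMetricsBean")
    result = {}
    for child in bean:
        text = (child.text or "").strip()
        if len(child) == 0 and text:
            # Leaf element with text: a plain value such as numberOfClasses.
            result[child.tag] = int(text) if text.isdigit() else text
        else:
            result[child.tag] = parse_list_metric(child)
    return result
```

Calling `parse_metrics` on the response body yields plain integers for the statistical metrics and a small status dictionary for each list-valued metric, so callers can test `status` before reading `classes`.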