Difference between revisions of "SPARQL BioPortal"

From NCBO Wiki

Latest revision as of 17:16, 4 January 2024

PLEASE NOTE:

The BioPortal open SPARQL endpoint is deprecated and shut down. This decision was made due to lack of resources to maintain the service. We understand the impact this may have on your workflows and apologize for any inconvenience caused.

To continue accessing data in the BioPortal ontology repository, please use our actively maintained REST API.

If you have any questions or require assistance, please reach out to support.



NCBO is releasing a free and open SPARQL endpoint to query ontologies hosted in the BioPortal ontology repository. This SPARQL service, which is in BETA status, is stable for testing by our community of users. If you encounter any errors or unexpected behavior, please report them to us at support@bioontology.org.

Before using the BioPortal SPARQL service, please read our SPARQL Release Notes And Usage Policy.

Web Interface and Query Examples

There is a Web interface to test SPARQL queries at http://sparql.bioontology.org/

Also, interactive examples can be tested here http://sparql.bioontology.org/examples

Submitting SPARQL queries programmatically

A GitHub project contains examples of how to query our SPARQL service programmatically:

https://github.com/ncbo/sparql-code-examples

A tarball with these examples is available for download here:

https://github.com/ncbo/sparql-code-examples/tarball/master

This project contains examples in Java, Python, JavaScript and Perl. Some of the examples use only built-in language capabilities, and others need third-party libraries such as Jena, Sesame or SPARQLWrapper. The GitHub project and the tarball are self-contained, so there is no need to download and install extra libraries.

To run these examples or any other SPARQL queries programmatically, an API key from BioPortal is required. If you do not have a BioPortal account, go to [New Account] and create one. Once you have a BioPortal account, log in to BioPortal and go to your account details. You should see your API key as part of your account profile.
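As a rough illustration of how the API key travels with a query, the following Python sketch builds a request URL using only the standard library. It assumes the key is passed as an `apikey` URL parameter (as in the SERVICE URLs of the federation example on this page) and the query as the standard SPARQL Protocol `query` parameter; the function name `build_query_url` is hypothetical. Because the endpoint has been shut down, the sketch only constructs the URL and does not send it.

```python
from urllib.parse import urlencode

# Endpoint URL as it appears in the federation example on this page.
# The service is shut down, so nothing is sent over the network here.
ONTOLOGIES_ENDPOINT = "http://sparql.bioontology.org/ontologies/sparql/"


def build_query_url(query, api_key, endpoint=ONTOLOGIES_ENDPOINT):
    """Return a GET URL carrying a SPARQL query and a BioPortal API key.

    Assumption: the key goes in an "apikey" parameter and the query in the
    standard SPARQL Protocol "query" parameter.
    """
    params = urlencode({"query": query, "apikey": api_key})
    return f"{endpoint}?{params}"


url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1", "YOUR-API-KEY")
print(url)
```

The examples in the GitHub project remain the reference for how each language and library actually submitted the key.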

Database Named Graph Structure

Each ontology is asserted into a single graph. The graph is named with an acronym-based URI. For example, the graph:

http://bioportal.bioontology.org/ontologies/HP

contains the Human Phenotype Ontology. And the graph:

http://bioportal.bioontology.org/ontologies/SNOMEDCT

contains the SNOMEDCT ontology.

The following query returns all version IDs together with the IDs of the graphs where the ontologies are located:

PREFIX meta: <http://bioportal.bioontology.org/metadata/def/> 

SELECT DISTINCT ?version ?graph
WHERE { 
    ?version meta:hasDataGraph ?graph
}

BioPortal Preferred Label

Label definitions are problematic in some cases. To provide a consistent mechanism for querying by label across different ontologies, we generate labels for the following cases. These labels are attached to terms using the predicate http://bioportal.bioontology.org/metadata/def/prefLabel (bp:prefLabel).

  • Missing labels: for every owl:Class that is missing a label we generate a label based on the last fragment of its URI.
  • Terms that use rdfs:label as preferred name: BioPortal uses skos:prefLabel and skos:altLabel for preferred names and synonyms respectively. Both skos:prefLabel and skos:altLabel are subproperties of rdfs:label in the SKOS ontology. In the SKOS context, recording a preferred name with rdfs:label therefore only says that the name may be either a preferred name or a synonym. To avoid this ambiguity we generate a bp:prefLabel for every rdfs:label used as a preferred name.
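The "missing labels" case can be pictured with a small Python sketch that derives a label from the last fragment of a class URI. The page does not spell out the exact rule, so the choice here (text after `#` if present, otherwise after the last `/`) is an assumption, and `fallback_label` is a hypothetical name rather than part of BioPortal.

```python
def fallback_label(uri):
    """Derive a label from the last fragment of a class URI.

    Assumed rule (not confirmed by the source): take the text after '#'
    when the URI has one, otherwise the text after the last '/'.
    """
    trimmed = uri.rstrip("/#")  # ignore a trailing separator
    for separator in ("#", "/"):
        if separator in trimmed:
            return trimmed.rsplit(separator, 1)[1]
    return trimmed


print(fallback_label("http://purl.org/obo/owl/MA#MA_0001096"))
print(fallback_label("http://purl.obolibrary.org/obo/DOID_8545"))
```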

Preferred Label, Synonyms and other common predicates

When ontologies are submitted to BioPortal the user can select which predicates that ontology uses for:

  • Preferred Names.
  • Synonyms or alternative names.
  • Author.
  • Description.

The BioPortal SPARQL endpoint supports rdfs:subPropertyOf reasoning to enable cross querying across all these configurable predicates. In the triple store, the following URI:

http://bioportal.bioontology.org/ontologies/globals

is used as the identifier for the named graph that contains all the rdfs:subPropertyOf statements configured by users when uploading their ontologies. The root properties that trigger the reasoning are the following:

  • skos:prefLabel for Preferred name.
  • skos:altLabel for Synonyms or alternative names.
  • dc:author for Author.
  • skos:definition for Description.

When using named graphs, include the globals graph that contains the subproperty statements if you want to use this reasoning. For example:

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?termURI ?prefLabel
 FROM <http://bioportal.bioontology.org/ontologies/EHDA>
 FROM <http://bioportal.bioontology.org/ontologies/globals> 
WHERE {
      ?termURI a owl:Class;
      skos:prefLabel ?prefLabel .
} 

Otherwise the query processor will not take the subproperty statements into account.
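To make it harder to forget the globals graph, a query could be assembled by a small helper. This Python sketch builds the query shown above for any list of ontology acronyms and appends the globals graph unless reasoning is switched off. The helper name `pref_label_query` and its `use_reasoning` flag are hypothetical, and the sketch only produces the query text.

```python
GRAPH_BASE = "http://bioportal.bioontology.org/ontologies/"
GLOBALS_GRAPH = GRAPH_BASE + "globals"


def pref_label_query(acronyms, use_reasoning=True):
    """Build a preferred-label query over the given ontology graphs.

    When use_reasoning is True the globals graph is added as an extra FROM
    clause so that the rdfs:subPropertyOf statements are visible to the query.
    """
    graphs = [GRAPH_BASE + acronym for acronym in acronyms]
    if use_reasoning:
        graphs.append(GLOBALS_GRAPH)
    from_clauses = "\n".join(f"FROM <{graph}>" for graph in graphs)
    return (
        "PREFIX owl:  <http://www.w3.org/2002/07/owl#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?termURI ?prefLabel\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        "  ?termURI a owl:Class ;\n"
        "           skos:prefLabel ?prefLabel .\n"
        "}"
    )


print(pref_label_query(["EHDA"]))
```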

Mappings

Note that as of January 30, 2013, we host the mappings in a separate endpoint. This eases our regular mapping updates. Programmatic queries that target mapping data need to use the following endpoint: http://sparql.bioontology.org/mappings/sparql

The mapping data in the triple store is stored in the following graphs:

Graph Name                                        Description
http://purl.bioontology.org/mapping/loom          Loom: lexical mappings.
http://purl.bioontology.org/mapping/rest          REST: Mappings submitted by users via the REST API.
http://purl.bioontology.org/mapping/umls_cui      UMLS-CUI: CUI-based mappings.
http://purl.bioontology.org/mapping/cuinonumls    NOUMLS-CUI: Terms with the same CUIs, for ontologies that are not part of UMLS.
http://purl.bioontology.org/mapping/obo_xref      OBO-XREF: Mappings for terms with the same xref attribute.
http://purl.bioontology.org/mapping/uri_match     URI-MATCH: Mappings for terms that are represented by the same URI in different ontologies.
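The graph names listed above can be kept in a lookup table so that a query is restricted to one mapping source. The following Python sketch is only an illustration: the short keys (`loom`, `rest`, and so on) and the function `mappings_in_graph_query` are hypothetical conveniences, and the query uses the One_To_One_Mapping class that appears elsewhere on this page.

```python
# Graph URIs copied from the list above; the dictionary keys are made up here.
MAPPING_GRAPHS = {
    "loom": "http://purl.bioontology.org/mapping/loom",
    "rest": "http://purl.bioontology.org/mapping/rest",
    "umls_cui": "http://purl.bioontology.org/mapping/umls_cui",
    "cuinonumls": "http://purl.bioontology.org/mapping/cuinonumls",
    "obo_xref": "http://purl.bioontology.org/mapping/obo_xref",
    "uri_match": "http://purl.bioontology.org/mapping/uri_match",
}


def mappings_in_graph_query(source, limit=10):
    """Build a query listing mapping instances stored in one mapping graph."""
    graph = MAPPING_GRAPHS[source]  # raises KeyError for an unknown source
    return (
        "PREFIX map: <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#>\n"
        "SELECT ?mapping WHERE {\n"
        f"  GRAPH <{graph}> {{ ?mapping a map:One_To_One_Mapping . }}\n"
        "}\n"
        f"LIMIT {limit}"
    )


print(mappings_in_graph_query("loom"))
```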

The following query returns the list of the graphs where mappings are located:

SELECT DISTINCT ?g WHERE {
  GRAPH ?g {
    ?s a <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#One_To_One_Mapping> .
  }
}

The following RDF/Turtle sample shows an example of a mapping instance:

@prefix maps: <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://purl.bioontology.org/mapping/2767e8e0-001b-012e-749f-005056bd0010>
    maps:has_process_info <.../procinfo/2008-04-23-38138> ;
    maps:comment "Manual mappings between Mouse anatomy and NCIT." ;
    maps:relation skos:closeMatch ;
    maps:target <http://purl.org/obo/owl/MA#MA_0001096> ;
    maps:source <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Olfactory_Nerve> ;
    maps:source_ontology_id <http://bioportal.bioontology.org/ontologies/1032> ;
    maps:target_ontology_id <http://bioportal.bioontology.org/ontologies/1000> ;
    a maps:One_To_One_Mapping .

The predicates we use to represent a mapping record both the source and target terms and the source and target ontologies. A SPARQL query that returns all the target terms for a given source term would look like the following:

PREFIX map: <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#>
SELECT DISTINCT ?target WHERE {
  ?s map:source <http://purl.obolibrary.org/obo/DOID_8545>;
     map:target ?target .
}
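Once a query like the one above has been answered, the bound values have to be pulled out of the response. The Python sketch below assumes the response is in the standard SPARQL 1.1 Query Results JSON format; whether and how the BioPortal endpoint served that format is not stated on this page. The sample response and its example.org URIs are invented for illustration, and `binding_values` is a hypothetical helper.

```python
import json


def binding_values(results_json, variable):
    """Return the values bound to one variable in a SPARQL JSON result set."""
    data = json.loads(results_json)
    return [
        binding[variable]["value"]
        for binding in data["results"]["bindings"]
        if variable in binding  # skip solutions where the variable is unbound
    ]


# Invented sample response; the URIs are placeholders, not real mapping targets.
sample = """
{
  "head": {"vars": ["target"]},
  "results": {"bindings": [
    {"target": {"type": "uri", "value": "http://example.org/term/A"}},
    {"target": {"type": "uri", "value": "http://example.org/term/B"}}
  ]}
}
"""

targets = binding_values(sample, "target")
print(targets)
```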

To count all the mappings between two given ontologies, the query would use the source_ontology and target_ontology predicates:

PREFIX map: <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#>
SELECT (count(?m) as ?c) WHERE {
  ?m map:source_ontology <http://bioportal.bioontology.org/ontologies/1032> ;
      map:target_ontology  <http://bioportal.bioontology.org/ontologies/1000> .
}

In this case the query returns a count of 4393. Note that for mappings we use version-based ontology URIs and not the ones that are based on acronyms.

Federated SPARQL queries

SPARQL federation can be used with the Jena ARQ library. Jena ARQ handles the SPARQL SERVICE construct: it directs sets of triple patterns to different endpoints and handles the joins.

BioPortal offers two different SPARQL endpoints: one for ontologies (metadata and terms) and a second one for mappings. Some use cases require queries with joins across both endpoints. This can be achieved by using SPARQL federation and the SERVICE feature (SPARQL 1.1). The following example looks at the CSP term 'Neck' and retrieves the sources of all the mappings for that term (from the mappings endpoint). It then gets the parents of each of those sources (from the ontologies endpoint). Notice the change in the SERVICE endpoints from 'mappings' in the first block to 'ontologies' in the second one.

PREFIX map: <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?mappedParent WHERE {
    SERVICE <http://sparql.bioontology.org/mappings/sparql/?apikey=YOUR API KEY HERE> {
        ?mapping map:target <http://purl.bioontology.org/ontology/CSP/0468-5952> .
        ?mapping map:source ?source .
    }
    SERVICE <http://sparql.bioontology.org/ontologies/sparql/?apikey=YOUR API KEY HERE> {
        ?source rdfs:subClassOf ?mappedParent .
    }
}

Full example: https://github.com/ncbo/sparql-code-examples/blob/master/java/src/org/ncbo/stanford/sparql/examples/JenaARQFederationExample.java

Note: SPARQL federation can be used programmatically only. Our web front-end at http://sparql.bioontology.org/ does not support mediation for the SERVICE clause. Our tests for SPARQL federation use Jena ARQ 2.9.4; this version correctly submits the BioPortal API keys.
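Since the API key has to appear in both SERVICE URLs, the federated query text can be generated rather than edited by hand. The Python sketch below fills in the term URI and the key for the query shown in this section. The function name `federated_parents_query` is hypothetical, and the sketch only builds the query string; it still has to be executed by a federation-capable engine such as Jena ARQ.

```python
from urllib.parse import quote

MAPPINGS_ENDPOINT = "http://sparql.bioontology.org/mappings/sparql/"
ONTOLOGIES_ENDPOINT = "http://sparql.bioontology.org/ontologies/sparql/"


def federated_parents_query(term_uri, api_key):
    """Build the two-endpoint federated query for the parents of mapped sources."""
    key = quote(api_key, safe="")  # percent-encode so the SERVICE IRI stays valid
    return f"""PREFIX map: <http://protege.stanford.edu/ontologies/mappings/mappings.rdfs#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?mappedParent WHERE {{
    SERVICE <{MAPPINGS_ENDPOINT}?apikey={key}> {{
        ?mapping map:target <{term_uri}> .
        ?mapping map:source ?source .
    }}
    SERVICE <{ONTOLOGIES_ENDPOINT}?apikey={key}> {{
        ?source rdfs:subClassOf ?mappedParent .
    }}
}}"""


print(federated_parents_query(
    "http://purl.bioontology.org/ontology/CSP/0468-5952", "YOUR-API-KEY"))
```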

Partial or Incomplete Results

sparql.bioontology.org uses 4store's internal soft-limit mechanism to limit the resources available to expensive queries. Our setup is configured to bind 8K elements per triple pattern. If you hit this limit, a warning message is appended to the query response, along the lines of "hit complexity limit 8 times". If you see this warning, the results are incomplete, and there is probably a more efficient way to write the query.
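A client can check for this warning before trusting a result set. The Python sketch below searches the raw response text for the phrase quoted above; the exact wording and where it appears in the response may differ by result format, so the pattern is an assumption, and `complexity_limit_hits` is a hypothetical helper. The sample response strings are invented.

```python
import re

# Pattern based on the example wording quoted on this page (assumed, not exact).
_LIMIT_WARNING = re.compile(r"hit complexity limit (\d+) times", re.IGNORECASE)


def complexity_limit_hits(response_text):
    """Return how many times the soft limit was hit, or 0 if no warning is found."""
    match = _LIMIT_WARNING.search(response_text)
    return int(match.group(1)) if match else 0


# Invented response bodies, used only to exercise the helper.
complete = "?target\n<http://example.org/term/A>\n"
partial = complete + "# hit complexity limit 8 times\n"

for body in (complete, partial):
    hits = complexity_limit_hits(body)
    status = "incomplete results" if hits else "no warning"
    print(f"{status} (limit hit {hits} times)")
```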

Contact our support mailing list if you need help rewriting your query more efficiently to avoid incomplete results.

Slides