Difference between revisions of "Annotator User Guide"

From NCBO Wiki
(Replaced content with "This service has been replaced. Instead please see: http://data.bioontology.org/documentation#nav_annotator")
 
=== Annotator Web service Validation ===
* '''<font color='red'>Note</font>''': Starting June 2011, all NCBO REST Web service calls must contain the parameter "apikey=YourApiKey". The parameter will be accepted by all Web service calls as of the April 27, 2011 release but will not be required until June 2011. To obtain an API key, log in to BioPortal and go to "Account", where your API key will be displayed. The API key replaces the email parameter.
 
  
===Sample Clients for the Annotator===
 
 
HTML Client: http://rest.bioontology.org/test_oba.html
 
 
 
Code examples: http://www.bioontology.org/wiki/index.php/Annotator_Client_Examples
 
 
 
===Annotator Web service endpoint===
 
 
 
* Web service signature: http://rest.bioontology.org/obs/annotator?email=example@example.org
 
* Type: POST
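
As a minimal sketch of how a client might prepare such a POST call, the Python snippet below builds (but does not send) a request for the endpoint above. It assumes that the parameters are sent as form-encoded fields in the request body and that the API key is passed as the ''apikey'' field in place of ''email''; the helper names <code>encode_value</code> and <code>build_annotator_request</code> are hypothetical and not part of any NCBO client library. The parameter names are those listed in the parameters section of this guide.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given in this guide; the service has since been replaced.
ANNOTATOR_URL = "http://rest.bioontology.org/obs/annotator"


def encode_value(value):
    """Render a Python value the way this guide writes parameter values."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # Lists are comma-separated without spaces, e.g. 1353,1032,1351
        return ",".join(str(item) for item in value)
    return str(value)


def build_annotator_request(text, api_key, **params):
    """Build (but do not send) a POST request for the Annotator.

    Assumption: all parameters travel as form-encoded body fields.
    """
    fields = {"apikey": api_key, "textToAnnotate": text}
    fields.update({name: encode_value(value) for name, value in params.items()})
    body = urlencode(fields).encode("utf-8")
    return Request(ANNOTATOR_URL, data=body, method="POST")


request = build_annotator_request(
    "Melanoma is a malignant tumor of melanocytes.",
    "YourApiKey",
    longestOnly=False,
    wholeWordOnly=True,
    ontologiesToKeepInResult=[1353, 1032, 1351],
    isVirtualOntologyId=True,
    levelMax=1,
    format="xml",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to <code>urllib.request.urlopen</code> would send it; that step is left out here because the endpoint is no longer in service.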
 
 
 
=== Annotator Web Service Workflow===
 
 
 
The Annotator Web service’s workflow is composed of two main steps (figure):
 
 
 
[[Image:OBA_service_workflow.png|thumb|NCBO Annotator Web service workflow]]
 
 
 
First, direct annotations are created from raw text. Annotations are based on syntactic concept recognition using a dictionary compiled from terms (concept names and synonyms) pulled from the ontologies.  The Annotator enables the selection of ontologies from one of the largest sets of available biomedical ontologies. We implemented the service using a subset of the BioPortal ontologies which includes 16 ontologies from UMLS (12 in RRF format, 2 in OBO format, 2 in OWL format). These ontologies provide a dictionary of 1,594,785 concepts and 3,200,654 terms.
 
 
 
In the second step, semantic expansion components leverage the semantics in ontologies (e.g., is_a relations and mappings) to create additional annotations. For example, the is_a transitive closure component traverses an ontology parent-child hierarchy to create new annotations with parent concepts of concepts in direct annotations. The ontology-mapping component creates new annotations based on existing mappings between different ontologies. Point-to-point mappings that link concepts to one another are defined manually or by automatic algorithms in BioPortal.
 
 
 
==== The is_a transitive closure expansion component ====
 
An is_a transitive closure component traverses an ontology parent-child hierarchy to create new annotations with parent concepts. For instance, if data are annotated with a concept from the NCI Thesaurus, such as ''melanoma'', this component generates a new annotation with the concept ''skin neoplasm'', because the NCI Thesaurus provides the knowledge that ''melanoma'' is_a ''skin neoplasm.''
 
The Annotator uses the is_a relations as they are defined by BioPortal in the <SubClass>/<SuperClass> information for a given concept (accessed via REST web service).
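
The expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Annotator's implementation: the <code>PARENTS</code> table is a toy stand-in for the hierarchy served by BioPortal, the ''neoplasm'' parent is invented for the example, and the function name <code>expand_is_a</code> is hypothetical. The <code>level_max</code> argument plays the role of the ''levelMax'' parameter.

```python
# Toy child -> parents table; a stand-in for the hierarchy BioPortal exposes.
# Only "melanoma is_a skin neoplasm" comes from this guide; the rest is invented.
PARENTS = {
    "melanoma": ["skin neoplasm"],
    "skin neoplasm": ["neoplasm"],
    "neoplasm": [],
}


def expand_is_a(direct_concepts, parents, level_max):
    """Return (parent, child, level) triples for ancestors up to level_max."""
    expanded = []
    for child in direct_concepts:
        frontier = [child]
        seen = {child}
        # Walk up the hierarchy one level at a time.
        for level in range(1, level_max + 1):
            next_frontier = []
            for concept in frontier:
                for parent in parents.get(concept, []):
                    if parent not in seen:
                        seen.add(parent)
                        expanded.append((parent, child, level))
                        next_frontier.append(parent)
            frontier = next_frontier
    return expanded


for parent, child, level in expand_is_a(["melanoma"], PARENTS, level_max=2):
    print(f"{parent} (from {child}, level {level})")
```

With <code>level_max=0</code> the loop body never runs and no expanded annotations are produced, which mirrors the behaviour described for ''levelMax=0''.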
 
 
 
==== The mapping expansion component ====
 
An ontology-mapping component creates new annotations based on existing mappings between different ontologies. For example, an annotation made with the concept ''40644/Melanoma'' in the NCI Thesaurus can be expanded to another annotation with the mapped concept in SNOMED-CT, because BioPortal provides the mapping information.
 
The Annotator uses the mappings as they are defined by the repositories: BioPortal mappings are defined in the BioPortal front-end and accessed in an ad-hoc manner.
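
A rough Python sketch of the mapping expansion follows. It assumes a simple in-memory mapping table and a hypothetical function name <code>expand_mappings</code>; the real component reads its mappings from the repositories. The single entry reuses the NCI/C0025202 and MSH/C0025202 pair that this guide gives as an ''inter-cui'' example, and the optional filter stands in for the ''mappingTypes'' parameter.

```python
# Hypothetical mapping table: source concept -> list of (target concept, mapping type).
MAPPINGS = {
    "NCI/C0025202": [("MSH/C0025202", "inter-cui")],
}


def expand_mappings(direct_concepts, mappings, mapping_types=None):
    """Create expanded annotations from mappings of directly annotated concepts.

    mapping_types=None keeps every mapping type; otherwise only the listed
    types are used.
    """
    expanded = []
    for source in direct_concepts:
        for target, mapping_type in mappings.get(source, []):
            if mapping_types is not None and mapping_type not in mapping_types:
                continue
            expanded.append(
                {
                    "concept": target,
                    "mappedConceptId": source,  # concept the annotation was derived from
                    "mappingType": mapping_type,
                }
            )
    return expanded


print(expand_mappings(["NCI/C0025202"], MAPPINGS))
print(expand_mappings(["NCI/C0025202"], MAPPINGS, mapping_types=["from-mrrel"]))
```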
 
 
 
=== Annotator Web Service Parameters===
 
 
 
*See http://www.bioontology.org/wiki/index.php/BioPortal_2.3_Release_Notes for changes in Annotator output and updated information on how to use UMLS ontologies with the Annotator. 
 
 
 
The Annotator Web service offers a set of parameters that allow a user to customize the Annotator workflow and filter the result. To customize the workflow and the result, the user can specify a set of ontologies and a specific set of semantic types. In addition, the two steps of the annotation workflow can be parameterized.
 
 
 
The Annotator Web service response time depends on the selected components, as each consumes resources at a different level. For example, the ''is_a'' transitive closure takes a long time to process, even when using a pre-computed hierarchy table. As another example, an annotation with ''wholeWordOnly=false'' will take significantly longer than one with ''wholeWordOnly=true''.
 
 
 
Please see below for the list of parameters and the possible values.
 
 
 
 
 
{| border="1" cellpadding="2"
 
|-valign="top"
 
|width="10%"|'''longestOnly'''
 
|width="10%"|{true, false}
 
|width="10%"|default: false
 
|width="70%"|Specifies whether the entity recognition step (performed with the University of Michigan Mgrep tool) must keep only the longest match when several terms match an expression.
 
 
 
* If ''longestOnly=true'', the Annotator selects only the longest term matching phrase. For example, if ''longestOnly=true'', the phrase ''breast cancer'' returns only the result ''breast cancer.'' If ''longestOnly=false'', it generates three annotations: ''breast'', ''cancer'' and ''breast cancer.''
 
 
 
* Note that if the Annotator finds a match to the complete phrase in any ontology within the dictionary, partial annotations with other ontologies will not be generated. For example, because ''breast cancer'' exists in Human Disease and the NCI Thesaurus, if ''longestOnly=true'', annotations generated with those terms will block annotations with the terms ''breast'' in the Vaccine Ontology or ''cancer'' in BIRNLex. For this reason, we discourage the use of the '''longestOnly''' parameter when the Annotator is used with one or a small number of ontologies.
 
 
 
|-valign="top"
 
|width="10%"|'''wholeWordOnly'''
 
|width="10%"|{true, false}
 
|width="10%"|default: true
 
|width="70%"|Specifies whether the term recognition step must match whole words only when several terms match a given word from the input text.
 
* For example: If ''wholeWordOnly=true'', the phrase 'neoplasms'  will match the concept NCI/C0027651 (Neoplasms) only. If ''wholeWordOnly=false'', the term http://bioportal.bioontology.org/ontologies/47638/?p=terms&conceptid=Siemens with synonym "S" or the term http://bioportal.bioontology.org/ontologies/47638/?p=terms&conceptid=Aggressive_Systemic_Mastocytosis with synonym "ASM" will also match (~80 terms in NCI).
 
* Note that the concept recognition step does not consider text case (i.e., matching is case-insensitive).
 
 
 
|-valign="top"
 
|width="10%"|'''filterNumber'''
 
|width="10%"|{true, false}
 
|width="10%"|default: true
 
|width="70%"|Specifies whether the entity recognition step filters out numbers.
 
 
 
|-valign="top"
 
|width="10%"|'''stopWords'''
 
|width="10%"|{stopWord1,...,stopWordN}
 
|width="10%"|default: empty (i.e. none)
 
|width="70%"|Specifies the list of words to exclude from matching. 
 
|-valign="top"
 
|width="10%"|'''withDefaultStopWords'''
 
|width="10%"|{true, false}
 
|width="10%"|default: false
 
|width="70%"|Specifies whether or not to use the default stop words. The default stop word list is available from this Web service call: http://rest.bioontology.org/obs/stopwords?apikey=YourAPIKey. If this parameter is set to true, this will override any stop words provided by the user in the parameter "stopWords".
 
|-valign="top"
 
|width="10%"|'''isStopWordsCaseSenstive  '''
 
|width="10%"|{true, false}
 
|width="10%"|default: false
 
|width="70%"|Specifies whether stop words are case-sensitive or not.
 
|-valign="top"
 
|width="10%"|'''minTermSize'''
 
|width="10%"|{integer}
 
|width="10%"|default: 3
 
|width="70%"|Specifies the minimum length of the term to be included in the annotations. 
 
|-valign="top"
 
|width="10%"|'''scored'''
 
|width="10%"|{true, false}
 
|width="10%"|default: true
 
|width="70%"|Specifies whether or not the annotations are scored. A score is a number assigned to an annotation that reflects the accuracy of the annotation. The higher the score, the better the annotation. The scoring algorithm gives a specific weight to an annotation according to the context of this annotation. For instance, an annotation done by matching a concept preferred name will be given a higher weight than an annotation done by matching a concept synonym, or than an annotation done with a level-3 parent in the is_a hierarchy. Details on the scoring algorithm are given in the section Scoring algorithm.
 
* For example, the phrase 'melanoma' is annotated both with the term http://bioportal.bioontology.org/ontologies/47638/?p=terms&conceptid=Melanoma "melanoma" and the term http://bioportal.bioontology.org/ontologies/47638/?p=terms&conceptid=Mouse_Melanoma "Mouse Melanoma". The former annotation is scored 10, whereas the latter is scored 8.
 
|-valign="top"
 
|width="10%"|'''withSynonyms'''
 
|width="10%"|{true, false}
 
|width="10%"|default: true
 
|width="70%"|Specifies whether or not the direct annotations are generated based on term synonyms. By default it includes all the synonyms and preferred name of a term. If 'false' is selected, the direct annotations are done with only the preferred name.
 
|-valign="top"
 
|width="10%"|'''ontologiesToExpand'''
 
|width="10%"|{localOntology1,...,localOntologyN}
 
|width="10%"|default: empty (i.e. all ontologies)
 
|width="70%"|Specifies the list of ontologies to use with the mapping semantic expansion component. The list of ontologies that can be used is available in the sample HTML page. The values are separated by commas (without spaces).
 
* For example, 1353,1032,1351 (these numbers represent the virtual ontology identifier, which is stable across different versions of the ontology).
 
* '''NOTE''': The list of ontologies that are available via the Annotator Web service are those listed with <status>28</status> from this Web service call: http://rest.bioontology.org/obs/ontologies
 
|-valign="top"
 
|width="10%"|'''ontologiesToKeepInResult'''
 
|width="10%"|{localOntology1,...,localOntologyN}
 
|width="10%"|default: empty (i.e. all ontologies)
 
|width="70%"|Specifies the list of ontologies to keep in the result of the annotation process. The list of ontologies that can be used is available in the sample HTML page. The values are separated by commas (without spaces).
 
* For example, 1353,1032,1351 (these numbers represent the virtual ontology identifier, which is stable across different versions of the ontology).
 
* '''NOTE''': The list of ontologies that are available via the Annotator Web service are those listed with <status>28</status> from this Web service call: http://rest.bioontology.org/obs/ontologies?apikey=YourAPIKey
 
|-valign="top"
 
|width="10%"|'''isVirtualOntologyId'''
 
|width="10%"|{true, false}
 
|width="10%"|default: false
 
|width="70%"|Specifies whether the input values are virtual ontology identifiers rather than ontology version identifiers. The virtual ontology identifier is stable across all versions of the ontology. '''''It is strongly recommended that this parameter be set to true.'''''
 
|-valign="top"
 
|width="10%"|'''semanticTypes'''
 
|width="10%"|{semanticType1,...,semanticTypeN}
 
|width="10%"|default: empty (i.e. all semanticTypes)
 
|width="70%"|Specifies the list of semantic types to use in the annotation process. The list of semantic types that can be used is available from http://rest.bioontology.org/obs/semanticTypes?apikey=YourAPIKey. The values are separated by commas (without spaces). Note that the restriction to semantic types is also applied during the semantic expansion steps.
 
* For example, T047,T048,T191.
 
|-valign="top"
 
|width="10%"|'''levelMax'''
 
|width="10%"|{integer}
 
|width="10%"|default: 0
 
|width="70%"|Specifies the maximum level a parent concept may have to be considered for the is_a transitive closure expansion step.
 
* For example, a call done with ''levelMax=3'' will expand a direct annotation done with a concept up to the 3rd-level parent in the is_a hierarchy for this concept. A call done with ''levelMax=0'' is equivalent to disabling the is_a transitive closure expansion step.
 
|-valign="top"
 
|width="10%"|'''mappingTypes'''
 
|width="10%"|{null,mappingType1,...,mappingTypeN}
 
|width="10%"|default: empty (i.e. all mappingTypes)
 
|width="70%"|Specifies the list of mapping types to use during the mapping expansion step. The list of semantic types that can be used is available from http://rest.bioontology.org/obs/semanticTypes?apikey=YourAPIKey. The values are separated by commas (without spaces). The current list of mapping types is described below.
 
* For example, from-mrrel,Human.
 
* Note that the use of the key word "'''null'''" in the '''mappingTypes''' list disables the mapping expansion component. Note also that the mapping expansion is also limited by other parameters such as ''ontologiesToExpand'' and ''ontologiesToKeepInResult''.
 
* The current list of mapping types is:
 
** ''inter-cui'': This mapping type corresponds to the mappings available between CUIs in the UMLS Metathesaurus. For instance, NCI/C0025202 and MSH/C0025202 are mapped together because they share the same CUI in UMLS.
 
** ''from-mrrel'': This mapping type corresponds to the mappings available in the MRREL table from UMLS.
 
** ''Automatic'': This mapping type corresponds to the mappings automatically imported in BioPortal.
 
** ''Manual'': This relation name corresponds to the mappings created by users in BioPortal.
 
** ''Human'': This relation name corresponds to mappings that are no longer used in BioPortal.
 
 
 
|-valign="top"
 
|width="10%"|'''textToAnnotate'''
 
|width="10%"|
 
|width="10%"|
 
|width="70%"|Specifies the text to be annotated.
 
|-valign="top"
 
|width="10%"|'''format'''
 
|width="10%"|{xml,text,tabDelimited}
 
|width="10%"|default: xml
 
|width="70%"|Specifies the desired format of the response from Annotator. For programmatic access, XML is strongly suggested.
 
 
 
* ''xml'': returns XML representation of the annotatorResultBean.
 
* ''text'': returns plain text representation of the annotatorResultBean.
 
* ''tabDelimited'': a shorter version of the ''text'' format. It returns only the annotations rather than the full result content (no statistics, etc.). The format of the tab-delimited file is: score \t conceptId \t preferredName \t synonyms (separated by ' /// ') \t semanticType (separated by ' /// ') \t contextName \t isDirect \t other context information (e.g., childConceptId, mappedConceptId, level, mappingType) (separated by ' /// ').
 
 
 
The elements of the Annotator response are described in the next section.
 
 
 
|}
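
The ''tabDelimited'' layout listed in the table above can be read with a small parser such as the Python sketch below. It follows the column order given in this guide; the function name <code>parse_tab_delimited_line</code>, the sample line (including its synonyms) and the assumption that ''isDirect'' is written as <code>true</code> or <code>false</code> are illustrative only and have not been checked against real service output.

```python
FIELD_SEPARATOR = "\t"
LIST_SEPARATOR = " /// "
COLUMN_COUNT = 8


def split_list(field):
    """Split a multi-valued column; an empty column yields an empty list."""
    return field.split(LIST_SEPARATOR) if field else []


def parse_tab_delimited_line(line):
    """Parse one annotation line of the tabDelimited format into a dict."""
    parts = line.rstrip("\n").split(FIELD_SEPARATOR)
    # Pad short lines so that unpacking never fails on missing trailing columns.
    parts += [""] * (COLUMN_COUNT - len(parts))
    (score, concept_id, preferred_name, synonyms,
     semantic_types, context_name, is_direct, other) = parts[:COLUMN_COUNT]
    return {
        "score": float(score),
        "conceptId": concept_id,
        "preferredName": preferred_name,
        "synonyms": split_list(synonyms),
        "semanticTypes": split_list(semantic_types),
        "contextName": context_name,
        # Assumption: the service writes booleans as "true"/"false".
        "isDirect": is_direct.strip().lower() == "true",
        "otherContext": split_list(other),
    }


# Invented sample line, for illustration only.
sample = "10\tNCI/C0025202\tMelanoma\tMalignant Melanoma /// Melanoma, NOS\tT191\tMGREP\ttrue\t"
record = parse_tab_delimited_line(sample)
print(record)
```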
 
 
 
===Annotator Web Service Response===
 
 
 
In this section we define the Annotator Web service model, i.e., what the service returns to the user: the objects as well as their relations and the constraints that apply.
 
 
 
'''Response Content: annotatorResultBean'''
 
{| border="1" cellpadding="2"
 
|-valign="top"
 
|width="10%"|'''resultId'''
 
|width="90%"| The id of the annotatorResultBean.
 
|-valign="top"
 
|width="10%"|'''dictionary'''
 
|width="90%"| Dictionary contains the metadata (not the content) of the dictionary used for a result. ''dictionaryId'', ''dictionaryName'', and ''dictionaryDate'' identify the dictionary on the server side and give information about its content. Dictionary versioning is strongly linked to the evolution of the ontologies used. Each time ontologies change, the dictionary is updated. The dictionary information may be useful for comparing results of the Annotator service over time.
 
|-valign="top"
 
|width="10%"|'''statistics'''
 
|width="90%"| Statistics contains information on the number of annotations done for a given context. The ''contextName'' keyword identifies the type of context and ''nbAnnotation'' is the number of annotations of this type.
 
|-valign="top"
 
|width="10%"|'''parameters'''
 
|width="90%"| Parameters summarizes all the parameters specified by the user when requesting the Annotator service. Those parameters are described in the previous section.
 
|-valign="top"
 
|width="10%"|'''ontologies'''
 
|width="90%"| To keep the model simple, we provide only the global ontology identifier (''localOntologyId''), the name (''ontologyName'') and the version (''ontologyVersion''). This information comes from the original repositories (UMLS/BioPortal) and might help the user to select the right ontology to use. When an ontology is used in the annotation, a result has a set of OntologyUsed elements, which specify two other properties: ''nbAnnotation'', the number of annotations that have been made with concepts from this ontology, and ''score'', the sum of all the scores of the annotations done with concepts from this ontology (if parameter ''scored=true''). The score therefore indicates how well an ontology is suited to annotating the given text.
 
|-valign="top"
 
|width="10%"|'''annotations'''
 
|width="90%"| Annotation is a representation of one annotation. An annotation has a ''score'' which represents the accuracy of the annotation computed by the scoring algorithm (if the scored=true parameter was chosen, otherwise score=-1). An annotation is done with a ''concept'' in a ''context''.
 
 
 
{| border="2" <!-- The nested table must be on a new line -->
 
|-valign="top"
 
|width="10%"|'''score'''
 
|width="90%"| The scoring algorithm gives weight to each annotation according to the annotation context. The final score for an annotation is then calculated as the sum of all the scores of the annotations done with the same concept. The weights are: direct annotation done with a concept preferred name = 10; direct annotation done with a concept synonym = 8; expanded annotation done with a mapping = 7; expanded annotation done with a parent at level ''n'' = 1 + 10 × e<sup>-0.2×''n''</sup> (e.g., 9 for ''n''=1; 7 for ''n''=2; 4 for ''n''=5; 3 for ''n''=8; 1 for ''n''>12). Read the [http://www.ncbi.nlm.nih.gov/pubmed/20626921 OntologyRecommender] paper for more details.
 
 
 
|-valign="top"
 
|width="10%"|'''concept'''
 
|width="90%"| ''Concept'' is a representation of an ontology concept in the Annotator web service ontology model
 
* ''localConceptId'' - global identifier for the concept in its original repository.
 
* ''localOntologyId'' - identifier for the ontology in which the concept is defined.
 
* ''isTopLevel'' - specifies if the concept is a root concept in its ontology.
 
* ''preferredName'' - label or preferred term for this concept (as assigned by the original repository).
 
* ''synonyms'' - the set of possible terms that represent the concept but are not preferred.
 
* ''semanticTypes'' - the set of the semantic types of the concept (assigned by UMLS + T000 and T999).
 
|-valign="top"
 
|width="10%"|'''context'''
 
|width="90%"| ''Context'' specifies whether the annotation is direct or expanded and gives details about the origin of the annotation. ''contextName'' identifies the type of context. The context properties vary with the type of context. There are 3 possible contexts, identified by their contextName:
 
{| border="2" style="background:#ABCDEF;" <!-- The nested table must be on a new line -->
 
|-valign="top"
 
|width="10%"|'''MGREP'''
 
|width="90%"| represents direct annotations done with the Mgrep concept recognizer. A Mgrep context has 3 properties:
 
* ''termName'' - the expression (preferred name or synonyms) that was matched by Mgrep.
 
* ''from'' and ''to'' - specify the position of the matched expression in the given text. Note that these values are counted in bytes, not characters: for example, ½ is a two-byte character and therefore counts as 2.
 
|-valign="top"
 
|width="10%"|'''ISA_CLOSURE'''
 
|width="90%"| represents expanded annotations done with the ''is_a'' transitive closure expansion component. An ISA_CLOSURE context has 2 properties:
 
* ''childConceptId'' - the concept from which the annotation was derived.
 
* ''level'' - the distance in the is_a hierarchy between the annotating concept and the concept from which the annotation was derived.
 
* For example, if a direct annotation with NCI/C0025202 (melanoma) was done, the is_a transitive closure component may expand it to another annotation with NCI/C1302746 (Melanocytic Neoplasm) because the latter is a direct parent (i.e., level 1) concept of the former. The ISA_CLOSURE annotation generated will have the following properties {NCI/C0025202, 1}.
 
|-valign="top"
 
|width="10%"|'''MAPPING'''
 
|width="90%"| represents expanded annotations done with the mapping expansion component. A MAPPING context has 2 properties:
 
* mappedConceptId identifies the concept from which the annotation was derived.
 
* mappingType specifies the type of mapping.
 
|}
 
|}
 
|}
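
The weights listed in the ''score'' row above can be sketched in Python as follows. The constants and the formula come from this guide; the function names are hypothetical, and rounding the parent-level weight down to an integer is an assumption inferred from the example values in the guide (9, 7, 4 and 3 for levels 1, 2, 5 and 8), not a documented rule.

```python
import math

# Weights as listed in this guide.
PREFERRED_NAME_WEIGHT = 10
SYNONYM_WEIGHT = 8
MAPPING_WEIGHT = 7


def parent_weight(level):
    """Weight of an is_a expansion at parent level n: 1 + 10 * e^(-0.2 * n)."""
    return 1 + 10 * math.exp(-0.2 * level)


def concept_score(weights):
    """Final score of a concept: the sum of the weights of its annotations."""
    return sum(weights)


for level in (1, 2, 5, 8):
    # Assumption: the guide's integer examples correspond to rounding down.
    print(level, math.floor(parent_weight(level)))

# A concept matched once by preferred name and once through a mapping.
print(concept_score([PREFERRED_NAME_WEIGHT, MAPPING_WEIGHT]))  # 17
```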
 

Latest revision as of 14:46, 3 March 2014

This service has been replaced. Instead please see:

http://data.bioontology.org/documentation#nav_annotator