Difference between revisions of "Annotator User Guide"
Line 115:
|width="10%"|{xml,text,tabDelimited}
|width="10%"|default: xml
|width="70%"|Specifies the desired format of the response from Annotator. For programmatic access, XML is strongly suggested.
* ''xml'': returns an XML representation of the annotatorResultBean.
* ''text'': returns a plain text representation of the annotatorResultBean.
* ''tabDelimited'': a shorter version of the ''text'' format. It returns only the annotations, not the full result content (no statistics, etc.). The format of the tab-delimited file is: score \t conceptId \t preferredName \t synonyms (separated by ' /// ') \t semanticType (separated by ' /// ') \t contextName \t isDirect \t other context information (e.g., childConceptId, mappedConceptId, level, mappingType) (separated by ' /// ').
The elements of the Annotator response are described in the next section.
|}
===Annotator Web Service Response===
'''Response Content: annotatorResultBean'''
Revision as of 15:39, 8 January 2010
Sample HTTP Client for the Annotator
HTML http://rest.bioontology.org/test_oba.html
Annotator web service endpoint
POST your requests at http://rest.bioontology.org/obs/annotator?email=example@example.org
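As a minimal sketch of how such a request might be assembled from a script, the Python snippet below builds (but does not send) a POST request for the endpoint above. It assumes that the parameters are sent as a form-encoded body, that booleans are written as true/false, and that list values are joined with commas without spaces; the helper name build_annotator_request is hypothetical and not part of the service.

```python
from urllib.parse import urlencode
from urllib.request import Request

ANNOTATOR_URL = "http://rest.bioontology.org/obs/annotator"


def build_annotator_request(text, email, **options):
    """Build (but do not send) a POST request for the Annotator endpoint.

    Assumption: parameters travel in a form-encoded body; the email
    address is passed in the query string as shown in the guide.
    """
    form = {"textToAnnotate": text}
    for name, value in options.items():
        if isinstance(value, bool):
            # Booleans are written as the literals true / false.
            form[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # List values are separated with commas, without spaces.
            form[name] = ",".join(str(v) for v in value)
        else:
            form[name] = str(value)
    body = urlencode(form).encode("utf-8")
    url = ANNOTATOR_URL + "?" + urlencode({"email": email})
    return Request(url, data=body, method="POST")


request = build_annotator_request(
    "Melanoma is a malignant tumor of melanocytes.",
    email="example@example.org",
    longestOnly=True,
    levelMax=1,
    format="xml",
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send the request; that step is left out here so the sketch runs without network access.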
Annotator Web Service Workflow
The Annotator Web service’s workflow is composed of two main steps (figure):
First, direct annotations are created from raw text. Annotations are based on syntactic concept recognition using a dictionary compiled from terms (concept names and synonyms) pulled from the ontologies. The Annotator enables the selection of ontologies from one of the largest sets of available biomedical ontologies. We implemented the service using the 98 English ontologies in UMLS 2008AA and a subset of the BioPortal ontologies (122 as of this writing). These ontologies provide a dictionary of 4,222,921 concepts and 7,943,757 terms.
In the second step, semantic expansion components leverage the semantics in ontologies (e.g., is_a relations and mappings) to create additional annotations. For example, the is_a transitive closure component traverses an ontology parent-child hierarchy to create new annotations with parent concepts of concepts in direct annotations. The ontology-mapping component creates new annotations based on existing mappings between different ontologies. Point-to-point mappings that link concepts to one another are defined manually or by automatic algorithms in the UMLS Metathesaurus and in BioPortal.
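The first step can be illustrated with a deliberately naive Python sketch of dictionary-based concept recognition. It is not the Mgrep tool used by the service: the matching strategy, the function name direct_annotations, the concept identifiers prefixed with EX/, and the reading of the wholeWordOnly and longestOnly options (shown here as whole_word_only and longest_only) are all assumptions made for illustration.

```python
import re


def direct_annotations(text, dictionary, whole_word_only=True, longest_only=False):
    """Return sorted (start, end, term, concept_id) matches of dictionary terms.

    dictionary maps a term (concept name or synonym) to a concept id.
    """
    matches = []
    lowered = text.lower()
    for term, concept_id in dictionary.items():
        pattern = re.escape(term.lower())
        if whole_word_only:
            # Require word boundaries on both sides of the term.
            pattern = r"\b" + pattern + r"\b"
        for m in re.finditer(pattern, lowered):
            matches.append((m.start(), m.end(), term, concept_id))
    if longest_only:
        # Drop any match that lies inside a strictly longer match.
        matches = [
            a for a in matches
            if not any(
                b[0] <= a[0] and a[1] <= b[1] and (b[1] - b[0]) > (a[1] - a[0])
                for b in matches
            )
        ]
    return sorted(matches)


# Toy dictionary; the EX/ identifiers are made up for this example.
toy_dictionary = {
    "melanoma": "NCI/C0025202",
    "skin": "EX/0001",
    "skin neoplasm": "EX/0002",
}
for match in direct_annotations("Skin neoplasm such as melanoma", toy_dictionary, longest_only=True):
    print(match)
```

With longest_only=True the shorter term "skin" is dropped because it lies inside the longer match "skin neoplasm".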
The is_a transitive closure expansion component
An is_a transitive closure component traverses an ontology parent-child hierarchy to create new annotations with parent concepts. For instance, if data are annotated with a concept from the NCI Thesaurus, such as melanoma, this component generates a new annotation with the concept skin neoplasm, because the NCI Thesaurus provides the knowledge that melanoma is_a skin neoplasm. The Annotator uses the is_a relations as they are defined by the repositories: UMLS is_a relations are extracted from the MRHIER table. BioPortal is_a relations are extracted from the <SubClass>/<SuperClass> information for a given concept (accessed via REST web service).
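The traversal described above can be sketched in Python as a level-bounded walk up the parent links. This is only an illustration: the function name expand_is_a and the in-memory parents mapping are hypothetical (the service reads is_a relations from the repositories), the grandparent concept "neoplasm" is added purely to show a second level, and treating a maximum level of 0 as "no expansion" is an assumption.

```python
def expand_is_a(direct_concepts, parents, level_max):
    """Return {(concept, ancestor): level} for ancestors up to level_max.

    parents maps a concept to the list of its direct is_a parents.
    """
    expanded = {}
    for concept in direct_concepts:
        frontier = [concept]
        seen = {concept}
        for level in range(1, level_max + 1):
            next_frontier = []
            for node in frontier:
                for parent in parents.get(node, ()):
                    if parent not in seen:
                        seen.add(parent)
                        # Record the level at which the ancestor is first reached.
                        expanded[(concept, parent)] = level
                        next_frontier.append(parent)
            frontier = next_frontier
    return expanded


# Toy hierarchy: melanoma is_a skin neoplasm (from the text above);
# the second link is illustrative only.
toy_parents = {
    "melanoma": ["skin neoplasm"],
    "skin neoplasm": ["neoplasm"],
}
print(expand_is_a(["melanoma"], toy_parents, level_max=2))
```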
The mapping expansion component
An ontology-mapping component creates new annotations based on existing mappings between different ontologies. For example, an annotation done with concept NCI/C0025202 (melanoma) in the NCI Thesaurus can be expanded to another annotation with the mapped concept in SNOMED-CT, because the UMLS Metathesaurus provides the mapping information. The Annotator uses the mappings as they are defined by the repositories: UMLS mappings come both from the CUI information and the MRREL table. BioPortal mappings are defined in the BioPortal backend and accessed in an ad-hoc manner.
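A small Python sketch of this expansion step, under stated assumptions: mappings are held in memory as (source, target, mappingType) triples, the function name expand_mappings is hypothetical, and the SNOMED-CT identifier and mapping type names are placeholders rather than real values.

```python
def expand_mappings(direct_concepts, mappings, mapping_types=None):
    """Return the (source, target, mapping_type) triples that create new annotations.

    mapping_types=None keeps every mapping type; otherwise only the
    listed types are used.
    """
    wanted = None if mapping_types is None else set(mapping_types)
    sources = set(direct_concepts)
    expanded = []
    for source, target, mapping_type in mappings:
        if source in sources and (wanted is None or mapping_type in wanted):
            expanded.append((source, target, mapping_type))
    return expanded


# Placeholder data: only NCI/C0025202 comes from the text above.
toy_mappings = [
    ("NCI/C0025202", "SNOMEDCT/PLACEHOLDER-ID", "exampleTypeA"),
    ("NCI/C0025202", "OTHER/PLACEHOLDER-ID", "exampleTypeB"),
]
print(expand_mappings(["NCI/C0025202"], toy_mappings, mapping_types=["exampleTypeA"]))
```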
Annotator Web Service Parameters
The Annotator web service offers a set of parameters that allow a user to customize the Annotator workflow and filter the result. To customize the workflow and the result, the user can specify a set of ontologies and a specific set of semantic types. In addition, the two steps of the annotation workflow can be parameterized.
The Annotator web service response time depends on the selected components, as each consumes resources at a different level. For example, the is_a transitive closure takes a long time to process, even when using a pre-computed hierarchy table. As another example, an annotation with wholeWordOnly=false will take significantly longer than one with wholeWordOnly=true.
Please see below for the list of parameters and the possible values.
longestOnly | {true, false} | default: false | Specifies whether the concept recognition step (done with the University of Michigan Mgrep tool) must match only the longest term when several concepts match an expression.
wholeWordOnly | {true, false} | default: true | Specifies whether the concept recognition step must match whole words only when several concepts match a given word.
stopWords | {stopWord1,...,stopWordN} | default: empty (i.e. none) | Specifies the list of stop words to use.
withDefaultStopWords | {true, false} | default: false | Specifies whether to use the default stop words. The default stop word list is available from the sample HTML page. If set to true, this overrides the value of stopWords given by the user.
scored | {true, false} | default: true | Specifies whether the annotations are scored. A score is a number assigned to an annotation that reflects the accuracy of the annotation. The higher the score, the better the annotation. The scoring algorithm gives a specific weight to an annotation according to the context of this annotation. For instance, an annotation done by matching a concept preferred name will be given a higher weight than an annotation done by matching a concept synonym or an annotation done with a parent at level 3 in the is_a hierarchy. Details on the scoring algorithm are given in section Scoring algorithm.
ontologiesToExpand | {localOntology1,...,localOntologyN} | default: empty (i.e. all ontologies) | Specifies the list of ontologies to use with the mapping semantic expansion component. The list of ontologies that can be used is available in the sample HTML page. The values are separated by commas (without spaces).
ontologiesToKeepInResult | {localOntology1,...,localOntologyN} | default: empty (i.e. all ontologies) | Specifies the list of ontologies to keep in the result of the annotation process. The list of ontologies that can be used is available in the sample HTML page. The values are separated by commas (without spaces).
semanticTypes | {semanticType1,...,semanticTypeN} | default: empty (i.e. all semanticTypes) | Specifies the list of semantic types to use in the annotation process. The list of semantic types that can be used is available in the sample HTML page. The values are separated by commas (without spaces). Note that the restriction to semantic types is also applied during the semantic expansion steps.
levelMax | {integer} | default: 0 | Specifies the maximum level a parent concept may have to be considered in the is_a transitive closure expansion step.
mappingTypes | {null,mappingType1,...,mappingTypeN} | default: empty (i.e. all mappingTypes) | Specifies the list of mapping types to use during the mapping expansion step. The list of mapping types that can be used is available in the sample HTML page. The values are separated by commas (without spaces). The current list is described hereafter.
textToAnnotate | | | Specifies the text to be annotated.
format | {xml,text,tabDelimited} | default: xml | Specifies the desired format of the response from Annotator. For programmatic access, XML is strongly suggested. The elements of the Annotator response are described in the next section.
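For the tabDelimited format, whose column layout is given in the revision notes at the top of this page (score, conceptId, preferredName, synonyms, semanticType, contextName, isDirect, other context information, with ' /// ' separating multiple values), a parser might look like the Python sketch below. The names AnnotationRow and parse_line are hypothetical, the sample line is invented for illustration, and the sketch assumes that score is numeric and that isDirect is written as true or false.

```python
from typing import List, NamedTuple


class AnnotationRow(NamedTuple):
    score: float
    concept_id: str
    preferred_name: str
    synonyms: List[str]
    semantic_types: List[str]
    context_name: str
    is_direct: bool
    context_info: List[str]


def split_multi(field):
    """Split a multi-valued field on '///' and trim each value."""
    return [part.strip() for part in field.split("///") if part.strip()]


def parse_line(line):
    """Parse one tab-delimited annotation line into an AnnotationRow."""
    cols = line.rstrip("\n").split("\t")
    # Tolerate missing trailing columns by padding with empty strings.
    cols += [""] * (8 - len(cols))
    return AnnotationRow(
        score=float(cols[0]),
        concept_id=cols[1],
        preferred_name=cols[2],
        synonyms=split_multi(cols[3]),
        semantic_types=split_multi(cols[4]),
        context_name=cols[5],
        is_direct=cols[6].strip().lower() == "true",
        context_info=split_multi(cols[7]),
    )


# Invented sample line, not actual service output.
sample = "12\tNCI/C0025202\tMelanoma\tsynonymA /// synonymB\texampleType\texampleContext\ttrue\tlevel=1"
print(parse_line(sample))
```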
Annotator Web Service Response
Response Content: annotatorResultBean
resultId
dictionary | Dictionary contains the metadata (not the content) of the dictionary used for a result. dictionaryId, dictionaryName, and dictionaryDate identify the dictionary on the server side and give information about its content. Dictionary versioning is strongly linked to the evolution of the ontologies used. Each time ontologies change, the dictionary is updated. The dictionary information may be useful for comparing results of the Annotator Restlet service over time.
statistics | Statistics contains information on the number of annotations done for a given context. The contextName keyword identifies the type of context and nbAnnotation is the number of annotations of this type.
parameters | Parameters summarizes all the parameters specified by the user when requesting the Annotator Restlet service. Those parameters are described in section Service parameters.
ontologies | To keep the model simple, we provide only the global ontology identifier (localOntologyId), the name (ontologyName), and the version (ontologyVersion). This information comes from the original repositories (UMLS/BioPortal) and might help the user to select the right ontology to use. When an ontology is used in the annotation, a result has a set of OntologyUsed, which specify two other properties: nbAnnotation, the number of annotations that have been made with concepts from this ontology, and score, the sum of all the scores of the annotations done with concepts from this ontology (if parameter scored=true). The score therefore indicates which ontology is the most accurate for annotating the given text.
annotations | Annotation is a representation of one annotation. An annotation has a score, which represents the accuracy of the annotation computed by the scoring algorithm (if the scored=true parameter was chosen; otherwise score=-1). An annotation is done with a concept in a context.
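The two OntologyUsed properties can be reproduced on the client side from a list of annotations. The Python sketch below is an illustration only: it assumes each annotation is reduced to a (localOntologyId, score) pair, the function name summarize_ontologies is hypothetical, and skipping annotations whose score is -1 (unscored) is a design choice of this sketch rather than documented service behavior.

```python
from collections import defaultdict


def summarize_ontologies(annotations):
    """Aggregate (ontology_id, score) pairs into nbAnnotation and score per ontology."""
    summary = defaultdict(lambda: {"nbAnnotation": 0, "score": 0})
    for ontology_id, score in annotations:
        entry = summary[ontology_id]
        entry["nbAnnotation"] += 1
        if score != -1:
            # score is -1 when annotations were requested without scoring.
            entry["score"] += score
    return dict(summary)


# Made-up annotation scores for two ontologies.
toy_annotations = [("NCI", 10), ("NCI", 8), ("SNOMEDCT", 5)]
summary = summarize_ontologies(toy_annotations)
best = max(summary, key=lambda ontology_id: summary[ontology_id]["score"])
print(summary)
print("highest summed score:", best)
```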