Annotator Optimizing and Troubleshooting

From NCBO Wiki
Revision as of 19:50, 7 February 2017 by Graybeal (talk | contribs) (created page, with best advice conceivable...)

If you have a lot of material to annotate, or are having trouble getting your annotation jobs to run, this page provides some tips on making things better.

Caveat: The information on this page was produced by analysis, and may not reflect real-world experience. Reports of your own experience and test results are welcome, and will help improve this content.

Optimizing the Annotator

Let's assume you have thousands or tens of thousands (or more!) of full-page text articles, for which you wish to retrieve and parse the annotations. Assuming it takes about 20 seconds to parse and organize the response for each article, what's the right way to organize and optimize the job for the Annotator?

Annotation API Optimizations and Recommendations

BioPortal has a fair amount of compute power, but it can definitely be overwhelmed by requests if there are enough of them. (This is especially true for the Recommender, which has maybe 100 times the compute requirements of the Annotator.) Therefore organizing the API requests appropriately is important for you, and for other BioPortal users. If the constraints suggested by this article mean your annotations will take too long, then you will need to set up a BioPortal Virtual Appliance in your own organization to accomplish your goals.

Optimizing Your Query

We'll assume that you want to make your query execute as quickly as possible. How can you set it up to make that happen?

The key adjustments to improve efficiency are (a) select the ontologies to annotate with; (b) set options to minimize the number of annotations; and (c) as a special case, select specific UMLS semantic types to annotate with.

By selecting the specific ontologies that you want to use in the annotation, you will reduce the processing time considerably, in comparison to the very large set of ontologies the Annotator will check otherwise.

In the API, the only options that need to be changed for optimal results are exclude_synonyms and longest_only; the other settings are already appropriate. (In the UI, the options to check to minimize the number of annotations are Match Longest Only (excluding shorter phrases within long ones), Exclude Synonyms, and Exclude Numbers. Other options should remain unset, and the Match Ancestors option should be none.) Of course, these changes limit the annotations you will get, so they must be configured to meet your needs.

Setting the UMLS semantic type further constrains the items that are mapped, but this produces a very specific result.
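As a rough illustration of the three adjustments above, the following sketch builds a restricted Annotator request URL. The endpoint URL, the parameter names other than exclude_synonyms and longest_only (text, apikey, ontologies, semantic_types), the helper name build_annotator_url, and the example ontology and semantic type values are assumptions made for this example; check them against the current BioPortal API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the BioPortal API documentation.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(text, apikey, ontologies=(), semantic_types=()):
    """Return an Annotator URL that limits the work BioPortal has to do."""
    params = {
        "text": text,
        "apikey": apikey,
        # The two API options this page recommends changing.
        "longest_only": "true",
        "exclude_synonyms": "true",
    }
    if ontologies:
        # Assumed parameter: comma-separated ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    if semantic_types:
        # Assumed parameter: comma-separated UMLS semantic type codes.
        params["semantic_types"] = ",".join(semantic_types)
    return ANNOTATOR_URL + "?" + urlencode(params)


url = build_annotator_url(
    "Melanoma is a malignant tumor of melanocytes.",
    "YOUR_API_KEY",
    ontologies=["NCIT", "MESH"],   # example values only
    semantic_types=["T191"],       # example value only
)
print(url)
```

Leaving ontologies empty omits that parameter entirely, which corresponds to the slower case of checking against every ontology.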

Identifying Your Rate Limit

Note that BioPortal has a rate limit on API requests for a particular key; this is currently set to 15 requests per second. However, this limit does not reflect BioPortal's ability to process complex requests, and submitting 15 annotation requests per second of one page each might be enough to slow down or stop BioPortal over time, not to mention inconveniencing many other users.

To stay within BioPortal's capacity, we recommend that you monitor the response time for your requests, and adjust the flow of requests to keep that response time close to normal. Submit ten different annotation requests, each one after the previous one's results are received. Measure the average time for a single annotation request.
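One way to take that baseline measurement is sketched below. The function names are hypothetical, and send stands for whatever function in your own code submits one annotation request and waits for the result; fake_send is only a stand-in so the sketch runs without touching BioPortal.

```python
import time
from statistics import mean


def timed_call(send, text):
    """Return the seconds that one call to send(text) takes."""
    start = time.perf_counter()
    send(text)
    return time.perf_counter() - start


def serial_baseline(send, texts):
    """Send each text only after the previous call returned; return mean latency."""
    return mean(timed_call(send, text) for text in texts)


def fake_send(text):
    """Stand-in for a real request function (assumption for this sketch)."""
    time.sleep(0.01)


baseline = serial_baseline(fake_send, [f"sample text {i}" for i in range(10)])
print(f"mean response time: {baseline:.3f} s")
```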

Now, try submitting multiple annotation requests in a batch, starting with a small number and going up to 15 per second. Submit each of your 10 batches only after the previous batch is fully processed. See if the average response time for the annotations in these 10 batches is significantly slower. If it isn't, you can double the batch size and try the experiment again. Once the average response time goes up by, say, more than 50% (an estimate), you are likely to be loading BioPortal faster than it can process your requests. If you keep submitting requests at that rate, BioPortal will eventually run out of resources, and until then all other users will be heavily impacted.
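The doubling experiment can be sketched roughly as follows. This is an illustration under assumptions, not a tested BioPortal client: send is your own request function, baseline is the single-request average you measured earlier, the function names are hypothetical, and the 1.5 slowdown factor simply encodes the "more than 50%" estimate above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed_call(send, text):
    """Return the seconds that one call to send(text) takes."""
    start = time.perf_counter()
    send(text)
    return time.perf_counter() - start


def batch_mean_latency(send, texts, batch_size, batches=10):
    """Run `batches` batches of concurrent requests; return mean latency per request."""
    batch = [texts[i % len(texts)] for i in range(batch_size)]
    latencies = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for _ in range(batches):
            # Consuming the map result blocks until the whole batch is done,
            # so the next batch starts only after this one is fully processed.
            latencies.extend(pool.map(lambda t: timed_call(send, t), batch))
    return mean(latencies)


def find_safe_batch_size(send, texts, baseline, limit=15, slowdown=1.5, batches=10):
    """Double the batch size until latency exceeds slowdown * baseline.

    Returns the largest batch size tested that stayed under the threshold,
    or 0 if even a batch of one request was too slow.
    """
    size, safe = 1, 0
    while True:
        if batch_mean_latency(send, texts, size, batches) > slowdown * baseline:
            return safe
        safe = size
        if size >= limit:
            return safe
        size = min(size * 2, limit)
```

With the default limit of 15, the sizes tried are 1, 2, 4, 8 and 15. Because every trial puts real load on BioPortal, it is probably wise to stop at the first sign of slowdown rather than to repeat the experiment often.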

Note that for particularly big requests, like a 20-page document in the Recommender, BioPortal might not even be able to process one request per second without using up its resources, and you will have to spread out the requests further.

On the other hand, if you have trivial requests, it might be that BioPortal can keep up with no problem. In this case, using 2 API keys to submit requests faster than 15/second might be acceptable, but please consult with the BioPortal staff before doing this.

Note that initially (in 2009, see [1]), the service responded on average in 1.8 seconds for a mean input word count of 180 words, and in 2.3 seconds for a mean input word count of 280 words. When simulating 10 simultaneous users, the response time was between 4.5 and 5.0 seconds for 280 words. Response times are likely to be significantly faster today.

Annotation API Troubleshooting

If you are getting errors when using the API, a simple first test is to try accessing the UI version of the Annotator, and see if it works. If it works with the sample text, try it with your text, to make sure the specific text does not trigger an issue involving a particular ontology.

If the UI version does not work with the sample text, send a report to support@bioontology.org. If it's the weekend, or you are particularly eager, try the UI version after some time (15 minutes, or an hour) has passed; sometimes the system will recover itself, or we will go on-line and see the issue.

If the UI version works with the sample text, but not your text, there may be an ontology that is failing. Try performing the annotations with a single ontology; if that works, it indicates an ontology issue, which will probably require expert support to fix. Again, send a report to support@bioontology.org, and consider whether you want to troubleshoot by looking for the failing ontology or ontologies with a divide-by-2 strategy of ontology selection: annotate with half of the ontologies, and keep halving whichever half fails.
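The divide-by-2 search can be sketched as a small recursive function. Here annotates_ok is a hypothetical callback that you would write yourself: it runs your text against the given list of ontologies and returns True if the annotation succeeds. The sketch assumes that each failure is caused by an individual ontology rather than by a combination of them.

```python
def find_failing_ontologies(ontologies, annotates_ok):
    """Return the ontologies whose presence makes annotates_ok fail."""
    if not ontologies or annotates_ok(ontologies):
        return []                      # this group is fine, stop searching it
    if len(ontologies) == 1:
        return list(ontologies)        # narrowed down to one failing ontology
    mid = len(ontologies) // 2
    return (find_failing_ontologies(ontologies[:mid], annotates_ok)
            + find_failing_ontologies(ontologies[mid:], annotates_ok))


# Offline demonstration with a pretend failure (acronyms are examples only).
broken = {"DOID"}


def pretend_annotates_ok(subset):
    return not (broken & set(subset))


suspects = find_failing_ontologies(
    ["NCIT", "MESH", "DOID", "GO", "HP"], pretend_annotates_ok
)
print(suspects)
```

Each call to annotates_ok is a real annotation request, so the search costs on the order of log2(N) requests per failing ontology, far fewer than testing all N ontologies one at a time.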

If the UI version is fully working, but your API call does not, try the call using curl or via https in the browser (note the sample https string at the end of the returned content), verifying that your API key is the same as the one in your BioPortal profile. If this works, it suggests your code may have an issue in presenting the API call to BioPortal, or that BioPortal may be rate-limiting your requests. You can test the latter by spreading out your queries, perhaps echoing each one to the console to confirm the expected rate.
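A minimal way to spread out the queries and echo each one to the console is sketched below. The name send_throttled is hypothetical, and send again stands for your own function that submits a single request; the sketch only enforces a minimum gap between the start of consecutive requests.

```python
import time


def send_throttled(send, texts, per_second):
    """Call send(text) for each text, starting at most `per_second` calls a second."""
    interval = 1.0 / per_second
    next_slot = time.monotonic()
    results = []
    for i, text in enumerate(texts, start=1):
        wait = next_slot - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        started = time.monotonic()
        # Echo each request so the actual rate can be checked by eye.
        print(f"{started:.3f} sending request {i}/{len(texts)}")
        results.append(send(text))
        # Measure the gap from the start of this request, so a slow
        # response never causes a burst of catch-up requests.
        next_slot = started + interval
    return results
```

If the errors disappear at, for example, per_second=1 but return at higher rates, rate limiting or overload is the likely cause rather than a bug in how the call is built.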

If a single curl or https call does not work, look at the returned error code for a clue. These error codes should accurately reflect the reason the system has rejected the request. If you do not understand the error code, please contact us via support@bioontology.org. Even if we do not see your post, someone else may see it and offer advice.

References

[1] Shah et al. BMC Bioinformatics. 2009 Sep 17;10 Suppl 9:S14 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745685/)