Degrees of Annotation

From NCBO Wiki
Revision as of 15:15, 15 May 2006 by Cjm2 (talk | contribs)

"Annotation" has various definitions. One of them (from Wikipedia) is: "Annotation is information associated with a particular point [or item] in a document or other piece of information."

In the context of the bio* domain, annotation refers to descriptive text that assists in interpreting the primary information [about an item] at hand. There can be various levels of annotation, for example:

  • If the assembled genome coming out of a sequence assembly program is the primary information, then the assignment of "labels" such as 'start site', 'exon', 'intron' to different parts of the sequence is an annotation.
  • At the second level, this annotation itself [the statement about the boundaries of 'start site', 'exon', 'intron'] is a conceptual item called a gene, which is further "annotated" to have a particular 'function' or 'location' in the cell.
  • At a still higher level, this annotation [about the function or location] is an item that is "annotated" to be relevant only in a certain genetic or experimental context.

So where the boundary falls between an information item (or data) and an annotation depends on the use case.

Note from cjm: I have minor disagreements with most of the above. I think that in the bio* domain annotations are much more than descriptive text; they are typically highly structured data. I think this glosses over a lot, from base calling through assemblies and compute pipelines (the results of which are sometimes called annotations, although the word is typically only used for human-curated data). Beyond simple labeling, a lot goes into annotating intron-exon structure before functional annotation even starts. Why are genes conceptual items when exons are not? I am not sure what the final point means. But yes, the basic idea that annotation refers to different things at different levels is correct, so we should ban use of the word annotation, as it is inherently ambiguous.