Degrees of Annotation

From NCBO Wiki

Latest revision as of 15:15, 15 May 2006

"Annotation" has various definitions. One of them (from Wikipedia) is: "Annotation is information associated with a particular point [or item] in a document or other piece of information."

In the context of the bio* domain, annotation refers to descriptive text that assists in interpreting the primary information [about an item] at hand. There can be various levels of annotation, for example:

  • If the assembled genome produced by a sequence assembly program is the primary information, then assigning "labels" such as 'start site', 'exon', 'intron' to different parts of the sequence is an annotation.
  • At the second level, this annotation itself [the statement about the boundaries of 'start site', 'exon', 'intron'] is a conceptual item called a gene, which is further "annotated" to have a particular 'function' or 'location' in the cell.
  • At a still higher level this annotation [about the function or location] is an item that is "annotated" to be relevant only in a certain genetic or experimental context.

So where the boundary lies between an information item (or data) and an annotation depends on the use case.
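The layering described above can be sketched as a small data structure in which an annotation points at a target that may itself be an annotation. This is a minimal, hypothetical sketch: the names Item, Annotation and degree, and all the labels, are illustrative assumptions rather than an NCBO or standard model. The "degree" of an annotation is counted relative to whichever item the use case treats as primary, so the same annotation can be second- or third-level depending on that choice.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Item:
    """Hypothetical primary information item, e.g. an assembled genome."""
    name: str


@dataclass
class Annotation:
    """Hypothetical statement about a target; the target may itself be an annotation."""
    label: str
    target: Union[Item, "Annotation"]


def degree(node: Union[Item, Annotation], primary: Union[Item, Annotation]) -> Optional[int]:
    """Count annotation hops from `node` down to the item chosen as primary.

    Returns 0 if `node` is the primary item itself, or None if following
    the chain of targets never reaches `primary`.
    """
    hops = 0
    current = node
    while current is not primary:
        if not isinstance(current, Annotation):
            return None  # reached a plain item that is not the chosen primary
        current = current.target
        hops += 1
    return hops


# Illustrative chain mirroring the three levels above (all labels are made up).
genome = Item("assembled genome")
gene_model = Annotation("gene: start site, exon and intron boundaries", genome)
function = Annotation("function or location in the cell", gene_model)
context = Annotation("relevant only in a certain experimental context", function)

print(degree(context, genome))      # 3: genome treated as the primary data
print(degree(context, gene_model))  # 2: gene model treated as the primary data
```

Because degree takes the primary item as a parameter instead of storing a fixed level on each annotation, moving the data/annotation boundary is just a matter of passing a different primary.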

Note from cjm: I have minor disagreements with most of the above. I think in the bio* domain annotations are much more than descriptive text; they are typically highly structured data. I think this is glossing over a lot, from base calling through assemblies and compute pipelines (the results of which are sometimes called annotations, although the word is typically used only for human-curated data). As well as simple labeling, there is a lot more going on in annotating intron-exon structure before functional annotation even starts. Why are genes conceptual items but exons are not? Not sure what the final point means. But yes, the basic idea that annotation refers to different things at different levels is correct, so we should ban use of the word annotation, as it is inherently ambiguous.