Difference between revisions of "Resource Index Dataset Workflow Howto"

From NCBO Wiki
(New page: == Introduction == The Resource Index is a system for automated ontology-based annotation and indexing of biomedical data. The application processes the textual metadata of diverse elemen...)
 
(Replaced content with "As of Virtual Appliance v2.2, populating the Resource Index Dataset is no longer supported. A separate Resource Index Virtual Appliance may be released at some point.")
 
(20 intermediate revisions by 3 users not shown)
== Introduction ==
As of Virtual Appliance v2.2, populating the Resource Index Dataset is no longer supported. A separate Resource Index Virtual Appliance may be released at some point.
 
 
The Resource Index is a system for automated ontology-based annotation and indexing of biomedical data. The application processes the textual metadata of diverse elements of biomedical resources, such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed abstracts, to annotate and index them with terms from ontologies. The workflow that computes the annotations and indices is run from the shell using a provided shell script.
 
 
 
== Running the Resource Index Workflow ==
 
 
 
Run the following shell commands to build and execute the Resource Index workflow:<br>
 
<pre>
 
[root@ncbobioportal ~]# cd /bioportal/sources/resource_index/workflow/2000/
 
[root@ncbobioportal 2000]# sh ./all.sh
 
</pre>
 
* The script builds the Resource Index workflow project, creates the execution environment in the /bioportal/sources/resource_index/workflow/2000/dist/ folder, and then executes the run.sh script.
 
* Log files are written to:
 
<pre>
 
/bioportal/sources/resource_index/workflow/2000/dist/files/logs/branch1.0/localhost/resource_index
 
</pre>
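
To follow a run, it can help to locate the most recently modified file in the log directory. The helper below is a minimal sketch: the function name <code>latest_log</code> is hypothetical, and it assumes nothing about how the workflow names its log files (those names are not documented here).

```shell
# Print the path of the most recently modified entry in a log directory.
# Hypothetical helper; assumes file names contain no newlines.
latest_log() (
    dir=$1
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    # Print nothing and return non-zero if the directory is empty or missing.
    [ -n "$newest" ] && printf '%s/%s\n' "$dir" "$newest"
)

# Example use (commented out so that sourcing this file has no side effects):
# tail -f "$(latest_log /bioportal/sources/resource_index/workflow/2000/dist/files/logs/branch1.0/localhost/resource_index)"
```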
 
 
 
The application displays its progress in the console while it runs. During or after a run, you can inspect the resource_index database to confirm that data is actually being processed and written.
 
 
 
=== Database Structure ===
 
The Resource Index Database contains many tables, with a common set of six per processed resource. These resource-specific tables are named with the resource acronym as a prefix. For example, for the WikiPathways resource these are the tables that are populated:
 
* obr_wp_aggregation
 
* obr_wp_annotation
 
* obr_wp_concept_frequency
 
* obr_wp_element
 
* obr_wp_isa_annotation
 
* obr_wp_map_annotation
 
 
 
There is also a common set of tables that are not resource-specific. These include obr_resource, obr_context, obr_dictionary, obr_execution and obr_statistics.
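
When checking the database for a given resource, the six expected table names can be derived from its acronym. The following is a sketch only: the function name <code>obr_tables</code> is hypothetical, and lowercasing the acronym is an assumption based on the WikiPathways example above.

```shell
# Print the six resource-specific table names for a resource acronym.
# Hypothetical helper; lowercasing the acronym is an assumption.
obr_tables() (
    acronym=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for suffix in aggregation annotation concept_frequency \
                  element isa_annotation map_annotation; do
        printf 'obr_%s_%s\n' "$acronym" "$suffix"
    done
)

# Example use (commented out): list the tables expected for WikiPathways.
# obr_tables wp
```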
 
 
 
== Initial Population, Ontology Update Population, Resource Update Population ==
 
 
 
=== Initial Population ===
 
The first time you run the Resource Index workflow, the Resource Index contains no data about any of the ontologies, terms, or resources that you want to process. You must therefore run the entire workflow to reach the minimum state in which the API will function. This type of workflow execution gathers all of the ontology data available in OBS/Annotator (ontologies with status 28) and then processes all of the resources configured in the build.properties file.
 
    obs.slave.populate=true
 
    obr.table.index.disabled=true
 
 
 
=== Ontology Update ===
 
When Annotator has processed new ontologies, the Resource Index must be updated to match. Running this workflow type updates only the ontology data; none of the resource-related data is modified.
 
    obs.slave.populate=true
 
    obr.table.index.disabled=true
 
 
 
=== Resource Update ===
 
From time to time, the resources used within the Resource Index make new information available. Running this workflow type processes the resources that have been configured in the build.properties file. None of the ontology-related data is modified.
 
    obs.slave.populate=false
 
    obr.table.index.disabled=false
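
Switching between workflow types means editing these two properties before each run. The helper below is one possible way to script that edit, not part of the appliance: the function name <code>set_prop</code> is hypothetical, and the location of build.properties must be supplied by the caller because it is not stated on this page.

```shell
# Set key=value in a Java-style properties file: replace the existing
# line for the key, or append a new line if the key is absent.
# Hypothetical helper; assumes values contain no '|' or '&' characters.
set_prop() (
    file=$1
    key=$2
    value=$3
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}=" "$file"; then
        sed "s|^[[:space:]]*${pattern}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
)

# Example use (commented out): the Resource Update settings listed above.
# set_prop path/to/build.properties obs.slave.populate false
# set_prop path/to/build.properties obr.table.index.disabled false
```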
 
 
 
== Troubleshooting ==
 
* The shell script may not have execute permission by default. If so, running it produces a permissions error, and you will need to run the following command to make it executable:
 
<code>
 
chmod +x ./all.sh
 
</code>
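
The same check can be made up front instead of waiting for the error. This is a small sketch with a hypothetical function name, <code>ensure_executable</code>; it only adds the execute bit when it is missing.

```shell
# Add execute permission to a script only if it is missing.
# Hypothetical helper; returns non-zero if the file does not exist.
ensure_executable() {
    if [ ! -f "$1" ]; then
        echo "not found: $1" >&2
        return 1
    fi
    [ -x "$1" ] || chmod +x "$1"
}

# Example use (commented out):
# ensure_executable ./all.sh
```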
 
 
 
[[Category:NCBO Virtual Machine Image]]
 

Latest revision as of 16:07, 14 January 2015
