Difference between revisions of "Importing UMLS To Virtual Appliance"

From NCBO Wiki
(Created page with "Category:NCBO Virtual Appliance <p>The NCBO Virtual Appliance supports [http://www.geneontology.org/GO.format.obo-1_2.shtml OBO] and [http://www.w3.org/TR/owl-features/ O...")
 
 
(5 intermediate revisions by 2 users not shown)
 
[[Category:NCBO Virtual Appliance]]
 
 +
[[Category:Migrated to GitHub]]
  
<p>The NCBO Virtual Appliance supports [http://www.geneontology.org/GO.format.obo-1_2.shtml OBO] and [http://www.w3.org/TR/owl-features/ OWL] ontology formats but not UMLS in its native form. To bridge this gap, we have developed a project called [https://github.com/ncbo/umls2rdf/ UMLS2RDF] that transforms UMLS ontologies into OWL/RDF.</p>
+
This content has been moved! Please find the new content for the 3.0 version of the Virtual Appliance at our new [https://ontoportal.github.io/documentation/administration OntoPortal Virtual Appliance Administration pages].  
  
<p>UMLS2RDF is a Python script that connects to a UMLS MySQL installation and extracts the UMLS ontologies in a format that the Appliance can work with.</p>
+
In particular '''this content''' is mostly at the '''[https://ontoportal.github.io/documentation/administration/ontologies/handling_umls Submitting UMLS Content page]'''.
 
 
<h3>Install UMLS MySQL</h3>
 
 
 
<p>To import UMLS ontologies, a local installation of the [http://www.nlm.nih.gov/research/umls/implementation_resources/scripts/index.html UMLS MySQL release] needs to be available.  Please refer to the [http://www.nlm.nih.gov/research/umls/new_users/index.html UMLS documentation] for instructions on how to install the UMLS MySQL distribution.</p>
 
 
 
<h3>Install UMLS2RDF</h3>
 
 
 
<ol>
 
<li>First, clone the GitHub project:<br/><code>git clone https://github.com/ncbo/umls2rdf/</code></li>
 
<li>Install the MySQL Python driver. We recommend using <code>pip</code> for this:<br/><code>pip install MySQL-python</code></li>
 
</ol>
 
 
 
<h3>Configure UMLS2RDF</h3>
 
 
 
<p>UMLS2RDF has two configuration files:</p>
 
 
 
<ol>
 
<li><strong>conf.py</strong>, where the database configuration (host, name, user, and password) and the output folder are specified.</li>
 
<li><strong>umls.conf</strong>, where you specify the UMLS ontologies to extract. This is a comma-separated file with the following four fields:
 
<ol type="a">
 
<li>SAB (the UMLS source abbreviation).</li>
 
<li>Legacy field. Any value works.</li>
 
<li>Output file name.</li>
 
<li>Conversion strategy. Accepted values: <code>load_on_codes</code> or <code>load_on_cuis</code>.</li>
 
</ol>
 
</li>
 
</ol>
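<p>As a rough illustration of the four-field layout described above, the following sketch reads <strong>umls.conf</strong>-style lines and checks the conversion strategy. It is not part of UMLS2RDF: the function name <code>parse_umls_conf</code>, the dictionary keys, the sample entries, and the skipping of blank lines and <code>#</code> comments are all assumptions made for this example.</p>

```python
import csv
import io

# The two conversion strategies named in the field list above.
VALID_STRATEGIES = {"load_on_codes", "load_on_cuis"}


def parse_umls_conf(text):
    """Parse umls.conf-style text into a list of dicts (hypothetical helper).

    Each non-empty line is expected to carry four comma-separated fields:
    SAB, a legacy value, the output file name, and the conversion strategy.
    """
    entries = []
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        fields = [field.strip() for field in row]
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not fields or not fields[0] or fields[0].startswith("#"):
            continue
        if len(fields) != 4:
            raise ValueError(
                "line %d: expected 4 fields, got %d" % (reader.line_num, len(fields))
            )
        sab, legacy, output_file, strategy = fields
        if strategy not in VALID_STRATEGIES:
            raise ValueError(
                "line %d: unknown strategy %r" % (reader.line_num, strategy)
            )
        entries.append(
            {"sab": sab, "legacy": legacy, "output_file": output_file, "strategy": strategy}
        )
    return entries


if __name__ == "__main__":
    # Illustrative entries only; see the project's own umls.conf for real ones.
    sample = "SNOMEDCT,0,SNOMEDCT.ttl,load_on_codes\nMSH,0,MESH.ttl,load_on_cuis\n"
    for entry in parse_umls_conf(sample):
        print(entry["sab"], "->", entry["output_file"], "using", entry["strategy"])
```

<p>Running a check like this before starting a long extraction can catch a mistyped strategy name early.</p>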
 
 
 
 
 
<p>With <em>load_on_codes</em>, the ontology is converted according to its original source: the Class IDs are constructed from the MRCONSO.CODE field. With <em>load_on_cuis</em>, the Class IDs are constructed from CUIs instead.</p>
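<p>The difference between the two strategies can be sketched as follows. This is a simplified illustration, not the UMLS2RDF implementation: the function <code>class_iri</code>, the base URI <code>http://example.org/ontology/</code>, the IRI layout, and the sample row are hypothetical. Only the choice between the MRCONSO <code>CODE</code> and <code>CUI</code> columns reflects the description above.</p>

```python
from urllib.parse import quote

# Hypothetical base URI, used only for this illustration.
BASE_URI = "http://example.org/ontology/"


def class_iri(mrconso_row, strategy, base_uri=BASE_URI):
    """Build an illustrative Class ID from one MRCONSO row (a dict).

    load_on_codes uses the source's own code (MRCONSO.CODE);
    load_on_cuis uses the UMLS concept identifier (MRCONSO.CUI).
    """
    if strategy == "load_on_codes":
        local_id = mrconso_row["CODE"]
    elif strategy == "load_on_cuis":
        local_id = mrconso_row["CUI"]
    else:
        raise ValueError("unknown strategy: %r" % strategy)
    # Percent-encode the identifier so it is safe inside an IRI path.
    return "%s%s/%s" % (base_uri, mrconso_row["SAB"], quote(local_id, safe=""))


if __name__ == "__main__":
    # Made-up row; the values are placeholders, not real UMLS data.
    row = {"CUI": "C0000000", "SAB": "EXAMPLE", "CODE": "X123"}
    print(class_iri(row, "load_on_codes"))
    print(class_iri(row, "load_on_cuis"))
```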
 
 
 
<p>In our [https://github.com/ncbo/umls2rdf/blob/master/umls.conf configuration file], you can see the settings used by our production system. These are all the UMLS ontologies that are publicly available in BioPortal.</p>
 
 
 
<h3>Run UMLS2RDF</h3>
 
 
 
<p>Once the configuration files are set up, run the command:</p>
 
 
 
<p><code>python umls2rdf.py</code></p>
 
 
 
<p>Depending on how many ontologies are extracted, the run time can range from a few minutes to four hours. This process is memory intensive: transforming the largest UMLS ontologies (e.g., SNOMED) requires at least 16 GB of available RAM.</p>
 
 
 
<h3>Upload files to the NCBO Virtual Appliance</h3>
 
 
 
<p>The output files are written to the folder specified in <strong>conf.py</strong>. Use the BioPortal web form available in your Appliance to submit the extracted ontologies. <strong>IMPORTANT:</strong> The ontology format in the submission form should be UMLS.</p>
 
 
 
<h3>Hardware Considerations</h3>
 
 
 
<p>NCBO dedicates substantial server resources to handling a good portion of the UMLS ontologies, some of which contain millions of classes. To import the largest UMLS ontologies (e.g., RXNORM or SNOMEDCT), users will have to run the Appliance in a powerful dedicated environment with 8 GB of RAM and 5 GB of hard disk space available.</p>
 

Latest revision as of 14:52, 23 March 2023

