Text Mining: Integrate OpenNLP with UIMA

This series shows how to integrate OpenNLP, UIMA, and Solr:
Integrate OpenNLP with UIMA
How to install UIMA, build the OpenNLP PEAR, and run it in CVD or UIMA Simple Server.
Integrate OpenNLP, UIMA and Solr via SOAP Web Service
How to deploy the OpenNLP UIMA PEAR as a SOAP web service and integrate it with Solr.
Integrate OpenNLP, UIMA AS and Solr
How to deploy the OpenNLP UIMA PEAR as a UIMA AS service and integrate it with Solr.

Installing the UIMA SDK
Follow section 2, "Installation and Setup", of the README in the UIMA SDK:
Set JAVA_HOME and UIMA_HOME.
Append %UIMA_HOME%\bin to your PATH.
Run %UIMA_HOME%\bin\adjustExamplePaths.bat.
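Most problems with the UIMA scripts come from a missing variable or a PATH entry that was never added, so a quick check can save time. The following is a minimal sketch, not part of the UIMA SDK: the class name UimaEnvCheck and its helper methods are made up for illustration, and the path comparison is deliberately lenient (it ignores case and treats / and \ alike, which suits Windows).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not part of UIMA): checks the variables from the steps above.
class UimaEnvCheck {

    /** Returns human-readable problems; an empty list means the variables look right. */
    static List<String> findProblems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        if (isBlank(env.get("JAVA_HOME"))) {
            problems.add("JAVA_HOME is not set");
        }
        String uimaHome = env.get("UIMA_HOME");
        if (isBlank(uimaHome)) {
            problems.add("UIMA_HOME is not set");
            return problems; // PATH cannot be checked without UIMA_HOME
        }
        // Windows may report the variable as "Path", so match the name case-insensitively.
        String path = "";
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().equalsIgnoreCase("PATH")) {
                path = e.getValue();
            }
        }
        String expected = normalize(uimaHome + "/bin");
        boolean onPath = false;
        for (String entry : path.split(Pattern.quote(File.pathSeparator))) {
            if (normalize(entry).equals(expected)) {
                onPath = true;
            }
        }
        if (!onPath) {
            problems.add("PATH does not contain UIMA_HOME/bin");
        }
        return problems;
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** Lenient comparison form: unified separators, no trailing slash, lower case. */
    static String normalize(String p) {
        String s = p.trim().replace('\\', '/');
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("UIMA environment looks fine");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it from the same command prompt in which you set the variables, since a prompt opened earlier will not see later changes to the system environment.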
Build OpenNLP UIMA Pear
Follow the instructions on the OpenNLP UIMA page:
Download the latest source code, go to %opennlp_src_home%\opennlp, and run mvn install.
Go to %opennlp_src_home%\opennlp-uima and run ant -f createPear.xml.

The built OpenNlpTextAnalyzer.pear is written to the %opennlp_src_home%\opennlp-uima\target folder.
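A PEAR file is a zip archive whose installation descriptor lives at metadata/install.xml, so it is easy to confirm that the build produced a usable package before installing it. The sketch below assumes Java 11 or later; the class name PearCheck and the default path are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of UIMA or OpenNLP): a quick sanity check on a built PEAR.
class PearCheck {

    /** True if the file is a readable zip archive that contains metadata/install.xml. */
    static boolean looksLikePear(Path pear) {
        try (ZipFile zip = new ZipFile(pear.toFile())) {
            return zip.getEntry("metadata/install.xml") != null;
        } catch (IOException e) {
            return false; // file is missing or is not a zip archive
        }
    }

    public static void main(String[] args) {
        // Default path is an assumption: run from the opennlp-uima folder or pass the path.
        Path pear = Path.of(args.length > 0 ? args[0] : "target/OpenNlpTextAnalyzer.pear");
        System.out.println(pear + (looksLikePear(pear)
                ? " looks like a PEAR package"
                : " is missing or is not a PEAR package"));
    }
}
```

This only checks the archive layout. It does not validate the descriptor or the models packaged inside the PEAR.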

Run OpenNLP Pear in UIMA Cas Visual Debugger
Run set UIMA_JVM_OPTS=-Xms128M -Xmx8g to adjust the JVM heap size. You can also change this setting in runUimaClass.bat or define it as a system environment variable.

Run runPearInstaller.bat, select the built OpenNlpTextAnalyzer.pear in the "PEAR File" field, and specify the installation directory, for example: %PEARS_HOME_REPLACE_THIS%\opennlp.uima.OpenNlpTextAnalyzer

To run the OpenNLP analysis engine once installation finishes, click "Run your AE in the CAS Visual Debugger", paste some text, then click "Run" -> "Run OpenNlpTextAnalyzer" or press Ctrl+R.

To run it again later, start cvd.bat, click "Run" -> "Load AE", and browse to the location where the OpenNLP PEAR is installed. Select %PEARS_HOME_REPLACE_THIS%\opennlp.uima.OpenNlpTextAnalyzer\opennlp.uima.OpenNlpTextAnalyzer_pear.xml, paste some text, then click "Run" -> "Run OpenNlpTextAnalyzer".
Deploy OpenNLP Pear in UIMA Simple Server
The OpenNLP PEAR can also be deployed as a REST service. Follow the UIMA Simple Server User Guide to build the WAR:
http://uima.apache.org/downloads/sandbox/simpleServerUserGuide/simpleServerUserGuide.html

Then copy OpenNlpTextAnalyzer.pear to WEB-INF/resources and add the opennlp servlet to web.xml:
<servlet>
  <servlet-name>opennlp</servlet-name>
  <servlet-class>
      org.apache.uima.simpleserver.servlet.SimpleServerServlet
  </servlet-class>
  <!-- Define the path to the pear file -->
  <init-param>
      <param-name>PearPath</param-name>
      <param-value>
          WEB-INF/resources/OpenNlpTextAnalyzer.pear
      </param-value>
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>opennlp</servlet-name>
  <url-pattern>/opennlp</url-pattern>
</servlet-mapping>
Browse to http://localhost:8080/uima-server/opennlp to check the deployment.
To try it out, go to http://localhost:8080/uima-server/opennlp?mode=form, type some text, then click the "Submit Query" button.

You can also send a GET request: http://localhost:8080/uima-server/opennlp?text=some_text_here
Or send a POST request:
curl http://localhost:8080/uima-server/opennlp -X POST -d "text=some_text_here"
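
The same GET and POST requests can be sent from Java. The sketch below uses the JDK's java.net.http client, so it assumes Java 11 or later. The class name SimpleServerClient, its helper methods, and the sample sentence are illustrative; the endpoint is the one from the servlet mapping in this post. Real text usually contains spaces and punctuation, so the text parameter is URL-encoded before it is sent.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client (not part of UIMA): calls the opennlp servlet deployed in this post.
class SimpleServerClient {

    // Adjust host, port, and context path to match your deployment.
    static final String ENDPOINT = "http://localhost:8080/uima-server/opennlp";

    /** Builds the form-encoded "text" parameter. */
    static String textParam(String text) {
        return "text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    /** GET request with the text in the query string. */
    static HttpRequest getRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create(endpoint + "?" + textParam(text)))
                .GET()
                .build();
    }

    /** POST request with the text as a form field, like curl -d "text=...". */
    static HttpRequest postRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(textParam(text)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        String text = args.length > 0 ? args[0] : "Pierre Vinken will join the board in November.";
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(postRequest(ENDPOINT, text), HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Typically means the server is not running at ENDPOINT.
            System.err.println("Could not reach " + ENDPOINT + ": " + e);
        }
    }
}
```

POST is the safer choice for longer documents, because servers and proxies often limit URL length.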

Resources
UIMA Documentation Overview
UIMA Asynchronous Scaleout Documentation Overview