2.0 Module 2: Lab Setup – Configuring Your OpenNLP Development Environment
2.1 Introduction: Preparing for Practical NLP
A correctly configured development environment is a prerequisite for the hands-on work in this course. The following instructions guide you through acquiring the OpenNLP library files, downloading the pre-trained statistical models, and integrating both into your system and development environment. You will need a working setup to run the examples and exercises in the subsequent modules.
2.2 Acquiring the OpenNLP Library and Pre-trained Models
Our first task is to download the OpenNLP library itself, followed by the pre-trained models required for our NLP tasks.
Downloading the Apache OpenNLP Library
- Navigate to the official Apache OpenNLP homepage: https://opennlp.apache.org/.
- Locate and click the ‘Downloads’ link on the homepage.
- You will be directed to a page with several mirror links. Clicking one will take you to the Apache Software Foundation’s distribution directory.
- Browse the directory to find the latest version of the OpenNLP distribution (this course references version 1.6.0).
- Download both the binary and source files for your operating system. For Windows, these will be named apache-opennlp-1.6.0-bin.zip and apache-opennlp-1.6.0-src.zip. Unzip the binary file to a known location on your hard drive (e.g., E:\).
Downloading the Pre-trained Models
- Navigate to the OpenNLP models download page. The models compatible with version 1.6.0 of the library can be found here: http://opennlp.sourceforge.net/models-1.5/.
- This page contains a list of pre-trained models for various NLP tasks (e.g., sentence detection, tokenization, name finding) across different languages.
- Download all the available models and save them to a dedicated local directory. For consistency in our examples, we will assume this location is C:/OpenNLP_models/. It is essential that you note the path to this directory, as you will need it to load the models in your code.
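Because later code has to open model files from this directory, it can help to confirm that a given model is where you expect before loading it. The following is a minimal sketch using only the Java standard library. The class name ModelLocator and its methods are illustrative, not part of OpenNLP, and en-sent.bin (the English sentence detection model from the models page) is used as the example file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (not part of OpenNLP): builds the path to a model file
// and reports whether it can be read.
final class ModelLocator {

    // Joins the model directory and file name into a single normalized path.
    static Path resolve(String modelDir, String fileName) {
        return Paths.get(modelDir).resolve(fileName).normalize();
    }

    // True only if the path points to an existing, readable regular file.
    static boolean isAvailable(Path modelFile) {
        return Files.isRegularFile(modelFile) && Files.isReadable(modelFile);
    }

    public static void main(String[] args) {
        // Assumed location from this module; adjust to your own directory.
        Path model = resolve("C:/OpenNLP_models", "en-sent.bin");
        System.out.println(model + (isAvailable(model) ? " found" : " is missing"));
    }
}
```

If the program reports the file as missing, check the directory path and the file name before moving on to the next module.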
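2.3 System Environment Configuration: Setting the Path
The steps below add the OpenNLP bin directory to the system Path environment variable, which tells the command prompt where to find executables such as the opennlp launcher script. This is distinct from the Java classpath, which tells the Java Virtual Machine (JVM) where to find class libraries such as the OpenNLP JAR files; the classpath for your own projects is configured in Section 2.4. With the Path set correctly, you can run the OpenNLP command-line tools from any directory.
Setting the Path on a Windows System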
- Right-click on ‘My Computer’ (or ‘This PC’) and select ‘Properties’.
- Navigate to ‘Advanced system settings’ and click the ‘Environment Variables’ button.
- In the ‘System variables’ section, find and select the Path variable, then click ‘Edit’.
- Add a new entry pointing to the bin directory of your OpenNLP installation. For example, if you unzipped the library to E:\, the path would be E:\apache-opennlp-1.6.0\bin.
- Click ‘OK’ to save your changes.
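To confirm that the new entry took effect, open a new command prompt (already-open windows keep the old environment) and inspect the variable. One way to do this from Java is sketched below, using only the standard library. The class name PathCheck, the method entriesContaining, and the search term "opennlp" are illustrative choices and assume your installation directory name contains that string.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative helper: lists the entries of a PATH-style variable
// that mention a given search term.
final class PathCheck {

    // Splits pathValue on the separator and keeps entries containing needle
    // (case-insensitive). A null pathValue yields an empty list.
    static List<String> entriesContaining(String pathValue, String separator, String needle) {
        List<String> matches = new ArrayList<>();
        if (pathValue == null) {
            return matches;
        }
        String wanted = needle.toLowerCase(Locale.ROOT);
        for (String entry : pathValue.split(Pattern.quote(separator))) {
            if (entry.toLowerCase(Locale.ROOT).contains(wanted)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        List<String> hits = entriesContaining(System.getenv("PATH"), File.pathSeparator, "opennlp");
        if (hits.isEmpty()) {
            System.out.println("No OpenNLP entry found on the Path.");
        } else {
            hits.forEach(hit -> System.out.println("Found: " + hit));
        }
    }
}
```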
2.4 IDE Integration: Setting Up OpenNLP in Eclipse
To develop applications with OpenNLP, you must integrate its library into your Integrated Development Environment (IDE). We will cover two common methods for doing this in Eclipse.
Method 1: Setting Build Path to the JAR Files
This method involves directly adding the OpenNLP library files to your Java project’s build path.
- Open Eclipse and create a new Java Project from the File -> New -> Java Project menu.
- Once the project is created, right-click on it in the ‘Package Explorer’ view. Select ‘Build Path’, and then ‘Configure Build Path…’.
- In the ‘Java Build Path’ page of the Properties dialog that appears, select the ‘Libraries’ tab and click the ‘Add External JARs…’ button.
- Navigate to the lib folder inside your OpenNLP installation directory (e.g., E:\apache-opennlp-1.6.0\lib). Select the opennlp-tools-1.6.0.jar and opennlp-uima-1.6.0.jar files and click ‘Open’.
- Verify that the selected JAR files now appear in the list of libraries for your project. You can also see them under the ‘Referenced Libraries’ node in the ‘Package Explorer’.
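Once the JARs are on the build path, a quick sanity check is to ask the JVM whether it can see an OpenNLP class. The sketch below does this with the standard Class.forName call, so it compiles even when OpenNLP is absent and simply reports what it finds. The class name BuildPathCheck and the method isOnClasspath are illustrative; opennlp.tools.sentdetect.SentenceDetectorME is a class shipped in opennlp-tools.

```java
// Illustrative check: reports whether a named class is visible to the JVM.
final class BuildPathCheck {

    // Returns true if the class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, BuildPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String probe = "opennlp.tools.sentdetect.SentenceDetectorME";
        if (isOnClasspath(probe)) {
            System.out.println("OpenNLP is visible to this project.");
        } else {
            System.out.println("OpenNLP not found; re-check the project's build path.");
        }
    }
}
```

Run it as a Java application from inside the project you just configured, so that it uses that project's build path.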
Method 2: Using pom.xml for Maven Projects
If you are using Apache Maven for dependency management, you can add OpenNLP to your project by including it in your pom.xml file. This method automates the process of downloading and linking the required libraries.
- Ensure your project is configured as a Maven project.
- Open the pom.xml file at the root of your project and add the opennlp-tools and opennlp-uima dependencies inside the <dependencies> section. A complete example pom.xml follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>myproject</groupId>
  <artifactId>myproject</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.5.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-tools</artifactId>
      <version>1.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-uima</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
</project>
With your development environment now correctly configured, you are prepared to begin exploring the programmatic components of the OpenNLP API in our next module.