8.0 Module 8: Practical Implementation – Syntactic Parsing
8.1 Introduction: From Flat Text to Hierarchical Parse Trees
Syntactic parsing is the process of analyzing a string of symbols—in our case, a sentence—to determine its grammatical structure with respect to a given formal grammar. While Part-of-Speech tagging provides a flat, linear annotation of a sentence, parsing goes a significant step further. It reveals the hierarchical relationships between words and phrases, typically representing this structure as a parse tree. This tree structure makes complex grammatical relationships explicit, such as identifying the subject and object of a verb, or determining how modifiers relate to the words they describe. Parsing is essential for deep semantic understanding and is a core component of advanced NLP applications like machine translation and question-answering systems.
8.2 The OpenNLP Parsing Workflow
The parsing process in OpenNLP leverages a pre-trained model, en-parser-chunking.bin, which has been trained to identify the constituent parts of sentences and their hierarchical relationships.
The workflow for parsing a sentence consists of three main steps:
- Load the ParserModel from the en-parser-chunking.bin file.
- Create a Parser object from the model using the static create() method of the ParserFactory class.
- Use the static parseLine() method from the ParserTool utility class to generate one or more parse trees for an input sentence.
8.3 Practical Implementation: Generating a Parse Tree
The following Java program demonstrates how to generate a parse tree for a simple sentence.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

public class ParserExample {
    public static void main(String[] args) throws Exception {
        // Load the parser model; try-with-resources closes the stream afterwards
        ParserModel model;
        try (InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-parser-chunking.bin")) {
            model = new ParserModel(inputStream);
        }

        // Create a parser from the model
        Parser parser = ParserFactory.create(model);

        // Parse the sentence, requesting only the single best parse
        String sentence = "Tutorialspoint is the largest tutorial library.";
        Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);

        // Print each parse tree in bracketed notation
        for (Parse p : topParses) {
            p.show();
        }
    }
}
Code Walkthrough:
- Model Loading and Parser Creation: The en-parser-chunking.bin model is loaded, and a Parser instance is created from it via ParserFactory.create(model).
- Parsing the Sentence: The key operation happens in the call to ParserTool.parseLine(). This method takes three arguments:
- sentence: The input string to be parsed.
- parser: The parser instance we just created.
- 1: An integer specifying the number of top-scoring parses to return. In this case, we are requesting only the single best parse.
- Displaying the Parse Tree: The result, topParses, is an array of Parse objects. We loop through this array (which contains only one element in our case) and call p.show() on each Parse object. This method prints the tree to standard output in the bracketed notation analyzed below.
Analysis of the Output:
(TOP (S (NP (NN Tutorialspoint)) (VP (VBZ is) (NP (DT the) (JJS largest) (NN tutorial) (NN library.)))))
This bracketed notation represents the hierarchical structure of the sentence. Let’s deconstruct it:
- (TOP …): This is the root of the entire parse.
- (S …): This represents the main Sentence clause.
- The sentence S is composed of two main constituents:
- (NP (NN Tutorialspoint)): A Noun Phrase (NP), which contains a singular Noun (NN), “Tutorialspoint”. This is the subject of the sentence.
- (VP (VBZ is) (NP …)): A Verb Phrase (VP), which contains the verb “is” (VBZ) and another NP. This is the predicate of the sentence.
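- The second NP is more complex: (NP (DT the) (JJS largest) (NN tutorial) (NN library.)). It contains a Determiner (“the”), a superlative adjective (JJS, “largest”), and two nouns (“tutorial”, “library.”). Because “is” is a linking verb, this phrase functions as its complement (a predicate nominal) rather than as a direct object. Note that the final period stays attached to the last token (“library.”) in this output, which suggests the sentence was split on whitespace only and not run through a tokenizer before parsing.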
This tree structure provides a much deeper understanding of the sentence’s grammar than a simple list of POS tags.
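To make the nesting easier to see, the bracketed output can be read back into a small tree and printed with indentation. The following is a minimal sketch that uses only the Java standard library and does not call OpenNLP; the class name BracketTree and its helpers read(), indent() and coveredText() are hypothetical names introduced for illustration. It assumes well-formed input in which every leaf has the form (TAG word).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: reads bracketed parse output into a tree.
public class BracketTree {
    /** One node: a label such as NP or NN, plus either children or a single word. */
    static final class Node {
        final String label;
        final String word; // non-null only for leaves such as (NN tutorial)
        final List<Node> children = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    private final String text;
    private int pos = 0;

    private BracketTree(String text) { this.text = text; }

    /** Reads one bracketed expression, e.g. "(NP (DT the) (NN library.))". */
    static Node read(String bracketed) {
        return new BracketTree(bracketed.trim()).readNode();
    }

    private Node readNode() {
        expect('(');
        String label = readToken();
        skipSpaces();
        Node node;
        if (peek() == '(') {
            // Phrase-level node: read nested constituents until the closing parenthesis
            node = new Node(label, null);
            while (peek() == '(') {
                node.children.add(readNode());
                skipSpaces();
            }
        } else {
            // Leaf node: the label is a POS tag followed by the word itself
            node = new Node(label, readToken());
            skipSpaces();
        }
        expect(')');
        return node;
    }

    private char peek() {
        if (pos >= text.length()) throw new IllegalArgumentException("Unexpected end of input");
        return text.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        pos++;
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    private String readToken() {
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a token at index " + pos);
        return text.substring(start, pos);
    }

    /** Renders the tree with two spaces of indentation per level. */
    static String indent(Node node) {
        StringBuilder sb = new StringBuilder();
        indent(node, 0, sb);
        return sb.toString();
    }

    private static void indent(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(node.label);
        if (node.word != null) sb.append(' ').append(node.word);
        sb.append('\n');
        for (Node child : node.children) indent(child, depth + 1, sb);
    }

    /** Joins the words under a node, i.e. the text that the constituent covers. */
    static String coveredText(Node node) {
        if (node.word != null) return node.word;
        List<String> parts = new ArrayList<>();
        for (Node child : node.children) parts.add(coveredText(child));
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Node top = read("(TOP (S (NP (NN Tutorialspoint)) (VP (VBZ is) "
                + "(NP (DT the) (JJS largest) (NN tutorial) (NN library.)))))");
        System.out.print(indent(top));
        Node s = top.children.get(0); // the S clause directly under TOP
        System.out.println("Subject:   " + coveredText(s.children.get(0)));
        System.out.println("Predicate: " + coveredText(s.children.get(1)));
    }
}
```

Run on the output above, the sketch prints one node per line, indented by depth, followed by the text covered by the two children of S. A flat list of POS tags cannot answer the question "which words form the subject?", whereas here it is a single lookup in the tree.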
Having explored the deep hierarchical structure provided by full parsing, we will next examine a simpler but often equally useful task: identifying key phrases through chunking.