Coding Synergy: Bridging CoreNLP in Java with Python for NLP

Bagiyalakshmi · Published in featurepreneur · 3 min read · Jul 9, 2023

In this article, we will set up and use the Stanford CoreNLP Server with Python.

  1. Download Stanford CoreNLP

2. Install Java 8

3. Run the Stanford CoreNLP server.

  • Go to the directory of the unzipped Stanford CoreNLP and execute the command below:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000
  • This command starts the server on port 9000 with a timeout of 30 seconds.
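Once the server is running, you can sanity-check it from Python before writing a full client. Here is a minimal sketch, assuming the server is listening on localhost:9000 as started above:

import requests
import json

# Ask the server to tokenize, sentence-split and POS-tag a test sentence.
props = {'annotators': 'tokenize,ssplit,pos', 'outputFormat': 'json'}
resp = requests.post(
    'http://localhost:9000/',
    params={'properties': json.dumps(props)},
    data='The server is alive.'.encode('utf-8'),
)
# A JSON document with one sentence and its tokens should come back.
print(resp.json()['sentences'][0]['tokens'])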

4. Accessing Stanford CoreNLP Server using Python

  • Installing Dependencies:

To access the CoreNLP Server from Python, we need a client library: ‘stanfordcorenlp’, which provides a Python interface for communicating with the server, or ‘stanza’, Stanford’s newer library that runs its own neural pipeline directly in Python. Use one of the following commands to install the library.

You can use either one,

pip install stanfordcorenlp
pip install stanza
  • Now that we have set up the CoreNLP Server and installed the necessary dependencies, we can access it using Python. Here are two example code snippets to get started.
  1. This is an example code snippet using the ‘stanfordcorenlp’ library
from stanfordcorenlp import StanfordCoreNLP
from collections import defaultdict
import json

class StanfordNLP:
    def __init__(self, host='http://localhost', port=9000):
        # Connect to the running CoreNLP server (30-second timeout).
        self.nlp = StanfordCoreNLP(host, port=port, timeout=30000)
        self.props = {
            'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,relation',
            'pipelineLanguage': 'en',
            'outputFormat': 'json'
        }

    def word_tokenize(self, sentence):
        return self.nlp.word_tokenize(sentence)

    def pos(self, sentence):
        return self.nlp.pos_tag(sentence)

    def ner(self, sentence):
        return self.nlp.ner(sentence)

    def parse(self, sentence):
        return self.nlp.parse(sentence)

    def dependency_parse(self, sentence):
        return self.nlp.dependency_parse(sentence)

    def annotate(self, sentence):
        # The server returns a JSON string; parse it into a dict.
        return json.loads(self.nlp.annotate(sentence, properties=self.props))

    @staticmethod
    def tokens_to_dict(_tokens):
        # Index the server's token dicts by their 1-based token index.
        tokens = defaultdict(dict)
        for token in _tokens:
            tokens[int(token['index'])] = {
                'word': token['word'],
                'lemma': token['lemma'],
                'pos': token['pos'],
                'ner': token['ner']
            }
        return tokens

def startpy():
    sNLP = StanfordNLP()
    text = 'Meet Dr. Shaun Murphy, who is autistic and a doctor.'
    print("Annotate:", sNLP.annotate(text))
    print("POS:", sNLP.pos(text))
    print("Tokens:", sNLP.word_tokenize(text))
    print("NER:", sNLP.ner(text))
    print("Parse:", sNLP.parse(text))
    print("Dep Parse:", sNLP.dependency_parse(text))

if __name__ == '__main__':
    startpy()
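One practical note on the ‘stanfordcorenlp’ wrapper: each StanfordCoreNLP instance holds a connection to the server, and it is good practice to release it when you are done. A small sketch of how startpy() could end, assuming the class above:

sNLP = StanfordNLP()
# ... use sNLP as shown above ...
# Release the connection so the backend server can free its resources.
sNLP.nlp.close()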

2. This is an example code snippet using the ‘stanza’ library (unlike the wrapper above, stanza runs its own neural pipeline locally instead of calling the CoreNLP server)

import stanza
from collections import defaultdict

class StanzaNLP:
    def __init__(self):
        # Build a local stanza pipeline; 'constituency' provides phrase-structure trees.
        self.nlp = stanza.Pipeline(lang='en',
                                   processors='tokenize,pos,lemma,ner,depparse,constituency')

    def word_tokenize(self, sentence):
        doc = self.nlp(sentence)
        return [token.text for sent in doc.sentences for token in sent.tokens]

    def pos(self, sentence):
        doc = self.nlp(sentence)
        return [word.xpos for sent in doc.sentences for word in sent.words]

    def ner(self, sentence):
        doc = self.nlp(sentence)
        # Return each entity span together with its label.
        return [(ent.text, ent.type) for sent in doc.sentences for ent in sent.ents]

    def parse(self, sentence):
        doc = self.nlp(sentence)
        # Constituency trees come from the 'constituency' processor.
        return [sent.constituency for sent in doc.sentences]

    def dependency_parse(self, sentence):
        doc = self.nlp(sentence)
        return [
            [(word.head, word.deprel) for word in sent.words]
            for sent in doc.sentences
        ]

    @staticmethod
    def tokens_to_dict(_tokens):
        # Expects word dicts such as those produced by sentence.to_dict().
        tokens = defaultdict(dict)
        for token in _tokens:
            tokens[int(token['id'])] = {
                'word': token['text'],
                'lemma': token['lemma'],
                'pos': token['xpos'],
                'ner': token['ner']
            }
        return tokens

def startpy():
    sNLP = StanzaNLP()
    text = 'Meet Dr. Shaun Murphy, who is autistic and a doctor.'
    print("Word Tokenize:", sNLP.word_tokenize(text))
    print("POS:", sNLP.pos(text))
    print("NER:", sNLP.ner(text))
    print("Dependency Parse:", sNLP.dependency_parse(text))

if __name__ == '__main__':
    startpy()
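Before the stanza pipeline can run, the English models have to be downloaded once; stanza.Pipeline will fail if they are missing. This is stanza's documented one-time setup call:

import stanza

# Downloads the English models to ~/stanza_resources (one-time step).
stanza.download('en')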

There is an extensive list of annotators; here we will look at a few of them.

  • tokenize

Tokenization is the process of turning text into tokens. For example, the sentence “Claire is a good singer.” would be tokenized as “Claire”, “is”, “a”, “good”, “singer”, “.”.
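To reproduce exactly this in code, a minimal sketch using a tokenize-only stanza pipeline:

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize')
doc = nlp('Claire is a good singer.')
# Prints: ['Claire', 'is', 'a', 'good', 'singer', '.']
print([token.text for sent in doc.sentences for token in sent.tokens])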

  • pos

Part-of-speech tagging assigns part-of-speech labels to tokens, such as whether they are verbs or nouns. Every token in a sentence receives a tag.
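For example, a short stanza sketch that pairs each word with its Penn Treebank tag (word.xpos):

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,pos')
doc = nlp('Claire is a good singer.')
# Pairs such as ('Claire', 'NNP') and ('is', 'VBZ') are expected here.
print([(word.text, word.xpos) for sent in doc.sentences for word in sent.words])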

  • ner

Named entity recognition identifies named entities (person and organization names, locations, etc.) in text.
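A minimal stanza sketch that lists each recognized entity span with its label:

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,ner')
doc = nlp('Meet Dr. Shaun Murphy, who is autistic and a doctor.')
# Entity spans with labels, e.g. ('Shaun Murphy', 'PERSON').
print([(ent.text, ent.type) for ent in doc.ents])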

  • parse

Parsing refers to the process of analyzing the syntactic structure of sentences to determine the relationships between words and their roles in a sentence. It involves building a parse tree that represents the hierarchical structure of the sentence based on grammar rules and syntactic dependencies.
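In stanza, phrase-structure trees come from the 'constituency' processor (available in recent versions; it also needs 'pos' in the pipeline). A minimal sketch:

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,constituency')
doc = nlp('Claire is a good singer.')
# Prints a tree like (ROOT (S (NP (NNP Claire)) (VP ...) (. .)))
for sent in doc.sentences:
    print(sent.constituency)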

  • depparse

Dependency parsing analyzes the grammatical structure of a sentence by linking each word to its head word and labeling the relation between them (for example, nsubj for a nominal subject). Unlike constituency parsing, its output is a set of head-dependent pairs rather than a phrase-structure tree.
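A minimal stanza sketch that links each word to the index of its head word and the relation label (head 0 marks the root of the sentence):

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma,depparse')
doc = nlp('Claire is a good singer.')
# Triples like ('Claire', 5, 'nsubj'): 'Claire' depends on word 5 ('singer').
for sent in doc.sentences:
    print([(word.text, word.head, word.deprel) for word in sent.words])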

In this article, we have explored how to access the Stanford CoreNLP Server using Python. By leveraging the ‘stanfordcorenlp’ and ‘stanza’ libraries, we can conveniently perform a wide range of natural language processing tasks. This integration of CoreNLP’s powerful analysis capabilities with the flexibility of Python opens up a world of possibilities for NLP applications.

Let’s keep exploring and digging deeper!!

Happy learning and coding !!!
