Using Stanford CoreNLP with Python and Docker Containers

8 minute read

In this brief guide you will learn how to easily set up two Docker containers, one for Python and the other for the Stanford CoreNLP server. With this setup you will quickly have an environment in which you can experiment with natural language processing.

Prerequisites

This guide assumes that you are using Microsoft Windows as your host operating system and that Docker is already installed. If Docker is not installed on your computer, you can follow the Get Started with Docker guide.

Creating a CoreNLP Docker Image

The first step is to create a Stanford CoreNLP Docker image, so that later on we can run a container which will handle our natural language processing requests. Copy and paste the code below and save it as a text file named Dockerfile, without any extension.

FROM openjdk:8u181-jre-stretch

LABEL maintainer="Stefan Fiott <stefan at stefanfiott dot com>"

ENV CORENLP_ARCHIVE_VERSION=2018-02-27
ENV CORENLP_ARCHIVE=stanford-corenlp-full-${CORENLP_ARCHIVE_VERSION}
ENV CORENLP_PATH=/corenlp

RUN apt-get update && apt-get install --no-install-recommends -y \
                        wget \
                        unzip

RUN wget http://nlp.stanford.edu/software/$CORENLP_ARCHIVE.zip

RUN unzip $CORENLP_ARCHIVE

RUN mv $CORENLP_ARCHIVE $CORENLP_PATH
RUN rm $CORENLP_ARCHIVE.zip

WORKDIR $CORENLP_PATH

EXPOSE 9000

CMD java -mx1g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000

Once you have the above Dockerfile on your computer, open Windows PowerShell and change the working directory to the location where you saved the file. Next, run the following command, but first make sure to change the namespace part to whatever you like. You could, for example, change it to your first name.

docker build -t namespace/corenlp-debian:latest .

Once you hit Enter, give Docker some time to finish building the image. Depending on your computer and Internet connection speed, this might take a few minutes to complete.

Running CoreNLP and Python Containers

Now that you have a CoreNLP Docker image, it is time to run two containers: one based on the CoreNLP Docker image you just created, and another to run Python and any required libraries. In this manner, you will be able to write Python code and execute it in the Python container. The Python code will then send requests to the CoreNLP container, which will respond to each request with the results of processing the text passed to it.

For this reason, the two containers need to be able to communicate with each other. This is achieved by setting up a bridged network over which the two containers will communicate. No need to worry: the following batch file will take care of setting up the required bridged network, running the containers, and then cleaning everything up once you are done.

Copy and paste the below code into a file and name it something sensible, such as run-corenlp-lab.bat. Once more, remember to change namespace in the batch file to match the one you set in the Dockerfile.

docker network create corenlp-net
docker create --rm -it -v "%cd%":/usr/src/app -w /corenlp --network corenlp-net --name corenlp-inst namespace/corenlp-debian:latest
docker start corenlp-inst
docker create --rm -it -v "%cd%":/usr/src/app -w /usr/src/app --network corenlp-net --name python3-inst python:3.7.0-stretch /bin/bash
docker start python3-inst
docker attach python3-inst
echo Shutting down...
docker stop corenlp-inst
docker network rm corenlp-net
echo Done.

Finally, change your working directory to the location where you want to write your experimental Python NLP code. Once done, execute the batch file you just created from this new location by specifying the full path to it. For example, if you saved the run-corenlp-lab.bat file in C:\projects\docker\nlp\ and your NLP code is in C:\projects\nlp\exp01\, you should execute C:\projects\docker\nlp\run-corenlp-lab.bat from within C:\projects\nlp\exp01\.

The run-corenlp-lab.bat batch file instructs Docker to mount the current working directory, in this example C:\projects\nlp\exp01\, as /usr/src/app. Once the Python container is running, you will be able to write code in your preferred text editor on your host machine and execute it in the Python container.
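As an aside, it helps to know what actually travels over the bridged network: the CoreNLP server exposes a plain HTTP API, and clients send the text in a POST body to a URL whose properties query parameter holds the annotation settings as a JSON string. The sketch below (an illustration, not part of the setup) builds such a URL for the corenlp-inst container defined in the batch file above:

```python
import json
from urllib.parse import urlencode

def build_annotate_url(host, port, annotators, output_format="json"):
    """Build a CoreNLP server URL with the annotation settings
    JSON-encoded in the 'properties' query parameter."""
    properties = {"annotators": annotators, "outputFormat": output_format}
    query = urlencode({"properties": json.dumps(properties)})
    return f"http://{host}:{port}/?{query}"

url = build_annotate_url("corenlp-inst", 9000, "pos,sentiment")
# The text to annotate is then sent as the body of a POST request to this URL.
print(url)
```

This is essentially what the py-corenlp library, installed below, does for you behind the scenes.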

Setting Up a Python Virtual Environment and Installing py-corenlp

First, create a virtual environment by executing python -m venv nlpenv. In the running example, this will create an nlpenv directory under /usr/src/app. Next, activate the virtual environment by executing source nlpenv/bin/activate, assuming you named the virtual environment nlpenv.

Finally, install the required Python libraries; in this case, ipython and py-corenlp.

pip install --upgrade pip
pip install ipython pycorenlp

Testing py-corenlp with Stanford CoreNLP

Now that all is in place, let us test CoreNLP's ability to perform part-of-speech (POS) tagging and sentiment analysis, as an example. First, start ipython and then execute the following code.

In [1]:	from pycorenlp import StanfordCoreNLP

In [2]: nlp = StanfordCoreNLP('http://corenlp-inst:9000')

In [3]: text = "The food was delicious, matched with top-notch service."

In [4]: output = nlp.annotate(text, properties={
   ...:     'annotators': 'pos,sentiment',
   ...:     'outputFormat': 'json'
   ...: })

In [5]: output['sentences'][0]['sentimentValue']
Out[5]: '4'

In [6]: output['sentences'][0]['sentiment']
Out[6]: 'Verypositive'

In [7]: print(output['sentences'][0]['parse'])
(ROOT
  (S
    (NP (DT The) (NN food))
    (VP (VBD was)
      (ADJP (JJ delicious))
      (, ,)
      (S
        (VP (VBN matched)
          (PP (IN with)
            (NP (JJ top-notch) (NN service))))))
    (. .)))

When you are done, simply type exit in the Python container's shell; the containers will be stopped and the bridged network removed. The code you created will still be present on your host machine, in this example in C:\projects\nlp\exp01\.

That is all. Now you have an easy way to create an environment using Docker containers in which you can experiment with natural language processing through the Stanford CoreNLP library.