Introduction

NLTK Server enables you to access the features provided by NLTK Library over a REST interface. It can be easily installed on your Application VM or a seperate VM. NLTK Server makes it easy to use NLTK with all other Languages which can make REST calls and parse JSON.

Documentation

Installation

Prerequisites

To install NLTK Server you will need the following:

  • Python 2.7.x
  • Pip

Read more on installing PIP here. https://pip.pypa.io/en/latest/installing.html

How to Install

  1. Clone the NLTK server repository.
git clone https://github.com/preems/nltk-server

Or download latest Zip from Github:

  1. Install dependencies from pip.
cd nltk-server
pip install -r requirements.txt

Some linux distros might require to use sudo.

  1. Install JRE ( only required for Stanford NER ).

Installation of JRE is specific to Operating system your running on. For Ubuntu, it can be installed by running following commands

$ sudo apt-get update
$ sudo apt-get install default-jre

Run NLTK Server

python wsgi.py

API Documentation

Sentence Tokenizer

POST /sent_tokenize

Takes a document and return a array of sentences. Uses nltk.sent_tokentize.

Example request:

POST /sent_tokenize HTTP/1.1
Host: example.com
Accept: application/json

Lorem Ipsum is simply dummy text of the printing. Lorem Ipsum has been the industry's standard dummy text, when an unknown printer took a galley of type. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{
  "result": [
    "Lorem Ipsum is simply dummy text of the printing.",
    "Lorem Ipsum has been the industry's standard dummy text, when an unknown printer took a galley of type.",
    "It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged."
  ],
  "status": "success"
}

Word Tokenizer

POST /word_tokenize

Takes a sentence and tokenizes into words. Uses nltk.word_tokenize.

Example request:

POST /word_tokenize HTTP/1.1
Host: example.com
Accept: application/json

Lorem Ipsum is simply dummy text of the printing.

Example response:

HTTP/1.1 200 OK
Content-Length: 164
Content-Type: application/json
Date: Wed, 24 Dec 2014 01:58:15 GMT
Server: Werkzeug/0.9.6 Python/2.7.6

{
  "result": [
    "Lorem",
    "Ipsum",
    "is",
    "simply",
    "dummy",
    "text",
    "of",
    "the",
    "printing"
  ],
  "status": "success"
}

Part of Speech Tagging

POST /pos_tag

Takes a array of words tokenized by the word tokenizer.

Example request:

POST /pos_tag HTTP/1.1
Host: example.com
Accept: application/json

[
  "Lorem",
  "Ipsum",
  "is",
  "simply",
  "dummy",
  "text",
  "of",
  "the",
  "printing"
]

Example response:

HTTP/1.1 200 OK
Content-Length: 164
Content-Type: application/json
Date: Wed, 24 Dec 2014 02:12:15 GMT
Server: Werkzeug/0.9.6 Python/2.7.6

{
"result": [
  [
    "Lorem",
    "NNP"
  ],
  [
    "Ipsum",
    "NNP"
  ],
  [
    "is",
    "VBZ"
  ],
  [
    "simply",
    "RB"
  ],
  [
    "dummy",
    "JJ"
  ],
  [
    "text",
    "NN"
  ],
  [
    "of",
    "IN"
  ],
  [
    "the",
    "DT"
  ],
  [
    "printing",
    "NN"
  ]
],
"status": "success"
}

Stemming

POST /stem/(string: algorithm)

Takes an array of words and return the stem of words. The valid algorithms are ‘porter’, ‘lancaster’ and ‘snowball’.

Example request:

POST /stem/porter HTTP/1.1
Host: example.com
Accept: application/json

[
  "the",
  "buses",
  "are",
  "crowded"
]

Example response:

HTTP/1.1 200 OK
Content-Length: 212
Content-Type: application/json
Date: Wed, 24 Dec 2014 06:45:29 GMT

{
"result": [
  [
    "the",
    "the"
  ],
  [
    "buses",
    "buse"
  ],
  [
    "are",
    "are"
  ],
  [
    "crowded",
    "crowd"
  ]
],
"status": "success"
}

Lemmatizing

POST /lemmatize/wordnet

Takes an array of words or words with corressponing POS Tag. POS Tag is optional and by default every word is considered noun. Both Wordnet and Penn style Tags are supported. Example for both of them are below.

Example request without POS Tag:

POST /lemmatize/wordnet HTTP/1.1
HOST: example.com
Accept: application/json

[
  "the",
  "buses",
  "are",
  "crowded"
]

Example response:

HTTP/1.1 200 OK
Content-Length: 213
Content-Type: application/json
Date: Sat, 27 Dec 2014 21:19:54 GMT

{
"result": [
  [
    "the",
    "the"
  ],
  [
    "buses",
    "bus"
  ],
  [
    "are",
    "are"
  ],
  [
    "crowded",
    "crowded"
  ]
],
"status": "success"
}

Example request with POS Tag:

POST /lemmatize/wordnet HTTP/1.1
HOST: example.com
Accept: application/json

[
  [
    "the",
    "DT"
  ],
  [
    "buses",
    "NNS"
  ],
  [
    "are",
    "VBP"
  ],
  [
    "crowded",
    "VBN"
  ]
]

Example Response:

HTTP/1.1 200 OK
Content-Length: 210
Content-Type: application/json
Date: Sat, 27 Dec 2014 21:44:28 GMT

{
  "result": [
    [
      "the",
      "the"
    ],
    [
      "buses",
      "bus"
    ],
    [
      "are",
      "be"
    ],
    [
      "crowded",
      "crowd"
    ]
  ],
  "status": "success"
}

Named Entity Recognition

POST /stanfordNER

This API uses the Stanford NER Library. You can read more details about this project on http://nlp.stanford.edu/software/CRF-NER.shtml.

The API requires JRE to be installed.

Example request:

POST /stanfordNER HTTP/1.1
Host: example.com
Accept: application/json

Rami Eid is studying at Stony Brook University in NY.

Example response:

HTTP/1.1 200 OK
Content-Length: 479
Content-Type: application/json
Date: Tue, 30 Dec 2014 19:23:14 GMT
Server: Werkzeug/0.9.6 Python/2.7.6

{
"result": [
  [
    "Rami",
    "PERSON"
  ],
  [
    "Eid",
    "PERSON"
  ],
  [
    "is",
    "O"
  ],
  [
    "studying",
    "O"
  ],
  [
    "at",
    "O"
  ],
  [
    "Stony",
    "ORGANIZATION"
  ],
  [
    "Brook",
    "ORGANIZATION"
  ],
  [
    "University",
    "ORGANIZATION"
  ],
  [
    "in",
    "O"
  ],
  [
    "NY",
    "O"
  ]
],
"status": "success"
}