Part-of-Speech Tagging with NLTK

This notebook is a quick demonstration of verta's run.log_setup_script() feature.

We'll build a simple, lightweight text tokenizer and part-of-speech tagger using NLTK,
which requires not only installing the nltk package itself,
but also downloading pre-trained text processing models from within Python code.

Prepare Verta


In [1]:
import six

from verta import Client
from verta.utils import ModelAPI

In [2]:
HOST = "app.verta.ai"

PROJECT_NAME = "Part-of-Speech Tagging"
EXPERIMENT_NAME = "NLTK"

In [3]:
client = Client(HOST)

proj = client.set_project(PROJECT_NAME)
expt = client.set_experiment(EXPERIMENT_NAME)
run = client.set_experiment_run()

Prepare NLTK

This notebook was tested with nltk v3.4.5, though many other versions should work just fine.


In [4]:
import nltk

nltk.__version__
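
If you'd like an explicit reminder when your environment differs from the tested one, a small check like the one below works (a minimal sketch; the pinned version here is just the one mentioned above, not a hard requirement).

In [ ]:
import warnings

# warn (but don't fail) if the installed NLTK version differs from the one
# this notebook was tested with
if nltk.__version__ != "3.4.5":
    warnings.warn("tested with nltk 3.4.5; found {}".format(nltk.__version__))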

NLTK's tokenizer models and part-of-speech tagger must be downloaded separately before these functionalities can be used.


In [5]:
# for tokenizing
nltk.download('punkt')

# for part-of-speech tagging
nltk.download('averaged_perceptron_tagger')
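
If you'd like to verify that both resources downloaded correctly before moving on, a quick check such as the following should suffice (the sentence is arbitrary; any English text will do).

In [ ]:
# sanity check: tokenize a sentence, then tag each token
tokens = nltk.word_tokenize("NLTK is ready to tag this sentence.")
nltk.pos_tag(tokens)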

Log Model for Deployment

Create Model

Our model will be a thin wrapper around nltk,
returning the constituent tokens and their part-of-speech tags for each input sentence.


In [6]:
class TextClassifier:
    def __init__(self, nltk):
        self.nltk = nltk

    def predict(self, data):
        predictions = []
        for text in data:
            tokens = self.nltk.word_tokenize(text)
            predictions.append({
                'tokens': tokens,
                'parts_of_speech': [list(pair) for pair in self.nltk.pos_tag(tokens)],
            })

        return predictions

model = TextClassifier(nltk)

data = [
    "I am a teapot.",
    "Just kidding I'm a bug?",
]
model.predict(data)

Create Deployment Artifacts

As always, we'll create a couple of descriptive artifacts to let the Verta platform know how to handle our model.


In [7]:
model_api = ModelAPI(data, model.predict(data))

run.log_model(model, model_api=model_api)
run.log_requirements(["nltk"])

Create Setup Script

Just as we did at the beginning of this notebook,
the deployment needs to download these NLTK resources before it can run the model,
so we'll define a short setup script to send over and execute at the start of each model deployment.


In [8]:
setup = """
import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
"""

run.log_setup_script(setup)

Make Live Predictions

Now we can visit the Web App, deploy the model, and make successful predictions!


In [9]:
run

In [10]:
data = [
    "Welcome to Verta!",
]

In [11]:
from verta.deployment import DeployedModel

DeployedModel(HOST, run.id).predict(data)