spaCy Tutorial, Continued

This walkthrough is based on this spaCy tutorial.

Train a convolutional neural network text classifier on the IMDB dataset, using the TextCategorizer component. The dataset will be loaded automatically via Thinc's built-in dataset loader. The model is added to spacy.pipeline, and predictions are available via doc.cats.

Set Up Environment

This notebook has been tested with the package versions below.
(Depending on your Python environment, you may need to replace pip with pip3.)


In [1]:
# Python >3.5
!pip install verta
!pip install spacy==2.1.6
!python -m spacy download en

Set Up Verta


In [2]:
HOST = 'app.verta.ai'

PROJECT_NAME = 'Film Review Classification'
EXPERIMENT_NAME = 'spaCy CNN'

In [3]:
# import os
# os.environ['VERTA_EMAIL'] = 
# os.environ['VERTA_DEV_KEY'] =
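
The commented-out cell above is where Verta credentials go. Below is a minimal sketch of checking that both variables are set before creating the client, assuming the client picks up VERTA_EMAIL and VERTA_DEV_KEY from the environment. The helper name missing_verta_credentials is hypothetical and not part of the verta package.

```python
import os

# Variable names taken from the commented-out cell above.
REQUIRED_VARS = ("VERTA_EMAIL", "VERTA_DEV_KEY")


def missing_verta_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of the verta package.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_verta_credentials()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

Passing a plain dict instead of os.environ makes the check easy to try without touching the real environment.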

In [4]:
from verta import Client
from verta.utils import ModelAPI

client = Client(HOST, use_git=False)

proj = client.set_project(PROJECT_NAME)
expt = client.set_experiment(EXPERIMENT_NAME)
run = client.set_experiment_run()

Imports


In [5]:
from __future__ import print_function

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import random

import six

import numpy as np
import thinc.extra.datasets
import spacy
from spacy.util import minibatch, compounding

Reconstitute a run


In [6]:
run_id = ""

In [7]:
run = expt.expt_runs.find("id == '{}'".format(run_id))[0]
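
The run_id above is left empty and has to be filled in with the ID of a previously logged Experiment Run before the lookup can return anything. Below is a minimal sketch of guarding against an empty ID before building the query string. The helper require_run_id and the placeholder ID are hypothetical, used only for illustration.

```python
def require_run_id(run_id):
    """Return run_id stripped of whitespace, or raise if it was left empty.

    Hypothetical helper for illustration; not part of the verta package.
    """
    cleaned = (run_id or "").strip()
    if not cleaned:
        raise ValueError("run_id is empty; paste the ID of the Experiment Run to reload")
    return cleaned


# Placeholder ID, not a real run.
query = "id == '{}'".format(require_run_id("  example-run-id  "))
print(query)
```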

In [8]:
# test the logged model
print("Loading from verta..")
nlp2 = run.get_model()

In [9]:
test_text = "I would definitely watch this again!"
doc2 = nlp2(test_text)
print(test_text)
print(doc2.cats)
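
doc.cats maps each category label to a score. Below is a minimal sketch of reducing that mapping to a single predicted label. The helper top_category and the sample scores are hypothetical, and the label names are assumed to be the ones used later in this notebook.

```python
def top_category(cats):
    """Return the (label, score) pair with the highest score.

    `cats` is a dict of label -> score, the shape of doc.cats.
    Hypothetical helper for illustration.
    """
    if not cats:
        raise ValueError("no categories to choose from")
    label = max(cats, key=cats.get)
    return label, cats[label]


# Made-up scores, not real model output.
example_cats = {"POSITIVE": 0.91, "NEGATIVE": 0.09}
print(top_category(example_cats))
```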

In [10]:
run.log_metric("val_metric_direct", 0.5)

Test on a deployed model

Click the link above to view your Experiment Run in the Verta Web App, and deploy it.
Once it's ready, you can make predictions against the deployed model.


In [11]:
from verta._demo_utils import DeployedModel

deployed_model = DeployedModel(HOST, run.id)

In [12]:
deployed_model.predict(["I would definitely watch this again!"])

In [13]:
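# each item in train_data is a (text, label) pair: label 1 is positive, 0 is negative
# the second split returned by imdb() is not used here
train_data, _ = thinc.extra.datasets.imdb()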

In [14]:
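import time

ctr = 0          # reviews evaluated so far
live_metric = 0  # reviews the deployed model labeled correctly
for text, label in train_data:
    print(text[:100])  # first 100 characters of the review
    prediction = deployed_model.predict([text])
    print("prediction:", prediction)
    # assumes the deployed model returns the string "NEGATIVE" or "POSITIVE"
    if (label == 0 and prediction == "NEGATIVE") or (label == 1 and prediction == "POSITIVE"):
        live_metric += 1
    ctr += 1  # count the review only after it has been scored
    if ctr >= 10:
        break
    time.sleep(0.5)  # pause between requests

run.log_metric("val_metric_deployed", live_metric * 1.0 / ctr)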
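
The value logged as val_metric_deployed is the fraction of sampled reviews that the deployed model labels correctly. The same computation is sketched below as a standalone function, assuming labels are 0 (negative) or 1 (positive) and predictions are the strings "NEGATIVE" or "POSITIVE". The function name and sample data are hypothetical.

```python
LABEL_NAMES = {0: "NEGATIVE", 1: "POSITIVE"}


def live_accuracy(labels, predictions):
    """Fraction of predictions whose string matches the integer label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        return 0.0
    correct = sum(
        1 for label, pred in zip(labels, predictions)
        if LABEL_NAMES.get(label) == pred
    )
    return correct / float(len(labels))


# Made-up data: 3 of 4 predictions match their labels.
print(live_accuracy([1, 0, 1, 0], ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]))
```

Collecting labels and predictions first and scoring them afterwards keeps the count of evaluated rows and the count of correct rows from drifting apart.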