Background

This notebook illustrates the use of a utility, InferenceWrapper.df_to_emb, that performs inference in bulk on large amounts of data. A benchmark compares batched inference against performing inference one issue at a time in a serial fashion, and shows a roughly 10x speedup in inference time.

Location of Model Artifacts

Google Cloud Storage

  • model for inference (965 MB): https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_22zkdqlr.pkl
  • encoder (for fine-tuning with a classifier) (965 MB): https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_encoder_22zkdqlr.pth
  • fastai.databunch (27.1 GB): https://storage.googleapis.com/issue_label_bot/model/lang_model/data_save.pkl
  • checkpointed model (2.29 GB): https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/best_22zkdqlr.pth

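The inference model listed above can be fetched straight from the public bucket. Below is a minimal sketch using urllib from the standard library (the destination file name is arbitrary):

import urllib.request

model_url = ('https://storage.googleapis.com/issue_label_bot/model/lang_model/'
             'models_22zkdqlr/trained_model_22zkdqlr.pkl')
# ~965 MB download; place it wherever InferenceWrapper's model_path will point
urllib.request.urlretrieve(model_url, 'trained_model_22zkdqlr.pkl')
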
Load Minimal Model For Inference


In [1]:
from inference import InferenceWrapper, pass_through
from IPython.display import display, Markdown
import pandas as pd
from torch.nn.utils.rnn import pad_sequence
from torch import Tensor, device
from torch.cuda import empty_cache
from typing import List
from tqdm import tqdm
from numpy import concatenate as cat
import torch
import numpy as np

# from fastai.torch_core import defaults
# defaults.device = torch.device('cpu')

Create an InferenceWrapper object


In [2]:
wrapper = InferenceWrapper(model_path='/ds/Issue-Embeddings/notebooks',
                           model_file_name='trained_model_22zkdqlr.pkl')

Download a test dataset

The test dataset contains 8,000 GitHub Issues in the format shown below:


In [12]:
testdf = pd.read_csv('https://storage.googleapis.com/issue_label_bot/language_model_data/000000000000.csv.gz').head(8000)

testdf.head(3)


Out[12]:
url repo title title_length body body_length
0 https://github.com/egingric/2016-Racing-Game/i... egingric/2016-Racing-Game Got stuck near shortcut 25 After being blown up by the barrel, I got stuc... 314
1 https://github.com/Microsoft/nodejstools/issue... Microsoft/nodejstools Guidance for unit test execution - How to prop... 95 What is the appropriate way to set NODE_ENV fo... 507
2 https://github.com/raphapari/dummy/issues/3 raphapari/dummy Génération du catalogue 25 ## User story xxxlnbrk - En tant que : **gest... 480

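Only the title and body columns are needed for embedding (see the help output for df_to_emb below), so a compatible DataFrame can be built from any collection of issues. A minimal sketch with made-up issue text:

my_issues = [
    {'title': 'App crashes on startup',
     'body': 'Steps to reproduce: launch the app and wait a few seconds ...'},
    {'title': 'Add dark mode',
     'body': 'It would be nice to have a dark theme option for night-time use.'},
]
my_df = pd.DataFrame(my_issues)
# my_embeddings = wrapper.df_to_emb(my_df)   # would have shape (len(my_df), 2400)
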
Perform Batch Inference

Why Batch-Inference? When there is a large number of issues for which you want to retrieve document embeddings, batch inference on a GPU is significantly faster than processing issues one at a time on a CPU.

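Before running batch inference it is worth confirming that PyTorch can actually see a GPU; otherwise inference will fall back to the (much slower) CPU:

if torch.cuda.is_available():
    print(f'GPU available: {torch.cuda.get_device_name(0)}')
else:
    print('No GPU detected -- inference will run on the CPU and be much slower')
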
Generate Embeddings From Pre-Trained Language Model

See help for wrapper.df_to_emb:


In [13]:
help(wrapper.df_to_emb)


Help on method df_to_emb in module inference:

df_to_emb(dataframe:pandas.core.frame.DataFrame, bs=100) -> numpy.ndarray method of inference.InferenceWrapper instance
    Retrieve document embeddings for a dataframe with the columns `title` and `body`.
    Uses batching for efficient computation, which is useful when you have many documents
    to retrieve embeddings for. 
    
    Parameters
    ----------
    dataframe: pandas.DataFrame
        Dataframe with columns `title` and `body`, which represent the Title and Body of a
        GitHub Issue. 
    bs: int
        batch size for doing inference.  Set this variable according to your available GPU memory.
        The default is set to 100, which was stable on an NVIDIA Tesla V100.
    
    Returns
    -------
    numpy.ndarray
        An array of shape (number of dataframe rows, 2400)
        This numpy array represents the latent features of the GitHub issues.
    
    Example
    -------
    >>> import pandas as pd
    >>> wrapper = InferenceWrapper(model_path='/path/to/model',
                               model_file_name='model.pkl')
    # load 200 sample GitHub issues
    >>> testdf = pd.read_csv(f'https://bit.ly/2GDY5NY').head(200)
    >>> embeddings = wrapper.df_to_emb(testdf)
    
    >>> embeddings.shape
    (200, 2400)

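If the chosen batch size does not fit in GPU memory, the call will typically fail with PyTorch's out-of-memory RuntimeError. A simple backoff loop (a sketch, assuming the error propagates out of df_to_emb) halves bs until a batch fits:

bs = 100
while True:
    try:
        embeddings = wrapper.df_to_emb(testdf, bs=bs)
        break
    except RuntimeError as e:
        if 'out of memory' not in str(e) or bs == 1:
            raise
        empty_cache()            # release cached GPU memory before retrying
        bs = max(1, bs // 2)     # halve the batch size and try again
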
Benchmarking inference time on 8,000 Issues

Below, inference is done in batches (New Method).


In [14]:
%%time
embeddings = wrapper.df_to_emb(testdf)


CPU times: user 1min 6s, sys: 21.5 s, total: 1min 27s
Wall time: 1min 28s

Below, inference is done one issue at a time (Old Method).


In [15]:
%%time
# prepare the data: convert each issue row into the model's input text
test_data = [wrapper.process_dict(x)['text'] for x in testdf.to_dict(orient='records')]

emb_single = []
for d in tqdm(test_data):
    # embed one issue at a time and move the result to the CPU
    emb_single.append(wrapper.get_pooled_features(d).detach().cpu().numpy())

# stack the per-issue embeddings into a single (n_issues, 2400) array
emb_single_combined = cat(emb_single)


100%|██████████| 8000/8000 [14:48<00:00,  9.92it/s]
CPU times: user 12min 42s, sys: 2min 54s, total: 15min 37s
Wall time: 15min 34s

Notes:

Batching gives roughly a 10x speedup (wall time drops from about 15.5 minutes to about 1.5 minutes for 8,000 issues) by chunking the data into batches of similar length (to minimize padding) and passing each batch through the GPU at once.

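The core of that trick is sorting sequences by length so each batch only pads to the length of its longest member. A schematic sketch of the idea (an illustration, not the actual df_to_emb implementation):

def length_bucketed_batches(seqs: List[Tensor], bs: int = 100) -> List[Tensor]:
    """Group sequences of similar length and pad each group separately."""
    # sort indices by sequence length so that batch neighbours need little padding
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    batches = []
    for start in range(0, len(order), bs):
        chunk = [seqs[i] for i in order[start:start + bs]]
        # pad only up to the longest sequence inside this chunk
        batches.append(pad_sequence(chunk, batch_first=True))
    return batches
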
A further speed improvement could be gained by using torch.nn.utils.rnn.pack_padded_sequence (and its counterpart pad_packed_sequence) so that the recurrent layers skip padded timesteps entirely, as sketched below. We leave further optimization of the batching as a future exercise.

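For reference, this is roughly how packing works with a padded batch; the recurrent layer then never touches the padded positions. The layer sizes below are illustrative, not taken from the actual language model:

from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
import torch.nn as nn

lstm = nn.LSTM(input_size=400, hidden_size=1150, batch_first=True)
batch = torch.randn(32, 120, 400)            # padded batch: (bs, max_len, emb_dim)
lengths = torch.randint(1, 121, (32,))       # true (unpadded) length of each sequence

packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)                 # the LSTM skips the padded timesteps
out, _ = pad_packed_sequence(packed_out, batch_first=True)   # back to a padded tensor
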
Test

This section tests that the embeddings retrieved from the one-at-a-time approach are sufficiently close to the embeddings from the batching approach.


In [17]:
assert np.allclose(emb_single_combined, embeddings, atol=1e-5)
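
Once verified, the embeddings can be persisted so they do not need to be recomputed for downstream tasks (the file name here is arbitrary):

np.save('issue_embeddings.npy', embeddings)      # (8000, 2400) array of latent features
# later, e.g. in a classifier-training notebook:
# embeddings = np.load('issue_embeddings.npy')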