This notebook illustrates InferenceWrapper.df_to_emb, a utility for performing inference in bulk on large amounts of data. A benchmark compares it against performing inference one example at a time in a serial loop and shows a roughly 10x speedup in inference time.
The pre-trained model artifacts referenced by this notebook:
https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_22zkdqlr.pkl
https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_encoder_22zkdqlr.pth
https://storage.googleapis.com/issue_label_bot/model/lang_model/data_save.pkl
https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/best_22zkdqlr.pth
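These artifacts can be fetched ahead of time. Below is a minimal sketch, assuming the files should be placed in the directory later passed as model_path to InferenceWrapper; the target directory and the use of urllib are assumptions, not part of the original notebook.

import os
import urllib.request

# Model artifacts listed above; the local target directory is an assumption
# and should match the model_path passed to InferenceWrapper below.
ARTIFACTS = [
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_22zkdqlr.pkl",
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_encoder_22zkdqlr.pth",
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/data_save.pkl",
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/best_22zkdqlr.pth",
]

target_dir = "/ds/Issue-Embeddings/notebooks"  # assumption: same path used below
os.makedirs(target_dir, exist_ok=True)
for url in ARTIFACTS:
    dest = os.path.join(target_dir, os.path.basename(url))
    if not os.path.exists(dest):  # skip files that are already present
        urllib.request.urlretrieve(url, dest)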
In [1]:
from inference import InferenceWrapper, pass_through
from IPython.display import display, Markdown
import pandas as pd
from torch.nn.utils.rnn import pad_sequence
from torch import Tensor, device
from torch.cuda import empty_cache
from typing import List
from tqdm import tqdm
from numpy import concatenate as cat
import torch
import numpy as np
# from fastai.torch_core import defaults
# defaults.device = torch.device('cpu')
In [2]:
wrapper = InferenceWrapper(model_path='/ds/Issue-Embeddings/notebooks',
                           model_file_name='trained_model_22zkdqlr.pkl')
In [12]:
testdf = pd.read_csv('https://storage.googleapis.com/issue_label_bot/language_model_data/000000000000.csv.gz').head(8000)
testdf.head(3)
Out[12]:
In [13]:
help(wrapper.df_to_emb)
In [14]:
%%time
embeddings = wrapper.df_to_emb(testdf)
In [15]:
%%time
# prepare data
test_data = [wrapper.process_dict(x)['text'] for x in testdf.to_dict(orient='records')]
emb_single = []
for d in tqdm(test_data):
    emb_single.append(wrapper.get_pooled_features(d).detach().cpu().numpy())
emb_single_combined = cat(emb_single)
The roughly 10x speedup comes from chunking the data into batches of sequences with similar lengths (to minimize padding) and passing each batch through the GPU at once, as sketched below.
To gain a further speed improvement we would need to utilize pad_packed_sequence; we leave optimizing the batching further as a future exercise.
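As an illustration of the batching idea, the sketch below sorts tokenized sequences by length, slices them into chunks, and pads only within each chunk, so a short sequence is never padded out to the longest sequence in the whole dataset. The helper name, batch size, and use of numericalized torch tensors are assumptions; this is not the code used inside df_to_emb.

from typing import List
import torch
from torch.nn.utils.rnn import pad_sequence

def length_bucketed_batches(seqs: List[torch.Tensor], batch_size: int = 64):
    """Yield (indices, padded_batch) pairs for sequences of similar length.

    Illustrative sketch only: sorting by length before slicing keeps the
    padding inside each batch small, which is where the speedup comes from.
    """
    # Sort positions by sequence length so neighbouring sequences are similar.
    order = sorted(range(len(seqs)), key=lambda i: seqs[i].size(0))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [seqs[i] for i in idx]
        # Pad only up to the longest sequence in this batch, not the dataset.
        padded = pad_sequence(batch, batch_first=True, padding_value=0)
        yield idx, padded  # idx lets the caller restore the original row order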
In [17]:
assert np.allclose(emb_single_combined, embeddings, atol=1e-5)