This notebook illustrates InferenceWrapper.df_to_emb, a utility for performing inference in bulk on large amounts of data. A benchmark compares it against performing inference one example at a time in a serial loop and shows a roughly 10x speedup in inference time.
The pre-trained model artifacts referenced by this notebook:
https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_22zkdqlr.pkl
https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_encoder_22zkdqlr.pth
https://storage.googleapis.com/issue_label_bot/model/lang_model/data_save.pkl
https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/best_22zkdqlr.pth
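These artifacts can be fetched ahead of time. Below is a minimal sketch, assuming the files should be placed in the directory later passed as model_path to InferenceWrapper; the target directory and the use of urllib are assumptions, not part of the original notebook.

import os
import urllib.request

# Model artifacts listed above; the local target directory is an assumption
# and should match the model_path passed to InferenceWrapper below.
ARTIFACTS = [
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_22zkdqlr.pkl",
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/trained_model_encoder_22zkdqlr.pth",
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/data_save.pkl",
    "https://storage.googleapis.com/issue_label_bot/model/lang_model/models_22zkdqlr/best_22zkdqlr.pth",
]

target_dir = "/ds/Issue-Embeddings/notebooks"  # assumption: same path used below
os.makedirs(target_dir, exist_ok=True)
for url in ARTIFACTS:
    dest = os.path.join(target_dir, os.path.basename(url))
    if not os.path.exists(dest):  # skip files that are already present
        urllib.request.urlretrieve(url, dest)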
In [1]:
from inference import InferenceWrapper, pass_through
from IPython.display import display, Markdown
import pandas as pd
from torch.nn.utils.rnn import pad_sequence
from torch import Tensor, device
from torch.cuda import empty_cache
from typing import List
from tqdm import tqdm
from numpy import concatenate as cat
import torch
import numpy as np
# from fastai.torch_core import defaults
# defaults.device = torch.device('cpu')
In [2]:
wrapper = InferenceWrapper(model_path='/ds/Issue-Embeddings/notebooks',
                           model_file_name='trained_model_22zkdqlr.pkl')
In [12]:
testdf = pd.read_csv('https://storage.googleapis.com/issue_label_bot/language_model_data/000000000000.csv.gz').head(8000)
testdf.head(3)
Out[12]:
In [13]:
help(wrapper.df_to_emb)
In [14]:
%%time
embeddings = wrapper.df_to_emb(testdf)
In [15]:
%%time
# prepare data
test_data = [wrapper.process_dict(x)['text'] for x in testdf.to_dict(orient='records')]
emb_single = []
for d in tqdm(test_data):
    emb_single.append(wrapper.get_pooled_features(d).detach().cpu().numpy())
emb_single_combined = cat(emb_single)
The roughly 10x speedup comes from chunking the data into batches of sequences with similar lengths (to minimize padding) and passing each batch through the GPU at once, as sketched below.
To gain a further speed improvement we would need to utilize pad_packed_sequence; we leave optimizing the batching further as a future exercise.
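As an illustration of the batching idea, the sketch below sorts tokenized sequences by length, slices them into chunks, and pads only within each chunk, so a short sequence is never padded out to the longest sequence in the whole dataset. The helper name, batch size, and use of numericalized torch tensors are assumptions; this is not the code used inside df_to_emb.

from typing import List
import torch
from torch.nn.utils.rnn import pad_sequence

def length_bucketed_batches(seqs: List[torch.Tensor], batch_size: int = 64):
    """Yield (indices, padded_batch) pairs for sequences of similar length.

    Illustrative sketch only: sorting by length before slicing keeps the
    padding inside each batch small, which is where the speedup comes from.
    """
    # Sort positions by sequence length so neighbouring sequences are similar.
    order = sorted(range(len(seqs)), key=lambda i: seqs[i].size(0))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [seqs[i] for i in idx]
        # Pad only up to the longest sequence in this batch, not the dataset.
        padded = pad_sequence(batch, batch_first=True, padding_value=0)
        yield idx, padded  # idx lets the caller restore the original row order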
In [17]:
assert np.allclose(emb_single_combined, embeddings, atol=1e-5)