This notebook walks through the process of creating a patent landscape as described in the paper Automated Patent Landscaping (Abood, Feltenberger 2016). The basic outline is: load a human-curated seed set of patents, expand it through citations and CPC codes (and sample random "anti-seed" patents as negative examples), train a neural network on the resulting labeled data, and use the trained model to classify patents as in or out of the landscape.
If you haven't already, please make sure you've set up an environment following the instructions in the README so you have all the necessary dependencies.
Copyright 2017 Google Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
You'll need a Google Cloud project with BigQuery enabled (it's enabled by default) for this notebook and its associated code to work. If you don't already have an account, go to cloud.google.com to create one; creating the Cloud account is free and you won't be auto-billed. Then copy your project ID and paste it into bq_project below.
In [1]:
# Example: bq_project = 'patent-landscape-165715'
bq_project = ''
In [2]:
import tensorflow as tf
import pandas as pd
import os
seed_name = 'video_codec'
seed_file = 'seeds/video_codec.seed.csv'
patent_dataset = 'patents-public-data:patents.publications_latest'
num_anti_seed_patents = 15000
if bq_project == '':
    raise Exception('You must enter a bq_project above for this code to run.')
We provide a pre-trained word2vec word embedding model that we trained on 5.9 million patent abstracts. See also word2vec.py in this repo if you'd like to train your own (though gensim is likely an easier path). The code below will download the model from Google Cloud Storage (GCS) and store it on the local filesystem, or simply load it from local disk if it's already present.
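For reference, here is a minimal sketch of how you might train comparable embeddings yourself with gensim, as the note above suggests. The corpus file name and the hyperparameters here are hypothetical illustrations, not what produced the 5.9m model (this uses the gensim 4.x API):
# Hypothetical corpus file: one tokenized, lowercased abstract per line.
from gensim.models import Word2Vec as GensimWord2Vec

with open('patent_abstracts.txt') as f:
    corpus = [line.lower().split() for line in f]

w2v = GensimWord2Vec(corpus, vector_size=300, window=5, min_count=5, workers=4)
w2v.save('my_patent_w2v.model')
print(w2v.wv.most_similar('codec', topn=10))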
In [3]:
from word2vec import W2VModelDownload
model_name = '5.9m'
model_download = W2VModelDownload(bq_project)
model_download.download_w2v_model('patent_landscapes', model_name)
print('Done downloading model {}!'.format(model_name))
This loads the Word2Vec embeddings from a model trained on 5.9 million patent abstracts. As a demonstration, it also finds the k most similar words to a given word, ranked by closeness in the embedding space. Finally, we use t-SNE to visualize word closeness in 2-dimensional space.
Note that the actual model files are fairly large (e.g., the 5.9m model is 760MB per checkpoint), so they are not stored in the GitHub repository. They're stored in the patent_landscapes Google Cloud Storage bucket under the models/ folder. If you'd like to use them, download the models folder and put it into the root repository folder (e.g., if you checked out this repository into patent-models, the 5.9m model should be in patent-models/models/5.9m).
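Conceptually, find_similar is a nearest-neighbor lookup by cosine similarity in the embedding space. A minimal sketch of that mechanism follows; the embedding matrix and vocab arguments are hypothetical stand-ins, not the repo's internals:
import numpy as np

def most_similar(word, k, embeddings, vocab):
    # embeddings: (vocab_size, dim) matrix; vocab: word -> row index
    v = embeddings[vocab[word]]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v)
    sims = embeddings @ v / norms
    top = np.argsort(-sims)[1:k + 1]  # rank 0 is the query word itself
    idx_to_word = {i: w for w, i in vocab.items()}
    return [(idx_to_word[i], float(sims[i])) for i in top]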
In [4]:
from word2vec import Word2Vec
word2vec5_9m = Word2Vec('5.9m')
w2v_runtime = word2vec5_9m.restore_runtime()
In [14]:
w2v_runtime.find_similar('codec', 10)
Out[14]:
In [6]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
w2v_runtime.visualize_embeddings(500)
In [7]:
import expansion
expander = expansion.PatentLandscapeExpander(
    seed_file,
    seed_name,
    bq_project=bq_project,
    patent_dataset=patent_dataset,
    num_antiseed=num_anti_seed_patents)
This does the actual expansion and displays the head of the final training data dataframe.
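Conceptually, the expansion finds patents related to the seed through citations and shared CPC codes against the public BigQuery patents dataset. As a heavily hedged sketch of the citation half, a query along these lines retrieves the publications cited by the seed set (the table name matches the dataset configured above, but this is not the repo's actual SQL):
l1_citation_query = '''
SELECT DISTINCT cite.publication_number
FROM `patents-public-data.patents.publications_latest` AS pub,
     UNNEST(pub.citation) AS cite
WHERE pub.publication_number IN UNNEST(@seed_publication_numbers)
'''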
In [8]:
training_data_full_df, seed_patents_df, l1_patents_df, l2_patents_df, anti_seed_patents = \
    expander.load_from_disk_or_do_expansion()

training_df = training_data_full_df[
    ['publication_number', 'title_text', 'abstract_text', 'claims_text',
     'description_text', 'ExpansionLevel', 'refs', 'cpcs']]
training_data_full_df.head()
Out[8]:
In [9]:
print('Seed/Positive examples:')
print(training_df[training_df.ExpansionLevel == 'Seed'].count())
print('\n\nAnti-Seed/Negative examples:')
print(training_df[training_df.ExpansionLevel == 'AntiSeed'].count())
In [10]:
import train_data
import tokenizer
# TODO: persist this tokenization data too
td = train_data.LandscapeTrainingDataUtil(training_df, w2v_runtime)
td.prepare_training_data(
    training_df.ExpansionLevel,   # labels ('Seed' / 'AntiSeed')
    training_df.abstract_text,
    training_df.refs,
    training_df.cpcs,
    0.8,      # train/test split fraction (presumed)
    50000,    # max vocabulary size (presumed)
    500)      # max sequence length (presumed)
Out[10]:
In [11]:
pos_idx = -1
neg_idx = -1
# find the first positive (label 0 = Seed) and negative (label 1 = AntiSeed) examples
for idx in range(0, len(td.prepped_labels)):
    if td.prepped_labels[idx] == 0 and pos_idx < 0:
        pos_idx = idx
    if td.prepped_labels[idx] == 1 and neg_idx < 0:
        neg_idx = idx
    if pos_idx > -1 and neg_idx > -1:
        break
print('Showing positive example (instance #{}).'.format(pos_idx))
td.show_instance_details(pos_idx)
print('\n------------------------------------\n')
print('Showing negative example (instance #{}).'.format(neg_idx))
td.show_instance_details(neg_idx)
The following cells specify the hyperparameters, define the neural network architecture (using Keras), and then actually train and evaluate the model.
The model is generally composed of an LSTM over the abstract's word embeddings, combined with one-hot encoded citations (refs) and CPC codes, with dropout and a dense sigmoid output that scores instances between 0 (seed) and 1 (anti-seed); a rough sketch follows the hyperparameter cell below.
In [12]:
batch_size = 64
dropout_pct = 0.4
num_epochs = 4
# Convolution (not currently used)
kernel_size = 5
filters = 64
pool_size = 4
# LSTM
lstm_size = 64
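Using the hyperparameters above, here is a rough sketch of that general shape in the Keras functional API. This is not the repo's exact wire_model_functional; the input dimensions (embedding_dim, num_refs, num_cpcs) are placeholders:
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, concatenate
from tensorflow.keras.models import Model

embedding_dim, num_refs, num_cpcs = 300, 50000, 5000  # placeholder dimensions

abstract_in = Input(shape=(500, embedding_dim))  # padded abstract word embeddings
refs_in = Input(shape=(num_refs,))               # one-hot encoded citations
cpcs_in = Input(shape=(num_cpcs,))               # one-hot encoded CPC codes

x = LSTM(lstm_size)(abstract_in)
x = concatenate([x, refs_in, cpcs_in])
x = Dropout(dropout_pct)(x)
out = Dense(1, activation='sigmoid')(x)  # ~0 = Seed, ~1 = AntiSeed

sketch = Model(inputs=[abstract_in, refs_in, cpcs_in], outputs=out)
sketch.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])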
In [13]:
import model
import importlib
# reload so that edits to model.py are picked up without restarting the kernel
importlib.reload(model)

# note: this rebinds the name `model` from the module to a LandscapeModel instance
model = model.LandscapeModel(td, 'data', seed_name)
model.wire_model_functional(lstm_size, dropout_pct, sequence_len=td.sequence_len)
In [15]:
model.train_or_load_model(batch_size, num_epochs)
In [16]:
score, acc, p, r, f1 = model.evaluate_model(batch_size)
NOTE: the inference code below assumes you've already trained a model in the currently active kernel, so that the following variables are set: model is the Keras-trained DNN, and l1_patents_df is the dataframe returned by the PatentLandscapeExpander. Future iterations will save model checkpoints and allow saving models off and then loading them later just for inference.
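In the meantime, here is a hedged sketch of how you could save the trained network off yourself, assuming the underlying Keras model is reachable from the LandscapeModel instance (the attribute name is a guess, not the repo's API):
from tensorflow.keras.models import load_model

keras_net = model.model                      # hypothetical attribute name
keras_net.save('data/video_codec_model.h5')  # architecture + weights

# later, in a fresh kernel, for inference only:
restored = load_model('data/video_codec_model.h5')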
In [17]:
subset_l1_pub_nums, l1_texts, padded_abstract_embeddings, refs_one_hot, cpc_one_hot = \
    expander.sample_for_inference(td, 0.2)
We use a very rough heuristic to determine the "ground truth" for our L1 expansion: whether the abstract contains the string "video" or the string "codec". If it contains either, we consider it a positive example; if it contains neither, we consider it a negative example. We can explore the true positives, false positives, true negatives, and false negatives in the confusion matrix and Pandas DataFrame below.
In [18]:
predictions = model.batch_predict(padded_abstract_embeddings, refs_one_hot, cpc_one_hot)
In [20]:
l1_texts['score'] = pd.DataFrame(predictions, columns=['score'])['score']
l1_texts['label'] = 'Seed'
l1_texts[['publication_number', 'score', 'label', 'abstract_text', 'refs', 'cpcs']]
l1_texts['has_video'] = l1_texts.abstract_text.str.lower().str.contains('video')
l1_texts['has_codec'] = l1_texts.abstract_text.str.lower().str.contains('codec')
l1_texts.loc[~l1_texts.has_video & ~l1_texts.has_codec, 'label'] = 'AntiSeed'
# Shows the CPC codes from an example item from our inference dataframe.
l1_texts['cpcs'].iloc[0]
Out[20]:
In [21]:
classify_report, confusion_matrix = model.reports(l1_texts)
print(classify_report)
print('Confusion matrix:\n{}'.format(confusion_matrix))
model.show_confusion_matrix(confusion_matrix)
In [22]:
predicted_seed_class = l1_texts.score <= 0.5
predicted_nonseed_class = l1_texts.score > 0.5
l1_texts[
    predicted_seed_class & l1_texts.has_video & l1_texts.has_codec
].abstract_text.iloc[1]
Out[22]:
If you want to experiment with ad-hoc classification of data, edit the fields below for abstract text, references, and CPC codes to see how they influence the model's output. Somewhat counter-intuitively, a score closer to 0 means the instance is more likely to belong to the seed class (e.g., more likely to be about video codecs); conversely, the closer the score is to 1, the more likely the model thinks the instance belongs to the anti-seed class.
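For instance, a tiny helper mirroring the thresholding used in the cells above (the 0.5 cutoff matches the earlier predicted_seed_class logic; the function itself is ours, not part of the repo):
def score_to_label(score, threshold=0.5):
    # scores near 0 -> seed class; near 1 -> anti-seed class
    return 'Seed' if score <= threshold else 'AntiSeed'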
In [24]:
# Alternative example inputs you can paste into `text` and `cpcs` below. The two
# abstracts and the CPC codes are from a hair-dryer landscape, so a model trained
# on the video_codec seed should score them toward the anti-seed end (near 1).
# A conditioner infuser cartridge for use with a dryer attachment having an attachment end for engagement with a hair dryer barrel, an opposite air outlet end and a perforated portion between the ends having at least one air intake, the cartridge configured for engagement near the attachment end and including a conditioner element constructed and arranged for retaining a supply of vaporizable conditioner and a support frame receiving the conditioner element and securing same in the attachment.
# An electric lamp bulb or electric resistance coil is provided to radiate heat to one or more adjacent surfaces which have an active oxidation catalyst such that fumes and odors in the confines of a room that are drawn over the catalytic surface will be converted to less objectionable products. A catalytic device with an incandescent light bulb type of heating element is of particular advantage in that it can readily be screwed into a lamp base or mounted in other forms of current supplying receptacles and, in addition to a light source, will provide a heat emitting surface for heating the catalyst surface and inducing natural air convection current flow past the catalytic surface. Also, a fume control device which utilizes a resistance heating coil can readily provide both radiant heat and convection heat so that there will be the dual function of fume oxidation from air flow past a heated catalyst surface and radiant heat into a room area. Various types of catalyst coatings and/or catalytic wrappings may be used on the refractory surfaces which will be radiantly heated by the manually mountable-demountable form of bulb or resistance heating element
# A45D20/10,A45D20/08,A45D20/10,A45D20/08,A45D20/10,A45D20/08,A45D20/10,A45D20/08,A45D20/10,A45D20/08
text = 'Embodiments of a method and system for motion compensation in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps indicate which one of multiple prediction operations is to be used in performing motion compensation on particular units of data in a frame. In an embodiment, motion compensation is performed on a frame basis such that each of the multiple prediction operations is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the motion compensation such as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.'
refs = ''
cpcs = ''
model.predict(td, text, refs, cpcs)
Out[24]: