Process Handbook

This notebook processes the example handbook (CAHRC_HR_Manual.txt). This is done in a simple fashion using the following heuristic: If a line of text consisting of less than 5 words is followed by paragraphs of text the assume the line of text with less than 5 words is a topic (i.e. the topic of a question an employee might ask) and that the paragraphs of text are the answer to that question (called action_text for the lack of a better term).

When a topic and action_text are found these are stored in Cloud Datastore as a key-value pair with the topic as the key and the action_text as the value.


In [ ]:
!pip uninstall -y google-cloud-datastore

In [ ]:
!pip install google-cloud-datastore

Hit Reset Session > Restart, then resume with the following cells.


In [ ]:
from google.cloud import datastore

In [ ]:
datastore_client = datastore.Client()

In [ ]:
employee_handbook = open('CAHRC_HR_Manual.txt', 'r')
while True:
  
  topic = employee_handbook.readline()
  if not(topic):
    break
  
  if (topic != '\r\n') and (len(topic.split(' ')) < 5):
  
    action_text = ''
        
    last_line = ''
    line = employee_handbook.readline()
    
    while (last_line != '\r\n') and (line != '\r\n') and (len(line.split(' ')) > 5):
      
      action_text += line
      last_line = line
      line = employee_handbook.readline()
      
    if action_text != '':
      
      kind = 'Topic'
      topic_key = datastore_client.key(kind, topic.strip().lower())

      topic = datastore.Entity(key=topic_key)
      topic['action_text'] = action_text

      datastore_client.put(topic)

      print('Saved {}: {}'.format(topic.key.name, topic['action_text']))

In [ ]: