Wayne Nixalo - 2018/3/14-15
My wanting to explore the time-complexity of the Bellman Equation (here in 6.S094 (2018) Lecture 3) turned into Python OOP practice.
https://www.python-course.eu/graphs_python.php
http://ls.pwd.io/2014/08/singly-and-doubly-linked-lists-in-python/
In [556]:
class Graph:
    """N-ary tree of states for exploring the Bellman / Q-Learning equation.

    Nodes are stored in a flat dict: ``graph[name] -> [[name, weight], ...]``.
    For every node the FIRST entry is the edge to its superior (parent); the
    remaining entries are edges to its subordinates (children). A 'ROOT'
    sentinel sits above the initial state 'STATE_00'.

    Args:
        depth:   number of state levels to build below the root state.
        actions: list of action weights; one child state is created per
                 action at every level (labelled A, B, C, ...).
    """

    def __init__(self, depth, actions):
        self.depth = depth
        self.actions = actions
        self.graph = {}
        # root sentinel connected to the initial state with zero weight
        self.__addNode('ROOT', 'STATE_00', 0.0)
        self.__populate()

    def __addNode(self, sup, sub, w):  # superior, subordinate, weight
        """Link `sup` -> `sub` with weight `w`, recording both directions."""
        if sup not in self.graph:
            self.graph[sup] = []
        self.graph[sup].append([sub, w])
        # a subordinate is always new in this tree, so its list starts fresh
        self.graph[sub] = []
        self.graph[sub].append([sup, w])

    def __populate(self):
        """Build the full `depth`-level action tree, level by level."""
        for d in range(self.depth):
            for state in self.getLevel(d):
                # action-path suffix: keep the letters after the level digit
                # (e.g. 'A' from 'STATE_1A'); drop it when it is merely the
                # second digit of 'STATE_00'
                suffix = state.split('_')[-1][1:]
                if not suffix[-1].isalpha():
                    suffix = ''
                # state name + next depth level
                base_name = state.split('_')[0] + '_' + str(d + 1)
                # one child per action
                for a, w in enumerate(self.actions):
                    self.__addNode(state, base_name + suffix + chr(ord('A') + a), w)

    def __getitem__(self, key):
        """Case-insensitive node lookup; an empty key returns the 'ENDE' marker."""
        if key == '' or key == []:
            return 'ENDE'
        return self.graph[key.upper()]

    def getLevel(self, level):
        """Return all state names at depth `level`.

        NOTE: reads only the first character of the name's suffix, so this
        works for depth < 10.
        """
        # BUGFIX: previously iterated the *global* `graph` instead of
        # self.graph, raising NameError on first construction.
        return [key for key in self.graph.keys()
                if key.split('_')[-1][0].isdigit()
                and int(key.split('_')[-1][0]) == level]

    def superior(self, key):
        """Return the incoming (superior) edge ``[name, weight]``."""
        return self.graph[key.upper()][0]

    def subordinates(self, key):
        """Return all outgoing (subordinate) edges ``[[name, weight], ...]``."""
        return self.graph[key.upper()][1:]

    def keys(self):
        """Expose the underlying dict's keys."""
        return self.graph.keys()

    def __depthPrint(self, key, depth=0):
        """Recursively print the depth-first structure of the graph."""
        depth += 1
        subs = self.subordinates(key)  # subs = [['state_xx', num], ...]
        for sub in subs:
            indent = "> " * depth
            print(indent + sub[0] + f' - {str(sub[1])}')
            self.__depthPrint(sub[0], depth)
        return

    def display(self):
        """Breadth-first display of the graph; arrows indicate state depth.

        NOTE: works for depth < 10 (single digit read from the name).
        """
        for key in self.keys():
            # indent proportional to the state's depth level (ROOT is level 0)
            level = 0 if key == 'ROOT' else max(0, int(key.split('_')[-1][0]))
            indent = " " * level
            print(f'{"> " * (len(indent) // 2)}{key}')
            edges = self.__getitem__(key) if key == 'ROOT' else self.subordinates(key)
            print(f'{indent + " "}{edges}')

    def displayNested(self):
        """Depth-first display of the graph, starting below ROOT."""
        root_child = self.graph['ROOT'][0][0]  # graph['ROOT'] = [['STATE_00', num]]
        print(root_child)
        self.__depthPrint(root_child)

    def getCost(self, state):
        """Weight of an edge pair ``[name, weight]``."""
        return state[1]

    def currentCost(self, key):
        """Sum the edge weights on the path from `key` back up to ROOT."""
        cost = 0.0
        key = key.upper()
        while key != 'ROOT':
            sup = self.superior(key)
            cost += sup[1]
            key = sup[0]
        return cost

    def nextQMove(self, state):
        """Greedy one-step lookahead.

        Returns the subordinate edge ``[name, weight]`` with the highest
        action weight, or None when `state` is a leaf. Accepts either a
        ``[name, weight]`` pair or the single-element list that
        ``__getitem__`` returns for ROOT.
        """
        # placeholders for the full Bellman/Q-Learning update (not yet used)
        learning_rate = 0.1
        discount_fact = 0.5
        if isinstance(state[0], list):  # unwrap: __getitem__ returns [state]
            state = state[0]
        # find highest-reward move
        current_cost = self.currentCost(state[0])
        states = self.subordinates(state[0])
        if len(states) == 0:
            return None
        # BUGFIX: was max(enumerate(states), key=lambda x: x[1]), which
        # compared the [name, weight] pairs lexicographically by NAME
        # rather than by weight.
        next_state = max(states, key=lambda s: s[1])
        # Bellman QLearning Equation - I *think* it's like this
        # next_cost = currentCost(state[0]) +
        #     learning_rate * (next_state[1] + discount_fact*next_state[1] - current_cost)
        # NOTE: why is the above necessary if I can get the cost of an action,
        # and my current state cost?
        return next_state
In [557]:
graph['root']
Out[557]:
In [567]:
# Demo: follow the greedy policy from the root, accumulating edge weights.
graph = Graph(3, [2, 3, 1])
state = graph['root'][0]
policy_cost = 0.0
# IDIOM FIX: compare against None with `is not`, not `!=`
while state is not None:
    policy_cost += state[1]
    print(state)
    state = graph.nextQMove(state)  # None once a leaf state is reached
print(policy_cost)
Right. So it kind of does its job. But that's not exactly what it's supposed to do, I think. A better implementation would put this in a full graph — which means I'd have to expand my Graph class to be more than just an n-ary tree. I could use the dictionary structure to build an in_vertex / out_vertex abstraction.
I'd also need to understand that Q-Learning equation a bit more. What does the term γ·max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t)
mean? Is it the option that adds the greatest reward? If so, why is there an R_{t+1} term added to it? Isn't that the reward of the next action?
What is really meant by 'Old State'? I understand 'New State' ie: the new state.. but how do you add a reward value onto the Old State?
Anyway, I kinda get the gist of this toy problem. This was a good exercise in writing OOP code in Python.
Oh, I get it now. R_{t+1} is the reward gotten for taking action a at state s (the current state). It's then a sum of that reward with the maximum achievable summed reward of all future actions & states, discounted, and minus the current state's cost — or something like that. So yeah, it does do a big bit of computation, because for every move it calculates the potential of every possible subsequent move, and of those chooses the 'best' one. Got it.
scratch work below:
In [494]:
# https://stackoverflow.com/a/28906262
# Scratch: find the (index, value) pair with the largest value.
a = [3, 2, 1, 4, 5]
print(a)
indexed = list(enumerate(a))
print(indexed)
largest = max(indexed, key=lambda pair: pair[1])
print(largest)
In [495]:
graph = Graph(3, [2,3,1])
# graph = Graph(1, [1,1])
In [496]:
state = graph['root']
graph.nextQMove(state)
Out[496]:
In [497]:
graph.displayNested()
In [444]:
graph.currentCost(graph.subordinates('STATE_00')[0][0])
Out[444]:
In [445]:
graph.currentCost('STATE_00')
Out[445]:
In [432]:
graph.subordinates('STATE_00')[0]
Out[432]:
In [433]:
graph.superior('STATE_1A')
Out[433]:
In [434]:
graph.superior(['State_1a',2][0])
Out[434]:
In [435]:
graph.subordinates('STATE_00')[0][0]
Out[435]:
In [398]:
# essentially depth-first display
graph.displayNested()
In [396]:
# basically breadth-first display
graph.display()
In [333]:
for state in graph.subordinates(graph.getLevel(0)[0]):
print(state)
In [301]:
graph.keys()
Out[301]:
In [321]:
graph.display()
In [273]:
graph.getLevel(0)
Out[273]:
In [ ]:
In [ ]:
In [264]:
graph['STATE_1A']
Out[264]:
In [261]:
graph.display()
In [209]:
graph.keys()
Out[209]:
In [ ]:
[k]
In [235]:
depth = 0
# Prototype of Graph.getLevel: keep keys whose first suffix character is a
# digit equal to `depth` (reads one char, so only works for depth < 10).
# NOTE(review): relies on the notebook-global `graph` from an earlier cell.
states = [key for key in graph.keys() if key.split('_')[-1][0].isdigit() and int(key.split('_')[-1][0]) == depth]
states
Out[235]:
In [227]:
states = [key for key in graph.keys()]; states
Out[227]:
In [233]:
[key for key in states if key.split('_')[-1][0].isdigit()]
Out[233]:
In [232]:
states[1].split('_')[-1][0].isdigit()
Out[232]:
In [ ]:
In [208]:
graph['root'][0][0]
Out[208]:
In [65]:
graph['state_00']
Out[65]:
In [66]:
graph.graph
Out[66]:
In [67]:
graph.keys()
Out[67]:
In [68]:
graph['root']
Out[68]:
In [69]:
graph['state_00']
Out[69]:
In [70]:
graph['state_1A']
Out[70]:
In [71]:
graph['state_1B']
Out[71]:
In [72]:
graph.superior('state_00')
Out[72]:
Alrighty. Let's implement a 4-state deep graph with 3 actions at each level.
In [175]:
# action weights
# Build a 4-level graph with three actions (weights 3, 4, 5) per state;
# `actions`, `depth`, and `graph` are reused by later cells.
actions = [3, 4, 5]
depth = 4
graph = Graph(depth, actions)
In [176]:
graph.graph
Out[176]:
In [177]:
graph.display()
In [ ]:
# def __populate(self):
# base_name = 'STATE_'
# for a in range(len(actions)):
# self.__addNode('STATE_00',
# 'base_name' + str(d+1) + chr(ord('A') + a),
# self.actions[a])
# for d in range(1, depth):
# for a in range(len(actions)):
# base_name = 'STATE_' + str(d) + chr(ord('A') + a)
# # TODO
# STATE_00
# \----STATE_1A
# \--------STATE_2AA
# \-------STATE_2AB
# \------STATE_2AC
# \---STATE_1B
# \--------STATE_2BA
# \-------STATE_2BB
# \------STATE_2BC
# \--STATE_1C
# \--------STATE_2CA
# \-------STATE_2CB
# \------STATE_2CC
# for d in range(self.depth):
# self.__addNode(
# node_name = self.__getitem__('root')[0][0]
# for d in range(self.depth):
# for a in range(len(self.actions)):
# self.__addNode(node_name, )
# self.__addNode('STATE_0' + str(d),
# 'STATE_' + str(d+1) + chr(ord('A') + a),
# self.actions[a])