Using ticdat to build modular engines

The goal of the ticdat package is to facilitate the development of solve engines that are modular and robust. For example, the multicommodity netflow.py engine can read and write from a variety of file types when run from the command line. It can also be run from a Python script that contains embedded static data, or from a script that reads and writes from a system-of-record data source such as an ERP system.

With regard to the latter, we should note that Python is one of the most popular "glue" languages. The market has recognized that Python scripts are easy to write, manipulate data with intuitive syntax, and can be connected to nearly any data source.

The ticdat package can easily be used in any Python glue script. One way to do this is to exploit ticdat's ability to recognize a data table as a list-of-lists, where the inner lists contain data values in the field order defined by the TicDatFactory (i.e., netflow.input_schema).

For example, suppose the netflow engine needs to connect to an Oracle database for a daily automated solve. The integration engineer can use the cx_Oracle package (or something equivalent) to turn system data into a list-of-lists for each input table. These data structures can then be used to create a TicDat object that can be passed as input data to netflow.solve. The solution TicDat object returned by netflow.solve can then be converted back into a list-of-lists representation of each solution report table. (The list-of-lists strategy is just one approach. It might make sense to convert system-of-record data into pandas.DataFrame objects, and then use these DataFrames to build the TicDat object.)
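
For orientation, here is a minimal sketch of the database side of such a script. The connect string and the commodities source table are hypothetical, and any DB-API compatible driver could stand in for cx_Oracle.

import cx_Oracle

# hypothetical credentials and service name
connection = cx_Oracle.connect("user/password@hostname/service_name")
cursor = connection.cursor()
# hypothetical source table; the field order matches netflow.input_schema
cursor.execute("SELECT name, volume FROM commodities")
commodities = [list(row) for row in cursor.fetchall()]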

The remainder of this section demonstrates the approach without a live database connection. By showing that ticdat is compatible with list-of-lists/DataFrame table representations, we show that it is compatible with any data source that can be connected to Python, as well as with human-readable static data.


In [1]:
commodities = [['Pencils', 0.5], ['Pens', 0.2125]]

# a one-column table can just be a simple list
nodes = ['Boston', 'Denver', 'Detroit', 'New York',  'Seattle']

cost = [['Pencils', 'Denver', 'Boston', 10.0],
        ['Pencils', 'Denver', 'New York', 10.0],
        ['Pencils', 'Denver', 'Seattle', 7.5],
        ['Pencils', 'Detroit', 'Boston', 2.5],
        ['Pencils', 'Detroit', 'New York', 5.0],
        ['Pencils', 'Detroit', 'Seattle', 15.0],
        ['Pens', 'Denver', 'Boston', 15.0],
        ['Pens', 'Denver', 'New York', 17.5],
        ['Pens', 'Denver', 'Seattle', 7.5],
        ['Pens', 'Detroit', 'Boston', 5.0],
        ['Pens', 'Detroit', 'New York', 5.0],
        ['Pens', 'Detroit', 'Seattle', 20.0]]

inflow = [['Pencils', 'Boston', -200],
          ['Pencils', 'Denver', 240],
          ['Pencils', 'Detroit', 200],
          ['Pencils', 'New York', -200],
          ['Pencils', 'Seattle', -40],
          ['Pens', 'Boston', -160],
          ['Pens', 'Denver', 160],
          ['Pens', 'Detroit', 240],
          ['Pens', 'New York', -120],
          ['Pens', 'Seattle', -120]]

An integration engineer might prefer to copy system-of-record data into pandas.DataFrame objects. Note that pandas is itself capable of reading directly from various SQL databases, although it usually needs a supporting package like cx_Oracle.
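
For instance, the arcs table could in principle be pulled straight from the database. A sketch, reusing the hypothetical connection from the earlier cx_Oracle example:

import pandas as pd

# hypothetical query; connection is the cx_Oracle connection sketched earlier
arcs = pd.read_sql("SELECT source, destination, capacity FROM arcs", connection)
# Oracle returns upper-case column names, so rename to match the schema's fields
arcs.columns = ["Source", "Destination", "Capacity"]

In the static demonstration below, we instead build the same DataFrame by hand.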


In [2]:
from pandas import DataFrame
arcs = DataFrame({"Source": ["Denver", "Denver", "Denver", "Detroit", "Detroit", "Detroit",], 
                 "Destination": ["Boston", "New York", "Seattle", "Boston", "New York", 
                                 "Seattle"], 
                 "Capacity": [120, 120, 120, 100, 80, 120]}).set_index(["Source", "Destination"])
arcs


Out[2]:
                    Capacity
Source  Destination
Denver  Boston           120
        New York         120
        Seattle          120
Detroit Boston           100
        New York          80
        Seattle          120

Next we create a TicDat input data object from the list-of-lists/DataFrame representations.


In [3]:
from netflow import input_schema, solve, solution_schema
dat = input_schema.TicDat(commodities=commodities, nodes=nodes, cost=cost, arcs=arcs, 
                          inflow=inflow)

We now create a TicDat solution data object by calling solve.


In [4]:
sln = solve(dat)


Optimize a model with 16 rows, 12 columns and 36 nonzeros
Coefficient statistics:
  Matrix range     [2e-01, 1e+00]
  Objective range  [2e+00, 2e+01]
  Bounds range     [0e+00, 0e+00]
  RHS range        [4e+01, 2e+02]
Presolve removed 12 rows and 6 columns
Presolve time: 0.00s
Presolved: 4 rows, 8 columns, 12 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    2.8823529e+02   1.717647e+02   0.000000e+00      0s
       3    5.6275000e+03   0.000000e+00   0.000000e+00      0s

Solved in 3 iterations and 0.01 seconds
Optimal objective  5.627500000e+03

We now create a list-of-lists representation of the solution data object.


In [5]:
from ticdat.jsontd import make_json_dict
sln_lists = make_json_dict(solution_schema, sln)

Here we demonstrate that sln_lists is a dictionary mapping table name to list-of-lists of solution report data.


In [6]:
import pprint
for sln_table_name, sln_table_data in sln_lists.items():
    print("\n\n**\nSolution Table %s\n**" % sln_table_name)
    pprint.pprint(sln_table_data)



**
Solution Table flow
**
[['Pencils', 'Detroit', 'New York', 51.0],
 ['Pens', 'Detroit', 'New York', 120.0],
 ['Pens', 'Denver', 'Boston', 40.0],
 ['Pencils', 'Denver', 'Boston', 51.0],
 ['Pens', 'Denver', 'Seattle', 120.0],
 ['Pens', 'Detroit', 'Boston', 120.0],
 ['Pencils', 'Denver', 'New York', 149.0],
 ['Pencils', 'Denver', 'Seattle', 40.0],
 ['Pencils', 'Detroit', 'Boston', 149.0]]


**
Solution Table parameters
**
[['Total Cost', 5627.5]]
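
In a production glue script, these lists could be written straight back to the system of record. A sketch, assuming a hypothetical FLOW_RESULTS table and the cx_Oracle connection from the earlier sketch:

# hypothetical target table whose columns match the flow table's fields
cursor.executemany(
    "INSERT INTO flow_results (commodity, source, destination, quantity) "
    "VALUES (:1, :2, :3, :4)",
    sln_lists["flow"])
connection.commit()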

Here we demonstrate how DataFrame objects can be generated from the solution data object.


In [7]:
sln_pandas = solution_schema.copy_to_pandas(sln)
sln_pandas.flow


Out[7]:
                              Quantity
Commodity Source  Destination
Pencils   Denver  Boston          51.0
                  New York       149.0
                  Seattle         40.0
          Detroit Boston        149.0
                  New York        51.0
Pens      Denver  Boston          40.0
                  Seattle        120.0
          Detroit Boston        120.0
                  New York      120.0

Using ticdat to build robust engines

The preceding section demonstrated how we can use ticdat to build modular engines. We now demonstrate how we can use ticdat to build engines that check solve pre-conditions, and are thus robust with respect to data integrity problems.
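
In a fully automated deployment, such checks would typically run before solve is called. Here is a minimal guard, sketched under the assumption that both integrity routines return empty (falsey) dictionaries on clean data, as the cells below illustrate:

def checked_solve(dat):
    # refuse to solve if the input data violates the schema's integrity rules
    assert not input_schema.find_data_type_failures(dat), "data type failures"
    assert not input_schema.find_foreign_key_failures(dat), "foreign key failures"
    return solve(dat)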

First, let's violate our (somewhat artificial) rule that the commodity volume must be positive.


In [8]:
dat.commodities["Pens"] = 0

The input_schema can not only flag this problem but also give us a useful data structure to examine.


In [9]:
input_schema.find_data_type_failures(dat)


Out[9]:
{TableField(table='commodities', field='Volume'): ValuesPks(bad_values=(0,), pks=('Pens',))}

Next, let's add a cost record for a non-existent commodity and see how input_schema flags this problem.


In [10]:
dat.cost['Crayons', 'Detroit', 'Seattle'] = 10
input_schema.find_foreign_key_failures(dat, verbosity="Low")


Out[10]:
{('cost', 'commodities', ('Commodity', 'Name')): (('Crayons',),
  (('Crayons', 'Detroit', 'Seattle'),))}

In real life, data integrity failures can typically be grouped into a small number of categories. However, the number of failures in each category might be quite large. ticdat creates data structures for each of these categories that can themselves be examined programmatically. As a result, an analyst can leverage the power of Python to detect patterns in the data integrity problems.
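
For example, a few lines of Python can summarize the data type failures by table and field. A sketch using the structures returned above:

for (table, field), (bad_values, pks) in input_schema.find_data_type_failures(dat).items():
    print("%s.%s: %d bad value(s) across %d row(s)" %
          (table, field, len(set(bad_values)), len(pks)))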