For this example, we'll demonstrate using `ticdat`

and `gurobipy`

to troubleshoot an infeasible model. In particular, we will use the combination of foregin key relationships and `obfusimplify`

to rename the parent entities of the data instance. This example is based on the Gurobi netflow model. We created our own version of netflow.py in order to turn the netflow solve engine into a modular component. The two code instances are close to identical, and are a good demonstration of the ease with which data scientists new to Python can use both `ticdat`

and `gurobipy`

to make readable, modular, solve engines.

We begin by importing the netflow components we need to do our work. `dataFactory`

defines the netflow schema, `solve`

tries to solve a data instance, and `create_model`

is a troubleshooting routine (and sub-step of `solve`

) that creates a `gurobipy.Model`

and a dictionary of variables from that `Model`

.

```
In [1]:
```from netflowmodel import dataFactory, solve, create_model

`netflow.xlsx`

file has data sheets that we think would populate a sound netflow data instance. Lets quickly perform the three basic integrity checks - row duplication, cross table reference failure, and data field validation.

```
In [2]:
```assert not dataFactory.xls.find_duplicates("netflow.xlsx")

```
In [3]:
```dat = dataFactory.xls.create_tic_dat("netflow.xlsx")

```
In [4]:
```assert not dataFactory.find_foreign_key_failures(dat)

```
In [5]:
```assert not dataFactory.find_data_type_failures(dat)

So far so good! No errors. Lets do a quick look at row counts for each table.

```
In [6]:
```{t:len(getattr(dat, t)) for t in dataFactory.all_tables}

```
Out[6]:
```

`solve`

routine doesn't inadvertently edit it's input data.

```
In [7]:
```dataFactory.freeze_me(dat)

```
Out[7]:
```

```
In [8]:
```soln = solve(dat)

```
```

Infreasible? Inconceivable!

Ok, I suppose infeasible models are not only conceivable, but also inevitable. Now we have to troubleshoot. A good first step for infeasibility troubleshooting is creating the Irreducible Inconsistent Subsystem (IIS) of the underlying MIP model. Lucky for us, `netflowmodel`

makes it easy us to turn a data instance into a `gurobipy`

model, and `gurobipy`

makes it easy for us to create the IIS of that model.

```
In [9]:
```model,_ = create_model(dat)

```
In [10]:
```model.computeIIS()

```
```

`gurobipy`

makes this so easy you hardly need to consult the docs.

```
In [11]:
```model.write("firstFail.ilp")

`firstFail.ilp`

. It's not very easy to read, but it looks like a lot of conservation of flow constraints.

```
node_00000567234-H7493804610407033_111BFZK2233-7788453265:
flow_00000567234-H7493804610407033_111BFZK2233-778845329_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-7788453210_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845321_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845328_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845323_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845324_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845325_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845322_111BFZK2233-7788453265
+ flow_00000567234-H7493804610407033_111BFZK2233-778845326_111BFZK2233-7788453265
- forcedToZero = 415
```

```
node_00000567234-H7493804610407033_111BFZK2233-7788453212:
flow_00000567234-H7493804610407033_111BFZK2233-778845329_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845324_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845325_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845321_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845326_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845323_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845328_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845322_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453212
+ flow_00000567234-H7493804610407033_111BFZK2233-7788453210_111BFZK2233-7788453212
- forcedToZero = 357
```

```
node_00000567234-H7493804610407033_111BFZK2233-778845327:
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453296
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453264
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453218
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453211
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453255
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453292
....
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453241
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453258
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453251
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453234
- flow_00000567234-H7493804610407033_111BFZK2233-778845327_111BFZK2233-7788453283
+ forcedToZero = -5220
```

So what's going on here? `forcedToZero`

is a placeholder variable `netflowmodel`

uses in order to promote readability. It is fixed at zero by lower and upper bounds. I use it insure that each conservation of flow constraint has at least one positive and one negative variable.

But the real source of confusion here is the rest of the variable and constraint names. Why are they so long and cryptic?

It's not `gurobipy`

's fault - they provide options to create readable names for each variable and constraint.

And it's not `netflowmodel`

's fault. That code is crafting names for the variables and constraint based on the names of the nodes and commodities in the `dat`

object.

The readability problem stems from the data in `netflow.xlsx`

. The commodity and node names in this file aren't human readable at all. They look like they're computer codes designed for use by some ERP system.

But of course they are! This data set comes from an ERP system. Ok, not literally, as this data set was home brewed for testing purposes. But we see these types of names all the time. It's a common issue when troubleshooting MIPs. Even after you go through all the work of giving good names to your MIP variables and constraints, the actual entity names of the core data obscures them with all but illegible machine codes.

What would be really handy here would be a way to rename the nodes and commodities for testing purposes. If only we could make a copy of our data set that retained all the original numerical data, but substituted simple, readable names for all the entities. Then we could find this exact same infeasibility problem in the copy, and study a truly human readable `.ilp`

file. (The whole point of diagnostic files and naming routines is to achieve human readability so as to create insight).

Lucky for us, `ticdat`

provides just such functionality.

```
In [12]:
```dat_better, dat_renamings = dataFactory.obfusimplify(dat)

`obfusimplify`

. It returns two objects. The first is a copy of `dat`

with clean entity names. The second (`dat_renamings`

) is a dictionary that maps from the new entities back to the table and entity name of the original data. For example, `'C5'`

is the new name for the fifth commodity entry (which is `''00000567234-H74938046104070102'`

), and `'N5'`

is the new name for the fifth node entry (which is `''111BFZK2233-7788453212''`

).

```
In [13]:
```dat_renamings['C5']

```
Out[13]:
```

```
In [14]:
```dat_renamings['N5']

```
Out[14]:
```

`dat_better`

object.

```
In [15]:
```[[c, n1, n2] for c, n1, n2 in dat_better.cost if n1 == 'N5' or n2 == 'N5' and c == 'C5']

```
Out[15]:
```

```
In [16]:
```[[n1, n2, v["capacity"]] for (n1, n2), v in dat_better.arcs.items()
if n1 == 'N5' or n2 == 'N5']

```
Out[16]:
```

How does `obfusimplify`

know how to correctly populate the secondary tables like `arcs`

and `cost`

? It takes advantage of the foreign key relationships that `netflowmodel`

created when it built `dataFactory`

. In the diet example we saw how the small investment of defining these relationships yielded big dividends in recognizing a data set with a misspelled entry. Here, we see how performing this relatively easy task (which really just documents the nature of the input data) can help us troubleshoot an infeasible model with unreadable names.

Now that we have a renamed data set, we can generate a `.ilp`

file with better names.

```
In [17]:
```model,_ = create_model(dat_better)

```
In [18]:
```model.computeIIS()

```
```

```
In [19]:
```model.write("betterFail.ilp")

Now let's take a look at the `betterFail.ilp`

file.

```
node_C257_N63: flow_C257_N1_N63 + flow_C257_N57_N63 + flow_C257_N46_N63
+ flow_C257_N24_N63 + flow_C257_N13_N63 + flow_C257_N90_N63
+ flow_C257_N35_N63 + flow_C257_N79_N63 + flow_C257_N68_N63
+ flow_C257_N2_N63 - forcedToZero = 415
```

```
node_C257_N5: flow_C257_N1_N5 + flow_C257_N57_N5 + flow_C257_N35_N5
+ flow_C257_N46_N5 + flow_C257_N2_N5 + flow_C257_N24_N5
+ flow_C257_N13_N5 + flow_C257_N68_N5 + flow_C257_N90_N5
+ flow_C257_N79_N5 - forcedToZero = 357
```

```
node_C257_N68: - flow_C257_N68_N88 - flow_C257_N68_N81 - flow_C257_N68_N15
- flow_C257_N68_N6 - flow_C257_N68_N64 - flow_C257_N68_N53
- flow_C257_N68_N54 - flow_C257_N68_N48 - flow_C257_N68_N62
- flow_C257_N68_N19 - flow_C257_N68_N12 - flow_C257_N68_N3
- flow_C257_N68_N26 - flow_C257_N68_N56 - flow_C257_N68_N39
- flow_C257_N68_N50 - flow_C257_N68_N76 - flow_C257_N68_N43
- flow_C257_N68_N87 - flow_C257_N68_N52 - flow_C257_N68_N16
- flow_C257_N68_N94 - flow_C257_N68_N7 - flow_C257_N68_N69
- flow_C257_N68_N72 - flow_C257_N68_N47 - flow_C257_N68_N30
- flow_C257_N68_N22 - flow_C257_N68_N65 - flow_C257_N68_N91
...
- flow_C257_N68_N5 - flow_C257_N68_N70 - flow_C257_N68_N45
- flow_C257_N68_N89 - flow_C257_N68_N84 - flow_C257_N68_N82
- flow_C257_N68_N67 - flow_C257_N68_N14 - flow_C257_N68_N93
- flow_C257_N68_N9 - flow_C257_N68_N23 - flow_C257_N68_N34
- flow_C257_N68_N20 - flow_C257_N68_N75 - flow_C257_N68_N40
- flow_C257_N68_N49 - flow_C257_N68_N37 - flow_C257_N68_N32
- flow_C257_N68_N63 - flow_C257_N68_N11 - flow_C257_N68_N97
- flow_C257_N68_N38 - flow_C257_N68_N27 - flow_C257_N68_N86
- flow_C257_N68_N25 - flow_C257_N68_N28 - flow_C257_N68_N71
- flow_C257_N68_N44 - flow_C257_N68_N55 - flow_C257_N68_N66
+ forcedToZero = -5220
```

Now we're getting somewhere. These constraints and variables names are fit for man, not machine. This is something I can study for a while and actually get a feel for what's going on.

The first thing that jumps out is that all the flow conservation constraints involve commodity `C257`

. I wonder if there is an aggregate `inflow`

imbalance there? (I.e. total supply, total demand mismatch).

There is no reason to check for this problem exclusively for `C257`

. Lets whip out a little Python function that checks all aggregate `inflow`

imbalances for all commodities for a given data set.

```
In [20]:
```from collections import defaultdict
def find_flow_imbalance(dat):
rtn = defaultdict(float)
for (k,n),v in dat.inflow.items():
rtn[k] += v["quantity"]
return {k:v for k,v in rtn.items() if abs(v) > 0}

```
In [21]:
```find_flow_imbalance(dat_better)

```
Out[21]:
```

Sure enough, there is an aggregate `inflow`

imbalance for commodity `C257`

. There is also one for commodity `C15`

. Why did the `betterFail.ilp`

file describe one and not the other? The answer is it wouldn't be able to put the imbalance for both commodities into the same IIS, since they exist independently of each other. An IIS that captured both at the same time would fail the IIS requirement that removing just one constraint from the IIS results in a feasible sub-model.

Just for fun, let's look at the `inflow`

imbalances for the original data.

```
In [22]:
```find_flow_imbalance(dat)

```
Out[22]:
```

`obfusimplify`

did its job correctly.

```
In [23]:
```dat_renamings['C15']

```
Out[23]:
```

```
In [24]:
```dat_renamings['C257']

```
Out[24]:
```

Finally, it's worthwhile to discuss why we named the `ticdat`

function `obfusimplify`

. As you might expect, its a portmanteau of obfuscate and simplify. This notebook shows the utility of simplify, but lets not ignore the value of being able to obfuscate the entity names. We live in a time of data privacy concerns and dedicated hackers. It's not hard to imagine that the companies that provide the proof of concept data sets will appreciate the security benefits of name obfuscation. Once the entity names have been anonymized, the data set contains no identifying information. At that point, it is far safer to share (within your organization and with third party support providers) or to archive into a test suite.

In other words, it will be more productive, but equally secure, to troubleshoot a `.lp`

file based on an data set created with `obfusimplify`

than to troubleshoot a `.rlp`

file. The latter provides you no context as to what purpose a given variable or constraint is serving. The former, while still using fully anonymized names, is far more likely to lead to human insight.