Opening and previewing

This uses the tiny excel spreadsheet example1.xls. It is small enough to preview inline in this notebook. But for bigger spreadsheet tables you will want to open them up in a separate window.



In [1]:

    
# Load in the functions
from databaker.framework import *

# Load the spreadsheet
tabs = loadxlstabs("example1.xls")

# Select the first table
tab = tabs[0]

print("The unordered bag of cells for this table looks like:")
print(tab)









    



Loading example1.xls which has size 7168 bytes
Table names: ['beatles', 'stones']
The unordered bag of cells for this table looks like:
{<B4 2.0>, <C6 1.0>, <D1 ''>, <B5 4.0>, <C5 3.0>, <C1 ''>, <B3 'Cars'>, <D3 'Trains'>, <C2 ''>, <D4 1.0>, <D6 3.0>, <B6 4.0>, <D5 2.0>, <B2 ''>, <A1 'Date'>, <B1 2014.0>, <C7 5.0>, <A7 'George'>, <A6 'Ringo'>, <A3 ''>, <A2 ''>, <C4 2.0>, <A4 'John'>, <C3 'Planes'>, <D7 5.0>, <D2 ''>, <A5 'Paul'>, <B7 2.0>}

Selecting cell bags

A table is also "bag of cells", which just so happens to be a set of all the cells in the table.

A "bag of cells" is like a Python set (and looks like one when you print it), but it has extra selection functions that help you navigate around the table.

We will learn these as we go along, but you can see the full list on the tutorial_reference notebook.



In [2]:

    
# Preview the table as a table inline
savepreviewhtml(tab)









    










beatles
Date 2014.0 

Cars Planes Trains
John 2.0 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0



In [3]:

    
bb = tab.is_bold()
print("The cells with bold font are", bb)









    



The cells with bold font are {<C3 'Planes'>, <D3 'Trains'>}



In [4]:

    
print("The", len(bb), "cells immediately below these bold font cells are", bb.shift(DOWN))









    



The 2 cells immediately below these bold font cells are {<C4 2.0>, <D4 1.0>}



In [5]:

    
cc = tab.filter("Cars")
print("The single cell with the text 'Cars' is", cc)

cc.assert_one()   # proves there is only one cell in this bag









    



The single cell with the text 'Cars' is {<B3 'Cars'>}






    Out[5]:





{<B3 'Cars'>}



In [6]:

    
print("Everything in the column below the 'Cars' cell is", cc.fill(DOWN))









    



Everything in the column below the 'Cars' cell is {<B7 2.0>, <B6 4.0>, <B5 4.0>, <B4 2.0>}



In [7]:

    
hcc = tab.filter("Cars").expand(DOWN)
print("If you wanted to include the 'Cars' heading, then use expand", hcc)









    



If you wanted to include the 'Cars' heading, then use expand {<B3 'Cars'>, <B7 2.0>, <B6 4.0>, <B5 4.0>, <B4 2.0>}



In [8]:

    
print("You can print the cells in row-column order if you don't mind unfriendly code")
shcc = sorted(hcc.unordered_cells, key=lambda Cell:(Cell.y, Cell.x))
print(shcc)









    



You can print the cells in row-column order if you don't mind unfriendly code
[<B3 'Cars'>, <B4 2.0>, <B5 4.0>, <B6 4.0>, <B7 2.0>]



In [9]:

    
print("It can be easier to see the set of cells coloured within the table")
savepreviewhtml(hcc)









    



It can be easier to see the set of cells coloured within the table






    









item 0


beatles
Date 2014.0 

Cars Planes Trains
John 2.0 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0

Note: As you work through this tutorial, do please feel free to temporarily insert new Jupyter-Cells in order to give yourself a place to experiment with any of the functions that are available. (Remember, the value of the last line in a Jupyter-Cell is always printed out -- in addition to any earlier print-statements.)



In [10]:

    
"All the cells that have an 'o' in them:", tab.regex(".*?o")









    Out[10]:





("All the cells that have an 'o' in them:",
 {<A6 'Ringo'>, <A4 'John'>, <A7 'George'>})

Observations and dimensions

Let's get on with some actual work. In our terminology, an "Observation" is a numerical measure (eg anything in the 3x4 array of numbers in the example table), and a "Dimension" is one of the headings.

Both are made up of a bag of cells, however a Dimension also needs to know how to "look up" from the Observation to its dimensional value.



In [11]:

    
# We get the array of observations by selecting its corner and expanding down and to the right
obs = tab.excel_ref('B4').expand(DOWN).expand(RIGHT)
savepreviewhtml(obs)









    









item 0


beatles
Date 2014.0 

Cars Planes Trains
John 2.0 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0



In [12]:

    
# the two main headings are in a row and a column
r1 = tab.excel_ref('B3').expand(RIGHT)
r2 = tab.excel_ref('A3').fill(DOWN)

# here we pass in a list containing two cell bags and get two colours
savepreviewhtml([r1, r2])









    









item 0 item 1


beatles
Date 2014.0 

Cars Planes Trains
John 2.0 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0



In [13]:

    
# HDim is made from a bag of cells, a name, and an instruction on how to look it up 
# from an observation cell.  
h1 = HDim(r1, "Vehicles", DIRECTLY, ABOVE)

# Here is an example cell
cc = tab.excel_ref('C5')

# You can preview a dimension as well as just a cell bag
savepreviewhtml([h1, cc])









    









Vehicles item 1


beatles
Date 2014.0 

Cars Planes Trains
John 2.0 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0



In [14]:

    
# !!! This is the important look-up stage from a cell into a dimension
print("Cell", cc, "matches", h1.cellvalobs(cc), "in dimension", h1.label)









    



Cell {<C5 3.0>} matches (<C3 'Planes'>, 'Planes') in dimension Vehicles



In [15]:

    
# You can start to see through to the final result of all this work when you 
# print out the lookup values for every observation in the table at once.  
for ob in obs:
    print("Obs", ob, "maps to", h1.cellvalobs(ob))









    



Obs {<B4 2.0>} maps to (<B3 'Cars'>, 'Cars')
Obs {<C4 2.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D4 1.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B5 4.0>} maps to (<B3 'Cars'>, 'Cars')
Obs {<C5 3.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D5 2.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B6 4.0>} maps to (<B3 'Cars'>, 'Cars')
Obs {<C6 1.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D6 3.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B7 2.0>} maps to (<B3 'Cars'>, 'Cars')
Obs {<C7 5.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D7 5.0>} maps to (<D3 'Trains'>, 'Trains')

Note the value of h1.cellvalobs(ob) is actually a pair composed of the heading cell and its value. This is is because we can over-ride its output value without actually rewriting the original table, as we shall see.



In [16]:

    
# You can change an output value like this:
h1.AddCellValueOverride("Cars", "Horses")

for ob in obs:
    print("Obs", ob, "maps to", h1.cellvalobs(ob))









    



Obs {<B4 2.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C4 2.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D4 1.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B5 4.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C5 3.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D5 2.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B6 4.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C6 1.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D6 3.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B7 2.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C7 5.0>} maps to (<C3 'Planes'>, 'Planes')
Obs {<D7 5.0>} maps to (<D3 'Trains'>, 'Trains')



In [17]:

    
# Alternatively, you can override by the reference to a single cell to a value 
# (This will work even if the cell C3 is empty, which helps with filling in blank headings)
h1.AddCellValueOverride(tab.excel_ref('C3'), "Submarines")
for ob in obs:
    print("Obs", ob, "maps to", h1.cellvalobs(ob))









    



Obs {<B4 2.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C4 2.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D4 1.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B5 4.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C5 3.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D5 2.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B6 4.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C6 1.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D6 3.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B7 2.0>} maps to (<B3 'Cars'>, 'Horses')
Obs {<C7 5.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D7 5.0>} maps to (<D3 'Trains'>, 'Trains')



In [18]:

    
# You can override the header value for an individual observation element.  
b4cell = tab.excel_ref('B4')
h1.AddCellValueOverride(b4cell, "Clouds")
for ob in obs:
    print("Obs", ob, "maps to", h1.cellvalobs(ob))









    



Obs {<B4 2.0>} maps to (None, 'Clouds')
Obs {<C4 2.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D4 1.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B5 4.0>} maps to (<B4 2.0>, 'Clouds')
Obs {<C5 3.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D5 2.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B6 4.0>} maps to (<B4 2.0>, 'Clouds')
Obs {<C6 1.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D6 3.0>} maps to (<D3 'Trains'>, 'Trains')
Obs {<B7 2.0>} maps to (<B4 2.0>, 'Clouds')
Obs {<C7 5.0>} maps to (<C3 'Planes'>, 'Submarines')
Obs {<D7 5.0>} maps to (<D3 'Trains'>, 'Trains')



In [19]:

    
# The preview table shows how things have changed
savepreviewhtml([h1, obs])









    









Vehicles item 1


beatles
Date 2014.0 

CarsHorses PlanesSubmarines Trains
John 2.0Clouds 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0



In [20]:

    
wob = tab.excel_ref('A1')
print("Wrong-Obs", wob, "maps to", h1.cellvalobs(wob), " <--- ie Nothing")









    



Wrong-Obs {<A1 'Date'>} maps to (None, None)  <--- ie Nothing



In [21]:

    
h1.AddCellValueOverride(None, "Who knows?")
print("After giving a default value Wrong-Obs", wob, "now maps to", h1.cellvalobs(wob))









    



After giving a default value Wrong-Obs {<A1 'Date'>} now maps to (None, 'Who knows?')



In [22]:

    
# The default even works if the cell bag set is empty.  In which case we have a special 
# constant case that maps every observation to the same value
h3 = HDimConst("Category", "Beatles")
for ob in obs:
    print("Obs", ob, "maps to", h3.cellvalobs(ob))









    



Obs {<B4 2.0>} maps to (None, 'Beatles')
Obs {<C4 2.0>} maps to (None, 'Beatles')
Obs {<D4 1.0>} maps to (None, 'Beatles')
Obs {<B5 4.0>} maps to (None, 'Beatles')
Obs {<C5 3.0>} maps to (None, 'Beatles')
Obs {<D5 2.0>} maps to (None, 'Beatles')
Obs {<B6 4.0>} maps to (None, 'Beatles')
Obs {<C6 1.0>} maps to (None, 'Beatles')
Obs {<D6 3.0>} maps to (None, 'Beatles')
Obs {<B7 2.0>} maps to (None, 'Beatles')
Obs {<C7 5.0>} maps to (None, 'Beatles')
Obs {<D7 5.0>} maps to (None, 'Beatles')

Conversion segments and output

A ConversionSegment is a collection of Dimensions with an Observation set that is going to be processed and output as a table all at once.

You can preview them in HTML (just like the cell bags and dimensions), only this time the observation cells can be clicked on interactively to show how they look up.



In [46]:

    
dimensions = [ 
    HDim(tab.excel_ref('B1'), TIME, CLOSEST, ABOVE), 
    HDim(r1, "Vehicles", DIRECTLY, ABOVE), 
    HDim(r2, "Name", DIRECTLY, LEFT), 
    HDimConst("Category", "Beatles")
]

c1 = ConversionSegment(obs, dimensions, processTIMEUNIT=False)
savepreviewhtml(c1)









    









OBS TIME Vehicles Name


beatles
Date 2014.0 

Cars Planes Trains
John 2.0 2.0 1.0
Paul 4.0 3.0 2.0
Ringo 4.0 1.0 3.0
George 2.0 5.0 5.0



In [55]:

    
# If the table is too big, we can preview it in another file is openable in another browser window.
# (It's very useful if you are using two computer screens.)
savepreviewhtml(c1, "preview.html", verbose=False)









    




Written to file: preview.html



In [47]:

    
print("Looking up all the observations against all the dimensions and print them out")
for ob in c1.segment:
    print(c1.lookupobs(ob))









    



Looking up all the observations against all the dimensions and print them out
{'Vehicles': 'Cars', 'Category': 'Beatles', 'OBS': 2.0, 'TIME': '2014.0', 'Name': 'John'}
{'Vehicles': 'Planes', 'Category': 'Beatles', 'OBS': 2.0, 'TIME': '2014.0', 'Name': 'John'}
{'Vehicles': 'Trains', 'Category': 'Beatles', 'OBS': 1.0, 'TIME': '2014.0', 'Name': 'John'}
{'Vehicles': 'Cars', 'Category': 'Beatles', 'OBS': 4.0, 'TIME': '2014.0', 'Name': 'Paul'}
{'Vehicles': 'Planes', 'Category': 'Beatles', 'OBS': 3.0, 'TIME': '2014.0', 'Name': 'Paul'}
{'Vehicles': 'Trains', 'Category': 'Beatles', 'OBS': 2.0, 'TIME': '2014.0', 'Name': 'Paul'}
{'Vehicles': 'Cars', 'Category': 'Beatles', 'OBS': 4.0, 'TIME': '2014.0', 'Name': 'Ringo'}
{'Vehicles': 'Planes', 'Category': 'Beatles', 'OBS': 1.0, 'TIME': '2014.0', 'Name': 'Ringo'}
{'Vehicles': 'Trains', 'Category': 'Beatles', 'OBS': 3.0, 'TIME': '2014.0', 'Name': 'Ringo'}
{'Vehicles': 'Cars', 'Category': 'Beatles', 'OBS': 2.0, 'TIME': '2014.0', 'Name': 'George'}
{'Vehicles': 'Planes', 'Category': 'Beatles', 'OBS': 5.0, 'TIME': '2014.0', 'Name': 'George'}
{'Vehicles': 'Trains', 'Category': 'Beatles', 'OBS': 5.0, 'TIME': '2014.0', 'Name': 'George'}



In [48]:

    
df = c1.topandas()
df









    









    Out[48]:






  
    
      
      OBS
      TIME
      Vehicles
      Name
      Category
    
  
  
    
      0
      2.0
      2014.0
      Cars
      John
      Beatles
    
    
      1
      2.0
      2014.0
      Planes
      John
      Beatles
    
    
      2
      1.0
      2014.0
      Trains
      John
      Beatles
    
    
      3
      4.0
      2014.0
      Cars
      Paul
      Beatles
    
    
      4
      3.0
      2014.0
      Planes
      Paul
      Beatles
    
    
      5
      2.0
      2014.0
      Trains
      Paul
      Beatles
    
    
      6
      4.0
      2014.0
      Cars
      Ringo
      Beatles
    
    
      7
      1.0
      2014.0
      Planes
      Ringo
      Beatles
    
    
      8
      3.0
      2014.0
      Trains
      Ringo
      Beatles
    
    
      9
      2.0
      2014.0
      Cars
      George
      Beatles
    
    
      10
      5.0
      2014.0
      Planes
      George
      Beatles
    
    
      11
      5.0
      2014.0
      Trains
      George
      Beatles

WDA Technical CSV

The ONS uses their own data system for publishing their time-series data known as WDA.

If you need to output to it, then this next section is for you.

The function which outputs to the WDA format is writetechnicalCSV(filename, [conversionsegments]) The format is very verbose because it repeats each dimension name and its value twice in each row, and every row begins with the following list of column entries, whether or not they exist.

observation, data_marking, statistical_unit_eng, statistical_unit_cym, measure_type_eng, measure_type_cym, observation_type, obs_type_value, unit_multiplier, unit_of_measure_eng, unit_of_measure_cym, confidentuality, geographic_area

The writetechnicalCSV() function accepts a single conversion segment, a list of conversion segments, or equivalently a pandas dataframe.



In [35]:

    
print(writetechnicalCSV(None, c1))









    



observation,data_marking,statistical_unit_eng,statistical_unit_cym,measure_type_eng,measure_type_cym,observation_type,empty,obs_type_value,unit_multiplier,unit_of_measure_eng,unit_of_measure_cym,confidentuality,empty1,geographic_area,empty2,empty3,time_dim_item_id,time_dim_item_label_eng,time_dim_item_label_cym,time_type,empty4,statistical_population_id,statistical_population_label_eng,statistical_population_label_cym,cdid,cdiddescrip,empty5,empty6,empty7,empty8,empty9,empty10,empty11,empty12,dim_id_1,dimension_label_eng_1,dimension_label_cym_1,dim_item_id_1,dimension_item_label_eng_1,dimension_item_label_cym_1,is_total_1,is_sub_total_1,dim_id_2,dimension_label_eng_2,dimension_label_cym_2,dim_item_id_2,dimension_item_label_eng_2,dimension_item_label_cym_2,is_total_2,is_sub_total_2,dim_id_3,dimension_label_eng_3,dimension_label_cym_3,dim_item_id_3,dimension_item_label_eng_3,dimension_item_label_cym_3,is_total_3,is_sub_total_3
2.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Cars,Cars,,,,Name,Name,,John,John,,,,Category,Category,,Beatles,Beatles,,,
2.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Planes,Planes,,,,Name,Name,,John,John,,,,Category,Category,,Beatles,Beatles,,,
1.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Trains,Trains,,,,Name,Name,,John,John,,,,Category,Category,,Beatles,Beatles,,,
4.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Cars,Cars,,,,Name,Name,,Paul,Paul,,,,Category,Category,,Beatles,Beatles,,,
3.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Planes,Planes,,,,Name,Name,,Paul,Paul,,,,Category,Category,,Beatles,Beatles,,,
2.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Trains,Trains,,,,Name,Name,,Paul,Paul,,,,Category,Category,,Beatles,Beatles,,,
4.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Cars,Cars,,,,Name,Name,,Ringo,Ringo,,,,Category,Category,,Beatles,Beatles,,,
1.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Planes,Planes,,,,Name,Name,,Ringo,Ringo,,,,Category,Category,,Beatles,Beatles,,,
3.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Trains,Trains,,,,Name,Name,,Ringo,Ringo,,,,Category,Category,,Beatles,Beatles,,,
2.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Cars,Cars,,,,Name,Name,,George,George,,,,Category,Category,,Beatles,Beatles,,,
5.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Planes,Planes,,,,Name,Name,,George,George,,,,Category,Category,,Beatles,Beatles,,,
5.0,,,,,,,,,,,,,,,,,2014.0,2014.0,,,,,,,,,,,,,,,0,,Vehicles,Vehicles,,Trains,Trains,,,,Name,Name,,George,George,,,,Category,Category,,Beatles,Beatles,,,
*********,12



In [40]:

    
# This is how to write to a file
writetechnicalCSV("exampleWDA.csv", c1)

# We can read this file back in to a list of pandas dataframes
dfs = readtechnicalCSV("exampleWDA.csv")
print(dfs[0])









    



writing 1 conversion segments into /home/goatchurch/sensiblecode/src/databaker/databaker/tutorial/exampleWDA.csv
conversionwrite segment size 12 table 'beatles'; 
    OBS    TIME Vehicles    Name Category
0   2.0  2014.0     Cars    John  Beatles
1   2.0  2014.0   Planes    John  Beatles
2   1.0  2014.0   Trains    John  Beatles
3   4.0  2014.0     Cars    Paul  Beatles
4   3.0  2014.0   Planes    Paul  Beatles
5   2.0  2014.0   Trains    Paul  Beatles
6   4.0  2014.0     Cars   Ringo  Beatles
7   1.0  2014.0   Planes   Ringo  Beatles
8   3.0  2014.0   Trains   Ringo  Beatles
9   2.0  2014.0     Cars  George  Beatles
10  5.0  2014.0   Planes  George  Beatles
11  5.0  2014.0   Trains  George  Beatles

Note If you were wondering what the processTIMEUNIT=False was all about in the ConversionSegment constructor, it's a feature to help the WDA output automatically set the TIMEUNIT column according to whether it should be Year, Month, or Quarter.

You will note that the TIME column above is 2014.0 when it really should be 2014 with the TIMEUNIT set to Year.

By setting it to True the ConversionSegment object will identify the timeunit from the value of the TIME column and then force its format to conform.



In [51]:

    
# See that the `2014` no longer ends with `.0`
c1 = ConversionSegment(obs, dimensions, processTIMEUNIT=True)
c1.topandas()









    



TIMEUNIT='Year'






    Out[51]:






  
    
      
      OBS
      TIME
      TIMEUNIT
      Vehicles
      Name
      Category
    
  
  
    
      0
      2.0
      2014
      Year
      Cars
      John
      Beatles
    
    
      1
      2.0
      2014
      Year
      Planes
      John
      Beatles
    
    
      2
      1.0
      2014
      Year
      Trains
      John
      Beatles
    
    
      3
      4.0
      2014
      Year
      Cars
      Paul
      Beatles
    
    
      4
      3.0
      2014
      Year
      Planes
      Paul
      Beatles
    
    
      5
      2.0
      2014
      Year
      Trains
      Paul
      Beatles
    
    
      6
      4.0
      2014
      Year
      Cars
      Ringo
      Beatles
    
    
      7
      1.0
      2014
      Year
      Planes
      Ringo
      Beatles
    
    
      8
      3.0
      2014
      Year
      Trains
      Ringo
      Beatles
    
    
      9
      2.0
      2014
      Year
      Cars
      George
      Beatles
    
    
      10
      5.0
      2014
      Year
      Planes
      George
      Beatles
    
    
      11
      5.0
      2014
      Year
      Trains
      George
      Beatles

Note Sometimes the TIME value needs to be created by joining two or more other cells (eg one is a month, and the other is the year).

Such an operation can much more easily be done using the pandas column operations than by using the concept of subdimensions which used to exist in Databaker before we took it out.

This will be explained in a later worked example.



In [ ]:

Date	2014.0

	Cars	Planes	Trains
John	2.0	2.0	1.0
Paul	4.0	3.0	2.0
Ringo	4.0	1.0	3.0
George	2.0	5.0	5.0

Date	2014.0

	~~Cars~~Horses	~~Planes~~Submarines	Trains
John	~~2.0~~Clouds	2.0	1.0
Paul	4.0	3.0	2.0
Ringo	4.0	1.0	3.0
George	2.0	5.0	5.0

	OBS	TIME	TIMEUNIT	Vehicles	Name	Category
0	2.0	2014	Year	Cars	John	Beatles
1	2.0	2014	Year	Planes	John	Beatles
2	1.0	2014	Year	Trains	John	Beatles
3	4.0	2014	Year	Cars	Paul	Beatles
4	3.0	2014	Year	Planes	Paul	Beatles
5	2.0	2014	Year	Trains	Paul	Beatles
6	4.0	2014	Year	Cars	Ringo	Beatles
7	1.0	2014	Year	Planes	Ringo	Beatles
8	3.0	2014	Year	Trains	Ringo	Beatles
9	2.0	2014	Year	Cars	George	Beatles
10	5.0	2014	Year	Planes	George	Beatles
11	5.0	2014	Year	Trains	George	Beatles