In [9]:
from IPython.display import display, HTML
from part_four_meta import giffy_html
display(HTML(giffy_html))
Use code from flpd_helper to count how often each data type appears in each field, across all documents in a collection.
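The flpd_helper source isn't shown in this post, so the following is only a minimal sketch of the idea, assuming each collection can be read as an iterable of dicts (for example, what pymongo's `collection.find()` yields). The helper name `type_frequencies` and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def type_frequencies(documents):
    """Count how often each Python type name appears for each field.

    Hypothetical helper: `documents` is any iterable of dicts.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        for field, value in doc.items():
            # Tally the type of this field's value in this document
            counts[field][type(value).__name__] += 1
    return counts


# Hypothetical documents standing in for one collection
docs = [
    {'name': 'Smith', 'time': '1 30', 'age': 41},
    {'name': 'Jones', 'time': 130, 'age': '39'},
    {'name': 'Lee', 'time': '2 40', 'age': 27},
]

for field, counter in type_frequencies(docs).items():
    print(field, dict(counter))
# name {'str': 3}
# time {'str': 2, 'int': 1}
# age {'int': 2, 'str': 1}
```

Real MongoDB documents also carry an `_id` field, which would show up in the counts as `ObjectId`.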
In [10]:
# This cell is meta code whose purpose is to generate some data
# so that it can be displayed in an HTML table in this post
from flpd_helper.tools import data_types_differences, formatted_type_counts
from flpd_helper import validated_collection_names
from itertools import chain
FIRST_ITEM = 0
# data_types_differences can take over a minute to run
duplicates_map = data_types_differences()
In [11]:
formatted_type_counts_ = formatted_type_counts(duplicates_map)
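How `formatted_type_counts` shapes its rows isn't shown here. As a hedged sketch, assuming (from the name `data_types_differences`) that only fields holding more than one data type matter, per-field counts could be flattened into rows matching the table's three columns. The helper `mixed_type_rows` and the sample data are hypothetical.

```python
from collections import Counter


def mixed_type_rows(type_counts_by_collection):
    """Flatten {collection: {field: Counter}} into table rows.

    Hypothetical helper: keeps only fields seen with more than one
    data type and summarizes their counts, most common first.
    """
    rows = []
    for collection, fields in sorted(type_counts_by_collection.items()):
        for field, counter in sorted(fields.items()):
            if len(counter) > 1:
                summary = ', '.join(
                    f'{type_name}: {count}'
                    for type_name, count in counter.most_common()
                )
                rows.append([collection, field, summary])
    return rows


sample = {
    'people': {
        'name': Counter({'str': 3}),
        'time': Counter({'str': 2, 'int': 1}),
    },
}
for row in mixed_type_rows(sample):
    print(row)
# ['people', 'time', 'str: 2, int: 1']
```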
In [12]:
# This cell is meta code whose purpose is to render HTML in this notebook
# so that it displays on the Zippped Code blog.
# ipy_table is a 3rd party package
from ipy_table import make_table, set_column_style, render
import significance_for_differences
from IPython.lib import deepreload
deepreload.reload(significance_for_differences)
header = ['Collection Name', 'Field', 'Data Types and Counts']
# Add the header row only once, even if this cell is re-run
if header not in formatted_type_counts_:
    formatted_type_counts_.insert(0, header)
make_table(formatted_type_counts_)
set_column_style(0, width='100', bold=False,)
set_column_style(1, width='100', color='hsla(225, 80%, 94%, 1)')
set_column_style(2, width='100', bold=True,)
display(HTML(significance_for_differences.table_title))
render()
Out[12]:
In [13]:
display(HTML(significance_for_differences.significance_anchor))
I wrote a module that maps data types to mongoengine.Field classes. It also includes custom functions that clean up certain values so they can be cast to the appropriate data type.
For example, the cast_spaced_time function takes a value that is supposed to be a time, e.g. '1 30', and turns it into an integer: 130.
In [14]:
from flpd_helper.documents import cast_spaced_time
new_value = cast_spaced_time()('2 40')  # a closure that imitates a class
print(new_value)
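The body of cast_spaced_time lives in flpd_helper and isn't reproduced here. The following is a hypothetical sketch of how a closure like it might work: calling the outer function returns an inner function that strips the spaces and casts to int. The name `make_spaced_time_caster` is invented so it doesn't shadow the real import.

```python
def make_spaced_time_caster():
    """Hypothetical stand-in for a cast_spaced_time-style closure.

    The outer call builds and returns `cast`; calling `cast` turns a
    spaced time string such as '2 40' into the integer 240.
    """
    def cast(value):
        if isinstance(value, int):
            return value  # already an integer; leave it unchanged
        # Remove every space, then parse what is left as an integer
        return int(str(value).replace(' ', ''))
    return cast


caster = make_spaced_time_caster()
print(caster('2 40'))  # 240
print(caster('1 30'))  # 130
```

Returning a function from a function lets any setup happen once in the outer call while the caster itself is reused per value, which is the class-like behavior the comment above refers to.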
The result is a database named 'app' whose collections hold the data taken from the CSV files.
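The loading code isn't shown in this post. As a minimal sketch of the first step, each CSV row can become one dict-shaped document using only the standard library; the helper `csv_to_documents` and the sample CSV are hypothetical. Note that every value comes out as a string, which is why casting functions are needed.

```python
import csv
import io


def csv_to_documents(csv_text):
    """Turn CSV text into a list of dicts, one per data row.

    Hypothetical helper: the header row supplies the field names, and
    every value is left as a string.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


sample_csv = "name,time\nSmith,1 30\nLee,2 40\n"
documents = csv_to_documents(sample_csv)
print(documents)
# [{'name': 'Smith', 'time': '1 30'}, {'name': 'Lee', 'time': '2 40'}]
```

Assuming pymongo and a running MongoDB server, the documents could then be written with `MongoClient()['app']['people'].insert_many(documents)`, where the collection name `people` is also hypothetical.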