This notebook tests the following Python scripts:
In [1]:
!ls *.py
About These Files
These scripts are built on reusable objects. The object code currently lives inside each script, but it can be deployed on your systems as shared code so that updates and enhancements flow into the final scripts you build. If you read the code, you will notice that abstract parent objects have child objects that do the actual work. These objects contain a seemingly useless function with an empty body. It is there as a placeholder.
As written, the code splits or fuses files. The placeholder functions can be overridden in the child objects so that you can operate on the inputs before the outputs are created. The intent is inherently extensible code: the coder focuses on what needs to be done to the files and lets the underlying objects handle splitting / fusing as designed.
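The parent / child / placeholder arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the code in the scripts: the class names `FileProcessor`, `Fragmenter`, and `UppercasingFragmenter`, and the method names `process`, `transform_row`, and `write`, are all hypothetical.

```python
from abc import ABC, abstractmethod


class FileProcessor(ABC):
    """Hypothetical abstract parent that owns the overall flow."""

    def process(self, rows):
        # Run every row through the placeholder hook, then let the
        # child object decide how the rows become output.
        rows = [self.transform_row(row) for row in rows]
        return self.write(rows)

    def transform_row(self, row):
        # Placeholder: returns the row unchanged. Override it in a
        # child object to alter the data before output is created.
        return row

    @abstractmethod
    def write(self, rows):
        """Child objects implement the actual splitting or fusing."""


class Fragmenter(FileProcessor):
    """Hypothetical child that splits rows into fixed-size chunks."""

    def __init__(self, num_rows=10_000):
        self.num_rows = num_rows

    def write(self, rows):
        step = self.num_rows
        return [rows[i:i + step] for i in range(0, len(rows), step)]


class UppercasingFragmenter(Fragmenter):
    """Overrides only the placeholder; splitting is inherited as-is."""

    def transform_row(self, row):
        return [cell.upper() for cell in row]
```

In this sketch, `UppercasingFragmenter(num_rows=2).process(rows)` uppercases every cell and still splits the rows into chunks of two, without touching the splitting logic itself.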
In [2]:
!ls Test*.csv
Code Testing
In the cells that follow:
Getting Help:
In [3]:
!python file_fragmenter.py -h
In [4]:
!python file_fuser.py --help
Tests Using A Simple Index In The Input
Note: the numRows (-rw) argument defaults to 10,000 rows when it is left off. In these tests it is set to odd values simply to illustrate how the code works.
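For readers curious how flags like these could be declared, here is a hedged sketch using `argparse`. The short flags and the 10,000 default come from the commands and note in this notebook; everything else is an assumption, including the long names `--input_index` and `--output_index` on the fragmenter, the `False` defaults, and the helper `str_to_bool`. The real scripts may parse their arguments differently.

```python
import argparse


def str_to_bool(value):
    """Convert strings such as 'True' or 'False' into real booleans."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the fragmenter flags used in the tests."""
    parser = argparse.ArgumentParser(description="Split a CSV into fragments (sketch).")
    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-o", "--outputStart", required=True)
    # Left off, numRows falls back to 10,000 rows per fragment.
    parser.add_argument("-rw", "--numRows", type=int, default=10_000)
    # Assumed defaults: no index column unless the caller says otherwise.
    parser.add_argument("-idx", "--input_index", type=str_to_bool, default=False)
    parser.add_argument("-odx", "--output_index", type=str_to_bool, default=False)
    return parser
```

A plain `type=bool` would treat any non-empty string, including `"False"`, as true, which is why the sketch uses an explicit converter for the `True` / `False` values passed on the command line.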
In [6]:
!python file_fragmenter.py -i TestFileSmall-wIndx.csv -o TestFSwi_i -rw 13 -idx True -odx True
In [7]:
!python file_fuser.py -i TestFSwi_i -o TestFSwi_i_Out.csv -idx True -odx True
In [8]:
!python file_fuser.py --inputStart TestFSwi_i --output TestFSwi_noI_Out.csv --input_index True
In the above tests:
Test Using No Index in The Original Input File
This next batch of tests starts from an input CSV that does not have an index.
In [9]:
!python file_fragmenter.py -i TestFileSmall-NoIndx.csv -o TestFSnoX_ --numRows 27
In [10]:
!python file_fuser.py -i TestFSnoX_ -o TstFSnoX_Out.csv
In [11]:
!python file_fuser.py -i TestFSnoX_ -o TestFSnoX_Out_i.csv --output_index True
In the previous batch of tests, the original input file did not have an index. The file is split and then merged back together. Finally, the fragments are merged again, this time into a file with an index. The index is auto-generated and runs in order: 0, 1, 2, and so on.
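The idea of fusing fragments and generating a fresh index can be sketched with the standard library alone. This is an illustration under stated assumptions, not the logic in `file_fuser.py`: the function `fuse_with_index` is hypothetical, it works on in-memory strings rather than files, and it assumes every fragment repeats the same header row.

```python
import csv
import io


def fuse_with_index(fragments, index_name="index"):
    """Concatenate CSV fragment texts and prepend a fresh 0-based index.

    Assumes each fragment starts with the same header row.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    counter = 0
    header_written = False
    for text in fragments:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:
            continue  # skip empty fragments
        if not header_written:
            writer.writerow([index_name] + header)
            header_written = True
        for row in reader:
            # The counter keeps running across fragments, so the
            # index is continuous in the fused output.
            writer.writerow([counter] + row)
            counter += 1
    return out.getvalue()
```

Because the counter is never reset between fragments, the fused output carries one continuous index regardless of how many rows each fragment held.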
Testing Using Single Column File
This next section repeats all of the preceding tests using a CSV with only one column of data. These tests were performed for completeness. Early drafts of the code worked on a single column but not on multiple columns. Once the code was revised to work on multiple columns, these tests were run again to show that it still works on a single column. They also illustrate more permutations of the input parameter syntax.
In [13]:
!python file_fragmenter.py --input TestFile_SingleCol-NoIndx.csv --outputStart TestSCFL_noX_
In [14]:
!python file_fuser.py -i TestSCFL_noX_ -o TstSCFL_noX_Out.csv
In [15]:
!python file_fuser.py -i TestSCFL_noX_ -o TestSCFL_noX_Out.csv -odx True
In [16]:
!python file_fragmenter.py -i TestFile_SingleCol-wIndx.csv -o TestSCFL_idX_ -rw 11500 -idx True -odx True
In [17]:
!python file_fuser.py -i TestSCFL_idX_ -o TstSCFL_idX_Out.csv -idx True -odx True
In [18]:
!python file_fuser.py --inputStart TestSCFL_idX_ --output TestSCFL_idX_Out.csv --input_index True
Final test involving a large file. Without the --numRows (or -rw) argument, the code defaults to 10,000 rows per file.
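As a rough guide to how many fragments a given setting produces, the arithmetic can be sketched as below. The helper `fragment_count` is hypothetical, and it assumes the last fragment simply holds whatever rows remain; the row counts in the usage note are made-up figures, not the size of `TestFileLarge.csv`.

```python
import math


def fragment_count(total_rows, num_rows=10_000):
    """Number of fragment files needed, assuming the last holds the remainder."""
    if num_rows <= 0:
        raise ValueError("num_rows must be positive")
    return math.ceil(total_rows / num_rows)
```

Under that assumption, a hypothetical 25,000-row file would yield three fragments at the default, and 100 rows split with `-rw 13` would yield eight.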
In [19]:
!python file_fragmenter.py -i TestFileLarge.csv -o TestFileLrgFrag
In [21]:
!python file_fuser.py -i TestFileLrgFrag -o TestFileLarge_Out.csv
In [ ]:
# The End