Five use cases are considered:
In the simplest case, you can locate the features in every frame of a movie and output them to a variable.
In [4]:
import mr
In [10]:
v = mr.Video('/home/dallan/mr/mr/tests/water/bulk-water.mov')
In [12]:
f = mr.batch(v[:3], 11, 3000)
The result is a DataFrame, which can be saved in formats convenient for sharing, like Excel:
In [13]:
f.to_excel('features.xlsx')
or comma-separated values:
In [14]:
f.to_csv('features.csv')
These formats are slow to read and write. Unless you are sending the file to a non-programmer, it is better to save it as a binary file.
In [37]:
f.save('features.df') # df for DataFrame -- could be any name you want
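To load the binary file back, use pandas. (The loader's name depends on your pandas version: older releases provide pandas.load, which newer releases rename to pandas.read_pickle.)
import pandas as pd
f = pd.load('features.df')  # in newer pandas: pd.read_pickle('features.df')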
For large jobs, it is better to save the features one frame at a time as the job proceeds. If the job is interrupted, partial progress will be saved. And the job requires only enough memory to process one frame at a time -- it need not hold all the frames' data.
batch can do this in two different ways: using an HDF5 file (a fast binary format) or a SQL database.
For HDF5, we open an HDF5 file using pandas, and pass it to batch.
In [20]:
import pandas as pd
store = pd.HDFStore('data.h5')
f = mr.batch(v[:3], 11, 3000, store=store, table='bulk_water/features')
# table can take any unique name -- even slashes and spaces are OK
batch saves the data one frame at a time, discarding each frame's data before it begins the next one. In this way, memory is conserved and long videos can be processed. At the end, batch loads the data out of the HDF5 file and returns it in the variable f.
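Under the hood, the loop is conceptually like the following sketch. (This is only an illustration, not mr's actual implementation; it assumes a single-frame locator, mr.locate, taking the same parameters as batch.)
for i, frame in enumerate(v[:3]):
    features = mr.locate(frame, 11, 3000)  # find features in this one frame
    features['frame'] = i  # record which frame each feature came from
    store.append('bulk_water/features', features)  # write to disk and move on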
If you wish to run jobs simultaneously in several Python sessions, you might want to leave the data in the store and retrieve it later, in part or in full. Use do_not_return=True.
In [22]:
mr.batch(v[:3], 11, 3000, store=store, table='bulk_water/features', do_not_return=True)
# This returns nothing.
We can load it from the store later.
In [25]:
f = store['bulk_water/features']
f.head()
Out[25]:
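If you are retrieving the data from a separate Python session, as in the simultaneous-jobs scenario above, reopen the store first:
import pandas as pd
store = pd.HDFStore('data.h5')  # reopen the same file
f = store['bulk_water/features']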
If the stored data is too large to load all at once, we can fetch it in part:
In [43]:
f = store.select('bulk_water/features', pd.Term('frame < 3'))
f.head()
Out[43]:
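Or, to keep memory use low even for very long videos, loop through the data one frame at a time. (A sketch, assuming the 'frame' column used in the Term query above.)
for i in range(3):
    frame_data = store.select('bulk_water/features', pd.Term('frame == %d' % i))
    # process frame_data here, then let it be discarded before the next frame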
For SQL, open a database connection and pass it to batch, along with a sql_flavor. Here we use SQLite, which is built into Python:
In [33]:
import sqlite3
conn = sqlite3.connect('data.sql')
f = mr.batch(v[:3], 11, 3000, conn=conn, sql_flavor='sqlite', table='bulk_water_features')
A MySQL database is also supported; the mr.sql module provides a convenience function for making a connection.
In [32]:
f = mr.batch(v[:3], 11, 3000, conn=mr.sql.connect(), sql_flavor='mysql', table='bulk_water_features')
As with HDF5, you can conserve memory using do_not_return=True.
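For example, mirroring the HDF5 usage above:
mr.batch(v[:3], 11, 3000, conn=conn, sql_flavor='sqlite', table='bulk_water_features', do_not_return=True)
# Nothing is returned; the features stay in the database until you query them.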
Finally, it is sometimes convenient to examine early results while the full video is still being processed. This is not possible with an HDF5 file, which does not support concurrent reading and writing, but SQL makes it possible.
In [36]:
partial = pd.io.sql.read_frame('select * from bulk_water_features', conn)
partial.head()
Out[36]:
Here we have the full result because my short example job has already finished.
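If a long job were still running in another session, we could poll the database to watch its progress, for example by counting the rows written so far (a plain-SQL sketch using the sqlite3 connection above):
c = conn.cursor()
c.execute('select count(*) from bulk_water_features')
count = c.fetchone()[0]
print('%d features located so far' % count)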