TODO:

General Objectives:

  • PySpark + Spark SQL + Hadoop DB running on a cluster (and on Amazon EC2)
  • Patchify and filter on PySpark
  • Image fusion
  • Image registration
  • EM patch-based CNN
  • Spatial Statistics

Specific Objectives:

  • Move image_info nested dictionaries to an ImageInfo transaction object (see the ImageInfo sketch after this list)
  • Clean ImageController code
  • Write a distributed "patchify" - consider using Bolt ChunkedArrays (a plain-PySpark sketch follows this list)
  • Prepare to convert Provider into a database query helper
  • Database considerations: we'll need a DB helper and a DB contract, as well as an ImageFetcher and an ImageParser (sketched after this list)
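
One way the image_info nested dictionaries could collapse into a transaction object is a small immutable dataclass. The field names below are guesses at typical image metadata, not the actual image_info keys:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ImageInfo:
        image_id: str        # hypothetical identifier key
        path: str            # where the raw image lives
        shape: tuple         # (height, width), or (z, y, x) for volumes
        dtype: str           # e.g. "uint8", "uint16"
        modality: str = ""   # e.g. "EM"; optional metadata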
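
A rough sketch of the distributed patchify-and-filter step in plain PySpark. load_image, the sample paths, the 64x64 patch size, and the blank-patch filter are all placeholders; the manual slicing below is what a Bolt ChunkedArray would take over:

    import numpy as np
    from PIL import Image
    from pyspark import SparkContext

    def load_image(path):
        # Placeholder loader; swap in tifffile or a DB read as appropriate.
        return np.asarray(Image.open(path))

    def patchify(image, patch_size):
        """Yield ((row, col), patch) for non-overlapping patches of one image."""
        ph, pw = patch_size
        h, w = image.shape[:2]
        for i in range(0, h - ph + 1, ph):
            for j in range(0, w - pw + 1, pw):
                yield (i, j), image[i:i + ph, j:j + pw]

    sc = SparkContext(appName="patchify-sketch")
    paths = ["img_000.tif", "img_001.tif"]  # placeholder inputs

    patches = (sc.parallelize(paths)
                 .map(lambda p: (p, load_image(p)))
                 .flatMap(lambda kv: [((kv[0], i, j), patch)
                                      for (i, j), patch in patchify(kv[1], (64, 64))])
                 .filter(lambda kv: kv[1].std() > 0))  # drop blank patches

Keying each patch by (path, row, col) keeps enough provenance to reassemble images later or join patches back against their metadata.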
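
And a sketch of how the DB contract, ImageFetcher, and ImageParser might fit together once Provider becomes a query helper. Every class and method name here is an assumption; none of this exists yet:

    from abc import ABC, abstractmethod
    import numpy as np

    class ImageDBContract(ABC):
        """Contract any DB helper must satisfy (method names are guesses)."""

        @abstractmethod
        def fetch_raw(self, image_id):
            """Return the raw bytes stored for image_id."""

        @abstractmethod
        def fetch_info(self, image_id):
            """Return the ImageInfo record for image_id."""

    class ImageFetcher:
        """Pulls raw image bytes through the contract, hiding the backend."""
        def __init__(self, db):
            self.db = db

        def fetch(self, image_id):
            return self.db.fetch_raw(image_id)

    class ImageParser:
        """Turns raw bytes into an array the pipeline can patchify."""
        def parse(self, raw, info):
            # dtype and shape come from the ImageInfo record, not guessed here
            return np.frombuffer(raw, dtype=info.dtype).reshape(info.shape)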

Questions (particularly for the fall):

In terms of distributed computing, PySpark seems like a very good way to go, which raises a few questions:

  • I have found a Medical Image Registration Toolbox that works extremely well with some of the data I'll be working with. However, it is a MATLAB toolbox. Will it play nicely with the PySpark framework, or will it have to be rewritten? I was planning to work on this in the fall to handle the registration part of the pipeline
  • MATLAB just introduced its own distributed/cloud computing options... should I try those instead?
  • What about doing registration on the GPU instead of a cluster?