TODO:

General Objectives:

  • PySpark + Spark SQL + Hadoop DB running on a cluster (and on Amazon EC2)
  • Patchify and filter on PySpark
  • Image fusion
  • Image registration
  • EM patch-based CNN
  • Spatial Statistics

Specific Objectives:

  • Move image_info nested dictionaries to an ImageInfo transaction object (see the ImageInfo sketch after this list)
  • Clean ImageController code
  • Write a distributed "patchify" - consider using Bolt ChunkedArrays (a plain-PySpark sketch follows this list)
  • Prepare to convert Provider into a database query helper
  • Database considerations: we'll need a DB helper and a DB contract, as well as an ImageFetcher and an ImageParser (sketched after this list)
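
One way the image_info nested dictionaries could collapse into a transaction object is a small immutable dataclass. The field names below are guesses at typical image metadata, not the actual image_info keys:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ImageInfo:
        image_id: str        # hypothetical identifier key
        path: str            # where the raw image lives
        shape: tuple         # (height, width), or (z, y, x) for volumes
        dtype: str           # e.g. "uint8", "uint16"
        modality: str = ""   # e.g. "EM"; optional metadata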
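
A rough sketch of the distributed patchify-and-filter step in plain PySpark. load_image, the sample paths, the 64x64 patch size, and the blank-patch filter are all placeholders; the manual slicing below is what a Bolt ChunkedArray would take over:

    import numpy as np
    from PIL import Image
    from pyspark import SparkContext

    def load_image(path):
        # Placeholder loader; swap in tifffile or a DB read as appropriate.
        return np.asarray(Image.open(path))

    def patchify(image, patch_size):
        """Yield ((row, col), patch) for non-overlapping patches of one image."""
        ph, pw = patch_size
        h, w = image.shape[:2]
        for i in range(0, h - ph + 1, ph):
            for j in range(0, w - pw + 1, pw):
                yield (i, j), image[i:i + ph, j:j + pw]

    sc = SparkContext(appName="patchify-sketch")
    paths = ["img_000.tif", "img_001.tif"]  # placeholder inputs

    patches = (sc.parallelize(paths)
                 .map(lambda p: (p, load_image(p)))
                 .flatMap(lambda kv: [((kv[0], i, j), patch)
                                      for (i, j), patch in patchify(kv[1], (64, 64))])
                 .filter(lambda kv: kv[1].std() > 0))  # drop blank patches

Keying each patch by (path, row, col) keeps enough provenance to reassemble images later or join patches back against their metadata.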
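
And a sketch of how the DB contract, ImageFetcher, and ImageParser might fit together once Provider becomes a query helper. Every class and method name here is an assumption; none of this exists yet:

    from abc import ABC, abstractmethod
    import numpy as np

    class ImageDBContract(ABC):
        """Contract any DB helper must satisfy (method names are guesses)."""

        @abstractmethod
        def fetch_raw(self, image_id):
            """Return the raw bytes stored for image_id."""

        @abstractmethod
        def fetch_info(self, image_id):
            """Return the ImageInfo record for image_id."""

    class ImageFetcher:
        """Pulls raw image bytes through the contract, hiding the backend."""
        def __init__(self, db):
            self.db = db

        def fetch(self, image_id):
            return self.db.fetch_raw(image_id)

    class ImageParser:
        """Turns raw bytes into an array the pipeline can patchify."""
        def parse(self, raw, info):
            # dtype and shape come from the ImageInfo record, not guessed here
            return np.frombuffer(raw, dtype=info.dtype).reshape(info.shape)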

Questions (particularly for the fall):

In terms of distributed computing, PySpark seems like a very good way to go, which raises a few questions:

  • I have found a Medical Image Registration Toolbox that works extremely well with some of the data I'll be working with. However, it is a MATLAB toolbox. Will it play nicely with the PySpark framework, or will it have to be rewritten? I was planning to work on this in the fall to handle the registration part of the pipeline
  • MATLAB just introduced its own distributed/cloud computing options... should I try those instead?
  • What about doing registration on the GPU instead of a cluster?