This section describes a tool or operation that the reader wants to use. The title above should describe what is happening, and this paragraph explains the situations in which the tool or process is useful, setting the stage for what we're about to do. If we expect that readers have completed previous tutorials, such as formatting or preprocessing their data, we should link to those tutorials here (or at the very least name them).
This section should describe what a user needs to have in front of them for the tutorial to work, including example data that is: a) publicly accessible, b) "small", and c) quick to process. It should explain the properties the data must have, and either display an image of the demo data inline or show the demo data in a code block, as below.
In [2]:
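from scipy import misc as scm  # note: scipy.misc.imread was removed in SciPy 1.2
import os.path as op
import matplotlib.pyplot as plt
%matplotlib inline
datadir = '/tmp/113_1/'  # directory holding the demo image stack
im = scm.imread(op.join(datadir, '0090.png'))  # load a single slice to display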
plt.imshow(im, cmap='gray')
plt.show()
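One way to confirm that demo data meets the "small" criterion is to count the files and add up their sizes before publishing the tutorial. The sketch below is only an illustration: the helper name `summarize_demo_data` and the 50 MB default threshold are assumptions made for this example, not part of any tool referenced here.

```python
import os


def summarize_demo_data(datadir, max_bytes=50 * 1024 * 1024):
    """Return (file_count, total_bytes, is_small) for the files in datadir.

    max_bytes is an arbitrary, assumed threshold for what counts as "small".
    """
    paths = [os.path.join(datadir, name) for name in sorted(os.listdir(datadir))]
    files = [p for p in paths if os.path.isfile(p)]  # skip subdirectories
    total_bytes = sum(os.path.getsize(p) for p in files)
    return len(files), total_bytes, total_bytes <= max_bytes
```

Calling `summarize_demo_data(datadir)` with the `datadir` defined above would report how many slices the demo stack contains and whether it fits under the chosen threshold.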
Here we should enumerate the goals of the initial pre-processing steps. Whether this means organizing or documenting the data, or computing some transformation, this step takes the fresh, "raw"-ish data that you provided and that the user is expected to have, and sets it up so that the real processing can happen in the third step.
As an example, in the case of ndstore, when creating datasets/projects/channels we need to learn the following features about our data prior to beginning:
An external link to documentation that explains things, like this one for the above example, is always helpful for users who want more than the superficial, functional picture you're currently providing.
Again, we should have a code block that does some analysis. The one below gets some of the items from the list above.
In [3]:
import os
import numpy as np
files = os.listdir(datadir) # get a list of all files in the dataset
print('X image size: %d' % im.shape[1])  # second dimension is X in our png
print('Y image size: %d' % im.shape[0])  # first dimension is Y in our png
print('Z image size: %d' % len(files))  # we get Z by counting the number of images in our directory
print('Time range: (0, 0)')  # default value if the data is not time series
dtype = im.dtype
print('Data type: %s' % dtype)
try:
    # start the min at the largest representable value and the max at the
    # smallest, so that the first slice always updates both
    im_min = np.iinfo(dtype).max
    im_max = np.iinfo(dtype).min
except ValueError:  # np.iinfo rejects non-integer dtypes; use float limits
    im_min = np.finfo(dtype).max
    im_max = np.finfo(dtype).min
for f in files:  # get range by checking each slice min and max
    temp_im = scm.imread(op.join(datadir, f))
    im_min = min(im_min, np.min(temp_im))  # update image stack min
    im_max = max(im_max, np.max(temp_im))  # update image stack max
print('Window range: (%f, %f)' % (im_min, im_max))
It's also important to summarize what we've done. Summarizing these results, along with those that require more intimate knowledge of the data, we come up with the following:
| property | value |
|---|---|
| dataset name | kki2009_demo |
| x size | 182 |
| y size | 218 |
| z size | 182 |
| time range | (0, 0) |
| data type | uint8 |
| window range | (0, 255) |
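A summary table like this can also be produced programmatically, so that it stays in sync with the values computed earlier. The following is a minimal sketch; `format_property_table` is a hypothetical helper written for this example, and the values are simply copied from the table above rather than recomputed.

```python
def format_property_table(properties):
    """Render a list of (name, value) pairs as a pipe-delimited table."""
    lines = ['| property | value |', '|---|---|']
    for name, value in properties:
        lines.append('| %s | %s |' % (name, value))
    return '\n'.join(lines)


# Values copied from the summary table above.
properties = [
    ('dataset name', 'kki2009_demo'),
    ('x size', 182),
    ('y size', 218),
    ('z size', 182),
    ('time range', (0, 0)),
    ('data type', 'uint8'),
    ('window range', (0, 255)),
]
print(format_property_table(properties))
```

Keeping the properties in an ordered list of pairs, rather than a dictionary, preserves the row order of the table.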
This is usually the real deal. You've set the stage, preprocessed as needed, and now are ready for the task. Here you should provide a detailed description of the next steps, link to documentation, and provide some way to validate that what you are getting the user to do worked as expected.
In [4]:
print("more code here, as always")
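One simple form of validation is to compare the values a user computed against the values the tutorial expects and report any disagreement. The sketch below assumes both sets of values are held in dictionaries; `find_mismatches` is a hypothetical helper, the expected values are taken from the summary table above, and the `observed` values are made up purely to show a failing check.

```python
def find_mismatches(expected, observed):
    """Return {key: (expected_value, observed_value)} for every disagreement."""
    mismatches = {}
    for key, want in expected.items():
        got = observed.get(key)  # None if the user never computed this key
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


# Expected values, taken from the summary table above.
expected = {
    'x size': 182,
    'y size': 218,
    'z size': 182,
    'data type': 'uint8',
    'window range': (0, 255),
}

# Hypothetical user results with one slice missing from the directory.
observed = dict(expected)
observed['z size'] = 181

print(find_mismatches(expected, observed))
```

An empty dictionary means every checked property matched; any entry names the property that differs along with both values, which gives the user a concrete starting point for debugging.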