Q3

CodeNeuro is a project run out of HHMI's Janelia Research Campus that designs algorithms to automatically identify neurons in time-lapse microscopy data. The associated competition is called "NeuroFinder": http://neurofinder.codeneuro.org/

The goal of the project is to take raw time-lapse imaging data that look like this (image omitted)

and automatically segment out all the neurons in the image, like so (image omitted)

As you can probably imagine, storing this information is tricky and requires a fair amount of structure. Predictions are submitted to the website in JSON format, structured as follows:

  • The outermost layer is a list, where each item in the list is a dictionary. Each item (dictionary) corresponds to a single dataset.
  • Each of these dictionaries contains two keys: dataset, whose value is the name of the dataset (a string), and regions, whose value is a list of all the regions found in that dataset.
  • A single item in the list of regions is a dictionary with one key: coordinates.
  • The value for coordinates is, again, a list, where each element of the list is an (x, y) pair that specifies a pixel in the region.

That's a lot, for sure. Here's an example of a JSON structure representing two different datasets, where one dataset has only 1 region and the other has 2 regions:

[
  {"dataset": "d1", "regions":
    [
      {"coordinates": [[1, 2], [3, 4], [4, 5]]}
    ]
  },

  {"dataset": "d2", "regions":
    [
      {"coordinates": [[2, 3], [4, 10]]},
      {"coordinates": [[20, 20], [20, 21], [22, 23]]}
    ]
  }
]

You have two datasets, d1 and d2, represented as the two elements of the outermost list. Each of those dictionaries has two keys: dataset (the name of the dataset) and regions (the list of regions outlining neurons present in that dataset). The regions field is a list of dictionaries, and the length of that list is the number of distinct regions/neurons in the dataset: in d1 above there is only 1 neuron/region, but in d2 there are 2. Each region, in turn, holds a coordinates list of (x, y) integer pairs, each of which specifies a pixel in the image that is part of that region.
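To make the nesting concrete, here's a small sketch that parses the example submission above with the json library and walks each layer of the structure (the submission is held as an inline string purely for illustration; the real data live in files on disk):

```python
import json

# The example submission from above, as an inline string.
submission = '''
[
  {"dataset": "d1", "regions":
    [
      {"coordinates": [[1, 2], [3, 4], [4, 5]]}
    ]
  },
  {"dataset": "d2", "regions":
    [
      {"coordinates": [[2, 3], [4, 10]]},
      {"coordinates": [[20, 20], [20, 21], [22, 23]]}
    ]
  }
]
'''

data = json.loads(submission)  # outermost layer: a list of dataset dictionaries
print(data[0]["dataset"])                       # -> d1
print(len(data[0]["regions"]))                  # -> 1 (one region in d1)
print(len(data[1]["regions"]))                  # -> 2 (two regions in d2)
print(data[1]["regions"][0]["coordinates"][1])  # -> [4, 10] (a single pixel)
```

Note that each layer is just an ordinary Python list or dictionary once json.loads has done its work.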

WHEW. That's a lot. We'll try to start things off slowly.

A

Write a function, count_datasets, which returns the number of datasets in the provided JSON file.

The function will accept one argument: json_file, which is the name of a JSON file on the hard disk that represents a submission file for CodeNeuro.

Your function should return an integer: the number of datasets present in the JSON input file.

This function should read the file off the hard disk, count the number of datasets in it, and return that number. It should also handle file-related exceptions gracefully: if an error is encountered, return -1. Otherwise, the return value should always be 0 or greater.

You can use the json Python library; otherwise, no other imports are allowed.
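One possible sketch of the approach (not the only valid solution; the filenames in the demo at the bottom are made up for illustration):

```python
import json

def count_datasets(json_file):
    """Return the number of datasets in a submission file, or -1 on error."""
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing/unreadable file, or a file that isn't valid JSON.
        return -1
    # The top level of a submission is a list of dataset dictionaries.
    return len(data)

# Tiny demo on a made-up file (example_submission.json is hypothetical).
with open("example_submission.json", "w") as f:
    f.write('[{"dataset": "d1", "regions": []}, {"dataset": "d2", "regions": []}]')

print(count_datasets("example_submission.json"))  # -> 2
print(count_datasets("no_such_file.json"))        # -> -1
```

Catching OSError covers missing or unreadable files, while ValueError covers malformed JSON (json.JSONDecodeError is a subclass of ValueError).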


In [ ]:


In [ ]:
try:
    count_datasets
except:
    assert False
else:
    assert True

In [ ]:
c = count_datasets("submission_partial.json")
assert c == 4

In [ ]:
c = count_datasets("submission_full.json")
assert c == 9

In [ ]:
try:
    c = count_datasets("submission_nonexistent.json")
except:
    assert False
else:
    assert c == -1

B

Write a function, get_dataset_by_index, which returns a certain dataset from the file.

This function should take two arguments: the name of the JSON file on the filesystem, and the integer index of the dataset to return from that JSON file.

This function should return the dictionary corresponding to the dataset at that index in the JSON file, or None if an invalid index is supplied (e.g., index 10 when there are only 4 datasets, a negative number, or a non-integer type such as a float, string, or list). It should also handle file-related errors gracefully.

You can use the json Python library; otherwise, no other imports are allowed.
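A minimal sketch of one way to satisfy the spec, with the index validation up front (the demo filename at the bottom is hypothetical):

```python
import json

def get_dataset_by_index(json_file, index):
    """Return the dataset dict at position `index`, or None on any error."""
    # Reject non-integers (bool is an int subclass, so exclude it explicitly)
    # and negative indices, per the spec.
    if not isinstance(index, int) or isinstance(index, bool) or index < 0:
        return None
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        return None
    if index >= len(data):
        return None  # index out of range for this file
    return data[index]

# Demo on a made-up file (example_submission.json is hypothetical).
with open("example_submission.json", "w") as f:
    f.write('[{"dataset": "d1", "regions": []}, {"dataset": "d2", "regions": []}]')

print(get_dataset_by_index("example_submission.json", 1))    # -> {'dataset': 'd2', 'regions': []}
print(get_dataset_by_index("example_submission.json", 10))   # -> None
print(get_dataset_by_index("example_submission.json", "1"))  # -> None
```

Checking the index before touching the file avoids conflating "bad index" with "bad file", though both cases return None either way.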


In [ ]:


In [ ]:
try:
    get_dataset_by_index
except:
    assert False
else:
    assert True

In [ ]:
import json
d = json.loads(open("partial_1.json", "r").read())
assert d == get_dataset_by_index("submission_partial.json", 1)

d = json.loads(open("full_8.json", "r").read())
assert d == get_dataset_by_index("submission_full.json", 8)

In [ ]:
try:
    c = get_dataset_by_index("submission_partial.json", 5)
except:
    assert False
else:
    assert c is None

In [ ]:
try:
    c = get_dataset_by_index("submission_nonexistent.json", 4983)
except:
    assert False
else:
    assert c is None

C

Write a function, get_dataset_by_name, which is functionally identical to get_dataset_by_index, except that rather than retrieving a dataset by its integer index, it retrieves a dataset by its string name.

This function should take two arguments: the name of the JSON file on the filesystem, and the string name of the dataset to return from that JSON file.

This function should return the dictionary corresponding to the dataset in the JSON file, or None if an invalid name is supplied. It should also be able to handle file-related errors.

You can use the json Python library; otherwise, no other imports are allowed.
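A possible sketch: load the file as before, then scan the list for a dataset whose name matches (the demo filename and dataset names are made up):

```python
import json

def get_dataset_by_name(json_file, name):
    """Return the dataset dict whose "dataset" key equals `name`, else None."""
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        return None
    for dataset in data:
        if dataset.get("dataset") == name:
            return dataset
    return None  # no dataset with that name in this file

# Demo on a made-up file (example_submission.json is hypothetical).
with open("example_submission.json", "w") as f:
    f.write('[{"dataset": "d1", "regions": []}, {"dataset": "d2", "regions": []}]')

print(get_dataset_by_name("example_submission.json", "d2"))   # -> {'dataset': 'd2', 'regions': []}
print(get_dataset_by_name("example_submission.json", "zzz"))  # -> None
```

Using dataset.get("dataset") rather than dataset["dataset"] keeps the scan safe even if a malformed entry is missing the key.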


In [ ]:


In [ ]:
try:
    get_dataset_by_name
except:
    assert False
else:
    assert True

In [ ]:
import json
d = json.loads(open("partial_1.json", "r").read())
assert d == get_dataset_by_name("submission_partial.json", "01.01.test")

d = json.loads(open("full_8.json", "r").read())
assert d == get_dataset_by_name("submission_full.json", "04.01.test")

In [ ]:
try:
    c = get_dataset_by_name("submission_partial.json", "nonexistent")
except:
    assert False
else:
    assert c is None

In [ ]:
try:
    c = get_dataset_by_name("submission_nonexistent.json", "02.00.test")
except:
    assert False
else:
    assert c is None

D

Write a function, count_pixels_in_dataset, which returns the number of pixels found in all regions of a particular dataset.

This function should take two arguments:

  • the string name of the JSON file containing all the datasets on the filesystem
  • the string name of the dataset to search

This function should return one integer: the total number of pixels identified across all regions of that dataset. Each individual pixel is a single (x, y) pair of numbers (which counts as 1).

If any file-related errors are encountered, or an incorrect dataset name is specified, the function should return -1.

You can use the json Python library, or other functions you've already written in this question; otherwise, no other imports are allowed.
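Since every pixel is one entry in a region's coordinates list, one sketch is to find the named dataset and sum the lengths of those lists (the demo filename is made up; the region data come from the example JSON shown earlier):

```python
import json

def count_pixels_in_dataset(json_file, dataset_name):
    """Total (x, y) pairs across all regions of the named dataset, or -1."""
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        return -1
    for dataset in data:
        if dataset.get("dataset") == dataset_name:
            # Each coordinate pair counts as one pixel.
            return sum(len(region["coordinates"]) for region in dataset["regions"])
    return -1  # no dataset with that name

# Demo on a made-up file, using the example regions from earlier.
with open("example_submission.json", "w") as f:
    f.write('[{"dataset": "d1", "regions": [{"coordinates": [[1, 2], [3, 4], [4, 5]]}]},'
            ' {"dataset": "d2", "regions": [{"coordinates": [[2, 3], [4, 10]]},'
            ' {"coordinates": [[20, 20], [20, 21], [22, 23]]}]}]')

print(count_pixels_in_dataset("example_submission.json", "d1"))  # -> 3
print(count_pixels_in_dataset("example_submission.json", "d2"))  # -> 5
print(count_pixels_in_dataset("example_submission.json", "dX"))  # -> -1
```

Equivalently, you could build this on top of get_dataset_by_name from part C, since the problem allows reusing earlier functions.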


In [ ]:


In [ ]:
try:
    count_pixels_in_dataset
except:
    assert False
else:
    assert True

In [ ]:
assert 29476 == count_pixels_in_dataset("submission_full.json", "01.01.test")
assert 30231 == count_pixels_in_dataset("submission_full.json", "04.01.test")

In [ ]:
try:
    c = count_pixels_in_dataset("submission_partial.json", "02.00.test")
except:
    assert False
else:
    assert c == -1