BONUS

This bonus will do a very deep dive into dictionaries and lists, and do so within the context of a real-world application.

CodeNeuro is a project run out of the HHMI Janelia Research Campus that designs algorithms to automatically identify neurons in time-lapse microscopy data. The competition is called "NeuroFinder": http://neurofinder.codeneuro.org/

The goal of the project is to take raw time-lapse microscope images and automatically segment out all of the neurons in each image.

As you can probably imagine, storing this information is tricky: each region is an arbitrary collection of pixel coordinates. To store this data, the competition uses a format we haven't covered before called JSON, or JavaScript Object Notation. It's an extremely flexible, plain-text storage format, structured in such a way that you can represent pretty much any information you want. As a bonus, its structure maps very naturally onto Python dictionaries and lists.

In the CodeNeuro competition, the JSON format is used by participating competitors to submit their predictions. Given the segmentation goal described above, the format of a JSON submission is as follows:

  • The first layer is a list, where each item in the list is a dictionary. Each item (again, a dictionary) corresponds to a single dataset.
  • Each dictionary contains two keys: dataset, whose value is the name of the dataset (a string), and regions, whose value is a list of all the regions found in that dataset.
  • A single item in the list of regions is a dictionary, with one key: coordinates.
  • The value for coordinates is, again, a list, where each element is an (x, y) pair specifying one pixel in the region.

That's a lot, for sure. Here's an example of a JSON structure representing two different datasets, where one dataset has only 1 region and the other dataset has 2 regions:

[
  {"dataset": "d1", "regions":
    [
      {"coordinates": [[1, 2], [3, 4], [4, 5]]}
    ]
  },

  {"dataset": "d2", "regions":
    [
      {"coordinates": [[2, 3], [4, 10]]},
      {"coordinates": [[20, 20], [20, 21], [22, 23]]}
    ]
  }
]

There are two datasets, d1 and d2, represented as the two elements of the outermost list. Each of those two dictionaries has two keys: dataset (the name of the dataset) and regions (the list of regions outlining neurons present in that dataset). The regions value is a list of dictionaries, and the length of that list is the number of distinct regions/neurons in the dataset: d1 above has only 1 region, while d2 has 2. Each region's coordinates value is a list of (x, y) integer pairs, each specifying one pixel in the image that belongs to that region.
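To make the structure concrete, here is a short sketch showing how the example above maps onto Python objects once parsed. json.loads parses a JSON string directly (json.load, which you'll use below, parses a file instead):

```python
import json

# The example submission from above, as a JSON string.
submission = '''
[
  {"dataset": "d1", "regions":
    [
      {"coordinates": [[1, 2], [3, 4], [4, 5]]}
    ]
  },
  {"dataset": "d2", "regions":
    [
      {"coordinates": [[2, 3], [4, 10]]},
      {"coordinates": [[20, 20], [20, 21], [22, 23]]}
    ]
  }
]
'''

datasets = json.loads(submission)    # outermost layer: a list of dictionaries
print(len(datasets))                 # 2 datasets
print(datasets[0]["dataset"])        # d1
print(len(datasets[1]["regions"]))   # d2 has 2 regions
print(datasets[1]["regions"][0]["coordinates"])  # [[2, 3], [4, 10]]
```

Once parsed, it's just nested lists and dictionaries, so everything you already know about indexing and iteration applies.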

WHEW. That's a lot. We'll try to start things off slowly.

Part A

Write a function which:

  • is named count_datasets
  • takes 1 argument: a string to a JSON file
  • returns 1 value: the number (integer) of datasets in the provided JSON file

The string provided to the function names a file on the hard disk that represents a submission for CodeNeuro. Your function should read that file, go through the JSON structure it contains, and return an integer count of the number of datasets present. It should also handle file-related exceptions gracefully: if an error is encountered, return -1. Otherwise, the return value should always be 0 or greater.

You can use the json Python library; otherwise, no other imports are allowed. Here is the Python JSON library documentation: https://docs.python.org/3/library/json.html Of particular note, for reading JSON files, is the json.load function.
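One possible sketch, not the only correct solution. Treating a malformed JSON file as a "file error" (and thus returning -1) is an assumption here, as is the specific pair of exceptions caught:

```python
import json

def count_datasets(filename):
    """Return the number of datasets in a CodeNeuro JSON submission file,
    or -1 if the file can't be read or parsed."""
    try:
        with open(filename, "r") as f:
            datasets = json.load(f)  # the outermost layer is a list
    except (OSError, json.JSONDecodeError):
        return -1
    return len(datasets)
```

Catching OSError covers missing files and permission problems without a bare except, so the -1 is returned as a value rather than an exception escaping to the caller.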


In [ ]:


In [ ]:
c = count_datasets("submission_partial.json")
assert c == 4

In [ ]:
c = count_datasets("submission_full.json")
assert c == 9

In [ ]:
try:
    c = count_datasets("submission_nonexistent.json")
except:
    assert False
else:
    assert c == -1

Part B

Write a function which:

  • is named get_dataset_by_index
  • takes 2 arguments: the name of the JSON file on the filesystem (same as in Part A), and the integer index of the data to return from that JSON file
  • returns 1 value: the JSON dataset specified by the integer index argument

This function should return the dictionary corresponding to the specified dataset in the JSON file, or None if an invalid index is supplied (e.g. an index of 10 when there are only 4 datasets, a negative number, or a non-integer type such as a float, string, or list). It should also handle file-related errors.

You can use the json Python library; otherwise, no other imports are allowed.
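A minimal sketch of one approach. The type check shown (including the explicit exclusion of booleans, which are a subclass of int in Python) is a design choice, not a requirement spelled out above:

```python
import json

def get_dataset_by_index(filename, index):
    """Return the dataset dictionary at position `index` in the submission,
    or None for any file error or invalid index."""
    # Reject non-integer indices up front; bool is a subclass of int,
    # so True/False are excluded explicitly.
    if not isinstance(index, int) or isinstance(index, bool):
        return None
    try:
        with open(filename, "r") as f:
            datasets = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    if index < 0 or index >= len(datasets):
        return None
    return datasets[index]
```

Note that the bounds check needs the file's contents, so it happens after loading; the type check can happen before.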


In [ ]:


In [ ]:
import json
d = json.loads(open("partial_1.json", "r").read())
assert d == get_dataset_by_index("submission_partial.json", 1)

In [ ]:
d = json.loads(open("full_8.json", "r").read())
assert d == get_dataset_by_index("submission_full.json", 8)

In [ ]:
try:
    c = get_dataset_by_index("submission_partial.json", 5)
except:
    assert False
else:
    assert c is None

In [ ]:
try:
    c = get_dataset_by_index("submission_nonexistent.json", 4983)
except:
    assert False
else:
    assert c is None

Part C

Write a function which:

  • is named get_dataset_by_name
  • takes 2 arguments: the name of the JSON file on the filesystem (same as in Part A and B), and a string indicating the name of the dataset to return
  • returns 1 value: the JSON dataset specified by the string name argument

This function is functionally identical to get_dataset_by_index, except that rather than retrieving a dataset by its integer index, you return a dataset by its string name.

This function should return the dictionary corresponding to the dataset in the JSON file, or None if an invalid name is supplied. It should also be able to handle file-related errors.

You can use the json Python library; otherwise, no other imports are allowed.
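One way this might look, following the same load-then-look-up pattern as Part B; the linear search over the dataset list is an implementation choice:

```python
import json

def get_dataset_by_name(filename, name):
    """Return the dataset dictionary whose "dataset" key equals `name`,
    or None for any file error or unknown name."""
    try:
        with open(filename, "r") as f:
            datasets = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    for dataset in datasets:
        if dataset.get("dataset") == name:
            return dataset
    return None  # no dataset with that name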


In [ ]:


In [ ]:
import json
d = json.loads(open("partial_1.json", "r").read())
assert d == get_dataset_by_name("submission_partial.json", "01.01.test")

In [ ]:
d = json.loads(open("full_8.json", "r").read())
assert d == get_dataset_by_name("submission_full.json", "04.01.test")

In [ ]:
try:
    c = get_dataset_by_name("submission_partial.json", "nonexistent")
except:
    assert False
else:
    assert c is None

In [ ]:
try:
    c = get_dataset_by_name("submission_nonexistent.json", "02.00.test")
except:
    assert False
else:
    assert c is None

Part D

Write a function which:

  • is named count_pixels_in_dataset
  • takes 2 arguments: the name of the JSON file on the filesystem (same as in Part A, B, and C), and the string name of the dataset to examine (same as Part C)
  • returns 1 number: the count of pixels found in all regions of the specified dataset

Each individual pixel is a single (x, y) pair of numbers, and each pair counts as 1.

If any file-related errors are encountered, or an incorrect dataset name is specified, the function should return -1.

You can use the json Python library, or other functions you've already written in this question; otherwise, no other imports are allowed.
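A sketch of one possibility. It repeats the load-and-search from Part C rather than calling get_dataset_by_name, purely to stay self-contained; reusing your Part C function is equally valid:

```python
import json

def count_pixels_in_dataset(filename, name):
    """Return the total number of (x, y) pixel pairs across all regions
    of the named dataset, or -1 for any file error or unknown name."""
    try:
        with open(filename, "r") as f:
            datasets = json.load(f)
    except (OSError, json.JSONDecodeError):
        return -1
    for dataset in datasets:
        if dataset.get("dataset") == name:
            # Each region contributes len(coordinates) pixels to the total.
            return sum(len(region["coordinates"])
                       for region in dataset["regions"])
    return -1  # dataset name not found
```

On the example JSON from the introduction, d1 would give 3 (one region with three pixels) and d2 would give 5 (2 + 3 across its two regions).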


In [ ]:


In [ ]:
assert 29476 == count_pixels_in_dataset("submission_full.json", "01.01.test")

In [ ]:
assert 30231 == count_pixels_in_dataset("submission_full.json", "04.01.test")

In [ ]:
try:
    c = count_pixels_in_dataset("submission_partial.json", "02.00.test")
except:
    assert False
else:
    assert c == -1