This bonus takes a deep dive into dictionaries and lists, within the context of a real-world application.
CodeNeuro is a project run out of HHMI Janelia Farms that focuses on designing algorithms to automatically identify neurons in time-lapse microscope data. Its competition is called "NeuroFinder": http://neurofinder.codeneuro.org/
The goal of the project is to use data that look like this
and automatically segment out all the neurons in the image, like so
As you can probably imagine, storing this information is tricky, since every pixel of every neuron has to be recorded. To store this data, they use a data format we haven't covered before called JSON, or JavaScript Object Notation. It's an extremely flexible data storage format: it uses regular old text, but structures it in such a way that you can store pretty much any information you want. As a bonus, its structure maps very well to Python dictionaries and lists.
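To make the mapping between JSON text and dictionaries concrete, here is a minimal sketch using the standard json library. The record below is made up for illustration and is not taken from the NeuroFinder data.

```python
import json

# A small, hypothetical record: a dictionary holding a string and a list of lists.
record = {"name": "example", "pixels": [[1, 2], [3, 4]]}

text = json.dumps(record)      # dictionary -> JSON text (a plain string)
restored = json.loads(text)    # JSON text -> dictionary again

print(type(text).__name__)     # str
print(restored == record)      # True
```

Note that JSON has no tuple type, so pairs such as (x, y) are stored as two-element lists.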
In the CodeNeuro competition, the JSON format is used to submit predictions from participating competitors. Recalling the goal of the competition (re: the two images above), the format of a JSON submission is as follows:
The submission is a list of dictionaries, one dictionary per dataset.
Each dataset dictionary has two keys: dataset, which gives the name of the dataset as the value (a string), and regions, which contains a list of all the regions found in that dataset.
Each region in that list is a dictionary with a single key, coordinates.
The value of coordinates is, again, a list, where each element of the list is an (x, y) pair that specifies a pixel in the region.
That's a lot, for sure. Here's an example of a JSON structure representing two different datasets, where one dataset has only 1 region and the other dataset has 2 regions:
'[
{"dataset": "d1", "regions":
[
{"coordinates": [[1, 2], [3, 4], [4, 5]]}
]
},
{"dataset": "d2", "regions":
[
{"coordinates": [[2, 3], [4, 10]]},
{"coordinates": [[20, 20], [20, 21], [22, 23]]}
]
}
]'
You have two datasets, d1 and d2, represented as two elements in the outermost list. Each of those two dictionaries has two keys: dataset (the name of the dataset) and regions (the list of regions outlining neurons present in that dataset). The regions field is a list of dictionaries, and the length of the list is how many distinct regions/neurons there are in that dataset. For example, in d1 above, there is only 1 neuron/region, but in d2, there are 2 neurons/regions. Each region is a dictionary whose coordinates key holds a list of (x, y) integer pairs, each of which specifies a pixel in the image dataset that is part of the region.
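The structure described above can be explored directly in Python. The following is a minimal sketch that parses the two-dataset example with json.loads and walks through it; the variable names (submission, datasets, summary) are chosen here for illustration only.

```python
import json

# The example submission from above, as a JSON string.
submission = '''[
  {"dataset": "d1", "regions": [
    {"coordinates": [[1, 2], [3, 4], [4, 5]]}
  ]},
  {"dataset": "d2", "regions": [
    {"coordinates": [[2, 3], [4, 10]]},
    {"coordinates": [[20, 20], [20, 21], [22, 23]]}
  ]}
]'''

datasets = json.loads(submission)  # outermost structure: a list of dictionaries

summary = {}
for entry in datasets:
    regions = entry["regions"]  # a list of region dictionaries
    # Each region's "coordinates" value is a list of [x, y] pairs.
    pixel_counts = [len(region["coordinates"]) for region in regions]
    summary[entry["dataset"]] = pixel_counts
    print(entry["dataset"], "has", len(regions), "region(s) with pixel counts", pixel_counts)
```

Here summary ends up mapping each dataset name to a list with one pixel count per region, which mirrors the nesting of the JSON itself.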
WHEW. That's a lot. We'll try to start things off slowly.
Write a function which:
is named count_datasets
takes 1 argument: the name of a JSON file (a string)
returns 1 value: the number of datasets in that file (an integer)
The JSON file string provided to the function names the file on the hard disk that represents a submission for CodeNeuro. Your function should read this file off the hard disk, go through its JSON structure, and return an integer count of the number of datasets present in it. It should also be able to handle file exceptions gracefully; if an error is encountered, return -1 to represent this. Otherwise, the return value should always be 0 or greater.
You can use the json Python library; no other imports are allowed. Here is the Python JSON library documentation: https://docs.python.org/3/library/json.html Of particular note, for reading JSON files, is the json.load function.
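As a rough illustration of the pattern described above (open the file, parse it with json.load, and fall back to -1 on failure), here is one possible sketch. It is not the reference solution. It assumes that the top-level JSON value is the list of datasets, and that "file exceptions" covers both I/O errors and malformed JSON.

```python
import json

def count_datasets(json_file):
    """Return the number of datasets in the file, or -1 on a file-related error."""
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # OSError covers missing or unreadable files; json.JSONDecodeError
        # is a subclass of ValueError, so malformed JSON also lands here.
        return -1
    # Assumption: the outermost JSON value is a list with one entry per dataset.
    return len(data)
```

Catching specific exception types, rather than using a bare except, keeps unrelated bugs from being silently converted into -1.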
In [ ]:
In [ ]:
c = count_datasets("submission_partial.json")
assert c == 4
In [ ]:
c = count_datasets("submission_full.json")
assert c == 9
In [ ]:
try:
    c = count_datasets("submission_nonexistent.json")
except Exception:
    # The function should handle the missing file itself, not raise.
    assert False
else:
    assert c == -1
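Write a function which:
is named get_dataset_by_index
takes 2 arguments: the name of a JSON file (a string) and the index of the dataset to retrieve (an integer)
returns 1 value: the dictionary for the dataset at that index
This function should return the dictionary corresponding to the dataset at the given index in the JSON file, or None if an invalid index is supplied (e.g. 10 when there are only 4 datasets, a negative number, or a non-integer type such as a float, string, or list). It should also be able to handle file-related errors, returning None in that case as well.
You can use the json Python library; no other imports are allowed.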
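The validation steps described above can be sketched as follows. This is one possible approach, not the reference solution. It assumes the top-level JSON value is the list of datasets. Rejecting booleans is an extra assumption made here, because bool is a subclass of int in Python.

```python
import json

def get_dataset_by_index(json_file, index):
    """Return the dataset dictionary at `index`, or None if anything is invalid."""
    # Reject non-integers (floats, strings, lists, ...) up front.
    # Assumption: True/False are also rejected, since bool is a subclass of int.
    if not isinstance(index, int) or isinstance(index, bool):
        return None
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file, or malformed JSON.
        return None
    # Negative indices are treated as invalid rather than counting from the end.
    if index < 0 or index >= len(data):
        return None
    return data[index]
```

The explicit range check matters: without it, Python would accept an index such as -1 and quietly return the last dataset.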
In [ ]:
In [ ]:
import json
# Load the expected dataset; the with-block closes the file afterwards.
with open("partial_1.json", "r") as f:
    d = json.load(f)
assert d == get_dataset_by_index("submission_partial.json", 1)
In [ ]:
with open("full_8.json", "r") as f:
    d = json.load(f)
assert d == get_dataset_by_index("submission_full.json", 8)
In [ ]:
try:
    c = get_dataset_by_index("submission_partial.json", 5)
except Exception:
    # An out-of-range index should be handled, not raised.
    assert False
else:
    assert c is None
In [ ]:
try:
    c = get_dataset_by_index("submission_nonexistent.json", 4983)
except Exception:
    # A missing file should be handled, not raised.
    assert False
else:
    assert c is None
Write a function which:
is named get_dataset_by_name
takes 2 arguments: the name of a JSON file (a string) and the name of the dataset to retrieve (a string)
returns 1 value: the dictionary for the dataset with that name
This function is functionally identical to get_dataset_by_index, except that rather than retrieving a dataset by its integer index, you instead retrieve it by its string name.
This function should return the dictionary corresponding to the dataset in the JSON file, or None if an invalid name is supplied. It should also be able to handle file-related errors, returning None in that case as well.
You can use the json Python library; no other imports are allowed.
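A lookup by name amounts to a linear search over the list of dataset dictionaries. The sketch below shows one way this might look. It is not the reference solution, and it assumes that each top-level entry is a dictionary and that dataset names are unique (the first match is returned).

```python
import json

def get_dataset_by_name(json_file, name):
    """Return the dataset dictionary whose "dataset" value equals `name`, else None."""
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file, or malformed JSON.
        return None
    for entry in data:
        # .get avoids a KeyError if an entry lacks the "dataset" key.
        if entry.get("dataset") == name:
            return entry
    # No dataset with that name was found.
    return None
```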
In [ ]:
In [ ]:
import json
# Load the expected dataset; the with-block closes the file afterwards.
with open("partial_1.json", "r") as f:
    d = json.load(f)
assert d == get_dataset_by_name("submission_partial.json", "01.01.test")
In [ ]:
with open("full_8.json", "r") as f:
    d = json.load(f)
assert d == get_dataset_by_name("submission_full.json", "04.01.test")
In [ ]:
try:
    c = get_dataset_by_name("submission_partial.json", "nonexistent")
except Exception:
    # An unknown dataset name should be handled, not raised.
    assert False
else:
    assert c is None
In [ ]:
try:
    c = get_dataset_by_name("submission_nonexistent.json", "02.00.test")
except Exception:
    # A missing file should be handled, not raised.
    assert False
else:
    assert c is None
Write a function which:
is named count_pixels_in_dataset
takes 2 arguments: the name of a JSON file (a string) and the name of a dataset (a string)
returns 1 value: the total number of pixels across all regions of that dataset (an integer)
Each individual pixel is a single pair of (x, y) numbers (that counts as 1).
If any file-related errors are encountered, or an incorrect dataset name is specified, the function should return -1.
You can use the json Python library, or other functions you've already written in this question; no other imports are allowed.
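Counting pixels means summing the lengths of the coordinates lists over every region of the chosen dataset. Here is a self-contained sketch of that idea. It is not the reference solution. It inlines the name lookup so the block runs on its own, and it assumes every region dictionary has a coordinates key.

```python
import json

def count_pixels_in_dataset(json_file, name):
    """Return the total pixel count for the named dataset, or -1 on any error."""
    try:
        with open(json_file, "r") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file, or malformed JSON.
        return -1
    for entry in data:
        if entry.get("dataset") == name:
            # Each [x, y] pair counts as one pixel; add up all regions.
            return sum(len(region["coordinates"]) for region in entry["regions"])
    # No dataset with that name exists in the file.
    return -1
```

For the two-dataset example shown earlier, this would give 3 for d1 and 2 + 3 = 5 for d2.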
In [ ]:
In [ ]:
assert 29476 == count_pixels_in_dataset("submission_full.json", "01.01.test")
In [ ]:
assert 30231 == count_pixels_in_dataset("submission_full.json", "04.01.test")
In [ ]:
try:
    c = count_pixels_in_dataset("submission_partial.json", "02.00.test")
except Exception:
    # An unknown dataset name should be handled, not raised.
    assert False
else:
    assert c == -1