Import the json package.
Assign the path of the JSON data file to the variable path.
In [2]:
import json
path = r'C:\Users\hrao\Documents\Personal\HK\Books\pydata-book-master\pydata-book-master\ch02\usagov_bitly_data2012-03-16-1331923249.txt'
Open the file at path, parse each line as a JSON object, and store the resulting dicts in a list called records.
In [3]:
with open(path, 'r') as f:  # the with block closes the file once reading is done
    records = [json.loads(line) for line in f]
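The loading step above assumes every line of the file is a valid JSON object. As a minimal sketch of the same idea that also tolerates blank lines, the helper below parses an iterable of lines. The name load_json_lines and the sample lines are hypothetical and are not taken from the data file.

```python
import json

def load_json_lines(lines):
    """Parse an iterable of text lines holding one JSON object each (hypothetical helper)."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines instead of letting json.loads fail on them
            continue
        parsed.append(json.loads(line))
    return parsed

# Made-up lines standing in for the contents of the real file
sample_lines = [
    '{"tz": "America/New_York", "c": "US"}\n',
    '\n',
    '{"c": "US"}\n',
]
sample_records = load_json_lines(sample_lines)
print(len(sample_records))
print(sample_records[0]['tz'])
```

An open file object is itself an iterable of lines, so the helper could be passed the file handle directly inside a with block.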
In [4]:
type(records)
Out[4]:
In [5]:
records[0]
Out[5]:
Access the value of a specific key in the first record of the list.
In [6]:
records[0]['tz']
Out[6]:
Collect all time zone values from the records list.
Here we check whether the key 'tz' is present in each element (a dict) of the records list.
If the key is present, we add the corresponding value of 'tz' for that element to the list time_zones.
In [7]:
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
In [8]:
time_zones[:10]
Out[8]:
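The membership test only checks that the key exists, so a record whose 'tz' value is an empty string is still kept. The sketch below contrasts the two filters on made-up records (hypothetical data, not taken from the file), assuming empty values should be dropped in the second case.

```python
# Made-up records: one normal value, one empty value, one without the key
sample_records = [
    {"tz": "America/New_York"},
    {"tz": ""},
    {"c": "US"},
]

# Same filter as above: keeps every record that has the key, even if the value is empty
with_key = [rec["tz"] for rec in sample_records if "tz" in rec]

# Stricter filter: rec.get("tz") is None for a missing key and "" for an empty
# value, and both are falsy, so only non-empty time zones survive
non_empty = [rec["tz"] for rec in sample_records if rec.get("tz")]

print(with_key)
print(non_empty)
```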
Counting the frequency of each time zone's occurrence in the list using a dict type in Python
In [11]:
counts = {}
for x in time_zones:
    if x in counts:
        counts[x] += 1
    else:
        counts[x] = 1
print(counts)
The same count using defaultdict from the collections module, which starts every missing key at 0.
In [14]:
from collections import defaultdict
counts = defaultdict(int)
for x in time_zones:
    counts[x] += 1
print(counts)
In [20]:
counts['America/New_York']
Out[20]:
In [19]:
len(time_zones)
Out[19]:
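One way to sanity-check the counting loops is to confirm that the plain dict and the defaultdict agree and that the counts add up to the length of the input list. The sketch below does this on a made-up list. The names sample_time_zones, plain_counts and dd_counts are hypothetical.

```python
from collections import defaultdict

# Made-up time zone list standing in for time_zones
sample_time_zones = ["America/New_York", "", "America/Chicago", "America/New_York"]

# Plain dict: dict.get supplies 0 for a key that has not been seen yet
plain_counts = {}
for tz in sample_time_zones:
    plain_counts[tz] = plain_counts.get(tz, 0) + 1

# defaultdict(int): missing keys are created with the value int() == 0
dd_counts = defaultdict(int)
for tz in sample_time_zones:
    dd_counts[tz] += 1

# Both approaches give the same mapping, and the counts sum to the list length
print(plain_counts == dict(dd_counts))
print(sum(plain_counts.values()) == len(sample_time_zones))
```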
List the n most frequent time zones together with their counts.
In [23]:
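def top_counts(count_dict, n):
    # Put the count first so that sorting orders the pairs by count
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    # The last n pairs have the largest counts, in ascending order
    return value_key_pairs[-n:]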
In [24]:
top_counts(counts,10)
Out[24]:
The same ranking using Counter from the collections module.
In [25]:
from collections import Counter
counts = Counter(time_zones)
counts.most_common(10)
Out[25]:
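The two approaches return the same information in different shapes: top_counts gives (count, tz) pairs in ascending order, while Counter.most_common gives (tz, count) pairs in descending order. The sketch below shows the correspondence on a made-up list with no tied counts. With ties the two may order the tied entries differently, since sorting tuples falls back to comparing the time zone names.

```python
from collections import Counter

def top_counts(count_dict, n):
    # Same logic as the function defined above
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]

# Made-up labels standing in for time zone names; all counts are distinct
sample = ["a", "b", "a", "c", "a", "b"]
sample_counts = Counter(sample)

ascending = top_counts(sample_counts, 2)    # (count, tz) pairs, smallest count first
descending = sample_counts.most_common(2)   # (tz, count) pairs, largest count first

# Reverse the order and swap each pair to line the two results up
converted = [(tz, count) for count, tz in reversed(ascending)]
print(ascending)
print(descending)
print(converted == descending)
```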