The collections module is a treasure trove: a built-in module that implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
Name | Description |
---|---|
namedtuple() | factory function for creating tuple subclasses with named fields |
deque | list-like container with fast appends and pops on either end |
ChainMap | dict-like class for creating a single view of multiple mappings |
Counter | dict subclass for counting hashable objects |
OrderedDict | dict subclass that remembers the order entries were added |
defaultdict | dict subclass that calls a factory function to supply missing values |
UserDict | wrapper around dictionary objects for easier dict subclassing |
UserList | wrapper around list objects for easier list subclassing |
UserString | wrapper around string objects for easier string subclassing |
The ChainMap class manages a list of dictionaries (more generally, mappings) and searches through them in the order they were added to find the value associated with a key.
It makes a good "context" container because it can be visualised as a stack: changes are made as the stack grows and are discarded again as soon as the stack shrinks.
Think of it as a view in a database: the actual values are still stored in their respective tables (the underlying dictionaries), and all the usual operations can still be performed on them.
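The lookup rule can be pictured with a small hand-written function. This is a simplified sketch, not how the library is implemented: `chain_lookup` is a hypothetical helper that scans the mappings in order and returns the value from the first one containing the key, which is the behaviour ChainMap exposes.

```python
from collections import ChainMap

def chain_lookup(maps, key):
    # Hypothetical helper (not part of collections): first mapping that has the key wins
    for mapping in maps:
        if key in mapping:
            return mapping[key]
    raise KeyError(key)

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

# The simplified lookup agrees with ChainMap for every key
cm = ChainMap(a, b)
for key in ('a', 'b', 'c'):
    assert chain_lookup([a, b], key) == cm[key]

print(chain_lookup([a, b], 'c'))  # C  (a is searched first)
print(chain_lookup([b, a], 'c'))  # D  (b is searched first)
```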
In [4]:
import collections
# from collections import ChainMap
a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
m = collections.ChainMap(a, b)
print('Individual Values')
print('a = {}'.format(m['a']))
print('b = {}'.format(m['b']))
print('c = {}'.format(m['c']))
print("-"*20)
print(type(m.keys()))
print('Keys = {}'.format(list(m.keys())))
print('Values = {}'.format(list(m.values())))
print("-"*20)
print('Items:')
for k, v in m.items():
    print('{} = {}'.format(k, v))
print("-"*20)
print('"d" in m: {}'.format(('d' in m)))
In [9]:
a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
m = collections.ChainMap(a, b)
lst = []
for v in m.keys():
    lst.append(v)
for v in m.values():
    lst.append(v)
print(lst)
The child mappings are searched in the order they are passed to the constructor, so the value reported for the key 'c' comes from the dictionary a.
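The same rule explains why keys(), values() and items() report the shared key 'c' only once. As a small sketch with the same a and b, len() counts distinct keys, and converting the ChainMap to a plain dict keeps, for each key, the value from the first mapping that contains it.

```python
from collections import ChainMap

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
m = ChainMap(a, b)

# 'c' appears in both mappings but is counted once
print(len(m))  # 3

# Flattening keeps the value for 'c' from a, the first mapping
flat = dict(m)
print(flat['c'])  # C
```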
In [12]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
cm = collections.ChainMap(a, b)
print(cm.maps)
print('c = {}\n'.format(cm['c']))
# reverse the search order
cm.maps = list(reversed(cm.maps))  # same effect as collections.ChainMap(b, a)
print(cm.maps)
print('c = {}'.format(cm['c']))
When the list of mappings is reversed, the value associated with 'c' changes.
In [13]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
m = collections.ChainMap(a, b)
print('Before: {}'.format(m['c']))
a['c'] = '3.3'
print('After : {}'.format(m['c']))
In [19]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
cm = collections.ChainMap(b, a)
print(cm.maps)
print('Before: {}'.format(cm['c']))
a['c'] = '3.3'
print('After : {}'.format(cm['c']))
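Because a ChainMap holds references to the underlying mappings rather than copies, changes made directly to those mappings are visible through it right away. In the second cell, b comes first in the chain, so updating a['c'] does not change the value reported for cm['c']. Changing the values associated with existing keys and adding new elements work the same way.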
It is also possible to set values through the ChainMap directly, although only the first mapping in the chain is actually modified.
In [20]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
cm = collections.ChainMap(a, b)
print('Before: {}'.format(cm['c']))
cm['c'] = '3.3'
print('After : {}'.format(cm['c']))
print(a['c'])
print(b['c'])
In [21]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
cm = collections.ChainMap(b, a)
print('Before: {}'.format(cm['c']))
cm['c'] = '3.3'
print('After : {}'.format(cm['c']))
print(a['c'])
print(b['c'])
In [25]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
cm = collections.ChainMap(a, b)
print('Before: {}'.format('d' in cm))
cm['d'] = '3.3'  # a new key is written to the first mapping (a) only
print('After : {}'.format(cm['d']))
print(cm.maps)
print(a)
print(b)
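When a value is stored through the ChainMap, only the first mapping in cm.maps is updated (a in In [20] and In [25], b in In [21]); the other mappings are left untouched.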
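Deletion follows the same first-mapping rule. In this sketch, deleting 'c' through the ChainMap removes it from a only, so the lookup then falls through to b; trying to delete a key that exists only in a later mapping raises KeyError.

```python
from collections import ChainMap

a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
cm = ChainMap(a, b)

# Deleting a key removes it from the first mapping only ...
del cm['c']
print(a)        # {'a': '1'}
print(cm['c'])  # 33  (now found in b)

# ... and a key that lives only in a later mapping cannot be deleted this way
try:
    del cm['b']
except KeyError as exc:
    print('KeyError:', exc)
```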
ChainMap provides a convenience method for creating a new instance with one extra mapping at the front of the maps list to make it easy to avoid modifying the existing underlying data structures.
This stacking behavior is what makes it convenient to use ChainMap instances as template or application contexts. Specifically, it is easy to add or update values in one iteration, then discard the changes for the next iteration.
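A minimal sketch of that per-iteration pattern, assuming a hypothetical defaults dict standing in for application settings: each pass pushes a fresh mapping with new_child(), writes only to it, and then pops it again through the parents property, so the defaults are never modified.

```python
from collections import ChainMap

# Hypothetical application defaults
defaults = {'color': 'black', 'size': 10}
context = ChainMap(defaults)

results = []
for item in ['a', 'b']:
    local = context.new_child()   # push an empty mapping onto the front
    local['item'] = item          # writes land in the new front mapping
    if item == 'b':
        local['color'] = 'red'    # shadows the default for this pass only
    results.append((local['item'], local['color'], local['size']))
    context = local.parents       # pop: a ChainMap without the front mapping

print(results)   # [('a', 'black', 10), ('b', 'red', 10)]
print(defaults)  # {'color': 'black', 'size': 10}
```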
In [50]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
m1 = collections.ChainMap(a, b)
m2 = m1.new_child()
print('m1 before:', m1)
print('m2 before:', m2)
m2['c'] = '3.3'
print('m1 after:', m1)
print('m2 after:', m2)
For situations where the new context is known or built in advance, it is also possible to pass a mapping to new_child().
In [51]:
import collections
a = {'a': '1', 'c': '3'}
b = {'b': '2', 'c': '33'}
c = {'c': '333'}
m1 = collections.ChainMap(a, b)
m2 = m1.new_child(c)
print('m1["c"] = {}'.format(m1['c']))
print('m2["c"] = {}'.format(m2['c']))
print(m2)
# This is equivalent to
m2_1 = collections.ChainMap(c, *m1.maps)
print(m2_1)
In [33]:
# Tally occurrences of words in a list
from collections import Counter
cnt = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
print(cnt)  # Counter({'blue': 3, 'red': 2, 'green': 1})
# Find the ten most common words in Hamlet (assumes a local file hamlet.txt)
import re
with open('hamlet.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
Counter(words).most_common(10)
Counter can also be applied directly to any iterable of hashable items, such as a list of numbers or the words of a sentence:
In [38]:
l = [1, 23, 23, 44, 4, 44, 55, 555, 44, 32, 23, 44, 56, 64, 2, 1]
lstCounter = Counter(l)
print(lstCounter)
print(lstCounter.most_common(4))
In [41]:
sentence = ("The collections module is a treasure trove of a built-in module that implements "
            "specialized container datatypes providing alternatives to Python’s general purpose "
            "built-in containers.")
wordList = sentence.split()
Counter(wordList).most_common(3)
In [42]:
# find the most common words
# Methods with Counter()
c = Counter(wordList)
print(c.most_common(4))
print(c.items())
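Beyond most_common() and items(), a few other standard Counter methods are handy. This sketch uses made-up colour data to show update(), subtract(), the zero count reported for missing keys, elements(), and Counter arithmetic (where + and - keep only positive counts).

```python
from collections import Counter

c = Counter(['red', 'blue', 'red'])

c.update(['blue', 'green'])   # add counts from another iterable
c.subtract(['red'])           # subtract counts; results may reach zero or below
print(c)                      # counts: red 1, blue 2, green 1

# Missing keys report 0 instead of raising KeyError (and are not inserted)
print(c['purple'])            # 0

# elements() repeats each key as many times as its count
print(sorted(c.elements()))   # ['blue', 'blue', 'green', 'red']

# Arithmetic between Counters; only positive results are kept
other = Counter(blue=1, red=5)
print(c + other)
print(c - other)              # red would be negative, so it is dropped
```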
In [48]:
d = {"a": 1, "b": 2}
print(d)
print(d['a'])
print(d['d'])  # raises KeyError: 'd' is not a key of d
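The KeyError above is what defaultdict is designed to avoid. For comparison, a plain dict already offers two standard fallbacks: get() returns a default without changing the dict, while setdefault() inserts the default for a missing key and returns it.

```python
d = {"a": 1, "b": 2}

# get() returns a fallback and leaves the dict unchanged
print(d.get('d', 0))          # 0
print('d' in d)               # False

# setdefault() inserts the fallback if the key is missing, then returns it
print(d.setdefault('d', 0))   # 0
print(d)                      # {'a': 1, 'b': 2, 'd': 0}
```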
In [57]:
from collections import defaultdict
dd = defaultdict(object)
print(dd)
print(dd['one'])
print(dd)
dd['Two'] = 2
print(dd)
for d in dd:
    print(d)
    print(dd[d])
In [53]:
help(defaultdict)
In [58]:
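# Initializing with a default value: default_factory must be a callable
# (defaultdict(1) raises TypeError), so wrap the value in a lambda
dd = defaultdict(lambda: 1)
print(dd)
print(dd['one'])
print(dd)
dd['Two'] = 2
print(dd)
for d in dd:
    print(d)
    print(dd[d])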
In [63]:
# Using factory function
import collections
def default_factory():
    return 'default value'
d = collections.defaultdict(default_factory, india='new delhi')
print('d:', d)
print('india =>', d['india'])
print('bar =>', d['bar'])
print(d)
In [70]:
# Using factory function
import collections
def default_factory():
    return 'Bhopal'
d = collections.defaultdict(default_factory,
                            {"india": 'new delhi',
                             "karnataka": "Bengaluru"})
print('d:', d)
print('india =>', d['india'])
print('MP =>', d['MP'])
print(d)
In [61]:
# Using factory function
# ---------------------------------------------------
# TODO: How can I pass a value to the default function?
# ---------------------------------------------------
import collections
def default_factory():
    return 'default value'
d = collections.defaultdict(default_factory, foo='bar')
print('d:', d)
print('foo =>', d['foo'])
print('bar =>', d['bar'])
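On the TODO in the cell above: defaultdict calls its default_factory with no arguments, so a value has to be bound in advance. This sketch shows two common ways to do that, a lambda and functools.partial, plus a third option for when the default should depend on the missing key: a dict subclass implementing __missing__, which receives the key. The names capital_factory and KeyAwareDict are hypothetical.

```python
import collections
import functools

def capital_factory(value):
    # Hypothetical factory that needs an argument
    return value

# Option 1: wrap the call in a lambda; defaultdict calls it with no arguments
d1 = collections.defaultdict(lambda: capital_factory('Bhopal'))
print(d1['MP'])    # Bhopal

# Option 2: pre-bind the argument with functools.partial
d2 = collections.defaultdict(functools.partial(capital_factory, 'unknown'))
print(d2['XYZ'])   # unknown

# Option 3: the default depends on the missing key itself.
# dict calls __missing__(key) whenever d[key] fails.
class KeyAwareDict(dict):
    def __missing__(self, key):
        value = 'no capital for ' + key
        self[key] = value   # store it, as defaultdict does
        return value

d3 = KeyAwareDict(india='new delhi')
print(d3['Goa'])   # no capital for Goa
```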
In [ ]:
# Using list as the default_factory, it is easy to group a sequence of key-value pairs into a dictionary of lists:
from collections import defaultdict
countryList = [("India", "New Delhi"), ("Iceland", "Reykjavik"),
("Indonesia", "Jakarta"), ("Ireland", "Dublin"),
("Israel", "Jerusalem"), ("Italy", "Rome")]
d = defaultdict(list)
for country, capital in countryList:
    d[country].append(capital)
print(d.items())
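In the cell above every country appears only once, so each list holds a single capital. The grouping is easier to see when keys repeat; this sketch uses made-up (colour, number) pairs.

```python
from collections import defaultdict

# Made-up (key, value) pairs in which keys repeat
pairs = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

groups = defaultdict(list)
for color, number in pairs:
    groups[color].append(number)   # first access to a key creates an empty list

print(sorted(groups.items()))  # [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
```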
In [72]:
# Setting the default_factory to int makes the defaultdict useful for counting
quote = 'Vande Mataram'
dd = defaultdict(int)
print(dd)
for chars in quote:
    dd[chars] += 1
print(dd.items())
print(dd['T'])
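Counter produces the same tallies as defaultdict(int). One behavioural difference shows up with the dd['T'] lookup above: reading a missing key from a defaultdict inserts it with the default value, whereas Counter returns 0 without storing anything. A sketch comparing the two:

```python
from collections import Counter, defaultdict

quote = 'Vande Mataram'

dd = defaultdict(int)
for ch in quote:
    dd[ch] += 1

cnt = Counter(quote)

# Same counts either way
same_counts = dict(dd) == dict(cnt)
print(same_counts)   # True

# Reading a missing key from a defaultdict inserts it ...
dd['T']
print('T' in dd)     # True
# ... whereas Counter reports 0 without storing the key
cnt['T']
print('T' in cnt)    # False
```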
In [94]:
import collections
d = collections.deque('Vande Mataram')
print('Deque:', d)
print('Length:', len(d))
print('Left end:', d[0])
print('Right end:', d[-1])
d.remove('e')
print('remove(e):', d)
In [83]:
import collections
# Add to the right
d1 = collections.deque()
d1.extend('Vande')
print('extend :', d1)
for a in " Mataram":
    d1.append(a)
d1.extend(" !!!")
print('append :', d1)
# extendleft() adds items one at a time, so they end up in reverse order
d1.extendleft(" #!* ")
print('extendleft:', d1)
# Add to the left
d2 = collections.deque()
d2.extendleft(range(6))
print('extendleft:', d2)
d2.appendleft(6)
print('appendleft:', d2)
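The cells above only add items. As the table at the top says, deque also supports fast pops from either end; this sketch shows pop() and popleft(), rotate(), and a bounded deque created with maxlen, which discards items from the opposite end once it is full.

```python
from collections import deque

d = deque('abcde')
right = d.pop()       # remove from the right end
left = d.popleft()    # remove from the left end
print(right, left)    # e a
print(d)              # deque(['b', 'c', 'd'])

# rotate(n) moves n items from the right end to the left end
d.rotate(1)
print(d)              # deque(['d', 'b', 'c'])

# A bounded deque keeps only the most recent maxlen items
last_three = deque(maxlen=3)
for i in range(5):
    last_three.append(i)
print(last_three)     # deque([2, 3, 4], maxlen=3)
```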
In [86]:
fruitsCount = {}
fruitsCount["apple"] = 10
fruitsCount["grapes"] = 120
fruitsCount["mango"] = 200
fruitsCount["kiwi"] = 2000
fruitsCount["leeche"] = 20
print(fruitsCount)
for fruit in fruitsCount:
    print(fruit)
In [88]:
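# Now let's try this with OrderedDict. Since Python 3.7, plain dicts also
# preserve insertion order, so both cells print the fruits in the same order.
from collections import OrderedDict as OD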
fruitsCount = OD()
fruitsCount["apple"] = 10
fruitsCount["grapes"] = 120
fruitsCount["mango"] = 200
fruitsCount["kiwi"] = 2000
fruitsCount["leeche"] = 20
print(fruitsCount)
for fruit in fruitsCount:
    print(fruit)
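Since plain dicts keep insertion order too, OrderedDict mainly earns its place through order-aware operations. This sketch shows move_to_end(), popitem(last=False) for FIFO removal, and the fact that equality between two OrderedDicts takes order into account while equality between plain dicts does not.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys('abcde')

od.move_to_end('b')               # move 'b' to the right end
print(''.join(od))                # acdeb
od.move_to_end('b', last=False)   # move 'b' to the left end
print(''.join(od))                # bacde

first = od.popitem(last=False)    # FIFO: remove from the left end
last = od.popitem()               # LIFO (default): remove from the right end
print(first, last)                # ('b', None) ('e', None)

# Order matters when comparing two OrderedDicts, but not for plain dicts
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))                # True
```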
In [95]:
from collections import namedtuple
Point = namedtuple("Point", ['x', 'y', 'z'])  # defining the namedtuple; the typename matches the variable
p = Point(10, y=20, z=30)  # creating an object; positional and keyword arguments can be mixed
print(p)
print(p.x + p.y + p.z)
print(p[0] + p[1])  # index access works as with a normal tuple
x, y, z = p  # unpacking the tuple
print(x)
print(y)
print(z)
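namedtuple classes also come with a few helper methods and attributes, prefixed with an underscore so they do not clash with field names. This sketch shows _fields, _asdict(), _replace() and _make(), plus the read-only nature of the fields. It assumes Python 3.8 or later, where _asdict() returns a plain dict.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(10, 20, 30)

print(p._fields)     # ('x', 'y', 'z')
print(p._asdict())   # {'x': 10, 'y': 20, 'z': 30}

# Tuples are immutable, so _replace returns a new instance
q = p._replace(z=0)
print(q)             # Point(x=10, y=20, z=0)
print(p)             # Point(x=10, y=20, z=30)

# _make builds an instance from any iterable
print(Point._make([1, 2, 3]))  # Point(x=1, y=2, z=3)

# Assigning to a field raises AttributeError
try:
    p.x = 99
except AttributeError:
    print('namedtuple fields are read-only')
```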