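1.9 Finding Commonalities in Two Dictionaries

How do you find what two dictionaries have in common (shared keys or shared values)?

In [1]:

a = {
    'x': 1,
    'y': 2,
    'z': 3
}
b = {
    'w': 10,
    'x': 11,
    'y': 2
}
# Shared keys in a: x=1, y=2
# Shared keys in b: x=11, y=2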

To find what two dictionaries have in common, perform ordinary set operations on the results returned by their keys() or items() methods.

In [4]:

# Find keys in common
kc = a.keys() & b.keys()
print('Keys shared by a and b:', kc)
# Find keys in a that are not in b
knb = a.keys() - b.keys()
print('Keys in a but not in b:', knb)
# Find (key, value) pairs in common
kv = a.items() & b.items()
print('(key, value) pairs shared by a and b:', kv)

Keys shared by a and b: {'y', 'x'}
Keys in a but not in b: {'z'}
(key, value) pairs shared by a and b: {('y', 2)}

These operations can also be used to alter or filter dictionary contents. For example, to build a new dictionary from an existing one with certain keys removed, use a dictionary comprehension:

In [3]:

# Make a new dictionary with certain keys removed
c = {key: a[key] for key in a.keys() - {'z', 'w'}}
# c excludes the entries for 'z' and 'w'
print(c)

{'y': 2, 'x': 1}

Note that 'w' does not appear in a at all; including it in the set being subtracted does no harm.
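The same view operations can drive other kinds of filtering. As a minimal sketch (the example dicts a and b are redefined so the snippet runs on its own), this finds keys that both dicts share but map to different values, and keeps only a's entries for the keys it shares with b:

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# Keys present in both dicts whose values differ
changed = {key for key in a.keys() & b.keys() if a[key] != b[key]}
print(changed)  # {'x'}

# a restricted to the keys it shares with b
shared = {key: a[key] for key in a.keys() & b.keys()}
print(shared)
```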

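A dictionary is a mapping between a set of keys and their values. Its keys() method returns a keys-view object that exposes those keys.

Keys view (dict.keys()): supports set operations such as union, intersection, and difference, so there is no need to convert the keys to a set first.
Items view (dict.items()): also supports set operations on its (key, value) pairs, provided the values are hashable.
Values view (dict.values()): does not support set operations, since values are not guaranteed to be unique; to perform set operations on values, convert them to a set first.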
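A short sketch of the difference between the views, using the same two example dicts: keys views accept set operators directly, while values views must be converted with set() first.

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# Keys views support set operators directly
union = a.keys() | b.keys()      # contains 'w', 'x', 'y', 'z'
sym_diff = a.keys() ^ b.keys()   # keys in exactly one dict: 'w', 'z'

# Values views are not set-like, so convert them first
common_values = set(a.values()) & set(b.values())  # {2}

try:
    a.values() & b.values()
    values_support_and = True
except TypeError as exc:
    values_support_and = False
    print('values() views do not support &:', exc)
```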

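1.10 Removing Duplicates from a Sequence While Maintaining Order

How do you eliminate duplicate values from a sequence while keeping the remaining items in their original order?

If the values in the sequence are hashable, the problem is easily solved with a set and a generator:

In [9]:

def dedupe(items):
    seen = set()  # values already yielded
    for item in items:
        if item not in seen:
            yield item  # first occurrence: emit it
            seen.add(item)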



In [10]:

a = [1, 3, 5, 6, 7, 8, 1, 5, 1, 10]
list(dedupe(a))

Out[10]:

[1, 3, 5, 6, 7, 8, 10]



In [11]:

# Number of duplicates removed (_ holds the previous cell's output)
len(a) - len(_)

Out[11]:

3

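The approach above works only when the items in the sequence are hashable, that is, objects whose hash value never changes during their lifetime (numbers, strings, and tuples of them are hashable; lists and dicts are not). For unhashable items such as dicts, a slightly modified version takes a key function.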
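To see the limitation concretely, here is a small sketch that feeds the dedupe() generator from above (repeated so the snippet is self-contained) a list of dicts. Dicts are unhashable, so the membership test against the set raises TypeError:

```python
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

records = [{'x': 1}, {'x': 2}, {'x': 1}]

error_message = None
try:
    list(dedupe(records))
except TypeError as exc:
    error_message = str(exc)
print('dedupe() failed:', error_message)  # unhashable type: 'dict'
```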



In [12]:

def dedupe2(items, key=None):
    seen = set()
    for item in items:
        # Compare on key(item) when a key function is given, else on the item itself
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

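Here the key argument specifies a function that converts each item into a hashable value used for duplicate detection; the items themselves are yielded unchanged.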
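A few more illustrative key functions, as a sketch (dedupe2() is repeated from above so the snippet runs on its own, and the sample lists are made up): str.lower folds case so 'Apple' and 'apple' count as duplicates, and tuple turns unhashable lists into hashable tuples.

```python
def dedupe2(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

fruits = ['Apple', 'banana', 'apple', 'Banana', 'cherry']
unique_fruits = list(dedupe2(fruits, key=str.lower))
print(unique_fruits)  # ['Apple', 'banana', 'cherry']

pairs = [[1, 2], [3, 4], [1, 2], [5]]
unique_pairs = list(dedupe2(pairs, key=tuple))
print(unique_pairs)   # [[1, 2], [3, 4], [5]]
```

The first occurrence wins in both cases, so 'Apple' keeps its original capitalization.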



In [13]:

b = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 1, 'y': 4}]
bd = list(dedupe2(b, key=lambda d: (d['x'], d['y'])))
bd2 = list(dedupe2(b, key=lambda d: d['x']))

In [19]:

print("Remove items with a repeated (d['x'], d['y']) pair")
bd
# Three distinct (x, y) pairs remain; the second {'x': 1, 'y': 2} is a repeat and is dropped

Remove items with a repeated (d['x'], d['y']) pair

Out[19]:

[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 4}]

In [18]:

print("Remove items with a repeated d['x'] value")
bd2
# Every dict in b has x == 1, so only the first item survives

Remove items with a repeated d['x'] value

Out[18]:

[{'x': 1, 'y': 2}]

If all you want is to eliminate duplicates, simply converting the sequence to a set is often enough:

In [21]:

ra = [1, 1, 1, 1, 1, 1, 1]
set(ra)

Out[21]:

{1}

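However, a set does not preserve order, so the items in the result may appear in a different order than in the original sequence.

Writing the function as a generator makes it more general-purpose. For example, here it removes duplicate lines while reading a file:

In [24]:

with open('somefile.txt', 'r') as f:
    for line in dedupe2(f):
        print(line, end='')  # each line already ends with '\n'

zlxs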

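somefile.txt contains ten identical lines of zlxs, but only one is printed.

The key argument mimics similar functionality in built-in functions such as sorted(), min(), and max(), and is usually supplied as a lambda.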
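As an alternative for hashable items (not the generator approach used above, and relying on Python 3.7+, where dicts preserve insertion order), dict.fromkeys() removes duplicates while keeping the first occurrence of each item in place, unlike set():

```python
seq = [1, 3, 5, 6, 7, 8, 1, 5, 1, 10]

# Dict keys are unique and (since Python 3.7) keep insertion order,
# so the first occurrence of each value wins.
ordered_unique = list(dict.fromkeys(seq))
print(ordered_unique)  # [1, 3, 5, 6, 7, 8, 10]
```

Unlike dedupe2(), this builds the whole result at once and offers no key parameter, so it suits small, hashable sequences.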

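1.11 Naming a Slice

Your program has become an unreadable mess of hardcoded slice indices, and you want to clean it up.

In [46]:

# Build the sample string '012345678901234567890123456789'
ns = ''.join(str(nn) for nn in range(10)) * 3

Giving a slice a name with the built-in slice() makes code more readable and easier to maintain.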
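As a sketch of the idea, assume a hypothetical fixed-width record layout (the field positions are invented for illustration): characters 0-5 hold an item code, 6-9 a quantity, and 10-16 a unit price. Compare the hardcoded version with named slices:

```python
record = 'AB1234' + '0005' + '0012.50'   # hypothetical fixed-width record

# Hardcoded indices: correct, but the numbers say nothing about meaning
cost = int(record[6:10]) * float(record[10:17])

# Named slices document the layout in one place
CODE = slice(0, 6)
QTY = slice(6, 10)
PRICE = slice(10, 17)
named_cost = int(record[QTY]) * float(record[PRICE])

print(record[CODE], named_cost)  # AB1234 62.5
```

If the layout changes, only the three slice definitions need editing.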



In [49]:

items = [1, 2, 3, 4, 5, 6, 7, 8, 9]
a = slice(2, 4)

In [50]:

items[2:4]

Out[50]:

[3, 4]

In [53]:

items[a]

Out[53]:

[3, 4]

In [54]:

items[a] = [10, 11]
items

Out[54]:

[1, 2, 10, 11, 5, 6, 7, 8, 9]

In [55]:

del items[a]
items

Out[55]:

[1, 2, 5, 6, 7, 8, 9]

A slice object such as a exposes its start, stop, and step values as attributes:

In [56]:

s = slice(5, 10, 2)
print('s start:', s.start)
print('s stop:', s.stop)
print('s step:', s.step)

s start: 5
s stop: 10
s step: 2



A slice's indices(size) method maps the slice onto a sequence of length size and returns a (start, stop, step) tuple whose values are clipped to fit within the sequence's bounds. Pass it the length of the sequence (here the string st), not a slice: a slice object has no length, so len(s) raises TypeError: object of type 'slice' has no len().

In [59]:

st = 'helloworld'
a.indices(len(st))

Out[59]:

(2, 4, 1)
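The tuple returned by indices() can be unpacked into range() to walk a sequence safely; oversized bounds are clipped and negative ones are resolved against the length. A short sketch using a slice s = slice(5, 10, 2) and the string 'helloworld':

```python
s = slice(5, 10, 2)
st = 'helloworld'

# indices() resolves the slice against len(st) == 10
print(s.indices(len(st)))                 # (5, 10, 2)

# Unpack into range() to visit exactly the positions st[s] would
picked = [st[i] for i in range(*s.indices(len(st)))]
print(picked)                             # ['w', 'r', 'd']

# Oversized stop values are clipped; negative starts count from the end
print(slice(5, 100, 2).indices(len(st)))  # (5, 10, 2)
print(slice(-3, None).indices(len(st)))   # (7, 10, 1)
```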

1.12 Determining the Most Frequently Occurring Items in a Sequence

How do you find the items that occur most often in a sequence?

The collections.Counter class is designed for exactly this kind of problem, and its most_common() method gives you the answer directly.

In [2]:

from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]
# Count how often each word occurs in words
word_counts = Counter(words)
# The three most frequent words
top_three = word_counts.most_common(3)
print(top_three)

[('eyes', 8), ('the', 5), ('look', 4)]

A Counter object accepts any sequence of hashable items as input. Under the hood, a Counter is a dictionary that maps each item to the number of times it occurs:



In [3]:

word_counts['not']

Out[3]:

1

In [4]:

word_counts['eyes']

Out[4]:

8
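Because a Counter is a dict subclass, ordinary dictionary operations work on it, and looking up a missing item returns 0 instead of raising KeyError. A small sketch using a string, which is itself a sequence of hashable characters:

```python
from collections import Counter

letters = Counter('abracadabra')

print(isinstance(letters, dict))  # True
print(letters['a'])               # 5
print(letters['z'])               # 0 (missing items count as zero)
print('z' in letters)             # False (the lookup did not insert 'z')
print(letters.most_common(1))     # [('a', 5)]
```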

If you have another list of words and want to add their counts, you can increment the counts manually:

In [6]:

morewords = ['why', 'are', 'you', 'looking', 'in', 'eyes']
for word in morewords:
    word_counts[word] += 1
word_counts['eyes']

Out[6]:

9

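Because morewords also contains 'eyes', its count rose from 8 to 9. Instead of an explicit loop, you can use the update() method. Note that word_counts already includes morewords from the loop above, so this call counts them a second time:

In [7]:

word_counts.update(morewords)

In [8]:

word_counts['eyes']

Out[8]:

10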
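update() adds to whatever counts are already present, so calling it after the manual loop counts morewords twice, which is why 'eyes' reaches 10. A sketch starting from a fresh Counter (with a shortened, made-up word list) shows update() with both a list and a mapping, plus subtract(), its counterpart:

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look']
morewords = ['why', 'are', 'you', 'looking', 'in', 'eyes']

counts = Counter(words)
counts.update(morewords)    # adds one per occurrence in morewords
print(counts['eyes'])       # 2

counts.update({'eyes': 3})  # a mapping adds the given amounts
print(counts['eyes'])       # 5

counts.subtract(['eyes'])   # subtract() removes counts instead
print(counts['eyes'])       # 4
```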

A little-known feature of Counter instances is that they can easily be combined using arithmetic operators:

In [9]:

a = Counter(words)
b = Counter(morewords)

In [10]:

a

Out[10]:

Counter({'around': 2,
         "don't": 1,
         'eyes': 8,
         'into': 3,
         'look': 4,
         'my': 3,
         'not': 1,
         'the': 5,
         'under': 1,
         "you're": 1})

In [11]:

b

Out[11]:

Counter({'are': 1, 'eyes': 1, 'in': 1, 'looking': 1, 'why': 1, 'you': 1})

In [12]:

# Combine counts
c = a + b
print(c)

Counter({'eyes': 9, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2, "don't": 1, 'why': 1, 'are': 1, 'looking': 1, 'not': 1, 'you': 1, 'in': 1, 'under': 1, "you're": 1})

In [14]:

# Subtract counts
d = a - b
print(d)

Counter({'eyes': 7, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2, "don't": 1, 'not': 1, 'under': 1, "you're": 1})
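Besides + and -, Counter supports & (intersection: the minimum of each count) and | (union: the maximum of each count). All four operators drop items whose resulting count is zero or negative. A sketch with two small, made-up counters:

```python
from collections import Counter

x = Counter(eyes=8, the=5, look=4)
y = Counter(eyes=1, look=6, why=1)

added = x + y       # eyes 9, look 10, the 5, why 1
subtracted = x - y  # eyes 7, the 5; look drops out because 4 - 6 < 0
common = x & y      # min of each count: eyes 1, look 4
either = x | y      # max of each count: eyes 8, look 6, the 5, why 1

for label, result in [('+', added), ('-', subtracted), ('&', common), ('|', either)]:
    print(label, dict(result))
```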

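Counter objects are useful in almost any situation where you need to tabulate or count data. When you face this kind of problem, use a Counter rather than implementing the counting by hand with a dict.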
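For comparison, a sketch of the hand-rolled dict version next to the Counter one, using a shortened, made-up word list:

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look', 'eyes', 'eyes']

# Hand-rolled: every update has to handle the missing-key case
manual = {}
for word in words:
    manual[word] = manual.get(word, 0) + 1

# Counter: one call, with most_common() and arithmetic included
counted = Counter(words)

print(manual == counted)       # True
print(counted.most_common(2))  # [('eyes', 3), ('look', 2)]
```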
