Writing Great Code

Y.-W. FANG at Kyoto University

This chapter is extremely important for Python beginners.

This chapter focuses on writing great code. You will find it very useful for developing good coding habits.

Code style

Pythonistas (veteran Python developers) are proud of the language, because even people who do not know Python can usually work out what a simple program does after reading its source code. Readability is at the heart of Python's design. Python's readability owes much to its well-established style guidelines (Python Enhancement Proposals PEP 20 and PEP 8) and to "Pythonic" idioms.

PEP8

PEP 8 is the de facto code style guide for Python. It covers naming conventions, code layout, whitespace (tabs versus spaces), and other similar style topics. Following PEP 8 when writing Python is well worth it: it makes collaborating with other developers easier and makes the code easier to read.

pycodestyle (formerly called pep8) is a tool that points out places where Python code does not conform to PEP 8. Installation is simple: pip install pycodestyle. Usage:

pycodestyle python.py

Another tool, autopep8, can automatically reformat a Python file to follow PEP 8. The command is

autopep8 --in-place python.py

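If you don't want to reformat the file in place, drop --in-place, i.e., use

autopep8 python.py

This prints the reformatted code to the screen instead. In addition, the --aggressive flag makes autopep8 apply more thorough fixes, and it can be given more than once to make it even more aggressive.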
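As a small illustration (the function names are hypothetical), the first function below runs fine but breaks several PEP 8 whitespace rules that pycodestyle would report, for example E231 (missing whitespace after ',') and E225 (missing whitespace around operator). The second is the same logic written roughly the way autopep8 would reformat it.

```python
# Non-compliant version: legal Python, but poor style.
def add_bad( a,b ):
    total=a+b
    return total


# PEP 8-compliant version of the same function.
def add_good(a, b):
    total = a + b
    return total


# Style does not change behaviour, only readability.
print(add_bad(2, 3), add_good(2, 3))  # 5 5
```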

PEP20 (a.k.a The Zen of Python)

PEP 20 is the set of guiding principles for decision making in Python. Its full text is:


In [7]:
import this


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

In fact, PEP 20 contains only 19 aphorisms, not 20: as the PEP itself puts it, only 19 of the 20 have been written down.

General Advice

This section contains style concepts that are hopefully easy to accept without debate. They can also apply to languages other than Python.

Errors should never pass silently/Unless explicitly silenced

Error handling in Python is done using the 'try' statement. Don't let errors pass silently: always explicitly identify by name the exceptions you will catch, and handle only those exceptions.
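A minimal sketch of the difference, using a hypothetical parse_port helper: catch only the exception you expect (ValueError from int()) and let anything else propagate, instead of a bare except: that would also swallow unrelated bugs.

```python
def parse_port(text, default=8080):
    """Convert text to a port number, falling back to default on bad input."""
    try:
        return int(text)
    except ValueError:
        # Only invalid numeric strings are expected here; handle them explicitly.
        return default


def parse_port_badly(text, default=8080):
    # Anti-pattern: a bare except hides every error, including real bugs
    # such as passing None (a TypeError) by mistake.
    try:
        return int(text)
    except:
        return default


print(parse_port("443"))   # 443
print(parse_port("http"))  # 8080

try:
    parse_port(None)       # the TypeError is not silenced
except TypeError as exc:
    print("caught:", type(exc).__name__)

print(parse_port_badly(None))  # 8080: the bug went unnoticed
```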

Function arguments should be intuitive to use

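When defining a function in Python, it is up to the author how to use positional and optional (keyword) arguments. Still, it is best to stick to one very explicit way of defining functions, so that they are: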

  • easy to read (meaning the name and arguments need no explanation)
  • easy to change (meaning adding a new keyword argument won't break other parts of the code)
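A sketch of that style with a hypothetical send() function: the required arguments are positional and self-explanatory, and the optional ones are keyword arguments with defaults, so a new option can be added later without breaking existing callers.

```python
def send(message, recipient, cc=None, bcc=None):
    """Build a hypothetical outgoing message as a dict (nothing is really sent)."""
    return {
        "message": message,
        "to": recipient,
        "cc": cc or [],    # None default avoids a shared mutable default
        "bcc": bcc or [],
    }


# Required arguments read naturally in position...
plain = send("Hello", "alice@example.com")

# ...and optional ones are passed by name, so their order does not matter.
copied = send("Hello", "alice@example.com", bcc=["boss@example.com"])

print(plain["cc"])    # []
print(copied["bcc"])  # ['boss@example.com']
```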

We are responsible users

Although many tricks are allowed in Python, some of them are potentially dangerous. A good example is that any client code can override an object's properties and methods: there is no 'private' keyword in Python.

The main convention for private properties and implementation details is to prefix all "internals" with an underscore, e.g., sys._getframe.
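A minimal sketch with a hypothetical ClickCounter class: the leading underscore marks _count as an implementation detail, but nothing stops client code from touching it. The convention relies on users being responsible.

```python
class ClickCounter:
    def __init__(self):
        self._count = 0  # "internal": not part of the public interface

    def click(self):
        self._count += 1

    @property
    def count(self):
        """Public, read-only view of the internal state."""
        return self._count


counter = ClickCounter()
counter.click()
counter.click()
print(counter.count)   # 2

# Python does not prevent this; the underscore is only a signal.
counter._count = 100
print(counter.count)   # 100
```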

Return values from one place

When possible, keep a single exit point: it's difficult to debug functions when you first have to identify which return statement is responsible for your result.
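A sketch of the idea with a hypothetical classify() function: instead of returning from several branches, each branch assigns the result and the function returns it once at the end, so there is one place to inspect (or set a breakpoint on) the value being returned.

```python
def classify(temperature_c):
    """Return a label for a temperature in degrees Celsius."""
    if temperature_c < 0:
        label = "freezing"
    elif temperature_c < 20:
        label = "cool"
    else:
        label = "warm"
    # Single exit point: every path ends here.
    return label


print(classify(-5), classify(10), classify(25))  # freezing cool warm
```

Early returns for guard clauses (for example, rejecting invalid input at the top) are still common; the advice mainly concerns returning meaningful results from many places.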

Conventions

The book gives several examples of conventions, notably those covering list operations and checks for "logical equality"; see the original book for details.


In [37]:
a = [1, 2, 3, 4, 5]
b = [x * 2 for x in a]
c = list(map(lambda i: i + 2, a))
print(a)
print(b)
print(c)


[1, 2, 3, 4, 5]
[2, 4, 6, 8, 10]
[3, 4, 5, 6, 7]
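The logical-equality conventions mentioned above can be sketched like this (describe() and its arguments are illustrative): rely on truthiness instead of comparing with True or checking len(...) == 0, and compare with None using is, since None is a singleton.

```python
def describe(enabled, items, name):
    """Collect messages using the idiomatic truth tests."""
    messages = []
    if enabled:            # rather than: if enabled == True:
        messages.append("enabled")
    if not items:          # rather than: if len(items) == 0:
        messages.append("no items")
    if name is None:       # rather than: if name == None:
        messages.append("no name")
    return messages


print(describe(True, [], None))        # ['enabled', 'no items', 'no name']
print(describe(False, [1, 2], "Bob"))  # []
```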

Idioms

Good idioms must be consciously acquired.

Unpacking

If you know the length of a list or tuple, you can assign names to its elements with unpacking:


In [31]:
filename, ext = "my_photo.orig.png".rsplit(".", 1)
print(filename, 'is a', ext, 'file.')


my_photo.orig is a png file.

We can use unpacking to swap variables as well:


In [32]:
a = 1
b = 2
a, b = b, a
print('a is', a)
print('b is', b)


a is 2
b is 1

Nested unpacking works too:


In [33]:
a, (b, c) = 1, (2, 3)
print(a)
print(b)
print(c)


1
2
3

In Python 3, a new method of extended unpacking was introduced by PEP 3132:


In [36]:
a, *rest = [1, 2, 3]
x, *y, z = [5, 28, 33, 4]
print(a)
print(rest)
print(x)
print(y)
print(z)


1
[2, 3]
5
[28, 33]
4

Ignoring a value

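When unpacking, some of the values may never be used in the program. In that case you can assign them to a double underscore (__). The book prefers __ over the more common single underscore _ because _ is often used as an alias for gettext() and, at the interactive prompt, holds the value of the last operation.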


In [42]:
filename = 'foobar.txt'
basename, __, ext = filename.rpartition('.')
print(basename)
print(__)
print(ext)


foobar
.
txt

Creating a length-N list of the same thing

Use the Python list * operator to make a list of the same immutable item:


In [43]:
four_nones = [None] * 4
print(four_nones)


[None, None, None, None]

The * operator builds a list that holds N references to the same object, which is harmless for an immutable item such as None. But be careful with mutable objects: because lists are mutable, the * operator will create a list of N references to the same list, which is not likely what you want. Instead, use a list comprehension.


In [67]:
four_lists = [[1]] * 4
print(four_lists)
four_lists[0].append('Ni')
print(four_lists)


[[1], [1], [1], [1]]
[[1, 'Ni'], [1, 'Ni'], [1, 'Ni'], [1, 'Ni']]

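As the code above shows, although we only meant to change the first element, 'Ni' was appended to all four. That is not what we want; to change only the first element, use the following code instead: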


In [73]:
four_lists = [[1] for __ in range(4)]
print(four_lists)
four_lists[0].append('Ni')
print(four_lists)


[[1], [1], [1], [1]]
[[1, 'Ni'], [1], [1], [1]]

A common idiom for creating strings is to call str.join() on an empty string. This idiom works with both lists and tuples:


In [75]:
letters = ['l', 'e', 't', 't', 'e', 'r']
word = ''.join(letters)
print(word)


letter

In [76]:
letters = ('l', 'e', 't')
word = ''.join(letters)
print(word)


let

Sometimes we need to search through a collection of things. Let's compare two options: lists and sets.


In [77]:
x = list(('foo', 'foo', 'bar', 'baz'))
y = set(('foo', 'foo', 'bar', 'baz'))

print(x)
print(y)


['foo', 'foo', 'bar', 'baz']
{'foo', 'baz', 'bar'}

In [78]:
'foo' in x


Out[78]:
True

In [80]:
'foo' in y


Out[80]:
True

Although the last two boolean tests look the same, 'foo' in y takes advantage of the fact that sets (and dictionaries) in Python are hash tables, so the lookup performance of the two examples is different. Python has to step through each item in the list to find a match, which is time-consuming for large collections, whereas a key in a set can be found quickly with a hash lookup. Also, sets and dictionaries drop duplicate entries, which is why dictionaries cannot have two identical keys.
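To make the performance difference concrete, here is a rough timing sketch using the standard timeit module. The collection size and repeat count are arbitrary choices, and the absolute timings depend on the machine, so none are shown; searching for a missing value is the worst case for the list.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present: the list must check every element

# Both membership tests give the same answer...
print(missing in as_list, missing in as_set)  # False False

# ...but the list scans all n items each time, while the set does a hash lookup.
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```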

Exception-safe contexts

I don't fully understand this part yet, so I have not taken notes on it. I will fill it in when I reread it. (2018 May 7th)
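As a placeholder until that section is written up, a minimal sketch of the general idea: a with block guarantees that cleanup code runs even if an exception is raised inside it (with open(...) works the same way for files). The example uses contextlib.contextmanager from the standard library; the managed resource, a list that records events, is purely illustrative.

```python
from contextlib import contextmanager

events = []


@contextmanager
def managed_resource(name):
    """Hypothetical resource: log acquisition and guarantee release."""
    events.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs whether the with-block finishes normally or raises.
        events.append(f"close {name}")


try:
    with managed_resource("demo") as res:
        events.append(f"use {res}")
        raise RuntimeError("something went wrong")
except RuntimeError:
    events.append("error handled")

print(events)
# ['open demo', 'use demo', 'close demo', 'error handled']
```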


Common Gotchas

Although Python tries to be a clean, consistent language that avoids surprises, a few things can still look quite surprising to beginners.

Mutable default arguments

What surprises beginners most is the way Python treats mutable default arguments in function definitions.

For example:


In [7]:
def append_to(element, to=[]):
    to.append(element)
    return to
    
my_list = append_to(10)
print(my_list)
your_list = append_to(12)

print(my_list)
print(your_list)


[10]
[10, 12]
[10, 12]

This result surprises beginners and people used to C or Fortran alike, because they would probably expect:

[10]
[10]
[12]

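Many of us (including my former self) assume that a new list is created every time the function is called, unless a second argument is passed for to. In fact, Python evaluates a default argument only once, when the def statement is executed, not each time the function is called, so the same list is reused by every later call that omits to. In the code above, after the first call the list contains only 10; the second call reuses that same list and appends 12 to it. Since my_list and your_list both refer to this one list, both print [10, 12].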

In other words, if a function mutates a mutable default argument, the change persists and is visible in all later calls.
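You can check this directly: both results are the same object, and that object is stored on the function itself (in its __defaults__ tuple), having been created once when the function was defined. The sketch repeats the append_to function from above.

```python
def append_to(element, to=[]):
    to.append(element)
    return to


my_list = append_to(10)
your_list = append_to(12)

# Both names refer to the very same list object...
print(my_list is your_list)     # True
# ...which lives on the function, created once at definition time.
print(append_to.__defaults__)   # ([10, 12],)
```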

To avoid this, we can do the following:

Create a new object each time the function is called, by using a default arg to signal that no argument was provided (None is often a good choice):


In [15]:
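def append_to(element, to=None):
    # Diagnostic only: show whether the caller omitted `to`.
    b = to is None
    print(b)
    # None signals "no list was passed", so build a fresh list on each such call.
    if to is None:
        to = []
    to.append(element)
    return to

mylist = append_to(10)
print(mylist)
yourlist = append_to(12, [2])
print(mylist)
print(yourlist)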


True
[10]
False
[10]
[2, 12]

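In the code above, the first call omits the second argument, so to is None (the diagnostic print shows True), a fresh empty list is created, and mylist is [10]. The second call passes [2] as the second argument (the print shows False), so 12 is appended to that list and yourlist is [2, 12]. Because the first call's list was never shared, mylist is still [10].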

When this gotcha isn't a gotcha:

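Sometimes you can deliberately "exploit" this behavior to maintain state between calls of a function. This is often done when writing a caching function (which stores results in memory): because the default dict is created once and shared by every call, a result stored on one call is still available on the next, so repeated calls with the same arguments can skip the expensive work. For example: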


In [ ]:
def time_consuming_function(x, y, cache={}):
    args = (x, y)
    if args in cache:
        return cache[args]
    # Otherwise this is the first time with these arguments:
    # do the time-consuming operation (x ** y is only a placeholder).
    result = x ** y
    cache[args] = result
    return result
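To see the cache at work, here is a variant with a hypothetical call counter (slow_square and calls are illustrative names, not from the book). For comparison, the standard library's functools.lru_cache decorator provides the same memoisation without relying on a mutable default argument, and is usually the clearer choice.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive branch runs


def slow_square(x, cache={}):
    if x in cache:
        return cache[x]
    calls["count"] += 1      # stands in for an expensive computation
    result = x * x
    cache[x] = result
    return result


slow_square(4)
slow_square(4)               # served from the shared default dict
print(calls["count"])        # 1


@lru_cache(maxsize=None)
def fast_square(x):
    return x * x


fast_square(4)
fast_square(4)
print(fast_square.cache_info().hits)  # 1
```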

Late binding closures

Another common source of confusion is the way Python binds its variables in closures (or in the surrounding global scope).

Let's start with an example:


In [81]:
def create_multipliers():
    return [lambda x: i * x for i in range(5)]


for multiplier in create_multipliers():
    print(multiplier(2), end="...")
print()
# You would expect the result to be 0...2...4...6...8...
# but what you actually get is 8...8...8...8...8...
# For more on late-binding closures, see this Stack Overflow discussion:
# https://stackoverflow.com/questions/36463498/late-binding-python-closures


8...8...8...8...8...

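Such a result is surprising for Python beginners. Why does it happen? Python's closures are late binding: the values of variables used in closures are looked up at the time the inner function is called, not when it is defined. Every lambda here refers to the same variable i from the enclosing scope; by the time any of them is called, the loop has finished and i is 4, so each one returns 4 * 2 = 8.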


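The same thing happens with an ordinary def, so the behavior has nothing to do with lambda:


In [ ]:
def create_multipliers():
    multipliers = []

    for i in range(5):
        def multiplier(x):
            return i * x
        multipliers.append(multiplier)
    return multipliers


for multiplier in create_multipliers():
    print(multiplier(2), end="...")
print()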

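What we should do instead: the most general solution is arguably a bit of a hack. Because default arguments are evaluated once, when the function is defined, we can create a closure that binds immediately to its arguments by using a default argument: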


In [94]:
def create_multipliers():
    return [lambda x, i=i: i * x for i in range(5)]

Alternatively, we can use the functools.partial() function:


In [96]:
from functools import partial
from operator import mul


def create_multipliers():
    return [partial(mul, i) for i in range(5)]
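A quick check that both fixes produce the expected 0, 2, 4, 6, 8. The two functions are renamed here (create_multipliers_default and create_multipliers_partial, illustrative names) only so they can coexist in one snippet.

```python
from functools import partial
from operator import mul


def create_multipliers_default():
    # i=i evaluates i when each lambda is created (early binding).
    return [lambda x, i=i: i * x for i in range(5)]


def create_multipliers_partial():
    # partial(mul, i) freezes the current value of i as mul's first argument.
    return [partial(mul, i) for i in range(5)]


print([m(2) for m in create_multipliers_default()])  # [0, 2, 4, 6, 8]
print([m(2) for m in create_multipliers_partial()])  # [0, 2, 4, 6, 8]
```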


Structuring Your Project



In [2]:
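import unittest


def fun(x):
    return x + 1


class MyTest(unittest.TestCase):
    def test_that_fun_adds_one(self):
        self.assertEqual(fun(3), 4)


class MySecondTest(unittest.TestCase):
    def test_that_fun_fails_when_not_adding_number(self):
        # Adding 1 to a string raises TypeError, which is what we expect.
        self.assertRaises(TypeError, fun, 'multiply six by nine')


# In a notebook, pass a dummy argv and exit=False so unittest.main()
# ignores Jupyter's own arguments and does not stop the kernel.
unittest.main(argv=['ignored'], exit=False)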
