_ doesn't look like much, but as part of a name in Python it has a surprising number of different meanings.
We all know that we should use good names. This often makes it necessary to use more than one word to describe the thing. cheese grater is arguably a good name for a thing, but it consists of two words in English [1]. Like most programming languages, Python tokenizes source code along white space. So, for the sake of readability, the system needs to be tricked into believing that several words are one while still providing some kind of visual separation for the human reader. Thus constructs like CheeseGrater or cheese_grater were born - a.k.a. CamelCase and snake_case.
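The tokenization point can be checked with the tokenize module from the standard library. This is only a minimal sketch; the helper name names is made up for this illustration:

```python
import io
import tokenize


def names(source):
    """Return all NAME tokens the tokenizer finds in a piece of source text."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.NAME]


print(names("cheese grater"))   # ['cheese', 'grater']
print(names("cheese_grater"))   # ['cheese_grater']
print(names("CheeseGrater"))    # ['CheeseGrater']
```

Two words separated by a space arrive as two separate names, while both naming styles are seen as a single name.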
The PEP-8 style guide is here to tell us - among other things - how to name things in a consistent manner.
! I personally prefer lowerCasedCamelCase for local names bound to data and snake_case for names bound to functions. Until 2018 I wasn't even technically violating PEP-8 when doing that. Since then I have been a self-confessed PEP-8 outlaw in personal projects and wherever I can get away with it.
[1] We wouldn't have a problem like that if we used German names, where this would be a käsehobel. Especially in German law, words like Vermögenszuordnungszuständigkeitsübertragungsverordnung are used unironically.
_: it needs a name, but I won't use it
Using a single underscore as the complete name can be seen as a crutch. In certain situations it might be necessary to assign a name due to the nature of the language, but we'd rather not give it a name (because we won't even be using it anyway).
_ as a parameter name
Here we know that the function will always be called with three positional arguments, but we don't need the second one:
In [ ]:
def spam(a, _, b):
    # The second positional argument is accepted but deliberately ignored.
    return a + b

spam(1, 2, 3)
There are often more elegant ways to deal with this depending on context, but sometimes this is still the easiest way to clearly communicate this.
To take one example and offer some unasked advice: think of a primitive home-baked plugin system that simply calls the client function with positional arguments. This could be improved by always using keyword arguments like this:
In [ ]:
def spam(a, b, **_):
    # Any additional keyword arguments end up in the ignored dict named _.
    return a + b

spam(c=10, a=2, b=3)
This way the client function can collect whatever arrives in an un(derscore)named catch-all dict, which also gives it more flexibility regarding future API changes.
What might be even better, though, is a plugin system that inspects the client function and calls it only with the requested parameters, making this particular underscore crutch completely unnecessary.
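How such an inspecting plugin system could look is sketched below. This is only one possible approach based on inspect.signature from the standard library; the function name call_plugin is made up for this illustration, and the sketch assumes that the client function can receive all of its parameters as keyword arguments:

```python
import inspect


def call_plugin(func, /, **available):
    """Call func with only those keyword arguments it actually declares.

    Hypothetical helper for illustration, not part of any real plugin system.
    """
    params = inspect.signature(func).parameters
    # A client with a **kwargs style catch-all simply gets everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**available)
    # Otherwise pass only what the client asked for by name.
    wanted = {name: value for name, value in available.items() if name in params}
    return func(**wanted)


def spam(a, b):
    return a + b


print(call_plugin(spam, a=2, b=3, c=10))  # 5
```

The client function no longer needs a catch-all parameter, because the unexpected c is filtered out before the call.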
In [ ]:
left, *_, right = (0, 1, 2, 3, 4)
left, right
Also not uncommon: needing to repeat something a certain number of times without needing the iteration variable:
In [ ]:
for _ in range(3):
    print("hi", end="")
_ in an interactive shell
There is one case where _ can and should be used but is not manually assigned.
When you type python on the command line, you enter the so-called REPL. The underscore gains a magic functionality here: it always holds the result of the last evaluation.
!!! The IPython interactive shell takes this two steps further - it provides the last three evaluation results in _, __ and ___.
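The mechanism behind this can be poked at from a normal script. In the plain python REPL, every expression result is handed to sys.displayhook, and the default hook stores the value as _ in the builtins module. The sketch below calls the original hook by hand via sys.__displayhook__, because IPython and Jupyter install their own hook:

```python
import builtins
import sys

# Simulate what the plain REPL does after evaluating the expression 6 * 7:
# the default hook prints the repr and stores the value in builtins._
sys.__displayhook__(6 * 7)
print(builtins._)  # 42

# A result of None is ignored and leaves the stored value untouched.
sys.__displayhook__(None)
print(builtins._)  # 42
```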
In [ ]:
_private_name = "Please don't access me from outside"
This also has one practical implication when used together with the star import: names that start with an underscore are not imported in that case (unless the module lists them explicitly in __all__).
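The star import behaviour can be tried out without creating any files. The following sketch builds a throwaway module in memory; the module name demo_mod and the name public_name are made up for this illustration:

```python
import sys
import types

# Build a small module object by hand and register it so it can be imported.
demo_mod = types.ModuleType("demo_mod")
demo_mod.public_name = "import me"
demo_mod._private_name = "Please don't access me from outside"
sys.modules["demo_mod"] = demo_mod

# Run the star import in a fresh namespace so we can inspect the result.
namespace = {}
exec("from demo_mod import *", namespace)

print("public_name" in namespace)    # True
print("_private_name" in namespace)  # False
```

The private name is still reachable as demo_mod._private_name; the leading underscore only keeps it out of the star import.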
_ as postfix: avoid shadowing of names
Names in Python can be freely reassigned. This is handy, but also a source of confusion and bugs. For that reason, static code analyzers warn you when you reassign the name of a builtin (e.g. id) or a name already defined in an outer scope. If I am determined to use such a name, I simply add an underscore like id_ - which means: I know that this is shadowing an already defined name, but I still want to use it, so I mangle it just enough to be different. I am not sure how common this practice is, but I am pretty sure I didn't come up with it myself (PEP-8 describes the same trailing underscore convention for avoiding clashes with keywords, e.g. class_).
In [ ]:
id_ = 123456
print(f"{id(id_)=}")
This is veering off the original topic a bit, but I just want to mention that whatever you do, the original object a builtin name points to is never lost, just shadowed. When Python cannot find a name in the module's global namespace, it falls back to the namespace of the builtins module, so a module-level assignment merely hides the builtin. (While we are talking about builtins in the context of underscores: the __builtins__ attribute is already available in the module namespace, but it is not recommended to use it directly as it is a CPython implementation detail.) The objects can still be retrieved from builtins whenever necessary:
In [ ]:
print = 2
try:
    print("This won't work!")
except TypeError:
    # The name print is shadowed, but the original still lives in builtins.
    from builtins import print as real_print
    real_print("Told you so ...")
In [ ]:
print = real_print
print("Now all is fine again.")
In [3]:
class A:
    __spam = "SPAM"

    def print_spam(self):
        print(f"{self.__spam=}, {id(self)=}")

a = A()
a.print_spam()
Up to this point there is nothing unusual about this. When I try to access the attribute from outside, though, the behaviour is different from when it is accessed from inside the object, although a and self are the exact same object (as can be seen from the printed id):
In [4]:
print(f"{id(a)=}")
a.__spam
To access the original attribute, I have to know the secret name mangling formula, which, simplified, is _<class name><attribute name>:
In [5]:
a._A__spam
Out[5]:
'SPAM'
The Python docs have this to say about it:
"Since there is a valid use-case for class-private members (namely to avoid name clashes of names with names defined by subclasses), there is limited support for such a mechanism, called name mangling. Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, as long as it occurs within the definition of a class.
Name mangling is helpful for letting subclasses override methods without breaking intraclass method calls."
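The last sentence of that quote can be illustrated with a small sketch. The class names Base and Child and the method name __setup are made up for this example:

```python
class Base:
    def __init__(self):
        # Inside the class body this call is compiled as self._Base__setup()
        self.__setup()

    def __setup(self):
        self.source = "Base"


class Child(Base):
    # Stored as _Child__setup, so it cannot clash with Base's private method.
    def __setup(self):
        self.source = "Child"


child = Child()
print(child.source)  # Base
print(hasattr(child, "_Base__setup"), hasattr(child, "_Child__setup"))  # True True
```

Although Child defines a method that looks identical by name, the call inside Base.__init__ still reaches the method from Base.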
__magic__: part of the language mechanics
When something starts and ends with a double underscore, things get really serious. Special methods and special attributes are everywhere, and they are a big part of Python's special sauce.
It starts with simple module attributes that reveal information about state and internals (e.g. __name__ and __file__) and ends with protocols that can be used to hook into the language mechanics, e.g. creating a context manager by implementing __enter__ and __exit__ on an object.
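As a minimal sketch of the context manager protocol mentioned above (the class name Announcer is made up for this illustration), the with statement calls __enter__ before the block and __exit__ after it:

```python
class Announcer:
    """Records in which order the with statement calls the special methods."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what gets bound by "as announcer"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not swallow exceptions raised inside the block


with Announcer() as announcer:
    announcer.events.append("body")

print(announcer.events)  # ['enter', 'body', 'exit']
```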
I won't go into more details here as this would far exceed the scope of this innocent little article about the underscore, but I'll leave a warning here:
!!!! This kind of naming scheme should be regarded as strictly Python internal. You usually don't want to use this for names in your own programs (though, as usual, there are exceptions to that rule [1]).
Enough underscores for today, I would say. If you know of other conventions and uses of the underscore, please let me know.
[1] See e.g. pytest using __tracebackhide__ as a function local name in order to instruct pytest to hide a function from a traceback.