Timing

Quickly time a single line.


In [1]:
import math
import ubelt as ub
timer = ub.Timer('Timer demo!', verbose=1)
with timer:
    math.factorial(100000)


tic('Timer demo!')
...toc('Timer demo!')=0.1446s
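
For readers who want to see the idea without ubelt, a context-manager timer can be approximated with the standard library. This is a minimal sketch, not ubelt's implementation; the name `simple_timer` and its output format are assumptions made for illustration.

```python
import math
import time
from contextlib import contextmanager


@contextmanager
def simple_timer(label):
    # Hypothetical helper: records wall-clock time for the enclosed block.
    result = {'label': label, 'elapsed': None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Store and report the elapsed time even if the block raised.
        result['elapsed'] = time.perf_counter() - start
        print('{}: {:.4f}s'.format(label, result['elapsed']))


with simple_timer('factorial') as timing:
    math.factorial(10000)
```

The elapsed time stays available in `timing['elapsed']` after the block exits, which is convenient when the number is needed for more than a printed message.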

Robust Timing and Benchmarking

Easily do robust timings on existing blocks of code by simply indenting them. The quick and dirty way just requires one indent.


In [2]:
import math
import ubelt as ub
for _ in ub.Timerit(num=200, verbose=3):
    math.factorial(10000)


Timing for: 200 loops, best of 3
Timed for: 200 loops, best of 3
    body took: 473.4 ms
    time per loop: best=1.938 ms, mean=2.208 ± 0.43 ms
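
A rough standard-library analogue of repeated timing is `timeit.repeat`. The sketch below is an assumption-laden comparison point rather than a description of how Timerit works internally: it runs three trials of 200 loops each and reports the best and mean per-loop time.

```python
import math
import statistics
import timeit

# Run 3 trials of 200 loops each; timeit.repeat returns the total
# number of seconds taken by each trial.
num_loops = 200
trials = timeit.repeat(lambda: math.factorial(1000), repeat=3, number=num_loops)

# Convert each trial total into a per-loop time.
per_loop = [total / num_loops for total in trials]
best = min(per_loop)
mean = statistics.mean(per_loop)
print('best={:.3e} s, mean={:.3e} s per loop'.format(best, mean))
```

Reporting the minimum alongside the mean helps separate the cost of the code itself from noise introduced by other processes on the machine.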

Loop Progress

ProgIter is a (mostly) drop-in alternative to tqdm (https://pypi.python.org/pypi/tqdm). The advantage of ProgIter is that it does not use any Python threading, and can therefore be safer in code that makes heavy use of multiprocessing.

Note: ProgIter is now a standalone module (pip install progiter).


In [3]:
import ubelt as ub
import math
for n in ub.ProgIter(range(7500)):
    math.factorial(n)


 7500/7500... rate=2151.77 Hz, eta=0:00:00, total=0:00:03, wall=22:29 EST

In [4]:
import ubelt as ub
import math
for n in ub.ProgIter(range(7500), freq=2, adjust=False):
    math.factorial(n)

# Note that forcing freq=2 all the time comes at a performance cost
# The default adjustment algorithm causes almost no overhead


 7500/7500... rate=560.16 Hz, eta=0:00:00, total=0:00:06, wall=22:29 EST

In [5]:
>>> import ubelt as ub
>>> def is_prime(n):
...     return n >= 2 and not any(n % i == 0 for i in range(2, n))
>>> for n in ub.ProgIter(range(1000), verbose=2):
...     # do some work
...     is_prime(n)


    0/1000... rate=0 Hz, eta=?, total=0:00:00, wall=22:29 EST
    1/1000... rate=109950.53 Hz, eta=0:00:00, total=0:00:00, wall=22:29 EST
  257/1000... rate=209392.86 Hz, eta=0:00:00, total=0:00:00, wall=22:29 EST
  642/1000... rate=142079.56 Hz, eta=0:00:00, total=0:00:00, wall=22:29 EST
 1000/1000... rate=105135.94 Hz, eta=0:00:00, total=0:00:00, wall=22:29 EST
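
To illustrate what thread-free progress reporting can look like, here is a minimal sketch of a generator that writes a status line every `freq` iterations. It is not ProgIter's algorithm (in particular, it does not adapt the reporting frequency), and the `progress` function and its parameters are hypothetical.

```python
import io
import time


def progress(iterable, total, freq, stream):
    # Hypothetical helper: yields items unchanged and writes a status
    # line to `stream` every `freq` items, all in the calling thread.
    start = time.perf_counter()
    for index, item in enumerate(iterable, start=1):
        yield item
        if index % freq == 0 or index == total:
            elapsed = time.perf_counter() - start
            rate = index / elapsed if elapsed > 0 else float('inf')
            stream.write('\r{}/{}... rate={:.2f} Hz'.format(index, total, rate))
    stream.write('\n')


# Pass sys.stdout as the stream for a live display; a StringIO buffer
# is used here so the report can be inspected afterwards.
seen = []
stream = io.StringIO()
for n in progress(range(250), total=250, freq=100, stream=stream):
    seen.append(n)
report = stream.getvalue()
```

Because the report is written from inside the loop itself, a small `freq` adds overhead to every iteration, which is the cost the freq=2 example above is pointing at.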

Caching

Cache intermediate results in a script with minimal boilerplate.


In [6]:
import ubelt as ub
cfgstr = 'repr-of-params-that-uniquely-determine-the-process'
cacher = ub.Cacher('test_process', cfgstr)
data = cacher.tryload()
if data is None:
    myvar1 = 'result of expensive process'
    myvar2 = 'another result'
    data = myvar1, myvar2
    cacher.save(data)
myvar1, myvar2 = data
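
The try-load-then-save pattern above can be sketched with only the standard library. This is an illustration of the general idea, not of how ub.Cacher stores its data; the `load_or_compute` helper, the file naming scheme, and the use of pickle are all assumptions.

```python
import hashlib
import os
import pickle
import tempfile


def load_or_compute(name, cfgstr, compute, cache_dir):
    # Hypothetical helper: the cache file name combines a label with a
    # hash of the configuration string, so new parameters miss the cache.
    key = hashlib.sha1(cfgstr.encode('utf8')).hexdigest()[:16]
    fpath = os.path.join(cache_dir, '{}_{}.pkl'.format(name, key))
    if os.path.exists(fpath):
        with open(fpath, 'rb') as file:
            return pickle.load(file)
    data = compute()
    with open(fpath, 'wb') as file:
        pickle.dump(data, file)
    return data


calls = []


def expensive():
    # Track how often the expensive step actually runs.
    calls.append(1)
    return 'result of expensive process', 'another result'


cache_dir = tempfile.mkdtemp()
first = load_or_compute('test_process', 'lr=0.01', expensive, cache_dir)
second = load_or_compute('test_process', 'lr=0.01', expensive, cache_dir)
```

The second call finds the file written by the first, so `expensive` runs only once.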

Hashing

The ub.hash_data function constructs a hash corresponding to a (mostly) arbitrary ordered Python object. A common use case is to construct the cfgstr mentioned in the example for ub.Cacher. Besides returning a hex string, ub.hash_data can encode the hash digest using the 26 lowercase letters of the Roman alphabet (base='abc', as in the second example below). This makes the result easy to use as a filename suffix.


In [7]:
import ubelt as ub
data = [('arg1', 5), ('lr', .01), ('augmenters', ['flip', 'translate'])]
ub.hash_data(data)


Out[7]:
'5f5fda5e8257a95ffc715e892f981202d88c324d0765a0a05cc9cf0b5303b32c38c4f6c38257989a90ba0708a21e7ea1611891bb0df8c714fd43b2ef5d09f6d8'

In [8]:
import ubelt as ub
data = [('arg1', 5), ('lr', .01), ('augmenters', ['flip', 'translate'])]
ub.hash_data(data, hasher='sha512', base='abc')


Out[8]:
'hpwwtvadnjcbcqwnkszdwokpdvpobngaeyaezlhjxdnbomfmylfhzwvujojiufnkmvpeyavayebrvggzjecbyqyuomglxwklwvcldjqoiofqu'
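
The relationship between a hex digest and a letters-only digest can be sketched by re-encoding the same integer in base 26. This is an illustrative encoding only; the `to_base26` helper is hypothetical and is not guaranteed to reproduce the exact string that ub.hash_data returns.

```python
import hashlib
import string


def to_base26(hex_digest):
    # Interpret the hex digest as an integer, then emit base-26 digits
    # using the letters 'a'..'z', most significant digit first.
    alphabet = string.ascii_lowercase
    number = int(hex_digest, 16)
    if number == 0:
        return alphabet[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 26)
        digits.append(alphabet[remainder])
    return ''.join(reversed(digits))


hex_digest = hashlib.sha512(b'some serialized parameters').hexdigest()
suffix = to_base26(hex_digest)
print(suffix)
```

A larger alphabet carries more information per character, so the letters-only form is shorter than the hex form of the same digest.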

Command Line Interaction

The builtin Python subprocess.Popen class is great, but it can be a bit clunky at times. The os.system function is easy to use, but it doesn't have much flexibility. The ub.cmd function aims to fix this. It is as simple to run as os.system, but it returns a dictionary containing the return code, standard out, standard error, and the Popen object used under the hood.


In [9]:
import ubelt as ub
info = ub.cmd('cmake --version')
# Quickly inspect and parse the output of a command
print(info['out'])


cmake version 3.11.0-rc2

CMake suite maintained and supported by Kitware (kitware.com/cmake).


In [10]:
# The info dict contains other useful data
print(ub.repr2({k: v for k, v in info.items() if 'out' != k}))


{
    'command': 'cmake --version',
    'err': '',
    'proc': <subprocess.Popen object at 0x7f1b36af80f0>,
    'ret': 0,
}

In [11]:
# It is also possible to simultaneously capture and display output in real time
info = ub.cmd('cmake --version', tee=1)


cmake version 3.11.0-rc2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

In [12]:
# tee=True is equivalent to using verbose=1, but there is also verbose=2
info = ub.cmd('cmake --version', verbose=2)


[ubelt.cmd] joncrall@calculex:~/Dropbox$ cmake --version
cmake version 3.11.0-rc2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

In [13]:
# and verbose=3
info = ub.cmd('cmake --version', verbose=3)


┌─── START CMD ───
[ubelt.cmd] joncrall@calculex:~/Dropbox$ cmake --version
cmake version 3.11.0-rc2

CMake suite maintained and supported by Kitware (kitware.com/cmake).
└─── END CMD ───
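
For comparison, a dictionary with similar keys can be assembled from `subprocess.run` directly. The sketch below is not how ub.cmd is implemented, and the `run_command` helper is hypothetical; it only shows the kind of boilerplate that a wrapper like this saves.

```python
import subprocess
import sys


def run_command(args):
    # Hypothetical helper: run a command, capture its output as text,
    # and return a dictionary with keys similar to those shown above.
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str instead of bytes
    )
    return {
        'command': args,
        'out': proc.stdout,
        'err': proc.stderr,
        'ret': proc.returncode,
    }


# Use the running Python interpreter so the example works everywhere.
info = run_command([sys.executable, '-c', 'print("hello")'])
print(info['out'])
```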

Cross-Platform Resource and Cache Directories

If you have an application which writes configuration or cache files, the standard place to put those files differs depending on whether you are on Windows, Linux, or Mac. Ubelt offers unified functions for determining what these paths are.

The ub.ensure_app_cache_dir and ub.ensure_app_resource_dir functions find the correct platform-specific location for these files and ensure that the directories exist. (Note: replacing "ensure" with "get" will simply return the path without ensuring that it exists.)

The resource root directory is ~/AppData/Roaming on Windows, ~/.config on Linux, and ~/Library/Application Support on Mac. The cache root directory is ~/AppData/Local on Windows, ~/.cache on Linux, and ~/Library/Caches on Mac.


In [14]:
import ubelt as ub
print(ub.shrinkuser(ub.ensure_app_cache_dir('my_app')))


~/.cache/my_app
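
The platform dispatch can be sketched in a few lines. The Windows and Mac roots are taken from the description above and the Linux root from the ~/.cache/my_app output; the `app_cache_root` and `app_cache_dir` helpers are hypothetical, and the real functions may consult environment variables that this sketch ignores.

```python
import os
import sys


def app_cache_root(platform=sys.platform):
    # Hypothetical helper: choose the cache root for the given platform.
    if platform.startswith('win'):
        return '~/AppData/Local'
    if platform == 'darwin':
        return '~/Library/Caches'
    return '~/.cache'


def app_cache_dir(appname, platform=sys.platform):
    # Expand '~' and append the application name; nothing is created.
    root = os.path.expanduser(app_cache_root(platform))
    return os.path.join(root, appname)


print(app_cache_dir('my_app'))
```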

Downloading Files

The function ub.download provides a simple interface to download a URL and save its data to a file.

The function ub.grabdata works similarly to ub.download, but whereas ub.download will always re-download the file, ub.grabdata will check if the file exists and only re-download it if it needs to.

New in version 0.4.0: both functions now accept the hash_prefix keyword argument, which if specified will check that the hash of the file matches the provided value. The hasher keyword argument can be used to change which hashing algorithm is used (it defaults to "sha512").


In [15]:
>>> import ubelt as ub
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = ub.download(url, verbose=0)
>>> print(ub.shrinkuser(fpath))


~/.cache/ubelt/rqwaDag.png

In [16]:
>>> import ubelt as ub
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = ub.grabdata(url, verbose=0, hash_prefix='944389a39')
>>> print(ub.shrinkuser(fpath))


~/.cache/ubelt/rqwaDag.png

In [17]:
try:
    ub.grabdata(url, verbose=0, hash_prefix='not-the-right-hash')
except Exception as ex:
    print('type(ex) = {!r}'.format(type(ex)))


hash_prefix = 'not-the-right-hash'
got = '944389a39dfb8fa9e3d075bc25416d56782093d5dca88a1f84cac16bf515fa12aeebbbebf91f1e31e8beb59468a7a5f3a69ab12ac1e3c1d1581e1ad9688b766f'
type(ex) = <class 'RuntimeError'>
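
The prefix check itself is easy to picture with hashlib. The following is a minimal sketch that assumes the check compares the start of the file's hex digest with the expected prefix; the `check_hash_prefix` helper is hypothetical, and a temporary file stands in for a real download.

```python
import hashlib
import os
import tempfile


def check_hash_prefix(fpath, hash_prefix, hasher='sha512'):
    # Hypothetical helper: hash the file in chunks and compare the start
    # of the hex digest against the expected prefix.
    digest = hashlib.new(hasher)
    with open(fpath, 'rb') as file:
        for chunk in iter(lambda: file.read(65536), b''):
            digest.update(chunk)
    got = digest.hexdigest()
    if not got.startswith(hash_prefix):
        raise RuntimeError('hash_prefix={!r}, got={!r}'.format(hash_prefix, got))
    return got


# Demonstrate on a small temporary file instead of a real download.
fd, fpath = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as file:
    file.write(b'example payload')

expected = hashlib.sha512(b'example payload').hexdigest()
got = check_hash_prefix(fpath, expected[:9])

try:
    check_hash_prefix(fpath, 'not-the-right-hash')
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
os.remove(fpath)
```

A short prefix is enough to catch accidental corruption or a changed file, while staying short enough to paste into a script.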

Dictionary Tools


In [18]:
import ubelt as ub
items    = ['ham',     'jam',   'spam',     'eggs',    'cheese', 'banana']
groupids = ['protein', 'fruit', 'protein',  'protein', 'dairy',  'fruit']
groups = ub.group_items(items, groupids)
print(ub.repr2(groups, nl=1))


{
    'dairy': ['cheese'],
    'fruit': ['jam', 'banana'],
    'protein': ['ham', 'spam', 'eggs'],
}

In [19]:
import ubelt as ub
items = [1, 2, 39, 900, 1232, 900, 1232, 2, 2, 2, 900]
ub.dict_hist(items)


Out[19]:
{1: 1, 2: 4, 39: 1, 900: 3, 1232: 2}
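
For orientation, the grouping and histogram examples above correspond roughly to the following standard-library code. This is a sketch for comparison, not a statement about how the ubelt functions are written.

```python
from collections import Counter, defaultdict

items = ['ham', 'jam', 'spam', 'eggs', 'cheese', 'banana']
groupids = ['protein', 'fruit', 'protein', 'protein', 'dairy', 'fruit']

# Group each item under its group id, preserving input order in a group.
groups = defaultdict(list)
for item, groupid in zip(items, groupids):
    groups[groupid].append(item)

# Count how many times each value occurs.
hist = Counter([1, 2, 39, 900, 1232, 900, 1232, 2, 2, 2, 900])

print(dict(groups))
print(dict(hist))
```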

In [20]:
import ubelt as ub
items = [0, 0, 1, 2, 3, 3, 0, 12, 2, 9]
ub.find_duplicates(items, k=2)


Out[20]:
{0: [0, 1, 6], 2: [3, 8], 3: [4, 5]}

In [21]:
import ubelt as ub
dict_ = {'K': 3, 'dcvs_clip_max': 0.2, 'p': 0.1}
subdict_ = ub.dict_subset(dict_, ['K', 'dcvs_clip_max'])
print(subdict_)


OrderedDict([('K', 3), ('dcvs_clip_max', 0.2)])

In [22]:
import ubelt as ub
dict_ = {1: 'a', 2: 'b', 3: 'c'}
print(list(ub.dict_take(dict_, [1, 2, 3, 4, 5], default=None)))


['a', 'b', 'c', None, None]

In [23]:
import ubelt as ub
dict_ = {'a': [1, 2, 3], 'b': []}
newdict = ub.map_vals(len, dict_)
print(newdict)


{'a': 3, 'b': 0}

In [24]:
import ubelt as ub
mapping = {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
ub.invert_dict(mapping)


Out[24]:
{'a': 0, 'b': 1, 'c': 2, 'd': 3}

In [25]:
import ubelt as ub
mapping = {'a': 0, 'A': 0, 'b': 1, 'c': 2, 'C': 2, 'd': 3}
ub.invert_dict(mapping, unique_vals=False)


Out[25]:
{0: {'A', 'a'}, 1: {'b'}, 2: {'C', 'c'}, 3: {'d'}}

AutoDict - Autovivification

While collections.defaultdict is nice, it is sometimes more convenient to have an infinitely nested dictionary of dictionaries.

(But be careful, you may start to write in Perl)


In [26]:
>>> import ubelt as ub
>>> auto = ub.AutoDict()
>>> print('auto = {!r}'.format(auto))
>>> auto[0][10][100] = None
>>> print('auto = {!r}'.format(auto))
>>> auto[0][1] = 'hello'
>>> print('auto = {!r}'.format(auto))


auto = {}
auto = {0: {10: {100: None}}}
auto = {0: {1: 'hello', 10: {100: None}}}
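
Autovivification can also be sketched with a recursive collections.defaultdict. This shows the concept only; the `autodict` and `to_plain` helpers are hypothetical and this is not how ub.AutoDict is defined.

```python
from collections import defaultdict


def autodict():
    # Each missing key creates another autodict, giving arbitrary nesting.
    return defaultdict(autodict)


def to_plain(obj):
    # Recursively convert nested defaultdicts into ordinary dicts.
    if isinstance(obj, defaultdict):
        return {key: to_plain(value) for key, value in obj.items()}
    return obj


auto = autodict()
auto[0][10][100] = None
auto[0][1] = 'hello'
plain = to_plain(auto)
print(plain)
```

One caveat of this approach is that merely reading a missing key creates it, so converting back to a plain dict before handing the data to other code avoids surprises.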

String-based imports

Ubelt contains functions to import modules dynamically without using the Python import statement. While importlib exists, the ubelt implementation is simpler to use and does not have the disadvantage of breaking pytest.

Note: ubelt simply provides an interface to this functionality; the core implementation is in xdoctest.


In [27]:
>>> import ubelt as ub
>>> module = ub.import_module_from_path(ub.truepath('~/code/ubelt/ubelt'))
>>> print('module = {!r}'.format(module))
>>> module = ub.import_module_from_name('ubelt')
>>> print('module = {!r}'.format(module))

>>> modpath = ub.util_import.__file__
>>> print(ub.modpath_to_modname(modpath))
>>> modname = ub.util_import.__name__
>>> assert ub.truepath(ub.modname_to_modpath(modname)) == modpath


module = <module 'ubelt' from '/home/joncrall/code/ubelt/ubelt/__init__.py'>
module = <module 'ubelt' from '/home/joncrall/code/ubelt/ubelt/__init__.py'>
ubelt.util_import
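
For comparison, this is roughly what the same two tasks look like with importlib alone. It is a sketch of the standard-library route, not of the ubelt or xdoctest implementation, and the module name `demo_mod` is an arbitrary placeholder.

```python
import importlib
import importlib.util
import os
import tempfile

# Import a module from its dotted name given as a string.
json_decoder = importlib.import_module('json.decoder')

# Import a module from a file path. A tiny module is written to a
# temporary directory first so the example is self-contained.
dpath = tempfile.mkdtemp()
modpath = os.path.join(dpath, 'demo_mod.py')
with open(modpath, 'w') as file:
    file.write('ANSWER = 42\n')

spec = importlib.util.spec_from_file_location('demo_mod', modpath)
demo_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(demo_mod)
print(demo_mod.ANSWER)
```

The path-based import takes three calls here, which gives a sense of why a single-function wrapper is attractive.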

Horizontal String Concatenation

Sometimes it's just prettier to horizontally concatenate two blocks of text.


In [28]:
>>> import ubelt as ub
>>> B = ub.repr2([[1, 2], [3, 4]], nl=1, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 8]], nl=1, cbr=True, trailsep=False)
>>> print(ub.hzcat(['A = ', B, ' * ', C]))


A = [[1, 2], * [[5, 6],
     [3, 4]]    [7, 8]]
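
The underlying idea is to pad every block to a common height and to its own maximum width, then join the blocks row by row. The sketch below is one possible way to do that and is not the ub.hzcat implementation; the `hzcat_sketch` helper is hypothetical.

```python
def hzcat_sketch(blocks):
    # Hypothetical helper: split each block into lines, pad every block
    # to a common height and to its own maximum width, then join rows.
    split = [block.split('\n') for block in blocks]
    height = max(len(lines) for lines in split)
    columns = []
    for lines in split:
        width = max(len(line) for line in lines)
        padded = [line.ljust(width) for line in lines]
        padded += [' ' * width] * (height - len(lines))
        columns.append(padded)
    # Strip trailing padding so rows do not end in whitespace.
    return '\n'.join(''.join(row).rstrip() for row in zip(*columns))


B = '[[1, 2],\n [3, 4]]'
C = '[[5, 6],\n [7, 8]]'
text = hzcat_sketch(['A = ', B, ' * ', C])
print(text)
```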