Title: Finding Your Most Used Python Libraries
Slug: most-used-libraries
Summary: A Simple Script To Scan Through A Given Directory, Finding Your Most Commonly Used Python Libraries
Date: 2018-03-11 13:49
Category: Misc
Tags: Misc
Authors: Thomas Pinder
This is by no means a polished piece. Rather, it is a quick and simple script that scans through my Documents directory and locates any files with the .py extension. For each file found, the script extracts the libraries imported in that file and keeps a running total for each library. The script could be improved by merging duplicates, as there are cases where the same library is loaded multiple times through different submodules. For example, from sklearn.linear_model import LinearRegression and from sklearn.metrics import confusion_matrix would be counted as two separate libraries (sklearn.linear_model and sklearn.metrics), when in fact both just load the sklearn library. For now this will do, though; maybe later down the line I will refine the script if I have time.
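One way to get the deduplication described above would be to keep only the part of each module path before the first dot, and to count each package at most once per file. The sketch below is a hypothetical tweak, not part of the original script: it assumes the imports have already been grouped by file into a dict, and the names imports_by_file, top_level_package and count_libraries_per_file are illustrative only.

```python
from collections import Counter


def top_level_package(module_name):
    """Reduce a dotted module path to its top-level package,
    e.g. 'sklearn.linear_model' -> 'sklearn'."""
    return module_name.split('.')[0]


def count_libraries_per_file(imports_by_file):
    """Count each top-level package at most once per file.

    imports_by_file is a hypothetical mapping of file path to the list
    of module names found on that file's import lines."""
    counts = Counter()
    for modules in imports_by_file.values():
        # Relative imports (e.g. '.utils') refer to the project itself, so skip them.
        # The set collapses repeats such as sklearn.linear_model + sklearn.metrics.
        counts.update({top_level_package(m) for m in modules
                       if not m.startswith('.')})
    return counts


if __name__ == '__main__':
    example = {
        'model.py': ['sklearn.linear_model', 'sklearn.metrics', 'numpy'],
        'plot.py': ['matplotlib.pyplot', 'numpy'],
    }
    print(count_libraries_per_file(example).most_common())
```

Counting once per file changes the question from "how many import lines mention this library?" to "how many of my files use this library?", which is arguably closer to what "most used" means here.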
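```python
import os
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

directory = '/home/tpin3694/Documents/'

# Recursively collect the full path of every file ending in .py
python_files = [os.path.join(root, name)
                for root, dirs, files in os.walk(directory)
                for name in files if name.endswith('.py')]
print('Found {} Python files\n'.format(len(python_files)))

libraries = []
error_count = 0
for file in python_files:
    try:
        # The with-block closes each file once it has been read
        with open(file, 'r', encoding='utf-8') as file_import:
            file_data = file_import.readlines()
    except UnicodeDecodeError:
        error_count += 1
        continue
    for line in file_data:
        tokens = line.split()
        # Keep only real import statements ("import x" / "from x import y");
        # a bare startswith('import') would also match names like "imports = ..."
        if len(tokens) > 1 and tokens[0] in ('import', 'from'):
            # Strip the trailing comma left by lines such as "import os, sys"
            libraries.append(tokens[1].rstrip(','))

if python_files:
    print('{}% of files raised encoding errors.\n'.format(
        round(100 * error_count / len(python_files), 2)))
library_counts = Counter(libraries)
```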
With the counts stored in a Counter object, let's now quickly print out the top fifteen libraries and their respective counts.
```python
print('Top 15 Libraries')
for label, count in library_counts.most_common(15):
    print('{}: {}'.format(label, count))
```
Nothing there seems particularly surprising. This was run on my laptop, so a lot of the scripts are statistical analyses or simple automations, and the libraries above are well suited to that. I'd be interested to run the same script on my GPU desktop, as that is where I run any neural networks and heavier machine learning jobs. I suspect PyTorch and TensorFlow would then start to appear near the top of the list.
Just to round things off, I will make a quick bar plot of the counts for the top 15 libraries.
```python
# Unzip the (library, count) pairs into two parallel tuples for plotting
labels, counts = zip(*library_counts.most_common(15))
plt.figure(figsize=(12, 8))
plt.bar(labels, counts)
plt.xticks(rotation=60);
```
There we go then; that seems to work quite well. As I said above, this is quite a rough-and-ready script, and I will consider tweaking it in the future to make it more accurate. For now, though, it does just fine.
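One possible direction for such a tweak, offered as a sketch rather than something the script above does, is to let Python's standard-library ast module parse each file instead of matching the start of each line. The sketch assumes a file's contents have already been read into a string, and the helper name imported_packages is hypothetical. Parsing picks up comma-separated imports and imports indented inside functions, skips relative imports, and ignores text inside strings and docstrings.

```python
import ast
from collections import Counter


def imported_packages(source):
    """Return the top-level package of every import in `source`,
    using Python's own parser instead of matching line prefixes."""
    packages = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os, sys" produces one alias per module
            packages.extend(alias.name.split('.')[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 marks a relative import such as "from . import utils"
            packages.append(node.module.split('.')[0])
    return packages


if __name__ == '__main__':
    sample = (
        "import os, sys\n"
        "from sklearn.linear_model import LinearRegression\n"
        "from . import utils\n"
        "def f():\n"
        "    import numpy as np\n"
        "text = 'import this is only a string'\n"
    )
    print(Counter(imported_packages(sample)))
```

One trade-off: ast.parse raises SyntaxError on files the running interpreter cannot parse, such as old Python 2 scripts, so those would need to be caught and counted in the same way as the files raising UnicodeDecodeError.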