In [13]:
import pandas as pd
=> a great variety!
Individual systems == individual problems => individual analyses => individual insights!
Thomas Zimmermann in "One size does not fit all":
But: "... the methods typically are applicable on different datasets." => we see what's possible!
"Statistics on a Mac."
Data Science Venn Diagram (Drew Conway)
=> Delivering credible insights based on facts.
=> Working out insights in a comprehensible way.
In [ ]:
"100" == max. popularity!
Not as far away as you might have thought!
=> from a question, via data, to insights!
Approach: Computational notebooks
https://www.feststelltaste.de/category/top5/
Courses, videos, blogs, books and more...
*Some pages are still under development.
Meta goal: Get to know the basic mechanics of the stack.
We load a Git log dataset extracted from a Git repository.
In [ ]:
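A minimal sketch of the loading step. The real export file is not shown here, so the CSV text and the column names (`sha`, `timestamp`, `author`, `filename`, `additions`, `deletions`) are assumptions chosen for illustration only.

```python
import io
import pandas as pd

# Hypothetical stand-in for the exported Git log; the column names and
# the rows are assumptions, not the talk's real dataset.
csv_text = """sha,timestamp,author,filename,additions,deletions
a1,2019-01-10 09:15:00,alice,src/Main.java,10,2
a1,2019-01-10 09:15:00,alice,README.md,1,0
b2,2019-02-03 17:40:00,bob,src/Main.java,4,4
"""

# read_csv accepts a file path or any file-like object
log = pd.read_csv(io.StringIO(csv_text))
print(log.head())
```

With a real export, the `io.StringIO(...)` argument would simply be replaced by the path to the file.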
We explore some basic key elements of the dataset.
In [ ]:
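One way to get such an overview is `DataFrame.info()` together with `shape`. The tiny DataFrame below is a hypothetical stand-in with assumed column names.

```python
import pandas as pd

# Hypothetical miniature Git log (assumed columns, made-up rows)
log = pd.DataFrame({
    "sha": ["a1", "a1", "b2"],
    "timestamp": ["2019-01-10 09:15:00", "2019-01-10 09:15:00", "2019-02-03 17:40:00"],
    "author": ["alice", "alice", "bob"],
    "filename": ["src/Main.java", "README.md", "src/Main.java"],
    "additions": [10, 1, 4],
    "deletions": [2, 0, 4],
})

# Prints column names, non-null counts and data types
log.info()

# shape is a (rows, columns) tuple; each column is a Series
n_rows, n_cols = log.shape
print(n_rows, "rows,", n_cols, "columns")
```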
1 DataFrame (~ programmable Excel worksheet), 6 Series (= columns), 1128819 rows (= entries)
We convert the timestamp text into real timestamp objects.
In [ ]:
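The conversion can be sketched with `pd.to_datetime`. The column name `timestamp` and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample: timestamps are still plain text after loading
log = pd.DataFrame({
    "sha": ["a1", "b2"],
    "timestamp": ["2019-01-10 09:15:00", "2019-02-03 17:40:00"],
})

# Parse the text into datetime values so we can compare and filter by time
log["timestamp"] = pd.to_datetime(log["timestamp"])
print(log["timestamp"].dt.year.tolist())
```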
We filter out older changes.
In [ ]:
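A sketch of the time filter using a boolean mask. The 90-day window is an assumption (the time span is not stated here), and the cutoff is measured from the newest change so that the example stays reproducible.

```python
import pandas as pd

# Hypothetical sample with one old and two recent changes
log = pd.DataFrame({
    "sha": ["z9", "a1", "b2"],
    "timestamp": pd.to_datetime(["2018-01-05", "2019-01-10", "2019-02-03"]),
    "filename": ["src/Old.java", "src/Main.java", "src/Main.java"],
})

# Assumed window: keep changes from the last 90 days before the newest change
cutoff = log["timestamp"].max() - pd.Timedelta(days=90)
recent = log[log["timestamp"] > cutoff]
print(recent)
```

For an analysis relative to the current date, `pd.Timestamp("today")` could take the place of `log["timestamp"].max()`.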
We keep only the code written in Java.
In [ ]:
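Filtering by file extension can be sketched with the string accessor `.str.endswith`. The column name `filename` and the sample paths are assumptions.

```python
import pandas as pd

# Hypothetical recent changes touching different kinds of files
recent = pd.DataFrame({
    "sha": ["a1", "a1", "b2", "c3"],
    "filename": ["src/Main.java", "README.md", "build.gradle", "src/Util.java"],
})

# Keep rows whose file name ends with ".java"; copy() gives an
# independent DataFrame instead of a view on the original
java = recent[recent["filename"].str.endswith(".java")].copy()
print(java)
```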
We aggregate the rows by counting the number of changes per file.
In [ ]:
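The aggregation can be sketched with `groupby` and `count`. Counting the `sha` column per file and renaming the result to `changes` is an illustrative choice, not necessarily the naming used in the talk.

```python
import pandas as pd

# Hypothetical Java-only changes: one row per changed file per commit
java = pd.DataFrame({
    "sha": ["a1", "b2", "c3"],
    "filename": ["src/Main.java", "src/Main.java", "src/Util.java"],
})

# Count how many commits touched each file
changes = (
    java.groupby("filename")[["sha"]]
    .count()
    .rename(columns={"sha": "changes"})
)
print(changes)
```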
We add information about the number of lines in all files that currently exist...
In [ ]:
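A sketch of loading per-file line counts. It assumes a CSV export in the style of a line-counting tool with the columns `language`, `filename`, `blank`, `comment` and `code`. Both the layout and the values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical per-file line counts (assumed layout and values)
loc_text = """language,filename,blank,comment,code
Java,src/Main.java,12,30,120
Java,src/Util.java,3,5,40
"""

# Use the file name as index so it can be matched against the change counts
loc = pd.read_csv(io.StringIO(loc_text), index_col="filename")
print(loc[["code"]])
```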
...and join this data with the existing dataset.
In [ ]:
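The join can be sketched with `DataFrame.join`, which matches rows by index (here: the file name). Both DataFrames are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical change counts per file (index = file name)
changes = pd.DataFrame(
    {"changes": [2, 1]},
    index=pd.Index(["src/Main.java", "src/Old.java"], name="filename"),
)

# Hypothetical line counts of the files that exist today
loc = pd.DataFrame(
    {"code": [120, 40]},
    index=pd.Index(["src/Main.java", "src/Util.java"], name="filename"),
)

# Default is a left join: every changed file is kept, and files that
# no longer exist get NaN in the "code" column
hotspots = changes.join(loc[["code"]])
print(hotspots)
```

Whether deleted files should stay in the result is a design choice; `how="inner"` would drop them.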
We show only the TOP 10 hotspots in the code.
In [ ]:
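Selecting the top entries can be sketched with `sort_values` and `head`. The column names `changes` and `code` and all values are assumptions.

```python
import pandas as pd

# Hypothetical joined data for twelve files
hotspots = pd.DataFrame(
    {"changes": range(1, 13), "code": [50 * n for n in range(1, 13)]},
    index=[f"src/File{n}.java" for n in range(1, 13)],
)

# Most frequently changed files first, then keep the first ten rows
top10 = hotspots.sort_values(by="changes", ascending=False).head(10)
print(top10)
```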
We plot the TOP 10 list as an XY diagram (scatter plot).
In [ ]:
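A sketch of the plot using the pandas plotting interface, which requires matplotlib to be installed. The data is hypothetical, and the `Agg` backend is selected only so that the example also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Hypothetical top-10 list: number of changes and lines of code per file
top10 = pd.DataFrame(
    {"changes": [12, 11, 10, 9, 8, 7, 6, 5, 4, 3],
     "code": [600, 550, 500, 450, 400, 350, 300, 250, 200, 150]},
    index=[f"src/File{n}.java" for n in range(12, 2, -1)],
)

# One point per file: x = number of changes, y = lines of code
ax = top10.plot.scatter(x="changes", y="code")
ax.figure.savefig("hotspots.png")
```

Files in the upper right corner of such a plot are both large and frequently changed, which makes them candidates for a closer look.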
=> from a question, via data, to insights!