Introduction

For a workshop for non-technical people, I need a dataset from outside the software development domain. But I also really like the datasets I have so far, especially the Linux kernel Git history, which has some nice characteristics.

So in this notebook, I take an existing dataset and transform it into one that non-technical people can understand. We'll look at how to pseudonymize names and shift time-based data.
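A minimal sketch of the pseudonymization step, assuming the log sits in a pandas DataFrame with an `author` column like the sample below. Each distinct name is replaced by a stable label. The `Person N` labels, the `pseudonymize` helper and the second author `Jane Doe` are hypothetical, only there to make the mapping visible.

```python
import pandas as pd

# Hypothetical sample: "Linus Torvalds" comes from the log excerpt,
# "Jane Doe" is made up for illustration
log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-12-31 14:47:43",
        "2017-12-31 13:13:56",
        "2017-12-31 13:03:05",
    ]),
    "author": ["Linus Torvalds", "Jane Doe", "Linus Torvalds"],
})


def pseudonymize(names: pd.Series, prefix: str = "Person") -> pd.Series:
    """Replace each distinct name with a stable label such as 'Person 1'."""
    # factorize assigns integer codes in order of first appearance,
    # so the same name always receives the same code
    codes, _ = pd.factorize(names)
    return pd.Series([f"{prefix} {code + 1}" for code in codes], index=names.index)


log["author"] = pseudonymize(log["author"])
print(log)
```

Because the labels follow the order of first appearance, someone who knows the original log could reverse the mapping; shuffling the distinct names before assigning labels would hide that order.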

So let's go!

Idea

The idea is to take the simple Git log output and transform it into a fun dataset. What we have are commit timestamps and the authors who committed something to the Linux kernel Git repository:

timestamp,author
2017-12-31 14:47:43,Linus Torvalds
2017-12-31 13:13:56,Linus Torvalds
2017-12-31 13:03:05,Linus Torvalds
2017-12-31 12:30:34,Linus Torvalds
2017-12-31 12:29:02,Linus Torvalds
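A minimal sketch for loading this excerpt into pandas, assuming the log is stored as CSV with exactly these two columns. The original file name isn't given, so the rows are inlined via `io.StringIO` to keep the example self-contained.

```python
import io

import pandas as pd

# The excerpt from above, inlined so the example runs without the original file
CSV = """timestamp,author
2017-12-31 14:47:43,Linus Torvalds
2017-12-31 13:13:56,Linus Torvalds
2017-12-31 13:03:05,Linus Torvalds
2017-12-31 12:30:34,Linus Torvalds
2017-12-31 12:29:02,Linus Torvalds
"""

# parse_dates turns the timestamp strings into datetime values,
# which makes date arithmetic such as shifting possible
log = pd.read_csv(io.StringIO(CSV), parse_dates=["timestamp"])
print(log.dtypes)
print(log["author"].value_counts())
```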

It's a really nice dataset for helping computer science students take their first steps with pandas and co. For non-technical people, we change the theme of the dataset to something more comprehensible and also more futuristic. This is the scenario I have in mind:

It's 2028: smart assistants are integrated into our bodies and can record certain events in their owner's life. In this dataset, you'll find each the curse
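For the 2028 scenario, the commit timestamps have to move forward in time. A minimal sketch, assuming a shift of exactly 574 weeks (4,018 days): that is the distance from 2017-12-31 to 2028-12-31, and because it is a whole number of weeks it keeps every event's weekday and time of day, so weekly rhythms in the data survive. The offset is an assumption, not necessarily the one this notebook ends up using.

```python
import pandas as pd

# Timestamps from the log excerpt above
timestamps = pd.to_datetime(pd.Series([
    "2017-12-31 14:47:43",
    "2017-12-31 13:13:56",
    "2017-12-31 13:03:05",
    "2017-12-31 12:30:34",
    "2017-12-31 12:29:02",
]))

# 574 weeks = 4,018 days = 2017-12-31 -> 2028-12-31 (11 years incl. 3 leap days).
# A whole number of weeks preserves weekday and time of day.
SHIFT = pd.Timedelta(weeks=574)

shifted = timestamps + SHIFT
print(shifted)
```

Adding `pd.DateOffset(years=11)` instead would keep the calendar dates, but since 2028 is a leap year, January and February dates would land one weekday earlier.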

Swear data


In [1]:
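from string import ascii_lowercase

import requests
from bs4 import BeautifulSoup

URL = "https://www.noswearing.com/dictionary/"

swear_words = []

# The dictionary has one page per letter: URL + "a", URL + "b", ...
for letter in ascii_lowercase:
    response = requests.get(URL + letter)
    response.raise_for_status()
    # Pass the decoded text: str(response.content) would produce the bytes
    # repr ("b'...'") with escaped characters instead of the actual HTML
    soup = BeautifulSoup(response.text, "lxml")
    # The code assumes the word list is the third <table> on each page and
    # that every entry is an <a> tag whose name attribute holds the word
    for tag in soup.find_all("table")[2].find_all("a"):
        if "name" in tag.attrs:
            swear_words.append(tag["name"])

swear_words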


Out[1]:
['anus',
 'arse',
 'arsehole',
 'ass',
 'ass-hat',
 'ass-jabber',
 'ass-pirate',
 'assbag',
 'assbandit',
 'assbanger',
 'assbite',
 'assclown',
 'asscock',
 'asscracker',
 'asses',
 'assface',
 'assfuck',
 'assfucker',
 'assgoblin',
 'asshat',
 'asshead',
 'asshole',
 'asshopper',
 'assjacker',
 'asslick',
 'asslicker',
 'assmonkey',
 'assmunch',
 'assmuncher',
 'assnigger',
 'asspirate',
 'assshit',
 'assshole',
 'asssucker',
 'asswad',
 'asswipe',
 'axwound',
 'bampot',
 'bastard',
 'beaner',
 'bitch',
 'bitchass',
 'bitches',
 'bitchtits',
 'bitchy',
 'blow job',
 'blowjob',
 'bollocks',
 'bollox',
 'boner',
 'brotherfucker',
 'bullshit',
 'bumblefuck',
 'butt plug',
 'butt-pirate',
 'buttfucka',
 'buttfucker',
 'camel toe',
 'carpetmuncher',
 'chesticle',
 'chinc',
 'chink',
 'choad',
 'chode',
 'clit',
 'clitface',
 'clitfuck',
 'clusterfuck',
 'cock',
 'cockass',
 'cockbite',
 'cockburger',
 'cockface',
 'cockfucker',
 'cockhead',
 'cockjockey',
 'cockknoker',
 'cockmaster',
 'cockmongler',
 'cockmongruel',
 'cockmonkey',
 'cockmuncher',
 'cocknose',
 'cocknugget',
 'cockshit',
 'cocksmith',
 'cocksmoke',
 'cocksmoker',
 'cocksniffer',
 'cocksucker',
 'cockwaffle',
 'coochie',
 'coochy',
 'coon',
 'cooter',
 'cracker',
 'cum',
 'cumbubble',
 'cumdumpster',
 'cumguzzler',
 'cumjockey',
 'cumslut',
 'cumtart',
 'cunnie',
 'cunnilingus',
 'cunt',
 'cuntass',
 'cuntface',
 'cunthole',
 'cuntlicker',
 'cuntrag',
 'cuntslut',
 'dago',
 'damn',
 'deggo',
 'dick',
 'dick-sneeze',
 'dickbag',
 'dickbeaters',
 'dickface',
 'dickfuck',
 'dickfucker',
 'dickhead',
 'dickhole',
 'dickjuice',
 'dickmilk',
 'dickmonger',
 'dicks',
 'dickslap',
 'dicksucker',
 'dicksucking',
 'dicktickler',
 'dickwad',
 'dickweasel',
 'dickweed',
 'dickwod',
 'dike',
 'dildo',
 'dipshit',
 'doochbag',
 'dookie',
 'douche',
 'douche-fag',
 'douchebag',
 'douchewaffle',
 'dumass',
 'dumb ass',
 'dumbass',
 'dumbfuck',
 'dumbshit',
 'dumshit',
 'dyke',
 'fag',
 'fagbag',
 'fagfucker',
 'faggit',
 'faggot',
 'faggotcock',
 'fagtard',
 'fatass',
 'fellatio',
 'feltch',
 'flamer',
 'fuck',
 'fuckass',
 'fuckbag',
 'fuckboy',
 'fuckbrain',
 'fuckbutt',
 'fuckbutter',
 'fucked',
 'fucker',
 'fuckersucker',
 'fuckface',
 'fuckhead',
 'fuckhole',
 'fuckin',
 'fucking',
 'fucknut',
 'fucknutt',
 'fuckoff',
 'fucks',
 'fuckstick',
 'fucktard',
 'fucktart',
 'fuckup',
 'fuckwad',
 'fuckwit',
 'fuckwitt',
 'fudgepacker',
 'gay',
 'gayass',
 'gaybob',
 'gaydo',
 'gayfuck',
 'gayfuckist',
 'gaylord',
 'gaytard',
 'gaywad',
 'goddamn',
 'goddamnit',
 'gooch',
 'gook',
 'gringo',
 'guido',
 'handjob',
 'hard on',
 'heeb',
 'hell',
 'ho',
 'hoe',
 'homo',
 'homodumbshit',
 'honkey',
 'humping',
 'jackass',
 'jagoff',
 'jap',
 'jerk off',
 'jerkass',
 'jigaboo',
 'jizz',
 'jungle bunny',
 'junglebunny',
 'kike',
 'kooch',
 'kootch',
 'kraut',
 'kunt',
 'kyke',
 'lameass',
 'lardass',
 'lesbian',
 'lesbo',
 'lezzie',
 'mcfagget',
 'mick',
 'minge',
 'mothafucka',
 "mothafuckin\\\\\\'",
 'motherfucker',
 'motherfucking',
 'muff',
 'muffdiver',
 'munging',
 'negro',
 'nigaboo',
 'nigga',
 'nigger',
 'niggers',
 'niglet',
 'nut sack',
 'nutsack',
 'paki',
 'panooch',
 'pecker',
 'peckerhead',
 'penis',
 'penisbanger',
 'penisfucker',
 'penispuffer',
 'piss',
 'pissed',
 'pissed off',
 'pissflaps',
 'polesmoker',
 'pollock',
 'poon',
 'poonani',
 'poonany',
 'poontang',
 'porch monkey',
 'porchmonkey',
 'prick',
 'punanny',
 'punta',
 'pussies',
 'pussy',
 'pussylicking',
 'puto',
 'queef',
 'queer',
 'queerbait',
 'queerhole',
 'renob',
 'rimjob',
 'ruski',
 'sand nigger',
 'sandnigger',
 'schlong',
 'scrote',
 'shit',
 'shitass',
 'shitbag',
 'shitbagger',
 'shitbrains',
 'shitbreath',
 'shitcanned',
 'shitcunt',
 'shitdick',
 'shitface',
 'shitfaced',
 'shithead',
 'shithole',
 'shithouse',
 'shitspitter',
 'shitstain',
 'shitter',
 'shittiest',
 'shitting',
 'shitty',
 'shiz',
 'shiznit',
 'skank',
 'skeet',
 'skullfuck',
 'slut',
 'slutbag',
 'smeg',
 'snatch',
 'spic',
 'spick',
 'splooge',
 'spook',
 'suckass',
 'tard',
 'testicle',
 'thundercunt',
 'tit',
 'titfuck',
 'tits',
 'tittyfuck',
 'twat',
 'twatlips',
 'twats',
 'twatwaffle',
 'unclefucker',
 'va-j-j',
 'vag',
 'vagina',
 'vajayjay',
 'vjayjay',
 'wank',
 'wankjob',
 'wetback',
 'whore',
 'whorebag',
 'whoreface',
 'wop']

In [2]:
import string
import pandas as pd

# Assumption: one DataFrame per starting letter, built from the scraped
# `swear_words` list (letters without entries are skipped), then stacked
swear_word_dfs = []
for letter in string.ascii_lowercase:
    words = [word for word in swear_words if word.startswith(letter)]
    if words:
        swear_word_dfs.append(pd.DataFrame({"letter": letter, "word": words}))
swear_words_df = pd.concat(swear_word_dfs, ignore_index=True)