In [1]:
%run helper_functions.py
%run filters.py # contains the filtration code used in this notebook
%run plotly_functions.py # wrappers around plot.ly
from datetime import date
from tabulate import tabulate
from collections import Counter
from IPython.display import Image
import math
import string
%matplotlib inline
plt.rcParams["figure.figsize"] = (15,20)
plt.rcParams["xtick.labelsize"] = 16
plt.rcParams["ytick.labelsize"] = 16
plt.rcParams["axes.labelsize"] = 20
plt.rcParams['legend.fontsize'] = 20
plt.style.use('fivethirtyeight')
pd.set_option('display.max_colwidth', -1)
import plotly.plotly as py
import plotly.graph_objs as go
In [2]:
df = unpickle_object("no_duplicates_df.pkl")
In [3]:
df.shape
Out[3]:
In [4]:
df.head()
Out[4]:
In [5]:
percentage_missing(df) # it seems that only dates and times are missing from our data, and only 0.2%!
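percentage_missing comes from helper_functions.py and is not shown here. A minimal sketch of what such a helper might compute (hypothetical; the real implementation may differ):
def percentage_missing_sketch(frame):
    # hypothetical stand-in for percentage_missing: the share of
    # missing values per column, expressed as a percentage
    return (frame.isnull().sum() / len(frame) * 100).round(2)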
In [6]:
df[df["date"].isnull()].shape
Out[6]:
In [7]:
df[df["time"].isnull()].shape
Out[7]:
In [8]:
all(df[df["time"].isnull()].index == df[df["date"].isnull()].index) # perfect match for indices!
Out[8]:
In [9]:
df[df["date"].isnull()].iloc[-1, :]
Out[9]:
The last row with a missing date has an index of 1049876.
In [10]:
df[df["date"].isnull()].head()
Out[10]:
The first row with a missing date has an index of 1047747.
In [11]:
1049876 - 1047747 # the missing indices span a range of 2129 positions,
# which is larger than the 1286 rows actually missing,
# so we know that the missing values are not consecutive!
Out[11]:
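The same conclusion can be checked directly against the index (a small sketch):
missing_idx = df[df["date"].isnull()].index
# if the missing rows were consecutive, the index span would equal the row count
print((missing_idx.max() - missing_idx.min() + 1) == len(missing_idx))  # False -> not consecutive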
In [12]:
#these are the handles that have missing dates/times
list(set(df[df["date"].isnull()]['handle']))
Out[12]:
Having sampled a large number of the handles above, I found that most of these accounts were bots, or had been suspended (meaning they were likely former bots). As such, I will simply drop all rows that are missing date and time.
In [13]:
df.dropna(inplace=True)
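Note that a bare dropna drops any row with a missing value in any column. We verified above that only date and time contain nulls, so it is safe here, but restricting the subset would be the more defensive equivalent:
df.dropna(subset=["date", "time"], inplace=True)  # only drop rows missing date/time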
In [14]:
df.shape
Out[14]:
In [15]:
1286 + 610694 # 1286 dropped + 610694 remaining = the original row count, so we dropped exactly the missing rows!
Out[15]:
We still have rows with a date of 1970! Since 1970-01-01 is the Unix epoch, these dates almost certainly come from failed timestamp parsing, and the tweets themselves must be nonsensical. We will drop these too!
In [16]:
df[df['date'] == date(1970,1,1)] #clearly bad rows of data!
Out[16]:
In [17]:
to_drop = df[df['date'] == date(1970,1,1)].index
In [18]:
to_drop
Out[18]:
In [19]:
df.loc[to_drop, :]
Out[19]:
In [20]:
df.drop(to_drop, inplace=True)
In [21]:
df.shape
Out[21]:
In [22]:
610694 - 7 # we dropped the right amount!
Out[22]:
In [23]:
df.head()
Out[23]:
Let's now clean up our tweets further! This ensures we don't carry garbage hashtags or nonsensical words into the lemmatization process later on.
To make sure duplicates are fully removed, I will again drop duplicates, this time based on the clean_tweet_V2 column.
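filtration_1 and filtration_2 live in filters.py and are not shown in this notebook. Purely as an illustration of the kind of pass such a function might perform (hypothetical; the real implementations may differ):
import re

def filtration_sketch(frame, src_col, dst_col):
    # hypothetical filtration pass: strip URLs and @mentions,
    # lowercase, and collapse whitespace
    def scrub(text):
        text = re.sub(r"https?://\S+", "", text)  # remove URLs
        text = re.sub(r"@\w+", "", text)          # remove @mentions
        return " ".join(text.lower().split())     # normalise whitespace
    frame = frame.copy()
    frame[dst_col] = frame[src_col].apply(scrub)
    return frame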
In [24]:
clean_df = filtration_1(df,"clean_tweet_V1", "clean_tweet_V2")
In [25]:
clean_df.head()
Out[25]:
In [26]:
clean_df = filtration_2(clean_df, "clean_tweet_V2")
In [27]:
clean_df.head()
Out[27]:
In [28]:
clean_df.shape
Out[28]:
In [29]:
clean_df.drop_duplicates(subset="clean_tweet_V2", inplace=True)
In [30]:
clean_df.shape #lost around 80K rows! so many duplicates!
Out[30]:
It seems that our handle column does not strictly contain Twitter user names. Rather, some entries contain tweet text! This is likely the result of bots, so we will remove these entries from our dataset.
Also, if a handle contains the word "bot", we will remove it as well (a sketch of that rule follows below).
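The first-character filter is implemented a few cells below; the "bot" rule is not shown explicitly, so here is a hypothetical sketch of it (note that a bare substring match would also catch legitimate handles that merely contain "bot"):
bot_mask = clean_df['handle'].str.lower().str.contains("bot", na=False)  # hypothetical rule
clean_df = clean_df[~bot_mask]  # keep only handles without "bot"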
In [31]:
clean_df.sort_values(by="handle").head(50) # as we can see from the sample, it's all nonsense tweets
Out[31]:
In [32]:
clean_df.sort_values(by='handle', inplace=True) # let's prep our dataframe for the cleaning process
clean_df.reset_index(inplace=True)
del clean_df['index']
clean_df.head()
Out[32]:
In [33]:
clean_df.shape
Out[33]:
In [34]:
to_drop = []
for index in clean_df.index:
    if clean_df.iloc[index, 1][0] == "_":  # some users have _ at the start of their name
        continue
    if not clean_df.iloc[index, 1][0].isalnum():
        to_drop.append(index)
In [35]:
len(to_drop) #we need to drop 212 records
Out[35]:
In [36]:
for handle in clean_df.iloc[to_drop, :]['handle']:
    print(handle)
    print()
In [37]:
clean_df.drop(to_drop, inplace=True)
In [38]:
clean_df.head()
Out[38]:
In [39]:
clean_df.drop([190], inplace=True) # drop one last stray record by index
In [40]:
clean_df.shape
Out[40]:
In [41]:
clean_df.sort_values(by=['date', 'time'], inplace=True)
clean_df.reset_index(inplace=True)
del clean_df['index']
clean_df.head()
Out[41]:
In [42]:
tweets_by_day = clean_df.groupby("date").count()['tweet']
clean_df_tweet_by_day_plot = bar_graph(tweets_by_day.index, tweets_by_day, "tweets by day", "Date", "No. Tweets", "tweets-by-day-V2")
clean_df_tweet_by_day_plot # plot looks pretty uniform!
Out[42]:
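bar_graph is one of the wrappers loaded from plotly_functions.py. A minimal sketch of what such a wrapper might look like, using the plotly imports at the top of the notebook (hypothetical; the real wrapper may differ):
def bar_graph_sketch(x, y, title, x_title, y_title, filename):
    # hypothetical wrapper: build a bar chart and publish it via plot.ly
    layout = go.Layout(title=title,
                       xaxis=dict(title=x_title),
                       yaxis=dict(title=y_title))
    fig = go.Figure(data=[go.Bar(x=x, y=y)], layout=layout)
    return py.iplot(fig, filename=filename)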
In [43]:
Image("tweets_by_day.png")
Out[43]:
In [44]:
diff_df = pd.DataFrame(clean_df.groupby("date").count()['tweet'].diff())
diff_df_plot = line_graph(list(diff_df.index)[2:], list(diff_df['tweet'])[2:],'Difference in tweets by day', 'Date', 'Difference', 'difference-bar-plot')
diff_df_plot # a couple of spikes!
Out[44]:
In [45]:
Image("difference_by_day.png")
Out[45]:
In [46]:
#Let's find how many unique hashtags we have over a 4 month period
set_of_all_hashtags = set()
counts = {}
for i in range(clean_df.shape[0]):
    temp_lst = clean_df.iloc[i, 6].split()
    hashtags = [x for x in temp_lst if x.startswith("#")]
    for tag in hashtags:
        if tag not in counts:
            counts[tag] = 1
        else:
            counts[tag] += 1
        set_of_all_hashtags.add(tag)
print("There are {} unique hashtags over the course of 4 months".format(len(set_of_all_hashtags)))
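As an aside, collections.Counter (imported at the top) could produce the same counts more compactly. A sketch, assuming clean_tweet_V2 is the column being scanned:
counts = Counter(tag
                 for tweet in clean_df['clean_tweet_V2']
                 for tweet_word in [tweet]
                 for tag in tweet_word.split()
                 if tag.startswith("#"))
set_of_all_hashtags = set(counts)  # the unique hashtags are just the keys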
In [47]:
# let's visualize the top 50 hashtags
counts_lst = []
for k, v in counts.items():
    counts_lst.append((k, v))
top_50_hashtags = pd.DataFrame(sorted(counts_lst, key=lambda x: x[1], reverse=True)[:50])
top_50_hashtags.set_index(0, inplace=True)
top_50_hashtags.sort_values(by=1, inplace=True)
top_50_hashtags.index.names = ['hashtag']
top_50_hashtags = top_50_hashtags.rename(columns = {1:'count'})
top_50_hashtags_plot = horizontal_bar_graph(top_50_hashtags['count'], top_50_hashtags.index, 'Top 50 hashtags by count', 'Count', 'Hashtag','top-50-hashtags-barh')
top_50_hashtags_plot
Out[47]:
In [48]:
Image("top_50_hashtags.png")
Out[48]:
Let's look at some statistics at the day level:
In [49]:
stats = pd.DataFrame(clean_df.groupby("date")['tweet'].size().describe())
stats.drop(["count"], inplace=True)
stats = stats.rename(columns = {"tweet":'tweets_per_day'})
print(tabulate(stats, headers='keys', tablefmt='fancy_grid'))
Now that we have plotted the hashtags, there is no need to keep the hash symbols in the corpus of a particular tweet. In fact, keeping them would only confuse our sentiment calculations.
In [50]:
clean_df['clean_tweet_V2'] = clean_df['clean_tweet_V2'].apply(lambda x: x.replace("#",""))
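The same replacement can also be written with the vectorized .str accessor, avoiding the Python-level apply:
clean_df['clean_tweet_V2'] = clean_df['clean_tweet_V2'].str.replace("#", "")  # equivalent, vectorized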
In [51]:
clean_df.head()
Out[51]:
While it is great that we have such a high level of granularity in our time column, we do not need it for this analysis. Instead, we will place tweets into hourly buckets, which lets us analyse the data at both the day level and the hour level!
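As a side note, the loop in the next cell could also be written in a vectorized form. A sketch, assuming the time column holds datetime.time objects:
hours = clean_df['time'].apply(lambda t: t.hour).tolist()  # hour of each tweet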
In [52]:
hours = []
for index in clean_df.index:
    hours.append(clean_df.iloc[index, 2].hour)
In [53]:
clean_df.shape[0] == len(hours) #perfect
Out[53]:
In [54]:
clean_df['hour_of_day'] = hours
# relabel midnight (hour 0) as hour 24 so the hours run from 1 to 24
clean_df = clean_df.set_value(clean_df[clean_df['hour_of_day'] == 0].index, "hour_of_day", 24)
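As an aside, set_value was deprecated in later pandas releases; the equivalent assignment with .loc would be:
# same relabelling of hour 0 to 24, written with .loc
clean_df.loc[clean_df['hour_of_day'] == 0, 'hour_of_day'] = 24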
In [55]:
clean_df.head()
Out[55]:
In [56]:
tweets_by_hour = clean_df.groupby("hour_of_day").count()['tweet']
clean_df_tweet_by_hour_plot = line_graph(tweets_by_hour.index, tweets_by_hour, "tweets by hour", "Hour", "No. Tweets", "tweets-by-hour")
clean_df_tweet_by_hour_plot # plot looks pretty uniform!
Out[56]:
In [57]:
Image("tweets_by_hour.png")
Out[57]:
In [58]:
number_of_tweets = []
number_of_users = []
for k, v in Counter(clean_df.groupby("handle").count()['tweet']).items():
    number_of_tweets.append(k)
    number_of_users.append(v)
scaled_num_users = list(map(lambda x: x/10e3, number_of_users))
tweets_per_user_plot = bar_graph(number_of_tweets, scaled_num_users, "Tweets per user", "Number of tweets per user", "Number of Users (10e3)", "tweet-per-user")
tweets_per_user_plot
Out[58]:
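The same distribution can be computed with two chained value_counts calls, which reads a little more directly:
# tweets per handle, then how many handles share each tweet count
tweets_per_user = clean_df['handle'].value_counts().value_counts()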
In [59]:
Image("tweets_by_user.png") #the first bar has a value of 5.0471 x 10^3
Out[59]:
This concludes the exploration notebook! In the next notebook, we will gather some additional data and prepare our data for the modelling process!
As far as model building is concerned, we only need the date, hour_of_day, and clean_tweet_V2 columns. Everything else is irrelevant, so let's go ahead and drop it!
In [85]:
clean_df.drop(["handle", "time", "tweet", "tuple_version_tweet", "clean_tweet_V1"], axis=1, inplace=True)
In [86]:
clean_df.head()
Out[86]:
In [87]:
pickle_object(clean_df, "clean_df_NB3_Complete")