Twitter: An Analysis of Linguistic Diversity

Part IV

This database collected tweets from the middle of January through the middle of May, so there is a time dimension to these data. If we glance back at the first tweet notebook, we see that there is an attribute named created_at. This is a timestamp of when the tweet was published for the world to see.

Adding a time component to an analysis gives us the option to follow trends. When are hashtags popular? How quickly do they die? Our linguistic diversity analysis can be modified in the same way to take time into account.

Quick question: Would accounting for time when calculating a Shannon index for a city have any effect? Would cities stay stable over time with regard to their linguistic diversity? Are some cities more prone to fluctuations than others? A timestamp allows us to explore these questions and more.
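
To make the question concrete, here is a minimal sketch of computing a Shannon index separately for each time bucket. The language counts are made up for illustration, and `shannon_index` is a hypothetical standard-library helper, not the `skbio` function imported later. It uses the natural log by default; the log base only rescales the values, so check which base your library uses before comparing numbers.

```python
import math

def shannon_index(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over categories with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

# Hypothetical tweet counts per language, per month, for one city (made-up numbers).
monthly_counts = {
    "2017-02": {"en": 80, "es": 15, "ko": 5},
    "2017-03": {"en": 50, "es": 30, "ko": 20},
}

# One index per month instead of a single index for the whole collection period.
monthly_diversity = {
    month: shannon_index(list(langs.values()))
    for month, langs in monthly_counts.items()
}

for month, h in sorted(monthly_diversity.items()):
    print(month, round(h, 3))
```

In this toy data the second month spreads tweets more evenly across languages, so its index is higher. Whether real cities behave this way is exactly what the timestamp lets us check.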


Today we are going to compare two different cities: New York City, New York and Columbia, Missouri. By now you are probably aware that the job id for Columbia is 261. But what is the job id for New York City? That is a simple query of the job table.


In [1]:
# BE SURE TO RUN THIS CELL BEFORE ANY OF THE OTHER CELLS

import psycopg2
import pandas as pd
from skbio.diversity.alpha import shannon

In [2]:
# query database
statement = """
SELECT * 
FROM twitter.job
WHERE description LIKE '%New York City%';
"""

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='dbase.dsa.missouri.edu' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    cursor = conn.cursor()
    cursor.execute(statement)
    
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
    
# create dictionary from the rows and column names   
new_york = {}
for i in list(range(len(column_names))):
     new_york['{}'.format(column_names[i])] = [x[i] for x in rows]

# turn dictionary into a data frame
pd.DataFrame(new_york)


Out[2]:
analysis_state description job_id last_count last_run oauth_id query since_id_str state zombie_head
0 0 New York City, New York 291 3799 2017-05-18 14:49:19 4 q=&geocode=40.7127837,-74.00594129999999,40km 865292001268436992 32 6

Above we use the LIKE operator in conjunction with the % sign. LIKE compares a string against a pattern, and % inside the pattern matches any sequence of zero or more characters. If you know the exact string, you needn't use the % sign.
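
The behaviour of % can be tried out without touching the course database. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the table contents and job ids are made up, and `matching` is a hypothetical helper. One caveat: SQLite's LIKE is case-insensitive for ASCII characters by default, while PostgreSQL's LIKE is case-sensitive, so the two can differ on patterns that vary only in case.

```python
import sqlite3

# In-memory stand-in for the job table (made-up rows and ids).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?)",
    [
        (1, "New York City, New York"),
        (2, "Columbia, Missouri"),
        (3, "York, Pennsylvania"),
    ],
)

def matching(pattern):
    """Return the descriptions that match a LIKE pattern, in job_id order."""
    cur = conn.execute(
        "SELECT description FROM job WHERE description LIKE ? ORDER BY job_id",
        (pattern,),
    )
    return [row[0] for row in cur.fetchall()]

substring_matches = matching("%New York City%")  # text may appear anywhere
broad_matches = matching("%York%")               # matches two different rows
exact_matches = matching("Columbia, Missouri")   # no wildcard: whole string must match
no_wildcard = matching("York")                   # no description is exactly 'York'

print(substring_matches, broad_matches, exact_matches, no_wildcard)
```

Note how a loose pattern such as `%York%` pulls in an unintended city, which is why the query above spells out the full city name.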

YOUR TURN

PRACTICE: Just a refresher. Query 10,000 tweets from the tweet table where the job_id corresponds to New York City. Be sure to also select the description column from the job table so that every record returned has a description saying "New York City, New York".


In [3]:
# put your code here
# ------------------

# query database
statement = """
SELECT j.description, t.*
FROM twitter.job j, twitter.tweet t
WHERE j.description LIKE '%New York City%' AND j.job_id = t.job_id
LIMIT 10000;
"""

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='dbase.dsa.missouri.edu' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    cursor = conn.cursor()
    cursor.execute(statement)
    
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
    
# create dictionary from the rows and column names   
new_york = {}
for i in list(range(len(column_names))):
     new_york['{}'.format(column_names[i])] = [x[i] for x in rows]

# turn dictionary into a data frame
pd.DataFrame(new_york)


Out[3]:
analysis_state created_at description from_user from_user_created_at from_user_favorites from_user_followers from_user_following from_user_fullname from_user_id_str ... job_id location_geo location_geo_0 location_geo_1 source text to_user to_user_id_str to_user_name tweet_id_str
0 0 2017-02-24 01:04:21 New York City, New York 2905763033 2014-12-05 00:11:06 3222 621 304 muñeca 2905763033 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @trinnyyy__: @jerryyrich https://t.co/rRB1T... None None None 834931939408822272
1 0 2017-02-24 01:04:21 New York City, New York 364652604 2011-08-30 03:12:51 29166 889 366 Amelia Thorngate 364652604 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @hopeTGOD: https://t.co/YzFURpsoZH None None None 834931942340579328
2 0 2017-02-24 01:04:21 New York City, New York 433954531 2011-12-11 07:05:56 1120 750 183 Drippy 2x 433954531 ... 291 None None None <a href="http://twitter.com/download/iphone" r... Lmaooo https://t.co/SpzM9P775M None None None 834931938590867457
3 0 2017-02-24 01:04:20 New York City, New York 14608191 2008-05-01 00:25:19 15542 797 1262 Rainbow Doom 14608191 ... 291 None None None <a href="http://twitter.com/download/iphone" r... @MorganL666 @GrumpyTheology Oh Pence, definite... 2321682552 2321682552 MorganL666 834931936455966725
4 0 2017-02-24 01:04:20 New York City, New York 4004931792 2015-10-24 18:33:43 84 334 1 The Rumor 4004931792 ... 291 None None None <a href="http://www.botize.com" rel="nofollow"... Pero LopezObrador rechazó debatir, dice que p... None None None 834931936820879360
5 0 2017-02-24 01:04:20 New York City, New York 1510127773 2013-06-12 10:18:18 4906 1814 542 Steph Royalty 1510127773 ... 291 None None None <a href="http://twitter.com/download/iphone" r... Lmao I've missed you https://t.co/pD1lqGn4gf None None None 834931936929939456
6 0 2017-02-24 01:04:20 New York City, New York 291427983 2011-05-02 00:37:20 2739 426 606 Jess 291427983 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @abbn0rmal_: People will swear on anything ... None None None 834931937055768576
7 0 2017-02-24 01:04:20 New York City, New York 382475814 2011-09-30 04:11:04 6651 184 347 terry johnson 382475814 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @HillaryClinton: If you can't stand the hea... None None None 834931937114476544
8 0 2017-02-24 01:04:20 New York City, New York 2596152405 2014-06-09 22:18:24 404 56 87 Dr Farasat 2596152405 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @JoyAnnReid: Exaxtly. And back then all we ... None None None 834931937227784193
9 0 2017-02-24 01:04:20 New York City, New York 466129515 2012-01-17 02:56:49 8165 440 213 15 days areej 466129515 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @ltsKermit: people: you're so quiet \\n\\... None None None 834931937357639681
10 0 2017-02-24 01:04:21 New York City, New York 754129117 2012-08-13 01:12:06 0 304 157 Johnson Speer 754129117 ... 291 None None None <a href="https://dlvrit.com/" rel="nofollow">d... Drone footage of erupting volcano gets me hot ... None None None 834931939324747776
11 0 2017-02-24 01:04:20 New York City, New York 438902612 2011-12-17 04:35:28 4948 196 130 Rissa Amor 438902612 ... 291 None None None <a href="http://twitter.com/download/android" ... Ka sweet ani akong partner sa ojt oyk None None None 834931937567363073
12 0 2017-02-24 01:04:20 New York City, New York 2241738288 2013-12-12 03:39:27 26885 2094 262 sal 2241738288 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @baratunde: #transrights https://t.co/LUOGq... None None None 834931937580040192
13 0 2017-02-24 01:04:20 New York City, New York 1525554416 2013-06-17 18:32:43 1709 155 985 LMC 1525554416 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @NoahCRothman: Holy crap! https://t.co/QLeB... None None None 834931937584246784
14 0 2017-02-24 01:04:20 New York City, New York 3170925475 2015-04-24 14:37:59 13163 235 174 Kimberly Bozman 3170925475 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @CecileRichards: Fact: Planned Parenthood s... None None None 834931937613594624
15 0 2017-02-24 01:04:20 New York City, New York 805112058631114752 2016-12-03 18:10:47 4210 306 689 Denise Savage 805112058631114752 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @JoyAnnReid: What you're hearing from Banno... None None None 834931937622036481
16 0 2017-02-24 01:04:20 New York City, New York 71702060 2009-09-05 02:18:17 122 106 416 Ca Gingrich 71702060 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @JamesOKeefeIII: In this bit we found @CNN'... None None None 834931937739427841
17 0 2017-02-24 01:04:20 New York City, New York 291301221 2011-05-01 19:35:14 284 9424 9841 Queen Sha ❤ 291301221 ... 291 None None None <a href="https://about.twitter.com/products/tw... RT @Baddiesvibe: None None None 834931937752072192
18 0 2017-02-24 01:04:20 New York City, New York 32818485 2009-04-18 06:52:53 8962 1005 584 Rick Deckard 32818485 ... 291 None None None <a href="https://about.twitter.com/products/tw... Y'all been saying it for years, but... INJUSTI... None None None 834931937768796160
19 0 2017-02-24 01:04:20 New York City, New York 419156720 2011-11-23 01:16:54 3390 7795 744 Tat'Apach ○● 419156720 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @HISHAMTAWFIQ: What's definition of betraya... None None None 834931937777184770
20 0 2017-02-24 01:04:20 New York City, New York 1718473842 2013-09-01 09:50:27 39 10853 674 Lisa McCoy 1718473842 ... 291 None None None <a href="https://dlvrit.com/" rel="nofollow">d... #Chat Now &lt;3 aooadachris https://t.co/BP2oc... None None None 834931937839988736
21 0 2017-02-24 01:04:20 New York City, New York 173948894 2010-08-02 19:13:55 2483 319 603 Adriana Sañudo 173948894 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @mehreenkasana: You won't have to host refu... None None None 834931937903054848
22 0 2017-02-24 01:04:20 New York City, New York 39765949 2009-05-13 14:57:45 1302 574 581 Mala Nicholson 39765949 ... 291 None None None <a href="http://twitter.com/download/android" ... Eeeewwwwww https://t.co/1R6WmHzJNc None None None 834931937919778816
23 0 2017-02-24 01:04:20 New York City, New York 349406147 2011-08-06 01:53:28 31333 417 187 mary 349406147 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @ltsKermit: Me: Stop spending money you nee... None None None 834931937995341824
24 0 2017-02-24 01:04:20 New York City, New York 17099732 2008-11-01 12:31:20 208 151 406 Lucian Lipinsky 17099732 ... 291 None None None <a href="http://twitter.com/download/iphone" r... @potus what happened to believing in states ri... 822215679726100480 822215679726100480 POTUS 834931938024636416
25 0 2017-02-24 01:04:20 New York City, New York 284531088 2011-04-19 13:16:16 3 16608 865 Tech News 284531088 ... 291 None None None <a href="https://dlvrit.com/" rel="nofollow">d... #Chat Now &lt;3 aooadachris https://t.co/toR8r... None None None 834931938087444481
26 0 2017-02-24 01:04:20 New York City, New York 754567111628955648 2016-07-17 06:43:13 11836 107 73 Domestic Violence 754567111628955648 ... 291 None None None <a href="https://www.socialoomph.com" rel="nof... Domestic Violence Advocate-Transitional Housin... None None None 834931938087555072
27 0 2017-02-24 01:04:21 New York City, New York 824847214644039682 2017-01-27 05:11:15 0 1214 4447 Shaquana Toll 824847214644039682 ... 291 None None None <a href="http://twitter.com" rel="nofollow">Tw... THIS Is How To Flush Toxins From Your -- https... None None None 834931938297331714
28 0 2017-02-24 01:04:21 New York City, New York 17296829 2008-11-10 22:26:50 5286 204 1021 Mot Justice 17296829 ... 291 None None None <a href="http://twitter.com" rel="nofollow">Tw... If the Cabinet picks were chosen to deconstruc... None None None 834931938368565251
29 0 2017-02-24 01:04:21 New York City, New York 350398451 2011-08-07 18:16:14 65343 431 298 Caryn Wallace 350398451 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @activist360: A bold opposition to Trump's ... None None None 834931938410520577
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9970 0 2017-02-24 13:20:42 New York City, New York 181747515 2010-08-22 23:43:46 150 585 357 LUVERDEMUSIC 181747515 ... 291 None None None <a href="http://www.luverdemusic.net/p/lvmradi... #SUENA: Don Miguelo - Te Vienes Conmigo En LVM... None None None 835117247446265856
9971 0 2017-02-24 13:20:42 New York City, New York 159908535 2010-06-26 16:57:34 18572 1345 2413 Liberal 159908535 ... 291 None None None <a href="http://twitter.com/#!/download/ipad" ... RT @mmpadellan: PLEASE RETWEET!\\ntrump DIAGNO... None None None 835117247601471489
9972 0 2017-02-24 13:20:42 New York City, New York 2362933736 2014-02-26 16:47:55 3264 172 1301 Best Friends of Pets 2362933736 ... 291 None None None <a href="http://twitter.com" rel="nofollow">Tw... RT @quotebxundnews: Awesome post about project... None None None 835117247651708928
9973 0 2017-02-24 13:20:42 New York City, New York 514992082 2012-03-05 02:05:30 1621 9 67 compostela68 514992082 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @thelatintimes: Watch @jorgeramosnews deliv... None None None 835117247718965249
9974 0 2017-02-24 13:20:42 New York City, New York 3351486346 2015-06-30 01:59:44 3323 66 162 prince royce fan 3351486346 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @PrinceRoyce: #PrinceRoyceFIVE on @AppleMus... None None None 835117247735660544
9975 0 2017-02-24 13:20:42 New York City, New York 16630576 2008-10-07 14:45:33 2211 471 156 Alex Kang 16630576 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @businessinsider: Bitcoin is hovering near ... None None None 835117247739932672
9976 0 2017-02-24 13:20:42 New York City, New York 2398740702 2014-03-20 02:34:11 6726 1567 4936 sed 2398740702 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @KawaiiDesuBiach: Wild Thing, You Make My H... None None None 835117247844732928
9977 0 2017-02-24 13:20:42 New York City, New York 825499839689609216 2017-01-29 00:24:33 209 72 102 Stephanie Bacino 825499839689609216 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @foxandfriends: A grim milestone! Chicago s... None None None 835117248280997893
9978 0 2017-02-24 13:20:42 New York City, New York 784465812631400448 2016-10-07 18:49:59 648 30 31 저미해시의 커뮤계~~~ 784465812631400448 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @M_Lastwinter: [3차 홍보]\\n\\n❄LAST W... None None None 835117248456974337
9979 0 2017-02-24 13:20:42 New York City, New York 23696679 2009-03-11 01:13:22 1374 1502 861 MARCUS MARTIN 23696679 ... 291 None None None <a href="http://twitter.com/download/iphone" r... @Liiiii_Liiiii uh oh lol 65691436 65691436 Liiiii_Liiiii 835117248733917184
9980 0 2017-02-24 13:20:42 New York City, New York 1452943862 2013-05-24 01:27:43 25794 1798 1290 KAT 1452943862 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @ABOOGlE_: ⠀\\n⠀⠀⠀⠀⠀⠀⠀⠀â ... None None None 835117248817790977
9981 0 2017-02-24 13:20:42 New York City, New York 756456272979857408 2016-07-22 11:50:04 3258 91 227 ㅇㅅㅇ 756456272979857408 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @KawaiiDesuBiach: Whoever invented the Zero... None None None 835117249040035841
9982 0 2017-02-24 13:20:42 New York City, New York 809178334311677958 2016-12-14 23:28:43 124 1369 196 Lumberjack Jones 809178334311677958 ... 291 None None None <a href="http://twitter.com" rel="nofollow">Tw... #TheTazShow set is falling apart!!! #WeOver ht... None None None 835117249048547330
9983 0 2017-02-24 13:20:42 New York City, New York 3258096714 2015-06-27 21:59:34 372 24 965 Brian 3258096714 ... 291 None None None <a href="http://twitter.com" rel="nofollow">Tw... RT @Mark_Beech: Time to sum up @Londonfashionw... None None None 835117249098878976
9984 0 2017-02-24 13:20:42 New York City, New York 16227630 2008-09-10 20:19:13 48955 25073 631 ItsTheReal 16227630 ... 291 None None None <a href="http://twitter.com" rel="nofollow">Tw... RT @bodegaboxoffice: The AKAs are working @its... None None None 835117249128247298
9985 0 2017-02-24 13:20:42 New York City, New York 714695484552687616 2016-03-29 06:07:36 1967 211 181 VB 714695484552687616 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @CaseyNeistat: ok. noted. do NOT fly dron... None None None 835117249157599232
9986 0 2017-02-24 13:20:42 New York City, New York 1115957280 2013-01-24 04:30:39 11534 657 644 Burr 1115957280 ... 291 None None None <a href="http://twitter.com/download/iphone" r... If I ever told u I loved u I was lying see I f... None None None 835117249337905153
9987 0 2017-02-24 13:20:42 New York City, New York 42469038 2009-05-25 19:27:53 18587 622 307 Lean Lantern ™ 42469038 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @50cent: i might jerk off it depends on if ... None None None 835117249346293761
9988 0 2017-02-24 13:20:42 New York City, New York 129189199 2010-04-03 13:22:45 1419 135 249 Lisa Pierce 129189199 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @foxandfriends: A grim milestone! Chicago s... None None None 835117249375526912
9989 0 2017-02-24 13:20:42 New York City, New York 239533317 2011-01-17 21:06:17 12150 126 195 PDJ Barrett 239533317 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @JoyAnnReid: Um... https://t.co/3YyVMzOhFJ None None None 835117249547599872
9990 0 2017-02-24 13:20:42 New York City, New York 3191292081 2015-04-21 16:05:50 1391 81 145 raggedy man 3191292081 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @darebaevil: #ilovebeingmixed None None None 835117249576976384
9991 0 2017-02-24 13:20:42 New York City, New York 830657836619227136 2017-02-12 06:00:35 15 6 0 Ohmm 830657836619227136 ... 291 None None None <a href="http://twitter.com/download/android" ... RT @MSCYX61: อยากมีเด็กๆ... None None None 835117249585274881
9992 0 2017-02-24 13:20:42 New York City, New York 380604763 2011-09-26 22:52:13 94 777 1570 Djelloul Marbrook 380604763 ... 291 None None None <a href="http://www.facebook.com/twitter" rel=... Joe Biden Campaigns for Delaware Candidate in ... None None None 835117249656733696
9993 0 2017-02-24 13:20:42 New York City, New York 15722314 2008-08-04 14:01:37 64 268 1746 German Gonzalez 15722314 ... 291 None None None <a href="http://tapbots.com/tweetbot" rel="nof... RT @GMA: We remember Steve Jobs today. The App... None None None 835117250008985600
9994 0 2017-02-24 13:20:42 New York City, New York 88851336 2009-11-10 04:33:17 49 1298 1707 Paul Seong 88851336 ... 291 \0\0\0\0\0\0\0 Òo_vD@¡ø1æ®}RÀ 40.9221000000 -73.9638000000 <a href="http://instagram.com" rel="nofollow">... 오늘 날씨 정말 좋구나!! 추운 날이... None None None 835117250046738433
9995 0 2017-02-24 13:20:42 New York City, New York 1245166369 2013-03-06 03:18:51 31574 1207 911 Ben Zee 1245166369 ... 291 None None None <a href="http://twitter.com/download/iphone" r... RT @smoothietunes: ❤️❤️❤️ https://... None None None 835117250168356865
9996 0 2017-02-24 13:20:42 New York City, New York 240915410 2011-01-21 01:22:01 18159 2050 4166 Esienne Esien-oku 240915410 ... 291 None None None <a href="http://www.twitter.com" rel="nofollow... RT @JamesOKeefeIII: Really @bwreed of @RawStor... None None None 835117250336161793
9997 0 2017-02-24 13:20:42 New York City, New York 90747211 2009-11-17 23:24:59 10349 2258 1648 RN$ Papi Juice 90747211 ... 291 None None None <a href="http://twitter.com/download/iphone" r... Half a pint for breakfast None None None 835117250352975872
9998 0 2017-02-24 13:20:42 New York City, New York 45733772 2009-06-09 01:52:16 2398 345 439 Brady Darragh 45733772 ... 291 None None None <a href="http://twitter.com/download/iphone" r... Guy in front of me on the train is eating garl... None None None 835117250554302464
9999 0 2017-02-24 13:20:42 New York City, New York 270552079 2011-03-22 20:46:18 2188 723 1030 J.D. Bryant 270552079 ... 291 None None None <a href="http://twitter.com/#!/download/ipad" ... RT @anildash: RIP #SrinivasKuchibhotla, anothe... None None None 835117250738864128

10000 rows × 24 columns
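
The cells in this notebook repeat the same steps after every query: read the column names from the cursor, fetch the rows, and build a dictionary of columns. As a sketch only, those steps could be wrapped in a small helper. `rows_to_columns` is a hypothetical name, and the example runs against an in-memory `sqlite3` table with made-up data rather than the course database; it relies only on the `description` and `fetchall` members that DB-API cursors, including psycopg2's, provide.

```python
import sqlite3

def rows_to_columns(cursor):
    """Turn the result of an executed DB-API cursor into {column_name: [values]}."""
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

# Stand-in database so the sketch runs anywhere (made-up table contents).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER, description TEXT)")
conn.execute("INSERT INTO job VALUES (1, 'Example City')")

cursor = conn.cursor()
cursor.execute("SELECT job_id, description FROM job")
columns = rows_to_columns(cursor)
print(columns)
```

Passing the returned dictionary to `pd.DataFrame(...)` would give the same kind of data frame the cells above build by hand.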

A Couple Things to Think About

When dealing with timestamps, the raw timestamp is often too fine-grained to show any pattern on its own. Therefore, we generally have to bin timestamps into larger time buckets, say weeks, months or even years, depending on the amount of data and the type of problem. That is where we find ourselves right now.
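
As a rough illustration of binning, the sketch below groups a handful of timestamps into month and ISO-week buckets using only the standard library. The timestamps are made up, and it assumes they arrive as strings in the same 'YYYY-MM-DD HH:MM:SS' layout shown in the created_at column; `month_bucket` and `week_bucket` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime

# Made-up timestamps in the same layout as created_at.
created_at = [
    "2017-02-24 01:04:21",
    "2017-02-24 13:20:42",
    "2017-03-01 09:15:00",
    "2017-03-17 22:40:05",
    "2017-03-30 06:00:00",
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def month_bucket(timestamp):
    """Map a timestamp string to a (year, month) bucket."""
    parsed = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return (parsed.year, parsed.month)

def week_bucket(timestamp):
    """Map a timestamp string to an (ISO year, ISO week) bucket."""
    iso = datetime.strptime(timestamp, TIMESTAMP_FORMAT).isocalendar()
    return (iso[0], iso[1])

# Count how many tweets fall into each bucket.
tweets_per_month = Counter(month_bucket(ts) for ts in created_at)
tweets_per_week = Counter(week_bucket(ts) for ts in created_at)

for bucket, n in sorted(tweets_per_month.items()):
    print(bucket, n)
```

Coarser buckets give more tweets per bucket and steadier statistics; finer buckets show short-lived changes but can leave too few tweets in each one.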

To start, we are going to practice using Postgres to create month and year columns so that we can do some aggregations on them.


In [4]:
# query database
statement = """
SELECT t.*,
       date_part('month',created_at) as month,
       date_part('year', created_at) as year
FROM twitter.job j, twitter.tweet t
WHERE j.description LIKE '%New York City%' AND j.job_id = t.job_id
LIMIT 1000;
"""

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='dbase.dsa.missouri.edu' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    cursor = conn.cursor()
    cursor.execute(statement)
    
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
    
# create dictionary from the rows and column names   
new_york = {}
for i in list(range(len(column_names))):
     new_york['{}'.format(column_names[i])] = [x[i] for x in rows]

# turn dictionary into a data frame
pd.DataFrame(new_york)


Out[4]:
analysis_state created_at from_user from_user_created_at from_user_favorites from_user_followers from_user_following from_user_fullname from_user_id_str from_user_name ... location_geo_0 location_geo_1 month source text to_user to_user_id_str to_user_name tweet_id_str year
0 0 2017-02-24 13:20:41 1441848320 2013-05-19 18:04:10 483 347 610 Ryan Gerrity 1441848320 RyanGerrity ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... #QuoteOfTheDay "The best way to predict the fu... None None None 835117243973402624 2017.0
1 0 2017-02-24 13:20:41 1296351806 2013-03-24 19:24:13 2 1463 1449 garat-secretariat.fr 1296351806 JacquesGarat64 ... None None 2.0 <a href="http://linkis.com" rel="nofollow">Put... apporteur d-affaires 25 #Doubs https://t.co/M5... None None None 835117244472492032 2017.0
2 0 2017-02-24 13:20:41 480577804 2012-02-01 17:14:26 244 131 526 Nate Krumpos 480577804 N_Krumpos ... None None 2.0 <a href="http://www.tweetcaster.com" rel="nofo... RT @TimOBrien: Bannon, Trump and the chaos the... None None None 835117244606722048 2017.0
3 0 2017-02-24 13:20:41 1855363850 2013-09-11 20:19:41 13 1774 2310 Magesy-Pro 1855363850 magesy_pro ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... Vocal Hits WAV MiDi-DiSCOVER https://t.co/2CO... None None None 835117244606787584 2017.0
4 0 2017-02-24 13:20:41 1102207075 2013-01-18 22:38:59 620 463 852 Bilmediğini Bilen 1102207075 Bilen_us ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @SelcukRSirin: Çıktı! https://t.co/J6dVl... None None None 835117245277814785 2017.0
5 0 2017-02-24 13:20:41 3346212609 2015-06-26 04:35:41 80 722 74 ロン毛と坊主とニューヨーク 3346212609 longebose ... None None 2.0 <a href="http://twitter.com/download/iphone" r... さすがに56万はアホすぎ\\n\\nhttps://... 3346212609 3346212609 longebose 835117245370097665 2017.0
6 0 2017-02-24 13:20:41 3261398631 2015-05-16 13:51:47 55156 3372 4988 Mark Houlsby 3261398631 HoulsbyMark ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @NotifyNYC: AMBER Alert: Aylin Sofia Hernan... None None None 835117245667897344 2017.0
7 0 2017-02-24 13:20:41 465478461 2012-01-16 11:49:08 2816 91319 36640 Emprendedores 465478461 EmprenderGM ... None None 2.0 <a href="http://www.botize.com" rel="nofollow"... La nueva tendencia ‘job hopping’ de los jÃ... None None None 835117245915348993 2017.0
8 0 2017-02-24 13:20:41 714495646078779392 2016-03-28 16:53:31 551 148 242 Papi Jay 714495646078779392 jarrielis_ayala ... None None 2.0 <a href="http://twitter.com/download/iphone" r... me asf https://t.co/CNNjY9JyAK None None None 835117245927936000 2017.0
9 0 2017-02-24 13:20:41 823715410935586817 2017-01-24 02:13:52 0 821 4359 Concha Dorset 823715410935586817 concha_dorset ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... Chinese Farmer Makes Pigs Dive Into Water So &... None None None 835117245940408320 2017.0
10 0 2017-02-24 13:20:41 34156103 2009-04-22 02:54:40 1794 288 552 Wayne Allan Sage 34156103 Saggie27 ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @GinsburgJobs: Prominent Chicago sportscast... None None None 835117245965688832 2017.0
11 0 2017-02-24 13:20:41 300534697 2011-05-17 22:51:58 4346 471 547 Lik 300534697 Malikg203 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @JMoyston55: None None None 835117245969891328 2017.0
12 0 2017-02-24 13:20:41 830006982207795201 2017-02-10 10:54:20 0 47 485 Defund Dirty Banks 830006982207795201 BankDefund ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @joshjazztrumpet: “The closing of the cam... None None None 835117246078988290 2017.0
13 0 2017-02-24 13:20:41 98847314 2009-12-23 10:12:51 2052 1072 2252 action 98847314 littlexaction ... None None 2.0 <a href="https://mobile.twitter.com" rel="nofo... RT @KazuhiroSoda: 森友学園スキャンダã... None None None 835117246133329921 2017.0
14 0 2017-02-24 13:20:41 2416113875 2014-03-16 21:08:41 11574 521 464 mariah 2416113875 mtfackler ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @whosNikita: when you're in a great mood an... None None None 835117246271868928 2017.0
15 0 2017-02-24 13:20:41 31477932 2009-04-15 18:33:29 7919 1903 345 Joshua Guess 31477932 JoshuaGuess ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @pescami: I think you may be confusing free... None None None 835117246393503748 2017.0
16 0 2017-02-24 13:20:41 525343649 2012-03-15 12:52:51 1653 168 592 Kate 525343649 MadLock445 ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @rosemcgowan: It's happening https://t.co/Z... None None None 835117246494175234 2017.0
17 0 2017-02-24 13:20:41 2216830081 2013-11-27 00:03:12 414 2256 571 2216830081 NaomiWolf_ ... None None 2.0 <a href="http://twitter.com/download/iphone" r... *Bankhead bouncing at my desk* None None None 835117246615781376 2017.0
18 0 2017-02-24 13:20:41 2468339574 2014-04-28 22:30:03 11932 812 1036 Caryn 2468339574 CarynScandlon ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @GossipGirltbh: take it from Blair https://... None None None 835117246632587265 2017.0
19 0 2017-02-24 13:20:42 3156922565 2015-04-11 16:09:27 1638 138 108 LC 3156922565 lissycxo ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @MysticxLipstick: I'll walk through the fir... None None None 835117246921977857 2017.0
20 0 2017-02-24 13:20:42 790460407953711104 2016-10-24 07:50:22 2235 85 193 Hugman_76 790460407953711104 Hugman_76 ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... @ThatMumboJumbo NOOOO YOU ARE DEAAAAD 542139063 542139063 ThatMumboJumbo 835117246963990529 2017.0
21 0 2017-02-24 13:20:42 723915836755304448 2016-04-23 16:46:00 38323 588 730 HesMyPresident 723915836755304448 GIAGM2013 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @JamesOKeefeIII: EXPOSING THE MEDIA: Projec... None None None 835117246989000708 2017.0
22 0 2017-02-24 13:20:42 100703004 2009-12-31 06:37:51 344 811 359 I Said What I Said 100703004 Forever_DesiRay ... None None 2.0 <a href="http://www.echofon.com/" rel="nofollo... RT @BrianniT: People just don't become a deadb... None None None 835117247060377600 2017.0
23 0 2017-02-24 13:20:42 181747515 2010-08-22 23:43:46 150 585 357 LUVERDEMUSIC 181747515 LuVerdeMusic ... None None 2.0 <a href="http://www.luverdemusic.net/p/lvmradi... #SUENA: Don Miguelo - Te Vienes Conmigo En LVM... None None None 835117247446265856 2017.0
24 0 2017-02-24 13:20:42 159908535 2010-06-26 16:57:34 18572 1345 2413 Liberal 159908535 progressivehere ... None None 2.0 <a href="http://twitter.com/#!/download/ipad" ... RT @mmpadellan: PLEASE RETWEET!\\ntrump DIAGNO... None None None 835117247601471489 2017.0
25 0 2017-02-24 13:20:42 2362933736 2014-02-26 16:47:55 3264 172 1301 Best Friends of Pets 2362933736 FBOP_OCK ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @quotebxundnews: Awesome post about project... None None None 835117247651708928 2017.0
26 0 2017-02-24 13:20:42 514992082 2012-03-05 02:05:30 1621 9 67 compostela68 514992082 compostela68 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @thelatintimes: Watch @jorgeramosnews deliv... None None None 835117247718965249 2017.0
27 0 2017-02-24 13:20:42 3351486346 2015-06-30 01:59:44 3323 66 162 prince royce fan 3351486346 prince_royce_si ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @PrinceRoyce: #PrinceRoyceFIVE on @AppleMus... None None None 835117247735660544 2017.0
28 0 2017-02-24 13:20:42 16630576 2008-10-07 14:45:33 2211 471 156 Alex Kang 16630576 kangalex ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @businessinsider: Bitcoin is hovering near ... None None None 835117247739932672 2017.0
29 0 2017-02-24 13:20:42 2398740702 2014-03-20 02:34:11 6726 1567 4936 sed 2398740702 Sed133 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @KawaiiDesuBiach: Wild Thing, You Make My H... None None None 835117247844732928 2017.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
970 0 2017-02-24 14:23:58 281628141 2011-04-13 17:08:05 1107 215 155 Tom Creamz 281628141 FollowUr_Creamz ... None None 2.0 <a href="http://twitter.com/download/iphone" r... Facts the nast https://t.co/ZrWvJDwv6q None None None 835133168307027968 2017.0
971 0 2017-02-24 14:23:58 826833553291931650 2017-02-01 16:44:15 485 14 92 Amy Tucker 826833553291931650 zFxFBAwExgkkEVk ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @ThisIsRobThomas: song of the day: SOMEBODY... None None None 835133168449695744 2017.0
972 0 2017-02-24 14:23:58 33336965 2009-04-19 23:15:06 478 176 141 Andrea 33336965 PPQ_byAndrea ... None None 2.0 <a href="http://instagram.com" rel="nofollow">... happy friYAY None None None 835133168487436288 2017.0
973 0 2017-02-24 14:23:58 31169316 2009-04-14 17:09:10 2707 2194 404 Carmen Sandiego 31169316 TheGemJade ... None None 2.0 <a href="http://twitter.com/download/iphone" r... A vibe https://t.co/rsyisZC2hX None None None 835133168697106432 2017.0
974 0 2017-02-24 14:23:58 2942140428 2014-12-24 18:51:08 76989 13098 8268 A. Alico 2942140428 TheFavelakid ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @TheFavelakid: https://t.co/iftYaKWH0L None None None 835133168726441984 2017.0
975 0 2017-02-24 14:23:58 2685228377 2014-07-08 01:14:07 46043 403 134 Nic 2685228377 queencheeeks ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @NYScanner: AMBER ALERT: 6 Y/o Connecticut ... None None None 835133168856535040 2017.0
976 0 2017-02-24 14:23:58 844577245 2012-09-25 00:37:01 3743 379 322 will vorra 844577245 william_joseph3 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @Daniel_Ohana: #IStandUpToBulliesBy by voti... None None None 835133168923602945 2017.0
977 0 2017-02-24 14:23:58 2172483070 2013-11-07 19:09:27 8490 4498 2314 Lauren 2172483070 Amcboxer1 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... @EPGCON @WI4TrumpPence Still on mine. Perhaps,... 2872541276 2872541276 EPGCON 835133168936235010 2017.0
978 0 2017-02-24 14:23:58 793631299059716096 2016-11-02 01:50:21 4030 248 625 JOY 793631299059716096 JOYMERRIT ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @NYCMayor: We affirm the right of every New... None None None 835133168965529601 2017.0
979 0 2017-02-24 14:23:58 446635953 2011-12-26 00:07:29 761 94 92 Clement 446635953 thegreatgarrlic ... None None 2.0 <a href="http://twitter.com/download/iphone" r... Not only do I have a Friday clas now but it's ... None None None 835133168978124800 2017.0
980 0 2017-02-24 14:23:58 826167719900872704 2017-01-30 20:38:28 0 1 3 Hiroco 826167719900872704 uchumiko ... None None 2.0 <a href="http://twitter.com/download/iphone" r... Good luck sign on the taxi. #人生 #スピリ... None None None 835133169070444544 2017.0
981 0 2017-02-24 14:23:58 826224295756046337 2017-01-31 00:23:17 0 3 9 Andy 826224295756046337 scattyjack ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @bobby: there. https://t.co/N60cV2zlT8 None None None 835133169217191936 2017.0
982 0 2017-02-24 14:23:58 453297347 2012-01-02 19:44:46 2835 1123 871 T.I.M.L 453297347 moscatosundays1 ... None None 2.0 <a href="http://publicize.wp.com/" rel="nofoll... Rihanna Is Harvard University’s Humanitarian... None None None 835133169246552065 2017.0
983 0 2017-02-24 14:23:58 43050075 2009-05-28 04:51:22 62161 1823 2471 IZZYBK 43050075 Izzyizo ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @Izzyizo: S/O to my fav G's besides Malcol... None None None 835133169317855237 2017.0
984 0 2017-02-24 14:23:58 79162897 2009-10-02 11:59:47 1447 14869 10883 Cindy Vero 79162897 Cindy_Vero ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... Accident: Grand Central Pkwy EB - Approaching ... None None None 835133169338888196 2017.0
985 0 2017-02-24 14:23:58 796033080893050882 2016-11-08 16:54:10 4577 1902 1995 USS London 796033080893050882 openpodbaydoor_ ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @frankiebee83: In October 2016 we were at w... None None None 835133169372446720 2017.0
986 0 2017-02-24 14:23:58 24655129 2009-03-16 05:24:53 6489 1612 722 Danni B 24655129 Muneca_Bella29 ... None None 2.0 <a href="http://twitter.com/download/iphone" r... Voy a reír, voy a bailar\\nvivir mi vida, la,... None None None 835133169582096384 2017.0
987 0 2017-02-24 14:23:58 40277679 2009-05-15 16:40:21 42 7498 1190 Eric Kleefeld 40277679 EricKleefeld ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... @TurgidsonBuck It's a literal translation. But... 3436756408 3436756408 TurgidsonBuck 835133169737338880 2017.0
988 0 2017-02-24 14:23:58 17027632 2008-10-28 20:06:03 10400 3600 3750 Sandi Bachom 17027632 sandibachom ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @nycjayjay: Liar. https://t.co/ZLPptMn67A None None None 835133170072834050 2017.0
989 0 2017-02-24 14:23:58 826761710115643392 2017-02-01 11:58:46 466 8 93 Nicola Welch 826761710115643392 Bfkt3gCXFsg0FIc ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @JohnScarce: So @FaZe_Censor invited me to ... None None None 835133170131562497 2017.0
990 0 2017-02-24 14:23:58 3017823159 2015-02-04 15:30:14 5695 792 152 BE-UNFRAID 3017823159 __UFH ... None None 2.0 <a href="http://twitter.com/download/iphone" r... Congress party celebrating #MahaShivaratri ,ut... None None None 835133170395774978 2017.0
991 0 2017-02-24 14:23:58 878439830 2012-10-13 18:42:37 181 534 80 Studd Muffins 878439830 studdmuffi ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @brendanspiegel: Still feels surreal that i... None None None 835133170513227778 2017.0
992 0 2017-02-24 14:23:58 53410616 2009-07-03 15:16:42 9183 502 659 Leslie Cashion 53410616 Ponydrivers ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @lsarsour: Trump came for Muslims, then for... None None None 835133170613911552 2017.0
993 0 2017-02-24 14:23:58 741052320989880320 2016-06-09 23:40:16 53506 3260 4481 emigre80 741052320989880320 emigre80 ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @armandodkos: "Trump's best week" guy accus... None None None 835133170781667328 2017.0
994 0 2017-02-24 14:23:58 4730166135 2016-01-07 02:05:01 2940 51 341 Jeanie L. Talton 4730166135 TaltonJeanie ... None None 2.0 <a href="http://twitter.com/download/android" ... RT @tripgabriel: Everyone I spoke to in this t... None None None 835133170811092996 2017.0
995 0 2017-02-24 14:23:58 81965252 2009-10-13 00:13:42 67974 14982 1893 Maya Kosoff 81965252 mekosoff ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @_grendan: Almost as if the administration.... None None None 835133170832007168 2017.0
996 0 2017-02-24 14:23:58 203301517 2010-10-16 00:00:03 67923 724 366 kc 203301517 kcardozaa ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @Complex: Calvin Harris teams up with Migos... None None None 835133170886520832 2017.0
997 0 2017-02-24 14:23:58 869028884 2012-10-09 01:48:08 28046 355 788 (25-3) 869028884 cincybercats ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @JonRothstein: For 30 minutes against Memph... None None None 835133170982998017 2017.0
998 0 2017-02-24 14:23:58 235965672 2011-01-09 13:58:56 16216 918 227 la mami chula⚠235965672 rusbelisr25 ... None None 2.0 <a href="http://twitter.com" rel="nofollow">Tw... RT @_fxckbabylia: Mamá solo dame el permiso y... None None None 835133170991390720 2017.0
999 0 2017-02-24 14:23:58 564612626 2012-04-27 13:37:33 1473 745 254 Barbara Giles 564612626 BegiiiGiles ... None None 2.0 <a href="http://twitter.com/download/iphone" r... RT @amjoyshow: .@GabbyGiffords to GOP members ... None None None 835133171050115072 2017.0

1000 rows × 25 columns

Now we can count languages per month. Since the subquery exposes month and year columns, we just need to add them to the GROUP BY clause, like so...


In [5]:
# query database
statement = """
SELECT iso_language, month, year, COUNT(*) FROM
(SELECT t.*,
       date_part('month', created_at) AS month,
       date_part('year', created_at) AS year
FROM twitter.job j, twitter.tweet t
WHERE j.description LIKE '%New York City%' AND j.job_id = t.job_id
LIMIT 100000) AS new_york
GROUP BY iso_language, month, year;
"""

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='dbase.dsa.missouri.edu' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    cursor = conn.cursor()
    cursor.execute(statement)
    
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
    
# create dictionary from the rows and column names   
new_york = {}
for i in list(range(len(column_names))):
     new_york['{}'.format(column_names[i])] = [x[i] for x in rows]

# turn dictionary into a data frame
pd.DataFrame(new_york)


Out[5]:
count iso_language month year
0 5 hu 2.0 2017.0
1 26 cy 2.0 2017.0
2 4177 und 2.0 2017.0
3 109 de 2.0 2017.0
4 5 is 2.0 2017.0
5 11 vi 2.0 2017.0
6 29 uk 2.0 2017.0
7 30 zh 2.0 2017.0
8 77 pl 2.0 2017.0
9 451 ja 2.0 2017.0
10 3 ur 2.0 2017.0
11 128 tr 2.0 2017.0
12 23 fi 2.0 2017.0
13 1 sr 2.0 2017.0
14 6 lv 2.0 2017.0
15 15 hi 2.0 2017.0
16 32 no 2.0 2017.0
17 30 sv 2.0 2017.0
18 304 fr 2.0 2017.0
19 1895 es 2.0 2017.0
20 162 ko 2.0 2017.0
21 32 da 2.0 2017.0
22 1 iw 2.0 2017.0
23 61 et 2.0 2017.0
24 9 ne 2.0 2017.0
25 229 in 2.0 2017.0
26 115 it 2.0 2017.0
27 10 ta 2.0 2017.0
28 80 ru 2.0 2017.0
29 77 nl 2.0 2017.0
30 994 pt 2.0 2017.0
31 425 tl 2.0 2017.0
32 6 sl 2.0 2017.0
33 5 el 2.0 2017.0
34 472 ar 2.0 2017.0
35 1 bg 2.0 2017.0
36 89487 en 2.0 2017.0
37 13 eu 2.0 2017.0
38 6 lt 2.0 2017.0
39 202 th 2.0 2017.0
40 152 ht 2.0 2017.0
41 1 ka 2.0 2017.0
42 29 fa 2.0 2017.0
43 27 cs 2.0 2017.0
44 43 ro 2.0 2017.0
45 4 bn 2.0 2017.0

Well, it looks like limiting the query to 100,000 rows only gives us a single month. That's not very interesting. What if we use a finer time scale? Let's say week of the year.
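One thing to keep in mind about the query above: a LIMIT without an ORDER BY returns whichever rows the database happens to produce first, so the sample is not guaranteed to cover the whole collection period. The sketch below illustrates the idea with Python's built-in sqlite3 module instead of the PostgreSQL server; the table and its four rows are made up for illustration.

```python
import sqlite3

# in-memory stand-in for the tweet table (made-up rows, not real data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (created_at TEXT, iso_language TEXT)")
conn.executemany(
    "INSERT INTO tweet VALUES (?, ?)",
    [
        ("2017-03-15 09:00:00", "en"),
        ("2017-01-20 12:00:00", "es"),
        ("2017-02-24 14:23:58", "en"),
        ("2017-01-18 08:30:00", "en"),
    ],
)

# ORDER BY makes the LIMIT deterministic: always the two earliest tweets
earliest = conn.execute(
    "SELECT created_at, iso_language FROM tweet ORDER BY created_at LIMIT 2"
).fetchall()
conn.close()

print(earliest)
```

On a large table, sorting before limiting can be slow, so whether it is worth doing depends on how much a representative sample matters for the question at hand.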

YOUR TURN

Count the tweets per language in New York City for each week of the year. Turn the result into a data frame and call it week_ny. If you need some documentation on how to get the week from a timestamp field, look here (https://www.postgresql.org/docs/8.0/static/functions-datetime.html).


In [6]:
# put your code here
# ------------------

# query database
statement = """
SELECT iso_language, week, year, COUNT(*) FROM
(SELECT t.*,
       date_part('week', created_at) AS week,
       date_part('year', created_at) AS year
FROM twitter.job j, twitter.tweet t
WHERE j.description LIKE '%New York City%' AND j.job_id = t.job_id
LIMIT 1000000) AS new_york
GROUP BY iso_language, week, year;
"""

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='dbase.dsa.missouri.edu' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    cursor = conn.cursor()
    cursor.execute(statement)
    
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
    
# create dictionary from the rows and column names   
new_york = {}
for i in list(range(len(column_names))):
     new_york['{}'.format(column_names[i])] = [x[i] for x in rows]

# turn dictionary into a data frame
week_ny = pd.DataFrame(new_york)
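
PostgreSQL documents date_part('week', ...) as the ISO 8601 week number, and Python's standard library exposes the same numbering through isocalendar, which makes it easy to sanity-check a few dates by hand. The dates below are examples chosen here, not rows pulled from the database.

```python
from datetime import date

# the sample tweets shown earlier were created on 2017-02-24
iso_year, iso_week, iso_weekday = date(2017, 2, 24).isocalendar()
print(iso_year, iso_week, iso_weekday)

# ISO weeks start on Monday, so 1 January 2017 (a Sunday)
# still belongs to the last ISO week of 2016
edge_year, edge_week, _ = date(2017, 1, 1).isocalendar()
print(edge_year, edge_week)
```

The second case is worth keeping in mind when pairing week with date_part('year', ...), which returns the calendar year: dates in the first days of January can carry a week number that belongs to the previous ISO year. This collection starts in the middle of January, so it should not affect the results here.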

YOUR TURN

Using the week_ny data frame that you created above, find the Shannon index for each week.


In [7]:
# put your code here
# ------------------


week_ny['count'].groupby(week_ny['week']).apply(shannon)


Out[7]:
week
8.0     0.933881
9.0     0.896749
10.0    0.942359
11.0    1.030654
Name: count, dtype: float64
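
For reference, the Shannon index of a set of counts is H = -sum(p_i * log(p_i)), where p_i is each language's share of the tweets. The sketch below is a plain-Python version of that formula, not scikit-bio's implementation; the function name shannon_index and the choice of base 2 are assumptions made here, and the default base used by skbio's shannon may differ between versions.

```python
from math import log

def shannon_index(counts, base=2):
    """Shannon index of a list of non-negative counts (illustrative helper)."""
    total = sum(counts)
    # zero counts contribute nothing, so skip them to avoid log(0)
    shares = [c / total for c in counts if c > 0]
    return -sum(p * log(p, base) for p in shares)

even = shannon_index([50, 50])    # two equally common languages
skewed = shannon_index([99, 1])   # one language dominates
single = shannon_index([100])     # only one language present

print(even, skewed, single)
```

Evenly spread counts score higher than skewed ones, and a single language scores zero, which is the sense in which the index measures diversity.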

Even at weekly resolution we only get a handful of data points, so let's take a look at days. Keep in mind that this next query could take a few minutes.


In [8]:
# query database
statement = """
SELECT iso_language, day, month, year, COUNT(*) FROM
(SELECT t.*,
       date_part('day', created_at) AS day,
       date_part('month', created_at) AS month,
       date_part('year', created_at) AS year
FROM twitter.job j, twitter.tweet t
WHERE j.description LIKE '%New York City%' AND j.job_id = t.job_id
LIMIT 1000000) AS new_york
GROUP BY iso_language, day, month, year;
"""

try:
    connect_str = "dbname='twitter' user='dsa_ro_user' host='dbase.dsa.missouri.edu' password='readonly'"
    # use our connection values to establish a connection
    conn = psycopg2.connect(connect_str)
    cursor = conn.cursor()
    cursor.execute(statement)
    
    column_names = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
except Exception as e:
    print("Uh oh, can't connect. Invalid dbname, user or password?")
    print(e)
    
# create dictionary from the rows and column names   
new_york = {}
for i in list(range(len(column_names))):
     new_york['{}'.format(column_names[i])] = [x[i] for x in rows]

# turn dictionary into a data frame
day_ny = pd.DataFrame(new_york)

We can use the head method to see what this data frame looks like.


In [9]:
day_ny.head()


Out[9]:
count day iso_language month year
0 45 13.0 ru 3.0 2017.0
1 630 28.0 ja 3.0 2017.0
2 5 16.0 el 3.0 2017.0
3 34711 16.0 en 3.0 2017.0
4 14 26.0 fa 3.0 2017.0

And now we can find the Shannon index for each day...


In [10]:
date_ny = day_ny.groupby(['day','month','year'])['count'].apply(shannon).reset_index()

We also want the day, month, and year columns combined into a single date column. We can do that with pandas' to_datetime function: we pack the three columns into one YYYYMMDD number and tell to_datetime to parse it with the format '%Y%m%d'. We will call this new column date.


In [11]:
date_ny['date'] = pd.to_datetime(date_ny.year*10000+date_ny.month*100+date_ny.day,format='%Y%m%d')
# copy the Shannon values into a column with a more descriptive name
date_ny['shannon'] = date_ny['count']

Let's take a look at what this gave us...


In [16]:
date_ny.head()


Out[16]:
day month year count date shannon
0 13.0 3.0 2017.0 0.864325 2017-03-13 0.864325
1 14.0 3.0 2017.0 0.977634 2017-03-14 0.977634
2 15.0 3.0 2017.0 0.876521 2017-03-15 0.876521
3 16.0 3.0 2017.0 0.882064 2017-03-16 0.882064
4 17.0 3.0 2017.0 0.915172 2017-03-17 0.915172
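
As an alternative to the YYYYMMDD arithmetic, to_datetime can also assemble dates directly from a data frame whose columns are named year, month, and day. A minimal sketch on a made-up two-row frame follows; toy is a hypothetical name, and the cast to int is a precaution because date_part returns floating-point values.

```python
import pandas as pd

# made-up rows shaped like the day/month/year columns of date_ny
toy = pd.DataFrame({
    "day": [13.0, 14.0],
    "month": [3.0, 3.0],
    "year": [2017.0, 2017.0],
})

# to_datetime recognises the column names year, month and day
toy["date"] = pd.to_datetime(toy[["year", "month", "day"]].astype(int))

print(toy)
```

Either approach should give the same date column; this one just keeps the intent visible in the column names.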

AND FINALLY...

...we want to plot this relationship between date and shannon.


In [13]:
%matplotlib inline

#import matplotlib
#import numpy as np
#import matplotlib.pyplot as plt

pandas builds its plotting on top of matplotlib, so we can plot relationships straight from a data frame. In this case, the date is going to be the x-axis and shannon will be the y-axis. By default, the plot method uses the index of the data frame for the x-axis, so we first subset the data to only the columns we want to plot and then set the index to date. After that, we just call the plot method like so...


In [17]:
# sort by date so the line is drawn in chronological order
date_ny[['date','shannon']].set_index('date').sort_index().plot()


Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f6dea4f5dd8>

YOUR TURN

Now, do the same for Columbia, MO. Be sure to extract the day, month, and year and to count the tweets per language for each day. Then calculate the Shannon index per day and plot your results.


In [ ]:
# put your code here
# ------------------