Mikhail Kolodin. Project: Internet temperature. 2015-12-15 1.4.1
IPython research for internet temperature. For now we use only the fontanka.ru website; other sites and methods will be added later.
Version with database recording. The database now holds the full archive of headers since 2000.
Here we count good and bad words in the database. This version no longer downloads anything from websites.
In [16]:
import datetime
now = datetime.datetime.now()
import time
import sqlite3
Part I. Open the database and correct its data.
In [17]:
#db = "mp-nettemp3-fru-2015.db"
db = "mp-nettemp3-fru-2000-2015.db"
conn = sqlite3.connect(db)
cur = conn.cursor()
In [18]:
conn.execute ("alter table netdata add dtyear int")
Out[18]:
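Running the cell above a second time raises sqlite3.OperationalError, because the dtyear column already exists. A minimal sketch of a guard follows, assuming the same netdata table; the helper name add_column_if_missing is hypothetical and not part of the original notebook, and the demonstration uses a throwaway in-memory database rather than the real archive.

```python
import sqlite3

def add_column_if_missing(conn, table, column, coltype):
    """Add a column only if the table lacks it (hypothetical helper)."""
    # "pragma table_info" returns one row per column; field 1 is the column name.
    # Identifiers cannot be bound as "?" parameters, so only pass trusted names here.
    existing = [row[1] for row in conn.execute("pragma table_info({})".format(table))]
    if column in existing:
        return False
    conn.execute("alter table {} add column {} {}".format(table, column, coltype))
    return True

# Demonstration on an in-memory database with an invented two-column schema.
demo = sqlite3.connect(":memory:")
demo.execute("create table netdata (ndate text, header text)")
first = add_column_if_missing(demo, "netdata", "dtyear", "int")
second = add_column_if_missing(demo, "netdata", "dtyear", "int")
print(first, second)
demo.close()
```

With such a guard the notebook could be re-run from the top without failing on the schema change.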
In [19]:
cur.execute ("select count(*) from netdata")
print ("total records: {}" .format(cur.fetchone()[0]))
In [20]:
rc = cur.execute ("select distinct substr(ndate, 1, 10) from netdata")
cnt = 0
for r in rc: cnt += 1
print ("We have data for {} days" .format(cnt))
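The day count above can also be computed by SQLite itself, which avoids looping over every distinct date in Python. This is only a sketch: it assumes, as the cell above does, that the first 10 characters of ndate are the date, and the sample rows are invented for the demonstration.

```python
import sqlite3

# In-memory stand-in for the real database; the rows below are made up.
demo = sqlite3.connect(":memory:")
demo.execute("create table netdata (ndate text)")
demo.executemany(
    "insert into netdata values (?)",
    [("2015-12-14 09:00",), ("2015-12-14 18:30",), ("2015-12-15 10:15",)],
)

# count(distinct ...) collapses repeated dates inside the query.
days = demo.execute(
    "select count(distinct substr(ndate, 1, 10)) from netdata"
).fetchone()[0]
print("We have data for {} days".format(days))
demo.close()
```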
In [21]:
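# dtyear is an int column for the year: take the first 4 characters of ndate (assumes ndate starts with YYYY-MM-DD)
cur.execute ("update netdata set dtyear = cast(substr(ndate, 1, 4) as int)")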
conn.commit()
Part II. Get good and bad words and store them locally.
In [22]:
goods, bads = "words-good.txt", "words-bad.txt"
In [23]:
with open(goods) as good:
    goodw = good.read().split()
goodw.sort()
goodw = tuple(goodw)
In [24]:
print ("Good words:", goodw)
In [25]:
with open(bads) as bad:
    badw = bad.read().split()
badw.sort()
badw = tuple(badw)
In [26]:
print ("Bad words:", badw)
Part III. Process all data in the database: for each record, set wpos and wneg as counters of good and bad words, and mark as their difference.
In [27]:
cur.execute ("select *, rowid from netdata")
Out[27]:
In [28]:
toshow = 10
shown = 0
for row in cur.fetchall():
    header = row[3].lower()
    dthere = row[1]
    cpos = cneg = 0
    for w in goodw:
        if w in header:
            cpos += 1
    for w in badw:
        if w in header:
            cneg += 1
    mark = cpos - cneg
    rid = row[-1]
    cur.execute ("update netdata set wpos=?, wneg=?, mark=? where rowid=?", (cpos, cneg, mark, rid))
    if shown < toshow:
#        print ("update: rowid={5}, dt={4}, header={0}, wpos={1}, wneg={2}, mark={3}" .format(header, cpos, cneg, mark, dthere, rid))
        shown += 1
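The scoring rule used in the loop above can be isolated as a small function, which makes its behaviour easier to check on a single header. This is a sketch only: the function name score_header, the English word lists and the sample header are all invented for illustration and do not come from the project's word files.

```python
def score_header(header, goodw, badw):
    """Return (wpos, wneg, mark) for one header, using substring matching."""
    text = header.lower()
    # Each word counts at most once per header, however often it occurs.
    wpos = sum(1 for w in goodw if w in text)
    wneg = sum(1 for w in badw if w in text)
    return wpos, wneg, wpos - wneg

# Made-up word lists and header, for illustration only.
sample_good = ("award", "win")
sample_bad = ("war", "fire")
result = score_header("City team wins award", sample_good, sample_bad)
print(result)
```

In this sample the bad word "war" is counted because it occurs inside "award". Substring matching catches inflected forms such as "wins", but it can also produce false hits of this kind, which is worth keeping in mind when reading the marks.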
In [29]:
conn.commit()
In [30]:
conn.close()