Aaron Gonzales 2015-03-24
We are storing all of the features in a MongoDB database on Andres' machine. This short notebook shows how to query that database for your features.
In each cell, you'll see a series of Python commands. I recommend developing in IPython Notebook or something similar and then moving the finished code into a reusable module. Please keep everything backed up on the repo.
First, make sure you have pymongo installed on your machine, along with MongoDB version 3.0 or newer.
One of the following commands will install pymongo, depending on whether you are using Python 3 or not (YOU SHOULD BE USING PYTHON 3).
sudo python3 -m pip install pymongo
sudo pip install pymongo
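To confirm the install worked before going further, a quick check along these lines can help. This is only a sketch: pymongo_status is a name made up for this example, and it reports the driver version only, not the MongoDB server version.

```python
import importlib.util


def pymongo_status():
    """Report whether pymongo can be imported and, if so, which version."""
    # find_spec returns None when the package cannot be found.
    if importlib.util.find_spec("pymongo") is None:
        return "pymongo is not installed"
    import pymongo
    # pymongo.version is the driver's version string.
    return "pymongo " + pymongo.version


print(pymongo_status())
```

If this prints "pymongo is not installed", rerun the pip command above with the same Python interpreter you use for the notebook.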
In [1]:
# pymongo talks to the database; sys is used below to report errors
import sys
import pymongo as pm
The following cell authenticates to the db and obtains the 'samples' collection. A collection is roughly analogous to a table in a SQL database. The 'samples' collection will be our main collection.
In [2]:
db_address = "afruizc-office.cs.unm.edu"
username = 'populator'
password = 'malware_challenge'
mg = pm.MongoClient(db_address)
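# Note: Database.authenticate() was removed in PyMongo 4.0; this cell assumes an older PyMongo.
if not mg.malware.authenticate(username, password):
    sys.stderr.write("Authentication error. Terminating...\n")
    sys.stderr.flush()
    sys.exit(1)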
# Obtain the collection
samples = mg.malware.samples
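If your PyMongo no longer has authenticate (it was removed in PyMongo 4.0), one alternative is to put the credentials in a connection URI instead. The sketch below only builds the URI string, so it runs without a database; build_mongo_uri is a hypothetical helper, not part of PyMongo, and it assumes the account is defined on the 'malware' database as in the cell above.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, database):
    """Build a mongodb:// URI, percent-escaping the credentials."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be confused with the URI's own separators.
    return "mongodb://{}:{}@{}/{}".format(
        quote_plus(user), quote_plus(password), host, database
    )


uri = build_mongo_uri("populator", "malware_challenge",
                      "afruizc-office.cs.unm.edu", "malware")
print(uri)
```

With pymongo imported as pm, something like pm.MongoClient(uri).malware.samples should then give you the same collection, since the database named in the URI path is used for authentication by default.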
Hurray! You now have your first connection to the database set up. Next, fetch a single document to check that everything works, then loop over all of the samples in class 5:
In [4]:
samples.find_one()
Out[4]:
In [9]:
# Print "id,class" for every document whose 'class' field equals '5'
for post in samples.find({'class':'5'}):
    print(post['id'] + ',' + post['class'])
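Note that the query above uses the string '5', not the number 5. The loop concatenates post['class'] with other strings, which suggests the class is stored as a string, and MongoDB equality matching treats strings and numbers as different values. The sketch below imitates that behavior on plain Python dicts; the documents and the matches helper are made up for illustration and cover only simple equality filters.

```python
# Hypothetical stand-in documents; the real ones live in the 'samples' collection.
fake_samples = [
    {"id": "sampleA", "class": "5"},
    {"id": "sampleB", "class": "3"},
    {"id": "sampleC", "class": "5"},
]


def matches(doc, query):
    """True if every key in query is present in doc with an equal value."""
    return all(key in doc and doc[key] == value for key, value in query.items())


# Querying with the string '5' finds the class-5 samples...
as_string = [d["id"] for d in fake_samples if matches(d, {"class": "5"})]
# ...while querying with the integer 5 finds nothing.
as_int = [d["id"] for d in fake_samples if matches(d, {"class": 5})]

print(as_string)
print(as_int)
```

So if a query unexpectedly comes back empty, checking the type of the value you are filtering on is a reasonable first step.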
To create a list of sample names, you can use the following idiom (a list comprehension):
In [13]:
names = [post['id'] for post in samples.find({'class':'5'})]
In [16]:
names
Out[16]:
When you add a field to a document, you have to identify that document by its name (its 'id') in the update. As a toy feature to insert, the next cell counts how many times the letter 'k' appears in each name.
In [51]:
# Toy feature: count the occurrences of the letter 'k' in each sample name
k_counts = []
for name in names:
    count = 0
    for c in name:
        if c == 'k':
            count += 1
    k_counts.append(count)
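When you port a feature like this into a module, it may be easier to test as a small function. Here is one possible shape, using the built-in str.count; char_count and the example names are made up for this sketch and are not real sample ids.

```python
def char_count(name, char="k"):
    """Count non-overlapping occurrences of char in name."""
    return name.count(char)


# Hypothetical names, standing in for the ids returned by the query above.
example_names = ["kxk1", "abc", "kkk"]
example_counts = [char_count(n) for n in example_names]

print(list(zip(example_names, example_counts)))
```

The same function can then be reused for other characters, for example char_count(name, 'a').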
In [58]:
list(zip(names, k_counts))
Out[58]:
Now let's put these back in the database:
In [60]:
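# Match each document by its 'id' and set the new 'k_count' field on it.
# (Collection.update() is the older PyMongo API; PyMongo 3.0+ offers update_one() for this.)
for name, kcount in zip(names, k_counts):
    samples.update({'id': name},
                   {"$set": {"k_count": kcount}})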
Now let's print those out again and see our handiwork:
In [61]:
tmp = list(samples.find({'class':'5'}))
In [62]:
tmp
Out[62]:
Have fun! Let me know if you need help or anything. If you need more references, please consult this post or the pymongo tutorial. If you run these commands yourself, please run the following cell afterwards to clear the 'k_count' field out of the database.
In [70]:
for name in names:
    samples.update({ 'id': name },
                   { "$unset" : { 'k_count' : 1 } })
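If it helps to see what the "$set" and "$unset" updates in this notebook do to a single document, here is a rough pure-Python imitation on a dict. It is only a sketch under simplifying assumptions: apply_update is a made-up helper, it handles top-level fields only, and the real work is of course done by the MongoDB server.

```python
def apply_update(doc, update):
    """Return a copy of doc with top-level $set and $unset applied."""
    result = dict(doc)
    # $set adds the field or overwrites its current value.
    for field, value in update.get("$set", {}).items():
        result[field] = value
    # $unset removes the field; the value given for it is ignored.
    for field in update.get("$unset", {}):
        result.pop(field, None)
    return result


doc = {"id": "sampleA", "class": "5"}  # hypothetical document
with_feature = apply_update(doc, {"$set": {"k_count": 2}})
cleaned = apply_update(with_feature, {"$unset": {"k_count": 1}})

print(with_feature)
print(cleaned)
```

After the "$unset" step the document is back to its original fields, which is the point of the cleanup cell above.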