Getting Started with ThreatExchange Sharing

Purpose

The ThreatExchange APIs are designed to make the sharing of indicators, and the connections between them, simple. Additionally, the APIs provide flexible options for deciding whom you share with: yourself, individual members, groups, and everyone!

What you need

Before getting started, you'll need a few things installed and some data.

  • Pytx for ThreatExchange access
  • Pandas for data manipulation and analysis
  • A CSV file with data suitable for sharing

All of the Python packages mentioned above can be installed with

pip install <package_name>

Set up a ThreatExchange access_token

If you don't already have an access_token for your app, use the Facebook Access Token Tool to get one.


In [ ]:
from pytx.access_token import access_token

# Specify the location of your token via one of several ways:
# https://pytx.readthedocs.org/en/latest/pytx.access_token.html
access_token()

Optionally, enable debug level logging


In [ ]:
from pytx.logger import setup_logger

# Uncomment this, if you want debug logging enabled
# setup_logger(log_file="pytx.log")

Configure Privacy Settings

This will configure the API defaults for when you share data. There are multiple levels of privacy to choose from.

The code below will publish data to a whitelist that only your appID can see, for convenient testing.


In [ ]:
from pytx.access_token import get_app_id
from pytx.vocabulary import PrivacyType as pt

# Choose the privacy level from 
# https://pytx.readthedocs.org/en/latest/pytx.vocabulary.html#pytx.vocabulary.PrivacyType
privacy_type = pt.HAS_WHITELIST 

# Populate this with app IDs or privacy group IDs, as strings. If using pt.VISIBLE, set it to None.
privacy_members = [str(get_app_id())]  # Also accepts other member or privacy group IDs as strings
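As a rough illustration of how the privacy type and member list fit together, here is a minimal sketch that runs without Pytx. The function build_privacy_fields is hypothetical, and the string literals ('VISIBLE', 'privacy_type', 'privacy_members') are assumed stand-ins for the PrivacyType and ThreatDescriptor constants; the member IDs are made up.

```python
def build_privacy_fields(privacy_type, members=None):
    """Return the privacy-related descriptor fields as a dict.

    String literals stand in for pytx's PrivacyType and ThreatDescriptor
    constants (assumed values, used so the sketch runs without pytx).
    """
    fields = {'privacy_type': privacy_type}
    if privacy_type == 'VISIBLE':
        # Visible to everyone: no member list is needed
        return fields
    if not members:
        raise ValueError('%s requires at least one member ID' % privacy_type)
    # IDs are sent as a single comma-separated string
    fields['privacy_members'] = ','.join(str(m) for m in members)
    return fields


# Hypothetical app IDs, for illustration only
whitelist_fields = build_privacy_fields('HAS_WHITELIST', ['1234', '5678'])
print(whitelist_fields['privacy_members'])  # 1234,5678
```

Failing early when a non-public privacy type has no members is a design choice of this sketch, not something Pytx is known to enforce.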

Define default fields for sharing

Sometimes your CSV data is just a raw list of IPs or domains. Use this map to set default fields on the descriptors that are created. If a row in your CSV already provides a value for one of these fields, the row's value takes precedence over the default.

In this example, our defaults are set for sharing manually curated data of malicious IP addresses from a botnet.


In [ ]:
from pytx.vocabulary import Attack as a
from pytx.vocabulary import ReviewStatus as rs
from pytx.vocabulary import Severity as s
from pytx.vocabulary import ShareLevel as sl
from pytx.vocabulary import Status as st
from pytx.vocabulary import ThreatDescriptor as td
from pytx.vocabulary import ThreatType as tt
from pytx.vocabulary import Types as t

# See: https://pytx.readthedocs.org/en/latest/pytx.vocabulary.html#pytx.vocabulary.ThreatDescriptor
default_fields = {
    #td.ATTACK_TYPE: a.MALWARE, # TODO uncomment when PR #120 gets added to Pytx in pip
    td.CONFIDENCE: 75,
    #td.EXPIRED_ON: '2016-02-25 00:00:00+0000',
    td.PRIVACY_TYPE: privacy_type,
    td.REVIEW_STATUS: rs.REVIEWED_MANUALLY,
    td.SHARE_LEVEL: sl.AMBER,
    td.SEVERITY: s.SEVERE,
    td.STATUS: st.MALICIOUS,
    td.THREAT_TYPE: tt.MALICIOUS_IP,
    td.TYPE: t.IP_ADDRESS,
    td.DESCRIPTION: '[example][tags] Test description'
}

# Add in privacy members, as needed
if privacy_members is not None:
    default_fields[td.PRIVACY_MEMBERS] = ','.join(privacy_members)
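To see how defaults and per-row values combine, here is a small sketch using only plain dictionaries. The names example_defaults and merge_row are hypothetical, and plain strings stand in for the Pytx vocabulary constants so the snippet runs without Pytx installed.

```python
# Plain strings stand in for the pytx vocabulary constants (an assumption
# made so the sketch runs without pytx installed)
example_defaults = {
    'confidence': 75,
    'status': 'MALICIOUS',
    'description': '[example][tags] Test description',
}


def merge_row(defaults, row):
    """Return a new dict: defaults first, then the row's values on top."""
    fields = defaults.copy()
    fields.update(row)
    return fields


# A CSV row arrives as a dict of strings keyed by column name
row = {'indicator': '192.0.2.10', 'confidence': '90'}
merged = merge_row(example_defaults, row)
print(merged['confidence'])  # 90 (from the row)
print(merged['status'])      # MALICIOUS (from the defaults)
```

Note that with this approach an empty cell in the CSV still counts as a value (an empty string) and replaces the default, so leave a column out entirely if you want the default to apply to every row.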

Share data from a file

This cell reads the data from a local CSV file and publishes it to ThreatExchange. Column names are interpreted according to Pytx's Vocabulary.

At a minimum, your CSV file should have one column, named indicator.
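If you don't have a file handy, the following sketch writes a small one for testing. The helper write_sample_csv is hypothetical, the addresses come from the reserved documentation ranges rather than real threat data, and the description column is an assumed optional field name.

```python
import csv

# Hypothetical sample rows: documentation-range IPs, not real threat data
sample_rows = [
    {'indicator': '192.0.2.10', 'description': '[example][tags] Test IP one'},
    {'indicator': '198.51.100.23', 'description': '[example][tags] Test IP two'},
]


def write_sample_csv(path, rows):
    """Write rows to path with a header row; 'indicator' is the first column."""
    fieldnames = ['indicator', 'description']
    # Text mode with newline='' is the Python 3 idiom; on Python 2 open with 'wb'
    with open(path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling write_sample_csv('test_share.csv', sample_rows) should produce a file the next cell can read.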


In [ ]:
import csv
from pytx import ThreatDescriptor

# The file to upload
csv_path = 'test_share.csv'

# Load the CSV and serially publish it
ind_count = 0
fail_count = 0
# Python 3 idiom for the csv module; on Python 2 use open(csv_path, 'rb')
with open(csv_path, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        try:
            # Values from the row override the defaults
            fields = default_fields.copy()
            fields.update(row)
            result = ThreatDescriptor.new(params=fields)
        except Exception as e:
            # Report the exception itself; result is not set when new() raises
            print('Unable to upload %s due to %s' % (row.get('indicator'), e))
            fail_count += 1
        else:
            ind_count += 1
print('Done publishing %d indicators with %d failures!' % (ind_count, fail_count))
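Before publishing, it can help to check the file offline. The sketch below is one possible pre-flight check, not part of Pytx: the helper check_csv is hypothetical, and it only verifies that the indicator column exists and is non-empty on every row.

```python
import csv


def check_csv(path, required=('indicator',)):
    """Return (row_count, problems) for a CSV file without publishing anything."""
    problems = []
    row_count = 0
    with open(path, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        # fieldnames is None for an empty file
        header = reader.fieldnames or []
        missing = [col for col in required if col not in header]
        if missing:
            problems.append('missing column(s): ' + ', '.join(missing))
            return row_count, problems
        for row in reader:
            row_count += 1
            if not (row['indicator'] or '').strip():
                # line_num is the line in the file where this row ended
                problems.append('line %d: empty indicator' % reader.line_num)
    return row_count, problems
```

A call such as check_csv('test_share.csv') returns the number of data rows and a list of problems; an empty list suggests the file is at least structurally ready to share.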

Confirm your data was shared

Now, we do a quick search to confirm the data was published correctly to ThreatExchange.


In [ ]:
from datetime import datetime, timedelta
import pandas as pd
from pytx import ThreatDescriptor
from pytx.access_token import get_app_id

# Search the last hour of descriptors owned by this app
now = datetime.utcnow()
time_format = '%Y-%m-%d %H:%M:%S +0000'

# Define your search string and other params, see
# https://pytx.readthedocs.org/en/latest/pytx.common.html#pytx.common.Common.objects
# for the full list of options
results = ThreatDescriptor.objects(
    fields=ThreatDescriptor._default_fields,
    limit=100,  # adjust as needed
    owner=str(get_app_id()),
    since=(now - timedelta(hours=1)).strftime(time_format),
    until=now.strftime(time_format)
)

data_frame = pd.DataFrame([result.to_dict() for result in results])
data_frame.head(n=10)
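The since and until arguments are timestamp strings. As a minimal sketch of building such a window with only the standard library, the hypothetical helper search_window below returns both strings for the last N hours; the format string mirrors the one used in this notebook.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = '%Y-%m-%d %H:%M:%S +0000'  # %M is minutes; %m would be the month


def search_window(hours, now=None):
    """Return (since, until) strings covering the last `hours` hours in UTC."""
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(hours=hours)
    return since.strftime(TIME_FORMAT), now.strftime(TIME_FORMAT)


# A fixed timestamp makes the output reproducible
fixed_now = datetime(2016, 2, 25, 12, 30, 0, tzinfo=timezone.utc)
since, until = search_window(1, now=fixed_now)
print(since)  # 2016-02-25 11:30:00 +0000
print(until)  # 2016-02-25 12:30:00 +0000
```

Widening the window, for example search_window(24), is a simple way to find descriptors that were shared earlier in the day.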

Excellent, we've shared data!

Now that we've walked through a simple example, try out the following exercises:

  • Share a list of malicious URLs with multiple members
  • Share a list of malicious domain names with a privacy group

In [ ]:
# Put your Python code here!