Silk Road Bitcoin Embezzling Visualization


In [1]:
import graphistry
import pandas as pd
import numpy as np
#graphistry.register(key='MY_API_KEY', server='labs.graphistry.com') #https://www.graphistry.com/api-request

Data: Load transactions CSV with Pandas


In [2]:
transactions = pd.read_csv('../../data/transactions.csv')
transactions['Date'] = pd.to_datetime(transactions['Date'],unit='ms') #coerce date format
print('# transactions', len(transactions))
transactions[:3]


('# transactions', 45117)
Out[2]:
Amount $ Date Destination Source Transaction ID isTainted
0 3223.9752 2013-11-23 20:53:20 84a0b53e1ac008b8dd0fd6212d4b7fa2... 2dd13954e18508bb8b3a41d96a022be9... b6eb8ba20df31fa74fbe7755f58c18f82a599d6bb5fa79... 0
1 3708.0216 2014-05-31 01:33:20 3b62a891b99969042d4e6ac8158d0a18... 7c74d3afb41e536e26948a1d2455a7c7... 60df3c67063e136a0c9715edcd12ae717e6f9ed492afe2... 0
2 2.4800 2014-04-27 00:53:20 3b62a891b99969042d4e6ac8158d0a18... 50dced19b8ee41114916bf3ca894f455... a6aafd3d85600844536b8a5f2c255686c33dc4969e68a4... 0

Visualization 1: Quick Visualization & Analysis

Task: Spot the embezzling

  1. Use the histogram tool to filter for only tainted transactions
  2. Turn on the Setting "Prune Isolated Nodes" to hide wallets with no remaining transactions
  3. Use the filters or excludes tool to only show transactions over 1000 or 1000.
  4. Verify that money flowed from Ross Ulbricht to Carl Force, and explore where else it flowed.

In [23]:
g = graphistry.edges(transactions).bind(source='Source', destination='Destination')

In [24]:
g.plot()


Out[24]:

Visualization 2: Summarizing Wallets

Compute how much wallets received


In [25]:
wallet_in = transactions\
  .groupby('Destination')\
  .agg({'isTainted': lambda x: 1 if x.sum() > 0 else 0, 'Amount $': np.sum})\
  .reset_index().rename(columns={'Destination': 'wallet', 'isTainted': 'isTaintedWallet'})

#not all wallets received money, tag these
wallet_in['Receivables'] = True

wallet_in[:3]


Out[25]:
wallet Amount $ isTaintedWallet Receivables
0 0002b3efbc3e742ee4cfaad18d8cf221... 41118.416840 0 True
1 0005e0fbac078e609bbc3239d3302ff7... 5577.768000 1 True
2 000b3df00e3ff9b7705452071c9e4e87... 11161.133824 0 True

Compute how much wallets sent


In [26]:
wallet_out = transactions\
  .groupby('Source')\
  .agg({'isTainted': np.sum, 'Amount $': np.max})\
  .reset_index().rename(columns={'Source': 'wallet', 'isTainted': 'isTaintedWallet'})

#not all wallets received money, tag these
wallet_out['Payables'] = True

wallet_out[:3]


Out[26]:
wallet Amount $ isTaintedWallet Payables
0 0005e0fbac078e609bbc3239d3302ff7... 6197.520000 0 True
1 000b3df00e3ff9b7705452071c9e4e87... 857.923098 0 True
2 0012742095ed1c2ceb334b2a5403da7d... 3472.000000 0 True

Join data


In [27]:
wallets = pd.merge(wallet_in, wallet_out, how='outer')
wallets['Receivables'] = wallets['Receivables'].fillna(False)
wallets['Payables'] = wallets['Payables'].fillna(False)
print('# Wallets only sent or only received', len(wallet_in) + len(wallet_out) - len(wallets))
wallets[:3]


('# Wallets only sent or only received', 875)
Out[27]:
wallet Amount $ isTaintedWallet Receivables Payables
0 0002b3efbc3e742ee4cfaad18d8cf221... 41118.416840 0 True False
1 0005e0fbac078e609bbc3239d3302ff7... 5577.768000 1 True False
2 000b3df00e3ff9b7705452071c9e4e87... 11161.133824 0 True False

Plot

Bind color to whether tainted


In [28]:
g.nodes(wallets).bind(node='wallet', point_color='isTaintedWallet').plot()


Out[28]:

In [ ]: