The goal of this notebook is to uncover several constructions inside the dataset which may help us to uncover fraud. With that, we can see whether we can create features related to connections inside the network. In this network analysis, we see if we can use the connections an account has in order to predict behavior. We will build a graph with the following properties:
Once we have this, we can see if we can use different graph properties to see whether or not an account is suspicious of unwanted behaviour.
In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
In [3]:
client_info = pd.read_csv('data/client_info.csv')
demographic_info = pd.read_csv('data/demographic_data.csv')
transaction_info = pd.read_csv('data/transction_info.csv')
order_info = pd.read_csv('data/order_info.csv')
In [4]:
transaction_info['bank_account'] = transaction_info['bank'] + transaction_info['account'].map(str)
In [5]:
G = nx.Graph()
G.add_nodes_from(transaction_info['account_id'].unique().tolist())
G.add_nodes_from(transaction_info['bank_account'][transaction_info['bank_account'].notnull()].unique())
In [6]:
nonEmpty = transaction_info[transaction_info['bank_account'].notnull()]
edges = zip(nonEmpty['account_id'],nonEmpty['bank_account'], nonEmpty['amount'])
G.add_weighted_edges_from(edges)
In [7]:
giant = max(nx.connected_component_subgraphs(G), key=len)
nx.draw_circular(giant)
In [8]:
Gdi = nx.DiGraph()
Gdi.add_nodes_from(giant.nodes())
Gdi.add_edges_from(giant.edges())
nx.draw_circular(Gdi)
Unfortunatly, we see that our datastet has no internal transactions (or we do not have the required mapping). Therefore a network analysis will not help us any further.
A different dataset which does have this mapping is hard to find. Internally, banks do have these but due to prviacy constraints these are never made pulic. We could also use Bitcoin transactions but a crypto currency won't provide us with any other features. Hence, we will abandon the network analysis for now.