AWS CloudWatch VPC Flow Logs <> Graphistry

Analyze CloudWatch logs with Graphistry, such as using VPC flow logs to map out an account

This example uses the AWS CLI directly for CloudWatch API access. You can also work from S3 or systems like Athena.
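
For example, if your flow logs are delivered to S3 instead of CloudWatch, the gzipped, space-delimited log files can be loaded straight into pandas. A minimal sketch, assuming s3fs is installed and using a hypothetical bucket and key:

import pandas as pd

# Hypothetical path: VPC flow logs land in S3 as gzipped, space-delimited text with a header row
s3_path = 's3://my-flowlog-bucket/AWSLogs/123456789012/vpcflowlogs/us-west-2/2019/04/28/sample.log.gz'
s3_df = pd.read_csv(s3_path, sep=' ', compression='gzip')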

Install & configure

Set aws_access_key_id, aws_secret_access_key, and your Graphistry key below, or pull them from your environment


In [0]:
!pip install graphistry -q
!pip install awscli -q

In [0]:
!aws configure set region us-west-2
!aws configure set aws_access_key_id "FILL_ME_IN"
!aws configure set aws_secret_access_key "FILL_ME_IN"

In [0]:
import pandas as pd
import json
import graphistry
#graphistry.register(key='FILL_ME_IN', server='FILL_ME_IN')
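
As an alternative to hard-coding credentials above, you can pull them from your environment; the AWS CLI and SDKs automatically honor AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION, and shell commands run via `!` inherit them. A minimal sketch (GRAPHISTRY_API_KEY is a hypothetical variable name):

import os
import graphistry

# The `!aws ...` calls in this notebook inherit these environment variables
os.environ.setdefault('AWS_DEFAULT_REGION', 'us-west-2')
# os.environ['AWS_ACCESS_KEY_ID'] = 'FILL_ME_IN'
# os.environ['AWS_SECRET_ACCESS_KEY'] = 'FILL_ME_IN'

# Hypothetical env var holding your Graphistry key
if 'GRAPHISTRY_API_KEY' in os.environ:
    graphistry.register(key=os.environ['GRAPHISTRY_API_KEY'], server='FILL_ME_IN')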

Record logs

If you do not already have logs, you can record VPC flow logs from your EC2 console (a scripted alternative is sketched after this list):

  • Services -> EC2 -> Network Interfaces -> select interface(s) -> Action -> create flow log
    • Send to CloudWatch; use default settings for IAM and elsewhere
  • When enough data is available, stop logging
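
If you prefer not to click through the console, flow log creation can also be scripted. A minimal sketch with boto3 (assuming boto3 is installed; the ENI ID and IAM role ARN are placeholders):

import boto3

ec2 = boto3.client('ec2', region_name='us-west-2')

# Placeholders: substitute your own ENI ID and an IAM role that can write to CloudWatch Logs
ec2.create_flow_logs(
    ResourceType='NetworkInterface',
    ResourceIds=['eni-0123456789abcdef0'],
    TrafficType='ALL',
    LogGroupName='VPCFlowDemo',
    DeliverLogsPermissionArn='arn:aws:iam::123456789012:role/flow-logs-role')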

Download & summarize logs

  • Pick a log group from those available
  • Fetch: See AWS docs on filter-log-events
  • Load into a dataframe
  • Compute summary stats

In [113]:
!aws logs describe-log-groups


{
    "logGroups": [
        {
            "logGroupName": "/aws/lambda/ami-test-AZInfoFunction-1V3BW2PT09ER2",
            "creationTime": 1534508995180,
            "metricFilterCount": 0,
            "arn": "arn:aws:logs:us-west-2:520859498379:log-group:/aws/lambda/ami-test-AZInfoFunction-1V3BW2PT09ER2:*",
            "storedBytes": 1615
        },
        {
            "logGroupName": "VPCFlowDemo",
            "creationTime": 1556422724248,
            "metricFilterCount": 0,
            "arn": "arn:aws:logs:us-west-2:520859498379:log-group:VPCFlowDemo:*",
            "storedBytes": 0
        }
    ]
}

In [40]:
!aws logs filter-log-events --log-group-name VPCFlowDemo > data.json
!ls -al data.json


-rw-r--r-- 1 root root 3761828 Apr 28 20:43 data.json
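
The single CLI call above is fine for a demo-sized group; for larger log groups, you may prefer to page through results programmatically. A minimal sketch using boto3's paginator (assuming boto3 is installed):

import boto3

logs = boto3.client('logs', region_name='us-west-2')
paginator = logs.get_paginator('filter_log_events')

# Accumulate events across all pages of the log group
events = []
for page in paginator.paginate(logGroupName='VPCFlowDemo'):
    events.extend(page['events'])

The DataFrame construction below then applies unchanged, with events in place of data['events'].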

In [108]:
with open('data.json', 'r') as f:
    data = json.load(f)

# Each CloudWatch event message is one space-delimited VPC flow log record
df = pd.DataFrame([x['message'].split(" ") for x in data['events']])
df.columns = ['version', 'accountid', 'interfaceid', 'src_ip', 'dest_ip', 'src_port', 'dest_port', 'protocol', 'packets', 'bytes', 'time_start', 'time_end', 'action', 'status']

print('# rows', len(df))
df.sample(3)


# rows 9671
Out[108]:
version accountid interfaceid src_ip dest_ip src_port dest_port protocol packets bytes time_start time_end action status
3748 2 520859498379 eni-03cefc09700cd0f3b 172.31.18.239 35.188.230.101 443 44448 6 8 3922 1556422848 1556422903 ACCEPT OK
6289 2 520859498379 eni-08275497a357fd66a 172.20.45.114 172.20.59.137 31161 22186 6 2 112 1556423050 1556423110 ACCEPT OK
1396 2 520859498379 eni-092275301fc5694d9 172.20.60.118 172.20.55.224 80 33936 6 2 112 1556422660 1556422718 ACCEPT OK
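
The time_start and time_end fields are unix-epoch seconds stored as strings; if you want human-readable timestamps for filtering or labeling, you can convert them on the df built above. A minimal sketch:

# Parse the epoch-second strings into pandas timestamps
for c in ['time_start', 'time_end']:
    df[c + '_dt'] = pd.to_datetime(df[c].astype(int), unit='s')

df[['time_start', 'time_start_dt', 'time_end', 'time_end_dt']].sample(3)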

In [114]:
# The split() fields are strings; cast packets/bytes to float for numeric aggregation
df2 = df.copy()
for c in ['packets', 'bytes']:
    df2[c] = df2[c].astype(float)

summary_df = df2\
    .groupby(['src_ip', 'dest_ip', 'interfaceid', 'dest_port', 'protocol', 'action', 'status'])\
    .agg({
        'time_start': ['min', 'max'],
        'time_end': ['min', 'max'],
        'packets': ['min', 'max', 'sum', 'count'],
        'bytes': ['min', 'max', 'sum', 'count']
    }).reset_index()
# Flatten the MultiIndex columns produced by agg(), e.g. ('bytes', 'sum') -> 'bytes_sum'
summary_df.columns = [(" ".join(x)).strip().replace(" ", "_") for x in list(summary_df.columns)]
print('# rows', len(summary_df))
summary_df.sample(3)


# rows 5049
Out[114]:
src_ip dest_ip interfaceid dest_port protocol action status time_start_min time_start_max time_end_min time_end_max packets_min packets_max packets_sum packets_count bytes_min bytes_max bytes_sum bytes_count
3107 172.20.55.224 172.20.61.101 eni-016babb4349103670 38076 6 ACCEPT OK 1556422627 1556422627 1556422686 1556422686 2.0 2.0 2.0 1 112.0 112.0 112.0 1
1356 172.20.45.114 172.20.41.131 eni-08275497a357fd66a 3240 6 ACCEPT OK 1556422990 1556422990 1556423050 1556423050 2.0 2.0 2.0 1 112.0 112.0 112.0 1
4311 172.20.60.118 172.20.59.137 eni-092275301fc5694d9 8842 6 ACCEPT OK 1556422660 1556422660 1556422718 1556422718 2.0 2.0 2.0 1 112.0 112.0 112.0 1
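
Before plotting, it can help to sanity-check the heaviest flows in the summary. A minimal sketch listing the top talkers by total bytes:

# Largest flows by total bytes transferred
summary_df.nlargest(10, 'bytes_sum')[['src_ip', 'dest_ip', 'dest_port', 'bytes_sum', 'packets_sum']]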

Plot


In [110]:
hg = graphistry.hypergraph(
    summary_df,
    entity_types=['src_ip', 'dest_ip'], #'dest_port', 'interfaceid', 'action', ...
    direct=True)
hg['graph'].bind(edge_title='bytes_sum').plot()


# links 5049
# events 5049
# attrib entities 255
Out[110]:
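
The commented-out entity_types above hint at richer variants: treating columns like dest_port or interfaceid as node types surfaces which ports and interfaces tie flows together. A minimal sketch with ports as first-class nodes:

hg2 = graphistry.hypergraph(
    summary_df,
    entity_types=['src_ip', 'dest_ip', 'dest_port'],
    direct=True)
hg2['graph'].bind(edge_title='bytes_sum').plot()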
