Function Entropy for "notepad.exe"

This example shows how to use Python's scientific computing tools with IDA Pro. To follow along, you need numpy, pandas, scipy and matplotlib installed. Once they are installed, launch IDA with a notepad.exe database loaded (any database will work, but notepad.exe is the one used in this example).
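
Before starting, it can help to confirm that the interpreter IDA uses can actually see these packages. The following is a minimal sketch, assuming it is run from IDA's Python prompt; the helper name `missing_packages` is hypothetical and not part of any of the libraries.

```python
import importlib


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# The four packages this example relies on.
required = ["numpy", "pandas", "scipy", "matplotlib"]
print("Missing packages: {}".format(missing_packages(required) or "none"))
```

An empty result means all four packages import cleanly in that interpreter.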


In [1]:
%matplotlib inline
import scipy.stats
import idc
import idaapi
import idautils
import numpy as np
import pandas as pd
import pylab
# Better-looking graphs
pd.options.display.mpl_style = 'default'
pylab.rcParams['figure.figsize'] = 12.0, 8.0
# Binary info
print "MD5: {} Binary: {}".format(idc.GetInputMD5(), idc.GetInputFile())


-----------------------------------------------------------------------------------------------------
Python 2.7.5 |Anaconda 2.1.0 (32-bit)| (default, May 31 2013, 10:43:53) [MSC v.1500 32 bit (Intel)] 
IDAPython v1.7.0 final (serial 0) (c) The IDAPython Team <idapython@googlegroups.com>
-----------------------------------------------------------------------------------------------------
MD5: E30299799C4ECE3B53F4A7B8897A35B6 Binary: notepad.exe

In [2]:
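def entropy(in_bytes):
    # Unpack the raw byte string into integer values (widened from uint8 to int32).
    values = np.array(np.fromstring(in_bytes, dtype='uint8'), dtype='int32')
    # scipy.stats.entropy normalizes its input to sum to 1, so the non-zero byte
    # values themselves are treated as the distribution (natural log).
    return scipy.stats.entropy(values[np.nonzero(values)])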
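
The `entropy` function above hands the byte values themselves to `scipy.stats.entropy`, which normalizes them into a distribution, so its result is bounded by the natural log of the number of non-zero bytes rather than by 8 bits. For comparison, here is a minimal sketch of the more common byte-frequency Shannon entropy. It is not the measure used for the results in this example; the name `byte_entropy` is hypothetical, and the sketch uses only the standard library so it can be tried outside IDA.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    values = bytearray(data)  # iterates as ints on both Python 2 and Python 3
    if not values:
        return 0.0
    total = float(len(values))
    # Count how often each byte value occurs, then sum -p * log2(p).
    counts = Counter(values)
    return sum(-(n / total) * math.log(n / total, 2) for n in counts.values())


# A run of identical bytes carries no information.
print(byte_entropy(b"\x90" * 16))
# Every byte value exactly once gives a result close to the 8-bit maximum.
print(byte_entropy(bytes(bytearray(range(256)))))
```

Under this definition the score depends on how varied the bytes are, not on how many there are, which makes functions of different sizes easier to compare.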

In [3]:
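def get_func_bytes(func_ea):
    # A function may be split into several chunks; concatenate the bytes of each.
    chunks = []
    for start, end in idautils.Chunks(func_ea):
        chunks.append(idaapi.get_many_bytes(start, end - start))
    return "".join(chunks)

# Start address of the function under the cursor (not used by the cells below)
func_start = idc.GetFunctionAttr(idc.ScreenEA(), idc.FUNCATTR_START)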

In [4]:
data = ((func_ea, entropy(get_func_bytes(func_ea))) for func_ea in idautils.Functions())
func_df = pd.DataFrame(data, columns=["EA", "Entropy"])

In [5]:
func_df['Formatted_EA'] = func_df['EA'].map(lambda ea: "{:X}".format(ea))

In [6]:
df_plt = func_df
ax = df_plt.plot(kind='scatter', x='EA', y='Entropy')
ax.set_xticklabels(['{:X}'.format(int(ea)) for ea in ax.get_xticks()])
ax


Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0xdb5a030>
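
The tick labels in the cell above are computed once from the tick positions at the time of the call, which is fine for a static inline plot. If the axis limits may change later, a formatter function is an alternative. The sketch below shows only the formatting step; `hex_tick` is a hypothetical name, and its `(value, pos)` signature is chosen to match what `matplotlib.ticker.FuncFormatter` passes to its callback.

```python
def hex_tick(value, pos=None):
    """Format a tick position as an uppercase hexadecimal address."""
    # Tick positions arrive as floats, so truncate to an integer first.
    return "{:X}".format(int(value))


# 4198405 is the EA of the first function in this database.
print(hex_tick(4198405.0))
```

With matplotlib loaded, `ax.xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(hex_tick))` would apply it to the plot's x axis.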

In [7]:
func_df.sort(['Entropy'], ascending=False)


Out[7]:
          EA    Entropy  Formatted_EA
50   4223618  10.309925        407282
9    4199506   9.205673        401452
114  4279164   7.074449        414B7C
11   4202542   6.874658        40202E
117  4281006   6.558433        4152AE
34   4220529   6.495139        406671
13   4204565   6.365993        402815
101  4275385   6.263491        413CB9
108  4277295   6.136212        41442F
93   4274148   6.068605        4137E4
31   4220082   5.946059        4064B2
109  4277951   5.812822        4146BF
1    4198624   5.805990        4010E0
107  4276843   5.769901        41426B
84   4272057   5.663713        412FB9
123  4282943   5.635076        415A3F
90   4273209   5.622370        413439
92   4273763   5.433877        413663
112  4278645   5.361000        414975
22   4219325   5.339757        4061BD
77   4270842   5.337793        412AFA
61   4269088   5.297247        412420
74   4270285   5.212553        4128CD
81   4271678   5.192738        412E3E
0    4198405   5.168159        401005
144  4283747   5.157894        415D63
120  4282474   5.152816        41586A
89   4272974   5.110607        41334E
119  4282243   5.108883        415783
113  4278944   5.052786        414AA0
...      ...        ...           ...
128  4283454   2.196426        415C3E
138  4283610   2.190622        415CDA
133  4283528   2.184170        415C88
126  4283418   2.176904        415C1A
43   4222965   2.173053        406FF5
131  4283492   2.168547        415C64
127  4283436   2.158516        415C2C
137  4283594   2.143894        415CCA
2    4198750   2.060482        40115E
82   4271969   2.008445        412F61
12   4204543   1.496372        4027FF
8    4199495   1.433464        401447
153  4284699   1.433189        41611B
147  4284279   1.433088        415F77
148  4284290   1.432506        415F82
27   4219819   1.430097        4063AB
10   4202531   1.420613        402023
135  4283564   1.418097        415CAC
63   4269465   1.412329        412599
59   4269062   1.396386        412406
60   4269075   1.377923        412413
53   4268456   1.229824        4121A8
129  4283472   1.189524        415C50
130  4283482   1.126457        415C5A
140  4283638   1.090974        415CF6
139  4283628   1.069728        415CEC
58   4269052   1.045125        4123FC
54   4268460   1.039051        4121AC
25   4219744   0.972180        406360
154  4284712  -0.000000        416128

155 rows × 3 columns


In [8]:
idc.Message(str(func_df.sort(['Entropy'], ascending=False)))