# Tutorial: topic modeling to analyze the EGC conference

EGC is a French-speaking conference on knowledge discovery in databases (KDD). In this notebook we show how to use TOM to infer, with non-negative matrix factorization (NMF), the latent topics that pervade the corpus of articles published at EGC between 2004 and 2015. Based on the discovered topics, we then use TOM to shed light on the topical structure of the EGC society.

We prune words whose absolute frequency in the corpus is less than 4, as well as words whose relative frequency is higher than 80%, in order to keep only the most significant ones. We then build the vector space representation of the articles with $tf \cdot idf$ weighting. It is an $n \times m$ matrix denoted by $A$, where each row represents an article, with $n = 817$ (the number of articles) and $m = 1738$ (the number of words).



In [1]:

from tom_lib.structure.corpus import Corpus
from tom_lib.visualization.visualization import Visualization

corpus = Corpus(source_file_path='input/egc_lemmatized.csv',
                language='french',
                vectorization='tfidf',
                max_relative_frequency=0.8,
                min_absolute_frequency=4)
print('corpus size:', corpus.size)
print('vocabulary size:', len(corpus.vocabulary))




corpus size: 817
vocabulary size: 1738
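
The pruning and weighting step can be illustrated with a minimal pure-Python sketch. It is not TOM's implementation: the function `tfidf_matrix`, the toy documents, and the $\log(n / df)$ form of $idf$ are assumptions made for illustration, and "relative frequency" is interpreted here as the fraction of documents that contain the word.

```python
import math
from collections import Counter

def tfidf_matrix(docs, min_absolute_frequency=1, max_relative_frequency=0.8):
    """Return (vocabulary, matrix) for a list of tokenized documents."""
    n = len(docs)
    # Absolute frequency: total number of occurrences in the corpus.
    total_counts = Counter(word for doc in docs for word in doc)
    # Document frequency: number of documents that contain the word.
    doc_counts = Counter(word for doc in docs for word in set(doc))
    # Keep words that are frequent enough but not present almost everywhere.
    vocabulary = sorted(
        word for word in total_counts
        if total_counts[word] >= min_absolute_frequency
        and doc_counts[word] / n <= max_relative_frequency
    )
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        # One row per document, one column per retained word.
        matrix.append([tf[w] * math.log(n / doc_counts[w]) for w in vocabulary])
    return vocabulary, matrix

# Toy corpus (hypothetical), already tokenized and lemmatized.
docs = [
    ["motif", "extraction", "donnée"],
    ["règle", "association", "donnée"],
    ["graphe", "social", "donnée", "graphe"],
]
vocabulary, A = tfidf_matrix(docs, min_absolute_frequency=2, max_relative_frequency=0.8)
```

On this toy corpus only `graphe` survives: `donnée` occurs in every document and exceeds the 80% threshold, and every other word occurs fewer than twice.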



## Estimating the optimal number of topics ($k$)

Non-negative matrix factorization approximates $A$, the document-term matrix, in the following way:

$$A \approx HW$$

where $H$ is an $n \times k$ matrix that describes the documents in terms of topics, and $W$ is a $k \times m$ matrix that describes topics in terms of words. More precisely, the coefficient $h_{i,j}$ defines the importance of topic $j$ in article $i$, and the coefficient $w_{i,j}$ defines the importance of word $j$ in topic $i$.
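
As a rough illustration of the factorization $A \approx HW$, the sketch below applies the classic multiplicative update rules of Lee and Seung to a small matrix, in pure Python. This is an assumption about one possible solver, not a description of what TOM runs internally; the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) and the toy matrix are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def nmf(A, k, iterations=200, eps=1e-9, seed=0):
    """Approximate A (n x m) by H (n x k) times W (k x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    W = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Update W: W <- W * (H^T A) / (H^T H W), element-wise.
        Ht = transpose(H)
        num, den = matmul(Ht, A), matmul(matmul(Ht, H), W)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # Update H: H <- H * (A W^T) / (H W W^T), element-wise.
        Wt = transpose(W)
        num, den = matmul(A, Wt), matmul(H, matmul(W, Wt))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return H, W

def frobenius_error(A, H, W):
    """Frobenius norm of the residual A - HW."""
    R = matmul(H, W)
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr)) ** 0.5

# Toy document-term matrix (hypothetical) with two obvious blocks.
A = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
]
H, W = nmf(A, k=2)
```

Each update keeps $H$ and $W$ non-negative and does not increase the reconstruction error $\lVert A - HW \rVert_F$, so comparing `frobenius_error` after 1 and after 200 iterations gives a quick sanity check.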

Determining an appropriate value of $k$ is critical to ensure a relevant analysis of the EGC anthology. If $k$ is too small, the discovered topics will be too vague; if $k$ is too large, the discovered topics will be too narrow and may be redundant. To help us with this task, we compute two metrics implemented in TOM: the stability metric proposed by Greene et al. (2014) and the spectral metric proposed by Arun et al. (2010).



In [2]:

from tom_lib.nlp.topic_model import NonNegativeMatrixFactorization

topic_model = NonNegativeMatrixFactorization(corpus)



### Weighted Jaccard average stability

The figure below shows this metric for a number of topics varying between 10 and 50 (higher is better).
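
To give an intuition for this kind of stability measure, the sketch below compares the ranked top words of two topic models with the average Jaccard score of their top-$d$ prefixes, then keeps the best one-to-one matching between topics. It is a simplified illustration in the spirit of Greene et al. (2014), not the code behind `greene_metric`: the function names and toy word lists are hypothetical, and the brute-force search over permutations is only practical for very small $k$.

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def average_jaccard(ranking_1, ranking_2):
    """Average of the Jaccard scores of the top-d prefixes, for d = 1..t."""
    t = min(len(ranking_1), len(ranking_2))
    return sum(jaccard(ranking_1[:d], ranking_2[:d]) for d in range(1, t + 1)) / t

def agreement(topics_1, topics_2):
    """Best mean average-Jaccard score over all one-to-one topic matchings."""
    k = len(topics_1)
    return max(
        sum(average_jaccard(topics_1[i], topics_2[p[i]]) for i in range(k)) / k
        for p in permutations(range(k))
    )

# Ranked top words of two toy models (hypothetical), e.g. a reference model
# and a model fitted on a resampled corpus.
reference = [["règle", "association", "extraction"], ["réseau", "graphe", "social"]]
resampled = [["graphe", "réseau", "social"], ["règle", "association", "mesure"]]
score = agreement(reference, resampled)
```

Here `score` evaluates to 0.75 (up to floating-point rounding): the rule-mining topics agree on their first two words, and the network topics contain the same three words in a slightly different order. Averaging such agreement scores over many resampled models gives a stability value for a given $k$.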



In [3]:

from bokeh.io import show, output_notebook
from bokeh.plotting import figure
output_notebook()

p = figure(plot_height=250)
p.line(range(10, 51), topic_model.greene_metric(min_num_topics=10, step=1, max_num_topics=50, top_n_words=10, tao=10), line_width=2)
show(p)





[Figure: weighted Jaccard average stability as a function of the number of topics, for $k$ from 10 to 50]

### Symmetric Kullback-Leibler divergence

The figure below shows this metric for a number of topics varying between 10 and 50 (lower is better).



In [4]:

p = figure(plot_height=250)
p.line(range(10, 51), topic_model.arun_metric(min_num_topics=10, max_num_topics=50, iterations=10), line_width=2)
show(p)

[Figure: symmetric Kullback-Leibler divergence as a function of the number of topics, for $k$ from 10 to 50]
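
The divergence underlying this metric can be sketched as follows. Only the symmetric Kullback-Leibler divergence between two discrete distributions is shown; how `arun_metric` builds the two distributions from the factor matrices is not covered here, and the function names and toy vectors are hypothetical. The sketch assumes each distribution is strictly positive wherever the other one is.

```python
import math

def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    # Terms with p_i = 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetric variant: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Two toy distributions (hypothetical).
p = normalize([4.0, 3.0, 1.0])
q = normalize([3.0, 3.0, 2.0])
divergence = symmetric_kl(p, q)
```

The value is zero when the two distributions coincide and grows as they diverge, which is why lower is better when it is used to compare candidate values of $k$.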



Guided by the two metrics described above, we manually evaluate the quality of the topics identified with $k$ varying between 15 and 20. In the end, we judge that the best results are achieved with NMF for $k=15$.



In [5]:

k = 15
topic_model.infer_topics(num_topics=k)



## Results

### Description of the discovered topics

The table below lists the most relevant words for each of the 15 topics discovered from the articles with NMF. They reveal that the members of the EGC society are interested in a wide variety of both theoretical and applied issues. For instance, topics 11 and 0 relate to theoretical issues: topic 11 covers papers about model and variable selection, and topic 0 covers papers that propose new or improved learning algorithms. On the other hand, topics 9 and 1 relate to applied issues: topic 9 covers papers about social network analysis, and topic 1 covers papers about Web usage mining.



In [6]:

import pandas as pd
pd.set_option('display.max_colwidth', 500)

d = {'Most relevant words': [', '.join([word for word, weight in topic_model.top_words(i, 10)]) for i in range(k)]}
df = pd.DataFrame(data=d)
df  # display the table as the cell output




Out[6]:


| Topic | Most relevant words |
|---|---|
| 0 | classification, algorithme, méthode, non, clustering, classe, apprentissage, superviser, avoir, nouveau |
| 1 | web, site, page, analyse, sémantique, usage, comportement, contenu, mining, navigation |
| 2 | motif, séquentiel, extraction, contrainte, fréquent, extraire, découverte, condensé, donnée, proposer |
| 3 | règle, association, extraction, mesure, base, extraire, confiance, indice, associatif, nombre |
| 4 | document, xml, annotation, texte, recherche, mots, structure, corpus, textuel, extraction |
| 5 | ontologie, alignement, sémantique, concept, domaine, annotation, construction, owl, entre, ressource |
| 6 | image, afc, recherche, segmentation, région, objet, satellite, descripteur, base, visuel |
| 7 | donnée, flux, base, fouille, visualisation, cube, requête, entrepôt, analyse, pouvoir |
| 8 | connaissance, gestion, expert, agent, extraction, outil, acquisition, compétence, processus, métier |
| 9 | réseau, graphe, social, communauté, détection, analyse, structure, méthode, lien, sommet |
| 10 | carte, topologique, auto, organisatrice, som, cognitif, contrainte, probabiliste, pondération, hiérarchique |
| 11 | variable, modèle, sélection, superviser, table, méthode, pondération, apprentissage, naïf, classifieur |
| 12 | information, utilisateur, système, recherche, modèle, recommandation, profil, préférence, qualité, avoir |
| 13 | séquence, temporel, événement, série, modèle, évènement, spatio, vidéo, intervalle, chronique |
| 14 | arbre, décision, résultat, mesure, asymétrique, entropie, évaluation, induction, critère, présenter |
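
Reading the most relevant words off a topic-word matrix amounts to sorting one row by decreasing weight. The sketch below illustrates this on a toy matrix; `top_words_from_matrix`, the vocabulary and the weights are hypothetical and do not reproduce TOM's `top_words` method.

```python
def top_words_from_matrix(W, vocabulary, topic_id, num_words):
    """Return the (word, weight) pairs with the largest weights in one row of W."""
    row = W[topic_id]
    # Column indices sorted by decreasing weight.
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return [(vocabulary[j], row[j]) for j in ranked[:num_words]]

# Toy vocabulary and topic-word matrix (hypothetical values).
vocabulary = ["graphe", "règle", "réseau", "association", "social"]
W = [
    [0.9, 0.0, 0.7, 0.1, 0.4],  # toy topic 0
    [0.0, 0.8, 0.1, 0.6, 0.0],  # toy topic 1
]
network_words = top_words_from_matrix(W, vocabulary, 0, 3)
rule_words = top_words_from_matrix(W, vocabulary, 1, 2)
```

With these values, `network_words` holds `graphe`, `réseau` and `social` in that order, and `rule_words` holds `règle` followed by `association`.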



In the following, we leverage the discovered topics to highlight interesting particularities of the EGC society. To analyze the topics together with information about the related papers, we partition the papers into 15 non-overlapping clusters, i.e. one cluster per topic. Each article $i \in [0; n-1]$ is assigned to the cluster $j$ that corresponds to the topic with the highest weight $h_{i,j}$:

$$\text{cluster}_i = \underset{j}{\mathrm{argmax}}(h_{i,j})$$
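
The assignment rule can be sketched in a few lines of pure Python, taking as input a document-topic matrix (the matrix $H$ of the factorization above). The function names and the toy matrix are hypothetical; ties are broken in favor of the lowest topic index, which is an arbitrary choice.

```python
from collections import Counter

def assign_clusters(H):
    """Assign each document (row of H) to the topic with the largest weight."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in H]

def topic_proportions(H):
    """Fraction of documents assigned to each topic."""
    clusters = assign_clusters(H)
    counts = Counter(clusters)
    k = len(H[0])
    return [counts[j] / len(H) for j in range(k)]

# Toy document-topic matrix (hypothetical): 4 documents, 3 topics.
H = [
    [0.7, 0.1, 0.0],
    [0.2, 0.5, 0.1],
    [0.6, 0.3, 0.2],
    [0.0, 0.1, 0.9],
]
clusters = assign_clusters(H)
proportions = topic_proportions(H)
```

For this toy matrix the documents are assigned to topics 0, 1, 0 and 2, so half of the documents fall in topic 0 and a quarter in each of the other two.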

### Global topic proportions



In [7]:

p = figure(x_range=[str(_) for _ in range(k)], plot_height=350, x_axis_label='topic', y_axis_label='proportion')
p.vbar(x=[str(_) for _ in range(k)], top=topic_model.topics_frequency(), width=0.7)
show(p)




[Figure: bar chart of the proportion of articles assigned to each topic, for topics 0 to 14]

### Shifting attention, evolving interests

Here we focus on topics 12 (information retrieval and recommendation) and 3 (association rule mining). The following figures describe these topics in terms of their respective top 10 words and top 3 documents.



In [8]:

def plot_top_words(topic_id):
    words = [word for word, weight in topic_model.top_words(topic_id, 10)]
    weights = [weight for word, weight in topic_model.top_words(topic_id, 10)]
    p = figure(x_range=words, plot_height=300, plot_width=800, x_axis_label='word', y_axis_label='weight')
    p.vbar(x=words, top=weights, width=0.7)
    show(p)

def top_documents_df(topic_id):
    top_docs = topic_model.top_documents(topic_id, 3)
    d = {'Article title': [corpus.title(doc_id) for doc_id, weight in top_docs], 'Year': [int(corpus.date(doc_id)) for doc_id, weight in top_docs]}
    df = pd.DataFrame(data=d)
    return df



#### Topic #12

##### Top 10 words



In [9]:

plot_top_words(12)
[Figure: bar chart of the weights of the top 10 words of topic 12: information, utilisateur, système, recherche, modèle, recommandation, profil, préférence, qualité, avoir]
9rence","qualit\u00e9","avoir"]},"selected":{"id":"1405","type":"Selection"},"selection_policy":{"id":"1406","type":"UnionRenderers"}},"id":"1367","type":"ColumnDataSource"},{"attributes":{"dimension":1,"plot":{"id":"1334","subtype":"Figure","type":"Plot"},"ticker":{"id":"1348","type":"BasicTicker"}},"id":"1351","type":"Grid"},{"attributes":{},"id":"1339","type":"CategoricalScale"},{"attributes":{"fill_alpha":{"value":0.1},"fill_color":{"value":"#1f77b4"},"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"top":{"field":"top"},"width":{"value":0.7},"x":{"field":"x"}},"id":"1369","type":"VBar"},{"attributes":{"callback":null},"id":"1337","type":"DataRange1d"}],"root_ids":["1334"]},"title":"Bokeh Application","version":"1.0.2"}};
var render_items = [{"docid":"80ed5836-0e0f-4a11-b930-e385de79193e","roots":{"1334":"397aac17-b5e7-4499-8e87-7e616a63897b"}}];
root.Bokeh.embed.embed_items_notebook(docs_json, render_items);

}
if (root.Bokeh !== undefined) {
embed_document(root);
} else {
var attempts = 0;
var timer = setInterval(function(root) {
if (root.Bokeh !== undefined) {
embed_document(root);
clearInterval(timer);
}
attempts++;
if (attempts > 100) {
console.log("Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing");
clearInterval(timer);
}
}, 10, root)
}
})(window);


##### Top 3 articles


In [10]:




Out[10]:


| | Article title | Year |
|---|---|---|
| 0 | Un modèle de qualité de l'information | 2006 |
| 1 | Apprentissage incrémental des profils dans un système de filtrage d'information | 2004 |
| 2 | Détection des profils à long terme et à court terme dans les réseaux sociaux | 2011 |
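
One plausible way to obtain such a ranking, given the document-topic matrix $H$ introduced earlier, is to sort the articles by their coefficient $h_{i,j}$ for the topic $j$ of interest. The sketch below assumes exactly that; `top_articles_sketch`, `H_toy` and `titles_toy` are hypothetical names used for illustration and are not part of TOM, whose own ranking may differ.

```python
import numpy as np


def top_articles_sketch(H, titles, topic_id, num_articles=3):
    """Return the (title, weight) pairs with the largest weight for one topic.

    Assumption: H is an n x k document-topic matrix and an article is
    ranked by its coefficient in column `topic_id`.
    """
    H = np.asarray(H, dtype=float)
    # Sort article indices by decreasing weight for the chosen topic.
    order = np.argsort(-H[:, topic_id], kind="stable")[:num_articles]
    return [(titles[i], float(H[i, topic_id])) for i in order]


# Toy data: 4 articles described by 2 topics.
H_toy = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.5, 0.5],
                  [0.3, 0.0]])
titles_toy = ["article A", "article B", "article C", "article D"]

top_toy = top_articles_sketch(H_toy, titles_toy, topic_id=0, num_articles=3)
for title, weight in top_toy:
    print(title, weight)
```

The stable sort keeps the original article order when two articles have the same weight, which makes the ranking reproducible.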



#### Topic #3

##### Top 10 words


In [11]:

plot_top_words(3)




[Figure: bar chart of the top 10 words of topic 3 by weight (x-axis: word, y-axis: weight). In decreasing order of weight: règle, association, extraction, mesure, base, extraire, confiance, indice, associatif, nombre.]


##### Top 3 articles


In [12]:

top_documents_df(3).head()


Out[12]:

| | Article title | Year |
|---|---|---|
| 0 | Hiérarchisation des règles d'association en fouille de textes | 2005 |
| 1 | Critère VT100 de sélection des règles d'association | 2006 |
| 2 | Extraction optimisée de Règles d'Association Positives et Négatives (RAPN) | 2013 |


#### Evolution of the frequencies of topics 3 and 12

The figure below shows the yearly frequency of topic 12 (social network analysis and mining) and topic 3 (association rule mining) from 2004 to 2014. The frequency of a topic for a given year is defined as the proportion of articles, among those published that year, that belong to the corresponding cluster. The figure reveals two opposite trends: topic 12 is emerging while topic 3 is fading. According to the plotted values, topic 12 accounts for about 9% of the articles in 2004 and about 16% in 2013. In contrast, papers related to association rule mining were most frequent in 2006 (about 13%), and their frequency dropped to about 1% in 2014. This illustrates how the attention of the EGC society shifts between topics over time, and suggests that the society is broadening its scope to incorporate work on novel issues.


In [13]:

years = range(2004, 2015)  # 2004 to 2014 inclusive
p = figure(plot_height=250, x_axis_label='year', y_axis_label='topic frequency')
p.line(years, [topic_model.topic_frequency(3, date=y) for y in years], line_width=2, line_color='blue', legend='topic #3')
p.line(years, [topic_model.topic_frequency(12, date=y) for y in years], line_width=2, line_color='red', legend='topic #12')
show(p)


[Figure: line chart of topic frequency (y-axis) per year (x-axis), 2004 to 2014, for topic #3 (blue) and topic #12 (red).]
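
The definition of topic frequency used above can be illustrated with a minimal sketch. It assumes that an article "belongs" to the topic with the largest coefficient in its row of the document-topic matrix $H$; this assignment rule, as well as the names `topic_frequency_sketch`, `H_toy` and `years_toy`, are assumptions made for illustration and do not describe how `topic_model.topic_frequency` is implemented in TOM.

```python
import numpy as np


def topic_frequency_sketch(H, years, topic_id, year):
    """Proportion of the articles published in `year` assigned to `topic_id`.

    Assumption: each article is assigned to its dominant topic, i.e. the
    column with the largest weight in its row of H.
    """
    H = np.asarray(H, dtype=float)
    years = np.asarray(years)
    dominant = H.argmax(axis=1)      # dominant topic of every article
    in_year = years == year          # mask of articles published that year
    if not in_year.any():
        return 0.0                   # no article published that year
    return float(np.mean(dominant[in_year] == topic_id))


# Toy data: 4 articles, 2 topics, publication year of each article.
H_toy = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3],
                  [0.1, 0.6]])
years_toy = [2004, 2004, 2004, 2005]

for y in (2004, 2005):
    print(y, topic_frequency_sketch(H_toy, years_toy, topic_id=0, year=y))
```

With this rule, the frequencies of all topics for a given year sum to 1 whenever at least one article was published that year, which makes the curves of different topics directly comparable.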



