In [ ]:
%%HTML
<style>
body:after {
background-image: url('lincs.png');
background-size: 200px 100px;
position: fixed;
bottom: 1em;
right: 8em;
width: 200px;
height: 100px;
content:"";
/*box-shadow: 5px 5px 10px #000; */
}
</style>
In [ ]:
%pylab inline
In [ ]:
from traitlets.config.manager import BaseJSONConfigManager
path = "/home/celine/Programmes/anaconda3/etc/jupyter/nbconfig"
myconf = BaseJSONConfigManager(config_dir=path)
myconf.update('livereveal', {
'width': 1000,
'height': 800,
'theme': 'Moon',
'transition': 'Slide',
'slideNumber': True,
'start_slideshow_at': 'selected',
}
)
# 'width': 1024,
# 'height': 768,
# 'scroll': True,
# 'theme': (Black (default) - White - League - Sky - Beige
#           - Simple - Serif - Blood - Night - Moon - Solarized)
# 'transition': 'Concave', (None - Fade - Slide - Convex - Concave - Zoom)
# 'slideNumber': False,
# 'start_slideshow_at': 'selected',
If you want to deepen your theoretical knowledge of power laws, you can consult the further readings cited below (this is not mandatory).
A continuous real random variable $X$ has a power-law distribution if its tail decays as a power of $x$: $$ \mathbb{P}\left\{ X > x \right\} = \left( \frac{x_{min}}{x} \right)^\beta, \quad \forall x \ge x_{min}. $$
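A quick way to check this definition numerically is inverse-transform sampling: if $U$ is uniform on $(0, 1)$, then $x_{min}\, U^{-1/\beta}$ has exactly the survival function above. Below is a minimal sketch of this check, assuming $\beta = 3/2$ and $x_{min} = 1$ as in the plots that follow.
In [ ]:
import numpy as np

# Sketch: inverse-transform sampling of the power law defined above.
# Assumed values for illustration: beta = 1.5, xmin = 1.0, 10**5 samples.
rng = np.random.default_rng(0)
beta, xmin = 1.5, 1.0
u = rng.uniform(size=100_000)
samples = xmin * u ** (-1.0 / beta)   # P{X > x} = (xmin / x)**beta for x >= xmin

# Compare empirical and theoretical survival functions at a few points.
for x in [1, 2, 5, 10, 50]:
    print(f"x = {x:3d}: empirical {np.mean(samples > x):.4f}"
          f" vs theoretical {(xmin / x) ** beta:.4f}")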
In [ ]:
# Power law arguments
xmin = 1.
t = arange(xmin, 20, 0.01)
beta = 1.5
# Geometric law arguments
p = (beta - 1) / beta
n = arange(0, 20, 1)
# CCDF
fig = figure(figsize=(16, 12))
clf()
plot(t, pow(xmin/t, beta), label = r'Power law ($\beta = 3/2, x_{min} = 1$)', linewidth=2.0)
fill_between(t, 0, pow(xmin/t, beta), alpha = .4)
bar(n, pow(1-p, n), 1, color = "sandybrown", edgecolor = "white",
label = r'Geometric ($p = 1/3$)', linewidth=2.0)
axis([0, 20, 0, 1.])
xticks(fontsize = 20)
yticks(fontsize = 20)
xlabel("$x$", fontsize = 20)
ylabel("Survival function", fontsize = 20)
legend(fontsize = 20)
show()
In [ ]:
# CCDF semilog
t = arange(xmin, 100, 0.01)
n = arange(0, 100, 1)
fig = figure(figsize=(16, 12))
ax = fig.add_subplot(111)
ax.loglog(t, pow(xmin/t, beta), label = r'Power law ($\beta = 3/2, x_{min} = 1$)', linewidth=2.0)
fill_between(t, 0, pow(xmin/t, beta), alpha = .4)
ax.bar(n, pow(1-p, n), 1, color = "sandybrown", edgecolor = "white",
label = r'Geometric ($p = 1/3$)', linewidth=2.0)
ax.set_xscale("log", nonpositive='clip')  # clip non-positive values on the log axes
ax.set_yscale("log", nonpositive='clip')
axis([xmin, 100, 1e-3, 1.])  # log axes require strictly positive limits
xticks(fontsize = 20)
yticks(fontsize = 20)
xlabel("$x$", fontsize = 20)
ylabel("Survival function", fontsize = 20)
legend(loc = 3, fontsize = 20)
show()
In [ ]:
t = arange(xmin, 50, 0.01)
n = arange(1, 50, 1)
# PDF
fig = figure(figsize=(16, 12))
ax = fig.add_subplot(111)
ax.loglog(t, beta * pow(xmin, beta) / pow(t, beta + 1),
label = r'Power law ($\beta = 3/2, x_{min} = 1$)', linewidth=2.0)
fill_between(t, 0, beta * pow(xmin, beta) / pow(t, beta + 1), alpha = .4)
ax.bar(n, p * pow(1-p, n-1), 1, color = "sandybrown", edgecolor = "white",
label = r'Geometric ($p = 1/3$)', linewidth=2.0)
ax.set_xscale("log", nonpositive='clip')  # clip non-positive values on the log axes
ax.set_yscale("log", nonpositive='clip')
axis([1, 50, 1e-3, 1.5])  # the lower y bound must be positive on a log axis
xticks(fontsize = 20)
yticks(fontsize = 20)
xlabel("$t$", fontsize = 20)
ylabel("Probability distribution function (pdf)", fontsize = 20)
legend(fontsize = 20)
show()
In [ ]:
t = arange(xmin, 50, 0.01)
n = arange(1, 50, 1)
#CDF
fig = figure(figsize=(16, 12))
ax = fig.add_subplot(111)
ax.loglog(t, 1 - pow(xmin/t, beta), label = r'Power law ($\beta = 3/2, x_{min} = 1$)', linewidth=2.0)
fill_between(t, 0, 1 - pow(xmin/t, beta), alpha = .4)
ax.bar(n, 1 - pow(1-p, n), 1, color = "sandybrown", edgecolor = "white",
label = r'Geometric ($p = 1/3$)', linewidth=2.0)
ax.set_xscale("log", nonpositive='clip')  # clip non-positive values on the log axes
ax.set_yscale("log", nonpositive='clip')
axis([1, 50, 1e-3, 1])  # the lower y bound must be positive on a log axis
xticks(fontsize = 20)
yticks(fontsize = 20)
xlabel("$x$", fontsize = 20)
ylabel("Cumulative distribution function (cdf)", fontsize = 20)
legend(loc = 4, fontsize = 20)
show()
A discrete analogue is the Zipf distribution with exponent $\tau$ on $N$ elements:
$$
\mathbb{P}\{ X = k \}
= \frac{ \frac{1}{k^\tau} }{ \sum_{n=1}^N \frac{1}{n^\tau} },
\quad \forall k = 1, \ldots, N.
$$
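As a quick illustration, the sketch below evaluates this probability mass function, with the assumed values $\tau = 2$ and $N = 1000$ chosen only for illustration, and checks that it sums to one.
In [ ]:
import numpy as np

# Sketch: Zipf probability mass function on {1, ..., N}.
# Assumed parameters for illustration: tau = 2.0, N = 1000.
tau, N = 2.0, 1000
k = np.arange(1, N + 1)
weights = 1.0 / k ** tau
pmf = weights / weights.sum()        # P{X = k} = k**(-tau) / sum_n n**(-tau)

print("total probability:", pmf.sum())          # 1.0 up to rounding
print("P{X = 1} / P{X = 2}:", pmf[0] / pmf[1])  # = 2**tau = 4 for tau = 2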
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph structure in the Web, Computer Networks, 33(1-6), 309–320, 2000.
Concretely:
"... there are many small elements contained within the Web, but few large ones. A few sites consist of millions of pages, but millions of sites only contain a handful of pages. Few sites contain millions of links, but many sites have one or two. Millions of users flock to a few select sites, giving little attention to millions of others."
L.A. Adamic and B.A. Huberman, Zipf's law and the Internet, Glottometrics, 3(1), 143–150, 2002.
Word frequency
"Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.: the rank-frequency distribution is an inverse relation." See Wikipedia, Zipf's law.
A.-L. Barabasi and R. Albert, Emergence of scaling in random networks, Science, 286, 509–512, 1999.
$$
\frac{c_i}{c_{i-1}} = \frac{\alpha + \frac{1-\alpha}{2} (i-1)}{1 + \alpha + \frac{1-\alpha}{2} i},
\quad \forall i > 1.
$$
Equivalently,
\begin{align*}
\frac{c_i}{c_{i-1}}
&= 1 - \frac{3 - \alpha}{2 + 2 \alpha + (1 - \alpha) i}
= 1 - \frac{1}{i} \frac{3 - \alpha}{1 - \alpha} + O\left( \frac{1}{i^2} \right).
\end{align*}
Writing $\beta = \frac{3 - \alpha}{1 - \alpha}$ and iterating the recursion,
\begin{align*}
c_i
&= c_1 \prod_{j=2}^i \left( 1 - \frac{\beta}{j} + O\left( \frac{1}{j^2} \right) \right).
\end{align*}
Taking logarithms and neglecting the second-order terms,
\begin{align*}
\ln\left( \prod_{j=2}^i \left( 1 - \frac{\beta}{j} \right) \right)
&= \sum_{j=2}^i \ln\left( 1 - \frac{\beta}{j} \right)
\approx \sum_{j=2}^i - \frac{\beta}{j} \\
&= - \beta \sum_{j=2}^i \frac{1}{j}
\approx - \beta \ln(i),
\end{align*}
so that $c_i \approx c_1\, i^{-\beta}$: the sequence $(c_i)_{i \ge 1}$ decays as a power law with exponent $\beta = \frac{3 - \alpha}{1 - \alpha}$.
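To see this kind of scaling emerge in practice, one can simulate a preferential-attachment graph and inspect its degree distribution. Below is a minimal sketch, assuming the networkx package is available; the graph size and the number $m$ of edges per new node are arbitrary illustration choices. The empirical survival function of the degrees is plotted on a log-log scale, where the power-law tail appears as an approximately straight line.
In [ ]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Sketch: degree distribution of a Barabasi-Albert preferential-attachment graph.
# Assumed parameters for illustration: 10_000 nodes, m = 2 edges per new node.
G = nx.barabasi_albert_graph(n=10_000, m=2, seed=0)
degrees = np.sort(np.array([d for _, d in G.degree()]))

# Empirical survival function P{D > d} (the last point, which is 0, is dropped).
ccdf = 1.0 - np.arange(1, len(degrees) + 1) / len(degrees)

plt.figure(figsize=(8, 6))
plt.loglog(degrees[:-1], ccdf[:-1], marker=".", linestyle="none")
plt.xlabel("degree $d$")
plt.ylabel(r"$\mathbb{P}\{D > d\}$")
plt.title("Degree survival function (Barabasi-Albert graph)")
plt.show()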
G. U. Yule, A mathematical theory of evolution based on the conclusions of Dr. J. C. Willis, Philosophical Transactions of the Royal Society of London B, 213, 21–87, 1925.