Python Language ID


In [21]:
from nltk import ngrams

In [53]:
n = 3
text = """
Avram Noam Chomsky (US Listeni/æˈvrɑːm ˈnoʊm ˈtʃɒmski/ a-VRAHM nohm CHOM-skee; born December 7, 1928) is an American linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes described as "the father of modern linguistics", Chomsky is also a major figure in analytic philosophy, and one of the founders of the field of cognitive science. He is Institute Professor Emeritus at the Massachusetts Institute of Technology (MIT), where he has worked since 1955, and is the author of over 100 books on topics such as linguistics, war, politics, and mass media. Ideologically, he aligns with anarcho-syndicalism and libertarian socialism.

Born to middle-class Ashkenazi Jewish immigrants in Philadelphia, Chomsky developed an early interest in anarchism from alternative bookstores in New York City. Both Chomsky's parents were eminent Hebrew Scholars.[22] At the age of sixteen he began studies at the University of Pennsylvania, taking courses in linguistics, mathematics, and philosophy. He married fellow linguist Carol Schatz in 1949. From 1951 to 1955 he was appointed to Harvard University's Society of Fellows, where he developed the theory of transformational grammar for which he was awarded his doctorate in 1955. That year he began teaching at MIT, in 1957 emerging as a significant figure in the field of linguistics for his landmark work Syntactic Structures, which remodeled the scientific study of language, while from 1958 to 1959 he was a National Science Foundation fellow at the Institute for Advanced Study. He is credited as the creator or co-creator of the universal grammar theory, the generative grammar theory, the Chomsky hierarchy, and the minimalist program. Chomsky also played a pivotal role in the decline of behaviorism, being particularly critical of the work of B. F. Skinner.

An outspoken opponent of U.S. involvement in the Vietnam War, which he saw as an act of American imperialism, in 1967 Chomsky attracted widespread public attention for his anti-war essay "The Responsibility of Intellectuals". Associated with the New Left, he was arrested multiple times for his activism and placed on President Richard Nixon's Enemies List. While expanding his work in linguistics over subsequent decades, he also became involved in the Linguistics Wars. In collaboration with Edward S. Herman, Chomsky later co-wrote Manufacturing Consent: The Political Economy of the Mass Media, an analysis articulating the propaganda model of media criticism, and worked to expose the Indonesian occupation of East Timor. However, his defense of unconditional freedom of speech—including for Holocaust deniers—generated significant controversy in the Faurisson affair of the early 1980s. Following his retirement from active teaching, he has continued his vocal political activism, including opposing the War on Terror and supporting the Occupy movement.

One of the most cited scholars in history, Chomsky has influenced a broad array of academic fields. He is widely recognized as a paradigm shifter who helped spark a major revolution in the human sciences, contributing to the development of a new cognitivistic framework for the study of language and the mind. In addition to his continued scholarly research, he remains a leading critic of U.S. foreign policy, neoliberalism and contemporary state capitalism, the Israeli–Palestinian conflict, and mainstream news media. His ideas have proved highly significant within the anti-capitalist and anti-imperialist movements, but have also drawn criticism, with some accusing Chomsky of anti-Americanism and alleging that he is sympathetic to terrorism and, in some cases, genocide denial.
"""

# ngrams() over a string yields tuples of characters; join each tuple into a 3-character string.
enTrigrams = ["".join(x) for x in ngrams(text, n)]
for i in enTrigrams:
    print(i)


Av
Avr
vra
ram
am 
m N
 No
Noa
oam
am 
m C
 Ch
Cho
hom
oms
msk
sky
ky 
y (
 (U
(US
US 
S L
 Li
Lis
ist
ste
ten
eni
ni/
i/æ
/æˈ
æˈv
ˈvr
vrɑ
rɑː
ɑːm
ːm 
m ˈ
 ˈn
ˈno
noʊ
oʊm
ʊm 
m ˈ
 ˈt
ˈtʃ
tʃɒ
ʃɒm
ɒms
msk
ski
ki/
i/ 
/ a
 a-
a-V
-VR
VRA
RAH
AHM
HM 
M n
 no
noh
ohm
hm 
m C
 CH
CHO
HOM
OM-
M-s
-sk
ske
kee
ee;
e; 
; b
 bo
bor
orn
rn 
n D
 De
Dec
ece
cem
emb
mbe
ber
er 
r 7
 7,
7, 
, 1
 19
192
928
28)
8) 
) i
 is
is 
s a
 an
an 
n A
 Am
Ame
mer
eri
ric
ica
can
an 
n l
 li
lin
ing
ngu
gui
uis
ist
st,
t, 
, p
 ph
phi
hil
ilo
los
oso
sop
oph
phe
her
er,
r, 
, c
 co
cog
ogn
gni
nit
iti
tiv
ive
ve 
e s
 sc
sci
cie
ien
ent
nti
tis
ist
st,
t, 
, h
 hi
his
ist
sto
tor
ori
ria
ian
an,
n, 
, s
 so
soc
oci
cia
ial
al 
l c
 cr
cri
rit
iti
tic
ic,
c, 
, a
 an
and
nd 
d p
 po
pol
oli
lit
iti
tic
ica
cal
al 
l a
 ac
act
cti
tiv
ivi
vis
ist
st.
t. 
. S
 So
Som
ome
met
eti
tim
ime
mes
es 
s d
 de
des
esc
scr
cri
rib
ibe
bed
ed 
d a
 as
as 
s "
 "t
"th
the
he 
e f
 fa
fat
ath
the
her
er 
r o
 of
of 
f m
 mo
mod
ode
der
ern
rn 
n l
 li
lin
ing
ngu
gui
uis
ist
sti
tic
ics
cs"
s",
", 
, C
 Ch
Cho
hom
oms
msk
sky
ky 
y i
 is
is 
s a
 al
als
lso
so 
o a
 a 
a m
 ma
maj
ajo
jor
or 
r f
 fi
fig
igu
gur
ure
re 
e i
 in
in 
n a
 an
ana
nal
aly
lyt
yti
tic
ic 
c p
 ph
phi
hil
ilo
los
oso
sop
oph
phy
hy,
y, 
, a
 an
and
nd 
d o
 on
one
ne 
e o
 of
of 
f t
 th
the
he 
e f
 fo
fou
oun
und
nde
der
ers
rs 
s o
 of
of 
f t
 th
the
he 
e f
 fi
fie
iel
eld
ld 
d o
 of
of 
f c
 co
cog
ogn
gni
nit
iti
tiv
ive
ve 
e s
 sc
sci
cie
ien
enc
nce
ce.
e. 
. H
 He
He 
e i
 is
is 
s I
 In
Ins
nst
sti
tit
itu
tut
ute
te 
e P
 Pr
Pro
rof
ofe
fes
ess
sso
sor
or 
r E
 Em
Eme
mer
eri
rit
itu
tus
us 
s a
 at
at 
t t
 th
the
he 
e M
 Ma
Mas
ass
ssa
sac
ach
chu
hus
use
set
ett
tts
ts 
s I
 In
Ins
nst
sti
tit
itu
tut
ute
te 
e o
 of
of 
f T
 Te
Tec
ech
chn
hno
nol
olo
log
ogy
gy 
y (
 (M
(MI
MIT
IT)
T),
), 
, w
 wh
whe
her
ere
re 
e h
 he
he 
e h
 ha
has
as 
s w
 wo
wor
ork
rke
ked
ed 
d s
 si
sin
inc
nce
ce 
e 1
 19
195
955
55,
5, 
, a
 an
and
nd 
d i
 is
is 
s t
 th
the
he 
e a
 au
aut
uth
tho
hor
or 
r o
 of
of 
f o
 ov
ove
ver
er 
r 1
 10
100
00 
0 b
 bo
boo
ook
oks
ks 
s o
 on
on 
n t
 to
top
opi
pic
ics
cs 
s s
 su
suc
uch
ch 
h a
 as
as 
s l
 li
lin
ing
ngu
gui
uis
ist
sti
tic
ics
cs,
s, 
, w
 wa
war
ar,
r, 
, p
 po
pol
oli
lit
iti
tic
ics
cs,
s, 
, a
 an
and
nd 
d m
 ma
mas
ass
ss 
s m
 me
med
edi
dia
ia.
a. 
. I
 Id
Ide
deo
eol
olo
log
ogi
gic
ica
cal
all
lly
ly,
y, 
, h
 he
he 
e a
 al
ali
lig
ign
gns
ns 
s w
 wi
wit
ith
th 
h a
 an
ana
nar
arc
rch
cho
ho-
o-s
-sy
syn
ynd
ndi
dic
ica
cal
ali
lis
ism
sm 
m a
 an
and
nd 
d l
 li
lib
ibe
ber
ert
rta
tar
ari
ria
ian
an 
n s
 so
soc
oci
cia
ial
ali
lis
ism
sm.
m.

.




B

Bo
Bor
orn
rn 
n t
 to
to 
o m
 mi
mid
idd
ddl
dle
le-
e-c
-cl
cla
las
ass
ss 
s A
 As
Ash
shk
hke
ken
ena
naz
azi
zi 
i J
 Je
Jew
ewi
wis
ish
sh 
h i
 im
imm
mmi
mig
igr
gra
ran
ant
nts
ts 
s i
 in
in 
n P
 Ph
Phi
hil
ila
lad
ade
del
elp
lph
phi
hia
ia,
a, 
, C
 Ch
Cho
hom
oms
msk
sky
ky 
y d
 de
dev
eve
vel
elo
lop
ope
ped
ed 
d a
 an
an 
n e
 ea
ear
arl
rly
ly 
y i
 in
int
nte
ter
ere
res
est
st 
t i
 in
in 
n a
 an
ana
nar
arc
rch
chi
his
ism
sm 
m f
 fr
fro
rom
om 
m a
 al
alt
lte
ter
ern
rna
nat
ati
tiv
ive
ve 
e b
 bo
boo
ook
oks
kst
sto
tor
ore
res
es 
s i
 in
in 
n N
 Ne
New
ew 
w Y
 Yo
Yor
ork
rk 
k C
 Ci
Cit
ity
ty.
y. 
. B
 Bo
Bot
oth
th 
h C
 Ch
Cho
hom
oms
msk
sky
ky'
y's
's 
s p
 pa
par
are
ren
ent
nts
ts 
s w
 we
wer
ere
re 
e e
 em
emi
min
ine
nen
ent
nt 
t H
 He
Heb
ebr
bre
rew
ew 
w S
 Sc
Sch
cho
hol
ola
lar
ars
rs.
s.[
.[2
[22
22]
2] 
] A
 At
At 
t t
 th
the
he 
e a
 ag
age
ge 
e o
 of
of 
f s
 si
six
ixt
xte
tee
een
en 
n h
 he
he 
e b
 be
beg
ega
gan
an 
n s
 st
stu
tud
udi
die
ies
es 
s a
 at
at 
t t
 th
the
he 
e U
 Un
Uni
niv
ive
ver
ers
rsi
sit
ity
ty 
y o
 of
of 
f P
 Pe
Pen
enn
nns
nsy
syl
ylv
lva
van
ani
nia
ia,
a, 
, t
 ta
tak
aki
kin
ing
ng 
g c
 co
cou
our
urs
rse
ses
es 
s i
 in
in 
n l
 li
lin
ing
ngu
gui
uis
ist
sti
tic
ics
cs,
s, 
, m
 ma
mat
ath
the
hem
ema
mat
ati
tic
ics
cs,
s, 
, a
 an
and
nd 
d p
 ph
phi
hil
ilo
los
oso
sop
oph
phy
hy.
y. 
. H
 He
He 
e m
 ma
mar
arr
rri
rie
ied
ed 
d f
 fe
fel
ell
llo
low
ow 
w l
 li
lin
ing
ngu
gui
uis
ist
st 
t C
 Ca
Car
aro
rol
ol 
l S
 Sc
Sch
cha
hat
atz
tz 
z i
 in
in 
n 1
 19
194
949
49.
9. 
. F
 Fr
Fro
rom
om 
m 1
 19
195
951
51 
1 t
 to
to 
o 1
 19
195
955
55 
5 h
 he
he 
e w
 wa
was
as 
s a
 ap
app
ppo
poi
oin
int
nte
ted
ed 
d t
 to
to 
o H
 Ha
Har
arv
rva
var
ard
rd 
d U
 Un
Uni
niv
ive
ver
ers
rsi
sit
ity
ty'
y's
's 
s S
 So
Soc
oci
cie
iet
ety
ty 
y o
 of
of 
f F
 Fe
Fel
ell
llo
low
ows
ws,
s, 
, w
 wh
whe
her
ere
re 
e h
 he
he 
e d
 de
dev
eve
vel
elo
lop
ope
ped
ed 
d t
 th
the
he 
e t
 th
the
heo
eor
ory
ry 
y o
 of
of 
f t
 tr
tra
ran
ans
nsf
sfo
for
orm
rma
mat
ati
tio
ion
ona
nal
al 
l g
 gr
gra
ram
amm
mma
mar
ar 
r f
 fo
for
or 
r w
 wh
whi
hic
ich
ch 
h h
 he
he 
e w
 wa
was
as 
s a
 aw
awa
war
ard
rde
ded
ed 
d h
 hi
his
is 
s d
 do
doc
oct
cto
tor
ora
rat
ate
te 
e i
 in
in 
n 1
 19
195
955
55.
5. 
. T
 Th
Tha
hat
at 
t y
 ye
yea
ear
ar 
r h
 he
he 
e b
 be
beg
ega
gan
an 
n t
 te
tea
eac
ach
chi
hin
ing
ng 
g a
 at
at 
t M
 MI
MIT
IT,
T, 
, i
 in
in 
n 1
 19
195
957
57 
7 e
 em
eme
mer
erg
rgi
gin
ing
ng 
g a
 as
as 
s a
 a 
a s
 si
sig
ign
gni
nif
ifi
fic
ica
can
ant
nt 
t f
 fi
fig
igu
gur
ure
re 
e i
 in
in 
n t
 th
the
he 
e f
 fi
fie
iel
eld
ld 
d o
 of
of 
f l
 li
lin
ing
ngu
gui
uis
ist
sti
tic
ics
cs 
s f
 fo
for
or 
r h
 hi
his
is 
s l
 la
lan
and
ndm
dma
mar
ark
rk 
k w
 wo
wor
ork
rk 
k S
 Sy
Syn
ynt
nta
tac
act
cti
tic
ic 
c S
 St
Str
tru
ruc
uct
ctu
tur
ure
res
es,
s, 
, w
 wh
whi
hic
ich
ch 
h r
 re
rem
emo
mod
ode
del
ele
led
ed 
d t
 th
the
he 
e s
 sc
sci
cie
ien
ent
nti
tif
ifi
fic
ic 
c s
 st
stu
tud
udy
dy 
y o
 of
of 
f l
 la
lan
ang
ngu
gua
uag
age
ge,
e, 
, w
 wh
whi
hil
ile
le 
e f
 fr
fro
rom
om 
m 1
 19
195
958
58 
8 t
 to
to 
o 1
 19
195
959
59 
9 h
 he
he 
e w
 wa
was
as 
s a
 a 
a N
 Na
Nat
ati
tio
ion
ona
nal
al 
l S
 Sc
Sci
cie
ien
enc
nce
ce 
e F
 Fo
Fou
oun
und
nda
dat
ati
tio
ion
on 
n f
 fe
fel
ell
llo
low
ow 
w a
 at
at 
t t
 th
the
he 
e I
 In
Ins
nst
sti
tit
itu
tut
ute
te 
e f
 fo
for
or 
r A
 Ad
Adv
dva
van
anc
nce
ced
ed 
d S
 St
Stu
tud
udy
dy.
y. 
. H
 He
He 
e i
 is
is 
s c
 cr
cre
red
edi
dit
ite
ted
ed 
d a
 as
as 
s t
 th
the
he 
e c
 cr
cre
rea
eat
ato
tor
or 
r o
 or
or 
r c
 co
co-
o-c
-cr
cre
rea
eat
ato
tor
or 
r o
 of
of 
f t
 th
the
he 
e u
 un
uni
niv
ive
ver
ers
rsa
sal
al 
l g
 gr
gra
ram
amm
mma
mar
ar 
r t
 th
the
heo
eor
ory
ry,
y, 
, t
 th
the
he 
e g
 ge
gen
ene
ner
era
rat
ati
tiv
ive
ve 
e g
 gr
gra
ram
amm
mma
mar
ar 
r t
 th
the
heo
eor
ory
ry,
y, 
, t
 th
the
he 
e C
 Ch
Cho
hom
oms
msk
sky
ky 
y h
 hi
hie
ier
era
rar
arc
rch
chy
hy,
y, 
, a
 an
and
nd 
d t
 th
the
he 
e m
 mi
min
ini
nim
ima
mal
ali
lis
ist
st 
t p
 pr
pro
rog
ogr
gra
ram
am.
m. 
. C
 Ch
Cho
hom
oms
msk
sky
ky 
y a
 al
als
lso
so 
o p
 pl
pla
lay
aye
yed
ed 
d a
 a 
a p
 pi
piv
ivo
vot
ota
tal
al 
l r
 ro
rol
ole
le 
e i
 in
in 
n t
 th
the
he 
e d
 de
dec
ecl
cli
lin
ine
ne 
e o
 of
of 
f b
 be
beh
eha
hav
avi
vio
ior
ori
ris
ism
sm,
m, 
, b
 be
bei
ein
ing
ng 
g p
 pa
par
art
rti
tic
icu
cul
ula
lar
arl
rly
ly 
y c
 cr
cri
rit
iti
tic
ica
cal
al 
l o
 of
of 
f t
 th
the
he 
e w
 wo
wor
ork
rk 
k o
 of
of 
f B
 B.
B. 
. F
 F.
F. 
. S
 Sk
Ski
kin
inn
nne
ner
er.
r.

.




A

An
An 
n o
 ou
out
uts
tsp
spo
pok
oke
ken
en 
n o
 op
opp
ppo
pon
one
nen
ent
nt 
t o
 of
of 
f U
 U.
U.S
.S.
S. 
. i
 in
inv
nvo
vol
olv
lve
vem
eme
men
ent
nt 
t i
 in
in 
n t
 th
the
he 
e V
 Vi
Vie
iet
etn
tna
nam
am 
m W
 Wa
War
ar,
r, 
, w
 wh
whi
hic
ich
ch 
h h
 he
he 
e s
 sa
saw
aw 
w a
 as
as 
s a
 an
an 
n a
 ac
act
ct 
t o
 of
of 
f A
 Am
Ame
mer
eri
ric
ica
can
an 
n i
 im
imp
mpe
per
eri
ria
ial
ali
lis
ism
sm,
m, 
, i
 in
in 
n 1
 19
196
967
67 
7 C
 Ch
Cho
hom
oms
msk
sky
ky 
y a
 at
att
ttr
tra
rac
act
cte
ted
ed 
d w
 wi
wid
ide
des
esp
spr
pre
rea
ead
ad 
d p
 pu
pub
ubl
bli
lic
ic 
c a
 at
att
tte
ten
ent
nti
tio
ion
on 
n f
 fo
for
or 
r h
 hi
his
is 
s a
 an
ant
nti
ti-
i-w
-wa
war
ar 
r e
 es
ess
ssa
say
ay 
y "
 "T
"Th
The
he 
e R
 Re
Res
esp
spo
pon
ons
nsi
sib
ibi
bil
ili
lit
ity
ty 
y o
 of
of 
f I
 In
Int
nte
tel
ell
lle
lec
ect
ctu
tua
ual
als
ls"
s".
". 
. A
 As
Ass
sso
soc
oci
cia
iat
ate
ted
ed 
d w
 wi
wit
ith
th 
h t
 th
the
he 
e N
 Ne
New
ew 
w L
 Le
Lef
eft
ft,
t, 
, h
 he
he 
e w
 wa
was
as 
s a
 ar
arr
rre
res
est
ste
ted
ed 
d m
 mu
mul
ult
lti
tip
ipl
ple
le 
e t
 ti
tim
ime
mes
es 
s f
 fo
for
or 
r h
 hi
his
is 
s a
 ac
act
cti
tiv
ivi
vis
ism
sm 
m a
 an
and
nd 
d p
 pl
pla
lac
ace
ced
ed 
d o
 on
on 
n P
 Pr
Pre
res
esi
sid
ide
den
ent
nt 
t R
 Ri
Ric
ich
cha
har
ard
rd 
d N
 Ni
Nix
ixo
xon
on'
n's
's 
s E
 En
Ene
nem
emi
mie
ies
es 
s L
 Li
Lis
ist
st.
t. 
. W
 Wh
Whi
hil
ile
le 
e e
 ex
exp
xpa
pan
and
ndi
din
ing
ng 
g h
 hi
his
is 
s w
 wo
wor
ork
rk 
k i
 in
in 
n l
 li
lin
ing
ngu
gui
uis
ist
sti
tic
ics
cs 
s o
 ov
ove
ver
er 
r s
 su
sub
ubs
bse
seq
equ
que
uen
ent
nt 
t d
 de
dec
eca
cad
ade
des
es,
s, 
, h
 he
he 
e a
 al
als
lso
so 
o b
 be
bec
eca
cam
ame
me 
e i
 in
inv
nvo
vol
olv
lve
ved
ed 
d i
 in
in 
n t
 th
the
he 
e L
 Li
Lin
ing
ngu
gui
uis
ist
sti
tic
ics
cs 
s W
 Wa
War
ars
rs.
s. 
. I
 In
In 
n c
 co
col
oll
lla
lab
abo
bor
ora
rat
ati
tio
ion
on 
n w
 wi
wit
ith
th 
h E
 Ed
Edw
dwa
war
ard
rd 
d S
 S.
S. 
. H
 He
Her
erm
rma
man
an,
n, 
, C
 Ch
Cho
hom
oms
msk
sky
ky 
y l
 la
lat
ate
ter
er 
r c
 co
co-
o-w
-wr
wro
rot
ote
te 
e M
 Ma
Man
anu
nuf
ufa
fac
act
ctu
tur
uri
rin
ing
ng 
g C
 Co
Con
ons
nse
sen
ent
nt:
t: 
: T
 Th
The
he 
e P
 Po
Pol
oli
lit
iti
tic
ica
cal
al 
l E
 Ec
Eco
con
ono
nom
omy
my 
y o
 of
of 
f t
 th
the
he 
e M
 Ma
Mas
ass
ss 
s M
 Me
Med
edi
dia
ia,
a, 
, a
 an
an 
n a
 an
ana
nal
aly
lys
ysi
sis
is 
s a
 ar
art
rti
tic
icu
cul
ula
lat
ati
tin
ing
ng 
g t
 th
the
he 
e p
 pr
pro
rop
opa
pag
aga
gan
and
nda
da 
a m
 mo
mod
ode
del
el 
l o
 of
of 
f m
 me
med
edi
dia
ia 
a c
 cr
cri
rit
iti
tic
ici
cis
ism
sm,
m, 
, a
 an
and
nd 
d w
 wo
wor
ork
rke
ked
ed 
d t
 to
to 
o e
 ex
exp
xpo
pos
ose
se 
e t
 th
the
he 
e I
 In
Ind
ndo
don
one
nes
esi
sia
ian
an 
n o
 oc
occ
ccu
cup
upa
pat
ati
tio
ion
on 
n o
 of
of 
f E
 Ea
Eas
ast
st 
t T
 Ti
Tim
imo
mor
or.
r. 
. H
 Ho
How
owe
wev
eve
ver
er,
r, 
, h
 hi
his
is 
s d
 de
def
efe
fen
ens
nse
se 
e o
 of
of 
f u
 un
unc
nco
con
ond
ndi
dit
iti
tio
ion
ona
nal
al 
l f
 fr
fre
ree
eed
edo
dom
om 
m o
 of
of 
f s
 sp
spe
pee
eec
ech
ch—
h—i
—in
inc
ncl
clu
lud
udi
din
ing
ng 
g f
 fo
for
or 
r H
 Ho
Hol
olo
loc
oca
cau
aus
ust
st 
t d
 de
den
eni
nie
ier
ers
rs—
s—g
—ge
gen
ene
ner
era
rat
ate
ted
ed 
d s
 si
sig
ign
gni
nif
ifi
fic
ica
can
ant
nt 
t c
 co
con
ont
ntr
tro
rov
ove
ver
ers
rsy
sy 
y i
 in
in 
n t
 th
the
he 
e F
 Fa
Fau
aur
uri
ris
iss
sso
son
on 
n a
 af
aff
ffa
fai
air
ir 
r o
 of
of 
f t
 th
the
he 
e e
 ea
ear
arl
rly
ly 
y 1
 19
198
980
80s
0s.
s. 
. F
 Fo
Fol
oll
llo
low
owi
win
ing
ng 
g h
 hi
his
is 
s r
 re
ret
eti
tir
ire
rem
eme
men
ent
nt 
t f
 fr
fro
rom
om 
m a
 ac
act
cti
tiv
ive
ve 
e t
 te
tea
eac
ach
chi
hin
ing
ng,
g, 
, h
 he
he 
e h
 ha
has
as 
s c
 co
con
ont
nti
tin
inu
nue
ued
ed 
d h
 hi
his
is 
s v
 vo
voc
oca
cal
al 
l p
 po
pol
oli
lit
iti
tic
ica
cal
al 
l a
 ac
act
cti
tiv
ivi
vis
ism
sm,
m, 
, i
 in
inc
ncl
clu
lud
udi
din
ing
ng 
g o
 op
opp
ppo
pos
osi
sin
ing
ng 
g t
 th
the
he 
e W
 Wa
War
ar 
r o
 on
on 
n T
 Te
Ter
err
rro
ror
or 
r a
 an
and
nd 
d s
 su
sup
upp
ppo
por
ort
rti
tin
ing
ng 
g t
 th
the
he 
e O
 Oc
Occ
ccu
cup
upy
py 
y m
 mo
mov
ove
vem
eme
men
ent
nt.
t.

.




O

On
One
ne 
e o
 of
of 
f t
 th
the
he 
e m
 mo
mos
ost
st 
t c
 ci
cit
ite
ted
ed 
d s
 sc
sch
cho
hol
ola
lar
ars
rs 
s i
 in
in 
n h
 hi
his
ist
sto
tor
ory
ry,
y, 
, C
 Ch
Cho
hom
oms
msk
sky
ky 
y h
 ha
has
as 
s i
 in
inf
nfl
flu
lue
uen
enc
nce
ced
ed 
d a
 a 
a b
 br
bro
roa
oad
ad 
d a
 ar
arr
rra
ray
ay 
y o
 of
of 
f a
 ac
aca
cad
ade
dem
emi
mic
ic 
c f
 fi
fie
iel
eld
lds
ds.
s. 
. H
 He
He 
e i
 is
is 
s w
 wi
wid
ide
del
ely
ly 
y r
 re
rec
eco
cog
ogn
gni
niz
ize
zed
ed 
d a
 as
as 
s a
 a 
a p
 pa
par
ara
rad
adi
dig
igm
gm 
m s
 sh
shi
hif
ift
fte
ter
er 
r w
 wh
who
ho 
o h
 he
hel
elp
lpe
ped
ed 
d s
 sp
spa
par
ark
rk 
k a
 a 
a m
 ma
maj
ajo
jor
or 
r r
 re
rev
evo
vol
olu
lut
uti
tio
ion
on 
n i
 in
in 
n t
 th
the
he 
e h
 hu
hum
uma
man
an 
n s
 sc
sci
cie
ien
enc
nce
ces
es,
s, 
, c
 co
con
ont
ntr
tri
rib
ibu
but
uti
tin
ing
ng 
g t
 to
to 
o t
 th
the
he 
e d
 de
dev
eve
vel
elo
lop
opm
pme
men
ent
nt 
t o
 of
of 
f a
 a 
a n
 ne
new
ew 
w c
 co
cog
ogn
gni
nit
iti
tiv
ivi
vis
ist
sti
tic
ic 
c f
 fr
fra
ram
ame
mew
ewo
wor
ork
rk 
k f
 fo
for
or 
r t
 th
the
he 
e s
 st
stu
tud
udy
dy 
y o
 of
of 
f l
 la
lan
ang
ngu
gua
uag
age
ge 
e a
 an
and
nd 
d t
 th
the
he 
e m
 mi
min
ind
nd.
d. 
. I
 In
In 
n a
 ad
add
ddi
dit
iti
tio
ion
on 
n t
 to
to 
o h
 hi
his
is 
s c
 co
con
ont
nti
tin
inu
nue
ued
ed 
d s
 sc
sch
cho
hol
ola
lar
arl
rly
ly 
y r
 re
res
ese
sea
ear
arc
rch
ch,
h, 
, h
 he
he 
e r
 re
rem
ema
mai
ain
ins
ns 
s a
 a 
a l
 le
lea
ead
adi
din
ing
ng 
g c
 cr
cri
rit
iti
tic
ic 
c o
 of
of 
f U
 U.
U.S
.S.
S. 
. f
 fo
for
ore
rei
eig
ign
gn 
n p
 po
pol
oli
lic
icy
cy,
y, 
, n
 ne
neo
eol
oli
lib
ibe
ber
era
ral
ali
lis
ism
sm 
m a
 an
and
nd 
d c
 co
con
ont
nte
tem
emp
mpo
por
ora
rar
ary
ry 
y s
 st
sta
tat
ate
te 
e c
 ca
cap
api
pit
ita
tal
ali
lis
ism
sm,
m, 
, t
 th
the
he 
e I
 Is
Isr
sra
rae
ael
eli
li–
i–P
–Pa
Pal
ale
les
est
sti
tin
ini
nia
ian
an 
n c
 co
con
onf
nfl
fli
lic
ict
ct,
t, 
, a
 an
and
nd 
d m
 ma
mai
ain
ins
nst
str
tre
rea
eam
am 
m n
 ne
new
ews
ws 
s m
 me
med
edi
dia
ia.
a. 
. H
 Hi
His
is 
s i
 id
ide
dea
eas
as 
s h
 ha
hav
ave
ve 
e p
 pr
pro
rov
ove
ved
ed 
d h
 hi
hig
igh
ghl
hly
ly 
y s
 si
sig
ign
gni
nif
ifi
fic
ica
can
ant
nt 
t w
 wi
wit
ith
thi
hin
in 
n t
 th
the
he 
e a
 an
ant
nti
ti-
i-c
-ca
cap
api
pit
ita
tal
ali
lis
ist
st 
t a
 an
and
nd 
d a
 an
ant
nti
ti-
i-i
-im
imp
mpe
per
eri
ria
ial
ali
lis
ist
st 
t m
 mo
mov
ove
vem
eme
men
ent
nts
ts,
s, 
, b
 bu
but
ut 
t h
 ha
hav
ave
ve 
e a
 al
als
lso
so 
o d
 dr
dra
raw
awn
wn 
n c
 cr
cri
rit
iti
tic
ici
cis
ism
sm,
m, 
, w
 wi
wit
ith
th 
h s
 so
som
ome
me 
e a
 ac
acc
ccu
cus
usi
sin
ing
ng 
g C
 Ch
Cho
hom
oms
msk
sky
ky 
y o
 of
of 
f a
 an
ant
nti
ti-
i-A
-Am
Ame
mer
eri
ric
ica
can
ani
nis
ism
sm 
m a
 an
and
nd 
d a
 al
all
lle
leg
egi
gin
ing
ng 
g t
 th
tha
hat
at 
t h
 he
he 
e i
 is
is 
s s
 sy
sym
ymp
mpa
pat
ath
the
het
eti
tic
ic 
c t
 to
to 
o t
 te
ter
err
rro
ror
ori
ris
ism
sm 
m a
 an
and
nd,
d, 
, i
 in
in 
n s
 so
som
ome
me 
e c
 ca
cas
ase
ses
es,
s, 
, g
 ge
gen
eno
noc
oci
cid
ide
de 
e d
 de
den
eni
nia
ial
al.
l.
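
The character trigrams printed above can also be produced without NLTK by slicing the string directly. The sketch below is only an illustration: `char_ngrams` and `sample` are hypothetical names that do not appear in the notebook, and the short sample string stands in for the full text. It also shows one way to summarize the trigrams with `Counter.most_common` instead of printing each one.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return every run of n consecutive characters in text, in order."""
    # A string shorter than n yields an empty list, matching unpadded ngrams().
    return [text[i:i + n] for i in range(len(text) - n + 1)]


sample = "Noam Chomsky"  # hypothetical short input, not the notebook's text
trigrams = char_ngrams(sample, 3)

# Show only the first few trigrams rather than the whole list.
print(trigrams[:5])

# Summarize the trigrams by frequency.
print(Counter(trigrams).most_common(3))
```

For unpadded input this should give the same strings as joining the tuples from `ngrams(text, n)`, since both slide a window of width `n` one character at a time.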


In [40]:
from collections import Counter

In [54]:
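# Count each trigram; ';;;' is added once as a placeholder entry
# (presumably reserved for trigrams that do not occur in the text).
enTfp = Counter(enTrigrams + [';;;'])
# Convert raw counts to relative frequencies.
totalEn = sum(enTfp.values())
enProbs = {trigram: count / totalEn for trigram, count in enTfp.items()}
print(enProbs)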


{' re': 0.0016220600162206002, 'Man': 0.0002703433360367667, 'mat': 0.0008110300081103001, 'o t': 0.0005406866720735334, 'can': 0.0016220600162206002, 'VRA': 0.0002703433360367667, ' 10': 0.0002703433360367667, 'olo': 0.0008110300081103001, 'ft,': 0.0002703433360367667, ' fa': 0.0002703433360367667, 'Lin': 0.0002703433360367667, 'ess': 0.0005406866720735334, '951': 0.0002703433360367667, 'f c': 0.0002703433360367667, 'was': 0.0010813733441470668, 'sor': 0.0002703433360367667, 'rgi': 0.0002703433360367667, 'ld ': 0.0005406866720735334, ' pi': 0.0002703433360367667, ' de': 0.0024330900243309003, 'ghl': 0.0002703433360367667, 'neo': 0.0002703433360367667, 'bor': 0.0005406866720735334, 'nt ': 0.002703433360367667, 'Eco': 0.0002703433360367667, 'xpa': 0.0002703433360367667, 'tee': 0.0002703433360367667, 'pan': 0.0002703433360367667, ', p': 0.0005406866720735334, 'dwa': 0.0002703433360367667, 'w c': 0.0002703433360367667, ' Te': 0.0005406866720735334, 'adi': 0.0005406866720735334, 'sal': 0.0002703433360367667, 'cte': 0.0002703433360367667, ' Sk': 0.0002703433360367667, ' Am': 0.0005406866720735334, 'dig': 0.0002703433360367667, 'ewi': 0.0002703433360367667, 'van': 0.0005406866720735334, 'pro': 0.0008110300081103001, 'o-w': 0.0002703433360367667, 'Noa': 0.0002703433360367667, 'itu': 0.0010813733441470668, 'd U': 0.0002703433360367667, 'ach': 0.0008110300081103001, 'r s': 0.0002703433360367667, 'is ': 0.004866180048661801, 'sib': 0.0002703433360367667, ' at': 0.0016220600162206002, 'oun': 0.0005406866720735334, 'ode': 0.0008110300081103001, 'OM-': 0.0002703433360367667, 'rsy': 0.0002703433360367667, 'ary': 0.0002703433360367667, 'li–': 0.0002703433360367667, 'el ': 0.0002703433360367667, 'lit': 0.0013517166801838335, 'e g': 0.0005406866720735334, 'ati': 0.0024330900243309003, 'ure': 0.0008110300081103001, 'nd ': 0.0040551500405515, 'ich': 0.0010813733441470668, 'Ins': 0.0008110300081103001, 'y l': 0.0002703433360367667, 'e C': 0.0002703433360367667, ' we': 
0.0002703433360367667, 'Uni': 0.0005406866720735334, 'pos': 0.0005406866720735334, 'Soc': 0.0002703433360367667, 'noh': 0.0002703433360367667, 's M': 0.0002703433360367667, 'f I': 0.0002703433360367667, 'o 1': 0.0005406866720735334, 'hol': 0.0008110300081103001, ' al': 0.001892403352257367, ' su': 0.0008110300081103001, 'IT,': 0.0002703433360367667, 'y 1': 0.0002703433360367667, 'lte': 0.0002703433360367667, 'ign': 0.0013517166801838335, ';;;': 0.0002703433360367667, ' a ': 0.0024330900243309003, 'ita': 0.0005406866720735334, 'iet': 0.0005406866720735334, 'a-V': 0.0002703433360367667, 'Ide': 0.0002703433360367667, ' ou': 0.0002703433360367667, 'ibu': 0.0002703433360367667, 'n t': 0.0029737766964044337, '(MI': 0.0002703433360367667, 'Cho': 0.002703433360367667, ' Oc': 0.0002703433360367667, 'c o': 0.0002703433360367667, ' fi': 0.0013517166801838335, 'or.': 0.0002703433360367667, 'Phi': 0.0002703433360367667, 'oke': 0.0002703433360367667, 'his': 0.0032441200324412004, ' Li': 0.0008110300081103001, 'ead': 0.0005406866720735334, 'but': 0.0005406866720735334, '/æˈ': 0.0002703433360367667, 'ːm ': 0.0002703433360367667, 'gni': 0.001892403352257367, 'c S': 0.0002703433360367667, 'owi': 0.0002703433360367667, 'lpe': 0.0002703433360367667, 'c t': 0.0002703433360367667, 'sig': 0.0008110300081103001, 'ris': 0.0008110300081103001, 'e t': 0.0010813733441470668, 'zi ': 0.0002703433360367667, 't h': 0.0005406866720735334, 'ws,': 0.0002703433360367667, 'bre': 0.0002703433360367667, 'l.\n': 0.0002703433360367667, 'res': 0.0016220600162206002, 'ky ': 0.0024330900243309003, 'uis': 0.0021627466882941336, 'war': 0.0010813733441470668, 'ish': 0.0002703433360367667, 'occ': 0.0002703433360367667, 'yed': 0.0002703433360367667, 'ein': 0.0002703433360367667, ' hi': 0.003514463368477967, '194': 0.0002703433360367667, 'mic': 0.0002703433360367667, 'ric': 0.0008110300081103001, 'l a': 0.0005406866720735334, 'nts': 0.0008110300081103001, 'am.': 0.0002703433360367667, 'den': 0.0008110300081103001, 
'd N': 0.0002703433360367667, 'm C': 0.0005406866720735334, 'iat': 0.0002703433360367667, ' po': 0.0010813733441470668, 'hke': 0.0002703433360367667, '—ge': 0.0002703433360367667, 'sci': 0.0010813733441470668, 'ngu': 0.002703433360367667, 'Sci': 0.0002703433360367667, '2] ': 0.0002703433360367667, 'ity': 0.0010813733441470668, 'm.\n': 0.0002703433360367667, 'say': 0.0002703433360367667, 'ce ': 0.0005406866720735334, 'orn': 0.0005406866720735334, 'e h': 0.0013517166801838335, 'mew': 0.0002703433360367667, 'sch': 0.0005406866720735334, 'At ': 0.0002703433360367667, 'wis': 0.0002703433360367667, 'ecl': 0.0002703433360367667, 'cas': 0.0002703433360367667, '. I': 0.0008110300081103001, '-im': 0.0002703433360367667, ' Un': 0.0005406866720735334, 'eas': 0.0002703433360367667, 'nvo': 0.0005406866720735334, 'Syn': 0.0002703433360367667, 'tha': 0.0002703433360367667, 'hat': 0.0008110300081103001, 'for': 0.0024330900243309003, 'fie': 0.0008110300081103001, '\n\nB': 0.0002703433360367667, 'hav': 0.0008110300081103001, 'iss': 0.0002703433360367667, 'ola': 0.0008110300081103001, ' 19': 0.0029737766964044337, 'er,': 0.0005406866720735334, 'fre': 0.0002703433360367667, 'del': 0.0010813733441470668, '\nAv': 0.0002703433360367667, ' B.': 0.0002703433360367667, 'air': 0.0002703433360367667, 'ge,': 0.0002703433360367667, 'rea': 0.0010813733441470668, 'cre': 0.0008110300081103001, 'Pro': 0.0002703433360367667, 'n s': 0.0010813733441470668, 's A': 0.0002703433360367667, 'ece': 0.0002703433360367667, 'd m': 0.0008110300081103001, ' ma': 0.0016220600162206002, ' th': 0.010273046769397134, 'ror': 0.0005406866720735334, 'ota': 0.0002703433360367667, 'r.\n': 0.0002703433360367667, ', c': 0.0005406866720735334, ' te': 0.0008110300081103001, 'ini': 0.0005406866720735334, 'ct ': 0.0002703433360367667, 'hia': 0.0002703433360367667, 'sm.': 0.0002703433360367667, 'y m': 0.0002703433360367667, 'ort': 0.0002703433360367667, 's s': 0.0005406866720735334, 'onf': 0.0002703433360367667, '.\n\n': 
0.0008110300081103001, 'elp': 0.0005406866720735334, 'y s': 0.0005406866720735334, 'ili': 0.0002703433360367667, 'sy ': 0.0002703433360367667, '. H': 0.001892403352257367, 'c f': 0.0005406866720735334, 'tin': 0.0016220600162206002, 'oci': 0.0013517166801838335, 'pok': 0.0002703433360367667, 'r f': 0.0005406866720735334, 'f a': 0.0008110300081103001, 'ufa': 0.0002703433360367667, 'rva': 0.0002703433360367667, 'ogr': 0.0002703433360367667, ' Pe': 0.0002703433360367667, 'mod': 0.0008110300081103001, 'bil': 0.0002703433360367667, 'es ': 0.0016220600162206002, 'e o': 0.0016220600162206002, 'ter': 0.0013517166801838335, ' oc': 0.0002703433360367667, 'ral': 0.0002703433360367667, 's t': 0.0005406866720735334, 'nuf': 0.0002703433360367667, ' to': 0.0024330900243309003, 'ide': 0.0013517166801838335, 'ofe': 0.0002703433360367667, 'upa': 0.0002703433360367667, 'ttr': 0.0002703433360367667, 'ti-': 0.0010813733441470668, 'arv': 0.0002703433360367667, 'ad ': 0.0005406866720735334, 'hel': 0.0002703433360367667, 'eam': 0.0002703433360367667, 'chn': 0.0002703433360367667, 'raw': 0.0002703433360367667, 'kst': 0.0002703433360367667, 'e r': 0.0002703433360367667, 'mmi': 0.0002703433360367667, ' Fe': 0.0002703433360367667, 'w a': 0.0005406866720735334, 'suc': 0.0002703433360367667, 'lle': 0.0005406866720735334, ' af': 0.0002703433360367667, 'ppo': 0.0010813733441470668, 'sac': 0.0002703433360367667, 'y, ': 0.001892403352257367, ', n': 0.0002703433360367667, 'rol': 0.0005406866720735334, 'te ': 0.0016220600162206002, 'pol': 0.0010813733441470668, 'e f': 0.0016220600162206002, ' ta': 0.0002703433360367667, 'eol': 0.0005406866720735334, ' bu': 0.0002703433360367667, ' Ch': 0.002703433360367667, ' fr': 0.0013517166801838335, ' bo': 0.0008110300081103001, 'esc': 0.0002703433360367667, 'gic': 0.0002703433360367667, 'ked': 0.0005406866720735334, 'so ': 0.0010813733441470668, 'st ': 0.0021627466882941336, ', b': 0.0005406866720735334, 'eed': 0.0002703433360367667, '.[2': 0.0002703433360367667, 
'emp': 0.0002703433360367667, 'f b': 0.0002703433360367667, 'sh ': 0.0002703433360367667, 'who': 0.0002703433360367667, 'sub': 0.0002703433360367667, 'tro': 0.0002703433360367667, 'wit': 0.0013517166801838335, 'ut ': 0.0002703433360367667, 'ics': 0.0024330900243309003, '\nBo': 0.0002703433360367667, 'n N': 0.0002703433360367667, 'sto': 0.0008110300081103001, 'ffa': 0.0002703433360367667, 'rde': 0.0002703433360367667, 'ows': 0.0002703433360367667, ' sh': 0.0002703433360367667, ' ˈt': 0.0002703433360367667, 'all': 0.0005406866720735334, 'g, ': 0.0002703433360367667, 'k f': 0.0002703433360367667, 'ega': 0.0005406866720735334, 'cri': 0.0016220600162206002, 'ten': 0.0005406866720735334, 'al ': 0.0029737766964044337, 'f E': 0.0002703433360367667, 'ced': 0.0008110300081103001, 'aye': 0.0002703433360367667, ' Ha': 0.0002703433360367667, 't t': 0.0010813733441470668, 'ds.': 0.0002703433360367667, 'dle': 0.0002703433360367667, 'wor': 0.0016220600162206002, 'e-c': 0.0002703433360367667, 'da ': 0.0002703433360367667, 'pit': 0.0005406866720735334, 's, ': 0.002703433360367667, 'o a': 0.0002703433360367667, 'r r': 0.0002703433360367667, 'ast': 0.0002703433360367667, 'mas': 0.0002703433360367667, 't. 
': 0.0005406866720735334, 'fra': 0.0002703433360367667, 'd f': 0.0002703433360367667, 'chu': 0.0002703433360367667, 'sup': 0.0002703433360367667, 'e b': 0.0008110300081103001, 'lig': 0.0002703433360367667, 's—g': 0.0002703433360367667, 'gur': 0.0005406866720735334, 'ns ': 0.0005406866720735334, 'pme': 0.0002703433360367667, 'mai': 0.0005406866720735334, 'lti': 0.0002703433360367667, 'hem': 0.0002703433360367667, ', C': 0.0010813733441470668, 'man': 0.0005406866720735334, 'are': 0.0002703433360367667, 'eli': 0.0002703433360367667, 'f U': 0.0005406866720735334, 'ese': 0.0002703433360367667, 'ist': 0.004866180048661801, 'rar': 0.0005406866720735334, ' ag': 0.0002703433360367667, 'der': 0.0005406866720735334, 'eor': 0.0008110300081103001, 's w': 0.0013517166801838335, 'abo': 0.0002703433360367667, 'mar': 0.0013517166801838335, 'hil': 0.0016220600162206002, ' At': 0.0002703433360367667, 'st.': 0.0005406866720735334, 'ʊm ': 0.0002703433360367667, ' Bo': 0.0002703433360367667, 'd t': 0.0016220600162206002, 'ted': 0.001892403352257367, 'rna': 0.0002703433360367667, 'oca': 0.0005406866720735334, 'eha': 0.0002703433360367667, 'g h': 0.0005406866720735334, 'unc': 0.0002703433360367667, 'e m': 0.0010813733441470668, 'rre': 0.0002703433360367667, 'sti': 0.0029737766964044337, 'aga': 0.0002703433360367667, 'lis': 0.0021627466882941336, '59 ': 0.0002703433360367667, 'rof': 0.0002703433360367667, 'mul': 0.0002703433360367667, 'etn': 0.0002703433360367667, ' em': 0.0005406866720735334, 'es,': 0.0010813733441470668, 'ona': 0.0008110300081103001, 'h a': 0.0005406866720735334, 'ndi': 0.0008110300081103001, '198': 0.0002703433360367667, 'ely': 0.0002703433360367667, 're ': 0.0013517166801838335, 'f o': 0.0002703433360367667, 'rn ': 0.0008110300081103001, 'e V': 0.0002703433360367667, 'igm': 0.0002703433360367667, 'tif': 0.0002703433360367667, 'uct': 0.0002703433360367667, 'exp': 0.0005406866720735334, 'Con': 0.0002703433360367667, 'hy.': 0.0002703433360367667, 'cie': 
0.0016220600162206002, 'oin': 0.0002703433360367667, 'tip': 0.0002703433360367667, ', g': 0.0002703433360367667, 'fte': 0.0002703433360367667, 'ski': 0.0002703433360367667, 'xon': 0.0002703433360367667, 'ivi': 0.0010813733441470668, 'iti': 0.003784806704514734, 'Ene': 0.0002703433360367667, 'f P': 0.0002703433360367667, 'rs—': 0.0002703433360367667, 'cam': 0.0002703433360367667, 'lla': 0.0002703433360367667, 'ɑːm': 0.0002703433360367667, 'ima': 0.0002703433360367667, 'don': 0.0002703433360367667, 'ilo': 0.0008110300081103001, 'est': 0.0008110300081103001, 'on ': 0.002703433360367667, 'soc': 0.0008110300081103001, 'nst': 0.0010813733441470668, 'r h': 0.0010813733441470668, '7, ': 0.0002703433360367667, 'd s': 0.0016220600162206002, ' pu': 0.0002703433360367667, '00 ': 0.0002703433360367667, 'ki/': 0.0002703433360367667, 'whe': 0.0005406866720735334, 'e U': 0.0002703433360367667, ' Ni': 0.0002703433360367667, 'des': 0.0008110300081103001, 'vot': 0.0002703433360367667, 'ena': 0.0002703433360367667, 'gy ': 0.0002703433360367667, ' He': 0.0016220600162206002, ' aw': 0.0002703433360367667, '—in': 0.0002703433360367667, 'Eme': 0.0002703433360367667, 'hie': 0.0002703433360367667, 'emo': 0.0002703433360367667, ' Ne': 0.0005406866720735334, 'cho': 0.0010813733441470668, 'ubs': 0.0002703433360367667, ' ca': 0.0005406866720735334, 'n 1': 0.0010813733441470668, 'urs': 0.0002703433360367667, 'olv': 0.0005406866720735334, 'ncl': 0.0005406866720735334, ' si': 0.0013517166801838335, 'k S': 0.0002703433360367667, 'por': 0.0005406866720735334, ' dr': 0.0002703433360367667, ' Po': 0.0002703433360367667, 'Hol': 0.0002703433360367667, ' Ca': 0.0002703433360367667, 'Lis': 0.0005406866720735334, 'pat': 0.0005406866720735334, 'enn': 0.0002703433360367667, 'ose': 0.0002703433360367667, ' ov': 0.0005406866720735334, 'd c': 0.0002703433360367667, 'e 1': 0.0002703433360367667, 'ysi': 0.0002703433360367667, 'lys': 0.0002703433360367667, ' (U': 0.0002703433360367667, 'ndm': 
0.0002703433360367667, 'ath': 0.0008110300081103001, 'ute': 0.0008110300081103001, 'med': 0.0008110300081103001, 'pee': 0.0002703433360367667, 's. ': 0.0008110300081103001, '-Am': 0.0002703433360367667, 'ila': 0.0002703433360367667, 'F. ': 0.0002703433360367667, 'owe': 0.0002703433360367667, 's I': 0.0005406866720735334, 'wer': 0.0002703433360367667, ' S.': 0.0002703433360367667, 'con': 0.0021627466882941336, 'rra': 0.0002703433360367667, 'nt.': 0.0002703433360367667, 'Har': 0.0002703433360367667, 'as ': 0.003784806704514734, 'e F': 0.0005406866720735334, ' Id': 0.0002703433360367667, 'Car': 0.0002703433360367667, 'sid': 0.0002703433360367667, 'Ter': 0.0002703433360367667, 'kin': 0.0005406866720735334, 'y r': 0.0005406866720735334, 'ork': 0.001892403352257367, 'ubl': 0.0002703433360367667, 'wid': 0.0005406866720735334, 'i/æ': 0.0002703433360367667, 'cs"': 0.0002703433360367667, 'de ': 0.0002703433360367667, 'oks': 0.0005406866720735334, 'ved': 0.0005406866720735334, 'w L': 0.0002703433360367667, 'rs.': 0.0005406866720735334, 'niz': 0.0002703433360367667, 'met': 0.0002703433360367667, 'rta': 0.0002703433360367667, "ky'": 0.0002703433360367667, ' an': 0.007569613409029468, 'sky': 0.002703433360367667, 'osi': 0.0002703433360367667, 'dev': 0.0008110300081103001, 'f B': 0.0002703433360367667, 'inf': 0.0002703433360367667, 'rie': 0.0002703433360367667, 's i': 0.0016220600162206002, 'ixt': 0.0002703433360367667, 'rec': 0.0002703433360367667, 'ken': 0.0005406866720735334, 'nis': 0.0002703433360367667, ' ˈn': 0.0002703433360367667, 'nsy': 0.0002703433360367667, 'fro': 0.0008110300081103001, 'gns': 0.0002703433360367667, 'w l': 0.0002703433360367667, 'edi': 0.0013517166801838335, 'w Y': 0.0002703433360367667, 'o e': 0.0002703433360367667, 'sm ': 0.0016220600162206002, 'que': 0.0002703433360367667, 'HM ': 0.0002703433360367667, 't.\n': 0.0002703433360367667, 'yti': 0.0002703433360367667, 'gra': 0.0013517166801838335, 'cup': 0.0005406866720735334, 'ipl': 0.0002703433360367667, 
' Co': 0.0002703433360367667, 'ain': 0.0005406866720735334, 'g t': 0.0013517166801838335, 'r E': 0.0002703433360367667, '-wr': 0.0002703433360367667, ' in': 0.006217896728845634, 'cia': 0.0008110300081103001, 'tts': 0.0002703433360367667, 's c': 0.0008110300081103001, 'ia,': 0.0008110300081103001, 'tsp': 0.0002703433360367667, 'avi': 0.0002703433360367667, '67 ': 0.0002703433360367667, ', s': 0.0002703433360367667, 'str': 0.0002703433360367667, '58 ': 0.0002703433360367667, ' op': 0.0005406866720735334, 'ars': 0.0008110300081103001, 't, ': 0.0010813733441470668, 'ia.': 0.0005406866720735334, 'rd ': 0.0008110300081103001, 'fou': 0.0002703433360367667, ' pr': 0.0008110300081103001, 'S. ': 0.0008110300081103001, 'm o': 0.0002703433360367667, 'igu': 0.0005406866720735334, 'Tha': 0.0002703433360367667, 'nne': 0.0002703433360367667, 'cha': 0.0005406866720735334, ' co': 0.003514463368477967, ' Na': 0.0002703433360367667, 'Yor': 0.0002703433360367667, ' ex': 0.0005406866720735334, 't C': 0.0002703433360367667, 'bli': 0.0002703433360367667, 'ruc': 0.0002703433360367667, 'par': 0.0010813733441470668, 'zed': 0.0002703433360367667, 'xte': 0.0002703433360367667, 'tur': 0.0005406866720735334, 'anc': 0.0002703433360367667, 'ar ': 0.0016220600162206002, 'tak': 0.0002703433360367667, 'ˈno': 0.0002703433360367667, ' Ea': 0.0002703433360367667, 'U.S': 0.0005406866720735334, 'us ': 0.0002703433360367667, 'd p': 0.0010813733441470668, 'c, ': 0.0002703433360367667, 'aca': 0.0002703433360367667, 'dic': 0.0002703433360367667, 'lic': 0.0008110300081103001, 'thi': 0.0002703433360367667, 'uts': 0.0002703433360367667, 'phy': 0.0005406866720735334, 'rot': 0.0002703433360367667, 'eti': 0.0008110300081103001, 'New': 0.0005406866720735334, ' ti': 0.0002703433360367667, 'tra': 0.0005406866720735334, 'e O': 0.0002703433360367667, 'vel': 0.0008110300081103001, 'an ': 0.0032441200324412004, 'hif': 0.0002703433360367667, 'tho': 0.0002703433360367667, 'rov': 0.0005406866720735334, 'ued': 
0.0005406866720735334, 'ema': 0.0005406866720735334, 'eno': 0.0002703433360367667, ' Le': 0.0002703433360367667, 'n, ': 0.0005406866720735334, '959': 0.0002703433360367667, 'ory': 0.0010813733441470668, 'son': 0.0002703433360367667, 'gui': 0.0021627466882941336, 'B. ': 0.0002703433360367667, 'our': 0.0002703433360367667, 'o-c': 0.0002703433360367667, ' ap': 0.0002703433360367667, 'som': 0.0005406866720735334, 'e I': 0.0008110300081103001, 'nol': 0.0002703433360367667, 'icy': 0.0002703433360367667, ' CH': 0.0002703433360367667, 'col': 0.0002703433360367667, 'doc': 0.0002703433360367667, 'ave': 0.0005406866720735334, 'ind': 0.0002703433360367667, 'g c': 0.0005406866720735334, ' pl': 0.0005406866720735334, 'the': 0.011084076777507435, '; b': 0.0002703433360367667, '-cl': 0.0002703433360367667, 'nco': 0.0002703433360367667, 'k w': 0.0002703433360367667, 'nns': 0.0002703433360367667, ' he': 0.0040551500405515, 'AHM': 0.0002703433360367667, 'tre': 0.0002703433360367667, '57 ': 0.0002703433360367667, 'din': 0.0010813733441470668, 'h, ': 0.0002703433360367667, 'emi': 0.0008110300081103001, 'nam': 0.0002703433360367667, ' MI': 0.0002703433360367667, 'Str': 0.0002703433360367667, 'Adv': 0.0002703433360367667, 'h s': 0.0002703433360367667, 'e; ': 0.0002703433360367667, 'of ': 0.008380643417139767, 'r 1': 0.0002703433360367667, 'udy': 0.0008110300081103001, 's h': 0.0002703433360367667, 'eld': 0.0008110300081103001, '(US': 0.0002703433360367667, 'led': 0.0002703433360367667, 'o-s': 0.0002703433360367667, ' sa': 0.0002703433360367667, 'ibi': 0.0002703433360367667, 'nse': 0.0005406866720735334, 'Int': 0.0002703433360367667, 'r t': 0.0008110300081103001, 'opp': 0.0005406866720735334, 'o b': 0.0002703433360367667, '-sy': 0.0002703433360367667, ' sc': 0.0016220600162206002, 'ewo': 0.0002703433360367667, 'nda': 0.0005406866720735334, '0 b': 0.0002703433360367667, 'lva': 0.0002703433360367667, 'rti': 0.0008110300081103001, ', a': 0.0024330900243309003, 'ato': 0.0005406866720735334, 
't i': 0.0005406866720735334, 'leg': 0.0002703433360367667, 'ow ': 0.0005406866720735334, 'ls"': 0.0002703433360367667, 'gn ': 0.0002703433360367667, 'cla': 0.0002703433360367667, 'kee': 0.0002703433360367667, 'lad': 0.0002703433360367667, 'per': 0.0005406866720735334, 'm N': 0.0002703433360367667, 'igr': 0.0002703433360367667, 'ddl': 0.0002703433360367667, 'min': 0.0008110300081103001, 'cap': 0.0005406866720735334, 'omy': 0.0002703433360367667, 'nt:': 0.0002703433360367667, 'ynd': 0.0002703433360367667, ' De': 0.0002703433360367667, 'evo': 0.0002703433360367667, 'lat': 0.0005406866720735334, 'flu': 0.0002703433360367667, 'esi': 0.0005406866720735334, 'c a': 0.0002703433360367667, 'dma': 0.0002703433360367667, 'eri': 0.0016220600162206002, 's f': 0.0005406866720735334, '5. ': 0.0002703433360367667, 'har': 0.0002703433360367667, 'anu': 0.0002703433360367667, 'uth': 0.0002703433360367667, 'ssa': 0.0005406866720735334, 'bro': 0.0002703433360367667, '5, ': 0.0002703433360367667, ' Fo': 0.0005406866720735334, 'ens': 0.0002703433360367667, 'att': 0.0005406866720735334, 'ctu': 0.0008110300081103001, 'nie': 0.0002703433360367667, 'olu': 0.0002703433360367667, 'opm': 0.0002703433360367667, 'imm': 0.0002703433360367667, ' ac': 0.001892403352257367, 'inn': 0.0002703433360367667, 'Whi': 0.0002703433360367667, ' br': 0.0002703433360367667, 's.[': 0.0002703433360367667, 'Lef': 0.0002703433360367667, 'one': 0.0008110300081103001, 'out': 0.0002703433360367667, 'ore': 0.0005406866720735334, 'an,': 0.0005406866720735334, 'udi': 0.0008110300081103001, '7 C': 0.0002703433360367667, 'Ash': 0.0002703433360367667, ' Ph': 0.0002703433360367667, 'ies': 0.0005406866720735334, '955': 0.0008110300081103001, 'oli': 0.0016220600162206002, 'n T': 0.0002703433360367667, ' Th': 0.0005406866720735334, ' In': 0.001892403352257367, 'nia': 0.0008110300081103001, 'gen': 0.0008110300081103001, 'rɑː': 0.0002703433360367667, 'ogn': 0.0010813733441470668, 't T': 0.0002703433360367667, ' Ci': 
0.0002703433360367667, 'o H': 0.0002703433360367667, 'ult': 0.0002703433360367667, 'rk ': 0.001892403352257367, 'mes': 0.0005406866720735334, 'co-': 0.0005406866720735334, 'tna': 0.0002703433360367667, 'rei': 0.0002703433360367667, ' ph': 0.0008110300081103001, '80s': 0.0002703433360367667, '-VR': 0.0002703433360367667, ' cr': 0.001892403352257367, 'pre': 0.0002703433360367667, 'ee;': 0.0002703433360367667, 'h h': 0.0005406866720735334, 'ome': 0.0008110300081103001, 'py ': 0.0002703433360367667, 'ia ': 0.0002703433360367667, ' Ma': 0.0008110300081103001, 'f t': 0.0021627466882941336, 'poi': 0.0002703433360367667, 'd S': 0.0005406866720735334, 'Heb': 0.0002703433360367667, 'rsi': 0.0005406866720735334, 'm n': 0.0002703433360367667, ' Ec': 0.0002703433360367667, ' or': 0.0002703433360367667, '\nOn': 0.0002703433360367667, ' sy': 0.0002703433360367667, 'y i': 0.0008110300081103001, 'vem': 0.0008110300081103001, 'ty ': 0.0008110300081103001, 's S': 0.0002703433360367667, ' Ti': 0.0002703433360367667, 'beg': 0.0005406866720735334, 'a n': 0.0002703433360367667, 'tea': 0.0005406866720735334, 'uen': 0.0005406866720735334, '. W': 0.0002703433360367667, 'Ass': 0.0002703433360367667, 'e c': 0.0008110300081103001, 'st,': 0.0005406866720735334, 'rma': 0.0005406866720735334, 'ng,': 0.0002703433360367667, 'my ': 0.0002703433360367667, ' au': 0.0002703433360367667, '. 
C': 0.0002703433360367667, 'beh': 0.0002703433360367667, 'k a': 0.0002703433360367667, 'vis': 0.0010813733441470668, 'o m': 0.0002703433360367667, 'lds': 0.0002703433360367667, 'err': 0.0005406866720735334, '0s.': 0.0002703433360367667, 'tri': 0.0002703433360367667, 'vrɑ': 0.0002703433360367667, 'ial': 0.0013517166801838335, 'spe': 0.0002703433360367667, 'noʊ': 0.0002703433360367667, 'ry,': 0.0008110300081103001, 'cli': 0.0002703433360367667, 'cad': 0.0005406866720735334, 'g o': 0.0002703433360367667, 's l': 0.0005406866720735334, 'mma': 0.0008110300081103001, 'sm,': 0.0016220600162206002, 'l c': 0.0002703433360367667, 'arr': 0.0008110300081103001, 'idd': 0.0002703433360367667, ' Ad': 0.0002703433360367667, ' U.': 0.0005406866720735334, 'vio': 0.0002703433360367667, 'æˈv': 0.0002703433360367667, 'z i': 0.0002703433360367667, 'tio': 0.0024330900243309003, 'nen': 0.0005406866720735334, 'Bor': 0.0002703433360367667, 'nom': 0.0002703433360367667, '–Pa': 0.0002703433360367667, ' Ho': 0.0005406866720735334, 'iel': 0.0008110300081103001, 's p': 0.0002703433360367667, 'ift': 0.0002703433360367667, 'sop': 0.0008110300081103001, ' vo': 0.0002703433360367667, '100': 0.0002703433360367667, 'scr': 0.0002703433360367667, 'rsa': 0.0002703433360367667, 'def': 0.0002703433360367667, 'tte': 0.0002703433360367667, 'die': 0.0002703433360367667, 'Ind': 0.0002703433360367667, 'r e': 0.0002703433360367667, 'ono': 0.0002703433360367667, 'wro': 0.0002703433360367667, 'r. 
': 0.0002703433360367667, 'US ': 0.0002703433360367667, 'tem': 0.0002703433360367667, 'cti': 0.0013517166801838335, 's E': 0.0002703433360367667, 'ic ': 0.0021627466882941336, 'm s': 0.0002703433360367667, 'dia': 0.0010813733441470668, 'Isr': 0.0002703433360367667, 'gan': 0.0008110300081103001, 'fes': 0.0002703433360367667, 'fen': 0.0002703433360367667, 'nue': 0.0005406866720735334, 'ali': 0.0024330900243309003, 'nat': 0.0002703433360367667, 'cid': 0.0002703433360367667, 'hm ': 0.0002703433360367667, 's".': 0.0002703433360367667, 'tud': 0.0010813733441470668, 'tac': 0.0002703433360367667, 'nit': 0.0008110300081103001, 'ard': 0.0010813733441470668, 'ews': 0.0002703433360367667, 'ch—': 0.0002703433360367667, 'le ': 0.0010813733441470668, 'aur': 0.0002703433360367667, 'dit': 0.0008110300081103001, 'ond': 0.0002703433360367667, 'sia': 0.0002703433360367667, 'ohm': 0.0002703433360367667, 'Stu': 0.0002703433360367667, 'am ': 0.0010813733441470668, 'ell': 0.0010813733441470668, 'cem': 0.0002703433360367667, '5 h': 0.0002703433360367667, 'aut': 0.0002703433360367667, 'dra': 0.0002703433360367667, ' of': 0.008380643417139767, 'tir': 0.0002703433360367667, 'shi': 0.0002703433360367667, 'en ': 0.0005406866720735334, 'nfl': 0.0005406866720735334, '28)': 0.0002703433360367667, '. T': 0.0002703433360367667, '-wa': 0.0002703433360367667, 'syl': 0.0002703433360367667, 'n a': 0.0016220600162206002, 'ni/': 0.0002703433360367667, 't R': 0.0002703433360367667, 'rae': 0.0002703433360367667, 'he ': 0.013787510137875101, 'and': 0.005136523384698567, 'l r': 0.0002703433360367667, '957': 0.0002703433360367667, 'use': 0.0002703433360367667, 'gin': 0.0005406866720735334, 'uti': 0.0005406866720735334, '. 
i': 0.0002703433360367667, 'tua': 0.0002703433360367667, ' be': 0.0013517166801838335, 'mie': 0.0002703433360367667, 'ol ': 0.0002703433360367667, 'syn': 0.0002703433360367667, 'Dec': 0.0002703433360367667, 'd l': 0.0002703433360367667, 'eni': 0.0008110300081103001, 'bei': 0.0002703433360367667, 'Avr': 0.0002703433360367667, 'Edw': 0.0002703433360367667, 'cs ': 0.0010813733441470668, 'or ': 0.0040551500405515, 'nsi': 0.0002703433360367667, 'i-A': 0.0002703433360367667, 'a l': 0.0002703433360367667, 'h t': 0.0002703433360367667, 'Vie': 0.0002703433360367667, ' fe': 0.0005406866720735334, '-cr': 0.0002703433360367667, 'Pal': 0.0002703433360367667, 'ped': 0.0008110300081103001, 'sen': 0.0002703433360367667, 'lly': 0.0002703433360367667, 'e d': 0.0010813733441470668, 'lib': 0.0005406866720735334, 'ins': 0.0005406866720735334, 'roa': 0.0002703433360367667, 'ram': 0.0016220600162206002, 'mal': 0.0002703433360367667, 'stu': 0.0008110300081103001, 'm, ': 0.0016220600162206002, 'inc': 0.0008110300081103001, ' no': 0.0002703433360367667, 'ew ': 0.0010813733441470668, '1 t': 0.0002703433360367667, 'f F': 0.0002703433360367667, 'lso': 0.0010813733441470668, 'er ': 0.0016220600162206002, 'low': 0.0010813733441470668, 'h i': 0.0002703433360367667, 'ize': 0.0002703433360367667, 'ect': 0.0002703433360367667, 'wev': 0.0002703433360367667, 'y d': 0.0002703433360367667, ', m': 0.0002703433360367667, 'm. 
': 0.0002703433360367667, 'equ': 0.0002703433360367667, 'gua': 0.0005406866720735334, 'ts,': 0.0002703433360367667, 'Occ': 0.0002703433360367667, 'nim': 0.0002703433360367667, 'usi': 0.0002703433360367667, 'sit': 0.0005406866720735334, 'ymp': 0.0002703433360367667, 'ied': 0.0002703433360367667, 'eft': 0.0002703433360367667, 'era': 0.0010813733441470668, ' Vi': 0.0002703433360367667, '967': 0.0002703433360367667, 'ajo': 0.0005406866720735334, ' ne': 0.0008110300081103001, 'sta': 0.0002703433360367667, 'ere': 0.0010813733441470668, 'lut': 0.0002703433360367667, 'ly,': 0.0002703433360367667, 'pic': 0.0002703433360367667, ' Hi': 0.0002703433360367667, 'imp': 0.0005406866720735334, '8) ': 0.0002703433360367667, 'hum': 0.0002703433360367667, 'n c': 0.0008110300081103001, 'fai': 0.0002703433360367667, '.S.': 0.0005406866720735334, 'nif': 0.0008110300081103001, 'm W': 0.0002703433360367667, 'ske': 0.0002703433360367667, 'ile': 0.0005406866720735334, ' St': 0.0005406866720735334, 'rse': 0.0002703433360367667, 'r w': 0.0005406866720735334, 'seq': 0.0002703433360367667, 'hig': 0.0002703433360367667, 'ʃɒm': 0.0002703433360367667, 'new': 0.0005406866720735334, 'nti': 0.0024330900243309003, 'k i': 0.0002703433360367667, 'ase': 0.0002703433360367667, 'fac': 0.0002703433360367667, 'lin': 0.0021627466882941336, 'elo': 0.0008110300081103001, 've ': 0.001892403352257367, 'f A': 0.0002703433360367667, 'oam': 0.0002703433360367667, 'ded': 0.0002703433360367667, ' as': 0.0016220600162206002, 'k o': 0.0002703433360367667, 'ive': 0.0021627466882941336, 'clu': 0.0005406866720735334, 'llo': 0.0010813733441470668, 'd w': 0.0008110300081103001, 'lec': 0.0002703433360367667, 'ste': 0.0005406866720735334, '195': 0.001892403352257367, 'e p': 0.0005406866720735334, 'i-c': 0.0002703433360367667, 'ynt': 0.0002703433360367667, '949': 0.0002703433360367667, 'lop': 0.0008110300081103001, 'ara': 0.0002703433360367667, 'rom': 0.0010813733441470668, ' wa': 0.0013517166801838335, 'lyt': 
0.0002703433360367667, 'edo': 0.0002703433360367667, ' on': 0.0010813733441470668, 'ons': 0.0005406866720735334, 'm ˈ': 0.0005406866720735334, 'mos': 0.0002703433360367667, 'Fel': 0.0002703433360367667, 'War': 0.0008110300081103001, ' Is': 0.0002703433360367667, 'rit': 0.0016220600162206002, 'n P': 0.0005406866720735334, 'ccu': 0.0008110300081103001, 'ian': 0.0010813733441470668, 'inv': 0.0005406866720735334, ' wo': 0.0013517166801838335, 'ora': 0.0008110300081103001, 'gm ': 0.0002703433360367667, 'ust': 0.0002703433360367667, 'rew': 0.0002703433360367667, '. S': 0.0005406866720735334, 'ori': 0.0008110300081103001, 'y o': 0.0024330900243309003, 'tru': 0.0002703433360367667, 'ost': 0.0002703433360367667, 'l f': 0.0002703433360367667, 'dy.': 0.0002703433360367667, 'hom': 0.002703433360367667, 'ret': 0.0002703433360367667, 'ael': 0.0002703433360367667, 'oso': 0.0008110300081103001, 'ry ': 0.0005406866720735334, 'lea': 0.0002703433360367667, "n's": 0.0002703433360367667, 'In ': 0.0005406866720735334, 'lar': 0.0010813733441470668, 'd o': 0.0010813733441470668, 'ety': 0.0002703433360367667, '-sk': 0.0002703433360367667, 'aus': 0.0002703433360367667, 'He ': 0.0010813733441470668, 'i/ ': 0.0002703433360367667, 'azi': 0.0002703433360367667, ' ge': 0.0005406866720735334, 'erg': 0.0002703433360367667, 'How': 0.0002703433360367667, 'spr': 0.0002703433360367667, 'mov': 0.0005406866720735334, 'int': 0.0005406866720735334, ' Me': 0.0002703433360367667, 'o h': 0.0005406866720735334, 'n A': 0.0002703433360367667, 'n D': 0.0002703433360367667, 'pub': 0.0002703433360367667, 'ir ': 0.0002703433360367667, 'erm': 0.0002703433360367667, 'tat': 0.0002703433360367667, 'e N': 0.0002703433360367667, 'y a': 0.0005406866720735334, ' "T': 0.0002703433360367667, 'men': 0.0013517166801838335, 'inu': 0.0005406866720735334, 'oʊm': 0.0002703433360367667, 'rad': 0.0002703433360367667, 'tut': 0.0008110300081103001, 'ch ': 0.0010813733441470668, 'chi': 0.0008110300081103001, ': T': 
0.0002703433360367667, ' sp': 0.0005406866720735334, 'i–P': 0.0002703433360367667, 'nal': 0.0013517166801838335, 'oct': 0.0002703433360367667, 'Jew': 0.0002703433360367667, ' st': 0.0010813733441470668, ' ci': 0.0002703433360367667, 'e. ': 0.0002703433360367667, ' mu': 0.0002703433360367667, ' Pr': 0.0005406866720735334, 'het': 0.0002703433360367667, 'a. ': 0.0005406866720735334, 'fat': 0.0002703433360367667, 'upy': 0.0002703433360367667, 'r, ': 0.0010813733441470668, 'tor': 0.0016220600162206002, 'api': 0.0005406866720735334, 'rly': 0.0010813733441470668, 'pla': 0.0005406866720735334, 'ber': 0.0008110300081103001, 'ing': 0.006758583400919167, ' Wa': 0.0008110300081103001, 'spa': 0.0002703433360367667, 'app': 0.0002703433360367667, 'i J': 0.0002703433360367667, 'aki': 0.0002703433360367667, 'ope': 0.0005406866720735334, 'ebr': 0.0002703433360367667, 't o': 0.0008110300081103001, 'a, ': 0.0008110300081103001, 'ici': 0.0005406866720735334, 'e P': 0.0005406866720735334, 'ifi': 0.0010813733441470668, 'opa': 0.0002703433360367667, 'ˈtʃ': 0.0002703433360367667, 'ce.': 0.0002703433360367667, 'mid': 0.0002703433360367667, 'me ': 0.0008110300081103001, 'n p': 0.0002703433360367667, '958': 0.0002703433360367667, 'nsf': 0.0002703433360367667, 't m': 0.0002703433360367667, 'l o': 0.0005406866720735334, 'shk': 0.0002703433360367667, 'Fro': 0.0002703433360367667, 'in ': 0.005136523384698567, 'ng ': 0.004325493376588267, 'cit': 0.0002703433360367667, 'Res': 0.0002703433360367667, 'ele': 0.0002703433360367667, 'eat': 0.0005406866720735334, 'ear': 0.0010813733441470668, 'bed': 0.0002703433360367667, 's m': 0.0005406866720735334, 'y. 
': 0.0008110300081103001, 'igh': 0.0002703433360367667, 'ace': 0.0002703433360367667, 'al.': 0.0002703433360367667, ' ha': 0.0013517166801838335, 'ier': 0.0005406866720735334, 'Mas': 0.0005406866720735334, 'mor': 0.0002703433360367667, 'vra': 0.0002703433360367667, 'ani': 0.0005406866720735334, 'lud': 0.0005406866720735334, 'at ': 0.0016220600162206002, 'var': 0.0002703433360367667, 'uri': 0.0005406866720735334, 'saw': 0.0002703433360367667, '51 ': 0.0002703433360367667, 'tim': 0.0005406866720735334, 'rev': 0.0002703433360367667, 'MIT': 0.0005406866720735334, 'mig': 0.0002703433360367667, 'pag': 0.0002703433360367667, 'ien': 0.0013517166801838335, 'vol': 0.0008110300081103001, 'Cit': 0.0002703433360367667, 'ert': 0.0002703433360367667, 'f T': 0.0002703433360367667, 'deo': 0.0002703433360367667, ' Sc': 0.0008110300081103001, 'ylv': 0.0002703433360367667, '7 e': 0.0002703433360367667, 'l g': 0.0005406866720735334, ' Yo': 0.0002703433360367667, 'tz ': 0.0002703433360367667, ' do': 0.0002703433360367667, '", ': 0.0002703433360367667, 'd i': 0.0005406866720735334, 'eve': 0.0010813733441470668, ' pa': 0.0008110300081103001, 'imo': 0.0002703433360367667, 'y "': 0.0002703433360367667, ' As': 0.0005406866720735334, 't y': 0.0002703433360367667, '\n\nA': 0.0002703433360367667, 'ree': 0.0002703433360367667, 'RAH': 0.0002703433360367667, 'oad': 0.0002703433360367667, 'One': 0.0002703433360367667, 'opi': 0.0002703433360367667, 'Eas': 0.0002703433360367667, 'sis': 0.0002703433360367667, 'r o': 0.0016220600162206002, 'nd.': 0.0002703433360367667, 'lve': 0.0005406866720735334, 'ale': 0.0002703433360367667, 'ana': 0.0010813733441470668, 'lac': 0.0002703433360367667, 'rib': 0.0005406866720735334, 'ge ': 0.0005406866720735334, 'Ric': 0.0002703433360367667, 'los': 0.0008110300081103001, 'uch': 0.0002703433360367667, 'fel': 0.0005406866720735334, 'Pen': 0.0002703433360367667, 'jor': 0.0005406866720735334, 'se ': 0.0005406866720735334, 'hor': 0.0002703433360367667, 'ame': 
0.0005406866720735334, 'dva': 0.0002703433360367667, 'hus': 0.0002703433360367667, 'eca': 0.0005406866720735334, 'n i': 0.0005406866720735334, 's d': 0.0008110300081103001, 'nes': 0.0002703433360367667, 'egi': 0.0002703433360367667, 'ite': 0.0005406866720735334, '/ a': 0.0002703433360367667, 'fic': 0.0010813733441470668, 'Sch': 0.0005406866720735334, 'ver': 0.001892403352257367, 'aly': 0.0005406866720735334, ' ea': 0.0005406866720735334, 'sin': 0.0008110300081103001, 'The': 0.0005406866720735334, "on'": 0.0002703433360367667, 'His': 0.0002703433360367667, 'h r': 0.0002703433360367667, 'log': 0.0005406866720735334, 'boo': 0.0005406866720735334, 'hno': 0.0002703433360367667, 'n e': 0.0002703433360367667, 'ade': 0.0008110300081103001, ' wi': 0.001892403352257367, 's a': 0.0040551500405515, 'ene': 0.0005406866720735334, 'add': 0.0002703433360367667, 'atz': 0.0002703433360367667, 'nte': 0.0010813733441470668, 'ne ': 0.0008110300081103001, 'awa': 0.0002703433360367667, 'noc': 0.0002703433360367667, 'i-w': 0.0002703433360367667, 'm 1': 0.0005406866720735334, ' ro': 0.0002703433360367667, 'ch,': 0.0002703433360367667, '-ca': 0.0002703433360367667, '. F': 0.0008110300081103001, 'ty.': 0.0002703433360367667, 'voc': 0.0002703433360367667, 'top': 0.0002703433360367667, 'act': 0.0021627466882941336, 'rs ': 0.0005406866720735334, 'cog': 0.0010813733441470668, 'ent': 0.003784806704514734, ' is': 0.001892403352257367, 'e w': 0.0013517166801838335, 'om ': 0.0013517166801838335, 'cau': 0.0002703433360367667, 'f u': 0.0002703433360367667, 'tiv': 0.0024330900243309003, '[22': 0.0002703433360367667, '. 
A': 0.0002703433360367667, 'y c': 0.0002703433360367667, 'als': 0.0013517166801838335, 'm a': 0.001892403352257367, 't f': 0.0005406866720735334, 'hy,': 0.0005406866720735334, 'chy': 0.0002703433360367667, 'uma': 0.0002703433360367667, 'cou': 0.0002703433360367667, 's r': 0.0002703433360367667, 'ibe': 0.0008110300081103001, "ty'": 0.0002703433360367667, 's "': 0.0002703433360367667, 'l p': 0.0002703433360367667, ') i': 0.0002703433360367667, 'tʃɒ': 0.0002703433360367667, 'bse': 0.0002703433360367667, 'ay ': 0.0005406866720735334, 'hly': 0.0002703433360367667, "y's": 0.0005406866720735334, 'emb': 0.0002703433360367667, 'mpe': 0.0005406866720735334, 'e s': 0.0013517166801838335, 'pon': 0.0005406866720735334, 'cul': 0.0005406866720735334, ' le': 0.0002703433360367667, 'Nix': 0.0002703433360367667, 'r c': 0.0005406866720735334, 'enc': 0.0010813733441470668, 'IT)': 0.0002703433360367667, ' Fr': 0.0002703433360367667, 'sso': 0.0008110300081103001, 'CHO': 0.0002703433360367667, 'hic': 0.0008110300081103001, 'ers': 0.0016220600162206002, 'a N': 0.0002703433360367667, 'n h': 0.0005406866720735334, '55,': 0.0002703433360367667, 'rin': 0.0002703433360367667, 'l S': 0.0005406866720735334, ' Wh': 0.0002703433360367667, 'a c': 0.0002703433360367667, ' Ed': 0.0002703433360367667, 'ern': 0.0005406866720735334, 'oms': 0.002703433360367667, ' Je': 0.0002703433360367667, 'ote': 0.0002703433360367667, '\nAn': 0.0002703433360367667, ' ad': 0.0002703433360367667, 'ion': 0.0024330900243309003, 'e L': 0.0002703433360367667, 'ic,': 0.0002703433360367667, 'ntr': 0.0005406866720735334, ' id': 0.0002703433360367667, 'Bot': 0.0002703433360367667, 'i-i': 0.0002703433360367667, 't a': 0.0002703433360367667, 'e W': 0.0002703433360367667, 'cal': 0.001892403352257367, 'd, ': 0.0002703433360367667, 'Nat': 0.0002703433360367667, 'rro': 0.0005406866720735334, ' la': 0.0010813733441470668, 'tar': 0.0002703433360367667, 'a b': 0.0002703433360367667, 'ɒms': 0.0002703433360367667, 'nde': 
0.0002703433360367667, 'n l': 0.0010813733441470668, 'Som': 0.0002703433360367667, 'spo': 0.0005406866720735334, 'ate': 0.0013517166801838335, 'ddi': 0.0002703433360367667, 'ook': 0.0005406866720735334, ' tr': 0.0002703433360367667, 'Ame': 0.0008110300081103001, ', 1': 0.0002703433360367667, 'ech': 0.0005406866720735334, 'icu': 0.0005406866720735334, 'rac': 0.0002703433360367667, 'ark': 0.0005406866720735334, 'ole': 0.0002703433360367667, 'n f': 0.0005406866720735334, ' Fa': 0.0002703433360367667, ' ye': 0.0002703433360367667, 'ner': 0.0008110300081103001, 'eac': 0.0005406866720735334, 'ire': 0.0002703433360367667, 'ime': 0.0005406866720735334, 'f l': 0.0008110300081103001, 'tic': 0.005947553392808867, 'ray': 0.0002703433360367667, 'maj': 0.0005406866720735334, 'set': 0.0002703433360367667, 'h C': 0.0002703433360367667, 'ct,': 0.0002703433360367667, 'efe': 0.0002703433360367667, 'Fol': 0.0002703433360367667, ' es': 0.0002703433360367667, 'An ': 0.0002703433360367667, 'r a': 0.0002703433360367667, 'msk': 0.0029737766964044337, 'g C': 0.0005406866720735334, 'Pol': 0.0002703433360367667, 'nta': 0.0002703433360367667, 't M': 0.0002703433360367667, 'piv': 0.0002703433360367667, 'le-': 0.0002703433360367667, 'aw ': 0.0002703433360367667, '), ': 0.0002703433360367667, 'cs,': 0.0010813733441470668, '928': 0.0002703433360367667, 'ior': 0.0002703433360367667, 'S L': 0.0002703433360367667, 'ivo': 0.0002703433360367667, 'aff': 0.0002703433360367667, 's v': 0.0002703433360367667, ' Em': 0.0002703433360367667, 'her': 0.0010813733441470668, 's",': 0.0002703433360367667, 'Fau': 0.0002703433360367667, '9. 
': 0.0002703433360367667, 'ndo': 0.0002703433360367667, 'orm': 0.0002703433360367667, 'Ski': 0.0002703433360367667, 'lph': 0.0002703433360367667, 'acc': 0.0002703433360367667, 'ula': 0.0005406866720735334, ', h': 0.001892403352257367, 'rog': 0.0002703433360367667, ' li': 0.0021627466882941336, 'ho-': 0.0002703433360367667, 'mpa': 0.0002703433360367667, 'ar,': 0.0005406866720735334, 'a s': 0.0002703433360367667, 'nd,': 0.0002703433360367667, '49.': 0.0002703433360367667, 'six': 0.0002703433360367667, 'uni': 0.0002703433360367667, 'T),': 0.0002703433360367667, 'e M': 0.0008110300081103001, 'win': 0.0002703433360367667, ' F.': 0.0002703433360367667, 'ove': 0.0016220600162206002, 'y h': 0.0005406866720735334, 'c p': 0.0002703433360367667, 'tit': 0.0008110300081103001, '196': 0.0002703433360367667, 'w S': 0.0002703433360367667, 'oll': 0.0005406866720735334, 't p': 0.0002703433360367667, 'ont': 0.0013517166801838335, '192': 0.0002703433360367667, 'bec': 0.0002703433360367667, 'ts ': 0.0008110300081103001, 'dy ': 0.0005406866720735334, 'und': 0.0005406866720735334, 'ks ': 0.0002703433360367667, '"Th': 0.0002703433360367667, 'ed ': 0.006758583400919167, 'yea': 0.0002703433360367667, 'er.': 0.0002703433360367667, 'k C': 0.0002703433360367667, 'g a': 0.0005406866720735334, 'nar': 0.0005406866720735334, 'ria': 0.0010813733441470668, 'aro': 0.0002703433360367667, ', w': 0.001892403352257367, 'arc': 0.0010813733441470668, ' un': 0.0005406866720735334, 'les': 0.0002703433360367667, 'tis': 0.0002703433360367667, 'ica': 0.0032441200324412004, 'uag': 0.0005406866720735334, 'dem': 0.0002703433360367667, 'sra': 0.0002703433360367667, 'g f': 0.0002703433360367667, '980': 0.0002703433360367667, 'c s': 0.0002703433360367667, 'cus': 0.0002703433360367667, 'Pre': 0.0002703433360367667, 's L': 0.0002703433360367667, 'rem': 0.0008110300081103001, 'ant': 0.0021627466882941336, '] A': 0.0002703433360367667, 'Fou': 0.0002703433360367667, 'a p': 0.0005406866720735334, 'n w': 
0.0002703433360367667, 'M-s': 0.0002703433360367667, ' Sy': 0.0002703433360367667, 'lue': 0.0002703433360367667, 'e e': 0.0008110300081103001, 'wn ': 0.0002703433360367667, ', t': 0.0010813733441470668, ' En': 0.0002703433360367667, 'tal': 0.0008110300081103001, '22]': 0.0002703433360367667, 'ith': 0.0013517166801838335, 'xpo': 0.0002703433360367667, 't w': 0.0002703433360367667, 'o d': 0.0002703433360367667, 'nce': 0.0016220600162206002, '9 h': 0.0002703433360367667, 'sea': 0.0002703433360367667, ' mo': 0.0013517166801838335, 'art': 0.0005406866720735334, 'mbe': 0.0002703433360367667, 'h—i': 0.0002703433360367667, ' gr': 0.0008110300081103001, 'nem': 0.0002703433360367667, 'ws ': 0.0002703433360367667, 'eme': 0.0013517166801838335, '8 t': 0.0002703433360367667, 't: ': 0.0002703433360367667, 'fli': 0.0002703433360367667, 'rke': 0.0005406866720735334, 'a m': 0.0008110300081103001, 'oth': 0.0002703433360367667, ' mi': 0.0008110300081103001, 'awn': 0.0002703433360367667, 'd a': 0.0024330900243309003, 'een': 0.0002703433360367667, 'tel': 0.0002703433360367667, ' No': 0.0002703433360367667, 'e a': 0.0021627466882941336, 'sym': 0.0002703433360367667, ' wh': 0.001892403352257367, 'phi': 0.0010813733441470668, 'g p': 0.0002703433360367667, 'ans': 0.0002703433360367667, 's W': 0.0002703433360367667, '"th': 0.0002703433360367667, 'lab': 0.0002703433360367667, 'rat': 0.0010813733441470668, 'y (': 0.0005406866720735334, 'ho ': 0.0002703433360367667, 'r A': 0.0002703433360367667, 'cy,': 0.0002703433360367667, 'ual': 0.0002703433360367667, 'loc': 0.0002703433360367667, 'niv': 0.0008110300081103001, "'s ": 0.0008110300081103001, 'T, ': 0.0002703433360367667, ', i': 0.0010813733441470668, 'lay': 0.0002703433360367667, ' 7,': 0.0002703433360367667, 't c': 0.0005406866720735334, 'Tec': 0.0002703433360367667, 'd h': 0.0008110300081103001, ' so': 0.0010813733441470668, 'fig': 0.0005406866720735334, 'to ': 0.0021627466882941336, 'rri': 0.0002703433360367667, 'ict': 
0.0002703433360367667, 'ran': 0.0005406866720735334, 'alt': 0.0002703433360367667, '\n\nO': 0.0002703433360367667, 'red': 0.0002703433360367667, 'dom': 0.0002703433360367667, 'cis': 0.0005406866720735334, 'las': 0.0002703433360367667, 'sfo': 0.0002703433360367667, 'eec': 0.0002703433360367667, 'n o': 0.0010813733441470668, 'ine': 0.0005406866720735334, ' a-': 0.0002703433360367667, 'e, ': 0.0002703433360367667, 'ly ': 0.0016220600162206002, 'amm': 0.0008110300081103001, '55 ': 0.0002703433360367667, 'ari': 0.0002703433360367667, 'heo': 0.0008110300081103001, 'l E': 0.0002703433360367667, 'mer': 0.0013517166801838335, ' "t': 0.0002703433360367667, 't H': 0.0002703433360367667, 'mpo': 0.0002703433360367667, ' Ri': 0.0002703433360367667, '. B': 0.0002703433360367667, 'rop': 0.0002703433360367667, 'oph': 0.0008110300081103001, ' im': 0.0005406866720735334, 'ˈvr': 0.0002703433360367667, 's o': 0.0008110300081103001, 'tus': 0.0002703433360367667, 'ism': 0.003514463368477967, 'o p': 0.0002703433360367667, '". ': 0.0002703433360367667, 'ixo': 0.0002703433360367667, 'dat': 0.0002703433360367667, 'lan': 0.0008110300081103001, ' So': 0.0005406866720735334, 'Tim': 0.0002703433360367667, 'HOM': 0.0002703433360367667, 'ang': 0.0005406866720735334, 'ass': 0.0010813733441470668, 'dea': 0.0002703433360367667, 'whi': 0.0010813733441470668, 'Med': 0.0002703433360367667, ' ar': 0.0008110300081103001, 'M n': 0.0002703433360367667, 'h E': 0.0002703433360367667, 'e u': 0.0002703433360367667, 'ces': 0.0002703433360367667, 'f m': 0.0005406866720735334, 'r 7': 0.0002703433360367667, 'm f': 0.0002703433360367667, 'ogy': 0.0002703433360367667, 'naz': 0.0002703433360367667, 'hin': 0.0008110300081103001, 'dec': 0.0005406866720735334, 'ogi': 0.0002703433360367667, 'age': 0.0008110300081103001, 'd. 
': 0.0002703433360367667, 'ses': 0.0005406866720735334, 'eig': 0.0002703433360367667, ' hu': 0.0002703433360367667, 'th ': 0.0013517166801838335, 'phe': 0.0002703433360367667, 'r H': 0.0002703433360367667, 'rch': 0.0010813733441470668, '. f': 0.0002703433360367667, '55.': 0.0002703433360367667, 'cto': 0.0002703433360367667, ' fo': 0.0024330900243309003, 'ett': 0.0002703433360367667, 'f s': 0.0005406866720735334, 'has': 0.0008110300081103001, 'ss ': 0.0008110300081103001, ' (M': 0.0002703433360367667, 'e R': 0.0002703433360367667, 'arl': 0.0010813733441470668, ' Re': 0.0002703433360367667, 'Her': 0.0002703433360367667, ' me': 0.0008110300081103001, 't d': 0.0005406866720735334, 'esp': 0.0005406866720735334, 'upp': 0.0002703433360367667, 'eco': 0.0002703433360367667, 'ple': 0.0002703433360367667, 'ren': 0.0002703433360367667, 'e i': 0.0024330900243309003}

In [61]:
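from math import log

# Shannon entropy (in bits) of the English trigram distribution.
entropy = 0
for x in enProbs.items():
    entropy += x[1] * log(x[1], 2)
print(-entropy)

# Second model: the English trigrams plus one extra trigram, 'krk'.
# Assuming totalEn already counts ';;;' (as totalFr does), the new total is totalEn + 1.
BTrigrams = Counter(enTrigrams + [ ';;;' ] + [ 'krk' ])
BModel = dict( [ (x[0], x[1]/(totalEn + 1)) for x in BTrigrams.items() ] )
#print(BModel)

# KL divergence D(BModel || enProbs) in bits. Trigrams missing from enProbs
# fall back to enProbs[';;;'] without renormalizing, so the reference values
# sum to more than 1 over BModel's keys and the result can come out slightly negative.
kld = 0
for x in BModel.items():
    kld += x[1] * log(x[1]/enProbs.get(x[0], enProbs[';;;']), 2)
print(kld)
# Cross term for 'krk' alone. Note that log() here is the natural log, not base 2.
print(BModel['krk'] * log(enProbs[';;;']))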


9.920303277129225
-0.00038997027967924337
-0.002220491295089852
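
The KL value printed above is slightly negative, which a true KL divergence between two normalized distributions cannot be. One possible fix is to smooth both models over a shared vocabulary so that each sums to 1. The following is a minimal sketch under that assumption, not the notebook's method: the helper names `smoothed_probs`, `entropy_bits` and `kl_bits` are hypothetical, add-one smoothing is swapped in for the `';;;'` fallback, and the toy counts only stand in for `enTrigrams` and `BTrigrams`.

```python
from collections import Counter
from math import log2


def smoothed_probs(counts, vocab, alpha=1.0):
    """Add-alpha smoothed probabilities over a fixed vocabulary (sums to 1)."""
    total = sum(counts.get(k, 0) for k in vocab) + alpha * len(vocab)
    return {k: (counts.get(k, 0) + alpha) / total for k in vocab}


def entropy_bits(p):
    """Shannon entropy in bits: -sum p(x) * log2 p(x)."""
    return -sum(v * log2(v) for v in p.values() if v > 0)


def kl_bits(p, q):
    """KL divergence D(p || q) in bits; p and q must share the same keys."""
    return sum(pv * log2(pv / q[k]) for k, pv in p.items() if pv > 0)


# Toy counts (illustrative only): b is a plus one extra trigram, 'krk'.
a = Counter({"the": 5, "he ": 3, " th": 2})
b = Counter({"the": 5, "he ": 3, " th": 2, "krk": 1})
vocab = set(a) | set(b)
p = smoothed_probs(b, vocab)
q = smoothed_probs(a, vocab)
print(entropy_bits(p), kl_bits(p, q))
```

Because both dictionaries are defined on the same keys and each sums to 1, `kl_bits` is non-negative and is zero only when the two models agree.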

In [50]:
text = """
巴塞羅那(加泰羅尼亞語:Barcelona)是加泰羅尼亞自治區首府、以及巴塞羅那省省會,位於伊比利亞半島的東北面,瀕臨地中海,全市人口約160萬,都會區人口則約5百萬。加泰羅尼亞自治區議會、行政機構、高等法院均設於此,1999年,巴塞羅那被美國國家地理雜誌選為五十個人生必遊景點之一。

相傳巴塞隆納由迦太基將領、漢尼拔的父親哈米爾卡·巴卡所興建,在其漫長的歷史上還曾作爲巴塞羅那伯爵領地和阿拉貢王國的都城。巴塞羅那因其衆多歷史建築和文化景點成爲衆多旅遊者的目的地,其中之代表是被列入聯合國世界遺產的安東尼·高第和路易·多門內克·蒙塔內的建築作品。安東尼·高第一直在巴塞羅那生活和工作,在這裏有他很多的作品,其中最著名的包括桂爾宮、桂爾公園和聖家堂。巴塞羅那尚有兩個知名的足球俱樂部:巴塞羅那和西班牙人,其中巴塞羅那是世界最著名的足球俱樂部之一。
關於城市的建立有兩種觀點,但均與哈米爾卡·巴卡有關。第一種認爲該城由古代英雄赫拉克勒斯在羅馬建立前400年(大約在西元前1153年[1])建立,後哈米爾卡·巴卡於西元前三世紀重建並以自己的姓氏命名。而第二種說法認爲哈米爾卡·巴卡直接建立了巴塞羅那。

大約在西元前15年,羅馬人以“台伯山”(現市政廳附近的一座小丘)爲中心將城市重新規劃爲一座羅馬兵營。其後巴塞羅那作爲羅馬人的殖民地被稱爲Colonia Faventia Julia Augusta Pia Barcino或Colonia Julia Augusta Faventia Paterna Barcino。

在羅馬地理學家梅拉的描述中巴塞羅那還是一些小鎮組成的地區,但在後世學者眼中,巴塞羅那因自古羅馬時期便是貿易中心,主要是因這個地中海的城市擁有其獨特的位置和天然良港得以逐漸繁榮起來,並且不需要負擔沈重的帝國財政,還可以自己鑄造貨幣。

時至今日很多羅馬時期的重要建築都已經嚴重損毀,但在著名的歷史名勝區“哥特區”仍可看出昔日的格狀規劃。一些殘存的羅馬時期的城牆被集中於西元343年始建的拉蘇大教堂中。

西元五世紀早期(418年)巴塞羅那被西哥特王国人征服,八世紀早期被摩爾人征服,但西元801年查理曼大帝之子路易將其佔領並使之成爲加洛林王朝的西班牙邊疆區,由巴塞羅那伯爵統治。在西元985年被阿爾-曼蘇爾洗劫之前巴塞羅那都是基督教在伊比利亞半島的前沿堡壘。後來巴塞羅那伯爵日漸顯示出其獨立性並擴張領地直至統治整個加泰羅尼亞。西元12世紀由於王室聯姻的結果巴塞羅那伯爵頭銜由阿拉貢國王拉米羅二世繼承,加泰羅尼亞並入阿拉貢王國,到13世紀阿拉貢國王統治了包括那不勒斯、西西里在內的西地中海,並一度統治雅典。隨後阿拉貢王室與卡斯蒂利亞王室聯姻,兩國合併形成了今日西班牙的主體,加泰羅尼亞成爲西班牙的一部分,獨立性日漸減少。但時至今日仍有加泰羅尼亞獨立運動存在。
"""

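# Character trigrams of the Chinese sample above. The `fr` prefix in the
# variable names is kept only because a later cell refers to frProbs.
frTrigrams = [ "".join(x) for x in ngrams(text, 3) ]
# One extra count for ';;;', a pseudo-trigram whose probability serves as the
# fallback for trigrams never seen in training.
frFp = Counter(frTrigrams + [ ';;;' ])
# print(frFp)
totalFr = sum(frFp.values())
# print(totalFr)
frProbs = dict( [ (x[0], x[1] / totalFr) for x in frFp.items() ] )
print(frProbs)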


{'蒙塔內': 0.0008417508417508417, '父親哈': 0.0008417508417508417, 'ave': 0.0016835016835016834, ' Pi': 0.0008417508417508417, '的都城': 0.0008417508417508417, '年,巴': 0.0008417508417508417, '隨後阿': 0.0008417508417508417, '-曼蘇': 0.0008417508417508417, '議會、': 0.0008417508417508417, '並且不': 0.0008417508417508417, '羅那尚': 0.0008417508417508417, '。在西': 0.0008417508417508417, '阿拉貢': 0.004208754208754209, '度統治': 0.0008417508417508417, '內的建': 0.0008417508417508417, '家地理': 0.0008417508417508417, '治區議': 0.0008417508417508417, '日漸顯': 0.0008417508417508417, '60萬': 0.0008417508417508417, '被稱爲': 0.0008417508417508417, '立性日': 0.0008417508417508417, '其衆多': 0.0008417508417508417, '財政,': 0.0008417508417508417, '爲一座': 0.0008417508417508417, '近的一': 0.0008417508417508417, '153': 0.0008417508417508417, '合國世': 0.0008417508417508417, 'a J': 0.0016835016835016834, '馬地理': 0.0008417508417508417, '\n\n在': 0.0008417508417508417, '統治了': 0.0008417508417508417, '\n巴塞': 0.0008417508417508417, '築作品': 0.0008417508417508417, '拉貢國': 0.0016835016835016834, '羅尼亞': 0.005892255892255892, '西元前': 0.0025252525252525255, 'Bar': 0.0025252525252525255, '邊疆區': 0.0008417508417508417, '牙人,': 0.0008417508417508417, '面,瀕': 0.0008417508417508417, '、西西': 0.0008417508417508417, '省省會': 0.0008417508417508417, '部分,': 0.0008417508417508417, '市政廳': 0.0008417508417508417, '馬人的': 0.0008417508417508417, '學者眼': 0.0008417508417508417, '領地和': 0.0008417508417508417, '併形成': 0.0008417508417508417, '衆多旅': 0.0008417508417508417, '西元3': 0.0008417508417508417, '聯姻的': 0.0008417508417508417, '桂爾公': 0.0008417508417508417, '尼亞並': 0.0008417508417508417, '兵營。': 0.0008417508417508417, '爾洗劫': 0.0008417508417508417, '他很多': 0.0008417508417508417, '曾作爲': 0.0008417508417508417, '中最著': 0.0008417508417508417, '個地中': 0.0008417508417508417, '狀規劃': 0.0008417508417508417, '長的歷': 0.0008417508417508417, '一些小': 0.0008417508417508417, '認爲哈': 0.0008417508417508417, '那生活': 0.0008417508417508417, '世紀阿': 0.0008417508417508417, '都是基': 0.0008417508417508417, '姻,兩': 
0.0008417508417508417, '高等法': 0.0008417508417508417, '有他很': 0.0008417508417508417, '規劃。': 0.0008417508417508417, '學家梅': 0.0008417508417508417, '的結果': 0.0008417508417508417, '在羅馬': 0.0016835016835016834, '由古代': 0.0008417508417508417, '約16': 0.0008417508417508417, '前40': 0.0008417508417508417, '會、行': 0.0008417508417508417, '殖民地': 0.0008417508417508417, '並一度': 0.0008417508417508417, '必遊景': 0.0008417508417508417, '羅馬建': 0.0008417508417508417, '是加泰': 0.0008417508417508417, '貨幣。': 0.0008417508417508417, '後世學': 0.0008417508417508417, '中於西': 0.0008417508417508417, '馬建立': 0.0008417508417508417, '帝國財': 0.0008417508417508417, '帝之子': 0.0008417508417508417, '羅二世': 0.0008417508417508417, '的姓氏': 0.0008417508417508417, ',但西': 0.0008417508417508417, '宮、桂': 0.0008417508417508417, '林王朝': 0.0008417508417508417, '特王国': 0.0008417508417508417, '景點之': 0.0008417508417508417, '\n關於': 0.0008417508417508417, '那因其': 0.0008417508417508417, '內克·': 0.0008417508417508417, '領並使': 0.0008417508417508417, '今日很': 0.0008417508417508417, '拉貢王': 0.0025252525252525255, '米爾卡': 0.003367003367003367, '約5百': 0.0008417508417508417, '其漫長': 0.0008417508417508417, '點成爲': 0.0008417508417508417, '顯示出': 0.0008417508417508417, '作品。': 0.0008417508417508417, '一度統': 0.0008417508417508417, '因自古': 0.0008417508417508417, ')爲中': 0.0008417508417508417, '多歷史': 0.0008417508417508417, '中心將': 0.0008417508417508417, '拉蘇大': 0.0008417508417508417, '爲衆多': 0.0008417508417508417, 'gus': 0.0016835016835016834, '00年': 0.0008417508417508417, '目的地': 0.0008417508417508417, '接建立': 0.0008417508417508417, '13世': 0.0008417508417508417, 'ino': 0.0016835016835016834, '摩爾人': 0.0008417508417508417, '亞王室': 0.0008417508417508417, '羅那被': 0.0016835016835016834, '拉米羅': 0.0008417508417508417, ',獨立': 0.0008417508417508417, '都會區': 0.0008417508417508417, '羅那是': 0.0008417508417508417, '表是被': 0.0008417508417508417, '400': 0.0008417508417508417, '羅那作': 0.0008417508417508417, 'ia ': 0.005892255892255892, '始建的': 0.0008417508417508417, 'a)是': 
0.0008417508417508417, '第一種': 0.0008417508417508417, ';;;': 0.0008417508417508417, '遊者的': 0.0008417508417508417, '年,羅': 0.0008417508417508417, '。其後': 0.0008417508417508417, '省會,': 0.0008417508417508417, '的足球': 0.0016835016835016834, '構、高': 0.0008417508417508417, '少。但': 0.0008417508417508417, '斯在羅': 0.0008417508417508417, '已經嚴': 0.0008417508417508417, '港得以': 0.0008417508417508417, '後哈米': 0.0008417508417508417, '是因這': 0.0008417508417508417, '政廳附': 0.0008417508417508417, '卡直接': 0.0008417508417508417, '者的目': 0.0008417508417508417, '球俱樂': 0.0016835016835016834, '兩國合': 0.0008417508417508417, '其後巴': 0.0008417508417508417, '樂部之': 0.0008417508417508417, '動存在': 0.0008417508417508417, '其中巴': 0.0008417508417508417, '佔領並': 0.0008417508417508417, '\n\n時': 0.0008417508417508417, '的作品': 0.0008417508417508417, '區”仍': 0.0008417508417508417, '可以自': 0.0008417508417508417, '3年[': 0.0008417508417508417, '985': 0.0008417508417508417, '地和阿': 0.0008417508417508417, '的一部': 0.0008417508417508417, '起來,': 0.0008417508417508417, '亞並入': 0.0008417508417508417, '羅那省': 0.0008417508417508417, '15年': 0.0008417508417508417, '區人口': 0.0008417508417508417, '人征服': 0.0016835016835016834, '世紀重': 0.0008417508417508417, '的主體': 0.0008417508417508417, '一些殘': 0.0008417508417508417, '\n大約': 0.0008417508417508417, '、以及': 0.0008417508417508417, '並以自': 0.0008417508417508417, '名的足': 0.0016835016835016834, '結果巴': 0.0008417508417508417, ',在其': 0.0008417508417508417, '尚有兩': 0.0008417508417508417, '。第一': 0.0008417508417508417, '哥特區': 0.0008417508417508417, '的羅馬': 0.0008417508417508417, '伯爵統': 0.0008417508417508417, '年被阿': 0.0008417508417508417, 'na ': 0.0008417508417508417, '個知名': 0.0008417508417508417, '史上還': 0.0008417508417508417, '至今日': 0.0016835016835016834, '民地被': 0.0008417508417508417, '並入阿': 0.0008417508417508417, '1年查': 0.0008417508417508417, '有關。': 0.0008417508417508417, '出昔日': 0.0008417508417508417, '歷史建': 0.0008417508417508417, '座羅馬': 0.0008417508417508417, '01年': 0.0008417508417508417, '後巴塞': 
0.0008417508417508417, '日西班': 0.0008417508417508417, '首府、': 0.0008417508417508417, '43年': 0.0008417508417508417, '百萬。': 0.0008417508417508417, '哥特王': 0.0008417508417508417, '上還曾': 0.0008417508417508417, '蘇爾洗': 0.0008417508417508417, '伯山”': 0.0008417508417508417, '繁榮起': 0.0008417508417508417, 'sta': 0.0016835016835016834, '曼大帝': 0.0008417508417508417, '區議會': 0.0008417508417508417, '劫之前': 0.0008417508417508417, '巴卡於': 0.0008417508417508417, '一。\n': 0.0016835016835016834, 'lon': 0.0025252525252525255, '被西哥': 0.0008417508417508417, '時期的': 0.0016835016835016834, '典。隨': 0.0008417508417508417, '命名。': 0.0008417508417508417, '\n\n相': 0.0008417508417508417, '了包括': 0.0008417508417508417, '。後來': 0.0008417508417508417, '代英雄': 0.0008417508417508417, '爾宮、': 0.0008417508417508417, '將領、': 0.0008417508417508417, '易·多': 0.0008417508417508417, 'nti': 0.0016835016835016834, '世紀由': 0.0008417508417508417, '那因自': 0.0008417508417508417, '與卡斯': 0.0008417508417508417, 'elo': 0.0008417508417508417, '嚴重損': 0.0008417508417508417, '羅那都': 0.0008417508417508417, '領地直': 0.0008417508417508417, '要建築': 0.0008417508417508417, '築都已': 0.0008417508417508417, '爲中心': 0.0008417508417508417, '、漢尼': 0.0008417508417508417, '可看出': 0.0008417508417508417, '418': 0.0008417508417508417, '卡於西': 0.0008417508417508417, '巴卡直': 0.0008417508417508417, '53年': 0.0008417508417508417, '中。\n': 0.0008417508417508417, '85年': 0.0008417508417508417, '人口則': 0.0008417508417508417, '作,在': 0.0008417508417508417, '西元五': 0.0008417508417508417, '漫長的': 0.0008417508417508417, '然良港': 0.0008417508417508417, '立,後': 0.0008417508417508417, '但西元': 0.0008417508417508417, '服,但': 0.0008417508417508417, '作品,': 0.0008417508417508417, '期便是': 0.0008417508417508417, '榮起來': 0.0008417508417508417, '山”(': 0.0008417508417508417, '貿易中': 0.0008417508417508417, '地理雜': 0.0008417508417508417, '知名的': 0.0008417508417508417, '名。而': 0.0008417508417508417, '教堂中': 0.0008417508417508417, ' Pa': 0.0008417508417508417, '性日漸': 0.0008417508417508417, '樂部:': 
0.0008417508417508417, '的安東': 0.0008417508417508417, '爵領地': 0.0008417508417508417, '漸顯示': 0.0008417508417508417, '99年': 0.0008417508417508417, '”(現': 0.0008417508417508417, '天然良': 0.0008417508417508417, 'na)': 0.0008417508417508417, '世學者': 0.0008417508417508417, '時至今': 0.0016835016835016834, '築和文': 0.0008417508417508417, '觀點,': 0.0008417508417508417, '。安東': 0.0008417508417508417, '馬時期': 0.0025252525252525255, '五十個': 0.0008417508417508417, '述中巴': 0.0008417508417508417, '中巴塞': 0.0016835016835016834, '至統治': 0.0008417508417508417, '十個人': 0.0008417508417508417, '損毀,': 0.0008417508417508417, 'Col': 0.0016835016835016834, '爲加洛': 0.0008417508417508417, '的拉蘇': 0.0008417508417508417, 'ter': 0.0008417508417508417, '多羅馬': 0.0008417508417508417, '有其獨': 0.0008417508417508417, '建立前': 0.0008417508417508417, '西哥特': 0.0008417508417508417, '體,加': 0.0008417508417508417, '爾-曼': 0.0008417508417508417, ',都會': 0.0008417508417508417, '集中於': 0.0008417508417508417, '看出昔': 0.0008417508417508417, '劃爲一': 0.0008417508417508417, '小丘)': 0.0008417508417508417, '第二種': 0.0008417508417508417, '後來巴': 0.0008417508417508417, '列入聯': 0.0008417508417508417, '有加泰': 0.0008417508417508417, '良港得': 0.0008417508417508417, '而第二': 0.0008417508417508417, '牆被集': 0.0008417508417508417, '年)巴': 0.0008417508417508417, '牙的主': 0.0008417508417508417, '部之一': 0.0008417508417508417, 'rce': 0.0008417508417508417, '括桂爾': 0.0008417508417508417, '2世紀': 0.0008417508417508417, '存在。': 0.0008417508417508417, '蘇大教': 0.0008417508417508417, '立運動': 0.0008417508417508417, '門內克': 0.0008417508417508417, '化景點': 0.0008417508417508417, '牙的一': 0.0008417508417508417, '尼亞自': 0.0016835016835016834, '整個加': 0.0008417508417508417, '18年': 0.0008417508417508417, '名的歷': 0.0008417508417508417, 'rci': 0.0016835016835016834, '被集中': 0.0008417508417508417, '拉克勒': 0.0008417508417508417, '座小丘': 0.0008417508417508417, '國世界': 0.0008417508417508417, '心將城': 0.0008417508417508417, '加洛林': 0.0008417508417508417, '時期便': 0.0008417508417508417, '在伊比': 
0.0008417508417508417, '語:B': 0.0008417508417508417, '名勝區': 0.0008417508417508417, '的建築': 0.0008417508417508417, '點之一': 0.0008417508417508417, '聯姻,': 0.0008417508417508417, '毀,但': 0.0008417508417508417, '經嚴重': 0.0008417508417508417, '裏有他': 0.0008417508417508417, '利亞王': 0.0008417508417508417, '爵日漸': 0.0008417508417508417, '尼·高': 0.0016835016835016834, '納由迦': 0.0008417508417508417, ',兩國': 0.0008417508417508417, '雄赫拉': 0.0008417508417508417, '因其衆': 0.0008417508417508417, '承,加': 0.0008417508417508417, '3世紀': 0.0008417508417508417, ':Ba': 0.0008417508417508417, '成爲加': 0.0008417508417508417, '於王室': 0.0008417508417508417, '中之代': 0.0008417508417508417, '品,其': 0.0008417508417508417, '這個地': 0.0008417508417508417, '重的帝': 0.0008417508417508417, '的前沿': 0.0008417508417508417, '於此,': 0.0008417508417508417, '作爲巴': 0.0008417508417508417, '·高第': 0.0016835016835016834, '一座羅': 0.0008417508417508417, '很多的': 0.0008417508417508417, '萬。加': 0.0008417508417508417, '區,但': 0.0008417508417508417, '但時至': 0.0008417508417508417, '地區,': 0.0008417508417508417, '立有兩': 0.0008417508417508417, '二種說': 0.0008417508417508417, 'o或C': 0.0008417508417508417, 'no。': 0.0008417508417508417, '歷史名': 0.0008417508417508417, '(現市': 0.0008417508417508417, '鎮組成': 0.0008417508417508417, '的位置': 0.0008417508417508417, '199': 0.0008417508417508417, '形成了': 0.0008417508417508417, '5年被': 0.0008417508417508417, '之成爲': 0.0008417508417508417, '太基將': 0.0008417508417508417, '包括桂': 0.0008417508417508417, '得以逐': 0.0008417508417508417, '隆納由': 0.0008417508417508417, '。西元': 0.0008417508417508417, '了巴塞': 0.0008417508417508417, ',主要': 0.0008417508417508417, '115': 0.0008417508417508417, '造貨幣': 0.0008417508417508417, '的建立': 0.0008417508417508417, '亞成爲': 0.0008417508417508417, '其佔領': 0.0008417508417508417, '迦太基': 0.0008417508417508417, '督教在': 0.0008417508417508417, '自己的': 0.0008417508417508417, '興建,': 0.0008417508417508417, '由於王': 0.0008417508417508417, '日漸減': 0.0008417508417508417, '機構、': 0.0008417508417508417, '在內的': 
0.0008417508417508417, '旅遊者': 0.0008417508417508417, '卡有關': 0.0008417508417508417, '及巴塞': 0.0008417508417508417, '建的拉': 0.0008417508417508417, '於西元': 0.0016835016835016834, '合併形': 0.0008417508417508417, '801': 0.0008417508417508417, '紀重建': 0.0008417508417508417, '其中最': 0.0008417508417508417, '家堂。': 0.0008417508417508417, '兩種觀': 0.0008417508417508417, '卡所興': 0.0008417508417508417, '逐漸繁': 0.0008417508417508417, '人,其': 0.0008417508417508417, ')建立': 0.0008417508417508417, '赫拉克': 0.0008417508417508417, '的包括': 0.0008417508417508417, '理曼大': 0.0008417508417508417, '使之成': 0.0008417508417508417, '王室與': 0.0008417508417508417, '幣。\n': 0.0008417508417508417, '卡·巴': 0.003367003367003367, ',由巴': 0.0008417508417508417, '早期被': 0.0008417508417508417, ',巴塞': 0.0016835016835016834, '親哈米': 0.0008417508417508417, '後阿拉': 0.0008417508417508417, '以及巴': 0.0008417508417508417, '馬兵營': 0.0008417508417508417, '都城。': 0.0008417508417508417, '俱樂部': 0.0016835016835016834, '高第一': 0.0008417508417508417, '會,位': 0.0008417508417508417, ',八世': 0.0008417508417508417, '該城由': 0.0008417508417508417, '擴張領': 0.0008417508417508417, ')巴塞': 0.0008417508417508417, '伯爵日': 0.0008417508417508417, '那是世': 0.0008417508417508417, '爾卡·': 0.003367003367003367, '八世紀': 0.0008417508417508417, '被列入': 0.0008417508417508417, '是基督': 0.0008417508417508417, '蒂利亞': 0.0008417508417508417, '人以“': 0.0008417508417508417, '存的羅': 0.0008417508417508417, '洗劫之': 0.0008417508417508417, '。巴塞': 0.0016835016835016834, '重新規': 0.0008417508417508417, '繼承,': 0.0008417508417508417, ',到1': 0.0008417508417508417, '政,還': 0.0008417508417508417, '示出其': 0.0008417508417508417, '擁有其': 0.0008417508417508417, '作爲羅': 0.0008417508417508417, '元34': 0.0008417508417508417, '海,並': 0.0008417508417508417, '稱爲C': 0.0008417508417508417, '心,主': 0.0008417508417508417, '立性並': 0.0008417508417508417, ',全市': 0.0008417508417508417, 'rna': 0.0008417508417508417, '亞語:': 0.0008417508417508417, '北面,': 0.0008417508417508417, '氏命名': 0.0008417508417508417, '獨立運': 
0.0008417508417508417, '地直至': 0.0008417508417508417, '前11': 0.0008417508417508417, '人的殖': 0.0008417508417508417, '個人生': 0.0008417508417508417, '瀕臨地': 0.0008417508417508417, '12世': 0.0008417508417508417, '格狀規': 0.0008417508417508417, '尼亞語': 0.0008417508417508417, '種認爲': 0.0008417508417508417, '府、以': 0.0008417508417508417, '1])': 0.0008417508417508417, '拉的描': 0.0008417508417508417, 'Pat': 0.0008417508417508417, '塞隆納': 0.0008417508417508417, '政機構': 0.0008417508417508417, '大教堂': 0.0008417508417508417, '易中心': 0.0008417508417508417, '3年始': 0.0008417508417508417, '、行政': 0.0008417508417508417, ' Ba': 0.0016835016835016834, '並使之': 0.0008417508417508417, '那都是': 0.0008417508417508417, 'ona': 0.0008417508417508417, '加泰羅': 0.005892255892255892, '品。安': 0.0008417508417508417, '易將其': 0.0008417508417508417, '”仍可': 0.0008417508417508417, '今日西': 0.0008417508417508417, '羅馬人': 0.0016835016835016834, '\n\n大': 0.0008417508417508417, '其中之': 0.0008417508417508417, '將城市': 0.0008417508417508417, '建立,': 0.0008417508417508417, '些小鎮': 0.0008417508417508417, '巴塞羅': 0.015151515151515152, '關。第': 0.0008417508417508417, '的地區': 0.0008417508417508417, '有兩種': 0.0008417508417508417, '等法院': 0.0008417508417508417, '雜誌選': 0.0008417508417508417, '元80': 0.0008417508417508417, '班牙的': 0.0016835016835016834, '斯、西': 0.0008417508417508417, '與哈米': 0.0008417508417508417, '文化景': 0.0008417508417508417, '在著名': 0.0008417508417508417, '在巴塞': 0.0008417508417508417, '以“台': 0.0008417508417508417, '性並擴': 0.0008417508417508417, '城市擁': 0.0008417508417508417, '區首府': 0.0008417508417508417, '地被稱': 0.0008417508417508417, 'no或': 0.0008417508417508417, 'Pia': 0.0008417508417508417, '克·蒙': 0.0008417508417508417, '自古羅': 0.0008417508417508417, 'a B': 0.0016835016835016834, '認爲該': 0.0008417508417508417, '伊比利': 0.0016835016835016834, '塔內的': 0.0008417508417508417, '部:巴': 0.0008417508417508417, ',還可': 0.0008417508417508417, '壘。後': 0.0008417508417508417, '姓氏命': 0.0008417508417508417, '於伊比': 0.0008417508417508417, '西地中': 
0.0008417508417508417, '負擔沈': 0.0008417508417508417, '年查理': 0.0008417508417508417, ' Ju': 0.0016835016835016834, '是世界': 0.0008417508417508417, '那尚有': 0.0008417508417508417, '理學家': 0.0008417508417508417, '昔日的': 0.0008417508417508417, '五世紀': 0.0008417508417508417, '規劃爲': 0.0008417508417508417, '擔沈重': 0.0008417508417508417, '主體,': 0.0008417508417508417, '城市重': 0.0008417508417508417, '領、漢': 0.0008417508417508417, '343': 0.0008417508417508417, '的重要': 0.0008417508417508417, '立前4': 0.0008417508417508417, '的描述': 0.0008417508417508417, 'cel': 0.0008417508417508417, '全市人': 0.0008417508417508417, '的地,': 0.0008417508417508417, '羅那和': 0.0008417508417508417, '和工作': 0.0008417508417508417, '位於伊': 0.0008417508417508417, '西里在': 0.0008417508417508417, '多旅遊': 0.0008417508417508417, ',加泰': 0.0016835016835016834, '爲西班': 0.0008417508417508417, '仍可看': 0.0008417508417508417, '前巴塞': 0.0008417508417508417, '期被摩': 0.0008417508417508417, '日的格': 0.0008417508417508417, '9年,': 0.0008417508417508417, 'Fav': 0.0016835016835016834, ',位於': 0.0008417508417508417, 'ugu': 0.0016835016835016834, '口則約': 0.0008417508417508417, '約在西': 0.0016835016835016834, '被阿爾': 0.0008417508417508417, '斯蒂利': 0.0008417508417508417, 'ent': 0.0016835016835016834, '為五十': 0.0008417508417508417, '島的東': 0.0008417508417508417, '成了今': 0.0008417508417508417, 'o。\n': 0.0008417508417508417, '海,全': 0.0008417508417508417, ',其中': 0.0025252525252525255, '者眼中': 0.0008417508417508417, '漢尼拔': 0.0008417508417508417, '勒斯、': 0.0008417508417508417, '世繼承': 0.0008417508417508417, '獨特的': 0.0008417508417508417, '在。\n': 0.0008417508417508417, '建築和': 0.0008417508417508417, '頭銜由': 0.0008417508417508417, '克勒斯': 0.0008417508417508417, '園和聖': 0.0008417508417508417, '、高等': 0.0008417508417508417, '在西元': 0.0025252525252525255, '漸繁榮': 0.0008417508417508417, '口約1': 0.0008417508417508417, '且不需': 0.0008417508417508417, 'a F': 0.0016835016835016834, '西元8': 0.0008417508417508417, '建,在': 0.0008417508417508417, '牙邊疆': 0.0008417508417508417, '需要負': 
0.0008417508417508417, '自己鑄': 0.0008417508417508417, '人生必': 0.0008417508417508417, '5年,': 0.0008417508417508417, '伯爵領': 0.0008417508417508417, ',瀕臨': 0.0008417508417508417, '要是因': 0.0008417508417508417, '二世繼': 0.0008417508417508417, '區,由': 0.0008417508417508417, '亞。西': 0.0008417508417508417, ',羅馬': 0.0008417508417508417, ',在這': 0.0008417508417508417, '前沿堡': 0.0008417508417508417, '自治區': 0.0016835016835016834, '王室聯': 0.0016835016835016834, '則約5': 0.0008417508417508417, '國國家': 0.0008417508417508417, '一部分': 0.0008417508417508417, '史建築': 0.0008417508417508417, '羅馬兵': 0.0008417508417508417, '現市政': 0.0008417508417508417, '或Co': 0.0008417508417508417, '大約在': 0.0016835016835016834, '萬,都': 0.0008417508417508417, '尼拔的': 0.0008417508417508417, '區“哥': 0.0008417508417508417, '遺產的': 0.0008417508417508417, '海的城': 0.0008417508417508417, '\n時至': 0.0008417508417508417, '·蒙塔': 0.0008417508417508417, '活和工': 0.0008417508417508417, '說法認': 0.0008417508417508417, '行政機': 0.0008417508417508417, 'arc': 0.0025252525252525255, '。一些': 0.0008417508417508417, '羅馬地': 0.0008417508417508417, 'cin': 0.0016835016835016834, '個加泰': 0.0008417508417508417, '特的位': 0.0008417508417508417, '·巴卡': 0.003367003367003367, '公園和': 0.0008417508417508417, 'ern': 0.0008417508417508417, '在後世': 0.0008417508417508417, '院均設': 0.0008417508417508417, '安東尼': 0.0016835016835016834, '日仍有': 0.0008417508417508417, '那還是': 0.0008417508417508417, '(加泰': 0.0008417508417508417, '小鎮組': 0.0008417508417508417, '些殘存': 0.0008417508417508417, '一座小': 0.0008417508417508417, '會區人': 0.0008417508417508417, '獨立性': 0.0016835016835016834, '但在後': 0.0008417508417508417, '以逐漸': 0.0008417508417508417, '描述中': 0.0008417508417508417, '班牙邊': 0.0008417508417508417, '教在伊': 0.0008417508417508417, '和西班': 0.0008417508417508417, '那被西': 0.0008417508417508417, '將其佔': 0.0008417508417508417, '大帝之': 0.0008417508417508417, 'ven': 0.0016835016835016834, '由巴塞': 0.0008417508417508417, '米羅二': 0.0008417508417508417, '那。\n': 0.0008417508417508417, '廳附近': 
0.0008417508417508417, '\n相傳': 0.0008417508417508417, '美國國': 0.0008417508417508417, 'ate': 0.0008417508417508417, '亞獨立': 0.0008417508417508417, 'Jul': 0.0016835016835016834, '島的前': 0.0008417508417508417, '市重新': 0.0008417508417508417, '因這個': 0.0008417508417508417, '市的建': 0.0008417508417508417, '臨地中': 0.0008417508417508417, '關於城': 0.0008417508417508417, '治區首': 0.0008417508417508417, '前15': 0.0008417508417508417, '羅那還': 0.0008417508417508417, '泰羅尼': 0.005892255892255892, '括那不': 0.0008417508417508417, '便是貿': 0.0008417508417508417, '城。巴': 0.0008417508417508417, ' Fa': 0.0016835016835016834, '國的都': 0.0008417508417508417, '世界遺': 0.0008417508417508417, '古代英': 0.0008417508417508417, '高第和': 0.0008417508417508417, ',並一': 0.0008417508417508417, ',19': 0.0008417508417508417, '是被列': 0.0008417508417508417, 'oni': 0.0016835016835016834, '由迦太': 0.0008417508417508417, '王統治': 0.0008417508417508417, '半島的': 0.0016835016835016834, '期的城': 0.0008417508417508417, '生必遊': 0.0008417508417508417, '的格狀': 0.0008417508417508417, '遊景點': 0.0008417508417508417, '·多門': 0.0008417508417508417, '元前三': 0.0008417508417508417, '世紀早': 0.0016835016835016834, '在這裏': 0.0008417508417508417, 'ust': 0.0016835016835016834, '貢國王': 0.0016835016835016834, '法認爲': 0.0008417508417508417, 'tia': 0.0016835016835016834, '室聯姻': 0.0016835016835016834, '國財政': 0.0008417508417508417, '並擴張': 0.0008417508417508417, '傳巴塞': 0.0008417508417508417, '貢王室': 0.0008417508417508417, '被摩爾': 0.0008417508417508417, '疆區,': 0.0008417508417508417, '0萬,': 0.0008417508417508417, '999': 0.0008417508417508417, '地,其': 0.0008417508417508417, '還曾作': 0.0008417508417508417, ',並且': 0.0008417508417508417, '治整個': 0.0008417508417508417, '(大約': 0.0008417508417508417, '“哥特': 0.0008417508417508417, '0年(': 0.0008417508417508417, '曼蘇爾': 0.0008417508417508417, '其獨立': 0.0008417508417508417, '姻的結': 0.0008417508417508417, ',後哈': 0.0008417508417508417, '堡壘。': 0.0008417508417508417, ',但在': 0.0016835016835016834, ')是加': 0.0008417508417508417, '那伯爵': 
0.003367003367003367, '桂爾宮': 0.0008417508417508417, '的西班': 0.0008417508417508417, '爵頭銜': 0.0008417508417508417, '路易將': 0.0008417508417508417, '組成的': 0.0008417508417508417, '史名勝': 0.0008417508417508417, '今日仍': 0.0008417508417508417, '王國的': 0.0008417508417508417, '其獨特': 0.0008417508417508417, '堂。巴': 0.0008417508417508417, '朝的西': 0.0008417508417508417, '王國,': 0.0008417508417508417, '治了包': 0.0008417508417508417, '種說法': 0.0008417508417508417, '和路易': 0.0008417508417508417, '年[1': 0.0008417508417508417, '元12': 0.0008417508417508417, '主要是': 0.0008417508417508417, '法院均': 0.0008417508417508417, '新規劃': 0.0008417508417508417, '尼亞獨': 0.0008417508417508417, '的歷史': 0.0016835016835016834, '西元9': 0.0008417508417508417, '元98': 0.0008417508417508417, '所興建': 0.0008417508417508417, '建立有': 0.0008417508417508417, '羅那。': 0.0008417508417508417, '由阿拉': 0.0008417508417508417, '重損毀': 0.0008417508417508417, '理雜誌': 0.0008417508417508417, '名的包': 0.0008417508417508417, '東北面': 0.0008417508417508417, '歷史上': 0.0008417508417508417, '沈重的': 0.0008417508417508417, ':巴塞': 0.0008417508417508417, '建築作': 0.0008417508417508417, '產的安': 0.0008417508417508417, '中海的': 0.0008417508417508417, 'a P': 0.0016835016835016834, '和聖家': 0.0008417508417508417, '期的重': 0.0008417508417508417, '來,並': 0.0008417508417508417, '。但時': 0.0008417508417508417, 'ta ': 0.0016835016835016834, '在其漫': 0.0008417508417508417, '一直在': 0.0008417508417508417, '爲巴塞': 0.0008417508417508417, '相傳巴': 0.0008417508417508417, '要負擔': 0.0008417508417508417, '尼亞成': 0.0008417508417508417, '成的地': 0.0008417508417508417, '還可以': 0.0008417508417508417, '的一座': 0.0008417508417508417, '尼亞。': 0.0008417508417508417, '中,巴': 0.0008417508417508417, '成爲西': 0.0008417508417508417, '那(加': 0.0008417508417508417, '建並以': 0.0008417508417508417, '出其獨': 0.0008417508417508417, '衆多歷': 0.0008417508417508417, '之子路': 0.0008417508417508417, '紀早期': 0.0016835016835016834, '均設於': 0.0008417508417508417, '重建並': 0.0008417508417508417, '內的西': 0.0008417508417508417, '統治雅': 
0.0008417508417508417, '建築都': 0.0008417508417508417, '直在巴': 0.0008417508417508417, '來巴塞': 0.0008417508417508417, '直至統': 0.0008417508417508417, 'Aug': 0.0016835016835016834, '此,1': 0.0008417508417508417, '被美國': 0.0008417508417508417, '西元1': 0.0008417508417508417, '梅拉的': 0.0008417508417508417, '設於此': 0.0008417508417508417, '馬人以': 0.0008417508417508417, '服,八': 0.0008417508417508417, '亞自治': 0.0016835016835016834, '拔的父': 0.0008417508417508417, '眼中,': 0.0008417508417508417, '家梅拉': 0.0008417508417508417, '年始建': 0.0008417508417508417, '世界最': 0.0008417508417508417, '和阿拉': 0.0008417508417508417, '選為五': 0.0008417508417508417, '營。其': 0.0008417508417508417, '的城市': 0.0008417508417508417, '成爲衆': 0.0008417508417508417, '是一些': 0.0008417508417508417, '。加泰': 0.0008417508417508417, '王拉米': 0.0008417508417508417, '一種認': 0.0008417508417508417, '兩個知': 0.0008417508417508417, '誌選為': 0.0008417508417508417, '伯爵頭': 0.0008417508417508417, '地中海': 0.0025252525252525255, '多的作': 0.0008417508417508417, '雅典。': 0.0008417508417508417, '羅那因': 0.0016835016835016834, '班牙人': 0.0008417508417508417, '元五世': 0.0008417508417508417, '羅馬時': 0.0025252525252525255, '王朝的': 0.0008417508417508417, '國家地': 0.0008417508417508417, '中海,': 0.0016835016835016834, '於城市': 0.0008417508417508417, '爲羅馬': 0.0008417508417508417, '是貿易': 0.0008417508417508417, '還是一': 0.0008417508417508417, '紀由於': 0.0008417508417508417, '治。在': 0.0008417508417508417, '聖家堂': 0.0008417508417508417, '界最著': 0.0008417508417508417, '立了巴': 0.0008417508417508417, '己鑄造': 0.0008417508417508417, '入阿拉': 0.0008417508417508417, '第一直': 0.0008417508417508417, '減少。': 0.0008417508417508417, '果巴塞': 0.0008417508417508417, '爲Co': 0.0008417508417508417, '英雄赫': 0.0008417508417508417, '國,到': 0.0008417508417508417, '期(4': 0.0008417508417508417, '多門內': 0.0008417508417508417, '地理學': 0.0008417508417508417, '殘存的': 0.0008417508417508417, '巴卡有': 0.0008417508417508417, '劃。一': 0.0008417508417508417, '爾人征': 0.0008417508417508417, '統治整': 0.0008417508417508417, '那省省': 
0.0008417508417508417, '聯合國': 0.0008417508417508417, '都已經': 0.0008417508417508417, '。\n\n': 0.004208754208754209, '羅那生': 0.0008417508417508417, '王国人': 0.0008417508417508417, '早期(': 0.0008417508417508417, '重要建': 0.0008417508417508417, '前三世': 0.0008417508417508417, '勒斯在': 0.0008417508417508417, '洛林王': 0.0008417508417508417, '景點成': 0.0008417508417508417, '“台伯': 0.0008417508417508417, '三世紀': 0.0008417508417508417, '。而第': 0.0008417508417508417, '西西里': 0.0008417508417508417, '市擁有': 0.0008417508417508417, '的城牆': 0.0008417508417508417, 'uli': 0.0016835016835016834, '統治。': 0.0008417508417508417, '紀阿拉': 0.0008417508417508417, '市人口': 0.0008417508417508417, '東尼·': 0.0016835016835016834, '這裏有': 0.0008417508417508417, '很多羅': 0.0008417508417508417, '之代表': 0.0008417508417508417, '台伯山': 0.0008417508417508417, 'olo': 0.0016835016835016834, '爲該城': 0.0008417508417508417, '160': 0.0008417508417508417, '種觀點': 0.0008417508417508417, '沿堡壘': 0.0008417508417508417, '城牆被': 0.0008417508417508417, '漸減少': 0.0008417508417508417, '爵統治': 0.0008417508417508417, '有兩個': 0.0008417508417508417, '[1]': 0.0008417508417508417, '的目的': 0.0008417508417508417, '治雅典': 0.0008417508417508417, '運動存': 0.0008417508417508417, '和文化': 0.0008417508417508417, '鑄造貨': 0.0008417508417508417, 'nia': 0.0016835016835016834, '均與哈': 0.0008417508417508417, '\n在羅': 0.0008417508417508417, '元前1': 0.0016835016835016834, '包括那': 0.0008417508417508417, '點,但': 0.0008417508417508417, '\n西元': 0.0008417508417508417, '直接建': 0.0008417508417508417, '之一。': 0.0016835016835016834, '國王拉': 0.0008417508417508417, '建立了': 0.0008417508417508417, '])建': 0.0008417508417508417, '那不勒': 0.0008417508417508417, '和天然': 0.0008417508417508417, '足球俱': 0.0016835016835016834, '生活和': 0.0008417508417508417, '(41': 0.0008417508417508417, '征服,': 0.0016835016835016834, '到13': 0.0008417508417508417, '西班牙': 0.003367003367003367, '不勒斯': 0.0008417508417508417, '中心,': 0.0008417508417508417, '卡斯蒂': 0.0008417508417508417, '。隨後': 0.0008417508417508417, '那作爲': 
0.0008417508417508417, '\n\n西': 0.0008417508417508417, '那和西': 0.0008417508417508417, '貢王國': 0.0016835016835016834, '基將領': 0.0008417508417508417, 'lia': 0.0016835016835016834, '子路易': 0.0008417508417508417, '哈米爾': 0.003367003367003367, '的父親': 0.0008417508417508417, '工作,': 0.0008417508417508417, '巴塞隆': 0.0008417508417508417, '入聯合': 0.0008417508417508417, '之前巴': 0.0008417508417508417, '著名的': 0.0025252525252525255, '仍有加': 0.0008417508417508417, '的殖民': 0.0008417508417508417, '路易·': 0.0008417508417508417, '第和路': 0.0008417508417508417, '城由古': 0.0008417508417508417, '。\n關': 0.0008417508417508417, '界遺產': 0.0008417508417508417, ' Au': 0.0016835016835016834, '年(大': 0.0008417508417508417, '羅那(': 0.0008417508417508417, '基督教': 0.0008417508417508417, '代表是': 0.0008417508417508417, '堂中。': 0.0008417508417508417, '古羅馬': 0.0008417508417508417, '室與卡': 0.0008417508417508417, '丘)爲': 0.0008417508417508417, '以自己': 0.0016835016835016834, '8年)': 0.0008417508417508417, '爲哈米': 0.0008417508417508417, '張領地': 0.0008417508417508417, '那被美': 0.0008417508417508417, '了今日': 0.0008417508417508417, '、桂爾': 0.0008417508417508417, '特區”': 0.0008417508417508417, '己的姓': 0.0008417508417508417, '國王統': 0.0008417508417508417, '不需要': 0.0008417508417508417, 'a A': 0.0016835016835016834, '利亞半': 0.0016835016835016834, '勝區“': 0.0008417508417508417, '國合併': 0.0008417508417508417, '的西地': 0.0008417508417508417, '但在著': 0.0008417508417508417, '人口約': 0.0008417508417508417, '位置和': 0.0008417508417508417, '羅那伯': 0.003367003367003367, '5百萬': 0.0008417508417508417, '日很多': 0.0008417508417508417, '阿爾-': 0.0008417508417508417, '的帝國': 0.0008417508417508417, '爾公園': 0.0008417508417508417, '巴卡所': 0.0008417508417508417, '的東北': 0.0008417508417508417, '城市的': 0.0008417508417508417, '查理曼': 0.0008417508417508417, '分,獨': 0.0008417508417508417, '塞羅那': 0.015151515151515152, '置和天': 0.0008417508417508417, '但均與': 0.0008417508417508417, '里在內': 0.0008417508417508417, ',但均': 0.0008417508417508417, '国人征': 0.0008417508417508417, '附近的': 
0.0008417508417508417, '比利亞': 0.0016835016835016834, '亞半島': 0.0016835016835016834, '最著名': 0.0016835016835016834, '銜由阿': 0.0008417508417508417}

In [52]:
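from math import log

# Chinese test sentence, scored under both trigram models.
test = "緊鄰城市東北部,其最高點第比達博峰海拔512米"
trigrams = [ "".join(x) for x in ngrams(test, 3) ]
# Sum of natural-log trigram probabilities under each model.
# frProbs holds the Chinese model despite its name.
frP = 0
enP = 0
for x in trigrams:
    frP += log(frProbs.get(x, frProbs[';;;']))
    enP += log(enProbs.get(x, enProbs[';;;']))

# Caveat: both totals below equal 21 * log(P(';;;')) for the respective model,
# i.e. every test trigram was unseen in both models, so the gap reflects the
# smaller training text (larger fallback probability), not a real match.
print("Chinese:", frP)
print("English:", enP)


Chinese: -148.68055649837441
English: -172.53217362848147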
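
Both totals above are exactly 21 times the log of each model's `';;;'` probability, so no test trigram was found in either model and the model trained on less text wins by default. A minimal sketch of one way around this, assuming a single shared floor for unseen trigrams and an explicit "unknown" result: the names `char_trigrams`, `train`, `score` and `identify`, the floor value `1e-6`, and the toy training strings are all illustrative assumptions, not part of the notebook.

```python
from collections import Counter
from math import log


def char_trigrams(text):
    """All overlapping 3-character substrings of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def train(text):
    """Relative-frequency character-trigram model."""
    counts = Counter(char_trigrams(text))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def score(model, text, floor=1e-6):
    """Sum of natural-log trigram probabilities, with one shared floor
    for unseen trigrams so that corpus size alone cannot decide the winner."""
    return sum(log(model.get(gram, floor)) for gram in char_trigrams(text))


def identify(models, text, floor=1e-6):
    """Best-scoring label, or None when no model has seen any trigram."""
    grams = char_trigrams(text)
    if not any(g in m for m in models.values() for g in grams):
        return None
    return max(models, key=lambda label: score(models[label], text, floor))


# Toy training strings (illustrative only).
models = {
    "en": train("the theory of the grammar and the study of language"),
    "zh": train("巴塞羅那是加泰羅尼亞自治區首府,巴塞羅那伯爵統治加泰羅尼亞"),
}
print(identify(models, "the language of the theory"))
print(identify(models, "加泰羅尼亞的巴塞羅那"))
print(identify(models, "xyzxyz"))
```

The floor should be smaller than the smallest probability in any model; with larger training texts it would need to be lowered accordingly.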
