Matteo Renzi mentions across Italian and Portuguese Wikipedia

Remark: Since interactive plots are present, open this link to read the Notebook correctly.

Find articles
Rank articles according to November pageviews
Make comparisons



In [1]:

    
import plotly
import pandas as pd  # pd.merge is called directly in later cells
from pageviews import *
from wiki_parser import *
import plotly.tools as tls
from helpers_parser import *
from across_languages import *
plotly.tools.set_credentials_file(username='crimenghini', api_key='***')

1. Find articles

The first goal is to find all the Italian and Portuguese Wikipedia articles that mention Matteo Renzi. To do so, we use the WikiHandler class, which goes through the raw data and stores the title and the text of every element of the corpora that mentions the Italian (almost) ex-Prime Minister.

In this example we focus on the sets of articles written in Italian and in Portuguese, collected respectively up to 20th November 2016 and 1st December 2016. In general, the code can handle more than two languages. The README file describes how the data were collected.



In [2]:

    
# Define the path of the corpora
path = '/Users/cristinamenghini/Downloads/'
# Xml file
xml_files = ['itwiki-20161120-pages-articles-multistream.xml', 
             'ptwiki-20161201-pages-articles-multistream.xml']

A quick peek at a snippet of the XML shows that the elements we are interested in are children of page, which identifies an article. From each page we want the contents of title and text.

Because the XML is very large, we opted for a parser that registers callbacks for events of interest and then lets the parser proceed through the document. The text of the articles has not been preprocessed, since our analysis does not examine the text in itself.
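The callback idea can be sketched with the standard library's xml.sax module. This is only a minimal illustration: MentionHandler is a hypothetical stand-in for WikiHandler (whose actual code lives in the wiki_parser library), and it keeps the matches in memory instead of writing them to a file.

```python
import xml.sax


class MentionHandler(xml.sax.ContentHandler):
    """Hypothetical stand-in for WikiHandler: keeps pages whose text mentions `query`."""

    def __init__(self, query):
        super().__init__()
        self.query = query
        self.matches = []   # collected {"title": ..., "text": ...} dicts
        self._tag = None    # name of the element currently being read
        self._title = []
        self._text = []

    def startElement(self, name, attrs):
        if name == 'page':
            # A new article starts: reset the buffers
            self._title, self._text = [], []
        self._tag = name

    def characters(self, content):
        # characters() can fire several times per element, so we accumulate
        if self._tag == 'title':
            self._title.append(content)
        elif self._tag == 'text':
            self._text.append(content)

    def endElement(self, name):
        if name == 'page':
            text = ''.join(self._text)
            if self.query in text:
                self.matches.append({'title': ''.join(self._title), 'text': text})
        self._tag = None


# Tiny made-up document that mimics the page/title/text nesting of the dump
sample = """<mediawiki>
  <page><title>Governo Renzi</title><revision><text>Matteo Renzi ha guidato il governo.</text></revision></page>
  <page><title>Roma</title><revision><text>Capitale d'Italia.</text></revision></page>
</mediawiki>"""

handler = MentionHandler('Matteo Renzi')
xml.sax.parseString(sample.encode('utf-8'), handler)
print([page['title'] for page in handler.matches])
```

Because the parser streams through the file and only the current page is buffered, memory use stays small even for a multi-gigabyte dump.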

Hence, we parse the Italian corpus using the parse_articles function from the wiki_parser library, which essentially runs the parser.



In [4]:

    
# Parse italian corpus
parse_articles('ita', path + xml_files[0], 'Matteo Renzi')

Then we move on to the Portuguese one.



In [5]:

    
# Parse portuguese corpus
parse_articles('port', path + xml_files[1], 'Matteo Renzi')

The articles are filtered by whether they mention Matteo Renzi. Those in Italian are stored in a .json file in which each line corresponds to a page (title, text); the same holds for the articles in Portuguese. The two corpora are automatically stored in the folder Corpus.

                              {"title": "title_1", "text": "text_1"}
                                                    ...
                                                    ...
                              {"title": "title_n", "text": "text_n"}
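A file in this layout can be read back one line at a time. The sketch below uses only the standard library and returns a plain list of titles; titles_from_json_lines is a hypothetical helper, not the article_df_from_json function used later, and the two sample lines are placeholders.

```python
import io
import json


def titles_from_json_lines(handle):
    """Return the titles stored in a JSON-lines corpus (one page per line)."""
    titles = []
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines
            titles.append(json.loads(line)['title'])
    return titles


# Placeholder pages standing in for a corpus file such as Corpus/wiki_ita_Matteo_Renzi.json
sample = io.StringIO(
    '{"title": "title_1", "text": "text_1"}\n'
    '{"title": "title_2", "text": "text_2"}\n'
)
titles = titles_from_json_lines(sample)
print(titles)
```

With a real corpus, the io.StringIO object would be replaced by an opened file handle.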

2. Rank articles according to November pageviews

Once the data has been filtered, we proceed with a simple analysis of the pageviews. In particular, using the article_df_from_json function, all the article titles are extracted from the corpus and then stored in a DataFrame.



In [3]:

    
# Get the df for the Italian articles
df_it_titles = article_df_from_json('Corpus/wiki_ita_Matteo_Renzi.json')

# Get the df for the Portuguese articles
df_pt_titles = article_df_from_json('Corpus/wiki_port_Matteo_Renzi.json')

Take a look at the obtained DataFrame.



In [4]:

    
df_it_titles.sample(5)









    Out[4]:

     Title
101  Centro-sinistra
413  Viadotto Italia
223  TG5 Prima Pagina
403  Carcere di Santo Stefano
498  Referendum costituzionale del 2016 in Italia

Next, we extract the number of monthly page views for each article in the languages of interest (i.e. it and pt) from the page views file (see Additional data in the README). To filter the file we use the filter_pageviews_file function, which returns a dictionary of dictionaries with the following structure (for our example):

                            {'it':{'Title_1':'No pageviews',
                                           ...
                                   'Title_n':'No pageviews'},
                             'pt':{'Title_1':'No pageviews',
                                           ...
                                   'Title_k':'No pageviews'}}
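A nested dictionary of this shape could be built as sketched below. The line layout assumed here, "<project> <title> <views>" with project codes such as it.z, is an assumption about the pageviews file and may not match it exactly; filter_pageviews is a hypothetical helper, not the filter_pageviews_file function of the pageviews library.

```python
import io


def filter_pageviews(handle, languages):
    """Collect {language: {title: views}} from lines assumed to look like
    '<project> <title> <views>' (the real file layout may differ)."""
    wanted = set(languages)
    views = {lang: {} for lang in languages}
    for line in handle:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        project, title, count = parts[0], parts[1], parts[2]
        lang = project.split('.')[0]  # e.g. 'it.z' -> 'it'
        if lang in wanted and count.isdigit():
            views[lang][title.replace('_', ' ')] = int(count)
    return views


# Illustrative lines only; the third one belongs to a language we do not keep
sample = io.StringIO(
    'it.z Jobs_Act 8908\n'
    'pt.z G7 259\n'
    'en.z Main_Page 100\n'
)
pageview_counts = filter_pageviews(sample, ['pt', 'it'])
print(pageview_counts)
```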



In [5]:

    
# Page views file
pageviews_file = 'pagecounts-2016-11-views-ge-5-totals'

# Filter the page view file
articles_pageviews = filter_pageviews_file(path + pageviews_file, ['pt','it'])

Then a right join is performed between the two DataFrames, namely the one obtained from the pageviews and the one obtained from the corpus. It turns out that, for both the Italian and the Portuguese corpora, some articles that mention Matteo Renzi were not viewed in November (judging by its name, the pageviews file appears to list only pages with at least 5 views, so "not viewed" may in practice mean "fewer than 5 views"). The define_ranked_df function is stored in the pageviews library.
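The logic of this step can be sketched in plain Python: every corpus title is kept, titles without a pageview entry are counted as not viewed, and the rest are ranked. rank_by_pageviews is a hypothetical helper written for illustration; define_ranked_df may be implemented differently.

```python
def rank_by_pageviews(titles, views):
    """Rank corpus titles by views; titles missing from `views` count as not viewed."""
    viewed = [(title, views[title]) for title in titles if title in views]
    not_viewed = [title for title in titles if title not in views]
    viewed.sort(key=lambda pair: pair[1], reverse=True)
    return viewed, not_viewed


# Two titles taken from the Italian ranking plus one hypothetical unvisited page
corpus_titles = ['Enrico Letta', 'Jobs Act', 'Hypothetical unvisited page']
november_views = {'Jobs Act': 8908, 'Enrico Letta': 7791, 'Unrelated page': 10}

ranked, missing = rank_by_pageviews(corpus_titles, november_views)
print(ranked)
print(len(missing), 'articles have not been visited')
```

Pages that appear only in the pageviews dictionary are dropped, which is what distinguishes this from an outer join.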



In [6]:

    
# Define the italian ranked article df according to the number of page views
ranked_df_ita = define_ranked_df(articles_pageviews, 'it', df_it_titles)
# Show the df head
ranked_df_ita.head(10)









    



Over the whole number of articles in the corpus  39  have not been visited during the considered period.






    Out[6]:

     Title                         Pageviews
409  Marco Travaglio               19795.0
146  Pif (conduttore televisivo)   19557.0
474  Partito Democratico (Italia)  11653.0
154  Vittorio Sgarbi               11324.0
226  Malala Yousafzai              9452.0
433  Jobs Act                      8908.0
274  Enrico Letta                  7791.0
343  Startup (economia)            7698.0
312  Marianna Madia                7608.0
155  Nuovo Centro Congressi        6894.0



In [7]:

    
# Define the Portuguese ranked article df according to the number of page views
ranked_df_port = define_ranked_df(articles_pageviews, 'pt', df_pt_titles)
# Show the df
ranked_df_port.head(10)









    



Over the whole number of articles in the corpus  4  have not been visited during the considered period.






    Out[7]:

    Title                                              Pageviews
33  Partido Democrático (Itália)                       567.0
31  Lista de chefes de Estado e de governo atuais      410.0
4   G7                                                 259.0
19  Federica Mogherini                                 215.0
7   G20                                                185.0
23  Centro-esquerda                                    141.0
8   Privatização                                       93.0
15  Lista de chefes de Estado e de governo por dat...  83.0
21  9.ª reunião de cúpula do G20                       81.0
17  10.ª reunião de cúpula do G20                      70.0

A quick glance at the two top 10 lists shows that:

The number of page views for the Italian articles that mention Matteo Renzi is considerably higher than for those written in Portuguese.
The only article present in both rankings is Partito Democratico (Italia).
The pages seem to differ in content: the Portuguese ones relate mostly to international politics, whereas the Italian ones cover politics, journalists and public figures.

3. Make comparisons

We now move on to exploring the preprocessed data, looking for something interesting.

We take a look at the number of mentions in each article. In this context, Matteo Renzi may receive more than one mention merely because of references. For instance on this page, if you search for Matteo Renzi, you will find 2 mentions, but one of them is just a reference supporting the first. For the moment we do not address this issue.

The DataFrame below, obtained using the article_mentions function in this library, shows the number of mentions of Matteo Renzi in each article, for both the Italian and Portuguese corpora. The DataFrames are sorted by the number of mentions, so that the pages where Matteo Renzi is most "popular" come first.
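Counting mentions boils down to counting occurrences of the string in the article text. The sketch below is an assumption about how such a count could be done (article_mentions may work differently); it also shows one possible way to ignore mentions inside <ref>...</ref> blocks, which the analysis here does not do. The sample wikitext is made up.

```python
import re


def count_mentions(text, name):
    """Count non-overlapping, case-sensitive occurrences of `name` in `text`."""
    return len(re.findall(re.escape(name), text))


def strip_references(text):
    """Drop <ref>...</ref> blocks; a rough filter that ignores self-closing refs."""
    return re.sub(r'<ref[^>/]*>.*?</ref>', '', text, flags=re.S)


# Made-up wikitext: one mention in the body and one inside a reference
wikitext = "[[Matteo Renzi]] è un politico.<ref>Intervista a Matteo Renzi</ref>"

total = count_mentions(wikitext, 'Matteo Renzi')
body_only = count_mentions(strip_references(wikitext), 'Matteo Renzi')
print(total, body_only)
```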



In [8]:

    
# Italian df of mentions per page
df_it_mentions = article_mentions('Corpus/wiki_ita_Matteo_Renzi.json', 'Matteo Renzi')

# Sort the df by the number of mentions and see the top 5
df_it_mentions = df_it_mentions.sort_values('Number of mentions', ascending = False)

# Show results
df_it_mentions.head(5)









    Out[8]:

     Title                                    Number of mentions
37   Matteo Renzi                             62
424  Governo Renzi                            30
195  Partito Democratico (Italia)             17
492  Riforma costituzionale Renzi-Boschi      12
214  Storia del Partito Democratico (Italia)  12



In [9]:

    
# Portuguese df of mentions per page
df_pt_mentions = article_mentions('Corpus/wiki_port_Matteo_Renzi.json', 'Matteo Renzi')

# Sort the df by the number of mentions and see the top 5
df_pt_mentions = df_pt_mentions.sort_values('Number of mentions', ascending = False)

# Show results
df_pt_mentions.head(5)









    Out[9]:

    Title                                             Number of mentions
20  Matteo Renzi                                      11
7   Partido Democrático (Itália)                      6
9   Itália                                            3
0   Lista de primeiros-ministros da Itália            2
16  Lista de viagens presidenciais de Dilma Rousseff  2

Comparing the two DataFrames, we immediately notice that the maximum number of mentions of Matteo Renzi differs greatly between the Italian and Portuguese articles. In the Portuguese corpus only two articles have more than 5 mentions. Thus, it is interesting to visualize the distribution of the mentions for both the IT and PT corpora.

The distributions are represented using boxplots. They show that, for both languages, 75% of the articles contain no more than 3 mentions of the Italian premier. In the Portuguese corpus two outliers stand out, corresponding to Matteo Renzi (11 mentions) and Partido Democrático (Itália) (6 mentions), whereas in the Italian corpus the number of outliers is larger and the maximum is reached by Matteo Renzi (62 mentions). Moreover, zooming in on the boxes, we observe that both distributions are concentrated at the low end (number of mentions equal to 1), with a long tail to the right.
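The quantities read off a boxplot (the third quartile and the points beyond the upper whisker) can also be computed directly. The sketch below uses the common 1.5 x IQR rule and the standard library's statistics.quantiles; the plotting library may use a slightly different quartile rule, and the counts are made up for illustration rather than taken from the corpora.

```python
import statistics


def upper_outliers(values):
    """Return (third quartile, values above Q3 + 1.5 * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')
    fence = q3 + 1.5 * (q3 - q1)
    return q3, [v for v in values if v > fence]


# Made-up mention counts, for illustration only (not the notebook's data)
toy_mentions = [1, 1, 1, 1, 2, 2, 3, 3, 6, 11]

q3, outliers = upper_outliers(toy_mentions)
print('Third quartile:', q3)
print('Outliers:', outliers)
```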



In [10]:

    
#boxplot_mentions(df_pt_mentions, df_it_mentions, 'PT', 'IT', 'Number of mentions')
tls.embed("https://plot.ly/~crimenghini/20")









    Out[10]:

In this direction, one aspect that can be considered is the following:

Define how important Matteo Renzi is in the articles that mention him. This requires defining the concept of importance. Intuitively, the higher the number of mentions, the more important our subject is in the article. Moreover, it may be useful to weight the number of mentions by the number of words in the article: $$I_{string} = \frac{M}{|D|}$$ where I_{string} is the importance, M is the number of mentions and |D| is the number of words in the document. In this way, if an article cites Renzi once but consists of just a few lines, the string of interest will turn out to be more significant.

Moreover, another aspect should be considered, especially when there is only one mention:

The string (i.e. Matteo Renzi) is a pointer to its main page (i.e. Matteo Renzi -> Matteo Renzi). If the pointer is present, we can imagine that the figure is more important than on a page where there is no hyperlink.
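Both ideas can be sketched in a few lines. Here the word count |D| is approximated by splitting on whitespace, and the pointer is detected through the wikitext link syntax [[Target]] or [[Target|label]]; the function names and the sample texts are hypothetical.

```python
def importance(text, name):
    """I = M / |D|: mentions of `name` divided by the number of whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    return text.count(name) / len(words)


def links_to_main_page(text, name):
    """True if the wikitext contains a link whose target is `name`."""
    return '[[' + name + ']]' in text or '[[' + name + '|' in text


# Made-up pages: same single mention, very different lengths
short_page = "Matteo Renzi visited the show."
long_page = "Matteo Renzi visited the show. " + "filler " * 95

print(importance(short_page, 'Matteo Renzi'))   # 1 mention over 5 words
print(importance(long_page, 'Matteo Renzi'))    # 1 mention over 100 words
print(links_to_main_page("Ospite: [[Matteo Renzi|Renzi]]", 'Matteo Renzi'))
```

The single mention weighs twenty times more in the short page, which is the effect the normalisation is meant to capture.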

Another thing that can be visualized is the relationship between the Number of mentions and the Pageviews. To do that, we first merge the pageviews and mentions DataFrames.



In [11]:

    
# Merge pageviews and mentions DataFrames for IT
df_it_mension_pageview = pd.merge(df_it_mentions, ranked_df_ita, on=['Title'])

# Show it
df_it_mension_pageview.sample(5)









    Out[11]:

     Title                                      Number of mentions  Pageviews
255  Faccia a faccia (programma televisivo)     1                   483.0
457  Fausto Brizzi                              1                   2161.0
20   Ivan Scalfarotto                           4                   1077.0
32   Elezioni amministrative italiane del 2009  4                   723.0
472  Anonymous                                  1                   6.0



In [12]:

    
# Merge pageviews and mentions DataFrames for PT
df_pt_mension_pageview = pd.merge(df_pt_mentions, ranked_df_port, on=['Title'])

# Show it
df_pt_mension_pageview.sample(5)









    Out[12]:

    Title                                   Number of mentions  Pageviews
14  42.ª reunião de cúpula do G7            2                   33.0
12  Maria Elena Boschi                      2                   5.0
3   Lista de primeiros-ministros da Itália  2                   12.0
19  G20                                     1                   185.0
20  Lista de líderes do G20                 1                   8.0

A scatterplot shows how each article is positioned with respect to these two variables. The plot shows:

IT: when the number of mentions equals 1, the page views are spread between 0 and ~20k. As the number of mentions increases, the page views fall within a smaller range.
PT: the same pattern is observed for the Portuguese articles.



In [13]:

    
# scatter_plot(df_it_mension_pageview, df_pt_mension_pageview, 'Number of mentions', 'Pageviews', 'Italian', 'Portuguese')
tls.embed('https://plot.ly/~crimenghini/36')









    Out[13]:

About these two features, another avenue to explore is the following:

Consider how the number of pageviews of an article changes when the number of Matteo Renzi citations increases from one revision to the next. In particular, the importance (I) is redefined as: $$I = \sum_{t = 1}^{T} \frac{(p_t-p_{t-1}) \times m_t}{|D_t|}$$ where t indexes the sequential revisions of the article, p_t is the number of page views at time t, m_t is the number of mentions and |D_t| is the number of words in the article at revision t.
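The sum above translates directly into a loop over revisions. The sketch below assumes three parallel lists indexed by t = 0..T (index 0 being the baseline revision); the numbers are hypothetical and only illustrate the arithmetic.

```python
def revision_importance(pageviews, mentions, lengths):
    """Sum over t = 1..T of (p_t - p_{t-1}) * m_t / |D_t|.

    pageviews[t], mentions[t] and lengths[t] refer to revision t, with t = 0 as baseline.
    """
    total = 0.0
    for t in range(1, len(pageviews)):
        total += (pageviews[t] - pageviews[t - 1]) * mentions[t] / lengths[t]
    return total


# Hypothetical article with three revisions after the baseline
p = [100, 120, 150, 140]   # page views
m = [1, 1, 2, 2]           # mentions
d = [200, 200, 400, 400]   # number of words

# Terms: 20*1/200 + 30*2/400 + (-10)*2/400
score = revision_importance(p, m, d)
print(score)
```

A drop in page views contributes a negative term, so the score can decrease from one revision to the next.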

Next, we look for articles that exist in both languages and mention Matteo Renzi. To do so, we make a request for each Portuguese Wikipedia page that cites Renzi, then parse the HTML source to extract, where available, the title of the corresponding IT article. More precisely, the requests are sent for each title of the language that has fewer articles matching Matteo Renzi. The function get_matches is stored in this library.
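The HTML-parsing half of this step might look like the sketch below. It assumes that the page source marks interlanguage links with an anchor carrying the class interlanguage-link-target and an hreflang attribute; that markup, the InterlanguageLinkParser class and the sample snippet are assumptions made for illustration, and get_matches may proceed differently. No request is sent here.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class InterlanguageLinkParser(HTMLParser):
    """Find the title of the linked article in language `lang` (assumed markup)."""

    def __init__(self, lang):
        super().__init__()
        self.lang = lang
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag != 'a' or self.title is not None:
            return
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if 'interlanguage-link-target' in classes and attrs.get('hreflang') == self.lang:
            # e.g. /wiki/Partito_Democratico_(Italia) -> Partito Democratico (Italia)
            path = urlparse(attrs.get('href') or '').path
            self.title = unquote(path.split('/wiki/', 1)[-1]).replace('_', ' ')


# Made-up snippet mimicking the language sidebar of a Portuguese page
sample_html = (
    '<li class="interlanguage-link interwiki-it">'
    '<a href="https://it.wikipedia.org/wiki/Partito_Democratico_(Italia)" '
    'hreflang="it" class="interlanguage-link-target">Italiano</a></li>'
)

parser = InterlanguageLinkParser('it')
parser.feed(sample_html)
print(parser.title)
```

When no link in the requested language exists, title stays None, which corresponds to a PT article without an IT match.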



In [14]:

    
# Build the common articles matches
dict_italian = get_matches(df_pt_titles, 'it')
# Create the inverted one
inverted_dict = {v : k for k, v in dict_italian.items()}



In [15]:

    
print ('The Portuguese articles that mention Matteo Renzi and correspond to an Italian article are: ', len(dict_italian), 
       '. The number of PT articles that have not been matched is: ', len(df_pt_titles)-len(dict_italian), '.')









    



The Portuguese articles that mention Matteo Renzi and correspond to an Italian article are:  31 . The number of PT articles that have not been matched is:  11 .

Proceed to create a DataFrame that contains the information related to those articles.

We extract the titles of all involved articles (both IT and PT).



In [16]:

    
# From the dictionary get the titles of both languages
italian_titles = list(dict_italian.values())
portugues_titles = list(dict_italian.keys())

Before going further, we check whether all the matched IT articles mention Matteo Renzi. To do so, we run a query on the DataFrame that stores all the IT articles that cite Renzi.



In [17]:

    
# Run the query
match_with_mention = df_it_titles.query('Title in @italian_titles')

# Get the number
print ('There are ', len(portugues_titles)-len(match_with_mention), 'IT articles that do not mention Matteo Renzi.')









    



There are  10 IT articles that do not mention Matteo Renzi.



In [18]:

    
# Re-define the list of IT articles according to the aforementioned "issue"
it_titles_with_mention = list(match_with_mention.Title)

The dictionaries that match the PT and IT titles are redefined to take into account the fact that some IT articles do not mention Renzi.



In [19]:

    
# Re-define the two dictionaries 
dict_italian_mentions = {k:v for k,v in dict_italian.items() if v in it_titles_with_mention}
# Define the inverted dictionary (IT title -> PT title) from the filtered matches
inverted_dict_italian_mentions = {v : k for k, v in dict_italian_mentions.items()}

# Create the list of titles for PT articles according to the IT that don't mention Renzi
pt_titles_with_mention = list(dict_italian_mentions.keys())

Then, we create a single DataFrame that contains the mentions in the IT and PT articles for each pair of matched articles.



In [20]:

    
# Create df for IT mentions
df_match_it_mentions = df_it_mentions.query('Title in @it_titles_with_mention').sort_values('Number of mentions', ascending = False)

# Create df for PT mentions
df_match_pt_mentions = df_pt_mentions.query('Title in @pt_titles_with_mention').sort_values('Number of mentions', ascending = False)

Add a column containing the matches to join the two dfs.



In [21]:

    
# Create new column
new_column_it = ['/'.join([k]+[v]) for i in df_match_it_mentions.Title for k,v in dict_italian_mentions.items()  if i == v]
new_column_pt = ['/'.join([k]+[v]) for i in df_match_pt_mentions.Title for k,v in dict_italian_mentions.items()  if i == k]

# Add the new column to the two dataframes
df_match_it_mentions['Matches'] = new_column_it
df_match_pt_mentions['Matches'] = new_column_pt
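The same PT/IT key can be built with direct dictionary lookups instead of scanning all the pairs for every title. The sketch below is a plain-Python alternative under the assumption that each title appears in one pair only; the helper names are hypothetical and the two pairs are taken from the tables in this notebook.

```python
# PT title -> IT title, in the same direction as dict_italian_mentions
pt_to_it = {
    'Partido Democrático (Itália)': 'Partito Democratico (Italia)',
    'G20': 'G20 (paesi industrializzati)',
}
# Inverted mapping: IT title -> PT title
it_to_pt = {it: pt for pt, it in pt_to_it.items()}


def match_key_from_it(it_title):
    """Build the 'PT title/IT title' key starting from an Italian title."""
    return '/'.join([it_to_pt[it_title], it_title])


def match_key_from_pt(pt_title):
    """Build the 'PT title/IT title' key starting from a Portuguese title."""
    return '/'.join([pt_title, pt_to_it[pt_title]])


print(match_key_from_it('G20 (paesi industrializzati)'))
print(match_key_from_pt('G20'))
```

Since both helpers produce the same string for a matched pair, the resulting column can serve as the join key on either side.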

Perform the join on the Matches and plot the results.



In [22]:

    
# Join the two dfs on the correspondence tuples
matches_mention = pd.merge(df_match_it_mentions, df_match_pt_mentions, on = 'Matches', suffixes = ('_IT','_PT'))

# Show result
matches_mention.head()









    Out[22]:

   Title_IT                                           Number of mentions_IT  Matches                                            Title_PT                                Number of mentions_PT
0  Matteo Renzi                                       62                     Matteo Renzi/Matteo Renzi                          Matteo Renzi                            11
1  Partito Democratico (Italia)                       17                     Partido Democrático (Itália)/Partito Democrati...  Partido Democrático (Itália)            6
2  Maria Elena Boschi                                 6                      Maria Elena Boschi/Maria Elena Boschi              Maria Elena Boschi                      2
3  Federica Mogherini                                 4                      Federica Mogherini/Federica Mogherini              Federica Mogherini                      1
4  Presidenti del Consiglio dei ministri della Re...  3                      Lista de primeiros-ministros da Itália/Preside...  Lista de primeiros-ministros da Itália  2



In [23]:

    
# bar_plot(df, 'Matches', 'Number of mentions_IT', 'Number of mentions_PT', 'IT', 'PT', 'Compare IT and PT mentions', 
# 'Article','No. mentions', 'color-bar-prova')
tls.embed('https://plot.ly/~crimenghini/38')









    Out[23]:

From the plot:

Among this group of articles, the two that mention Matteo Renzi most are the same in both languages.

This kind of analysis suggests the following question:

Given articles in different languages that correspond to each other, if we are interested in measuring how close these articles are, one element that may be considered is the number of common mentions. The need to mention someone or something likely arises because the two articles discuss the same topics and therefore refer to the same things.

The same procedure is repeated for the page views.

Check whether some articles have not been visited.



In [24]:

    
# Run the query
match_with_pageviews_it = ranked_df_ita.query('Title in @italian_titles')
match_with_pageviews_pt = ranked_df_port.query('Title in @portugues_titles')
# Get the number
print ('There are ', len(portugues_titles)-len(match_with_pageviews_it), 'IT articles that have not been visited.')
print ('There are ', len(portugues_titles)-len(match_with_pageviews_pt), 'PT articles that have not been visited.')









    



There are  12 IT articles that have not been visited.
There are  3 PT articles that have not been visited.



In [25]:

    
# Define the lists of articles that have been viewed
it_titles_with_pageviews = list(match_with_pageviews_it.Title)
pt_titles_with_pageviews = list(match_with_pageviews_pt.Title)

Define the matching dictionaries according to what was said above.



In [26]:

    
# Re-define the two dictionaries according to this evidence
dict_italian_pageviews = {k:v for k,v in dict_italian.items() if v in it_titles_with_pageviews}

# PT 
dict_pt_pageviews = {v : k for k, v in dict_italian.items() if k in pt_titles_with_pageviews}



In [27]:

    
# Create df for IT pageviews
df_match_it_pageviews = ranked_df_ita.query('Title in @it_titles_with_pageviews').sort_values('Pageviews', ascending = False)

# Create df for PT pageviews
df_match_pt_pageviews = ranked_df_port.query('Title in @pt_titles_with_pageviews').sort_values('Pageviews', ascending = False)

Add new variable to allow the join



In [28]:

    
# Create new column
new_column_it = ['/'.join([k]+[v]) for i in df_match_it_pageviews.Title for k,v in dict_italian_pageviews.items()  if i == v]
new_column_pt = ['/'.join([v]+[k]) for i in df_match_pt_pageviews.Title for k,v in dict_pt_pageviews.items()  if i == v]

# Add the new column to the two dataframes
df_match_it_pageviews['Matches'] = new_column_it
df_match_pt_pageviews['Matches'] = new_column_pt



In [29]:

    
df_match_it_pageviews.head()









    Out[29]:

     Title                         Pageviews  Matches
474  Partito Democratico (Italia)  11653.0    Partido Democrático (Itália)/Partito Democrati...
274  Enrico Letta                  7791.0     Enrico Letta/Enrico Letta
312  Marianna Madia                7608.0     Marianna Madia/Marianna Madia
442  G20 (paesi industrializzati)  2545.0     G20/G20 (paesi industrializzati)
318  Giuliano Poletti              2021.0     Giuliano Poletti/Giuliano Poletti



In [30]:

    
df_match_pt_pageviews.head()









    Out[30]:

    Title                                          Pageviews  Matches
33  Partido Democrático (Itália)                   567.0      Partido Democrático (Itália)/Partito Democrati...
31  Lista de chefes de Estado e de governo atuais  410.0      Lista de chefes de Estado e de governo atuais/...
4   G7                                             259.0      G7/G7
19  Federica Mogherini                             215.0      Federica Mogherini/Federica Mogherini
7   G20                                            185.0      G20/G20 (paesi industrializzati)

Join the two DataFrames with a right join, so that we also see the PT articles whose IT counterpart has no recorded page views.



In [31]:

    
# Join the two dfs on the correspondence tuples
matches_pageviews = pd.merge(df_match_it_pageviews, df_match_pt_pageviews, how = 'right',on = 'Matches', suffixes = ('_IT','_PT'))
matches_pageviews.fillna(0, inplace =True)
# Show result
matches_pageviews.head()









    Out[31]:

   Title_IT                      Pageviews_IT  Matches                                            Title_PT                      Pageviews_PT
0  Partito Democratico (Italia)  11653.0       Partido Democrático (Itália)/Partito Democrati...  Partido Democrático (Itália)  567.0
1  Enrico Letta                  7791.0        Enrico Letta/Enrico Letta                          Enrico Letta                  9.0
2  Marianna Madia                7608.0        Marianna Madia/Marianna Madia                      Marianna Madia                6.0
3  G20 (paesi industrializzati)  2545.0        G20/G20 (paesi industrializzati)                   G20                           185.0
4  Giuliano Poletti              2021.0        Giuliano Poletti/Giuliano Poletti                  Giuliano Poletti              7.0

We use a bar plot to visualize the results.



In [32]:

    
# bar_plot(df, 'Matches', 'Pageviews_IT', 'Pageviews_PT', 'IT', 'PT', 'Compare IT and PT pageviews', 'Article',
# 'No. pageviews', 'color-bar-pvs')
tls.embed('https://plot.ly/~crimenghini/40')









    Out[32]:

From the plot:

The page with the most visits is the same in both languages.
In general, it seems that the PT pages that mention Matteo Renzi relate to general topics and political figures on the international stage.

It would be interesting:

To present the same plot using the relative frequencies of the visits, to see the importance of each page with respect to the list of articles (that mention Renzi) in that language.

To study the relationships between the articles that mention Renzi, in particular whether they are connected and point to each other. This may be used to define the importance of Matteo Renzi in an article (e.g. if Matteo Renzi is mentioned on the page of a TV show just because he was a guest, and the page is not connected to other articles, it is possible to assume that Renzi is not the main topic of the article). I'm not totally sure it can be done, since moving from one article to another (even if they talk about extremely different topics) does not take many hops.
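The first idea, relative frequencies, could be sketched as follows. For brevity the denominators below only sum two articles per language (values taken from the joined table above); in a real analysis the total would run over all the articles that mention Renzi in that language. relative_frequencies is a hypothetical helper.

```python
def relative_frequencies(views):
    """Divide each article's page views by the total for its language."""
    total = sum(views.values())
    if total == 0:
        return {title: 0.0 for title in views}
    return {title: count / total for title, count in views.items()}


# Two matched articles per language, with the page views reported above
it_views = {'Partito Democratico (Italia)': 11653.0, 'Enrico Letta': 7791.0}
pt_views = {'Partido Democrático (Itália)': 567.0, 'Enrico Letta': 9.0}

it_share = relative_frequencies(it_views)
pt_share = relative_frequencies(pt_views)
print(it_share)
print(pt_share)
```

On a relative scale the two languages become comparable even though the absolute IT counts are far larger.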
