Article centrality measures in the Wikipedia hyperlink network

In this post we have drawn the network of hyperlinks connecting Wikipedia articles related to climate change. Now we will focus on how to identify the most central articles in the network.

By applying different metrics, we can study which issues are more central within our set of articles according to different criteria:

    • In/Out-degree: number of incoming/outgoing links. It measures centrality as the number of connections with other nodes.
    • Pagerank: like in-degree, but the weight of each incoming connection depends on the importance of the corresponding node; weights are computed iteratively. It can be seen as the probability of reaching a node when following a random walk in the graph.
    • Betweenness: number of shortest paths from all vertices to all others that pass through the given node (i.e.: how often the given node lies on the shortest path between a pair of nodes). It quantifies the importance of a node as a bridge between different nodes or groups in the network.
    • Closeness: Average distance from a node to all the other nodes in the network. It represents centrality as the ability to reach the other nodes in few steps.
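To make these definitions concrete, here is a minimal pure-Python sketch on a made-up toy graph. The graph, the function names and the damping factor of 0.85 are assumptions chosen for illustration, not the data or the tool used for this post. Closeness is computed here as an average distance, following the definition above, so lower values mean more central; note that many libraries report the inverse instead. Betweenness is left out to keep the sketch short.

```python
from collections import deque

# Hypothetical toy graph: article -> articles it links to.
links = {
    "Global warming": ["Greenhouse gas", "Climate change"],
    "Greenhouse gas": ["Global warming", "Methane"],
    "Climate change": ["Global warming"],
    "Methane": ["Greenhouse gas"],
    "Index": ["Global warming", "Greenhouse gas", "Climate change", "Methane"],
}

def degrees(links):
    """Return (in_degree, out_degree) dictionaries."""
    out_deg = {node: len(targets) for node, targets in links.items()}
    in_deg = {node: 0 for node in links}
    for targets in links.values():
        for target in targets:
            in_deg[target] += 1
    return in_deg, out_deg

def pagerank(links, damping=0.85, iterations=100):
    """Power iteration. Assumes every node has at least one out-link."""
    n = len(links)
    rank = {node: 1.0 / n for node in links}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in links}
        for node, targets in links.items():
            # Each node splits its current rank evenly among its out-links.
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def average_distance(links, source):
    """Average shortest-path length from source to the nodes it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in links[node]:
            if target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    reached = len(dist) - 1
    return sum(dist.values()) / reached if reached else float("inf")

in_deg, out_deg = degrees(links)
ranks = pagerank(links)
for node in sorted(links, key=ranks.get, reverse=True):
    print(node, in_deg[node], out_deg[node],
          round(ranks[node], 3), average_distance(links, node))
```

In this toy graph the "Index" node has the highest out-degree but no incoming links, so it ends up with the lowest pagerank: a random walk can only land there through the random-jump term.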

Here is a table of the 20 most central articles according to pagerank:

The 20 most central articles about climate change sorted by pagerank together with the corresponding values and ranks of other centrality measures.

As we already observed when looking at the network, the most relevant articles appear to be Global warming, Greenhouse gas and Climate change.

It is interesting to note that, although Climate change represents a more general topic, Global warming and Greenhouse gas are linked more prominently. The fourth place in the ranking is occupied by the Intergovernmental Panel on Climate Change, followed by the Kyoto Protocol; in tenth position we also find the UN Framework Convention on Climate Change. The high relevance of these articles in the network (also reflected by their high number of in-links) attests to the prominence of international politics and policies in the discourse on climate change.

The rest of the list is dominated by more neutral technical articles like Methane, Fossil fuel, Albedo, Climate, etc. with less potential for conflict. These articles are often mentioned and referenced in other pages, as they seem to correspond to the basic elements on which the debate is built.

When we reorder the table according to the betweenness centrality of the articles we obtain the following list:

The 20 most central articles about climate change sorted by betweenness centrality together with the corresponding values and ranks of other centrality measures.
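A table of this kind, sorted by one metric and showing the value and rank of each article under the others, could be assembled roughly as follows. This is only a sketch: the function names are hypothetical and the numbers are placeholders, not values from the actual ranking.

```python
def rank_by(scores):
    """Map each article to its 1-based rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {article: position + 1 for position, article in enumerate(ordered)}

def centrality_table(metrics, sort_key):
    """metrics: {metric name: {article: value}}.

    Returns one row per article, sorted by sort_key (highest first),
    with a (value, rank) pair for every metric.
    """
    ranks = {name: rank_by(values) for name, values in metrics.items()}
    articles = sorted(metrics[sort_key], key=metrics[sort_key].get, reverse=True)
    return [
        {"article": a,
         **{name: (metrics[name][a], ranks[name][a]) for name in metrics}}
        for a in articles
    ]

# Placeholder values for three made-up articles (not real results).
metrics = {
    "pagerank": {"A": 0.5, "B": 0.3, "C": 0.2},
    "betweenness": {"A": 1.0, "B": 7.0, "C": 0.0},
}

for row in centrality_table(metrics, "betweenness"):
    print(row)
```

Changing the second argument to "pagerank" gives the same rows in the other order, which is all that distinguishes the two tables in this post.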

We observe that the top two articles remain the same while index pages like Index of climate change articles and Glossary of Climate change enter the list. The high betweenness of these articles is due to their extremely high out-degree, as they are essentially collections of links to other (very diverse) articles related to climate change. This makes them bridges in the network, despite their very low in-degree.
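The effect can be reproduced on a toy graph. The sketch below uses a pure-Python version of Brandes' algorithm for unweighted directed graphs, returning unnormalised scores; the graph is invented for illustration (the "Portal" node in particular is hypothetical), and the tool used for this post may normalise its values differently.

```python
from collections import deque

def betweenness(links):
    """Brandes' algorithm for an unweighted directed graph (unnormalised)."""
    score = {v: 0.0 for v in links}
    for s in links:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in links}
        sigma = {v: 0 for v in links}
        dist = {v: -1 for v in links}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in links[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in links}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Toy graph: an index page with one in-link and many out-links.
toy = {
    "Portal": ["Index"],  # hypothetical page that links to the index
    "Index": ["Methane", "Albedo", "Fossil fuel", "Kyoto Protocol"],
    "Kyoto Protocol": ["Fossil fuel"],
    "Methane": [],
    "Albedo": [],
    "Fossil fuel": [],
}

scores = betweenness(toy)
for node in sorted(toy, key=scores.get, reverse=True):
    print(node, scores[node])
```

Here every shortest path from "Portal" to the other articles passes through "Index", so the index page collects all of the betweenness with a single in-link. "Fossil fuel" has more in-links than "Index" but no out-links, so it can never sit in the middle of a path and its betweenness is zero.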

On the other hand, we no longer find nodes characterised by high in-degree but very low out-degree, like Methane, Fossil fuel and Albedo, which are heavily referenced in the debate on climate change but have few links to other articles in this area. Instead, we find other articles related to some debate, like Climate change mitigation, Emission trading, Global warming controversy or Individual and political action on climate change, which have remarkable values of both in- and out-degree. These articles represent relevant but not very specific topics, and are therefore interlinked with many other articles from distinct areas of the network.

In summary, the pagerank metric seems to give a better measure of the relevance of an article, measured as the probability of ending on that page during an exploration of this section of Wikipedia, while betweenness represents the importance of acting as a bridge, and highlights especially articles which connect diverse topics.

The whole ranking can be found in this csv file:  article_centrality_metrics.csv.
