Identifying more Wikipedia articles related to Climate change

In two previous posts (post 1, post 2) we reported some metrics about a sample of 53 Wikipedia articles related to Climate Change, manually selected by Tommaso Venturini. The list probably contains the main articles on this topic; however, given the breadth of Wikipedia, we can suppose that many more articles concerning Climate Change exist, and it is hardly feasible to collect all of them manually. In this post we explain how we expanded this list in a semi-automatic way, relying on Wikipedia’s category structure.

In Wikipedia, each article can be assigned to one or more categories, and each category can in turn be assigned to higher level categories. This can be achieved by any user just by inserting a special tag into a page.

Articles are usually not assigned directly to high level categories, to make the category structure usable: what if you had thousands of articles directly assigned to “Natural Sciences”? It would be impossible to make sense of the categories. Instead, most articles are only assigned to lower level categories, and these are in turn assigned to higher level categories.

For example, the article Impacts of Climate Change on Sri Lanka belongs to the category Climate Change in Sri Lanka; following the hierarchical links upwards, a path can be traced all the way to Natural sciences:
Climate change in Sri Lanka -> Climate change by country -> Climate change by region -> Climate change -> Climatology -> Atmospheric sciences -> Earth sciences -> Natural sciences

So, it is natural to suppose that, starting from a given category, such as Climate Change, one could identify all the articles assigned to it or to its direct or indirect subcategories.
Because the category graph is maintained by the community, and the sub-category relationship is interpreted in different ways (ontological, thematic…), this can be problematic: the graph contains over 500 thousand categories, and has been shown to contain inconsistencies and even loops (90 strongly connected components, according to [Farina et al., 2011]). Moreover, topics often overlap, so isolating a whole category is infeasible for broad categories such as Politics or Culture, whose boundaries with other categories such as History, Geography or Religion are fuzzy. However, for smaller categories corresponding to reasonably well-delimited topics, this approach is workable.
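
To make the idea concrete, here is a minimal sketch of such a traversal in Python. It is not the code used for this study: the two dictionaries are toy data, and the names "Toy subcategory", "Article A" and "Article B" are placeholders invented for the illustration. The point it shows is that a set of already visited categories is enough to survive the loops mentioned above.

```python
from collections import deque

# Toy data (hypothetical): category -> direct subcategories.
SUBCATEGORIES = {
    "Climate change": ["Climate change by region", "Toy subcategory"],
    "Climate change by region": ["Climate change by country"],
    "Climate change by country": ["Climate change in Sri Lanka"],
    "Climate change in Sri Lanka": [],
    # Deliberate loop back to the root, to mimic the loops in the real graph.
    "Toy subcategory": ["Climate change"],
}

# Toy data (hypothetical): category -> articles directly assigned to it.
ARTICLES = {
    "Climate change": ["Article A"],
    "Climate change in Sri Lanka": ["Impacts of Climate Change on Sri Lanka"],
    "Toy subcategory": ["Article B"],
}


def collect_articles(root, subcategories, articles):
    """Return the articles assigned to root or to any direct or indirect subcategory."""
    visited = {root}          # categories already queued, so loops terminate
    queue = deque([root])
    found = set()
    while queue:
        category = queue.popleft()
        found.update(articles.get(category, []))
        for sub in subcategories.get(category, []):
            if sub not in visited:
                visited.add(sub)
                queue.append(sub)
    return found


if __name__ == "__main__":
    for title in sorted(collect_articles("Climate change", SUBCATEGORIES, ARTICLES)):
        print(title)
```

On real data the two dictionaries would be filled from a Wikipedia dump or from the API; how that is done is outside the scope of this sketch.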

The following figure shows the Wikipedia page for the category Climate Change, where the hierarchy of subcategories has been partly expanded.

We automatically processed the sub-graph of the categories situated under Climate Change, and we collected all the articles belonging to these categories. In order to avoid including unrelated branches, we needed to remove a few categories:

National Oceanic and Atmospheric Administration
Climate forcing agents
Greenhouse gases
Carbonated drinks
Energy by region
Electric power by region

Note that we had to remove these categories because they included subcategories or articles that are not related to Climate Change; however, those of their articles that are related to Climate Change are very likely to belong to other categories as well, which we do include. For example, we exclude the category Climatologists, because not all climatologists have taken a position on Climate Change, but the ones who have are probably also included in other categories, such as Climate change environmentalists.
We also removed all the categories whose name follows the pattern “Energy in <country>”.
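
The exclusion rules above can be sketched as a small predicate. This is an illustration under our own assumptions, not the original script: the function name is_excluded is hypothetical, names are matched exactly and case-sensitively, and the pattern “Energy in <country>” is approximated by a regular expression that accepts any text after “Energy in ”.

```python
import re

# Categories removed by name, as listed in the text.
EXCLUDED = {
    "National Oceanic and Atmospheric Administration",
    "Climate forcing agents",
    "Greenhouse gases",
    "Carbonated drinks",
    "Energy by region",
    "Electric power by region",
    "Climatologists",  # mentioned in the text as an example of an excluded category
}

# Assumption: "Energy in <country>" means "Energy in " followed by any name.
ENERGY_IN_COUNTRY = re.compile(r"^Energy in .+$")


def is_excluded(category_name):
    """Return True if the category (and hence its whole branch) should be skipped."""
    # Strip invisible left-to-right marks that sometimes trail copied category names.
    name = category_name.replace("\u200e", "").strip()
    return name in EXCLUDED or ENERGY_IN_COUNTRY.match(name) is not None


if __name__ == "__main__":
    for name in ["Greenhouse gases", "Energy in Italy", "Climate change by country"]:
        print(name, is_excluded(name))
```

In a traversal of the category graph, the check would be applied before descending into a subcategory, so that everything below an excluded category is pruned along with it.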

This way, we collected 915 article titles. Furthermore, we found a Wikipedia page containing a manually compiled list of articles related to Climate Change; out of the 245 titles it contained, 105 were already in our list, while 140 were not, and we added them.

Finally, we compared the resulting list with the one prepared by Tommaso, and we found that out of 53 articles, 45 were already in our list, and 8 were missing:

Extreme weather
Carbon neutrality
Adaptive capacity
Compensation principle
Resilience (ecology)

After integrating them, we have a final list of 1063 articles related to Climate Change. Of course, the list can be enriched or cleaned further, and any suggestion is welcome.
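
The counts above follow from simple set arithmetic: 915 + 140 = 1055 titles after the first merge, and 1055 + 8 = 1063 after the second. The sketch below replays the two merges on synthetic placeholder titles (not the real article titles) chosen only to reproduce the overlaps reported in the text; the helper name merge_lists is hypothetical.

```python
def merge_lists(base, extra):
    """Merge extra into base; also report how many titles were already known or new."""
    already_known = base & extra
    new_titles = extra - base
    return base | extra, len(already_known), len(new_titles)


# Synthetic placeholder titles, sized to match the counts reported in the text.
crawled = {f"crawled-{i}" for i in range(915)}

# Manually compiled list: 245 titles, 105 of which overlap with the crawl.
manual_list = {f"crawled-{i}" for i in range(105)} | {f"manual-{i}" for i in range(140)}

merged, known, new = merge_lists(crawled, manual_list)
print(len(manual_list), known, new, len(merged))

# Initial hand-picked list: 53 titles, 45 of which are already in the merged list.
seed_list = {f"crawled-{i}" for i in range(200, 245)} | {f"seed-{i}" for i in range(8)}

final, known_seed, new_seed = merge_lists(merged, seed_list)
print(len(seed_list), known_seed, new_seed, len(final))
```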

As of May 23rd, 2012, only 495 of the articles in our list had received at least 2 comments. The following table lists the 58 articles with the largest number of discussion chains.

Top 58 articles with the largest number of discussion chains

Several measures for Wikipedia articles related to the climate change controversy (in parentheses, the rank of the corresponding values in a set of 495 articles related to climate change). These results are extracted from the English Wikipedia as of May 23rd, 2012.

We notice that several of the most discussed articles were not covered by the previous list. Most notably:

- List of scientists opposing the mainstream scientific assessment of global warming
- Hockey stick controversy
- Gore Effect
- Greenhouse effect

Articles about books, films and documentaries related to climate change, as well as about notable persons in the debate, are also very prominently placed in this list.


The entire table can be found in PDF format here and as cls here.

The complete list of article titles (including the ones with no talk page) can be found here.


Farina, J., Tasso, R., Laniado, D. (2011).
Automatically assigning Wikipedia articles to macro-categories,
HT 2011: 22nd ACM Conference on Hypertext and Hypermedia, June 2011, Eindhoven, The Netherlands.

