Measuring controversy in Wikipedia via counting reply chains

In this post we present another measure of controversy, based on chains of mutual replies between users. The metric was introduced by Laniado et al. (2011) in an article on conversations in Wikipedia.

Each Wikipedia article can have a talk page associated with it, i.e. a space for discussing how to improve its content. Talk pages are ordinary wiki pages, but they are used in a forum-like way, as can be seen in the screenshot below from the talk page of the article Presidency of Barack Obama.

Talk page for the Wikipedia article "Presidency of Barack Obama"

The discussion related to each article can be visualized as a tree, where the red root node represents the article itself and gray nodes represent structural elements such as subpages or thread headlines. Comments are represented as orange nodes (cyan if they are unsigned). The parent of a comment is the comment it replies to or, if it is not a reply, the structural node representing the thread or subpage it is placed under.

Discussion tree for the article "Presidency of Barack Obama"

Chains of mutual replies between a pair of users can be studied as one indicator of controversy in the discussions, following the intuition that this behavioral pattern tends to emerge in case of conflict.

Laniado et al. (2011) define a chain as any subthread composed of at least three consecutive comments involving only two users who reply to each other. For example, if user B replies to a comment by user A, and user A replies back, we have a chain of length 3: A ← B ← A. The following figure shows a thread from the discussion about the article Global Warming, containing a chain of length 5 (involving the users James S. and Kim D. Petersen).

A discussion thread from the article "Global warming". The thread contains a chain of length 5 involving users James S. and Kim D. Petersen.
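The chain definition above can be illustrated with a short Python sketch. It is not the implementation used by Laniado et al. (2011). It assumes each comment is given as a hypothetical (comment_id, author, parent_id) tuple, with parent_id set to None (or to the id of a structural node) when the comment is not a reply to another comment. It also assumes that only maximal chains are counted, and that a branching thread yields one chain per alternating path; the paper may handle these cases differently.

```python
from collections import defaultdict


def find_chains(comments, min_length=3):
    """Return maximal reply chains between two alternating users.

    comments: iterable of (comment_id, author, parent_id) tuples
    (a hypothetical input format chosen for this sketch).
    """
    comments = list(comments)
    author = {cid: user for cid, user, _ in comments}
    parent = {cid: pid for cid, _, pid in comments}
    children = defaultdict(list)
    for cid, _, pid in comments:
        if pid is not None:
            children[pid].append(cid)

    def extend(path):
        # A chain continues when the last comment receives a reply
        # written by the author of the comment before it.
        last, previous = path[-1], path[-2]
        replies = [c for c in children[last] if author[c] == author[previous]]
        if not replies:
            yield path
        for reply in replies:
            yield from extend(path + [reply])

    chains = []
    for first in author:
        for second in children[first]:
            if author[second] == author[first]:
                continue  # a self-reply does not start a chain
            grandparent = parent[first]
            if grandparent in author and author[grandparent] == author[second]:
                continue  # the pair is the middle of a longer chain
            for path in extend([first, second]):
                if len(path) >= min_length:
                    chains.append(path)
    return chains


# Toy thread with made-up users: A and B alternate five times,
# while C posts a single side comment.
thread = [
    (1, "A", None),
    (2, "B", 1),
    (3, "A", 2),
    (4, "C", 3),
    (5, "B", 3),
    (6, "A", 5),
]
chains = find_chains(thread)
print(len(chains), [len(chain) for chain in chains])
```

On this toy thread the sketch finds a single chain of length 5, made of comments 1, 2, 3, 5 and 6. The side comment by C does not break the chain because it sits on a different branch of the tree.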

While the total number of comments is the basic measure of the size of a discussion, the number of chains can be used to quantify contention. Applying this metric to the whole English Wikipedia, it is possible to identify articles characterized by conflictive discussions. Here is the list of the top 20 controversial Wikipedia articles according to their number of discussion chains:

Top 20 Wikipedia articles by number of discussion chains. Other indicators are also reported, with the rank of each article according to the corresponding indicator in parentheses. These results are based on a complete dump of the English Wikipedia dated March 2010.

For each article, other metrics are also reported (with the corresponding rank in parentheses): the total number of comments, the number of distinct users participating in the discussion, the depth of the longest thread (max. depth) and the h-index of the discussion tree (see previous post). The last column shows the number of edits received by each article.

As can be observed, the most disputed articles include topics that aroused wide discussion, such as Barack Obama or Gaza War, but also lesser-known issues like Chiropractic, where a considerably smaller number of users generated a very large number of discussion chains. With regard to the EMAPS project, it is also interesting to observe the presence of topics like Global Warming and Climatic Research Unit hacking incident in this list.

Reference

Laniado D., Tasso R., Volkovich Y. and Kaltenbrunner A. (2011).
When the Wikipedians Talk: Network and Tree Structure of Wikipedia Discussion Pages.
ICWSM 2011 – 5th International AAAI Conference on Weblogs and Social Media, Barcelona, Spain.

3 Responses to “Measuring controversy in Wikipedia via counting reply chains”

  1. Counting reply chains is an extremely interesting method for measuring how controversial different Wikipedia pages are.
    The measure is, however, vulnerable to ‘flame wars’: long squabbles between a few (often just two) users who get involved in personal quarrels. Although flame wars are discouraged by the Wikipedia administrators, they may still skew the results for some articles.
    I have had a look at the Laniado et al. article quoted in the post and (if I understand it correctly) the authors have also taken into account the number of editors of the page. It is not clear to me whether they used this information to ‘correct’ the above measure.
    The number of chains divided by the number of discussants would be a more solid indicator of how controversial a page is.

  2. As discussed in our last Skype meeting, it would be very interesting to have the measure discussed in this post for the pages concerning our controversies.
    Below is a list of them.
    If possible, it would also be nice to expand this list by scraping all the links on these pages that point to other Wikipedia pages. Then we can add to the list all the pages that are cited by at least two pages of the original list. This would help ensure that we have considered all the relevant pages.

    Controversy
    http://en.wikipedia.org/wiki/Adaptation_to_global_warming
    http://en.wikipedia.org/wiki/Adaptive_capacity
    http://en.wikipedia.org/wiki/Extreme_weather
    http://en.wikipedia.org/wiki/Adaptability
    http://en.wikipedia.org/wiki/Maladaptation
    http://en.wikipedia.org/wiki/Resilience_(ecology)

    Related controversies
    http://en.wikipedia.org/wiki/Climate_change_mitigation
    http://en.wikipedia.org/wiki/Geoengineering
    http://en.wikipedia.org/wiki/Climate_bond
    http://en.wikipedia.org/wiki/Shutdown_of_thermohaline_circulation
    http://en.wikipedia.org/wiki/Ocean_acidification
    http://en.wikipedia.org/wiki/Clean_Development_Mechanism
    http://en.wikipedia.org/wiki/IPCC_Third_Assessment_Report
    http://en.wikipedia.org/wiki/IPCC_Fourth_Assessment_Report
    http://en.wikipedia.org/wiki/Carbon_sink
    http://en.wikipedia.org/wiki/Carbon_emissions
    http://en.wikipedia.org/wiki/Carbon_neutrality
    http://en.wikipedia.org/wiki/Carbon_credit
    http://en.wikipedia.org/wiki/Emissions_trading
    http://en.wikipedia.org/wiki/Climate_change_mitigation_scenarios
    http://en.wikipedia.org/wiki/Kyoto_Protocol
    http://en.wikipedia.org/wiki/Greenhouse_gas
    http://en.wikipedia.org/wiki/Carbon_tax
    http://en.wikipedia.org/wiki/Low-carbon_economy
    http://en.wikipedia.org/wiki/Paleoclimatology
    http://en.wikipedia.org/wiki/Climatic_Research_Unit_email_controversy
    http://en.wikipedia.org/wiki/Anthropogenic_global_warming

    Meta-controversies
    http://en.wikipedia.org/wiki/Global_warming
    http://en.wikipedia.org/wiki/Climate_change
    http://en.wikipedia.org/wiki/Climate_variability
    http://en.wikipedia.org/wiki/Intergovernmental_Panel_on_Climate_Change
    http://en.wikipedia.org/wiki/UNFCCC
    http://en.wikipedia.org/wiki/Effects_of_global_warming
    http://en.wikipedia.org/wiki/Economics_of_global_warming
    http://en.wikipedia.org/wiki/Economics_of_climate_change_mitigation
    http://en.wikipedia.org/wiki/Scientific_opinion_on_global_warming
    http://en.wikipedia.org/wiki/Global_climate_model
    http://en.wikipedia.org/wiki/Politics_of_global_warming
    http://en.wikipedia.org/wiki/Global_warming_conspiracy_theory
    http://en.wikipedia.org/wiki/Climate_change_denial
    http://en.wikipedia.org/wiki/Long-term_effects_of_global_warming
    http://en.wikipedia.org/wiki/Global_warming_controversy
    http://en.wikipedia.org/wiki/Category:Global_warming
    http://en.wikipedia.org/wiki/Portal:Global_warming
    http://en.wikipedia.org/wiki/Index_of_climate_change_articles
    http://en.wikipedia.org/wiki/Glossary_of_climate_change

    Sub-controversies
    http://en.wikipedia.org/wiki/Climate_change_and_agriculture
    http://en.wikipedia.org/wiki/Environmental_migrant
    http://en.wikipedia.org/wiki/Climate_Vulnerability_Monitor
    http://en.wikipedia.org/wiki/Climate_Vulnerable_Forum
    http://en.wikipedia.org/wiki/Sea_level_rise
    http://en.wikipedia.org/wiki/Compensation_principle
    http://en.wikipedia.org/wiki/Regional_effects_of_global_warming
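
The expansion rule proposed in this comment (add every page that is linked from at least two pages of the original list) can be sketched in Python as follows. This is only an illustration: it assumes the outgoing links of each seed page have already been collected by some other means, and the function name, the input format and the toy link sets are all hypothetical.

```python
from collections import Counter


def expand_seed_list(outlinks, min_citations=2):
    """Return non-seed pages linked from at least `min_citations` seeds.

    outlinks: dict mapping each seed page to the collection of
    Wikipedia pages it links to (assumed to be collected beforehand).
    """
    seeds = set(outlinks)
    citations = Counter()
    for page, links in outlinks.items():
        # set() so that repeated links on one page count only once
        for target in set(links):
            if target not in seeds:
                citations[target] += 1
    return sorted(t for t, n in citations.items() if n >= min_citations)


# Toy data with placeholder page names (not real link structure).
toy_outlinks = {
    "Seed_A": ["X", "Y", "Seed_B", "X"],
    "Seed_B": ["X", "Z"],
    "Seed_C": ["Y", "W"],
}
print(expand_seed_list(toy_outlinks))
```

In the toy data, X and Y are each linked from two seed pages and would be added to the list, while Z and W are linked from only one and are left out.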

  3. Thank you for your comments.
    Concerning reply chains, we actually analysed two measures: the total number of messages belonging to reply chains, and the number of chains. We found the first measure to be potentially sensitive to the presence of single long discussion threads, so we preferred to just count the number of chains as a more robust indicator. We believe that dividing this indicator by the number of users involved in the discussion would be counterproductive: for example, a page with just two users replying to each other would have a very high value.
