Time Evolution of Controversies on Wikipedia

Following up on one of our first blog entries, we briefly introduce here a study of how discussions grow in complexity over time. A preprint of the study is available on arXiv.

The idea is to use a definition similar to the m-index, which J. E. Hirsch introduced in his seminal paper on the h-index: the m-index of a researcher with an h-index of θ who published their first paper n years ago is m = θ/n.

To measure the growth in complexity, we use the inverse of this definition. First, we recall the definition of the h-index of a nested discussion.

The h-index of a discussion is the maximal number θ such that there are at least θ comments at level (depth) θ, but not θ + 1 comments at level θ + 1. Another possible definition would be that there are θ sub-threads of depth at least θ. (See here for a visual example).
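As a minimal sketch of the first definition, the h-index can be computed from the list of comment depths alone. The function name `discussion_h_index` and the convention that top-level comments have depth 1 are assumptions made for this illustration; they are not taken from the study.

```python
from collections import Counter


def discussion_h_index(depths):
    """Return the largest theta with at least theta comments at depth theta.

    `depths` holds one integer per comment; top-level comments are
    assumed to have depth 1. An empty discussion gets h-index 0.
    """
    comments_per_level = Counter(depths)
    # Keep every level that has at least as many comments as its depth,
    # then take the deepest one.
    qualifying = [
        level
        for level, count in comments_per_level.items()
        if level >= 1 and count >= level
    ]
    return max(qualifying, default=0)


# Three comments at depth 1, two at depth 2, one at depth 3:
# depth 2 qualifies (2 >= 2) but depth 3 does not (1 < 3).
example_depths = [1, 1, 1, 2, 2, 3]
example_h = discussion_h_index(example_depths)
```

Because the function takes the maximum over all qualifying levels, level θ + 1 never has θ + 1 comments, which matches the second half of the definition.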

We then define Δh as the average time (measured in days) it takes a discussion to increase its h-index by one.
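Read as the inverse of the m-index, Δh would be the duration of a discussion in days divided by its h-index. The sketch below follows that reading; the function name `delta_h` and the choice to measure duration from the first to the last dated comment are assumptions for illustration, and the preprint may estimate the value differently.

```python
from datetime import datetime


def delta_h(first_comment, last_comment, h_index):
    """Average number of days the discussion needed per unit of h-index.

    Assumes the duration runs from the first to the last dated comment
    and that the h-index is at least 1.
    """
    if h_index < 1:
        raise ValueError("h_index must be at least 1")
    duration_days = (last_comment - first_comment).total_seconds() / 86400
    return duration_days / h_index


# A discussion that reached h-index 3 within 30 days.
example_delta_h = delta_h(datetime(2010, 1, 1), datetime(2010, 1, 31), 3)
```

A small Δh means the discussion deepened quickly; a large one means it took a long time to reach the same level of nesting.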

The following figure shows how the h-index grows over time for the discussions about the three most recent US presidents.

Evolution of the increase of the h-index for the three most recent US-presidents.

We observe a more or less constant growth of the discussions, which supports the linearity assumed in the definition of Δh. Note that the dates of older comments on Wikipedia could not always be determined due to formatting issues. This explains why not all curves start at h = 1.

We observe that the discussions about the two presidents who have held office since Wikipedia went online grow considerably faster than the one about their predecessor, Bill Clinton. For George W. Bush we observe an average Δh of 70.7 days and for Barack Obama 90.2 days, while the discussion about Bill Clinton takes on average 331.9 days to increase its h-index by one.

What happens if we apply this measure to all discussions? Which discussions gain complexity fastest, and which slowest? The following two tables answer these questions.

The 20 fastest discussions; Δh and duration are given in days.

We observe that many of the fastest evolving discussions appear around articles related to events which received heavy news coverage, such as school shootings (the Virginia Tech massacre and its perpetrator, which occupy ranks 1 and 5 in Table 6), the 2009 flu pandemic, terrorist attacks, air crashes, etc. Nevertheless, we also find topics which reflect ideologically or ethically motivated disputes among Wikipedia editors, which lead to discussions gaining complexity very fast. Such topics are the “Bronze Soldier of Tallinn” (reflecting a conflict in Estonia between ethnic Russians and Estonians) and the “2009 Honduran constitutional crisis”, as well as discussions about the “Israeli occupied territories” and the “International status of Abkhazia and South Ossetia”. With the “Climatic Research Unit hacking incident”, a topic related to climate change also appears in the top 20 list.

The 20 slowest discussions; Δh and duration are given in days.

Finally, the list of the slowest evolving discussions is led by the articles about “Christopher Columbus” and “Pi” and contains many more articles about timeless content or content which has been the subject of discussion over a prolonged time, such as “Harry Potter” or the “War on Terrorism”. Some of these may well be topics of centuries-long dispute, such as the “Scientific method” or “On the Origin of Species”.

These findings show considerable differences in the time dimension of discussions on Wikipedia. Current events lead to very complex discussions within a few days, while discussions of articles about historical or scientific facts which are not on people’s minds may take years to reach a similar state.

References

Kaltenbrunner, A., Laniado, D. (2012).
There is No Deadline – Time Evolution of Wikipedia Discussions.
e-print arXiv:1204.3453

Hirsch, J. E. (2005).
An index to quantify an individual’s scientific research output.
PNAS 102 (46), 16569-16572
