OLPC:Bot Interest Group/Tracking
While patrolling the wiki I have looked at the patterns of vandalism,
analyzed the block logs, and researched how other wikis are handling
the vandalism problem. I want to present a few observations.
Observation 1:
Unlike Wikipedia, where very short blocks are used on named users to
provide a "cooling off period" during edit wars or other infringements
of community standards, nearly all blocks imposed on the OLPC wiki can
be attributed to vandalism. This may make our block log an interesting
resource for harvesting a dataset for a counter-vandalism bot based on
artificial neural network techniques. Crispy is working on such a bot
as a successor to Cluebot.
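As a rough illustration of how the block log might be harvested, the
sketch below builds a MediaWiki API query for block log entries and
extracts the blocked names from a decoded JSON reply. The api.php
location is left to the caller, the helper names are hypothetical, and
whether the installed MediaWiki version exposes list=logevents would
need to be checked first.

```python
from urllib.parse import urlencode


def block_log_url(api_base, limit=500):
    """Build a query URL for block log entries.

    api_base is the wiki's api.php URL, supplied by the caller
    (its location on any given wiki is an assumption).
    """
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "block",
        "lelimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def blocked_names(reply):
    """Return the set of names or IPs blocked in a decoded logevents reply."""
    names = set()
    for event in reply.get("query", {}).get("logevents", []):
        if event.get("action") != "block":
            continue  # skip unblock and other non-block entries
        title = event.get("title", "")
        # Block targets are logged as user pages, e.g. "User:192.0.2.7".
        names.add(title.split(":", 1)[1] if ":" in title else title)
    return names
```

The resulting set of names could then be used to label past edits as
vandalism candidates when assembling a training set.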
Observation 2:
Have you ever wondered how vandalbots pick the pages they choose to
vandalize? Many incidents of single gibberish edits or page blanking
seem to be using:
http://wiki.laptop.org/go/Special:Random
as their targeting mechanism. No special pattern emerges in which pages
get hit by these one-off attacks.
Observation 3:
Recently there has been an increase in multipage strafing runs by
vandals. Reviewing the recent changes page for the period just prior to
the vandalism makes it clear that the vandals are selecting pages from:
http://wiki.laptop.org/go/Special:Recentchanges
One particularly disturbing variant of this trend is that, rather than
targeting the recently edited page, the vandals attack the User and
User Talk pages of recent editors.
This switch to multipage vandalism appears to be growing, suggesting an
increase in sophistication in the automation of attacks. More detailed
analysis of individual vandal edit counts will be needed to confirm this
impression, but I believe that counting pages vandalized (and not just
the number of vandals blocked) would present an even more disturbing
picture of the attacks on the wiki, as multipage vandalism would
essentially multiply the recent numbers several fold.
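The difference between counting vandals and counting pages can be
sketched as follows. This is only an illustration: the (editor, page)
input format and the function names are assumptions, not an existing
export of the wiki's data.

```python
from collections import defaultdict


def pages_per_vandal(edits, blocked):
    """Map each blocked editor to the set of distinct pages they edited.

    edits   -- iterable of (editor, page_title) pairs (hypothetical format)
    blocked -- set of editor names or IPs taken from the block log
    """
    pages = defaultdict(set)
    for editor, page in edits:
        if editor in blocked:
            pages[editor].add(page)
    return dict(pages)


def summarize(pages):
    """Compare the vandal count with the count of (vandal, page) hits."""
    return {
        "vandals": len(pages),
        "pages": sum(len(p) for p in pages.values()),
        "multipage_vandals": sum(1 for p in pages.values() if len(p) > 1),
    }
```

A "pages" total well above the "vandals" total would support the
impression that block counts alone understate the damage.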
Observation 4:
Vandals use many different IP addresses to make edits. On occasion
there is a very slightly suspicious but generally benign-looking edit
by an anonymous IP (say, introducing an extra blank line) that is
followed some time later (often days later) by a quick spurt of
vandalism from a whole series of other IP addresses. It remains to be
seen whether these "scouting missions" provide a recognizable pattern
for anticipating further attacks.
As an example, this is actually a highly suspicious pattern of edits:
http://wiki.laptop.org/index.php?title=OS_images_for_USB_disks&action=history
One anonymous editor makes a little nonsense edit
http://wiki.laptop.org/go/Special:Contributions/200.243.151.151
which is later corrected by another anonymous editor
http://wiki.laptop.org/go/Special:Contributions/205.209.91.210
What makes this suspicious is that these editors have absolutely no
other edits, and the edits occurred on a page that is otherwise fairly
static. There may, however, be no purpose served by preemptively
blocking these particular IP addresses, as it is unlikely that the same
IP numbers will be used again.
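One way such a pattern might be flagged automatically is sketched
below. Everything here is an assumption for illustration: the history
and edit-count inputs, the 30-day quiet period, and the function names.
An edit is flagged when it comes from an anonymous IP with no other
edits on the wiki and the page has seen no ordinary edits for the quiet
period.

```python
import ipaddress
from datetime import timedelta


def is_anonymous(editor):
    """True if the editor name parses as an IP address."""
    try:
        ipaddress.ip_address(editor)
        return True
    except ValueError:
        return False


def scouting_candidates(history, total_edits, quiet=timedelta(days=30)):
    """Flag one-off anonymous edits on an otherwise quiet page.

    history     -- list of (datetime, editor) for one page, oldest first
    total_edits -- dict mapping editor to wiki-wide edit count
    quiet       -- how long the page must have been free of ordinary edits
    """
    flagged = []
    last_ordinary = None  # time of the latest edit by a non-one-off editor
    for when, editor in history:
        one_off = is_anonymous(editor) and total_edits.get(editor, 0) == 1
        if not one_off:
            last_ordinary = when
        elif last_ordinary is not None and when - last_ordinary >= quiet:
            flagged.append(editor)
    return flagged
```

Flagged addresses would be leads for watching the page rather than
grounds for a block, for the reason given above.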
Observation 5:
The vast majority of vandalism is performed by anonymous IP addresses
(unfortunately, so are many legitimate edits). Semi-protection is
sometimes a useful technique for pages that attract repeated attention
from vandals.
Observation 6:
There are enough different patterns to suggest that multiple vandals are
involved.
Observation 7:
Recently there have been a number of multipage vandals that are not
anonymous IP addresses, but rather employ registered user names.
Observation 8:
Individual IP blocks are only temporarily effective. There has been
repeated vandalism by IP addresses once the initial block has expired.
In addition, there are roughly 4.3 billion IPv4 addresses, so a simple
IP blocking strategy must ultimately fail.
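The scale argument can be made concrete with a little arithmetic. The
sketch below assumes blocks are placed on single IPv4 addresses rather
than ranges; the function name is hypothetical.

```python
import ipaddress

# Size of the whole IPv4 address space: 2**32 = 4,294,967,296.
TOTAL_IPV4 = ipaddress.ip_network("0.0.0.0/0").num_addresses


def blocked_fraction(blocked_ips):
    """Fraction of the IPv4 space covered by a list of single-address blocks."""
    unique = {ipaddress.ip_address(ip) for ip in blocked_ips}
    return len(unique) / TOTAL_IPV4
```

Even many thousands of individual blocks cover a vanishingly small
share of the addresses a vandal could come from.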
Observation 9:
It is generally held that the "community of editors" can address the
damage caused by vandalism and that the "community of sysops" will
collectively fight vandalism with blocks and other tools. The charting
of the block log data shows that this is simply not the case. The
majority of vandalism blocks (in any given time period) are performed
by one or two sysops. This imposes a significant burden on those who
are willing to take on the task. It should be noted that these wiki
"sheriffs" seem to have an unfortunately short term in office. It
would be very disturbing if the vandalism fight is "burning out" sysops
who a) could be making other contributions and b) may just give up when
a successor steps up to the fight.
One discouraging observation: a sysop who was an interested editor on
the page in question corrected a single vandalism edit (one of a
series) and blocked the vandal, but performed no further investigation
or rollback of the vandal's other edits. This sort of "free-riding" is
a bad sign for a community-maintained resource.
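The concentration of blocking work can be measured directly from the
block log. The sketch below is a minimal illustration, assuming the log
has already been reduced to (month, sysop) pairs; that input format and
the function name are hypothetical.

```python
from collections import Counter, defaultdict


def top_sysop_share(blocks, top_n=2):
    """Share of each month's blocks performed by the top_n busiest sysops.

    blocks -- iterable of (month, sysop) pairs, e.g. ("2008-01", "SysopA")
    """
    per_month = defaultdict(Counter)
    for month, sysop in blocks:
        per_month[month][sysop] += 1

    shares = {}
    for month, counts in per_month.items():
        top = sum(n for _, n in counts.most_common(top_n))
        shares[month] = top / sum(counts.values())
    return shares
```

A share close to 1.0 month after month would confirm that one or two
sysops are carrying nearly the whole load.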
Observation 10:
Whereas spam on the lang-en Wikipedia can be assumed to be mostly in
lang-en, the OLPC wiki does show signs of multilingual spam attacks.
This may place a high premium on employing techniques with more
sophisticated heuristics than recognition based on lang-en pattern
matching.
Observation 11:
There was a significant spike in vandalism temporally associated with
G1G1. Recent trends indicate an increasing number of vandalism
incidents, and I believe further analysis would very possibly reveal an
increase in the number of pages vandalized per attack. That would
suggest that the grand totals by month may be an underestimate of the
real scope of the vandalism problem in recent months.