Computer scientists in China have devised a software algorithm based on Bayesian statistics that can automatically check a Wikipedia entry and rank it by its quality. Bayesian analysis is commonly used to assess the content of emails and estimate the probability that a message is spam or junk mail; if that probability is high, the message is filtered from the user’s inbox.
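To make the spam-filter comparison concrete, here is a minimal sketch of how a naive Bayes filter might turn word evidence into a spam probability. The word likelihoods, the 0.5 prior, the 0.9 threshold, and the function names are all illustrative assumptions, not values from any real filter or from the researchers' work.

```python
import math

# Hypothetical per-word likelihoods, P(word | spam) and P(word | not spam).
# A real filter would estimate these from a labelled mail corpus.
P_WORD_GIVEN_SPAM = {"free": 0.30, "meeting": 0.02}
P_WORD_GIVEN_HAM = {"free": 0.03, "meeting": 0.20}


def spam_probability(words, prior_spam=0.5):
    """Posterior P(spam | words), assuming words are independent given the class."""
    # Accumulate in log space so long messages do not underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        # Words missing from either table carry no evidence in this sketch.
        if word in P_WORD_GIVEN_SPAM and word in P_WORD_GIVEN_HAM:
            log_spam += math.log(P_WORD_GIVEN_SPAM[word])
            log_ham += math.log(P_WORD_GIVEN_HAM[word])
    # Bayes' rule: normalise the two joint scores into a probability.
    top = max(log_spam, log_ham)
    spam = math.exp(log_spam - top)
    ham = math.exp(log_ham - top)
    return spam / (spam + ham)


def should_filter(words, threshold=0.9):
    """Filter the message only when the spam probability is high."""
    return spam_probability(words) >= threshold
```

With these made-up numbers, a message containing only "free" scores about 0.91 and would be filtered, while one containing both "free" and "meeting" scores 0.5 and would be kept.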
Writing in the International Journal of Information Quality, Jingyu Han and Kejia Chen of Nanjing University of Posts and Telecommunications say they have used a dynamic Bayesian network (DBN) to analyze the content of Wikipedia entries in a similar manner.
They also apply multivariate Gaussian distribution modeling to the DBN analysis, which gives them a distribution of the quality of each article so that entries can be ranked. Very-low-ranking entries could be flagged for editorial attention to raise the quality. By contrast, high-ranking entries could be marked as the definitive entry in some way, so that such an entry is not subsequently overwritten with lower quality information.
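The general idea of ranking entries with multivariate Gaussian models can be sketched as follows. This is a simplified, static illustration and not the researchers' dynamic Bayesian network: it assumes each article is summarised by a small numeric feature vector, that one Gaussian is fitted per quality class from labelled examples, and that classes are equally likely a priori. The feature choice and all function names are hypothetical.

```python
import numpy as np


def fit_gaussian(samples):
    """Estimate the mean and covariance of one quality class from its feature rows."""
    mean = samples.mean(axis=0)
    # A small ridge keeps the covariance invertible for tiny sample sets.
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return mean, cov


def log_density(x, mean, cov):
    """Log of the multivariate Gaussian density at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2.0 * np.pi) + logdet + mahalanobis)


def quality_posterior(x, models):
    """Distribution over quality classes for one article, assuming equal priors."""
    logs = np.array([log_density(x, mean, cov) for mean, cov in models])
    weights = np.exp(logs - logs.max())  # subtract the max for numerical stability
    return weights / weights.sum()


def rank_articles(features, models):
    """Order article names from highest to lowest expected quality grade."""
    grades = np.arange(len(models))  # class 0 is the lowest grade
    scores = {
        name: float(quality_posterior(x, models) @ grades)
        for name, x in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Here `models` is a list of `(mean, cov)` pairs ordered from lowest to highest quality class, and `features` maps an article name to its feature vector. Entries at the bottom of the returned list would be the candidates for editorial attention.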
Outperforming human users:
The team has tested its algorithm on sets of several hundred articles, comparing its automated quality assessment with assessment by a human user. The algorithm outperforms a human user by up to 23 percent in correctly classifying the quality rank of a given article in the set, the researchers report.
The use of a computerized system to provide a quality standard for Wikipedia entries would avoid the need to have people classify each entry, a process that is inherently subjective. That could raise the standard of entries and provide a basis for an improved reputation for the online encyclopedia.
— Wikipedia (hopefully, this entry is ranked as quality.)