Wikipedia’s Automated Content Creation: A Double-Edged Sword
Wikipedia, the free online encyclopedia, has nearly 7 million articles in English. However, its second-largest edition is not in a widely spoken language like French, Spanish, or Chinese. It is in Cebuano, a language spoken in the southern Philippines, and that edition has just over 6 million articles.
The Cebuano Wikipedia was not created by thousands of volunteer editors but primarily by one person: Swedish linguist Sverker Johansson. Dr. Johansson designed a program called “lsjbot,” which generated millions of articles in several languages, particularly Cebuano. This bot-driven content creation has sparked debates within the Wikipedia community about the role of automation in content generation.
How Lsjbot Works
Lsjbot generates articles by extracting information from online databases, mostly on biology and geography, and fitting the data into pre-written sentence templates. The bot’s core language model consists of a few hundred such templates. For instance, an article about an animal might start with the sentence “The X is a Y that belongs to the Z family,” with lsjbot filling in the blanks.
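The template-filling idea can be illustrated with a minimal Python sketch. This is not lsjbot’s actual code: the template wording is adapted from the example sentence above, and the function name `render_article`, the field names, and the sample record are all hypothetical, chosen only for illustration.

```python
from string import Template

# Hypothetical template modeled on the example sentence in the text;
# $name, $kind and $family stand in for the X, Y and Z blanks.
ANIMAL_TEMPLATE = Template(
    "The $name is a $kind that belongs to the $family family."
)


def render_article(record: dict[str, str]) -> str:
    """Fill the sentence template with fields from one database record.

    Template.substitute raises KeyError when a field is missing, so an
    incomplete record fails loudly instead of yielding a broken sentence.
    """
    return ANIMAL_TEMPLATE.substitute(record)


# Illustrative record, standing in for a row pulled from a biology database.
record = {"name": "tarsier", "kind": "primate", "family": "Tarsiidae"}
print(render_article(record))
```

Because the program only slots database values into fixed sentences, the quality of each output depends entirely on the quality of the source record and of the template wording, which is consistent with the error patterns described in the next section.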
Controversy and Challenges
The use of lsjbot has been controversial, particularly within the Philippine Wikipedia community. Many Cebuano-language pages contained grammatical and factual errors due to imperfect translations. The sheer volume of articles created by the bot also posed maintenance challenges for the community. In 2018, there was even a proposal to delete the entire Cebuano Wikipedia, although it was ultimately rejected.
Irvin Sto. Tomas, a member of the Philippine Wikimedia community, notes that a small group of local Wikipedians has been working to improve the quality of Cebuano pages, including collaborating with Dr. Johansson on lsjbot. However, he acknowledges that volunteer editors alone cannot handle the task.
The Impact of Generative AI
Dr. Johansson distinguishes his bot from generative AI models like ChatGPT, stating that lsjbot “does not produce any new text; it only packages existing information into existing templates.” In contrast, large language models generate new text, which may not be reliable. Mr. Lim agrees with this distinction, noting that while the quality of lsjbot’s translations can be debated, the underlying facts are true.
As Wikipedia continues to evolve, the community will need to address the challenges and opportunities presented by automation and generative AI. One concern is model collapse, in which AI models trained on AI-generated text degrade over time as errors are perpetuated and compounded. Wikipedians must balance the benefits of automation with the need for accuracy and quality control to maintain the site’s status as a trusted resource.