Breaking News in Technology & Business – Tech Geekwire

    Wikipedia Rolls Out Solution to Keep AI Bots at Bay

    By techgeekwire, April 26, 2025

    Wikipedia Tests New Way to Keep AI Bots Away

    The Wikimedia Foundation has partnered with Google-owned Kaggle to offer Wikipedia content in a “machine-readable format,” giving AI developers an alternative to scraping the site. The move aims to reduce the significant share of Wikipedia’s bandwidth that AI bots consume.

    AI bots place a heavier load on the site than typical human readers because they crawl even the most obscure corners of Wikipedia. The foundation reported that bandwidth used for downloading multimedia has grown by 50% since January 2024, primarily because automated programs are downloading openly licensed images to feed AI models.

    To tackle this issue, the foundation and Kaggle have made English and French Wikipedia content available in a developer-friendly format. “Instead of scraping or parsing raw article text, Kaggle users can work directly with well-structured JSON representations of Wikipedia content,” the foundation explained. The format is intended for training models, building features, and testing NLP pipelines.

    Kaggle describes the offering, currently in beta, as “immediately usable for modeling, benchmarking, alignment, fine-tuning, and exploratory analysis.” AI developers using the dataset will get access to “high-utility elements” such as article abstracts, short descriptions, infobox-style key-value data, image links, and clearly segmented article sections.
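
    As a rough illustration of what working with structured JSON instead of scraped HTML might look like, the sketch below parses one record and pulls out the elements listed above. The field names (`name`, `abstract`, `description`, `infobox`, `image_url`, `sections`) and the one-record-per-line layout are assumptions made for this example, not the dataset's confirmed schema; check the dataset's own documentation before relying on them.

```python
import json

# Hypothetical record: every field name here is an illustrative assumption,
# not the confirmed schema of the Kaggle dataset.
SAMPLE_LINE = json.dumps({
    "name": "Example article",
    "abstract": "A short summary of the article.",
    "description": "illustrative short description",
    "infobox": {"Type": "Example", "Founded": "2001"},
    "image_url": "https://example.org/image.png",
    "sections": [
        {"title": "History", "text": "Text of the history section."},
        {"title": "Usage", "text": "Text of the usage section."},
    ],
})


def summarize_record(line: str) -> dict:
    """Parse one JSON line and extract the elements the article lists."""
    record = json.loads(line)
    return {
        "title": record.get("name", ""),
        "abstract": record.get("abstract", ""),
        "short_description": record.get("description", ""),
        "infobox": dict(record.get("infobox") or {}),
        "image_url": record.get("image_url"),
        "section_titles": [s.get("title", "") for s in record.get("sections") or []],
    }


def iter_records(lines):
    """Yield a summary for each non-empty line of a JSON Lines stream (assumed layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield summarize_record(line)


if __name__ == "__main__":
    for summary in iter_records([SAMPLE_LINE, ""]):
        print(summary["title"], summary["section_titles"])
```

    Because `iter_records` accepts any iterable of strings, an open file handle could be passed in place of the list, so a large download would be processed one line at a time rather than loaded into memory at once.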

    All of the content is derived from Wikipedia and is freely available under two open licenses: Creative Commons Attribution-ShareAlike 4.0 and the GNU Free Documentation License (GFDL). Some content may be licensed under alternative terms or be in the public domain.

    This collaborative approach differs from other organizations’ methods of dealing with AI bots. For instance, Reddit has implemented stricter controls to prevent bots from accessing its platform, while The New York Times has sued OpenAI over alleged unauthorized scraping of its articles to train AI models.

    By providing a structured and accessible format for Wikipedia content, the Wikimedia Foundation and Kaggle aim to reduce the strain caused by AI bots and promote more collaborative and transparent AI development.

    © 2025 TechGeekWire. Designed by TechGeekWire.