    Messenger Media Online
    Technology

    Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft

    By Dave, December 12, 2024


    In addition to the trove of books, the Institutional Data Initiative is also working with the Boston Public Library to scan millions of articles from different newspapers now in the public domain, and it says it’s open to forming similar collaborations down the line. The exact way the books dataset will be released is not settled. The Institutional Data Initiative has asked Google to work together on public distribution, and the company has pledged its support.

    However IDI’s dataset is released, it will be joining a host of similar projects, startups, and initiatives that promise to give companies access to substantial and high-quality AI training materials without the risk of running into copyright issues. Firms like Calliope Networks and ProRata have emerged to issue licenses and design compensation schemes intended to get creators and rightsholders paid for providing AI training data.

    There are also other new public-domain projects. Last spring, the French AI startup Pleias rolled out its own public-domain dataset, Common Corpus, which contains an estimated 3 to 4 million books and periodical collections, according to project coordinator Pierre-Carl Langlais. Backed by the French Ministry of Culture, the Common Corpus has been downloaded over 60,000 times this month alone on the open source AI platform Hugging Face. Last week, Pleias announced that it is releasing its first set of large language models trained on this dataset, which Langlais told WIRED constitute the first models “ever trained exclusively on open data and compliant with the [EU] AI Act.”

    Efforts are underway to create similar image datasets as well. AI startup Spawning released its own this summer called Source.Plus, which contains public-domain images from Wikimedia Commons as well as a variety of museums and archives. Several significant cultural institutions, like the Metropolitan Museum of Art, have long made their own archives accessible to the public as standalone projects.

    Ed Newton-Rex, a former executive at Stability AI who now runs a nonprofit that certifies ethically trained AI tools, says the rise of these datasets shows that there’s no need to steal copyrighted materials to build high-performing, high-quality AI models. OpenAI previously told lawmakers in the United Kingdom that it would be “impossible” to create products like ChatGPT without using copyrighted works. “Large public domain datasets like these further demolish the ‘necessity defense’ some AI companies use to justify scraping copyrighted work to train their models,” Newton-Rex says.

    But he still has reservations about whether the IDI and projects like it will actually change the training status quo. “These datasets will only have a positive impact if they’re used, probably in conjunction with licensing other data, to replace scraped copyrighted work. If they’re just added to the mix, one part of a dataset that also includes the unlicensed life’s work of the world’s creators, they’ll overwhelmingly benefit AI companies,” he says.



    Copyright © 2024 Messengermediaonline.com All Rights Reserved.
