OpenAI Is Using Media Websites To Train ChatGPT But CNN Says The Bot Needs A Paid License

Elicio Ember, https://www.flickr.com/photos/hlokenende/ https://creativecommons.org/licenses/by-sa/2.0/

Written by Dana Sanchez

Feb 20, 2023

Computational journalist Francesco Marconi asked ChatGPT, the interactive chatbot and viral sensation, for a list of news sources it was trained on. ChatGPT replied with the names of 20 news outlets, including Bloomberg, TechCrunch and Forbes.

The Wall Street Journal, CNN and others among the named media outlets believe OpenAI and its ChatGPT chatbot should be paying them to use their news articles in the training of the artificial intelligence software tool, Bloomberg reported.

Among the top criticisms of ChatGPT are its lack of transparency and its failure to name the sources of the information it provides, which invites accusations that it’s just making things up.

OpenAI, the San Francisco startup that developed ChatGPT, is financed by Microsoft.

Renowned linguist and cognitive scientist Noam Chomsky describes ChatGPT as a form of “high tech plagiarism.”

Many schools have banned ChatGPT, worrying students will use it to take tests or do their homework. Others have predicted that the technology will take over journalists’ jobs. ChatGPT has been accused of spreading misinformation. In recent weeks, publications including CNET and Men’s Journal have been forced to correct AI-written articles that were riddled with errors, Bloomberg reported.

ChatGPT is trained on a large amount of news data from top sources that fuel its AI. It's unclear whether OpenAI has agreements with all of these publishers. Scraping data without permission would break the publishers' terms of service.
— Francesco Marconi (@fpmarconi) February 15, 2023

“Anyone who wants to use the work of Wall Street Journal journalists to train artificial intelligence should be properly licensing the rights to do so from Dow Jones,” said Jason Conti, general counsel for News Corp.’s Dow Jones unit, in a statement to Bloomberg News. “Dow Jones does not have such a deal with OpenAI.”

An array of media are subject to copyright protection, including news articles and photos. News licensing is the process of gaining copyright permission to reuse or republish news stories.

Computational journalism draws on technical aspects of computer science including AI, content analysis, social computing and information science.

“We take the misuse of our journalists’ work seriously, and are reviewing this situation,” Conti added. Dow Jones is the publisher of the Wall Street Journal.

It wouldn’t be the first time content owners have questioned whether their work is being used without authorization by AI systems. In November, GitHub, Microsoft Corp. and OpenAI were sued in a case that alleged a tool called GitHub Copilot was essentially plagiarizing human developers in violation of their licenses, according to Bloomberg.

ChatGPT defines plagiarism as the act of using someone else’s work or ideas without giving proper credit to the original author, Sofia Barnett reported for Wired. “But when the work is generated by something rather than someone, this definition is tricky to apply.”

“If [plagiarism] is stealing from a person, then I don’t know that we have a person who is being stolen from,” said Emily Hipchen, a board member of Brown University’s Academic Code Committee.

In January, a group of artists sued AI generators Stability AI Ltd., Midjourney Inc. and DeviantArt Inc., claiming those companies downloaded and used billions of copyrighted images without the permission of the artists and without paying them.

CNN, owned by Warner Bros. Discovery Inc., plans to reach out to OpenAI about being paid to license its content, according to a person familiar with the matter. Using its articles to train ChatGPT violates CNN’s terms of service, the person said.