US Media Giants Block AI Crawlers to Protect Copyrights

Highlighting the tension between media companies and artificial intelligence (AI) technology, a recent Wired report revealed that 88% of leading news outlets in the United States are actively blocking AI web crawlers. This move, driven by concerns over copyright infringement and uncompensated use of content, reflects growing opposition within the media industry to data collection by AI companies.

The survey, conducted by Ontario-based AI detection startup Originality AI, covered 44 top news sites, including well-known organizations such as The New York Times, The Washington Post and The Guardian. These outlets were found to have taken measures to curb data collection by AI companies. OpenAI’s GPTBot was the most widely blocked crawler, with many media companies adding restrictions after OpenAI announced in August 2023 that the bot would respect robots.txt directives, which websites use to control crawler access.
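As a rough illustration of how a robots.txt block works, the sketch below uses Python's standard `urllib.robotparser` module to evaluate a sample rule set. The robots.txt content, the `news.example.com` address, the `SomeOtherBot` name and the `is_allowed` helper are all hypothetical and do not come from any outlet in the survey. Only the `GPTBot` user-agent name is taken from the report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow every other crawler.
# This is an illustrative rule set, not one copied from a real news site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article_url = "https://news.example.com/2024/01/some-article"  # placeholder
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, article_url) else "blocked"
        print(f"{agent}: {verdict}")
```

Note that robots.txt is advisory: the file only states the publisher's wishes, and the block is effective only when a crawler chooses to honor it, as OpenAI said GPTBot would.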

This escalating conflict reached a new high last December when The New York Times filed a lawsuit against OpenAI. The lawsuit alleges copyright infringement through OpenAI’s unauthorized use of published works to train chatbots. The New York Times claims that millions of its articles were used to train chatbots that now serve as alternative sources of information, potentially undermining the credibility and financial sustainability of traditional media. The media giant is seeking billions of dollars in statutory and actual damages, marking a pivotal moment in the legal landscape surrounding AI and media.

During a hearing of the Judiciary Committee’s Privacy and Technology Subpanel, witnesses representing local and national media organizations urged lawmakers to step in and prevent AI companies from using copyrighted news content without proper credit or compensation. They argued that AI companies were relying on the “fair use” provision of US intellectual property law to justify training their models on copyrighted news material. Media organizations contest this interpretation, arguing that using their content to train AI models goes beyond what fair use permits.

As media companies strengthen their defenses against AI bots, the dispute highlights the complex interplay between technological advances and content protection. It raises critical questions about the future of information dissemination, journalistic integrity, and the democratization of knowledge in an age of technological disruption.

This development has implications beyond the immediate legal battles and technical measures. It touches on fundamental questions about the role and impact of AI in the media landscape and points to the need for a balanced approach to innovation and accountability in the digital age.

