Technology

Comment: Questions about little-known US web scraper could mean increased legal scrutiny for AI training data

By Mike Swift

Published 10 Feb 2024 00:53
Modified 10 Feb 2024 00:53

A large chunk of the data used to train AI systems such as ChatGPT comes from a little-known nonprofit with no paid employees whose address comes back to a nondescript building beside a parking deck in Beverly Hills, California.
While its 9.5 petabytes of data scraped from the web have


Mike Swift

Chief Global Digital Risk Correspondent

Mike Swift is an award-winning journalist who has been at the forefront of covering data, privacy and cybersecurity regulatory news for more than a decade. As the Chief Global Digital Risk Correspondent for MLex, in addition to reporting, he coordinates MLex’s worldwide coverage in the practice area. Formerly chief Internet reporter for the San Jose Mercury News and SiliconValley.com, Mike has covered Google, Facebook, Apple, Microsoft, Twitter and other tech companies and has closely tracked technology and regulatory trends in Silicon Valley. He has wide-ranging expertise, from the business of professional sports to computer-assisted reporting. A former John S. Knight Fellow at Stanford University, he is a graduate of Colby College.
