Artificial intelligence-powered systems are truly impressive, but what datasets are they trained on? OpenAI has kept the answer to that question behind closed doors, and now YouTube has issued a warning to the company ahead of the release of Sora, OpenAI's AI-powered text-to-video generation tool.

Creators of AI models use large amounts of data to train their tools for whatever task they are designed to perform. However, there is a major problem with simply grabbing data off the internet and using it to train an AI model that could be used to make money: copyrighted IP. This problem isn't new, as The New York Times and Getty Images have already filed lawsuits against AI creators, alleging that copyrighted data was taken to train models that are then used to generate profit.

The copyright debate regarding AI models heated up again in March when OpenAI CTO Mira Murati told The Wall Street Journal that she wasn't sure if Sora's training included data from YouTube, Instagram, or Facebook. Now, in an interview with Bloomberg Originals, YouTube CEO Neal Mohan has reminded OpenAI that taking any kind of data from the platform and using it to train AI models is strictly against the platform's terms of service.