Technology giants like Meta (formerly Facebook) are pushing limits in their quest to power their AI systems, even when that means navigating the complexities of copyright law. The New York Times recently reported that Meta was prepared to risk copyright disputes in order to secure essential data sources for training its AI.
There was a clear sense of urgency at Meta in March and April of last year, as executives met nearly every day to plan how to obtain the data needed to keep pace in the AI arms race. The company explored a number of options, including audacious moves such as acquiring the well-known publishing house Simon & Schuster, which private equity firm KKR bought for a handsome $1.62 billion in August of last year.
The conversations reportedly included buying full licensing rights to new book titles at $10 per book, demonstrating how far Meta was prepared to go to assemble valuable datasets. This approach reflects a growing tendency among internet firms to pursue data aggressively, even when doing so may raise copyright problems, much like OpenAI's alleged use of YouTube videos to train its video generator, Sora.
Meta had previously turned to summarizing articles, books, and web content, using contractors in Africa to assemble summaries that included material protected by copyright. The practice raised ethical questions, with one manager acknowledging that the collection of copyrighted data was inevitable.
Those meetings also included discussion of bypassing formal licensing agreements and continuing to collect data from potentially copyrighted sources. When a lawyer raised ethical concerns, there was reportedly no response, a sign of how difficult it is to strike a balance between innovation and ethics.
In the end, Meta's decision-makers were guided by legal precedent, particularly the outcome of the 2015 case Authors Guild v. Google, in which an appeals court held that Google's scanning and digitizing of books for Google Books qualified as fair use, a ruling the Supreme Court later declined to review. Meta's legal team cited this precedent as justification for training AI systems under the same fair use rationale.
Meta has been slow to respond to questions on the matter, underscoring how delicate and intricate the subject is.
As technology advances and artificial intelligence becomes ever more deeply integrated into daily life, the search for training data will remain a difficult problem. Meta's approach is a prime example of how far tech companies will go, even at the risk of legal complications, to preserve their advantage in artificial intelligence. This developing story underscores the urgent need for a comprehensive strategy that balances innovation with ethical and legal considerations.