We appear to be the first to write up the outrage coherently, too. Many thanks to the illustrious @self
Mistral isn’t trained on copyrighted data. It’s based on select databases that were open-use. This article in general is full of false information. But I suppose most people only read the headlines.
https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8#6527a6fca6eaf92e6c26fa59
Unfortunately we’re unable to share details about the training and the datasets (extracted from the open Web) due to the highly competitive nature of the field.
The “open web” is full of copyrighted material.
Was this incorrect? https://www.patronus.ai/blog/introducing-copyright-catcher