Well, you’ve got a timestamped copy of much of the Web that existed up until latent-diffusion models at archive.org. That may not give you access to newer information, but it’s a pretty whopping big chunk of data to work with.
Hopefully archive.org have measures in place to stop people from yanking all their data too quickly. At least not without a hefty donation or something. As a user it can chug a bit, and I’m hoping that’s the rate-limiting I’m talking about and not that they’re swamped.
That would go against the principle of the archive imo, but regardless, if you take away all means of acquiring data freely, you are just giving companies like OpenAI and Google, who already have copies of it, an insane advantage.
AI isn’t going away; we need to make sure we have free access to it so as not to give our whole economy to a handful of companies.
As junk web pages written by AI proliferate, the models that rely on that data will suffer.
Good.
AI making itself sick and worthless after flooding the internet with trash just gives me a warm glow.
interdasting
Garbage in, garbage out.