Why are there so many new AI-powered tools that scrape websites? (lemmy.world)

posted 23 days ago

DuckWrangler9000@lemmy.world

technology@lemmy.world

15 comments

I feel like every day I come across 15-20 “AI-powered tools” that “analyze” something, and none of them clearly state how they use data. This one seems harmless enough: put a profile in, and it will scrape everything about them, all their personal information, their location, every post they ever made… Nothing can possibly go wrong aggregating all that personal info, right? No idea where this data is sent, where it’s stored, who it’s sold to. Kinda alarming.

You are viewing a single thread.

DuckWrangler9000@lemmy.world (OP)

-9 points

22 days ago

A toy like that is easy to create and not that expensive to offer.

Right, and the developers of Bsky didn’t think to maybe block something that scrapes all that personal information?

Todd Bonzalez@lemm.ee

2 points

22 days ago

Deleted by creator

General_Effort@lemmy.world

7 points

22 days ago

If that’s what you want, you should join Facebook.

The fundamental thing to understand is that the internet - and really all information processing - is about copying. There is no such thing as “looking” at a profile or a post. The text and image data is downloaded to your device. You end up with multiple copies on your device.

Sending information out, but blocking people from storing it, is fundamentally a contradiction in terms.

Bsky - like Lemmy - made the choice to make the data widely available. It is available via API and does not need to be scraped. The alternative is to do it like Reddit or even Facebook or Discord. But they can’t stop scraping, either. They can make it slower and more laborious but not stop it. Services like Facebook protect the data as best as they can to “protect your privacy”. In reality, it’s about making it hard for you to leave the platform or anyone else to benefit from your data. Either way, you can trust Zuck to protect your data as if it was his own. Because it is.
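
To make "slower and more laborious but not stop it" concrete, here is a minimal Python sketch of a token-bucket rate limiter, the kind of throttle a service could put in front of its API. The class name, the rate, and the capacity are made up for illustration and are not taken from Bsky, Reddit, or any real service.

```python
import time


class TokenBucket:
    """Hypothetical throttle: bursts up to `capacity`, then `rate` requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to the time that passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A patient client simply waits for the bucket to refill and keeps going, so a limiter like this delays the copying without preventing it.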

VerPoilu@sopuli.xyz

16 points

22 days ago

Like Lemmy or Mastodon, BlueSky was made with the idea of federation. While BlueSky is not there yet, federated services are inherently very easy to scrape.

Maybe it’s time for people to understand that anything they post/vote/comment/like should be considered public.

Scipitie@lemmy.dbzer0.com

5 points

22 days ago

That would, by definition, always block all third parties.

Think of the reddit example from the person you replied to: there was a huge outcry when reddit announced shutting down their lower API tiers.

Either information is free to flow or it isn’t; there is no middle ground.

With that in mind: I’m sure they thought about it and decided to prioritize transparency and flexibility over security. Personally I support that decision.

DuckWrangler9000@lemmy.world (OP)

-4 points

22 days ago

I know how APIs on reddit work, but you can block people who misuse the API if they’re doing something nefarious. Some of these AI tools are, in my honest opinion, very taxing on hardware. Having to retrieve millions of posts, comments, pictures, text, on demand… and send that to who knows where for AI scraping… Sounds very costly.
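
As a rough illustration of blocking a key that misuses an API, here is a small Python sketch. Everything in it is hypothetical (the `ApiGatekeeper` class, the per-window quota, the choice of status codes); it is not how reddit or Bsky actually enforce limits, and it only covers access that goes through an API key in the first place.

```python
from collections import Counter


class ApiGatekeeper:
    """Hypothetical per-key quota: a key that exceeds it is blocked from then on."""

    def __init__(self, max_requests_per_window):
        self.max_requests = max_requests_per_window
        self.counts = Counter()   # requests seen per key in the current window
        self.blocked = set()      # keys that have been banned

    def handle(self, api_key):
        """Return an HTTP-style status code for one request made with `api_key`."""
        if api_key in self.blocked:
            return 403            # already banned
        self.counts[api_key] += 1
        if self.counts[api_key] > self.max_requests:
            self.blocked.add(api_key)
            return 429            # the request that crossed the quota
        return 200

    def start_new_window(self):
        """Reset the counters (for example once a minute); bans stay in place."""
        self.counts.clear()
```

With a quota of 2, the third request from one key gets 429 and every later request from that key gets 403, while other keys are unaffected.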
