Today, a prominent child safety organization, Thorn, in partnership with a leading cloud-based AI solutions provider, Hive, announced the release of an AI model designed to flag unknown CSAM at upload. It’s the earliest AI technology striving to expose unreported CSAM at scale.

1 point

I think all CSAM should be destroyed out of respect for the victims, not proliferated. I don’t care who is hanging onto this material or for what purpose.

15 points

How is this proliferating csam? Also, how do you expect them to find csam without having known images? Known images give a really nice way to check based on hashes without having someone look at every picture on someone’s hard drive. This AI should greatly help in identifying new or unknown images while minimizing the number of actual people who have to see that stuff and who get scarred from looking at such images. The only reason to be against this is if you are looking at CP and want it to be harder to find, or if you don’t understand how this technology is being used.
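
To make the hash-check idea concrete, here is a minimal sketch of matching files against a list of known hashes without anyone opening the files. Everything in it is an assumption for illustration: the function names, the file names, and the placeholder bytes are made up, and it uses plain SHA-256 only to show the lookup. Real systems in this space generally rely on perceptual hashes (PhotoDNA is the usual example) so that resized or re-encoded copies still match, which an exact hash like this would miss.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_known_matches(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of files whose digest appears in the known-hash list.

    Only digests are compared, so nobody has to view the file contents.
    """
    return sorted(
        name for name, data in files.items() if sha256_hex(data) in known_hashes
    )


# Hypothetical data: harmless placeholder bytes stand in for file contents.
known_hashes = {sha256_hex(b"placeholder-known-file")}
files = {
    "a.bin": b"placeholder-known-file",
    "b.bin": b"something else entirely",
}

matches = find_known_matches(files, known_hashes)
print(matches)
```

The limitation is visible right in the sketch: a hash list can only find files that are already known, which is the gap a classifier for new or unknown images is meant to fill.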

2 points

“How is this proliferating csam?”

Sharing it with people and companies that it wasn’t being shared with before.

“Also, how do you expect them to find csam without having known images?”

The same way it is now: people reporting it and undercover police accounts. People recognise it.

“without having someone look at every picture on someone’s hard drive”

If it’s going to get used as evidence in court a human will have to review and confirm it. I don’t think “Because the AI said so” is going to convince juries.

“The only reason to be against this is if you are looking at CP”

Or if it’s you or someone you love who is in the CP. Having further copies of it on further hard drives, whether it’s so someone can bake it into their AI tool or any other purpose is wrong. That’s just my view though.

2 points

Sorry I cannot post a longer response, but I’d suggest you look up how this type of forensic software is developed and used. There are a few good documentaries on it if you look; one I remember watching was on Google’s team for this stuff.

The images are not exactly shared in that very few people have access to them, and they treat it very much like classified information so that only select people can see them.

These models would be developed using normal images and then trained in closed systems with the real images, where only the accuracy results are reported back and not the images. No need to scar the developers who just want to work.

Nothing about people reporting it will change. The only difference is that this will give the FBI a list of suspected CP and a list of normal images from a computer, allowing them to spend a fraction of the time looking at this stuff to document it. This is very important when you have people who have literally terabytes of the stuff and probably even more normal images. In general we like to minimize the time spent looking at such stuff because it is so scarring.

As for showing the images in court, in the US hashes are acceptable evidence; again, we don’t like to scar people by showing them this stuff. Additionally, after you’ve been shown the 100th picture of a baby being abused and the FBI is telling you they have 1000000 more, you’ll just take their word for it.

Anyways, hope you have a good one

permalink
report
parent
reply

At this point how does it differ from generating AI-powered CP? morons

4 points

It differs in basically being something completely different. This is a classification model; it doesn’t have generative capabilities. Even if you were to get the model and its weights, and you tried to reverse engineer an “input” that it would classify as CP, it would most likely look like pure noise to you.

Moron

permalink
report
parent
reply

Generate porn, classify the output, and the result is very young-looking models.

Moron

0 points

So you need to have a model that generates CP to begin with. Flawless reasoning there.

Look, it’s clear you have no clue what you’re talking about. Stop demonstrating it, moron.

10 points

Uh, well this one tells you if an image looks like it or not. It doesn’t generate images

permalink
report
parent
reply

If it knows whether an image looks like it, it can generate something like it; that’s just one step further.

1 point

Correct, this kind of software is trained on CP data. So such models can be easily used to generate CP instead of recognizing it, which makes them very dangerous indeed.

Same idea as the current models that are trained to recognize cars: these models can also be used to generate a car from noise as a starting point.

14 points

Jesus Christ. If someone ever got their hands on this model they could use it to generate new material. The grossest possible AI model to date

6 points

A generative model uses the classifier as part of its training. If you generate a picture of pure random noise, then iteratively pick random noise that the classifier says “looks” more like csam, then you can effectively generate images that the classifier says it’s 100% certain is csam. Whether or not that looks anything like what a human would consider to be csam depends on other factors but it remains a possibility.

1 point

I thought being able to do that was already a thing. This is designed to do the opposite.

I know, I know… bad actors and such.

1 point

…but if simple possession defines who a bad actor is…

The irony of this never ceases to amaze me.

14 points

This seems like a potential actual good use of AI. Can’t have been much fun to train it though.

And is there any risk of people turning these kinds of models around and using them to generate images?

24 points

If AI was reliable, maybe. MAYBE. But guess what? It turns out that “advanced autocomplete” does a shitty job of most things, and I bet false positives will be numerous.
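
For a sense of why false positives could pile up, here is a rough base-rate sketch. The numbers are purely hypothetical (nothing in the announcement gives real rates), and `precision`, `prevalence`, `tpr`, and `fpr` are names made up for the example. It only applies Bayes’ rule: when the thing you are looking for is rare, even a detector that looks accurate on paper flags mostly innocent uploads.

```python
def precision(prevalence: float, tpr: float, fpr: float) -> float:
    """Fraction of flagged items that are true positives (Bayes' rule).

    prevalence: share of all uploads that really are positive
    tpr: true positive rate (share of real positives that get flagged)
    fpr: false positive rate (share of innocent uploads that get flagged)
    """
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical rates: 1 in 1000 uploads is positive, the detector catches
# 99% of them and wrongly flags 1% of everything else.
example = precision(prevalence=0.001, tpr=0.99, fpr=0.01)
print(f"{example:.1%} of flags would be correct")
```

With those made-up rates only about 9% of flags would be real hits, so what matters is what happens to a flag afterwards.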

1 point

“detect new or previously unreported CSAM and child sexual exploitation behavior (CSE), generating a risk score to make human decisions easier and faster.”

False positives don’t matter if they stick to the stated intended purpose of making it easier to detect CSAM manually.
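
As a sketch of what “risk score to make human decisions easier” could look like in practice: the score only orders a review queue, and nothing is actioned automatically. This is an assumption about the workflow, not Thorn’s or Hive’s actual API; `Flagged`, `review_queue`, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flagged:
    item_id: str
    risk_score: float  # hypothetical score in the range 0.0 to 1.0


def review_queue(items: list[Flagged], threshold: float = 0.5) -> list[Flagged]:
    """Return items at or above the threshold, highest risk first.

    The score only decides what a human reviewer looks at and in which
    order; no item is removed or reported by this function.
    """
    queued = [item for item in items if item.risk_score >= threshold]
    return sorted(queued, key=lambda item: item.risk_score, reverse=True)


# Hypothetical scores for three uploads.
items = [
    Flagged("upload-1", 0.12),
    Flagged("upload-2", 0.97),
    Flagged("upload-3", 0.61),
]

queue = review_queue(items)
print([item.item_id for item in queue])
```

In a setup like this a false positive costs a reviewer some time rather than triggering an accusation, which is the whole argument for keeping it to the stated purpose.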

10 points

“if they stick to the stated intended purpose”

They never do.

11 points

The problem is that they won’t.

Yes, AI tools, in the hands of skilled people, can be very helpful.

But “AI” in capitalism doesn’t mean “more effective workers”, it means “fewer workers.” The issue isn’t technological so much as cultural. You fundamentally cannot convince an MBA not to try to automate away jobs.

(It’s not even a money thing; it’s about getting rid of all those pesky “workers’ rights” that workers like to bring with us.)

12 points

This is not that kind of AI.

5 points

It’s possible to have a good AI system, but it takes millions of dollars and several thousand man-hours to do, and most companies won’t put in the effort.

But, there should always be a human in the loop.

8 points

I think image generators in general work by iteratively changing random noise and checking it with a classifier, until the resulting image has a stronger and stronger finding of “cat” or “best quality” or “realistic”.

If this classifier provides fine grained descriptive attributes, that’s a nightmare. If it just detects yes or no, that’s probably fine.

14 points

“And is there any risk of people turning these kinds of models around and using them to generate images?”

There isn’t really much fundamental difference between an image detector and an image generator. The way image generators like Stable Diffusion work is essentially by generating a starting image that’s nothing but random static and telling the generator “find the cat that’s hidden in this noise.”

It’ll probably take a bit of work to rig this child porn detector up to generate images, but I could definitely imagine it happening. It’s going to make an already complicated philosophical debate even more complicated.

7 points

Nobody would have been looking directly at the source data. The FBI or whoever provides the dataset to approved groups, but after that you just say “use all the images in this folder” and it goes. But I don’t even know if they actually provide real full-resolution images, or just perceptual hashes, or downsampled images.

And while it’s possible to use the dataset to generate new images assuming the training data had full-res images, like I said, I know they investigate the people making the request before allowing access. And access is probably supervised and audited.

3 points

Available image generators are already capable of generating those images and they weren’t even trained on it. Once a neural network can detect/generate two separate concepts, it can detect/generate the overlap. It won’t be as fine-tuned obviously, but can still turn out scarily accurate.

0 points

me
no
rikey
