feddit.org

1.6K

Make illegally trained LLMs public domain as punishment (www.theregister.com)

posted 9 days ago by 🃏Joker@sh.itjust.works in technology@lemmy.world

It’s all made from our data, anyway, so it should be ours to use as we want


NoForwardslashS@sopuli.xyz

3 points

9 days ago

But wouldn’t that mean that if it were made open source and then didn’t function properly without the data, that would prove it is using a huge amount of unlicensed data?

Probably not to a “burden of proof in a court of law” standard, though.


Bronzebeard@lemm.ee

8 points

9 days ago

Making it open source doesn’t change how it works. It doesn’t need the data after it’s been trained. Most of these AIs are just figuring out patterns to look for in the new data they come across.


NoForwardslashS@sopuli.xyz

3 points

9 days ago

So you’re saying the data wouldn’t exist anywhere in the source code, but it would still be able to answer questions based on the data it has previously seen?


stephen01king@lemmy.zip

16 points

9 days ago

That is how LLMs work: they don’t store the data as data, but as weight values.
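
To make the "weights, not data" point concrete, here is a minimal sketch. It assumes a toy one-variable linear model as a stand-in for an LLM; the function names `fit_line` and `predict` and the sample points are made up for illustration and are not taken from any real model or library. Training boils the data down to two numbers, and after the data is deleted the model can still answer for an input it never saw.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + b. Returns only the two weights."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict(weights, x):
    """Answer a new input using the weights alone."""
    w, b = weights
    return w * x + b


# Hypothetical "training data": five points that lie on y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

weights = fit_line(train_x, train_y)

# Throw the training data away. Only the weights survive.
del train_x, train_y

print(weights)                 # the whole "model" is two numbers
print(predict(weights, 10.0))  # x = 10 was never in the training data
```

A real LLM has billions of weights instead of two, but the shape of the idea is the same: what ships is the weights, and the training set is not needed at inference time.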


NoForwardslashS@sopuli.xyz

1 point

9 days ago

So then why, if it were all open sourced, including the weights, would the AI be worthless? Surely having an identical but open source version would strip profitability from the original paid product.


Bronzebeard@lemm.ee

3 points

9 days ago

It wouldn’t be. It would still work. It just wouldn’t be exclusively available to the group that created it, so any competitive advantage is lost.

But all of this ignores the real issue - you’re not really punishing the use of unauthorized data. Those who owned that data are still harmed by this.


stephen01king@lemmy.zip

2 points

9 days ago

It does discourage the use of unauthorised data. If stealing doesn’t give you a competitive advantage, it’s not really worth the risk and cost of stealing it in the first place.


Bronzebeard@lemm.ee

1 point

8 days ago

Most AIs are not built to answer questions. They’re designed to act as some kind of detection/filter heuristic that identifies specific things about an input that lead to a desired output.


bloup@lemmy.sdf.org

2 points

9 days ago

in civil matters, the burden of proof is actually usually just preponderance of the evidence and not beyond a reasonable doubt. in other words, to win a lawsuit, you only need to have more compelling evidence than the other side.


just_another_person@lemmy.world

4 points

9 days ago

But you still have to have EVIDENCE. Not derivative evidence. The output of a model could be argued to be hearsay because it’s not direct evidence of originating content, it’s derivative.

You’d have to have somebody backtrack through generations of model data to even find snippets of something that qualifies as copyrighted material, or a human actually saying “Yes, we definitely trained on unlicensed data”.


bloup@lemmy.sdf.org

3 points

9 days ago

so like I am not making any comment on anything but the legal system here. but it’s absolutely the case that you can win a lawsuit on purely circumstantial evidence if the defense is unable to produce a compelling alternative set of circumstances which can lead to the same outcome.


