Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web (www.theverge.com)

posted 6 months ago

some_guy@lemmy.sdf.org

technology@lemmy.world

139 comments

sugar_in_your_tea@sh.itjust.works

3 points

6 months ago

Yes, it kind of is. A search engine just looks for keywords and links, and that’s all it retains after crawling a site. It’s not producing any derivative works, it’s merely looking up an index of keywords to find matches.

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it’s based on and how much of those works it uses. So it’s complicated, but there’s very much a copyright argument there.

Halosheep@lemm.ee

7 points

6 months ago

My brain also takes information and creates derivative works from it.

Shit, am I also a data thief?

sugar_in_your_tea@sh.itjust.works

2 points

6 months ago

That depends, do you copy verbatim? Or do you process and understand concepts, and then create new works based on that understanding? If you copy verbatim, that’s plagiarism and you’re a thief. If you create your own answer, it’s not.

Current AI doesn’t actually “understand” anything, and “learning” is just grabbing input data. If you ask it a question, it’s not understanding anything, it just matches search terms to the part of the training data that matches, and regurgitates a mix of it, and usually omits the sources. That’s it.

It’s a tricky line in journalism since so much of it is borrowed, and it’s likewise tricky w/ AI, but the main difference IMO is attribution: good journalists cite sources, and AI rarely does.

TheRealKuni@lemmy.world

6 points

6 months ago

> An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.

Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.

sugar_in_your_tea@sh.itjust.works

2 points

6 months ago

> Derivative works are not copyright infringement

They absolutely are, unless they’re covered by “fair use.” A “derivative work” doesn’t mean you created something that’s inspired by a work, but that you’ve modified the work and then distributed the modified version.

Technology

!technology@lemmy.world

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below. To ask if your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots

Community stats

Monthly active users: 15K
Posts: 6.7K
Comments: 153K
